How to delete all but certain values from many JSON files?
I've used youtube-dl to download all my videos from YouTube. Along with each video file, it created a JSON file. These JSON files contain a LOT of junk, but there are 3 items/values/categories(?) that I want to keep; the rest can be deleted.
Those things are
{"upload_date": "value",
"fulltitle": "value",
"description": "value"}
So I either need to:
- delete all but these 3 items & save the file, or
- extract these 3 items to a new file & delete the old one.
How can I do this with Automator or with shell tools in Terminal?
Here is one of the ways I would do it...
The following was tested and worked for me under macOS Catalina 10.15.6.
I downloaded jq from https://stedolan.github.io/jq/download/ under the OS X section:
- jq 1.6 binary for 64-bit
In Terminal, I ran the following commands:
cd ~/Downloads
xattr -d com.apple.quarantine jq-osx-amd64
sudo cp jq-osx-amd64 /usr/local/bin/jq
sudo chmod 0755 /usr/local/bin/jq
• Note: If /usr/local/bin does not already exist, it must first be created with, e.g.: sudo mkdir -p /usr/local/bin
The aforementioned steps set up jq for use from the command line in Terminal, from a Run Shell Script action in Automator, or from a do shell script command in AppleScript.
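Before wiring jq into a workflow, it may be worth confirming that the installed binary is found and runs. The following is only a minimal sketch, assuming jq was copied to a directory on your PATH as described above; the inline JSON document is a made-up sample.

```shell
#!/bin/bash
# Sanity check: is jq on the PATH, and does it run?
if jq_path="$(command -v jq)"; then
    echo "jq found at: $jq_path"
    jq --version
    # Filter a tiny made-up document, keeping only the key "a".
    echo '{"a": 1, "b": 2}' | jq -c '{a: .a}'
else
    echo "jq not found; check that /usr/local/bin is in PATH" >&2
fi
```

If the last command prints a one-line object containing only the key a, jq is working.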
After downloading a video from YouTube with youtube-dl using the --write-info-json option, I used the example shell script code, shown below, in a Run Shell Script action in an Automator workflow as a Service/Quick Action to process the JSON file so that it only has the keys you've mentioned.
Example shell script code:
for f in "$@"; do
    # Skip anything that is not a regular file with a .json extension.
    [[ -f $f ]] || continue
    [[ $f =~ .*\.json$ ]] || continue
    # Write the filtered JSON to a temporary file, then replace the original.
    fn="${f##*/}"
    tmpfile="$(mktemp /tmp/"${fn}.XXXXXX")" || exit 1
    /usr/local/bin/jq '{"upload_date": .upload_date, "fulltitle": .fulltitle, "description": .description}' "$f" > "$tmpfile"
    mv "$tmpfile" "$f"
done
With this setup, as shown in the image further below, I select the JSON file created by youtube-dl using the --write-info-json option in Finder, then right-click it and select Cleanup youtube-dl JSON from the context menu.
It then produced a JSON file with the following example structure, while overwriting the original JSON file:
{
"upload_date": "20080913",
"fulltitle": "Jerry Seinfeld returns to Comedy on the Letterman show",
"description": "Jerry Seinfeld returns to Comedy on the Letterman show"
}
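As a side note, jq also accepts a shorthand for this kind of object construction: {key} means the same as {"key": .key}. The sketch below compares both forms on a made-up sample document (the id key and all values are hypothetical), assuming jq is on your PATH.

```shell
#!/bin/bash
# Hypothetical sample input with one extra key ("id") that should be dropped.
sample='{"id": "abc", "upload_date": "20080913", "fulltitle": "Some title", "description": "Some text"}'

# Long form, as used in the example shell script code above.
long="$(printf '%s' "$sample" | jq -c '{"upload_date": .upload_date, "fulltitle": .fulltitle, "description": .description}')"

# Shorthand form: {key} is equivalent to {"key": .key}.
short="$(printf '%s' "$sample" | jq -c '{upload_date, fulltitle, description}')"

# Report whether both filters produced identical output.
[ "$long" = "$short" ] && echo "same output: $short"
```

Either form can be used in the script; the shorthand is just less typing when the output keys keep their original names.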
Notes:
- The example JSON file was created from the output of:
  youtube-dl --write-info-json https://www.youtube.com/watch?v=8JOsxxm-RnQ
- The example shell script code, as coded, can handle multiple selected JSON files in Finder at the same time.
- While the example shell script code does contain some error handling, it does not back up the original JSON file(s) before overwriting them. Additional code is required if that is something you need/want.
- The example shell script code, as coded, does not contain any error handling for the jq command and is only meant to be used on JSON files created by youtube-dl using the --write-info-json option, with the assumption that the target keys always exist under the circumstances. Otherwise, additional error handling may be required.
- The JSON files created by youtube-dl using the --write-info-json option are flat formatted, meaning everything is written on one line. The output of the jq command, as written, is multi-line. If you want flat formatted output, you can use the -c option, e.g.: jq -c ...
- The example shell script code can be used in a standard shell script, made executable, and run from the command line in Terminal.
- The error handling can be removed from the example shell script code, and the code formatted so it can also be used as a one-liner after changing directory to where the JSON files are located, e.g.:
  for f in *.json; do jq '{"upload_date": .upload_date, "fulltitle": .fulltitle, "description": .description }' "$f" > "tmp"; mv tmp "$f"; done
  • Note: This overwrites the original file without a backup.
- NOTE: JSON files are not just ordinary plain text files per se; they are specially formatted, and one should not parse them with utilities such as sed, awk, etc. Instead, use a utility expressly designed to work with JavaScript Object Notation (JSON) files! jq is a utility designed to work with JSON files.
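If you do want backups and some handling of jq failures, as mentioned in the notes above, one possible approach is sketched below. It is not the tested workflow from this answer, just an illustration under stated assumptions: cleanup_json is a hypothetical helper name, the .bak suffix is an arbitrary choice, and jq is assumed to be on the PATH (in Automator you may need the full path /usr/local/bin/jq instead).

```shell
#!/bin/bash
# cleanup_json is a hypothetical helper name, not part of the original script.
cleanup_json() {
    local f="$1" tmpfile
    # Skip anything that is not a regular file ending in .json.
    [[ -f $f && $f == *.json ]] || return 0
    # Refuse files that are invalid JSON or lack any of the three keys.
    if ! jq -e 'has("upload_date") and has("fulltitle") and has("description")' "$f" >/dev/null 2>&1; then
        echo "skipped (invalid JSON or missing keys): $f" >&2
        return 1
    fi
    tmpfile="$(mktemp "${TMPDIR:-/tmp}/cleanup.XXXXXX")" || return 1
    # Replace the original only if jq succeeded and a .bak copy was made.
    if jq '{"upload_date": .upload_date, "fulltitle": .fulltitle, "description": .description}' "$f" > "$tmpfile" \
        && cp -p "$f" "$f.bak"; then
        mv "$tmpfile" "$f"
    else
        rm -f "$tmpfile"
        return 1
    fi
}

# Process every file passed as an argument, e.g. by Automator or from Terminal.
for f in "$@"; do
    cleanup_json "$f"
done
```

With this variant, a file that jq cannot process is left untouched, and each processed file keeps its original content in a sibling file with the .bak suffix.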