How to delete all but certain values from many JSON files?

I've used youtube-dl to download all my videos from YouTube. Along with them, a JSON file was created for each video file. Inside these JSON files is a LOT of junk, but there are 3 items/values/categories(?) that I want to keep; the rest can be deleted.

Those things are:

{"upload_date": "value",
"fulltitle": "value",
"description": "value"}

So I either need to:

  • delete all but these 3 items & save the file, or
  • extract these 3 items to a new file & delete the old one.

How can I do this with Automator or with shell tools in Terminal?


Here is one of the ways I would do it...

The following was tested and worked for me under macOS Catalina 10.15.6.

I downloaded jq from https://stedolan.github.io/jq/download/ under the OS X section:

  • jq 1.6 binary for 64-bit

In Terminal, I ran the following commands:

cd ~/Downloads
xattr -d com.apple.quarantine jq-osx-amd64
sudo cp jq-osx-amd64 /usr/local/bin/jq
sudo chmod 0755 /usr/local/bin/jq

   Note: If /usr/local/bin does not already exist, it must first be created, e.g.: sudo mkdir -p /usr/local/bin
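Before wiring jq into Automator, a quick sanity check in Terminal can confirm that the install worked. This is a minimal sketch; it assumes /usr/local/bin is on your PATH in Terminal (it normally is), and the sample JSON is made up.

```shell
# Show where the shell finds jq (after the steps above: /usr/local/bin/jq) and its version.
command -v jq
jq --version

# Tiny self-check on made-up input: keep only key "a".
# {a} is jq shorthand for {"a": .a}; -c prints the result on one line.
echo '{"a": 1, "b": 2}' | jq -c '{a}'
# prints: {"a":1}
```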

The aforementioned steps set up jq for use from the command line in Terminal, from a Run Shell Script action in Automator, or from a do shell script command in AppleScript.

After downloading a video from YouTube with youtube-dl using the --write-info-json option, I used the example shell script code, shown below, in a Run Shell Script action in an Automator Service/Quick Action workflow to process the JSON file so that it only has the keys you've mentioned.

Example shell script code:

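for f in "$@"; do
    # Skip anything that is not a regular file.
    [[ -f $f ]] || continue
    # Only process files whose names end in .json.
    [[ $f =~ .*\.json$ ]] || continue
    # Use the file's base name as the prefix of a unique temporary file.
    fn="${f##*/}"
    tmpfile="$(mktemp /tmp/"${fn}.XXXXXX")" || exit 1
    # Keep only the three wanted keys and write the result to the temporary file.
    /usr/local/bin/jq '{"upload_date": .upload_date, "fulltitle": .fulltitle, "description": .description}' "$f" > "$tmpfile"
    # Replace the original JSON file with the trimmed version.
    mv "$tmpfile" "$f"
done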
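To see what the filter produces without overwriting anything, the same idea can be tried on a throwaway file and printed to the terminal. This sketch uses jq's object-construction shorthand {upload_date, fulltitle, description}, which is equivalent to the longer filter in the script; the sample file and its extra keys are made up, standing in for a real youtube-dl JSON file.

```shell
# Made-up sample standing in for a youtube-dl info JSON file.
sample="$(mktemp)"
printf '%s\n' '{"id": "abc", "upload_date": "20080913", "fulltitle": "Some title", "description": "Some description", "formats": [], "tags": ["x"]}' > "$sample"

# Trim to the three keys and print to stdout; the sample file itself is not modified.
# {upload_date, fulltitle, description} is shorthand for
# {"upload_date": .upload_date, "fulltitle": .fulltitle, "description": .description}
trimmed="$(jq -c '{upload_date, fulltitle, description}' "$sample")"
echo "$trimmed"
# prints: {"upload_date":"20080913","fulltitle":"Some title","description":"Some description"}

rm -f "$sample"
```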

With this set up as shown in the image further below, I select the JSON file created by youtube-dl using the --write-info-json option in Finder, then right-click it and select Cleanup youtube-dl JSON from the context menu.

This overwrites the original JSON file with one that has the following example structure:

{
  "upload_date": "20080913",
  "fulltitle": "Jerry Seinfeld returns to Comedy on the Letterman show",
  "description": "Jerry Seinfeld returns to Comedy on the Letterman show"
}
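To double-check that a processed file now holds exactly these three keys, jq's keys (which returns an object's keys in sorted order) can be compared against the expected list; with -e, jq's exit status is 0 only if the last result is neither false nor null. A minimal sketch; the file written here is a made-up stand-in for a processed file.

```shell
# Made-up stand-in for a file already processed by the cleanup script.
processed="$(mktemp)"
printf '%s\n' '{"upload_date": "20080913", "fulltitle": "T", "description": "D"}' > "$processed"

# keys returns the keys sorted, so compare against a sorted list.
if jq -e 'keys == ["description", "fulltitle", "upload_date"]' "$processed" > /dev/null; then
    result="only the three keys"
else
    result="unexpected keys"
fi
echo "$result"
# prints: only the three keys

rm -f "$processed"
```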

[Image: Automator Service/Quick Action workflow]


Notes:

  • The example JSON file was created from the output of:

    youtube-dl --write-info-json https://www.youtube.com/watch?v=8JOsxxm-RnQ
    
  • The example shell script code, as coded, can handle multiple selected JSON files in Finder at the same time.

  • While the example shell script code does contain some error handling, it does not back up the original JSON file(s) before overwriting them. Additional code is needed if that is something you want.

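  • The example shell script code, as coded, does not contain any error handling with regard to the jq command used. It is only meant to be used on JSON files created by youtube-dl using the --write-info-json option, on the assumption that the target keys always exist in them. If that assumption may not hold, additional error handling may be required: a missing key comes out as null, and if jq fails (e.g., on malformed JSON), mv still replaces the original file with whatever jq wrote to the temporary file, possibly nothing.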

  • The JSON files created by youtube-dl using the --write-info-json option are flat formatted, meaning each one is written all on one line. The jq command, as written, produces pretty-printed, multi-line output. If you want flat formatted output, use the -c (compact output) option, e.g.: jq -c ...

  • The example shell script code can also be used in a standard shell script, made executable, and run from the command line in Terminal, passing the JSON files to process as arguments.

  • The error handling can be removed from the example shell script code and the rest reformatted as a one-liner, to be run after changing to the directory where the JSON files are located, e.g.:

    for f in *.json; do jq '{"upload_date": .upload_date, "fulltitle": .fulltitle, "description": .description}' "$f" > tmp && mv tmp "$f"; done
    

       Note: This overwrites the original file without a backup.

  • NOTE: JSON files are not just ordinary plain text files per se; they are structured data with their own syntax, so one should not parse them with line-oriented utilities such as sed, awk, etc. Instead, use a utility expressly designed to work with JavaScript Object Notation (JSON) files, such as jq.
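If you want the backup and extra checks mentioned in the notes, the loop can be extended along these lines. This is a sketch, not a tested drop-in replacement: the function name cleanup_ytdl_json and the .bak backup suffix are hypothetical choices. It checks up front that each file parses and contains all three keys, keeps a backup copy, and writes compact (-c) output to match youtube-dl's single-line format.

```shell
# Hypothetical function name; pass it the JSON files to clean up.
cleanup_ytdl_json() {
    local f
    for f in "$@"; do
        # Only regular files whose names end in .json.
        [[ -f $f && $f == *.json ]] || continue

        # Skip files that are not valid JSON objects or lack any of the keys.
        # jq -e exits non-zero if the result is false/null or jq hits an error.
        if ! jq -e 'has("upload_date") and has("fulltitle") and has("description")' \
                "$f" > /dev/null 2>&1; then
            echo "Skipping $f: invalid JSON or missing key(s)" >&2
            continue
        fi

        # Back up the original next to it (the .bak suffix is an arbitrary choice).
        cp -p "$f" "$f.bak" || continue

        # Write compact, single-line output back over the original;
        # restore from the backup if jq fails for any reason.
        if ! jq -c '{upload_date, fulltitle, description}' "$f.bak" > "$f"; then
            cp -p "$f.bak" "$f"
        fi
    done
}
```

Usage, after changing to the folder with the files: cleanup_ytdl_json *.json. Running it twice on the same file overwrites the .bak with the already-trimmed version. In an Automator Run Shell Script action, the PATH may not include /usr/local/bin (which is presumably why the original script calls /usr/local/bin/jq by its full path), so you may need to replace jq with /usr/local/bin/jq there and call the function as cleanup_ytdl_json "$@".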