How to extract a string from a JSON file and put it into a variable (Linux)

Solution 1:

  1. Assuming that the string you want to extract is at the same position in every file, you can use the head, tail and cut commands connected with pipes.

  2. For example:

    $ head -6 file.json | tail -1 | cut -b 121-129
    db1.table
    
  3. And here is an example of a script setting the output into a variable:

    #!/bin/bash
    v1=$(head -6 file.json | tail -1 | cut -b 121-130)
    echo "$v1"
    

The output of the script will be db1.table4, which is the value of the v1 variable.

You can read more about those commands here:

  • head command
  • tail command
  • cut command

Of course you can use those commands to extract any other string from a file.
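
For instance, here is a small variation of the script above that keeps the line number and byte range in variables, so it is easier to adapt to another string (the values below are the ones from the example; they are placeholders you must change to match the position in your own file):

    #!/bin/bash
    # Adjust these to the position of the string in your own file
    line=6
    bytes="121-130"
    v1=$(head -n "$line" file.json | tail -1 | cut -b "$bytes")
    echo "$v1"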

Solution 2:

Take a look at jq, the command-line JSON processor. You can install it, for example, with:

sudo apt install jq

The string you want is not a JSON value; it's part of a JSON value. So I suggest you use jq to get the string you need to manipulate into a variable, for example:

my_var=$(jq -r '."$quer".args[0].args[0]' file1.json)

This gets you a variable containing the SELECT statement:

select db1.table1 as tab1, db1.table2 as tab2, db1.table3 as tab3 from db1.table4 as tab4

Then you will need to use other tools like sed, awk, cut, etc. to get the substring you want from that variable. For your specific case this would work, but of course it may not work for a different SELECT statement. Cutting by the space delimiter and returning the 12th field:

my_table=$(echo $my_var | cut -d' ' -f12)
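
If you would rather not rely on the table name being exactly the 12th field, here is a sketch (not from the original answer) that prints whatever follows the from keyword; it assumes the table you want always comes right after from:

# Print the word that follows "from" (case-insensitive), wherever it appears
my_table=$(printf '%s\n' "$my_var" | awk '{for (i = 1; i < NF; i++) if (tolower($i) == "from") print $(i + 1)}')
echo "$my_table"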

Solution 3:

You should generally avoid using generic text-parsing tools for structured data. Since you have a JSON file, it is safer and simpler to use a dedicated JSON parser. In your case, you want to extract the value of the first element of the array args, which is itself the first element of the top-level array args, the child of the top-level hash $quer:

$ jq '."$quer"."args"[0]["args"]' file.json
[
  "select\n      db1.table1 as tab1,\n      db1.table2 as tab2,\n      db1.table3 as tab3\n      from db1.table4 as tab4"
]

From here, you no longer have structured data and you need to resort to cruder methods. I don't know how you want to identify your target string; you didn't explain that. So, depending on what you actually want, you could do one of the following:

  1. Skip lines starting with [ or ] and then print the second word of the remaining lines:

    $ jq '."$quer"."args"[0]["args"]' file.json | awk '/^[^][]/{print $2}'
    db1.table1
    
  2. Print the second word of the second line:

    $ jq '."$quer"."args"[0]["args"]' file.json | awk 'NR==2{print $2}'
    db1.table1
    
  3. Print the longest stretch of non-whitespace characters after "select\n":

    $ jq '."$quer"."args"[0]["args"]' file.json | grep -oP '"select\\n\s*\K\S*'
    db1.table1
    

If you explain exactly how we are supposed to know what string to extract, I could give you a more targeted answer.
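
Another option, sketched under the same assumptions about the file layout, is to let jq print the raw string with its -r flag, so the surrounding quotes and the \n escapes are gone and the query arrives on real lines before the text tools see it:

$ jq -r '."$quer"."args"[0]["args"][0]' file.json | awk 'NR==2{print $1}'
db1.table1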


For the sake of completeness, in your specific example, and I stress that this will not be portable and is almost certain to fail if your input data change in any way, you can use simple text tools directly:

$ grep -oP '"select\\n\s*\K\S*' file.json 
db1.table1

$ awk '$1=="\"select\\n"{print $2}' file.json 
db1.table1

$ sed -nE 's/.*"select\\n\s*(\S+).*/\1/p' file.json 
db1.table1
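
Finally, if you prefer to stay inside jq, its regex functions can extract the substring as well. This is only a sketch and assumes that the name you want is the first run of non-whitespace after select:

$ jq -r '."$quer"."args"[0]["args"][0] | capture("select\\s+(?<t>\\S+)").t' file.json
db1.table1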