I need to use sed or awk to get the desired output.

Order:479959,60=20130624-09:45:02.046|35=D|11=884|38=723|21=1|1=30532|10=085|59=0|114=Y|56=MBT|40=1|43=Y|100=MBTX|55=/GCQ3|49=11342|54=1|8=FIX.4.4|34=388|553=2453|9=205|52=20130624-09:45:02.046|

Order:24780,100=MBTX|43=Y|40=1|34=388|553=2453|52=2013062409:45:02.046|9=205|49=11342|54=1|8=FIX.4.4|55=/GCQ3|11=405|35=D|60=20130624-09:45:02.046|56=MBT|59=0|114=Y|10=085|21=1|38=470|1=30532|

Order:799794,55=/GCQ3|49=11342|54=1|8=FIX.4.4|34=388|553=2453|9=205|52=2013062409:45:02.046|40=1|43=Y|100=MBTX|38=350|21=1|1=30532|10=085|59=0|114=Y|56=MBT|60=20130624-09:45:02.046|35=D|11=216|

Order:72896,11=735|35=D|60=2013062409:45:02.046|56=MBT|59=0|114=Y|10=085|1=30532|38=17|21=1|100=MBTX|43=Y|40=1|553=2453|9=205|52=20130624-09:45:02.046|34=388|8=FIX.4.4|54=1|49=11342|55=/GCQ3|

I want to get the number after 38= and the number after 11=, which should be labeled Clientid in the output.

The output should be:

Orderid-479959 38= 723 Clientid=884
Orderid-24780 38= 470 Clientid=405
Orderid-799794 38= 350 Clientid=216
Orderid-72896 38= 17 Clientid=735

Any help will be appreciated.


Solution 1:

You can use

sed -nr 's/Order:([0-9]+),.*[,\|]38=([0-9]+)[,\|].*/Orderid-\1 38= \2/p' file | tee file2

Then

sed -nr 's/.*[,\|]11=([0-9]+)[,\|].*/Clientid=\1/p' file | tee file3

Then

paste -d ' ' file2 file3

This prints the result on stdout; redirect it wherever you like.

I couldn't do it in a single sed command (although someone surely can), because the 11= and 38= fields can appear in either order, so I read the file twice. You could roll it into a script like this:

#!/bin/bash
sed -nr 's/Order:([0-9]+),.*[,\|]38=([0-9]+)[,\|].*/Orderid-\1 38= \2/p' "$1" > file2
sed -nr 's/.*[,\|]11=([0-9]+)[,\|].*/Clientid=\1/p' "$1" > file3
paste -d ' ' file2 file3 > outfile
rm file2 file3

(This deletes the intermediate files it creates and writes the final output to a file named outfile.)

Usage:

  • paste the script into an empty file and save it
  • give it execute permission: chmod u+x script
  • run it with the name of your input file as argument: ./script file
  • the script overwrites and then deletes file2 and file3, so if files with those names already exist in the current directory, change the names in the script first!
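If you'd rather not worry about name clashes at all, the same two passes can write their intermediate results to temporary files from mktemp. This is a minimal sketch, assuming GNU sed (for -r) and a mktemp that can be called without arguments, as GNU coreutils allows; extract_orders is a hypothetical function name, and the demo writes two records from the question to a temporary input file:

```shell
#!/bin/sh
# Sketch: the same two sed passes, but with intermediate files from mktemp,
# so nothing in the current directory is overwritten or deleted.
# extract_orders is a hypothetical name; GNU sed is assumed for -r.
extract_orders() {
  ids=$(mktemp) || return 1
  clients=$(mktemp) || { rm -f "$ids"; return 1; }
  # [,|] matches a comma or a pipe
  sed -nr 's/Order:([0-9]+),.*[,|]38=([0-9]+)[,|].*/Orderid-\1 38= \2/p' "$1" > "$ids"
  sed -nr 's/.*[,|]11=([0-9]+)[,|].*/Clientid=\1/p' "$1" > "$clients"
  # line n of one file + line n of the other, joined with a space
  paste -d ' ' "$ids" "$clients"
  rm -f "$ids" "$clients"
}

# Demo: two records from the question (with the blank line between them)
# written to a temporary input file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Order:479959,60=20130624-09:45:02.046|35=D|11=884|38=723|21=1|1=30532|10=085|59=0|114=Y|56=MBT|40=1|43=Y|100=MBTX|55=/GCQ3|49=11342|54=1|8=FIX.4.4|34=388|553=2453|9=205|52=20130624-09:45:02.046|

Order:799794,55=/GCQ3|49=11342|54=1|8=FIX.4.4|34=388|553=2453|9=205|52=2013062409:45:02.046|40=1|43=Y|100=MBTX|38=350|21=1|1=30532|10=085|59=0|114=Y|56=MBT|60=20130624-09:45:02.046|35=D|11=216|
EOF
result=$(extract_orders "$sample")
rm -f "$sample"
printf '%s\n' "$result"
# Orderid-479959 38= 723 Clientid=884
# Orderid-799794 38= 350 Clientid=216
```

Like the original script, this relies on both sed passes matching the same records; a record missing 38= or 11= would shift the columns out of line.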

Explanation

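  • s/old/new/ replace old with new
  • -r use extended regular expressions (ERE), so ( ) groups and + work without backslashes
  • -n don't print anything unless asked; together with the p flag on s, only lines where the substitution succeeded are printed, so the empty lines (and any line that doesn't match) are dropped
  • [,\|] match , or | (inside a bracket expression | needs no escaping; the backslash just adds a literal \ to the set, which does no harm here)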
  • ([0-9]+) some digits to save for later
  • \1 backreference to saved pattern
  • tee write to a file and print to stdout too so you can check it
  • > somefile redirect output to somefile instead of stdout
  • paste -d ' ' file2 file3 join each line of file2 with the matching line of file3, separated by a space
  • rm file2 file3 delete file2 and file3
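The -n/p combination is easiest to see side by side. A small sketch on one record from the question, preceded by an empty line as in the posted data (the variable names are arbitrary; GNU sed is assumed for -r):

```shell
#!/bin/sh
# One record from the question, preceded by an empty line as in the posted data.
rec='Order:24780,100=MBTX|43=Y|40=1|34=388|553=2453|52=2013062409:45:02.046|9=205|49=11342|54=1|8=FIX.4.4|55=/GCQ3|11=405|35=D|60=20130624-09:45:02.046|56=MBT|59=0|114=Y|10=085|21=1|38=470|1=30532|'
script='s/.*[,|]11=([0-9]+)[,|].*/Clientid=\1/'

# Without -n: sed prints every input line, so the empty line comes through too
# (a line that didn't match would also be printed, unchanged).
without_n=$(printf '\n%s\n' "$rec" | sed -r "$script")

# With -n and the p flag: only lines where the substitution succeeded are printed.
with_n=$(printf '\n%s\n' "$rec" | sed -nr "${script}p")

printf 'without -n: [%s]\n' "$without_n"   # spans two lines: the empty line, then Clientid=405
printf 'with -n/p:  [%s]\n' "$with_n"      # with -n/p:  [Clientid=405]
```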

Solution 2:

Using awk

Assuming your data is in a file called data.txt, create a file called script.awk and give it the following contents:

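BEGIN { FS="[,|]" }            # split each record on commas and pipes
NF > 0 {                       # skip empty lines
  split("", map)               # clear values left over from the previous order
  for(i=1; i <= NF; i++) {
    split($i, f, "[:=]")       # "38=723" -> f[1]="38", f[2]="723"
    map[f[1]] = f[2]
  }
  printf "Orderid-%s 38= %s Clientid=%s\n", map["Order"], map[38], map[11]
}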

Then run the following command to process the data and print the result:

awk -f script.awk < data.txt
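If you prefer not to keep a separate script file, the same program can be given inline, with -F doing the job of the BEGIN block. This sketch also empties map at the start of each record (splitting an empty string into an array is a portable way to clear it in awk), so a record that lacks tag 38 or 11 prints an empty value instead of the previous record's number. A temporary file holding two records from the question stands in for data.txt:

```shell
#!/bin/sh
# Hypothetical input: two records from the question in a temporary file
# standing in for data.txt.
data=$(mktemp)
cat > "$data" <<'EOF'
Order:479959,60=20130624-09:45:02.046|35=D|11=884|38=723|21=1|1=30532|10=085|59=0|114=Y|56=MBT|40=1|43=Y|100=MBTX|55=/GCQ3|49=11342|54=1|8=FIX.4.4|34=388|553=2453|9=205|52=20130624-09:45:02.046|

Order:72896,11=735|35=D|60=2013062409:45:02.046|56=MBT|59=0|114=Y|10=085|1=30532|38=17|21=1|100=MBTX|43=Y|40=1|553=2453|9=205|52=20130624-09:45:02.046|34=388|8=FIX.4.4|54=1|49=11342|55=/GCQ3|
EOF

# Same program as script.awk, given inline; -F'[,|]' plays the role of the
# BEGIN block, and split("", map) empties map at the start of every record.
result=$(awk -F'[,|]' '
NF > 0 {
  split("", map)
  for (i = 1; i <= NF; i++) {
    split($i, f, "[:=]")
    map[f[1]] = f[2]
  }
  printf "Orderid-%s 38= %s Clientid=%s\n", map["Order"], map[38], map[11]
}' "$data")
rm -f "$data"
printf '%s\n' "$result"
# Orderid-479959 38= 723 Clientid=884
# Orderid-72896 38= 17 Clientid=735
```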

In script.awk, the map variable is an associative array. I called it map because that is what this structure is usually called in other languages (HashMap in Java, Hash in Ruby, dict in Python).
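To see what split and the associative array actually hold, here is a small sketch run on fields taken from the sample data. It also shows a side effect of using [:=] as the separator: timestamp values such as tag 52 are cut at their first colon. That is harmless here, since only Order, 38 and 11 are printed.

```shell
#!/bin/sh
result=$(awk 'BEGIN {
  # split() fills f with the pieces and returns how many there are.
  n = split("38=723", f, "[:=]")
  print n, f[1], f[2]                  # 2 38 723

  # The separator set also contains ":", so a timestamp field is cut short.
  n = split("52=20130624-09:45:02.046", f, "[:=]")
  print n, f[1], f[2]                  # 4 52 20130624-09

  # Subscripts are strings: the number 38 and the string "38" name the same element.
  map["38"] = 723
  print map[38]                        # 723
}')
printf '%s\n' "$result"
```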

Solution 3:

One-liners aren't always pretty:

$ sed 's/[|,]\(11=[^|]*\).*\(|38=[^|]*|\).*/\2\1|/; s/Order:\([0-9]*\).*|38=\([0-9]*\).*|11=\([0-9]*\)|.*/Orderid-\1 38= \2 Clientid=\3/' foo
Orderid-479959 38= 723 Clientid=884
Orderid-24780 38= 470 Clientid=405
Orderid-799794 38= 350 Clientid=216
Orderid-72896 38= 17 Clientid=735

Explanation

  • s/old/new/ replace old with new
  • [|,] match | or ,
  • \(11=[^|]*\) match any number of any characters except | after 11= and save 11=whatever for later use as \1
  • .* any number of any characters
  • \(|38=[^|]*|\) save |38=whatever| for later use as \2
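  • \2\1| backreferences in the replacement: write the saved 38= field, then the 11= field, then a closing |. Lines where 38= already comes before 11= don't match this first command and pass through unchanged, so afterwards every line has 38= before 11= and the next command needs only one pattern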
  • ; separates commands, like in the shell
  • Order:\([0-9]*\).*|38=\([0-9]*\).*|11=\([0-9]*\)|.* match this pattern (now we've cleaned it up) saving the parts we want to reuse in \(parentheses\) again
  • Orderid-\1 38= \2 Clientid=\3 replacement with \1 \2 and \3 backreferences to the numbers we saved with \(\)
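Running the two commands separately shows what the first one does. A sketch on the last record from the question, where 11= comes before 38= (only basic regular expressions are used, so any POSIX sed should do; the variable names are arbitrary):

```shell
#!/bin/sh
# Record copied from the question; here 11= comes before 38=.
line='Order:72896,11=735|35=D|60=2013062409:45:02.046|56=MBT|59=0|114=Y|10=085|1=30532|38=17|21=1|100=MBTX|43=Y|40=1|553=2453|9=205|52=20130624-09:45:02.046|34=388|8=FIX.4.4|54=1|49=11342|55=/GCQ3|'

# First command only: everything from the separator before 11= to the end
# of the line is replaced by |38=...|11=...|
step1=$(printf '%s\n' "$line" | sed 's/[|,]\(11=[^|]*\).*\(|38=[^|]*|\).*/\2\1|/')
printf '%s\n' "$step1"    # Order:72896|38=17|11=735|

# Second command: 38= now precedes 11=, so one pattern extracts all three numbers.
final=$(printf '%s\n' "$step1" | sed 's/Order:\([0-9]*\).*|38=\([0-9]*\).*|11=\([0-9]*\)|.*/Orderid-\1 38= \2 Clientid=\3/')
printf '%s\n' "$final"    # Orderid-72896 38= 17 Clientid=735
```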