Add column from one .csv to another .csv file

With only awk command:

awk -F, '{getline f1 <"file2" ;print f1,$3,$4}' OFS=, file1

Get a line from file1 and store it into local variable f1, then print the line that stored in f1 and finally print the third($3) and forth($3) fields from file1 which delimited with comma , altogether, and change the OFS(output field separator [space by default]) to comma(,).


The short command would be like this:

paste -d, file2 <(cut -d, -f3- file1)
 A,B,C,D  
 A,B,C,D  
 A,B,C,D  
 A,B,C,D  

paste the file2, then cut and paste the third column to the next(-f3-) from file1.


With awk and paste (option A)

Below command also copies the last two columns (C,D) from file1 at the end of each line in file2:

paste -d',' file2  <(awk -F',' '{print $(NF-1)","$NF}' file1)

Above command paste the file2 content then print a comma delimiter(-d',') then paste the two last field(NF is the index of last field and $NF is the string which its index is NF. So $(NF-1) is the second field before last field ) from file1 when those index redefines or splits with comma spectator(-F',').

With awk and paste (option B)

This command also is the same as above($3 and $4 points to third and forth field of each line from file1 ):

paste -d',' file2  <(awk -F',' '{print $3","$4}' file1)

Or another solution with cut command:

paste -d, <(cut -d, -f1 file1) <(cut -d, -f2 file2) <(cut -d, -f3- file1)

cut command in above command first cut the first field(-f1 which indexed with comma delimiter(-d.)) from file1(cut -d, -f1 file1), then cut and paste the second field of file2(cut -d, -f2 file2) and finally cut and paste the third column(-f3) to the nexts(-) from file1(cut -d, -f3- file1) again.

This command also returns the same result:

paste -d, <(awk -F',' '{print $1}' file1) <(awk -F',' '{print $2}' file2) <(awk -F',' '{print $3","$4}' file1)

paste the second field from file1(awk -F',' '{print $1}' file1) then print a comma(-d,), then paste the second column from file2(awk -F',' '{print $2}' file2), finally paste the second and last column of file1(awk -F',' '{print $3","$4}' file1) again.


Here's a beauty (I think):

join -t, <(csvcut -c 1,3,4 file1.csv) <(csvcut -c 1,2 file2.csv)

Broken down in steps:

Step 1. Install csvkit:

sudo pip install csvkit
sudo apt-get install python-dev python-pip python-setuptools build-essential

Step 2. Use the join command with a comma as separator

join -t,

Step 3. Feed it the actual columns you want to. Note how you feed it the first column twice, because that is the one the join is actually performed on (default behavior of join).

join -t, <(csvcut --columns 1,3,4 file1.csv) <(csvcut --columns 1,2 file2.csv)

or in shorthand:

join -t, <(csvcut -c 1,3,4 file1.csv) <(csvcut -c 1,2 file2.csv)

You can redirect that standard output to a file (desiredOutput) if wanted.

Advantages

This method has several advantages over the others proposed.

First and foremost: it performs a real join. That means that it can be used for more complex data as well. It is very easy to do a join on another field, for instance. It does not simply look at the position of the field, but it really takes the column into consideration. It actually works with the format of the data (csv) and does not treat it like text.

Second, it uses the very powerful csv toolkit which also allows you to a) display statistics with one command (csvstats), b) check whether the data is clean (csvclean), but also to transform it into json, into sql, or even load it into python! This toolkit is heavily used in data science for data preparation.


Here is another beautiful one. I think it is the easiest of all suggestions, thus far.

csvtool pastecol 2 2 file1.csv file2.csv

If you have not installed csvtool already in the past, you have to sudo apt-get install csvtool.

From the docs:

pastecol <column-spec1> <column-spec2> input.csv update.csv

Replace the content of the columns referenced by in the file input.csv with the one of the corresponding column specified by in update.csv.

Example:

  csvtool pastecol 2-3 1- input.csv update.csv.csv > output.csv

Note how in our case we are replacing the second columns of the files.

Examples

file1.csv

A,,C,D
A,,C,D
A,,C,D
A,,C,D

file2.csv

A,B
A,B
A,B
A,B

Combining the two files:

csvtool pastecol 2 2 file1.csv file2.csv
A,B,C,D
A,B,C,D
A,B,C,D
A,B,C,D

What you essentially do is paste the column two of file2.csv as column 2 in file1.csv.

Note that this also works on the same document. If you want to swap two columns, you can do so by using the same file as input.csv and update.vsc.

csvtool pastecol 2 1 file2.csv file2.csv 
A,A
A,A
A,A 
A,A