Join two files, adding values in certain columns

How can I add another 2 values after matching the values from 2 columns in another file just like VLOOKUP?

Sample below. Value in column 6 & 7 from file1, when matched with column 1 & 2 from file2, will add column 9 & 10 in file1 with the value of column 3 & 4 from file2.

file1.txt

1 1 1 1 1 5 9 1

2 2 2 2 2 7 8 2

3 3 3 3 3 7 7 3

4 4 4 4 4 8 6 4

file2.txt

5 9 A B

8 6 E F

7 7 G H

7 8 C D

output.txt

1 1 1 1 1 5 9 1 A B

2 2 2 2 2 7 8 2 C D

3 3 3 3 3 7 7 3 G H

4 4 4 4 4 8 6 4 E F

Thanks,


Solution 1:

Use awk

awk 'NR==FNR{ seen[$1FS$2]=$3FS$4; next } { print $0, seen[$6FS$7] }' file2 file1

and to delete empty lines from output:

awk 'NR==FNR{ seen[$1FS$2]=$3FS$4; next } NF{ print $0, seen[$6FS$7] }' file2 file1

or a little whitespace and sensible variables names go a long way toward readability. Also, take advantage of using a comma in the array key

awk '
    NR == FNR {value[$1,$2] = $3 OFS $4; next} 
    {print $0, value[$6,$7]}
' file2.txt file1.txt

  • NR is set to 1 when the first record read by awk and incrementing for each next records reading either in single or multiple input files until all read finished.
  • FNR is set to 1 when the first record read by awk and incrementing for each next records reading in current file and reset back to 1 for the next input file if multiple input files.
  • so NR == FNR is always a true condition and the block followed by this will perform actions on first file only.

  • The seen is an associated awk array with the key combination of column$1 and column$2 with the value of column$3 and column$4.

  • The next token skips to executing rest of the commands and those will only execute actually for next file(s) except first.

  • NF; presetting Number of Fields in a record where fields are known and separated with a Field Separator FS; so FS between columns there is used to intact the fields separator or you could use comma , within array instead.

  • so this NF{ print $0, seen[$6FS$7] }, print the current record $0 in file1 and the value matched with column$6 and column$7 present in the array seen when that was not an empty line.

Solution 2:

I know you didn't ask for a database solution, but if you happen to have a MySQL server around, here is how to do it:

create table file1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int);
create table file2 (c1 int, c2 int, c3 char, c4 char);
load data infile 'file1' into table file1 fields terminated by ' ';
load data infile 'file2' into table file2 fields terminated by ' ';
select f1.*, f2.c3, f2.c4 from file1 as f1 
    join file2 as f2 
        on f1.c6 = f2.c1 and f1.c7 = f2.c2 
    order by f1.c1;

(I had to strip the blank lines as well)

Result:

+------+------+------+------+------+------+------+------+------+------+
| c1   | c2   | c3   | c4   | c5   | c6   | c7   | c8   | c3   | c4   |
+------+------+------+------+------+------+------+------+------+------+
|    1 |    1 |    1 |    1 |    1 |    5 |    9 |    1 | A    | B    |
|    2 |    2 |    2 |    2 |    2 |    7 |    8 |    2 | C    | D    |
|    3 |    3 |    3 |    3 |    3 |    7 |    7 |    3 | G    | H    |
|    4 |    4 |    4 |    4 |    4 |    8 |    6 |    4 | E    | F    |
+------+------+------+------+------+------+------+------+------+------+
4 rows in set (0,00 sec)

Solution 3:

Responding to @Jos's answer: sqlite

db=$(mktemp)
sqlite3 "$db" <<'END'
create table f1 (v1 text,v2 text,v3 text,v4 text,v5 text,v6 text,v7 text,v8 text);
create table f2 (v1 text,v2 text,v3 text,v4 text);
.separator " "
.import file1.txt f1
.import file2.txt f2
select f1.*, f2.v3, f2.v4 from f1,f2 where f1.v6=f2.v1 and f1.v7=f2.v2;
END
rm "$db"

or in a almost-one-liner way:

sqlite3 -separator " "  <<'END'
create table f1 (v1, v2, v3, v4, v5, v6, v7, v8 );
create table f2 (v1, v2, v3, v4);
.import file1.txt f1
.import file2.txt f2
select f1.*, f2.v3, f2.v4 from f1,f2 where f1.v6=f2.v1 and f1.v7=f2.v2;
END

Solution 4:

bash: I took the liberty of removing blank lines from the files.

declare -A keys
while read -r k1 k2 value; do 
    keys[$k1,$k2]=$value
done < file2.txt
while read -ra fields; do 
    key="${fields[5]},${fields[6]}"; 
    echo "${fields[*]} ${keys[$key]}"
done < file1.txt
1 1 1 1 1 5 9 1 A B
2 2 2 2 2 7 8 2 C D
3 3 3 3 3 7 7 3 G H
4 4 4 4 4 8 6 4 E F