Join two files, adding values in certain columns
How can I append two more values after matching the values of two columns against another file, much like VLOOKUP?
Sample below: when the values in columns 6 and 7 of file1 match columns 1 and 2 of file2, columns 9 and 10 should be added to file1 with the values of columns 3 and 4 from file2.
file1.txt
1 1 1 1 1 5 9 1
2 2 2 2 2 7 8 2
3 3 3 3 3 7 7 3
4 4 4 4 4 8 6 4
file2.txt
5 9 A B
8 6 E F
7 7 G H
7 8 C D
output.txt
1 1 1 1 1 5 9 1 A B
2 2 2 2 2 7 8 2 C D
3 3 3 3 3 7 7 3 G H
4 4 4 4 4 8 6 4 E F
Thanks,
Solution 1:
Use awk
awk 'NR==FNR{ seen[$1FS$2]=$3FS$4; next } { print $0, seen[$6FS$7] }' file2 file1
and, to keep blank lines in file1 out of the output:
awk 'NR==FNR{ seen[$1FS$2]=$3FS$4; next } NF{ print $0, seen[$6FS$7] }' file2 file1
Alternatively, a little whitespace and sensible variable names go a long way toward readability. You can also take advantage of using a comma in the array key:
awk '
NR == FNR {value[$1,$2] = $3 OFS $4; next}
{print $0, value[$6,$7]}
' file2.txt file1.txt
- NR is set to 1 when awk reads its first record and is incremented for every subsequent record, counting across all input files, whether there is one or several.
- FNR is likewise set to 1 on the first record, but it counts records in the current file only and resets to 1 at the start of each new input file.
- So NR == FNR is true only while the first file (here file2) is being read, and the block guarded by it acts on the first file only.
- seen is an associative awk array whose key is the combination of column $1 and column $2 and whose value is columns $3 and $4.
- next skips the remaining rules for the current record, so those rules effectively run only on the following file(s), not on the first.
- NF is the number of fields in the current record; using it as a pattern skips blank lines, where NF is 0.
- The FS between the columns in seen[$1FS$2] keeps the key components separated (so that, for example, the pair 1 and 23 is not confused with 12 and 3); you could instead use a comma inside the subscript, seen[$1,$2], which awk joins with SUBSEP internally.
- So NF{ print $0, seen[$6FS$7] } prints the current record $0 of file1 followed by the value stored under the key built from columns $6 and $7, provided the line is not empty.
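If you also want to drop file1 lines that have no match in file2 (instead of printing them with a trailing empty field), a minimal variant of the same idea, using the in operator to test for the key (a sketch, not part of the original answer):
awk '
NR == FNR {value[$1,$2] = $3 OFS $4; next}
($6,$7) in value {print $0, value[$6,$7]}
' file2.txt file1.txt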
Solution 2:
I know you didn't ask for a database solution, but if you happen to have a MySQL server around, here is how to do it:
create table file1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int);
create table file2 (c1 int, c2 int, c3 char, c4 char);
load data infile 'file1' into table file1 fields terminated by ' ';
load data infile 'file2' into table file2 fields terminated by ' ';
select f1.*, f2.c3, f2.c4 from file1 as f1
join file2 as f2
on f1.c6 = f2.c1 and f1.c7 = f2.c2
order by f1.c1;
(I had to strip the blank lines as well)
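One way to strip those blank lines beforehand, reusing the NF trick from the awk answer (a sketch, assuming the originals are file1.txt and file2.txt as in the question):
awk 'NF' file1.txt > file1   # keep only non-empty lines
awk 'NF' file2.txt > file2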
Result:
+------+------+------+------+------+------+------+------+------+------+
| c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c3 | c4 |
+------+------+------+------+------+------+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 | 5 | 9 | 1 | A | B |
| 2 | 2 | 2 | 2 | 2 | 7 | 8 | 2 | C | D |
| 3 | 3 | 3 | 3 | 3 | 7 | 7 | 3 | G | H |
| 4 | 4 | 4 | 4 | 4 | 8 | 6 | 4 | E | F |
+------+------+------+------+------+------+------+------+------+------+
4 rows in set (0.00 sec)
Solution 3:
Responding to @Jos's answer: the same join with sqlite.
db=$(mktemp)
sqlite3 "$db" <<'END'
create table f1 (v1 text,v2 text,v3 text,v4 text,v5 text,v6 text,v7 text,v8 text);
create table f2 (v1 text,v2 text,v3 text,v4 text);
.separator " "
.import file1.txt f1
.import file2.txt f2
select f1.*, f2.v3, f2.v4 from f1,f2 where f1.v6=f2.v1 and f1.v7=f2.v2;
END
rm "$db"
or in an almost-one-liner way (with no database file given, sqlite3 uses a transient, throw-away database):
sqlite3 -separator " " <<'END'
create table f1 (v1, v2, v3, v4, v5, v6, v7, v8 );
create table f2 (v1, v2, v3, v4);
.import file1.txt f1
.import file2.txt f2
select f1.*, f2.v3, f2.v4 from f1,f2 where f1.v6=f2.v1 and f1.v7=f2.v2;
END
Solution 4:
bash: I took the liberty of removing blank lines from the files.
declare -A keys
# build a lookup table from file2: key = "col1,col2", value = "col3 col4"
while read -r k1 k2 value; do
    keys[$k1,$k2]=$value
done < file2.txt

# for each record of file1, look up columns 6 and 7 and append the matched values
while read -ra fields; do
    key="${fields[5]},${fields[6]}"
    echo "${fields[*]} ${keys[$key]}"
done < file1.txt
1 1 1 1 1 5 9 1 A B
2 2 2 2 2 7 8 2 C D
3 3 3 3 3 7 7 3 G H
4 4 4 4 4 8 6 4 E F
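If you would rather not pre-clean the files, a sketch of the same loops that simply skips blank lines as it reads (my addition, not part of the original answer):
declare -A keys
while read -r k1 k2 value; do
    [[ -z $k1 ]] && continue               # skip blank lines in file2.txt
    keys[$k1,$k2]=$value
done < file2.txt

while read -ra fields; do
    (( ${#fields[@]} == 0 )) && continue   # skip blank lines in file1.txt
    key="${fields[5]},${fields[6]}"
    echo "${fields[*]} ${keys[$key]}"
done < file1.txt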