SQL like group by and sum for text files in command line?
Solution 1:
In AWK, you could do something like this:
awk '($1 == last) || (last == "") {sum += $2}
($1 != last) && (last != "") {print last " " sum; sum = $2}
{last = $1}
END {print last " " sum}' huge_text_file.txt
Solution 2:
This type of queries are most easily and flexibly done in SQL. You can use SQLet and SQLite for that purpose:
sqlet.py -d' ' -A file1.txt 'select a1, sum(a2) from A group by a1' | sqlite3
Solution 3:
You could a use a few lines of Lua to acheive this. Lua is available on a wide range of platforms including Windows and Linux.
-- Quick and dirty - no error checking, unsorted output io.input('huge_text_file.txt') results = {} for line in io.lines() do for text, number in string.gmatch(line, '(%w+)%s+(%d+)') do results[text] = (results[text] or 0) + number end end for text, number in pairs(results) do print(text, number) end
You can sort the output using any sort utility or a few more lines of Lua.