How do I generate a running cumulative total of the numbers in a text file?
I have a text file with 2 million lines, each containing a positive integer. I am trying to build something like a cumulative frequency table.
Input file:
3
4
5
8
Output should be:
3
7
12
20
How do I go about doing this?
With awk:
awk '{total += $0; $0 = total}1'
$0 is the current line. So, for each line, I add it to the total, set the line to the new total, and then the trailing 1 is an awk shortcut: it prints the current line for every true condition, and 1 as a condition evaluates to true.
In a Python script:
#!/usr/bin/env python3
import sys

# Input and output paths come from the command line
f = sys.argv[1]
out = sys.argv[2]
n = 0  # running total
with open(out, "wt") as wr:
    with open(f) as read:
        for l in read:
            n = n + int(l)
            wr.write(str(n)+"\n")
To use
- Copy the script into an empty file and save it as add_last.py
- Run it with the source file and the target output file as arguments:
python3 /path/to/add_last.py <input_file> <output_file>
Explanation
The code is rather readable, but in detail:
- Open the output file for writing results:
with open(out, "wt") as wr:
- Open the input file and read it line by line:
with open(f) as read:
    for l in read:
- Read the lines, adding the value of the new line to the total:
n = n + int(l)
- Write the result to the output file:
wr.write(str(n)+"\n")
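As a possible variation on the steps above, the same running total can be sketched with itertools.accumulate from the standard library, reading from standard input and writing to standard output so it can sit in a pipeline. The function name running_totals and the file name running_total.py are just illustrative choices, and skipping blank lines is an assumption on my part (the original script would stop with an error on one).

```python
#!/usr/bin/env python3
"""Print the running total of integers read from standard input."""
import sys
from itertools import accumulate


def running_totals(lines):
    """Return an iterator over the cumulative sums of the integers in lines.

    Blank lines are skipped (an assumption, not part of the original script).
    """
    numbers = (int(line) for line in lines if line.strip())
    return accumulate(numbers)


if __name__ == "__main__":
    # Lines are consumed lazily, so a 2-million-line file is never
    # held in memory all at once.
    for total in running_totals(sys.stdin):
        print(total)
```

If saved as running_total.py, it could be run as:
python3 running_total.py < input_file > output_file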
Just for fun
$ sed 'a+p' file | dc -e0 -
3
7
12
20
This works by appending +p to each line of the input, and then passing the result to the dc calculator, where
+ Pops two values off the stack, adds them, and pushes the result.
The precision of the result is determined only by the values of
the arguments, and is enough to be exact.
then
p Prints the value on the top of the stack, without altering the
stack. A newline is printed after the value.
The -e0 argument pushes 0 onto the dc stack to initialize the sum.
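To make the stack behaviour easier to follow, here is a rough Python sketch that imitates only the tiny part of dc used here: pushing non-negative integers, + and p. It is not how dc is implemented, and the name run_dc_subset is made up for illustration. The program string is what dc effectively sees: the initial 0 from -e0, followed by each input number and the +p that sed appended after it.

```python
import re


def run_dc_subset(program):
    """Evaluate a tiny dc-like program: non-negative integers, '+' and 'p'.

    Returns the list of values that 'p' would have printed.
    """
    stack = []
    printed = []
    for token in re.findall(r"\d+|[+p]", program):
        if token == "+":
            # Pop two values, add them, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif token == "p":
            # Report the top of the stack without removing it.
            printed.append(stack[-1])
        else:
            stack.append(int(token))
    return printed


# The initial 0, then each input line followed by "+p".
program = "0\n" + "".join(f"{n}\n+p\n" for n in (3, 4, 5, 8))
print(run_dc_subset(program))  # [3, 7, 12, 20]
```

After every +p the stack holds exactly one value, the total so far, which is why the sum carries over from line to line.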
In Bash:
#!/bin/bash
file="YOUR_FILE.txt"
TOTAL=0
# Read the file line by line and print the running total
while IFS= read -r line
do
    TOTAL=$(( TOTAL + line ))
    echo "$TOTAL"
done <"$file"