How do I generate a running cumulative total of the numbers in a text file?

I have a text file with 2 million lines, each containing a positive integer. I am trying to build something like a cumulative frequency table.

Input file:

3
4
5
8

Output should be:

3
7
12
20

How do I go about doing this?


With awk:

awk '{total += $0; $0 = total}1'

$0 is the current line. For each line, I add its value to total and then set the line to the new total. The trailing 1 is an awk shortcut: a condition with no action prints the current line whenever the condition is true, and 1 always evaluates to true.
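
For comparison, the same accumulate-then-print logic can be sketched in Python with itertools.accumulate from the standard library. The helper name running_totals and the sample list are made up for illustration; this is only a minimal sketch of the idea, not part of the awk answer.

```python
from itertools import accumulate

def running_totals(lines):
    """Return an iterator over the cumulative sum after each line (hypothetical helper)."""
    # int() tolerates surrounding whitespace, including the trailing newline
    return accumulate(int(line) for line in lines)

sample = ["3\n", "4\n", "5\n", "8\n"]  # the example input from the question
totals = list(running_totals(sample))
print(*totals, sep="\n")
```

Because accumulate is lazy, passing an open file object instead of the sample list should keep memory use flat even for a file with millions of lines.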


In a Python script:

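#!/usr/bin/env python3
import sys

# input file and output file are given as command-line arguments
f = sys.argv[1]
out = sys.argv[2]

n = 0  # running total

with open(out, "wt") as wr:
    with open(f) as read:
        for l in read:
            # add the current line's value to the total, then write the total
            n = n + int(l)
            wr.write(str(n)+"\n")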

To use

  • Copy the script into an empty file, save it as add_last.py
  • Run it with the source file and the target output file as arguments:

    python3 /path/to/add_last.py <input_file> <output_file>
    

Explanation

The code is rather readable, but in detail:

  • Open output file for writing results

    with open(out, "wt") as wr:
    
  • Open input file for reading per line

    with open(f) as read:
        for l in read:
    
  • Read the lines, adding the value of the new line to the total:

    n = n + int(l)
    
  • Write the result to the output file:

    wr.write(str(n)+"\n")
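
The steps above can be sketched as a small function that works on any pair of file-like objects, which makes the loop easy to try without touching real files. The name write_running_totals and the blank-line skipping are assumptions added for illustration (the original script would raise a ValueError on an empty line); io.StringIO stands in for the input and output files.

```python
import io

def write_running_totals(src, dst):
    """Write the running total after each line of src to dst; return the final total."""
    n = 0
    for l in src:
        if not l.strip():  # assumption: skip blank lines instead of failing in int()
            continue
        n += int(l)
        dst.write(f"{n}\n")
    return n

src = io.StringIO("3\n4\n\n5\n8\n")  # sample data with one blank line added
dst = io.StringIO()
final = write_running_totals(src, dst)
print(dst.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by the handles from open(f) and open(out, "wt").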
    

Just for fun

$ sed 'a+p' file | dc -e0 -
3
7
12
20

This works by appending a line containing +p after each line of the input (the one-line form of sed's a command is a GNU extension), and then passing the result to the dc calculator, where

   +      Pops two values off the stack, adds them, and pushes the result.
          The precision of the result is determined only by the values  of
          the arguments, and is enough to be exact.

then

   p      Prints  the  value on the top of the stack, without altering the
          stack.  A newline is printed after the value.

The -e0 argument pushes 0 onto the dc stack to initialize the sum, and the trailing - tells dc to read the rest of its input from standard input.
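
To make the stack behaviour concrete, here is a rough Python sketch that mimics what dc does with this input. It is not how dc is implemented; the function name dc_running_sum is hypothetical, and the list stands in for dc's stack.

```python
def dc_running_sum(numbers):
    """Mimic dc -e0 fed alternating number and +p lines (illustrative only)."""
    stack = [0]                    # -e0: push the initial sum
    printed = []
    for value in numbers:
        stack.append(value)        # a line holding a number pushes it
        b = stack.pop()            # + pops two values ...
        a = stack.pop()
        stack.append(a + b)        # ... and pushes their sum
        printed.append(stack[-1])  # p reads the top without popping it
    return printed

for total in dc_running_sum([3, 4, 5, 8]):
    print(total)
```

The stack never holds more than two values at a time, which is why the one-liner copes with a very long input.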


In Bash:

#! /bin/bash

file="YOUR_FILE.txt"

TOTAL=0
while IFS= read -r line
do
    TOTAL=$(( TOTAL + line ))
    echo "$TOTAL"
done <"$file"