The fastest way to read input in Python

numpy has the functions loadtxt and genfromtxt, but neither is particularly fast. One of the fastest text readers available in a widely distributed library is the read_csv function in pandas (http://pandas.pydata.org/). On my computer, reading 5 million lines containing two integers per line takes about 46 seconds with numpy.loadtxt, 26 seconds with numpy.genfromtxt, and a little over 1 second with pandas.read_csv.

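Here's the session showing the result. (This is on Linux, Ubuntu 12.04 64 bit. The session assumes loadtxt and genfromtxt are already in the namespace, e.g. via from numpy import loadtxt, genfromtxt. You can't see it here, but after each reading of the file, the disk cache was cleared by running sync; echo 3 > /proc/sys/vm/drop_caches in a separate shell.)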

In [1]: import pandas as pd

In [2]: %timeit -n1 -r1 loadtxt('junk.dat')
1 loops, best of 1: 46.4 s per loop

In [3]: %timeit -n1 -r1 genfromtxt('junk.dat')
1 loops, best of 1: 26 s per loop

In [4]: %timeit -n1 -r1 pd.read_csv('junk.dat', sep=' ', header=None)
1 loops, best of 1: 1.12 s per loop
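If you need the result as a plain numpy array rather than a DataFrame, something along these lines should work. This is only a sketch: read_int_table and the tiny sample file are names I made up for illustration, and it assumes a pandas version recent enough to have DataFrame.to_numpy().

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_int_table(path, sep=" "):
    """Read a delimiter-separated file of integers into a 2-D numpy array.

    Hypothetical helper for illustration; not part of pandas or numpy.
    """
    # header=None: the file has no header row, so the first line is data.
    frame = pd.read_csv(path, sep=sep, header=None)
    return frame.to_numpy()


# Small stand-in for the 5-million-line file used in the timings above.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.dat")
    with open(sample_path, "w") as f:
        f.write("1 2\n3 4\n5 6\n")
    table = read_int_table(sample_path)

print(table.shape)
```

The parsing is still done by read_csv; the only extra step is the conversion from the DataFrame to an array.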

pandas, which is based on numpy, has a C-based file parser that is very fast:

# generate some integer data (5 M rows, two cols) and write it to a file
# (assumes numpy is imported as np and pandas as pd)
In [24]: data = np.random.randint(1000, size=(5 * 10**6, 2))

In [25]: np.savetxt('testfile.txt', data, delimiter=' ', fmt='%d')

# your way
In [26]: def your_way(filename, sep):
   ...:     G = []
   ...:     with open(filename, 'r') as f:
   ...:         for line in f:
   ...:             # split on the same delimiter the file was written with
   ...:             G.append(list(map(int, line.split(sep))))
   ...:     return G
   ...: 

In [26]: %timeit your_way('testfile.txt', ' ')
1 loops, best of 3: 16.2 s per loop

In [27]: %timeit pd.read_csv('testfile.txt', delimiter=' ', header=None, dtype=int)
1 loops, best of 3: 1.57 s per loop

So pandas.read_csv takes about one and a half seconds to read your data, which makes it roughly 10 times faster than your method (1.57 s versus 16.2 s).
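If you want to repeat the comparison outside IPython, a rough script version could look like the following. It is a sketch, not the exact setup used for the timings above: the function names, the temporary file and the default of 10,000 rows are my own choices to keep it quick, and the numbers you get will depend on your machine and on disk caching.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd


def read_line_by_line(filename, sep):
    """Pure-Python reader: split each line and convert the tokens to int."""
    rows = []
    with open(filename, "r") as f:
        for line in f:
            rows.append([int(tok) for tok in line.split(sep)])
    return rows


def read_with_pandas(filename, sep):
    """Same file, parsed by pandas; header=None keeps the first row as data."""
    return pd.read_csv(filename, sep=sep, header=None, dtype=int)


def compare(n_rows=10_000, repeat=3):
    """Write a random two-column integer file and time both readers on it."""
    data = np.random.randint(1000, size=(n_rows, 2))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testfile.txt")
        np.savetxt(path, data, delimiter=" ", fmt="%d")
        # Sanity check: both readers must return the same values.
        same = read_with_pandas(path, " ").to_numpy().tolist() == read_line_by_line(path, " ")
        # Best of `repeat` single runs for each reader.
        t_loop = min(timeit.repeat(lambda: read_line_by_line(path, " "), number=1, repeat=repeat))
        t_pandas = min(timeit.repeat(lambda: read_with_pandas(path, " "), number=1, repeat=repeat))
    return same, t_loop, t_pandas


same, t_loop, t_pandas = compare()
print(f"same data: {same}, loop: {t_loop:.4f} s, pandas: {t_pandas:.4f} s")
```

Raise n_rows towards 5 * 10**6 to get closer to the file size used in the session above.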

As a general rule of thumb (for just about any language), using read() to read in the entire file is going to be quicker than reading one line at a time. If you're not constrained by memory, read the whole file at once and then split the data on newlines, then iterate over the list of lines.
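A minimal sketch of that approach in Python, assuming the whole file fits comfortably in memory. read_pairs_whole_file is a hypothetical name, and whether this actually beats line-by-line iteration for your file is worth measuring rather than assuming.

```python
import os
import tempfile


def read_pairs_whole_file(path, sep=None):
    """Read the entire file with one read() call, then parse it line by line.

    sep=None makes str.split() split on any run of whitespace.
    """
    with open(path, "r") as f:
        text = f.read()
    rows = []
    # splitlines() removes the newline characters; blank lines are skipped.
    for line in text.splitlines():
        if line.strip():
            rows.append([int(tok) for tok in line.split(sep)])
    return rows


# Tiny example file standing in for the real input.
with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "example.txt")
    with open(example_path, "w") as f:
        f.write("10 20\n30 40\n\n50 60\n")
    pairs = read_pairs_whole_file(example_path)

print(pairs)
```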
