Finding smallest float in file then printing that and line above it

My data file looks like this:

3.6-band 
6238
Over
0.5678
Over
0.6874
Over
0.7680
Over
0.7834

What I want to do is to pick out the smallest float and the word directly above it and print those two values. I have no idea what I'm doing. I've tried

df=open('filepath')
  for line in df:
    df1=line.split()
    df2=min(df1)

Which is my attempt at at least trying to isolate the smallest float. Problem is it's just giving me the last value. I think that's a problem with python not knowing to start over with the iteration, but again...no idea what I'm doing. I tried df2=min(df1.seek(0)) with no success, got an error saying no attribute seek. So that's what I've tried so far, I still have no idea how to print the row that would come before the smallest float. Suggestions/help/advice would be appreciated, thanks.

As a side note: this data file is an example of a larger one with similar characteristics, but the word 'Over' could also be 'Under', that's why I need to have it printed as well.


Solution 1:

Store the lines in a list of [word, num] pairs, then apply min to that list. Use the key parameter of min to specify which item of each pair is used for the comparison:

with open('abc') as f:
    lis = [[line.strip(),next(f).strip()] for line in f]
    minn = min(lis, key = lambda x: float(x[1]))
    print("\n".join(minn))

prints

Over
0.5678

Here lis looks like this:

[['3.6-band', '6238'], ['Over', '0.5678'], ['Over', '0.6874'], ['Over', '0.7680'], ['Over', '0.7834']]
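For readers on Python 3, the same idea can be tried without a data file. This is only a sketch: io.StringIO stands in for the opened file, the helper name smallest_pair is made up for illustration, and it assumes the input has an even number of lines (a label line followed by a number line).

```python
import io

# Sample data from the question; io.StringIO stands in for open('abc').
sample = (
    "3.6-band \n6238\n"
    "Over\n0.5678\n"
    "Over\n0.6874\n"
    "Over\n0.7680\n"
    "Over\n0.7834\n"
)

def smallest_pair(lines):
    """Return the [label, number] pair whose number is smallest.

    Hypothetical helper; assumes lines alternate label, number.
    """
    it = iter(lines)
    # The loop takes one label line, next(it) takes the number line after it.
    pairs = [[label.strip(), next(it).strip()] for label in it]
    # Compare pairs by the numeric value of their second item.
    return min(pairs, key=lambda pair: float(pair[1]))

label, number = smallest_pair(io.StringIO(sample))
print(label)
print(number)
```

With the sample data this picks the pair 'Over', '0.5678'. The numbers stay strings in the result, so they print exactly as they appear in the file.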

Solution 2:

You could use the grouper recipe, izip(*[iterator]*2), to cluster the lines in df into groups of 2. Then, to find the minimum pair of lines, use min and its key parameter to specify the proxy to be used for the comparison. In this case, for every pair of lines, (p, l), we want to use the float of the second line, float(l), as the proxy:

import itertools as IT
with open('filepath') as df:
    previous, minline = min(IT.izip(*[df]*2), 
                            key=lambda (p, l): float(l))
    minline = float(minline)
    print(previous)
    print(minline)

prints

Over

0.5678
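The code above is Python 2. In Python 3, itertools.izip is gone (the built-in zip is already lazy) and a lambda can no longer unpack a tuple in its parameter list, so a rough equivalent might look like the sketch below. io.StringIO stands in for the file and min_pair is a hypothetical name, not part of the original answer.

```python
import io

# Sample data from the question; io.StringIO stands in for open('filepath').
sample = (
    "3.6-band \n6238\n"
    "Over\n0.5678\n"
    "Over\n0.6874\n"
    "Over\n0.7680\n"
    "Over\n0.7834\n"
)

def min_pair(lines):
    """Return the (previous_line, number_line) pair with the smallest number."""
    it = iter(lines)
    # zip(it, it) is the same as zip(*[it]*2): both arguments are one iterator,
    # so each tuple holds two consecutive lines.
    return min(zip(it, it), key=lambda pair: float(pair[1]))

previous, minline = min_pair(io.StringIO(sample))
print(previous.strip())   # strip the trailing newline to avoid a blank line
print(float(minline))     # float() ignores surrounding whitespace
```

Stripping the label before printing avoids the blank line that shows up in the output above.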

An explanation of the grouper recipe:

To understand the grouper recipe, first look at what happens if df were a list:

In [1]: df = [1, 2]

In [2]: [df]*2
Out[2]: [[1, 2], [1, 2]]

In Python, when you multiply a list by a positive integer n, you get n (shallow) copies of the items in the list. Thus, [df]*2 makes a list with two copies of df inside.

Now consider zip(*[df]*2)

The * used in zip(*...) has a special meaning. It tells Python to unpack the list following the * into arguments to be passed to zip. Thus, zip(*[df]*2) is exactly equivalent to zip(df, df):

In [3]: zip(df, df)
Out[3]: [(1, 1), (2, 2)]

In [4]: zip(*[df]*2)
Out[4]: [(1, 1), (2, 2)]

A more complete explanation of argument unpacking is given by SaltyCrane here.

Take note of what zip is doing. zip(*[df]*2) peels off the first element of both copies, (both 1's in this case), and forms the tuple, (1,1). Then it peels off the second element of both copies, (both 2's), and forms the tuple (2,2). It returns a list with these tuples inside.

Now consider what happens when df is an iterator. An iterator is sort of like a list, except an iterator is good for only a single pass. Once items have been pulled out of the iterator, it can never be rewound.

For example, a file handle is an iterator. Suppose we have a file with lines

1
2
3
4

In [8]: f = open('data')

You can pull items out of the iterator f by calling next(f):

In [9]: next(f)
Out[9]: '1\n'

In [10]: next(f)
Out[10]: '2\n'

In [11]: next(f)
Out[11]: '3\n'

In [12]: next(f)
Out[12]: '4\n'

Each time we call next(f), we get the next line from the file handle, f. If we called next(f) once more, we'd get a StopIteration exception, indicating the iterator is empty.
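The exhaustion behaviour is easy to check without a file on disk. In this small Python 3 sketch, io.StringIO is only a stand-in for the open file, and the variable names are illustrative:

```python
import io

# io.StringIO stands in for open('data') so the example is self-contained.
f = io.StringIO("1\n2\n")

first = next(f)    # first line, including its newline
second = next(f)   # second line

try:
    next(f)        # nothing left, so this raises StopIteration
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default as the second argument returns it instead of raising.
fallback = next(f, None)
```

The two-argument form of next is handy when running out of lines is an expected case and not an error.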

Now let's see how the grouper recipe behaves on f:

In [14]: f = open('data')  # Notice we have to open the file again, since the old iterator is empty

In [15]: [f]*2
Out[15]: 
[<open file 'data', mode 'r' at 0xa028f98>,
 <open file 'data', mode 'r' at 0xa028f98>]

[f]*2 gives us a list containing the same iterator f twice. These are not independent copies: both entries refer to one and the same object.

In [16]: zip(*[f]*2)
Out[16]: [('1\n', '2\n'), ('3\n', '4\n')]

zip(*[f]*2) peels off the first item from the first iterator, f, and then peels off the first item from the second iterator, f. But the iterator is the same f both times! And since iterators are good for a single pass (you can never go back), you get a different item each time you peel one off. zip is calling next(f) each time to peel off an item. So the first tuple is ('1\n', '2\n'). Likewise, zip then peels off the next item from the first iterator f, and the next item from the second iterator f, and forms the tuple ('3\n', '4\n'). Thus, zip(*[f]*2) returns [('1\n', '2\n'), ('3\n', '4\n')].

That's really all there is to the grouper recipe. Above, I chose to use IT.izip instead of zip so that Python would return an iterator instead of a list of tuples. This would save a lot of memory if the file had a lot of lines in it. The difference between zip and IT.izip is explained more fully here.
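The same trick generalizes to groups of any size. Here is a minimal Python 3 sketch, where zip is already lazy; the function name grouper and its parameters are chosen for illustration. Note that zip stops at the shortest input, so leftover items that do not fill a whole group are silently dropped.

```python
def grouper(iterable, n):
    """Return an iterator of tuples holding n consecutive items each."""
    # iter() makes sure lists and strings also become single-pass iterators.
    it = iter(iterable)
    # n references to the same iterator: zip pulls n items per tuple.
    return zip(*[it] * n)

pairs = list(grouper([1, 2, 3, 4], 2))
triples = list(grouper("abcdefg", 3))  # the trailing 'g' is dropped

print(pairs)
print(triples)
```

If the incomplete last group should be kept, itertools.zip_longest with a fillvalue can be used in place of zip.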

Solution 3:

You can't call min on a single number:

min(number)

min needs either two or more values to compare, or a single iterable of values:

min(num1, num2)

If your file looks like this:

6238
0.5678
0.6874
0.7680
0.7834

You can use this code:

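with open('filepath') as f:
    # Start with the first number in the file
    smallest = float(next(f))
    # Keep whichever is smaller: the best so far or the current line
    for line in f:
        smallest = min(smallest, float(line))
print(smallest)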

If you have the "Over"s then you can skip every second line.
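One way to handle the label lines while still printing the word above the minimum is to remember the previous line as you go. The sketch below is Python 3 and makes a few assumptions: io.StringIO stands in for the file, smallest_with_label is a hypothetical helper, and any line that float() cannot parse is treated as a label.

```python
import io

# Sample data from the question; io.StringIO stands in for the real file.
sample = (
    "3.6-band \n6238\n"
    "Over\n0.5678\n"
    "Over\n0.6874\n"
    "Over\n0.7680\n"
    "Over\n0.7834\n"
)

def smallest_with_label(lines):
    """Return (line_above, smallest_float) for the smallest number found."""
    best_value = None
    best_label = None
    previous = None   # text of the line just before the current one
    for raw in lines:
        text = raw.strip()
        try:
            value = float(text)
        except ValueError:
            # Not a number (e.g. 'Over', 'Under', '3.6-band'): remember it.
            previous = text
            continue
        if best_value is None or value < best_value:
            best_value, best_label = value, previous
        previous = text
    return best_label, best_value

label, value = smallest_with_label(io.StringIO(sample))
print(label)
print(value)
```

Because it does not rely on strict label/number pairing, this version also copes with files where two numbers or two labels happen to appear in a row.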