ValueError: could not convert string to float: id
I'm running the following python script:
#!/usr/bin/python
import os,sys
from scipy import stats
import numpy as np
f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
w=f[i].split()
l1=w[1:8]
l2=w[8:15]
list1=[float(x) for x in l1]
list2=[float(x) for x in l2]
result=stats.ttest_ind(list1,list2)
print result[1]
However I got the errors like:
ValueError: could not convert string to float: id
I'm confused by this. When I try this for only one line in interactive section, instead of for loop using script:
>>> from scipy import stats
>>> import numpy as np
>>> f=open('data2.txt','r').readlines()
>>> w=f[1].split()
>>> l1=w[1:8]
>>> l2=w[8:15]
>>> list1=[float(x) for x in l1]
>>> list1
[5.3209183842, 4.6422726719, 4.3788135547, 5.9299061614, 5.9331108706, 5.0287087832, 4.57...]
It works well.
Can anyone explain a little bit about this? Thank you.
Obviously some of your lines don't have valid float data, specifically some line have text id
which can't be converted to float.
When you try it in interactive prompt you are trying only first line, so best way is to print the line where you are getting this error and you will know the wrong line e.g.
#!/usr/bin/python
import os,sys
from scipy import stats
import numpy as np
f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
w=f[i].split()
l1=w[1:8]
l2=w[8:15]
try:
list1=[float(x) for x in l1]
list2=[float(x) for x in l2]
except ValueError,e:
print "error",e,"on line",i
result=stats.ttest_ind(list1,list2)
print result[1]
My error was very simple: the text file containing the data had some space (so not visible) character on the last line.
As an output of grep, I had 45
instead of just 45
.
This error is pretty verbose:
ValueError: could not convert string to float: id
Somewhere in your text file, a line has the word id
in it, which can't really be converted to a number.
Your test code works because the word id
isn't present in line 2
.
If you want to catch that line, try this code. I cleaned your code up a tad:
#!/usr/bin/python
import os, sys
from scipy import stats
import numpy as np
for index, line in enumerate(open('data2.txt', 'r').readlines()):
w = line.split(' ')
l1 = w[1:8]
l2 = w[8:15]
try:
list1 = map(float, l1)
list2 = map(float, l2)
except ValueError:
print 'Line {i} is corrupt!'.format(i = index)'
break
result = stats.ttest_ind(list1, list2)
print result[1]