Calling SeqIO.parse() directly in a for loop works, but calling it separately beforehand doesn't. Why?
Solution 1:
SeqIO.parse() returns a normal Python generator. This part of Biopython is written in pure Python:
>>> from Bio import SeqIO
>>> a = SeqIO.parse("a.fasta", "fasta")
>>> type(a)
<class 'generator'>
Once a generator has been iterated over, it is exhausted, as you discovered. (Newer Biopython releases may report a dedicated iterator class instead of generator, but the object is still single-pass.) You can't rewind a generator, but you can store its contents in a list or dict if you don't mind holding everything in memory (useful if you need random access). SeqIO.to_dict(a) builds a dictionary with the record ids as keys and the SeqRecord objects as values. Alternatively, simply rebuilding the generator by calling SeqIO.parse() again avoids loading the file contents into memory.
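To make the exhaustion and the three workarounds concrete, here is a minimal sketch that runs without Biopython or a FASTA file on disk. The parse_fasta() function and the FASTA_TEXT sample are hypothetical stand-ins of my own, not part of Biopython; parse_fasta() only imitates the single-pass behaviour of SeqIO.parse() and yields plain (id, sequence) tuples rather than SeqRecord objects.

```python
import io

# Hypothetical sample data standing in for a.fasta.
FASTA_TEXT = ">seq1\nACGT\n>seq2\nGGCC\n"


def parse_fasta(handle):
    """Yield (record_id, sequence) tuples.

    Simplified stand-in for SeqIO.parse(); not Biopython's implementation.
    """
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            record_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


# The problem: a generator can only be consumed once.
records = parse_fasta(io.StringIO(FASTA_TEXT))
first_pass = [rid for rid, _ in records]   # consumes the generator
second_pass = [rid for rid, _ in records]  # nothing left to yield

# Option 1: materialise into a list and iterate as often as needed.
as_list = list(parse_fasta(io.StringIO(FASTA_TEXT)))

# Option 2: build a dict keyed by record id for random access.
as_dict = dict(parse_fasta(io.StringIO(FASTA_TEXT)))

# Option 3: rebuild the generator for every pass; nothing is kept in memory.
third_pass = [rid for rid, _ in parse_fasta(io.StringIO(FASTA_TEXT))]

print(first_pass)   # ['seq1', 'seq2']
print(second_pass)  # []
```

With Biopython itself, the corresponding calls would be list(SeqIO.parse("a.fasta", "fasta")) for option 1, SeqIO.to_dict(SeqIO.parse("a.fasta", "fasta")) for option 2, and a fresh SeqIO.parse("a.fasta", "fasta") call before each loop for option 3.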