Efficient way to generate all possibilities of string from characters [closed]
I am trying to randomly generate a string of length n from 5 characters ('ATGC '). I am currently using itertools.product, but it is incredibly slow. I switched to itertools.combinations_with_replacement, but it skips some values. Is there a faster way of doing this? For my application, order does matter.
for error in itertools.product('ATGC ', repeat=len(errorPos)):
    print(error)
    for ps in error:
        for pos in errorPos:
            if ps == " ":
                fseqL[pos] = ""
            else:
                fseqL[pos] = ps
Solution 1:
If you just want a random single sequence:
import random
def generate_DNA(N):
    possible_bases = 'ACGT'
    return ''.join(random.choice(possible_bases) for i in range(N))
one_hundred_bp_sequence = generate_DNA(100)
That was posted before the question clarified the need for spaces; you can change possible_bases to include a space if you need spaces allowed.
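For the variant that allows spaces, a minimal sketch could look like the following. The function name generate_dna_with_gaps and its alphabet parameter are my own placeholders, not something from the question; random.choices is the standard-library call that samples with replacement.

```python
import random

def generate_dna_with_gaps(n, alphabet="ACGT "):
    """Return one random string of length n drawn from alphabet.

    The name and the default alphabet are illustrative assumptions.
    """
    # random.choices samples with replacement; k sets the output length.
    return "".join(random.choices(alphabet, k=n))

seq = generate_dna_with_gaps(100)
print(len(seq))
```

If a space is meant to stand for a deleted base, seq.replace(" ", "") drops those positions afterwards.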
If you want all combinations, with spaces allowed too, here is a solution adapted from this answer, which I learned of from the Biostars post 'all possible sequences from consensus':
from itertools import product
def all_possibilities_w_space(seq):
    """return list of all possible sequences given a completely ambiguous DNA input. Allow spaces"""
    d = {"N":"ACGT "}
    return list(map("".join, product(*map(d.get, seq))))
all_possibilities_w_space("N"*2) # example of length two
The idea is that N can be any of "ACGT ", and the multiplier specifies the length. According to the answer I adapted it from, using map pushes the looping into C, which should make it faster.
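Two things may be worth seeing side by side: the number of ordered strings grows as 5**n, so building the full list gets expensive quickly, and combinations_with_replacement skips values because it treats reorderings of the same characters as one result. The sketch below is only an illustration; iter_possibilities is a hypothetical helper name, and it returns a lazy iterator instead of a list so results can be consumed one at a time.

```python
from itertools import combinations_with_replacement, product

def iter_possibilities(n, alphabet="ACGT "):
    """Lazily yield every length-n string over alphabet (hypothetical helper)."""
    # product(..., repeat=n) keeps order, so "AC" and "CA" are both produced.
    return map("".join, product(alphabet, repeat=n))

n = 2
ordered = list(iter_possibilities(n))
# combinations_with_replacement emits each multiset of characters only once.
unordered = list(map("".join, combinations_with_replacement("ACGT ", n)))

print(len(ordered))       # 5**2 = 25
print(len(unordered))     # 15
print("CA" in ordered)    # True
print("CA" in unordered)  # False
```

Since order matters here, product is the right tool; any speedup has to come from doing less work per result (for example, iterating lazily and not printing every tuple), because the 5**n results themselves cannot be avoided.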