Markov Chain from String

I am currently stuck on a problem concerning Markov chains, where the input is given as a list of strings. This input has to be transformed into a Markov chain. I have already spent a couple of hours on it.

My idea: As you can see below, I have tried to use Counter from collections to count all transitions, which has worked. Now I am trying to count all the tuples whose first element is A (and likewise for B), which gives me the total number of transitions out of each state.

Then I'll count each specific transition, such as (A, B), and use these counts to build a matrix of transition probabilities.

from collections import Counter

def markov(seq):

    states = Counter(seq).keys()
    liste = []
    print(states)
    a = zip(seq[:-1], seq[1:])
    print(list(a))

print(markov(["A","A","B","B","A","B","A","A","A"]))

So far I can't get the counting of the tuples to work. Any help or new ideas on how to solve this would be appreciated.


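To count the tuples, you can create another counter. Note that in Python 3 zip returns a one-shot iterator, so the print(list(a)) call in your function exhausts a; remove that print, or store list(zip(...)) in a, before looping over it.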

b = Counter()
for word_pair in a:
    b[word_pair] += 1

b then holds the count of each pair; for the example sequence, b[('A', 'B')] is 2.
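
As a minimal sketch of the same idea, Counter can also consume the pairs directly, without an explicit loop. The helper name count_transitions is hypothetical and not part of the question's code; the pairs are built fresh from the sequence inside the function.

```python
from collections import Counter

def count_transitions(seq):
    """Count how often each (current, next) pair occurs in seq."""
    # zip pairs every element with its successor; Counter tallies the pairs
    return Counter(zip(seq[:-1], seq[1:]))

b = count_transitions(["A", "A", "B", "B", "A", "B", "A", "A", "A"])
print(b[("A", "A")], b[("A", "B")], b[("B", "A")], b[("B", "B")])
# 3 2 2 1
```

A pair that never occurs simply has count 0, because a Counter returns 0 for missing keys.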

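To create the matrix, you can use numpy. Rows and columns follow the order of states, so entry (i, j) is the number of transitions from the i-th state to the j-th state.

import numpy as np
c = np.array([[b[(i, j)] for j in states] for i in states], dtype=float)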

I will leave the task of normalizing each row sum to 1 as an exercise.
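
For reference, one possible way to do that normalization is sketched below. It rebuilds the counts so that it runs on its own; sorting the states to fix the row and column order, and leaving rows without outgoing transitions as all zeros, are assumptions of this sketch rather than something the question specifies.

```python
from collections import Counter
import numpy as np

seq = ["A", "A", "B", "B", "A", "B", "A", "A", "A"]
states = sorted(set(seq))  # assumed fixed order for rows/columns: ['A', 'B']
b = Counter(zip(seq[:-1], seq[1:]))

# counts[i, j] = number of transitions from states[i] to states[j]
counts = np.array([[b[(i, j)] for j in states] for i in states], dtype=float)

# Sum of each row, kept as a column vector so it broadcasts across the row
row_sums = counts.sum(axis=1, keepdims=True)

# Divide each row by its sum; rows whose sum is 0 are left as zeros
probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
print(probs)
```

For the example sequence, the row for A comes out as 0.6 and 0.4, and the row for B as 2/3 and 1/3.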


I'm not sure I understood exactly what you want, but here is what I think it is:

from collections import Counter

def count_occurence(seq):
    # Count how often each transition tuple occurs, in order of first appearance
    transition_dict = {}
    for tup in seq:
        if tup not in transition_dict:
            transition_dict[tup] = seq.count(tup)
    print(transition_dict)
    #{('A', 'A'): 3, ('A', 'B'): 2, ('B', 'B'): 1, ('B', 'A'): 2}

def markov(seq):

    states = Counter(seq).keys()
    print(states)
    #dict_keys(['A', 'B'])
    a = list(zip(seq[:-1], seq[1:]))
    print(a)
    #[('A', 'A'), ('A', 'B'), ('B', 'B'), ('B', 'A'),
    # ('A', 'B'), ('B', 'A'), ('A', 'A'), ('A', 'A')]
    return a

seq = markov(["A","A","B","B","A","B","A","A","A"])
count_occurence(seq)
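
To go from these counts to transition probabilities without numpy, a rough sketch could look like the following. The function name transition_probabilities and the nested-dictionary output format are assumptions made for illustration, not something the question asks for.

```python
from collections import Counter, defaultdict

def transition_probabilities(seq):
    """Return {state: {next_state: probability}} estimated from seq."""
    # How often each (current, next) pair occurs
    pair_counts = Counter(zip(seq[:-1], seq[1:]))
    # How many transitions leave each state (the last element has no successor)
    totals = Counter(seq[:-1])
    probs = defaultdict(dict)
    for (current, nxt), n in pair_counts.items():
        probs[current][nxt] = n / totals[current]
    return dict(probs)

p = transition_probabilities(["A", "A", "B", "B", "A", "B", "A", "A", "A"])
print(p)
```

Each inner dictionary sums to 1, so it can be read as one row of the transition matrix.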