Pairwise alignment of long string sequences
I want to find the globally optimal (or close to optimal) pairwise alignment between two long (tens of thousands) sequences of strings, but the algorithm is expected to operate on any object sequences. I also want to use my own distance function implementation to compute the similarity of two objects. For shorter sequences, I could use the dynamic time warping (DTW) algorithm but the DTW algorithm needs to compute and store a n*m distance matrix (n,m are lengths of the sequences) which is not feasible for longer sequences. Can you recommend such algorithm? A working implementation would be a plus.
The following example clarifies what the algorithm needs to do:
Input: Sequence A: i saw the top of the mountain Sequence B: then i see top of the mountains Result: Sequence A: i saw the top of the mountain Sequence B: then i see top of the mountains
Solution 1:
I don't know, if I understood your requirements right, but it sounds to me like you are trying to solve a Stable Marriage Problem. The original solution of Gale and Shapley in the link is in O(n*m) time and O(n+m) space, if my memory serves me right. The implementation was pretty straightforward. There are also some later solutions with different variants of the problem.
You also could try to solve this problem by using Maximum Bipartite Graph Matching, e.g. here, for getting a different optimality criterion. This also can be done in O(n*m).