I've got assignement topic like "write a program that finds the longest subsequence common among n sequences of integers, where 1 < n < 1000". I've read on the topic on wikipedia, but can't understand it at all. Can someone explain me algorithm, I don't know where to even start.
Well for one start by putting all the data into a 2d array. (I will call it x[n][m].)
Then I would use exactly what is on the wikipedia page under the dynamic programming solution and the LCS function is defined pretty well. I started reading about where it says first property and second property. I'll work on some code to post though because this looks fun :)
// Compute the longest common subsequence between X and Y
// On return, C will contain the LCS table.
// C[m][n] will contain the length of the longest common subsequence.
template<typename RanIt> size_t **
LCSLength(RanIt X, RanIt Xend, RanIt Y, RanIt Yend)
{
size_t m = std::distance(X, Xend);
size_t n = std::distance(Y, Yend);
size_t **C = Allocate2DArray<size_t>(m+1, n+1);
for (size_t i=0; i<=m; ++i)
C[i][0] = 0;
for (size_t j=0; j<=n; ++j)
C[0][j] = 0;
for (size_t i=0; i<m; ++i)
for (size_t j=0; j<n; ++j)
if (X[i] == Y[j])
C[i+1][j+1] = C[i][j] + 1;
else
C[i+1][j+1] = std::max(C[i+1][j], C[i][j+1]);
return C;
}
How to find the sequence itself?
Can this algorithm work for n sequences, not only two?
Can you explain it to me how it works? I have to understand it.
After all the links I just gave you... You ask me to explain it? :( *cries* (Read what I linked to. I'm not explaining all that in text when there are cool videos and pretty diagrams already made up.)
Anyways I will explain how to get more than two strings LCSatized.
I would simply recursively compare every combination of 2 strings that I have, then take those answers and compare every combination of 2 strings from those, and keep going until only one string is left and that is you final LCS.
Example:
ABCDEFGH
UILKOMNPQRSTUV
BCDEFGH
BFGHJKLM
LCS the first two and you get the empty string (because they have no characters in common)
you can stop here because the longest common sequence is empty.
If you would like a more interesting example watch the video lecture.
@kevinkjt2000: Your algorithm of generalizing it into many sequences is incorrect, because the LCS from the first two might not be a supersequence of the final LCS. Besides, there can be many LCS of the same length. So you can't do that like this - you have to find LCS for all of them together, not one-by-one. The approach from Wikipedia has O(k^n) memory complexity for n sequences of length k, so it is impractical.
@original poster:
I would post a much better solution (~20 linest of Scala code) using dynamic programming approach, but I think posting Scala solutions on this forum is not appreciated... And I don't have time tot translate it to C++ (it would be 3-5 times longer). So, anyway, if the forum members don't mind, I can post a Scala solution and you can translate it on your own.
The idea behind my algorithm is to build subsequences out of the first sequence, adding one element at a time. It starts from the empty sequence. After each extension, the set of subsequences is pruned from the: invalid subsequences (a subsequence must be present in all the input sequences) and redundant subsequences (the ones that there is no point in extending, because they are subsequences of some other, longer subsequences).
Oh, c'mon, is my description so hard to understand?
Maybe try to build a program that explores all possibilities and in the next step try to improve it?
Anyway, here is the Scala solution, that can easily solve for 50 sequences, each 50 elements long (the algorithm from Wikipedia: 50^50 integers in memory, good luck).
It has also a polynomial pessimistic time complexity with respect to the length of sequences, just as the algorithm from Wikipedia, but much better average complexity with respect to the number of sequences (linear).
At least he can figure out the algorithm. Scala is a great language for expressing algorithms, and it is quite easy to learn and understand. Anyway, C++ solutions/translations are welcome. I posted this, because no-one posted right answer for quite a long, and I happened to solve a similar problem (but with ascending subsequences) quite recently.