Subsequences

Sep 18, 2010 at 2:49pm
Hi everyone!

I've got assignement topic like "write a program that finds the longest subsequence common among n sequences of integers, where 1 < n < 1000". I've read on the topic on wikipedia, but can't understand it at all. Can someone explain me algorithm, I don't know where to even start.
Last edited on Sep 18, 2010 at 2:54pm
Sep 18, 2010 at 4:08pm
Well for one start by putting all the data into a 2d array. (I will call it x[n][m].)
Then I would use exactly what is on the wikipedia page under the dynamic programming solution and the LCS function is defined pretty well. I started reading about where it says first property and second property. I'll work on some code to post though because this looks fun :)

Edit: XD Look what I found!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Compute the longest common subsequence between X and Y
// On return, C will contain the LCS table.
// C[m][n] will contain the length of the longest common subsequence.
template<typename RanIt> size_t **   
LCSLength(RanIt X, RanIt Xend, RanIt Y, RanIt Yend)
{
    size_t m = std::distance(X, Xend);
    size_t n = std::distance(Y, Yend);
    size_t **C = Allocate2DArray<size_t>(m+1, n+1);    
 
    for (size_t i=0; i<=m; ++i)
      C[i][0] = 0;
 
    for (size_t j=0; j<=n; ++j)
      C[0][j] = 0;
 
    for (size_t i=0; i<m; ++i)
      for (size_t j=0; j<n; ++j)
        if (X[i] == Y[j])
          C[i+1][j+1] = C[i][j] + 1;
        else
          C[i+1][j+1] = std::max(C[i+1][j], C[i][j+1]);
 
        return C;
}

It is from here: http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Longest_common_subsequence

Edit 2: Oh this is good too! http://videolectures.net/mit6046jf05_leiserson_lec15/
Last edited on Sep 18, 2010 at 6:27pm
Sep 18, 2010 at 7:23pm
How to find the sequence itself?
Can this algorithm work for n sequences, not only two?
Can you explain it to me how it works? I have to understand it.
Sep 18, 2010 at 7:42pm
After all the links I just gave you... You ask me to explain it? :( *cries* (Read what I linked to. I'm not explaining all that in text when there are cool videos and pretty diagrams already made up.)
Anyways I will explain how to get more than two strings LCSatized.
I would simply recursively compare every combination of 2 strings that I have, then take those answers and compare every combination of 2 strings from those, and keep going until only one string is left and that is you final LCS.
Example:
ABCDEFGH
UILKOMNPQRSTUV
BCDEFGH
BFGHJKLM

LCS the first two and you get the empty string (because they have no characters in common)
you can stop here because the longest common sequence is empty.
If you would like a more interesting example watch the video lecture.
Sep 19, 2010 at 8:15am
closed account (EzwRko23)
@kevinkjt2000: Your algorithm of generalizing it into many sequences is incorrect, because the LCS from the first two might not be a supersequence of the final LCS. Besides, there can be many LCS of the same length. So you can't do that like this - you have to find LCS for all of them together, not one-by-one. The approach from Wikipedia has O(k^n) memory complexity for n sequences of length k, so it is impractical.

@original poster:
I would post a much better solution (~20 linest of Scala code) using dynamic programming approach, but I think posting Scala solutions on this forum is not appreciated... And I don't have time tot translate it to C++ (it would be 3-5 times longer). So, anyway, if the forum members don't mind, I can post a Scala solution and you can translate it on your own.

The idea behind my algorithm is to build subsequences out of the first sequence, adding one element at a time. It starts from the empty sequence. After each extension, the set of subsequences is pruned from the: invalid subsequences (a subsequence must be present in all the input sequences) and redundant subsequences (the ones that there is no point in extending, because they are subsequences of some other, longer subsequences).



Sep 22, 2010 at 9:05am
So, does anyone know how to solve this for many sequences?
Sep 22, 2010 at 2:14pm
closed account (EzwRko23)
Oh, c'mon, is my description so hard to understand?
Maybe try to build a program that explores all possibilities and in the next step try to improve it?


Anyway, here is the Scala solution, that can easily solve for 50 sequences, each 50 elements long (the algorithm from Wikipedia: 50^50 integers in memory, good luck).
It has also a polynomial pessimistic time complexity with respect to the length of sequences, just as the algorithm from Wikipedia, but much better average complexity with respect to the number of sequences (linear).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def lcs[T](sequences: Iterable[Seq[T]]): Seq[T] = {
  
  case class SubSeq(data: Seq[T]) {
    val endVector = sequences map (s => rest(data, s) map (s.length - _.length))
    val isValid = endVector.forall(_.isDefined)
    val length = data.length
    def :+ (item: T) = SubSeq(data :+ item)   
  }
  
  def rest(s1: Seq[T], s2: Seq[T]): Option[Seq[T]] = s1 match {
    case Seq() => Some(s2)
    case Seq(x, xs @ _*) if (s2 contains x) => rest(xs, s2.dropWhile(_ != x).tail)
    case _ => None
  }
      
  def isDominated(s1: SubSeq, s2: SubSeq) = {
    val lenDiff = s2.length - s1.length    
    (lenDiff >= 0) && (s1.endVector zip s2.endVector).forall(x => x._1.get >= x._2.get + lenDiff)
  }
  
  def prune(set: List[SubSeq]) = {    
    def pruneRecursive(input: List[SubSeq], output: List[SubSeq]): List[SubSeq] = input match {
      case Nil => 
        output
      case x :: xs if output.exists(isDominated(x, _)) || xs.exists(isDominated(x, _)) => 
        pruneRecursive(xs, output)
      case x :: xs =>
        pruneRecursive(xs, x :: output)
    }
    pruneRecursive(set, Nil)    
  }    
  
            
  def find(s: SubSeq, result: List[SubSeq]): List[SubSeq] = s match {
    case SubSeq(Seq()) => result
    case SubSeq(Seq(x, xs @ _*)) => 
      find(SubSeq(xs), prune(result ::: result.map(_ :+ x).filter(_.isValid)))
  }
  
  find(SubSeq(sequences.head), SubSeq(Seq()) :: Nil).max(Ordering by {x: SubSeq => x.length}).data
}
Last edited on Sep 22, 2010 at 2:20pm
Sep 22, 2010 at 6:22pm
It isn't that anyone has any problems with "scala" but that this is a C++ forum. What good is a scala solution for someone with a C++ assignment?
Sep 22, 2010 at 8:59pm
closed account (EzwRko23)
At least he can figure out the algorithm. Scala is a great language for expressing algorithms, and it is quite easy to learn and understand. Anyway, C++ solutions/translations are welcome. I posted this, because no-one posted right answer for quite a long, and I happened to solve a similar problem (but with ascending subsequences) quite recently.
Last edited on Sep 22, 2010 at 9:01pm
Topic archived. No new replies allowed.