maximum sorted subsequence in a string problem

My assignment is:

Write an efficient algorithm to find the maximum sorted subsequence in a string. For example:
aaaaaaabbbbbb returns ab
aaaaaaabbbbbbccdd returns abcd
abacdefkabfhxy returns abcdefkxy
abacdefkabfhixy returns abcdefhixy
abclmzbnopsw returns abclmnopsw

right off the bat I came up with a ridiculously simple solution that works for the examples my professor gave me but has serious limitations:

#include <iostream>
#include <string>
using namespace std;

void sequence (string str1)
{
char recent = str1[0];
cout << recent;
for (int i = 1; i < str1.length(); i++)
if (str1[i] > recent)
{
cout << str1[i];
recent = str1[i];
}
}

int main()
{
cout << "Enter a string: ";
string johnson;
cin >> johnson;
cout << "The substring sequence is: ";
sequence(johnson);
return 0;
}


The problem is this only works if the longest substring starts at the first character like the examples I was given. I thought maybe i'd have it look for the smallest value then go from there but then if the smallest value is after a longer subsequence im screwed.

Can someone point me in the right direction? I'm not looking for code necessarily but just a concept on how to tackle this. My thanks in advance.
This is not a simple algorithm... and your response doesn't actually answer the problem. For example, it will fail with:

aiemckgobjfndlhp
--> acgjnp
--> aegjlp

Notice also how there may be multiple answers! One of the examples your professor gave you has multiple answers:

abacdefkabfhxy
--> abcdefkxy
--> abcdefhxy

You might want to read through
http://www.algorithmist.com/index.php/Longest_Increasing_Subsequence
and
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

(If this is an algorithm's class, though, you might want to just read the first link and spend more time thinking about how to get the longest common subsequence.)

Good luck!
its a 182 class, essentially a part 2 of c++.

Is there anyway you could help break down the info in the first link? It uses so many symbols and subscripts that unfortunately I dont follow.

This is a sudden jump in difficulty for this class for sure. Our last assignment was to use a recursive function that found the largest int in an array, lol.
I can't stand it when professors do that. Most of your class will be lucky if they can turn in any solution, let alone an efficient one.

It isn't too much -- I was able to play with it in about 40 lines of code (plus another 15 for #includes and about the same main() as you have above).


The first link basically says that the simplest way to solve it is to:

    1. make a sorted copy of A (call it B) containing only unique elements 
    2. use the Longest Common Subsequence problem on A and B


Finding the Longest Common Subsequence is a little more involved, and requires you to be able to think of a list of things in a recursive way. That is, a list is "the first item in the list" attached to "the rest of the list". In the Algorithmist and Wikipedia examples, we turn that around: we have the "rest of the list" and "the last item in the list".

The reason for thinking of lists this way is how we can handle it. If I can do something to the last item in the list and the rest of the list, then I can process the entire list.

Here is an example: sum a list of numbers.

(1 2 3 4 5 6)

Split the list into "first item" and "rest of the list":

1 : (2 3 4 5 6)

Now I can think of the sum as 1 + (sum of rest of list). This repeats until there is no more list. Make sense?

The notation for this is x:xs (first item and rest of list). The articles work backwards, so the notation is xs:x (rest of list and last item).


The LCS problem is very similar. As input we have two lists: xs:x and ys:y. We find the LCS by breaking it down into three possibilities, and choosing the best solution:

1
2
3
if x==y
  then return LCS(xs,ys):x
  else return longest_of( LCS(xs:x,ys), LCS(xs,ys:y) )

While this is all expressed in recursive terms, it can be done with just loops. You can optimize a bit by using the algorithms found in the Wikipedia article: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Code_for_the_dynamic_programming_solution

In it, you first build a table to find just the lengths of all the subsolutions, then you go backwards to reconstruct the actual solution by choosing the longer subsolutions.

Remember, you don't have to use recursion to do it -- the size of your lookup table will be a simple square array you can new and delete[], and the size of your result will be the value you find in your lookup array's last index [m][n].


I know this is a whole lot of stuff to wrap your head around; try to puzzle it out a bit and post back with specific problems..
Last edited on
hahah wow talk about more than I've ever dealt with... Ya it doesn't make sense to me, going to go over it a few more times. Thanks for the help I'll see if I can make some progress. For whatever reason my head just isn't wrapping around so to speak.
I just did a very naïve version for fun. (It is really stupid, though, and not efficient at all, so it takes a while to work when the test is long enough and/or convoluted enough.)

Here are my functions (in pseudocode):

function LCS (x:xs, y:ys)
  if (is empty-string x:xs) or (is empty-string y:ys)
    return empty-string

  if (x equals y)
    return x + LCS( xs, ys )

  return longest_of( LCS( x:xs, ys ), LCS( xs, y:ys ) )

function LIS( a )
  return LCS( a, unique_sort( a ) )

Here's my LIS in C++:

1
2
3
4
5
6
7
template <typename Container>
Container LIS( const Container& xs )
  {
  // std::set is used as unique_sort()
  std::set <typename Container::value_type> ys( xs.begin(), xs.end() );
  return LCS( xs, Container( ys.begin(), ys.end() ) );
  }

The dumbed down version is:

1
2
3
4
5
6
string LIS( const string& xs )
  {
  // std::set is used as unique_sort()
  set <char> ys( xs.begin(), xs.end() );
  return LCS( xs, string( ys.begin(), ys.end() ) );
  }

Anyway, I'm not going to show you my C++ version of LCS() yet, because it is so simple you should (try to) write it yourself.


Anyway, how are you doing with this assignment? Are any of your other classmates lost?
Last edited on
Topic archived. No new replies allowed.