Levenshtein Distance

May 10, 2014 at 7:02am
How do I backtrack and print out the alignment?
Here's my code and some conditionals to get my second matrix to store the path but it doesn't appear right.
Instructions on this assignment: http://www.cs.hunter.cuny.edu/~saad/courses/c++/hw/hw9.pdf

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::cin;
using std::min;

int main () {
string s; // ref
string t; // hyp
cout <<"\n \t" << "Enter first word." << "\n \t";
cin>>s;
cout << "\t" << "Enter second word." << "\n \t";
cin>>t;
int len=s.length();
int leen=t.length();
int* d=new int[len+1, leen+1];
int* b=new int[len+1,leen+1];
int i,j;
for(i=0; i<=len; i++)
    d[i,0]=i;
for(j=0; j<=leen; j++)
    d[0,j]=j;
for(i=1;i<=len;i++)
{
    for(j=1;j<=leen;j++)
    {
        if( s[i-1]==t[j-1] )
        {
            d[i,j]= d[i-1,j-1];
            if(d[i,j]==d[i-1,j]+1)
            {
                b[i,j]=d[i-1,j]+1;
            }
            else if(d[i,j]==d[i,j-1]+1)
            {
                b[i,j]=d[i,j-1]+1;
            }
            else
            {
                b[i,j];
            }
        }
        else
        {
             int m=min( d[i-1,j],d[i,j-1]+1);
            d[i,j]=min( m, t[i-1,j-1]+ (s[i-1] == t[j-1]));
            if( d[i,j]==d[i-1,j]+1 )
            {
                b[i,j]=d[i-1,j]+1;
            }
            else if( d[i,j]==d[i,j-1]+1 )
            {
                b[i,j]=d[i,j-1]+1;
            }
            else
            {
                b[i,j];
            }
        }
    }
}

cout<<"\t" << d[len,leen] << "\n";
cout <<"\t" << b[len,leen];
delete[] d;
delete[] b;
return 0;
}
Last edited on May 10, 2014 at 7:03am
May 10, 2014 at 2:39pm
This is not 2D:
1
2
int * d = new int [ len+1, leen+1 ];
d[ i, 0 ] = i;

It is 1D. Read about multi-dimensional arrays.


Is there a name for this alignment algorithm?
May 10, 2014 at 7:06pm
Hmm is this right then?
I googled edit distance and it came up as Levenshtein distance where it sees how much we have to edit the string to look like the other. http://rosettacode.org/wiki/Levenshtein_distance#C.2B.2B

Hmm is this done right for multidimensional arrays?
It crashes after inputting the two words now though.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::cin;
using std::min;

int main () {
string s; // ref
string t; // hyp
cout <<"\n \t" << "Enter first word." << "\n \t";
cin>>s;
cout << "\t" << "Enter second word." << "\n \t";
cin>>t;
int len=s.length();
int leen=t.length();
int i,j;

int** d;
d=new int*[len];
for(i=0;i<len;i++){
    d[i]=new int[leen];
}
for(j=0;j<len;j++){
    for(i=0;i<len;i++){
        d[i][j]=0;
    }
}

for(i=0; i<len; i++){
    d[i][0]=i;
}
for(j=0; j<leen; j++){
    d[0][j]=j;
}
for(i=1;i<len;i++)
{
    for(j=1;j<leen;j++)
    {
        if( s[i-1]==t[j-1] )
        {
            d[i][j]= d[i-1][j-1];
        }
        else
        {
            int m=min( d[i-1][j],d[i][j-1]+1 );
            d[i][j]=min( m, d[i-1][j-1] + 1 );
        }
    }
}
cout << d[len][leen];



for(i=0;i<len;i++){
delete[] d[i];}
delete[] d;
return 0;
}
Last edited on May 10, 2014 at 7:48pm
May 10, 2014 at 7:39pm
One thing to remember when dealing with arrays is that valid indices are 0 to size-1 where size is the number of elements in the array. You get it right on line 21, but wrong on lines 24, 26, 29, 31, and 47.

May 10, 2014 at 7:46pm
Oops. Sorry slow person here. I changed the '<=' s to '<' s but it still crashes when I finish inputting both "kitten" and "sitting." I think it's when I'm trying to print it out. // Yeah when I comment out the cout, it runs without crashing.
Last edited on May 10, 2014 at 7:52pm
May 10, 2014 at 8:07pm
when I comment out the cout, it runs without crashing.


Are you referring to the cout on line 51 where you access outside the bounds of the arrays?
May 10, 2014 at 8:09pm
Yes. It seems to run fine until I try to access it.
Last edited on May 10, 2014 at 8:09pm
May 10, 2014 at 8:13pm
Let me put a little more emphasis there for you:

Are you referring to the cout on line 51 where you access outside the bounds of the arrays?


Neither len nor leen are values that are valid indices into said arrays.
May 10, 2014 at 8:20pm
It still crashes when I test d[i][j] which is why I'm still confused.
//
Whoops.
I just changed it to d[i-1][j-1] and it seems to work now with "kitten" and "sitting". Will test other words.
Now just to get the backtracking working.
I'm supposed to make another array to store the element that is being inserted and deleted, yes?
Last edited on May 10, 2014 at 8:28pm
May 11, 2014 at 7:29am
Websearch for "Needleman and Wunsch".
Last edited on May 11, 2014 at 7:29am
Topic archived. No new replies allowed.