Levenshtein Distance

How do I backtrack and print out the alignment?
Here's my code and some conditionals to get my second matrix to store the path but it doesn't appear right.
Instructions on this assignment: http://www.cs.hunter.cuny.edu/~saad/courses/c++/hw/hw9.pdf

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::cin;
using std::min;

int main () {
string s; // ref
string t; // hyp
cout <<"\n \t" << "Enter first word." << "\n \t";
cin>>s;
cout << "\t" << "Enter second word." << "\n \t";
cin>>t;
int len=s.length();
int leen=t.length();
int* d=new int[len+1, leen+1];
int* b=new int[len+1,leen+1];
int i,j;
for(i=0; i<=len; i++)
    d[i,0]=i;
for(j=0; j<=leen; j++)
    d[0,j]=j;
for(i=1;i<=len;i++)
{
    for(j=1;j<=leen;j++)
    {
        if( s[i-1]==t[j-1] )
        {
            d[i,j]= d[i-1,j-1];
            if(d[i,j]==d[i-1,j]+1)
            {
                b[i,j]=d[i-1,j]+1;
            }
            else if(d[i,j]==d[i,j-1]+1)
            {
                b[i,j]=d[i,j-1]+1;
            }
            else
            {
                b[i,j];
            }
        }
        else
        {
             int m=min( d[i-1,j],d[i,j-1]+1);
            d[i,j]=min( m, t[i-1,j-1]+ (s[i-1] == t[j-1]));
            if( d[i,j]==d[i-1,j]+1 )
            {
                b[i,j]=d[i-1,j]+1;
            }
            else if( d[i,j]==d[i,j-1]+1 )
            {
                b[i,j]=d[i,j-1]+1;
            }
            else
            {
                b[i,j];
            }
        }
    }
}

cout<<"\t" << d[len,leen] << "\n";
cout <<"\t" << b[len,leen];
delete[] d;
delete[] b;
return 0;
}
Last edited on
This is not 2D:
1
2
int * d = new int [ len+1, leen+1 ];
d[ i, 0 ] = i;

It is 1D. Read about multi-dimensional arrays.


Is there a name for this alignment algorithm?
Hmm is this right then?
I googled edit distance and it came up as Levenshtein distance where it sees how much we have to edit the string to look like the other. http://rosettacode.org/wiki/Levenshtein_distance#C.2B.2B

Hmm is this done right for multidimensional arrays?
It crashes after inputting the two words now though.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::cin;
using std::min;

int main () {
string s; // ref
string t; // hyp
cout <<"\n \t" << "Enter first word." << "\n \t";
cin>>s;
cout << "\t" << "Enter second word." << "\n \t";
cin>>t;
int len=s.length();
int leen=t.length();
int i,j;

int** d;
d=new int*[len];
for(i=0;i<len;i++){
    d[i]=new int[leen];
}
for(j=0;j<len;j++){
    for(i=0;i<len;i++){
        d[i][j]=0;
    }
}

for(i=0; i<len; i++){
    d[i][0]=i;
}
for(j=0; j<leen; j++){
    d[0][j]=j;
}
for(i=1;i<len;i++)
{
    for(j=1;j<leen;j++)
    {
        if( s[i-1]==t[j-1] )
        {
            d[i][j]= d[i-1][j-1];
        }
        else
        {
            int m=min( d[i-1][j],d[i][j-1]+1 );
            d[i][j]=min( m, d[i-1][j-1] + 1 );
        }
    }
}
cout << d[len][leen];



for(i=0;i<len;i++){
delete[] d[i];}
delete[] d;
return 0;
}
Last edited on
One thing to remember when dealing with arrays is that valid indices are 0 to size-1 where size is the number of elements in the array. You get it right on line 21, but wrong on lines 24, 26, 29, 31, and 47.

Oops. Sorry slow person here. I changed the '<=' s to '<' s but it still crashes when I finish inputting both "kitten" and "sitting." I think it's when I'm trying to print it out. // Yeah when I comment out the cout, it runs without crashing.
Last edited on
when I comment out the cout, it runs without crashing.


Are you referring to the cout on line 51 where you access outside the bounds of the arrays?
Yes. It seems to run fine until I try to access it.
Last edited on
Let me put a little more emphasis there for you:

Are you referring to the cout on line 51 where you access outside the bounds of the arrays?


Neither len nor leen are values that are valid indices into said arrays.
It still crashes when I test d[i][j] which is why I'm still confused.
//
Whoops.
I just changed it to d[i-1][j-1] and it seems to work now with "kitten" and "sitting". Will test other words.
Now just to get the backtracking working.
I'm supposed to make another array to store the element that is being inserted and deleted, yes?
Last edited on
Websearch for "Needleman and Wunsch".
Last edited on
Topic archived. No new replies allowed.