Rules for soundex

Sep 7, 2008 at 10:09pm
Hello,
I am having some difficulty making the Soundex rules work in my code. I can get the first character and some of the following digits, but the rest does not behave the way it should. For instance, zeros are not appended to the end of the Soundex code when it is too short (I get T50 for "ten" instead of T5000). This project requires that the conversion take place in a function, and the Soundex code should be the first letter followed by 4 numerals. I am posting the code below; if you have any suggestions, please post them.

#include<iostream>
using namespace std;

void soundex_function(char&);
char input_names [10][100];  // holds the words
char output_codes[10 ][6];  // holds the soundex codes

int main()
{

    for(int i = 0; i < 10; i++)
        {
        //int putZero = 0;        
        cout << "Enter a word: ";
        cin >> input_names[i];        
        cout << endl;
        char firstletter = input_names[i][0];
        firstletter = toupper(firstletter);
        output_codes[i][0] = firstletter;
        //cout << firstletter;
         for (int c=1; c<100; c++)
            {
            char converter = input_names[i][c];
            converter = toupper(converter);           
            char encoder = input_names[i][c-1];            
            if (converter != encoder)
                {
                soundex_function(converter);
                
                //for(int oc = 0; oc<5; oc++)
                if (output_codes[i][c-1] == 0)
                    {
                    output_codes[i][c-1]= converter;
                    //putZero++;
                    }
                     
                }               
            else (c++);                
            }
        }    
    for(int j = 0;j<10;j++)
        {
        cout << input_names[j] << " => " << output_codes[j] << endl;
        }

}   
 

 //This is the character conversion function for the soundex codes
 void soundex_function(char& converter)        
   {
    if(converter=='P'||converter=='B'||converter=='F'||converter=='V') converter='1';
    else                                                  
    if(converter=='C'||converter=='S' ||converter=='Q' ||converter=='K' ||
       converter=='G'||converter=='J' ||converter=='X'||converter=='Z') 
       converter='2';
    else 
    if(converter=='M'||converter=='N' ) converter='5'; 
    else 
    if(converter=='D'||converter=='T' ) converter='3';
    else 
    if(converter=='L' ) converter='4';
    else 
    if(converter=='R' ) converter='6';
    else converter = '0';
  }  
 
Last edited on Sep 8, 2008 at 1:59am
Sep 8, 2008 at 3:14am
The SOUNDEX conversion won't work properly if you try to do more than one step at a time. (Again I refer you to the Wikipedia article.)

Example:
Higgins

Step 1: SOUNDEX consonant lookup
This is where you use your function to modify every letter (except the first) to have either its numeric value or something invalid. (I just responded to your other thread with an example.)
HI22I52

Step 2: Collapse adjacent identical digits.
HI2I52

Step 3: Remove all non-digits (except for the first letter)
H252

Step 4.1: Append zeros if needed (or just do it anyway)
H252000

Step 4.2: Return the first four characters (one letter + three digits):
H252
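
For reference, here is a minimal sketch of those steps chained together, using std::string rather than the char arrays in the original post. The names soundex_digit and soundex, and the digits parameter, are assumptions made up for illustration. The sketch follows only the steps listed above, not every rule of the full Soundex definition.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Step 1 helper (hypothetical name): map a letter to its Soundex digit.
// Anything without a digit comes back as the uppercased character, which
// is "something invalid" that step 3 removes later.
char soundex_digit(char ch) {
    ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    switch (ch) {
        case 'B': case 'F': case 'P': case 'V': return '1';
        case 'C': case 'G': case 'J': case 'K':
        case 'Q': case 'S': case 'X': case 'Z': return '2';
        case 'D': case 'T': return '3';
        case 'L': return '4';
        case 'M': case 'N': return '5';
        case 'R': return '6';
        default: return ch;
    }
}

// Hypothetical wrapper: one letter followed by `digits` numerals.
std::string soundex(const std::string& word, std::size_t digits = 3) {
    if (word.empty()) return "";

    // Step 1: keep the first letter, look up every following character.
    std::string coded(1, static_cast<char>(
        std::toupper(static_cast<unsigned char>(word[0]))));
    for (std::size_t i = 1; i < word.size(); ++i)
        coded += soundex_digit(word[i]);

    // Step 2: collapse adjacent identical digits.
    std::string collapsed;
    for (char ch : coded) {
        bool repeated_digit = std::isdigit(static_cast<unsigned char>(ch)) &&
                              !collapsed.empty() && collapsed.back() == ch;
        if (!repeated_digit) collapsed += ch;
    }

    // Step 3: remove all non-digits except the first letter.
    std::string result(1, collapsed[0]);
    for (std::size_t i = 1; i < collapsed.size(); ++i)
        if (std::isdigit(static_cast<unsigned char>(collapsed[i])))
            result += collapsed[i];

    // Step 4: append zeros, then keep the letter plus `digits` numerals.
    result.append(digits, '0');
    return result.substr(0, digits + 1);
}
```

With the default of three digits, soundex("Higgins") gives H252 as in the walkthrough. Passing 4 as the second argument gives the letter-plus-four-numerals form the assignment asks for, so "ten" becomes T5000.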


If I were you, I would make each step a function. You can have it work directly on a given character array if you want.
  // somewhere in main()
  char s[ 100 ];
  strcpy( s, input_names[ n ] );
  soundex( s );
  strcpy( output_codes[ n ], s );

void soundex( char* s )
  {
  step_one( s );
  step_two( s );
  ...
  }

void step_one( char* s )
  {
  for (; *s; s++)
    *s = soundex_function( toupper( *s ) );
  }
Etc.

Good luck!

[edit] Oh yeah, watch line #65 in your soundex_function() (the final else branch, which sets converter = '0'). Invalid inputs should not return a valid output. Remember, keep to one step at a time! :-)
Last edited on Sep 8, 2008 at 3:17am
Sep 8, 2008 at 3:27pm
This makes sense. I have begun implementing this in the code, but when I compile it I get an error at line 71 saying "could not convert 'toupper(int)((*s))' to 'char&'". I am thinking that changing soundex_function to take a char* would fix it. Does that make sense?
Sep 8, 2008 at 4:55pm
Sorry, I missed that soundex_function takes a reference and returns nothing. You cannot bind a non-const reference to a function result... but that call to toupper() really belongs inside the function, not as part of its argument list...

[edit] Fixed the polarity of the bolded clause...
Last edited on Sep 8, 2008 at 8:16pm
Sep 8, 2008 at 11:55pm
Thanks Duoas, that helped a great deal. I now have another issue with the functions that do the conversions. Everything was working great until I added the step_three function; now too many characters are being removed. Any suggestions? (Suggestions for other areas of improvement are welcome too.) Thanks again. I guess what I am asking is: what would a function that removes the characters which are not numbers look like?

#include<iostream>
using namespace std;

void soundex_function(char&);
void soundex( char* );
void step_one( char* );
void step_two( char* );
void step_three( char* );
char input_names [10][100];  // holds the words
char output_codes[10 ][6];  // holds the soundex codes

int main()
{

    for(int i = 0; i < 10; i++)
        {
        cout << "Enter a word: ";
        cin >> input_names[i];        
        cout << endl;
       
        char s[100];
        strcpy( s , input_names[i] );
        soundex( s );
        strcpy(output_codes[i], s );                           
           
        }
            
    for(int j = 0;j<10;j++)
        {
        output_codes[j][0] = toupper(input_names[j][0]);
        output_codes[j][6] = '\0';
	    cout << input_names[j] << " => " << output_codes[j] << endl;
        }

}
//send pointer to its functions in order
void soundex ( char* s )
{
 step_one( s );
 step_two( s );
 step_three( s );
}

//pass the pointer to the conversion function   
 void step_one( char* s )
 {
 for (; *s; s++)
   {    
    soundex_function( *s );    
   }
}

//step 2 remove adjacent like characters
void step_two (char* s)
{
for(int i=0; i<10; i++)
    {
    for(int c=1; c<6 ; c++)
       {
       if (output_codes[i][c] == output_codes[i][c-1])
    	output_codes[i][c] = output_codes[i][c+1];       
       }
    }
}
//step three remove none digit characters
void step_three (char* s)
{


 for(int i = 0; i < 10; i++)
   {
    for(int c = 1; c < 6; c++)
       {
       if (isalpha(output_codes[i][c]))
       output_codes[i][c] = output_codes[i][c+1];
       }       
   }
cout << s << endl;
}

 //This is the character conversion function for the soundex codes
 void soundex_function(char& s)        
   {
    s = toupper(s);
    
    if(s=='P'||s=='B'||s=='F'||s=='V') s='1';
    else                                                  
    if(s=='C'||s=='S' ||s=='Q' ||s=='K' ||s=='G'||s=='J' ||s=='X'||s=='Z')s='2';
    else 
    if(s=='M'||s=='N' )s='5'; 
    else 
    if(s=='D'||s=='T' ) s='3';
    else 
    if(s=='L' ) s='4';
    else 
    if(s=='R' ) s='6';
  }  

Last edited on Sep 9, 2008 at 3:46am
Sep 9, 2008 at 2:13pm
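Take lines 9 and 10 (the global input_names and output_codes arrays) and stick them inside main() (after line 13, its opening brace). With the arrays no longer global, step_two() and step_three() can only touch the string s they are given. Then figure out how to fix them to work.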

Remember, only do the tiniest thing you can do at any one time. By breaking a big problem down into a bunch of little problems, you no longer have to think about the big problem at all (let alone all at once); you can now focus on solving just one little problem at a time without thinking or caring about any part of the big problem.
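
If it helps to compare notes afterwards, here is one possible shape for those little problems, each working in place on the char* it receives. This is only a sketch: step_four and its digits parameter are hypothetical additions for the zero padding, and the caller is assumed to pass a buffer with room for digits + 2 characters.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Step 2: collapse adjacent identical digits, in place.
void step_two(char* s) {
    if (*s == '\0') return;
    char* out = s;  // points at the last character kept so far
    for (const char* in = s + 1; *in; ++in) {
        bool repeated_digit =
            std::isdigit(static_cast<unsigned char>(*in)) && *in == *out;
        if (!repeated_digit) *++out = *in;
    }
    *++out = '\0';
}

// Step 3: remove every non-digit except the first character, in place.
void step_three(char* s) {
    if (*s == '\0') return;
    char* out = s + 1;  // next free slot after the first letter
    for (const char* in = s + 1; *in; ++in)
        if (std::isdigit(static_cast<unsigned char>(*in)))
            *out++ = *in;
    *out = '\0';
}

// Step 4 (hypothetical): pad with zeros, then cut to one letter plus
// `digits` numerals. Assumes s has room for digits + 2 characters.
void step_four(char* s, std::size_t digits) {
    if (*s == '\0') return;
    std::size_t len = std::strlen(s);
    while (len < digits + 1) s[len++] = '0';
    s[digits + 1] = '\0';
}
```

Each function reads with one pointer and writes with another that never gets ahead of it, so nothing is overwritten before it has been read. Starting from "HI22I52", step_two leaves "HI2I52", step_three leaves "H252", and step_four with digits set to 4 leaves "H2520".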

Heh :-, Well, give it a go...
Last edited on Sep 9, 2008 at 2:14pm