String comparison

Aug 5, 2012 at 3:38pm

Hello. I am writing a chatbot and I am having trouble thinking of a function which would return a percentage of similarity for two strings. At first I thought that it would be easy. I would use a for loop to iterate through each character of the two strings, add up the number of matches and convert it into a percentage, but then I ran into this problem. Let's say someone accidentally misses out one letter, for example:

String 1: Hello World!
String 2: Helo World!
Match (y = yes, n = no): YYYNNNNNNNNN (25% match)

Equally, if someone accidentally adds a letter:

String 1: Hello World!
String 2: Helllo World!
Match (y = yes, n = no): YYYYNNNNNNNNN (31% match)

In terms of human reading, these two strings are a lot more similar than just 25% or 31%, so I was wondering if someone could help me come up with an idea for a function that would let me check the similarity of two strings without allowing different lengths (as shown in the examples) to significantly affect the percentage.
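
For reference, the position-by-position comparison described above could look roughly like the sketch below. The function name `positional_similarity` is made up for illustration, and dividing by the length of the longer string is an assumption chosen to match the counts in the examples.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Percentage of positions at which a and b hold the same character,
// measured against the length of the longer string (an assumption).
double positional_similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 100.0;  // assumption: two empty strings count as identical
    const std::size_t shortest = std::min(a.size(), b.size());

    std::size_t matches = 0;
    for (std::size_t i = 0; i < shortest; ++i) {
        if (a[i] == b[i]) ++matches;
    }
    // Positions that exist in only one string are counted as mismatches.
    return 100.0 * matches / longest;
}
```

Calling `positional_similarity("Hello World!", "Helo World!")` returns 25.0, because everything after the missing letter is shifted by one position and no longer lines up.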

Thank you.

Aug 5, 2012 at 3:49pm

Zephilinox (595)

Compare it word by word:

String 1: Hello World!
String 2: Helo World!

Match (y = yes, n = no): [Hello/Helo]
H = H; Y
E = E; Y
L = L; Y
L = O; N
O = N/A; N (maybe you could discount this as a comparison, improving it from 10/12 to 10/11 = 91%)
3/5

whitespace matches, Y
1/1

Match (y = yes, n = no): [World!/World!]
W = W; Y
O = O; Y
R = R; Y
L = L; Y
D = D; Y
! = !; Y
6/6

10/12 = 83.33 ~= 83% match
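
A rough sketch of the word-by-word idea in C++ might look like this. The names `split_words` and `word_similarity` are hypothetical, and it assumes words are separated by whitespace, that each separator between words counts as a single character, and that a word present in only one string counts entirely as mismatches.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split text into words on whitespace.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Compare the strings one word pair at a time and return a percentage.
double word_similarity(const std::string& a, const std::string& b) {
    const std::vector<std::string> wa = split_words(a);
    const std::vector<std::string> wb = split_words(b);
    const std::size_t common = std::min(wa.size(), wb.size());
    const std::size_t most = std::max(wa.size(), wb.size());
    std::size_t matches = 0, total = 0;

    for (std::size_t w = 0; w < most; ++w) {
        // A word missing from one side is treated as an empty word.
        const std::string x = w < wa.size() ? wa[w] : std::string();
        const std::string y = w < wb.size() ? wb[w] : std::string();

        // Compare the two words position by position.
        for (std::size_t i = 0; i < std::min(x.size(), y.size()); ++i) {
            if (x[i] == y[i]) ++matches;
        }
        total += std::max(x.size(), y.size());

        // Count the separator after every word except the last one;
        // it matches only if both strings have another word after it.
        if (w + 1 < most) {
            ++total;
            if (w + 1 < common) ++matches;
        }
    }
    return total == 0 ? 100.0 : 100.0 * matches / total;
}
```

With these assumptions, `word_similarity("Hello World!", "Helo World!")` counts 10 matches out of 12, the same 83% as the hand calculation above. A dropped letter now only affects the word it occurs in, not the rest of the string.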

Last edited on Aug 5, 2012 at 3:51pm

Aug 5, 2012 at 4:53pm

Owain (149)

That would work very nicely :) Thank you, friend!

Aug 5, 2012 at 5:00pm

Zephilinox (595)

^^ No problem, glad I could help. It still isn't perfect, because to a human Helo World! vs Hello World! looks a lot more similar than 83~91%, but it should do for the moment.
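
One possible refinement, which is a different technique from the word-by-word comparison, is to score the strings by Levenshtein (edit) distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below is only an illustration. The names `edit_distance` and `edit_similarity` are made up, and turning the distance into a percentage by dividing by the longer length is an assumption, not the only reasonable choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance between a and b, computed row by row.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Turning an empty prefix of a into the first j characters of b takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // deleting the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitute =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1,      // delete a[i-1]
                                curr[j - 1] + 1,  // insert b[j-1]
                                substitute});     // keep or replace
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Assumption: similarity = share of the longer string that needs no edit.
double edit_similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 100.0;  // two empty strings count as identical
    return 100.0 * (longest - edit_distance(a, b)) / longest;
}
```

Both "Helo World!" and "Helllo World!" are one edit away from "Hello World!", so under this measure they score about 92% (11/12 and 12/13), and a single dropped or added letter costs the same no matter where in the string it occurs.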

Last edited on Aug 5, 2012 at 5:05pm
