String match algorithm that compares "al

Forum

Forum
Beginners
String match algorithm that compares "al

String match algorithm that compares "all at a time"

Just curious, do any well-known string matching algorithms search for matches in lists in this fashion? Edit: Not talking about regex as we are dealing with lists of strings not in a "match and replace" fashion... although a regex for this would be "(nan|xad)", I wonder if the regex algorithm will convert that to "(n|x)a(n|d)" Edit: Ah wait, maybe that's why regexes needs to be "compiled" so they can do these efficiency checks beforehand.

the list:
    nan
    ned
    xad
    zad

We shall match nan or xad...

match N or X
    'an
    'ed
    'ad
    ~~~ (will never match, so remove from list)

match A
    ''n
    ~~~
    ''d
    ~~~

match N or D
    ''' (MATCH FOUND)
    ~~~
    ''' (MATCH FOUND)
    ~~~

Edit: Sorry, lots of edits. More specifically, I'm asking if, in C++, not using regex, there are string functions that do this in the above efficient manner (not match-checking one at a time), e.g.:

vector<string>list = {"nan, "ned", "xad", "zad"};
vector<string>match = {"nan, "nad"};
string replace = "";

MatchAndReplace(match, list, replace);

Last edited on

helios (17575)

I couldn't understand anything.

elanhickler (15)

sorry, my example was messed up. Fixed now. Is it still difficult to understand?

cire (8284)

My question: What makes you believe this is efficient?

elanhickler (15)

Usually what happens is you have a list of string you want to match, then for each string in each list, you do a comparison.

compare nan, ned, xad, zad to xad, zad:

xad == nan ? no (+1 comparisons) x,n
zad == nan? no (+1 comparison) z,n

xad == ned ? no (+1 comparisons) x,n
zad == ned ? no (+1 comparisons) z,n

xad == xad ? yes (+3 comparisons) x,x a,a d,d
zad == xad ? no  (+1 comparisons) z,x

xad == zad ? no (+1 comparisons) x,z
zad == zad ? yes (+3 comparisons) z,z, a,a d,d

comparing nan or xad (precompiled to know that zad and xad share AD):

    nan
    ned
    xad
    zad
match X or Z (+7 comparisons) x,n x,n x,x x,z z,n z,n z,z
    ~~~
    ~~~
    'ad
    'ad
match A (+2 comparisons) a,a a,a
    ''d
    ''d
match D (+2 comparisons) d,d d,d
    '''
    '''

my example = 11 comparisons
other example = 12 comparisons

..ok that's really close. Also precompiling may have to be done during runtime which is +3 comparisons. But in the other example you could also say the same thing, +1 comparisons to see xad matched zed. Actually technically i'd be -1 comparisons. if Is it efficient? I couldn't tell you, we'd need better examples. But then I guess the purpose of this thread has fallen flat. Sorry! I can mark as resolved.

Last edited on

helios (17575)

The closest to what I think you're looking for would be a trie (https://en.wikipedia.org/wiki/Trie ). Tries have the lowest time complexity to determine whether a string is equal to any string from a set. O(n) for tries vs O(n*m) for the naive implementation. However, building the trie is fairly expensive, and traversing one is not cache-friendly. In order for this extra cost to be amortized, the set of strings needs to be really large, and the trie needs to be reused many times.

elanhickler (15)

thank you helios, that resolves my question!

Topic archived. No new replies allowed.