Just curious, do any well-known string matching algorithms search for matches in lists in this fashion? Edit: Not talking about regex as we are dealing with lists of strings not in a "match and replace" fashion... although a regex for this would be "(nan|xad)", I wonder if the regex algorithm will convert that to "(n|x)a(n|d)" Edit: Ah wait, maybe that's why regexes needs to be "compiled" so they can do these efficiency checks beforehand.
the list:
nan
ned
xad
zad
We shall match nan or xad...
match N or X
'an
'ed
'ad
~~~ (will never match, so remove from list)
match A
''n
~~~
''d
~~~
match N or D
''' (MATCH FOUND)
~~~
''' (MATCH FOUND)
~~~
Edit: Sorry, lots of edits. More specifically, I'm asking if, in C++, not using regex, there are string functions that do this in the above efficient manner (not match-checking one at a time), e.g.:
Usually what happens is you have a list of string you want to match, then for each string in each list, you do a comparison.
compare nan, ned, xad, zad to xad, zad:
1 2 3 4 5 6 7 8 9 10 11
xad == nan ? no (+1 comparisons) x,n
zad == nan? no (+1 comparison) z,n
xad == ned ? no (+1 comparisons) x,n
zad == ned ? no (+1 comparisons) z,n
xad == xad ? yes (+3 comparisons) x,x a,a d,d
zad == xad ? no (+1 comparisons) z,x
xad == zad ? no (+1 comparisons) x,z
zad == zad ? yes (+3 comparisons) z,z, a,a d,d
comparing nan or xad (precompiled to know that zad and xad share AD):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
nan
ned
xad
zad
match X or Z (+7 comparisons) x,n x,n x,x x,z z,n z,n z,z
~~~
~~~
'ad
'ad
match A (+2 comparisons) a,a a,a
''d
''d
match D (+2 comparisons) d,d d,d
'''
'''
my example = 11 comparisons
other example = 12 comparisons
..ok that's really close. Also precompiling may have to be done during runtime which is +3 comparisons. But in the other example you could also say the same thing, +1 comparisons to see xad matched zed. Actually technically i'd be -1 comparisons. if Is it efficient? I couldn't tell you, we'd need better examples. But then I guess the purpose of this thread has fallen flat. Sorry! I can mark as resolved.
The closest to what I think you're looking for would be a trie (https://en.wikipedia.org/wiki/Trie ). Tries have the lowest time complexity to determine whether a string is equal to any string from a set. O(n) for tries vs O(n*m) for the naive implementation. However, building the trie is fairly expensive, and traversing one is not cache-friendly. In order for this extra cost to be amortized, the set of strings needs to be really large, and the trie needs to be reused many times.