I am reading patterns in a regex-like format, expanding each one into the concrete strings it matches, and writing those strings out. The expanded list can become huge.
By the time I write the strings out, I no longer have the original pattern data, so I have to reconstruct the patterns from the strings alone.
A couple of cases while reading and writing:
Say the regular expression read in is:
abc/*
Since 'abc' can only be followed by 'A', 'B', 'C', 'D' (I have this list), the above expands to the list of strings
"abc/A", "abc/B", "abc/C", "abc/D" -- 1
Say another regular expression read in is:
def/*/A
Since 'def' can only be followed by 'x', 'y', 'z' (I have this list), the above expands to the list of strings
"def/x/A", "def/y/A", "def/z/A" -- 2
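To make the expansion step concrete, here is a minimal sketch of what I mean. The `ALLOWED` table and the `expand` helper are hypothetical names I am using for illustration; the assumption is that each leading segment maps to a known, finite list of legal values.

```python
# Hypothetical table: for each prefix, the finite list of values '*' can take.
ALLOWED = {
    "abc": ["A", "B", "C", "D"],
    "def": ["x", "y", "z"],
}

def expand(pattern):
    """Expand a pattern like 'abc/*' or 'def/*/A' into concrete strings.

    Each '*' segment is replaced by every allowed value for the pattern's
    leading prefix; '*' never crosses a '/' boundary.
    """
    segments = pattern.split("/")
    prefix = segments[0]
    results = [[]]
    for seg in segments:
        if seg == "*":
            results = [r + [v] for r in results for v in ALLOWED[prefix]]
        else:
            results = [r + [seg] for r in results]
    return ["/".join(r) for r in results]

print(expand("abc/*"))    # ['abc/A', 'abc/B', 'abc/C', 'abc/D']
print(expand("def/*/A"))  # ['def/x/A', 'def/y/A', 'def/z/A']
```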
As noted above, I no longer have the original regular expressions; all I have is the list of strings. I need to recreate the patterns from lists 1 and 2.
From list 1, I should get
abc/*
From list 2, I should get
def/*/A
which are the originals.
Question: which data structure would be efficient for this problem? I have considered tries and the Aho–Corasick algorithm, but so far I have not been able to work out a clear solution.
I would be happy to expand the question if anything is unclear. Please note that * matches only a single path segment: it will not match /, //, or anything other than plain characters.
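Since every '*' stands for a complete, known set of values, one simple alternative to a full trie is to split the strings into '/'-separated columns and collapse any column whose observed values exactly cover the allowed set. The sketch below assumes all input strings came from a single pattern; `ALLOWED` and `compress` are hypothetical names for illustration.

```python
# Hypothetical table of the finite value sets '*' can expand to.
ALLOWED = {
    "abc": {"A", "B", "C", "D"},
    "def": {"x", "y", "z"},
}

def compress(strings):
    """Recover the original pattern from its full expansion.

    Assumes every string came from one pattern. A column where all
    strings agree is kept literally; a column whose values exactly
    match the allowed set for the prefix collapses back to '*'.
    """
    split = [s.split("/") for s in strings]
    prefix = split[0][0]
    pattern = []
    for column in zip(*split):
        values = set(column)
        if len(values) == 1:
            pattern.append(column[0])          # constant segment
        elif values == ALLOWED.get(prefix):
            pattern.append("*")                # complete expansion -> wildcard
        else:
            raise ValueError("strings are not a complete expansion")
    return "/".join(pattern)

print(compress(["abc/A", "abc/B", "abc/C", "abc/D"]))  # abc/*
print(compress(["def/x/A", "def/y/A", "def/z/A"]))     # def/*/A
```

A trie gives you the same grouping implicitly (strings sharing a prefix share a branch, and a node whose children cover the whole allowed set can be collapsed to '*'), but for fixed-depth, '/'-delimited strings the columnar view above is usually simpler.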
It looks like you're trying to output the set of strings that will match the expression. That's impossible in general, since you can easily form an expression that matches zero or more characters, and hence infinitely many strings.
"It looks like you're trying to output the set of strings that will match the expression" -- I think they're doing the opposite: given a set of strings, determine the single "regular expression" that describes them. The "regexes" here are really just shell-style patterns using the * wildcard, though.