So I'm using the regex [_|a-z|A-Z]+[_|a-z|A-Z|0-9]+ to find names for my lexer, such as _Name, system, or varName34; i.e., it allows the same naming conventions as C/C++. However, with this regex each name gets found twice. For example, if a variable name is Counter, it finds that Counter and then the same Counter again. Could someone please tell me what is wrong with my regex?
@duoas: I was using the pipes for "or". I am still relatively new to regexes and didn't know that I could do without them. And yeah, it is copying it: it does it for words that I know there is only one of.
@everyone: Is it my code, then? I switched the regex to the one suggested and it's still doing it. Could it be something in my code?
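On the pipes: inside a character class, | does not mean "or"; it is just a literal pipe character, so the original pattern also accepts names that contain |. The two + quantifiers also mean a name needs at least two characters. Here is a small sketch of the difference, assuming std::regex (the thread doesn't say which regex library is in use). The names kIdentifier, kOriginal, and matches_whole are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pipe-free identifier pattern: one leading letter or underscore,
// then zero or more letters, digits, or underscores.
static const std::regex kIdentifier("[_a-zA-Z][_a-zA-Z0-9]*");

// The pattern from the question, kept verbatim for comparison.
static const std::regex kOriginal("[_|a-z|A-Z]+[_|a-z|A-Z|0-9]+");

// Hypothetical helper: true if the whole of `text` is matched by `re`.
bool matches_whole(const std::string& text, const std::regex& re) {
    return std::regex_match(text, re);
}
```

With these, a one-letter name like "x" is accepted by kIdentifier but rejected by kOriginal, and "a|b" is accepted only by kOriginal.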
OK, that makes sense. So then I have a few more questions:
a) How can I stop that from happening?
b) Why does this happen if I pass the rest of the string past the first match?
c) Why doesn't this happen with the exact same code, except that the regex is \"[^\"]+\" (for strings) or import|function|var|println|end (for keywords)?
Believe it or not, the reason it happens is so that people can get at the pieces of their advanced regexes. You are actually using some pretty simple ones.
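To make that concrete, here is a rough sketch of how one match can come back as more than one entry. It assumes std::regex (the thread doesn't name the library) and assumes the pattern is wrapped in parentheses, which the pattern as posted is not; match_pieces is a hypothetical helper. In std::regex, element 0 of the result is the whole match and each parenthesized group adds one more element, so pushing back every element stores the same text twice.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every element of the match result for the
// first match in `text`. Element 0 is the whole match; elements 1..n are
// the capture groups, in order.
std::vector<std::string> match_pieces(const std::string& text,
                                      const std::regex& re) {
    std::vector<std::string> pieces;
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            pieces.push_back(m[i].str());
        }
    }
    return pieces;
}
```

Under these assumptions, searching "Counter = 0" with ([_a-zA-Z][_a-zA-Z0-9]*) yields two pieces, both "Counter", while the same pattern without the parentheses yields one.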
I can't tell if it's the first element, though; it's just pushing them back. And what if it's something like this:
_Name _Name
In that case I would want it to be two.
Why would you want to treat "_Name _Name" specially? Is it supposed to be a single lexeme?
Also, you know full well how to just get one thing from a list. The only difference is that your list comes as a series of function calls instead of an indexable array. So how many times do you need to call the function to get the first element?
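One way to read that hint, sketched with std::regex as an assumption: walk the matches one at a time and push back only element 0 of each. find_names is a hypothetical helper name, and the parentheses in the pattern are there only to show that the extra group gets ignored.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects each identifier in `text` exactly once per
// occurrence by keeping only the whole match (element 0) of every result.
std::vector<std::string> find_names(const std::string& text) {
    // The capture group is deliberate: it adds an element 1 to each result,
    // which this loop never pushes back.
    static const std::regex re("([_a-zA-Z][_a-zA-Z0-9]*)");
    std::vector<std::string> names;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        names.push_back((*it)[0].str());
    }
    return names;
}
```

Under these assumptions, "_Name _Name" gives two entries, one per occurrence, and neither is stored twice.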