So I am refactoring this function: http://pastebin.com/mv85KMTk because it was incredibly messy before, and have gotten to the set of cases at lines 20-22. Originally it was a bunch of loops just parsing the data, but I want something more efficient. I first tried for_each, and then a range-based for loop, but neither worked. The goal is to capture (decimal) numbers. So for example:
add(10, 4); would capture add, (, 10, ,, 4, ), and ;. How could I do this?
When I did something similar to this years and years ago... I had a sort of "state based" system.
You'd examine each character, and depending on which state you were in, it would have a different effect. Each time you exit a state, you would export a token.
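Here is a rough sketch of that idea in C++, not a drop-in replacement for the pastebin function. All of the names (State, TokenType, Token, tokenize) are made up for illustration. It assumes three kinds of token (identifier, number, operator), treats "decimal" as plain base-10 digits, and lets whitespace simply end whatever token is in progress. One deliberate simplification: each operator character is exported immediately as its own token rather than accumulated in an operator state, so that ")" and ";" come out separately.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical names for illustration; none of these come from the pastebin code.
enum class State { Neutral, Identifier, Number };
enum class TokenType { Identifier, Number, Operator };

struct Token {
    TokenType type;
    std::string text;
};

std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    State state = State::Neutral;
    std::string current;  // characters accumulated in the current state

    // Exiting a state exports the accumulated text as one token.
    auto flush = [&]() {
        if (state == State::Identifier) {
            tokens.push_back({TokenType::Identifier, current});
        } else if (state == State::Number) {
            tokens.push_back({TokenType::Number, current});
        }
        current.clear();
        state = State::Neutral;
    };

    for (char ch : input) {
        // The <cctype> functions need a value representable as unsigned char.
        const unsigned char c = static_cast<unsigned char>(ch);

        if (state == State::Identifier && std::isalnum(c)) {
            current += ch;  // stay in identifier state
        } else if (state == State::Number && std::isdigit(c)) {
            current += ch;  // stay in numeric state
        } else {
            flush();  // exit the current state (does nothing if neutral)
            if (std::isalpha(c)) {
                state = State::Identifier;
                current += ch;
            } else if (std::isdigit(c)) {
                state = State::Number;
                current += ch;
            } else if (!std::isspace(c)) {
                // Assumption: any other non-space character is a
                // one-character operator token.
                tokens.push_back({TokenType::Operator, std::string(1, ch)});
            }
        }
    }
    flush();  // export whatever was still being accumulated at the end
    return tokens;
}
```

Calling tokenize("add(10, 4);") and looping over the result with a range-based for gives you each token's text and type. If you need multi-character operators such as == later, that is where a real operator state would come in.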
For simplicity's sake, let's use your example string:
add(10, 4);
Start in "neutral" state... or "null" state or whatever. Then just start looking at characters.
a is alphabetic. So we enter "identifier" state. While in "identifier" state, each alphanumeric character is simply added to the identifier name. Once we run into any other character, we exit this state.
d is alphanumeric, so we stay in "identifier"
d ditto
( is not alphanumeric. So we exit "identifier" state. The accumulated string 'add' is output as a token of type "identifier". Now, '(' is, let's say, an operator. So we enter operator state. Anything that isn't an operator will exit this state.
1 is numeric, so we exit operator state... which means exporting '(' as an operator token... and enter numeric state.
0 is numeric, so we stay in numeric state
, is an operator, so we exit numeric state... export '10' as a numeric token... and enter operator state