This kind of string parsing is obnoxious, and a whole lot of CS literature is dedicated to it. I don't think it is taught well in universities either; professors tend to leave students to figure it out on their own.
Remember, user input and parsing are not easy.
Break It Down
Each formula is a string of atoms. Each atom has the form:
  name count?
The name is a single uppercase letter (majuscule), optionally followed by a single lowercase letter (minuscule).
The count is optional. If present, it is one or more digits bracketed by parentheses:
  '(' digits ')'
If not present, the count is taken to be 1.
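As a cross-check on that grammar (not as the parsing method itself), the atom shape can also be written as a regular expression. This is only a sketch: looks_like_atom and check_atom_shapes are names I made up for illustration, and it assumes plain ASCII letters and digits.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of the assignment): true if `text` is exactly
// one atom, i.e. an uppercase letter, an optional lowercase letter, and an
// optional parenthesized run of one or more digits.
bool looks_like_atom(const std::string& text)
{
    static const std::regex atom_pattern(R"([A-Z][a-z]?(\([0-9]+\))?)");
    return std::regex_match(text, atom_pattern);
}

// A few shapes the grammar allows and rejects.
void check_atom_shapes()
{
    assert(looks_like_atom("H"));
    assert(looks_like_atom("Na"));
    assert(looks_like_atom("H(2)"));
    assert(looks_like_atom("Cl(12)"));
    assert(!looks_like_atom("h"));    // name must start with an uppercase letter
    assert(!looks_like_atom("H()"));  // parentheses need at least one digit
    assert(!looks_like_atom("H2"));   // a count must be parenthesized
}
```

The regex only answers "is this one atom?". It does not hand you the name and count, which is why the character-by-character approach is still worth learning.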
Accept / Expect Idiom
There is a basic parsing algorithm wherein you behave depending on the next character in the input.
• accept --> get the next character IFF it is what is desired
(and report whether it was accepted)
• expect --> the next character MUST be what is desired
(failure if it is not)
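Those two operations could be sketched in C++ roughly as follows. The names accept, expect, is_upper and is_lower, the predicate-pointer parameter, and the choice to throw on failure are all my own assumptions for illustration; returning false from expect would work just as well.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Small predicates; the cast avoids undefined behavior for negative char values.
bool is_upper(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }
bool is_lower(char c) { return std::islower(static_cast<unsigned char>(c)) != 0; }

// accept: if the character at `pos` is what is desired, consume it (advance
// `pos`) and report true; otherwise leave `pos` alone and report false.
bool accept(const std::string& s, std::size_t& pos, bool (*wanted)(char))
{
    if (pos < s.size() && wanted(s[pos]))
    {
        pos += 1;
        return true;
    }
    return false;
}

// expect: the character at `pos` MUST be what is desired; anything else
// (including end of input) is a failure, reported here as an exception.
void expect(const std::string& s, std::size_t& pos, bool (*wanted)(char))
{
    if (!accept(s, pos, wanted))
        throw std::runtime_error("unexpected input at position " + std::to_string(pos));
}

// Usage: walk over "Na" one character at a time.
void accept_expect_demo()
{
    const std::string s = "Na";
    std::size_t pos = 0;
    expect(s, pos, is_upper);           // consumes 'N'
    assert(accept(s, pos, is_lower));   // consumes 'a'
    assert(!accept(s, pos, is_lower));  // end of input: nothing to accept
    assert(pos == 2);
}
```

Note that expect is just accept plus a failure path, which is why the two are usually written together.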
Your code should look to see if it can accept the next character in a string for each part of the atom. Your parsing routine should therefore look something like this:
  // pseudocode
  function get_next_atom( string, index, &name, &count ):
    name = expect an uppercase letter
    if (accept a lowercase letter):
      name += that lowercase letter
    if (accept an open parenthesis):
      expect a digit
      accept more digits
      count = string to integer( all digits )
      expect a close parenthesis
    else:
      count = 1
That is very high-level pseudocode.
Your assignment (at this level) does not say it explicitly, but you can presume that the input is well-formed, so a lot of the expect --> error stuff can be ignored. The only one NOT to ignore is the first: if there is no uppercase letter to read, you have hit the end of the string, which means there are no more atoms.
You can translate that into C++ code fairly simply:
  // n is taken by reference so the caller's position advances past the atom
  bool get_next_atom( const string& s, unsigned& n, string& name, unsigned& count )
  {
    // expect an uppercase letter
    // (failing here also covers the end of the string: no more atoms)
    if ((n >= s.size()) or !isupper( s[n] )) return false;
    name = s[n];
    n += 1;

    // if (accept a lowercase letter):
    if ((n < s.size()) and islower( s[n] ))
    {
      // name += that lowercase letter
      name += s[n];
      n += 1;
    }

    ...
  }
It does look kind of messy, but you are spelling out exactly what to expect or accept at each step and how to behave on success or failure. There isn't much you can do to make it prettier.
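For reference, here is one way the whole routine might look when every step of the pseudocode is spelled out. Treat it as a sketch under my own assumptions, not the required solution: the index is a std::size_t passed by reference, the peek lambda and the parse_formula driver are names I invented, and malformed input simply makes the function return false.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

bool get_next_atom(const std::string& s, std::size_t& n, std::string& name, unsigned& count)
{
    // Hypothetical helper: the current character, or 0 once the input is used up.
    auto peek = [&]() -> unsigned char {
        return n < s.size() ? static_cast<unsigned char>(s[n]) : 0;
    };

    // expect an uppercase letter (failure here also means "no more atoms")
    if (!std::isupper(peek())) return false;
    name = s[n++];

    // accept a lowercase letter
    if (std::islower(peek())) name += s[n++];

    // accept '(' digits ')'; without it the count defaults to 1
    count = 1;
    if (peek() == '(')
    {
        n += 1;
        if (!std::isdigit(peek())) return false;  // expect a digit
        count = 0;
        while (std::isdigit(peek()))              // accept more digits
            count = count * 10 + (s[n++] - '0');
        if (peek() != ')') return false;          // expect a close parenthesis
        n += 1;
    }
    return true;
}

// Hypothetical driver: collect every atom of a formula as (name, count) pairs.
std::vector<std::pair<std::string, unsigned>> parse_formula(const std::string& formula)
{
    std::vector<std::pair<std::string, unsigned>> atoms;
    std::size_t n = 0;
    std::string name;
    unsigned count = 0;
    while (get_next_atom(formula, n, name, count))
        atoms.emplace_back(name, count);
    return atoms;
}

// Usage: "H(2)O" is H with count 2 followed by O with the default count 1.
void parse_formula_demo()
{
    const auto atoms = parse_formula("H(2)O");
    assert(atoms.size() == 2);
    assert(atoms[0].first == "H" && atoms[0].second == 2);
    assert(atoms[1].first == "O" && atoms[1].second == 1);
}
```

The driver stops at the first failed expect, which for well-formed input is exactly the end of the string.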
Hope this helps.