To start, you need to store the line and position number with the current token. If it fails to parse, you can complain and print the offending line with the token marked. This assumes that you only need to look ahead by one token to parse the language (LL(1)? It's been a long time...)
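As a rough illustration of that suggestion, here is a minimal Python sketch. The `Token` class and the `format_error` helper are hypothetical names invented for this example, not part of any parser discussed in this thread, and it assumes 1-based line and column numbers and no tab characters in the source.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    line: int    # 1-based line number (assumption)
    column: int  # 1-based column of the token's first character (assumption)


def format_error(source: str, token: Token, message: str) -> str:
    """Build an error message showing the offending line with the token marked."""
    line_text = source.splitlines()[token.line - 1]
    # Pad up to the token's column, then underline the token itself.
    marker = " " * (token.column - 1) + "^" * max(len(token.text), 1)
    return (
        f"line {token.line}, column {token.column}: {message}\n"
        f"{line_text}\n"
        f"{marker}"
    )


if __name__ == "__main__":
    source = "1 + 2;\n3 + + 4;\n"
    print(format_error(source, Token("+", 2, 5), "unexpected token"))
```

The caret line is only aligned correctly when the source line contains no tabs; a real implementation would have to expand them first.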
Yes, I'm already storing the line and column numbers of each token. It is quite a general parser, so it accepts the following types of rules ('Ter' stands for terminal elements and 'Non' for non-terminal elements):
Non --> Ter
Non --> Non
Non --> Ter Ter
Non --> Ter Non
Non --> Non Ter
Non --> Non Non
Non --> Ter Non Ter
Non --> Non Ter Non
The difficulty is that a failing rule does not imply a mistake. So I cannot just stop as soon as a rule fails and report an error. For example, consider the following grammar:
Program --> Instruction Program | Instruction
Instruction --> Expression ;
Expression --> Expression + Expression | Number
Number: real number
With this grammar, the parser recognizes all programs that consist of expressions terminated by semicolons, where an expression is either a real number or a sum of real numbers. Suppose that I give it the following input that contains an error on the fourth line:
We could naively claim that for the program to be correct, all instructions must be correct. Thus, as soon as an 'Instruction' rule fails, I can report the error. But it is not that simple, because in order to start with the first rule 'Program', the parser first has to find an 'Instruction', and to do this, it has to try the 'Instruction' rule from the very first token while advancing the current token until it finds a match (that is, until the current token falls on the first semicolon). This means that the 'Instruction' rule will fail several times, even though there is no mistake in the first instruction. So I cannot report the error at that time.
Of course, if the grammar were known at coding time, I could solve the problem by searching first for a semicolon, and then applying the 'Instruction' rule.
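To make that hard-coded alternative concrete, here is a rough Python sketch of it for the example grammar only. It assumes the tokens are plain strings and treats an expression as `Number (+ Number)*`, which accepts the same inputs as the 'Expression' rule above. The function names are made up for this illustration and it is not how the general parser works.

```python
def split_instructions(tokens):
    """Group tokens into candidate instructions, each ending at a ';' token."""
    instructions, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok == ";":
            instructions.append(current)
            current = []
    if current:
        instructions.append(current)  # trailing tokens with no closing ';'
    return instructions


def is_number(tok):
    """True if the token can be read as a real number."""
    try:
        float(tok)
        return True
    except ValueError:
        return False


def check_instruction(instr):
    """Return the index of the first offending token, or None if instr is valid."""
    if not instr or instr[-1] != ";":
        return len(instr)  # the instruction is missing its semicolon
    body = instr[:-1]
    for i, tok in enumerate(body):
        # Even positions must hold a number, odd positions a '+'.
        if i % 2 == 0 and not is_number(tok):
            return i
        if i % 2 == 1 and tok != "+":
            return i
    if len(body) % 2 == 0:
        return len(body)  # empty body or trailing '+': the ';' is unexpected
    return None


def first_error(tokens):
    """Return (instruction index, token index) of the first error, or None."""
    for n, instr in enumerate(split_instructions(tokens)):
        bad = check_instruction(instr)
        if bad is not None:
            return (n, bad)
    return None
```

Because each instruction is delimited before the 'Instruction' rule is applied, a failure here really is an error in that instruction and can be reported immediately. This only works because the semicolon is known in advance to be the delimiter, which is exactly what a general parser cannot assume.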