LALR Parsers

Hoorah. I just finished my first LALR parser (which parses some simple math expressions containing numbers and operators (+,-,*,/,%,^)), based on Albatross' article (you should check it if you haven't already).

The code is below:

/* 
 * File:   LALR_Calculator.cpp
 * Author: Ties
 *
 * Created on 7 december 2011, 20:01
 */

#include <iostream>
#include <deque>
#include <cmath> // Contains: double pow(double,double)

/*
    Error checks:
    ERROR: EMPTY DATA STRING (in preprocesser)
    ERROR: UNMATCHED (begin) PARENTHESIS (in lexer)
*/

using namespace std;

struct data {
    data() : char_count(0), priority_count(0), max_priority(0) {}
    deque<float> value;
    deque<int>   priority;
    deque<bool>  is_operator;
    int char_count;
    int priority_count;
    int max_priority;
};
int preprocesser(string& s) {
   int found = s.find(" ");        // Find the first space
    while(found!=string::npos)      // While there are spaces
    {
        s.erase(s.begin()+found);   // Remove the space
        found = s.find(" ");        // Find next space
    }
    return bool(s.size());
}
int lexer(string& s, data& d) {
    while(s.size())                     // While we still have unprocessed text
    {
        while(s.size() && s[0]=='(')
        {
            d.priority_count += 3;
            s.erase(0,1);
        }

        bool sign = (s[0]!='-');        // If the current strings with '-', negate the next value.
        if (!sign) s.erase(0,1);        // If there was a '-', remove it.

        unsigned pos = s.find_first_of("+-*/%^()");     // Find the first operator and check it's position
        if (pos == string::npos) pos = s.size();        // If there is no next operator, set the position at the end of the string
        d.value.push_back( (2*sign - 1) * atof(s.substr(0,pos).c_str()));   // Push the value (possibly negated)
        d.priority.push_back(d.priority_count);                             // Push 0 as the priority
        d.is_operator.push_back(false);                                     // Push false, since this is a number
        s.erase(0, pos);                               // Erase the number we just added

        while(s.size() && s[0]==')')
        {
            d.priority_count -= 3;
            s.erase(0,1);
            if (d.priority_count < 0) return 0;        // ERROR: UNMATCHED PARENTHESIS
        }

        if(s.size())                        // While we still have unprocessed text
        {
            d.value.push_back(s[0]);        // Push the operator
            d.priority.push_back(d.priority_count);    // Push 0 as the priority
            d.is_operator.push_back(true);  // Push true, since this is an operator
            s.erase(0,1);                   // Erase the operator we just added
        }
    }
    return 1;
}
int parser(data& d) {
    for(size_t i = 1; i<d.value.size(); i+=2) // Determine priority of operators and operands
    {
        unsigned prior = ((d.value[i]=='*' || d.value[i]=='/' || d.value[i]=='%')?1:((d.value[i]=='^')?2:0));
        if(d.priority[i-1]==d.priority[i]) d.priority[i-1]+=prior;
        if(d.priority[i+1]==d.priority[i]) d.priority[i+1]+=prior;
        d.priority[i]+=prior;
    }  
   
    while(d.value.size()>1)
    {
        d.max_priority = 0;
        for(size_t i = 0; i<d.value.size(); i++) if(d.priority[i]>d.max_priority) d.max_priority = d.priority[i];
        for(size_t i = 0; i<d.value.size(); i++)
        {
                   if((d.priority[i]==d.max_priority) && (((i==0) || (d.priority[i-1]<d.max_priority)) && ((i==d.value.size()-1) || (d.priority[i+1]<d.max_priority)))) {d.priority[i]-=1;}
        }
        for(size_t i = 0; i<d.value.size(); i++)
        {
            if(d.priority[i]==d.max_priority && d.priority[i+1]==d.max_priority)
            {
                     if (d.value[i+1] == '+') d.value[i] += d.value[i+2];
                else if (d.value[i+1] == '-') d.value[i] -= d.value[i+2];
                else if (d.value[i+1] == '*') d.value[i] *= d.value[i+2];
                else if (d.value[i+1] == '/') d.value[i] /= d.value[i+2];
                else if (d.value[i+1] == '%') d.value[i] = int(d.value[i])%int(d.value[i+2]);
                else if (d.value[i+1] == '^') d.value[i] = pow(d.value[i], d.value[i+2]);
                else continue;
                d.value.erase(d.value.begin()+i+1);
                d.value.erase(d.value.begin()+i+1);
                d.priority.erase(d.priority.begin()+i+1);
                d.priority.erase(d.priority.begin()+i+1);
                d.is_operator.erase(d.is_operator.begin()+i+1);
                d.is_operator.erase(d.is_operator.begin()+i+1);
            }
        }
    }
    return 1;
}

int main() {
   while(true) {
      data processed;
      string expression;
      getline(cin, expression);
      if(preprocesser(expression) && lexer(expression, processed) && parser(processed)) cout << processed.value.front();
      cout << endl << endl;
   }
   return 0;
}

In my code, there's only one known bug, expressions in which an attributative minus-sign is directly followed by parenthesis cannot be evaluated. Examples: 3--(2+1) or -(5^2).

But I'm not here to discuss about that, I'm here to talk about parsers. What have you done so far and what good advices can you share with me, also, where do I go from here? What should I try to create next?

Last edited on

tition (887)

Check out also helios's parser implementation:

http://www.cplusplus.com/forum/lounge/11845/

I ended up modelling my own LALR parser after it.

One shouldn't check for closed - open braces - this was my first implementation too. However, helios' stack-based approach is far more elegant, and very easy to maintain (I have been expanding the syntax of my calculator for the past one year without running into any major problems).

P.S.
I am like 60% done with a Mathematica-inspired symbolic calculator. When done, I will surely post its code (it should end up being no more than 1.5k lines of code). I am already done with the parser part, and am halfway done with the trickiest part of the interpreter >:)

Last edited on

Cubbi (4774)

I've been using boost.spirit whenever I have to parse anything that's more complicated than a simple regex, check it out if you haven't: http://www.boost.org/doc/libs/release/libs/spirit/doc/html/index.html (it's a header-only library, you won't need to link against anything)

Last edited on

helios (17607)

Hurm. Embedding the grammar directly into the parser isn't very nice. On the other hand, making the lexer do syntax checking and precedence assignment is even worse.
Does it handle the right associativity of ^ properly?

hamsterman (4538)

I know they are limited*, but I love recursive descent parsers (using parser combinators). They're so simple to work with. And my parsing needs aren't too fancy.

_{*to be honest, I don't really know where their limit is..}

Kyon (912)

Well, I'll be sure to check Helios' implementation, it seems really dynamic and neat (but also rather hard to understand, imo). I'll also be sure to check on Boost.Spirit and any available alternative libraries too. Hamsterman, I believe the limit on recursive descent parsers is that their are recursive. The level of depth of recursion depends on the amount of memory available. (I'm not an expert in the subject though.)

Also, Helios, you are right, right and right again, sadly. This was just some simple parser to get me started in the subject. I want to do two main things at this point (I'll be going further after those):
1. Make a parser more like your implementation (more dynamic).
2. Make a parser to create a simple implementation of some programming language.

chrisname (7395)

I don't know a thing about parsing, but when rapidcoder posted that challenge I found an easy way was to convert from infix to postfix notation. It makes parsing expressions much easier.

I don't have the code anymore but I might have posted it on that thread.

hamsterman (4538)

@Kyon, no. Well, technically, yes, but you don't really ever have that much depth, and even if you did, this wouldn't be too hard to fix (using a stack of your own).
Wikipedia says something about only allowing LL(k) grammars. What I don't know is what grammar is not LL(k). Also, I don't really know to classify my naive parser combinator - it does no explicit lookup, it just tries every rule it can. I don't know how much of a difference is there. Maybe I just need to find myself a good book on it..

@chrisname, do you mean http://en.wikipedia.org/wiki/Shunting-yard_algorithm

Topic archived. No new replies allowed.