Parsing logical expressions

Sep 8, 2016 at 5:54pm
Hello. I'm trying to write an experimental database and stuck with parsing logical expressions. For example:

 
col_1 = 10 and (col_2 = 'test' or col_2 = 'test2')


My initial idea was to write a stack and keep track of the current location in the stack. The part where I'm having an issue is the transaction between precedences created by parentheses. This is what I have at the moment:

1
2
3
4
5
6
7
8
9
10
11
12
struct Statement
{
	struct Condition
	{
		char column_name[32];
		UniValue compare; // union of various types
		
		struct Condition *next;
		int next_operation; // 0 = and , 1 = or
	} condition_stack[5];
	unsigned int stack_level;
};


Am I on right track here or is there a better way of parsing such logical expressions?
Last edited on Sep 8, 2016 at 5:57pm
Sep 8, 2016 at 6:49pm
What you have in the first snippet is a combination of conditional expressions (=) and relational expressions (and,or). When parsing the expression such as the one you supplied, I find it helps to think of it in Backus-Naur Form (BNF).
https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form

1
2
3
4
5
6
7
8
9
10
11
12
relational_expr ::= relational_term relational_op relational_term
relational_op ::= AND
                ::= OR
relational-term ::= conditional_expr
                ::= ( relational_expr )
conditional_expr ::= conditional_term conditional_op conditional_term
conditional_op  ::= =
//  What about other conditional operators such as !=, <, >, <=, >= 
conditional_term ::= column_name 
                 ::= integer
                 ::= string
                 ::= ( conditional_expr )
Last edited on Sep 8, 2016 at 9:00pm
Sep 8, 2016 at 7:58pm
Thanks for the answer it's somewhat more clear now. Before I get onto implementation, would that translate into something like this in C?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
union CondTerm
{
	char column_name[32];
	int integer;
	char *string;
	struct CondExpr *cond_expr;
};

struct CondExpr
{
	union CondTerm terms[2];
	int op; // =, !=, <, etc.
};

union RelTerm
{
	struct CondExpr cond_expr;
	struct RelExpr *rel_expr;
};

struct RelExpr
{
	union RelTerm terms[2];
	int op; // AND, OR
};
Last edited on Sep 8, 2016 at 7:58pm
Sep 8, 2016 at 9:13pm
You're on the right track.

One suggestion I would make is to use tagged unions.
https://en.wikipedia.org/wiki/Tagged_union

For example, you have 4 possible types inside CondTerm. You need to keep track of which type a CondTerm represents. Same for RelTerm.

1
2
3
4
5
6
7
8
9
10
11
enum condterm_enm { CT_COL, CT_INT, CT_STRING, CT_CONDEXPR };

struct CondTerm
{  condterm_enm    ct_type;
    union
    { 	char column_name[32];
	int integer;
	char *string;
	struct CondExpr *cond_expr;
    };
};
Sep 9, 2016 at 6:26am
Thanks for the answer. I'll get onto implementation then.
Sep 9, 2016 at 7:34am
Hi. Just an opinion. Trying to build a parser is not an easy taks. Why don't you use an existing one? (boost p.e.)
Topic archived. No new replies allowed.