Writing a lexical analyzer/parser

Oct 6, 2012 at 2:00pm

Hi, I'm trying to learn more about how lexical analyzers and parsers work. I haven't coded any classes yet, because I'm not really sure how the entire process, from a lexer to working code, fits together.

My goal is to write a simple made-up programming language and translate it into another language, such as JavaScript. The first thing I have to do is pass the source code to a lexical analyzer.

The lexer will split the source into tokens and assign a label to each one. So suppose I have the following code:

def myVar = 10;

The lexer will split that into tokens like this:

def -> Keyword
myVar -> Identifier
= -> Operator
10 -> Number
; -> Punctuation
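For illustration, a minimal hand-written lexer for this example might look like the sketch below. It is only a sketch under assumptions: the `TokenType` and `Token` names are made up, `def` is the only keyword, numbers are non-negative integers, and the trailing `;` gets its own Punctuation token.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Token categories for the hypothetical toy language from the post.
enum class TokenType { Keyword, Identifier, Operator, Number, Punctuation };

struct Token {
    TokenType type;
    std::string text;  // the exact characters (the "lexeme") that formed the token
};

// Splits source text into labelled tokens. Only "def" is treated as a keyword here.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but is not a token itself
        } else if (std::isalpha(c) || c == '_') {
            // keyword or identifier: letters, digits and '_' after the first char
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            std::string word = src.substr(start, i - start);
            tokens.push_back({word == "def" ? TokenType::Keyword : TokenType::Identifier, word});
        } else if (std::isdigit(c)) {
            // integer literal
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            tokens.push_back({TokenType::Number, src.substr(start, i - start)});
        } else if (c == '=' || c == '+' || c == '-' || c == '*' || c == '/') {
            tokens.push_back({TokenType::Operator, std::string(1, src[i])});
            ++i;
        } else if (c == ';') {
            tokens.push_back({TokenType::Punctuation, ";"});
            ++i;
        } else {
            throw std::runtime_error("unexpected character: " + std::string(1, src[i]));
        }
    }
    return tokens;
}
```

Calling `lex("def myVar = 10;")` should yield five tokens: Keyword `def`, Identifier `myVar`, Operator `=`, Number `10` and Punctuation `;`. The lexer only labels pieces of text; deciding whether they form a valid statement is left to the parser.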

That's basically as far as my understanding of what a lexer does goes. How can I translate the tokens into a language like JavaScript? From what I understand, I need to write a Parser class, but I couldn't find any information on what exactly that class does.

So what exactly is the next step I have to take?

Last edited on Oct 6, 2012 at 2:02pm

Oct 6, 2012 at 2:55pm

aquaz (170)

Parsing is a big subject; I suggest you read about grammars, and about LL and LR parsing.

A grammar is what describes your language. More info:
http://en.wikipedia.org/wiki/Formal_grammar

A parser is a program that transforms tokens into an AST (abstract syntax tree) according to a specific grammar.
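To make "AST" concrete: after parsing, `def y = myVar + 1;` could be stored as a small tree, and translating it to JavaScript then becomes a recursive walk over that tree. The node layout, the helper names (`makeLeaf`, `makeBinary`, `emitJs`) and the choice to emit JavaScript `let` for `def` are all hypothetical, picked only for this sketch.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST for the toy language: an expression is a number,
// a variable reference, or a binary operation on two sub-expressions.
struct Expr {
    enum class Kind { Number, Variable, Binary } kind;
    std::string text;                   // literal, variable name, or operator
    std::unique_ptr<Expr> left, right;  // only used when kind == Binary
};

// One "def name = expr;" statement.
struct DefStmt {
    std::string name;
    std::unique_ptr<Expr> value;
};

std::unique_ptr<Expr> makeLeaf(Expr::Kind kind, std::string text) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->text = std::move(text);
    return e;
}

std::unique_ptr<Expr> makeBinary(std::string op, std::unique_ptr<Expr> l,
                                 std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Binary;
    e->text = std::move(op);
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Walks an expression tree and prints it as JavaScript. Binary operations are
// fully parenthesized, so the grouping encoded in the tree is preserved.
std::string emitJs(const Expr& e) {
    switch (e.kind) {
        case Expr::Kind::Number:
        case Expr::Kind::Variable:
            return e.text;
        case Expr::Kind::Binary:
            return "(" + emitJs(*e.left) + " " + e.text + " " + emitJs(*e.right) + ")";
    }
    return "";  // unreachable; avoids a missing-return warning
}

// Assumed mapping: "def name = expr;" becomes "let name = expr;".
std::string emitJs(const DefStmt& s) {
    return "let " + s.name + " = " + emitJs(*s.value) + ";";
}
```

For example, building `DefStmt{"y", makeBinary("+", makeLeaf(Expr::Kind::Variable, "myVar"), makeLeaf(Expr::Kind::Number, "1"))}` and passing it to `emitJs` gives `let y = (myVar + 1);`. Parenthesizing everything is a shortcut that lets the emitter ignore operator precedence.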

LL parsers can be written by hand, but tools exist to generate them for complex grammars:
http://en.wikipedia.org/wiki/LL_parser
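A hand-written LL parser is usually a recursive-descent parser: one function per grammar rule, each deciding what to do by looking at the next token. The sketch below assumes a made-up grammar (listed in the comments) and takes tokens that have already been split into strings. To stay short it returns the tree as an S-expression string rather than building AST node objects.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grammar, parseable with one token of lookahead (LL(1)):
//   stmt -> "def" IDENT "=" expr ";"
//   expr -> term { ("+" | "-") term }
//   term -> NUMBER | IDENT | "(" expr ")"
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    // stmt -> "def" IDENT "=" expr ";"
    std::string parseStmt() {
        expect("def");
        std::string name = next();
        if (!isIdent(name)) throw std::runtime_error("expected identifier, got '" + name + "'");
        expect("=");
        std::string value = parseExpr();
        expect(";");
        return "(def " + name + " " + value + ")";
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    const std::string& peek() const {
        static const std::string eof = "<eof>";
        return pos_ < toks_.size() ? toks_[pos_] : eof;
    }
    std::string next() {
        std::string t = peek();
        if (pos_ < toks_.size()) ++pos_;
        return t;
    }
    void expect(const std::string& t) {
        if (peek() != t) throw std::runtime_error("expected '" + t + "', got '" + peek() + "'");
        ++pos_;
    }
    static bool isIdent(const std::string& t) {
        return !t.empty() && t != "def" &&
               (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
    }
    static bool isNumber(const std::string& t) {
        if (t.empty()) return false;
        for (char c : t)
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        return true;
    }

    // expr -> term { ("+" | "-") term }; the loop makes the operators left-associative
    std::string parseExpr() {
        std::string left = parseTerm();
        while (peek() == "+" || peek() == "-") {
            std::string op = next();
            std::string right = parseTerm();
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    // term -> NUMBER | IDENT | "(" expr ")"
    std::string parseTerm() {
        if (peek() == "(") {
            ++pos_;
            std::string inner = parseExpr();
            expect(")");
            return inner;
        }
        std::string t = next();
        if (isNumber(t) || isIdent(t)) return t;
        throw std::runtime_error("unexpected token '" + t + "'");
    }
};
```

With this sketch, `Parser(std::vector<std::string>{"def", "x", "=", "1", "+", "y", ";"}).parseStmt()` should return `(def x (+ 1 y))`, while input such as `def = 1;` throws `std::runtime_error`. Writing the grammar so that each choice depends only on the next token is what makes a hand-written LL parser this direct.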

LR parsers are generally too difficult to write by hand, so they are generated by tools that transform a grammar into C/C++/Java... code:
http://en.wikipedia.org/wiki/LR_parser

Oct 6, 2012 at 3:13pm

AbstractionAnon (6954)

A parser is responsible for applying the semantics (meaning) of your language.

A simple command line parser generally parses one token at a time.
A more complex language needs to build parse trees and generally uses a state machine to reduce sequences of tokens into intermediate productions, and then to reduce the intermediate productions into a single (final) production, i.e. the complete program.

You will frequently find languages defined in Backus-Naur Form (BNF).
http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form

Expressing your language in BNF is a useful exercise in understanding how to parse it.

If your language is non-trivial, you might take a look at Bison, an open-source parser generator. Given the BNF for a language, it will generate a parser for that language. You still have to write the code that "applies" each production, using the tokens on the right-hand side to produce the left-hand side of each BNF rule.
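To show what reducing productions and "applying" them means, here is a hand-written shift-reduce sketch in C++; it is not Bison output. Each stack entry carries a semantic value, and every reduction computes the left-hand side's value from the right-hand side's values, which is the job a Bison action such as `{ $$ = ...; }` does. Here the values are JavaScript strings, so reducing the whole input yields the translated program. The grammar, the `Symbol` struct, the `translate` function and the mapping of `def` to `let` are assumptions for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical grammar, written Bison-style with its semantic actions:
//   program : stmt                 { $$ = $1; }
//           | program stmt         { $$ = $1 + "\n" + $2; }
//   stmt    : DEF ID '=' NUM ';'   { $$ = "let " + $2 + " = " + $4 + ";"; }
struct Symbol {
    std::string kind;   // terminal ("DEF", "ID", "=", "NUM", ";") or nonterminal ("stmt", "program")
    std::string value;  // token text for terminals, generated JavaScript for nonterminals
};

// True if the top of the stack ends with the given sequence of symbol kinds.
static bool topMatches(const std::vector<Symbol>& st, const std::vector<std::string>& rhs) {
    if (st.size() < rhs.size()) return false;
    std::size_t base = st.size() - rhs.size();
    for (std::size_t i = 0; i < rhs.size(); ++i)
        if (st[base + i].kind != rhs[i]) return false;
    return true;
}

// Shift one token, then apply every reduction that fits the top of the stack.
// A real LR parser chooses between shift and reduce using generated tables;
// this sketch hard-codes those decisions for this one grammar.
std::string translate(const std::vector<Symbol>& tokens) {
    std::vector<Symbol> st;
    for (const Symbol& tok : tokens) {
        st.push_back(tok);  // shift
        bool reduced = true;
        while (reduced) {
            reduced = false;
            if (topMatches(st, {"DEF", "ID", "=", "NUM", ";"})) {
                // stmt : DEF ID '=' NUM ';'   ($2 is the name, $4 the number)
                std::size_t b = st.size() - 5;
                std::string js = "let " + st[b + 1].value + " = " + st[b + 3].value + ";";
                st.resize(b);
                st.push_back({"stmt", js});
                reduced = true;
            } else if (topMatches(st, {"program", "stmt"})) {
                // program : program stmt
                std::string js = st[st.size() - 2].value + "\n" + st.back().value;
                st.resize(st.size() - 2);
                st.push_back({"program", js});
                reduced = true;
            } else if (st.size() == 1 && st.back().kind == "stmt") {
                // program : stmt   (only valid at the bottom of the stack)
                st.back().kind = "program";
                reduced = true;
            }
        }
    }
    // Accept only if the whole input was reduced to the start symbol.
    if (st.size() != 1 || st.back().kind != "program")
        throw std::runtime_error("syntax error");
    return st.back().value;
}
```

Given the tokens for `def myVar = 10; def y = 2;`, `translate` returns the two lines `let myVar = 10;` and `let y = 2;`; an incomplete statement leaves unreduced symbols on the stack and is reported as a syntax error.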

Last edited on Oct 6, 2012 at 3:22pm

Oct 7, 2012 at 9:08am

vivendi (2)

Thanks. I see I still have to go through a lot of resources before I can fully understand how the entire process works, even though my parser won't be that complex in the end. But it's still a good idea to know what possibilities are out there.

I also started reading the Dragon Book. It's an old book, but it's still recommended by a lot of people.

Oct 7, 2012 at 9:27am

closed account (z05DSL3A)

Take a look at:

Parsing Techniques: A Practical Guide
by Grune and Jacobs

I would suggest finding it in a library; it is quite expensive.
