Lexical Analyzer

Hello all I am studying "Data structures and algorithms" these days at college. As per suggested by most senior fellows, I am thinking about proposing "lexical analyzer" as my term project. Although I am yet to study it in detail.

Before I propose my project can you guys please fill me up with all possible data structures which will be required to design this project.

And what is it's difficulty level at "Data structures and algorithms" level?

Also provide me with any useful resource/books if you got.
I'll do my research too, but it's always good to discuss with someone who knows the business inside out.

Looking forward to your response, many thanks.
How sophisticated do you intend to make this? Is it going to analyze a text file, or the syntax of something like code? This really all depends on how specific the parser would be and would in turn determine the data structures as well as difficulty level.

Are you wanting resources for how to design Lexical Analyzers in C++, or just general information about them?
all possible data structures


With something like a lexical analyzer, you can probably find useful applications for any of them.

You could look at the flex/flex++/flexc++ source code for inspiration.
cantide5ga I am looking forward to build a analyzer which analyzes C++ code for me but on efficient basis. I'll appreciate all sorts of resource regarding design as well as the general information thanks.

hanst99 Thanks for the reply, can you please mark it's difficulty level on a scale of 10. For a "Data structures and algorithms" student?
Lol. Good luck with that. To analyze C++ code? That would require a lot more knowledge than just data structures and algorithms.
Thanks for the reply, can you please mark it's difficulty level on a scale of 10. For a "Data structures and algorithms" student?


Nope, I can't. Building lexical analyzers for one specific task manually is usually a lot more busywork than actually being hard though.

Just to make sure - we are talking about the same kind of lexical analyzers here? The one that you feed some text and which then spits out a list of tokens?
@Vlykarye

Really? I mean, lots of symbols to parse, some with double meanings depending on use. Other then consuming a lot of time, I figured structs and algorithms is all there is to it.

How would you go about it?
I wouldn't go about it. I would have to agree with hanst here. I don't rightly know what a lexical analyzer is. But I bet he does, and from what he says, it sounds like a lot of work.
The OP is most likely speaking of parsing a document into tokens to check for syntactical correctness. It is tedious, but not necessarily difficult. Hopefully they get back.
http://en.wikipedia.org/wiki/Lexical_analysis

Essentially, a lexical analyzer would read something like this:

 
abc = 3 + i;


and spit out something like:


IDENTIFIER(abc), ASSIGNMENTOPERATOR, INTEGER(3), PLUSOPERATOR, IDENTIFIER(i)


(roughly. Of course it depends on what kind of tokens you actually want to generate).

EDIT: Just noticed, the example is redundant because the article has pretty much the same one.
Last edited on
That's interesting. Are they used exclusively for documentation? Or do compilers use lexical analysis?

... perhaps that is a dumb question. Obviously compilers use the same idea, but what is the main difference between a lexical analyzer and syntax checker?
Last edited on
You run a lexical analyzer to generate input for a parser (that eats a series of tokens and assigns meaning to it). So yep, lexical analysis is part of any compiler (or interpreter for that matter). Sometimes there is no strict distinction between the lexical analysis and the parsing, but I think in most larger systems it is made.
Last edited on
Oh ok. So do people normally use this phrase when talking about parsers and compilers?
Ah. Alright. I have been pretty bad about remembering what terms are used for what. So I'm trying to break that habit. And now I know what this is. Sounds very useful. I feel ashamed on wanting to make a small language without knowing this stuff ><
Don't be. I personally have the habit of using 'lexer' and 'parser' very loosely myself.
Topic archived. No new replies allowed.