Making a programming language

There is no difference, provided that a local object doesn't have the same name as the member. If you were wondering why I never omit this->, it's because I've seen enough of other people's code where I couldn't tell whether a call was to a global or a member function.
Just one thing: when I compile your parser, it gives me these warnings:
main.cpp: In constructor ‘Rule::Rule(const Token&, const Token&, unsigned int, ...)’:
main.cpp:44: warning: ‘Token::TokenType’ is promoted to ‘int’ when passed through ‘...’
main.cpp:44: note: (so you should pass ‘int’ not ‘Token::TokenType’ to ‘va_arg’)
main.cpp:44: note: if this code is reached, the program will abort
(I'm compiling with the current version of G++)

EDIT:
(When I change va_arg(..., Token::TokenType) to va_arg(..., long int), it compiles, but it always aborts.)
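The warning is about default argument promotions: enum values passed through `...` are promoted to int, so they have to be read back as int and then cast, never as the enum type itself (and not as long int, whose size may differ from int). A minimal sketch, where TokenType is a stand-in for the thread's Token::TokenType:

```cpp
#include <cstdarg>
#include <vector>

// Stand-in for the parser's Token::TokenType enum.
enum TokenType { T_NULL = 0, T_MUL, T_DIV, T_POW };

// Enum arguments are promoted to int when passed through "...",
// so read them back with va_arg(ap, int) and cast the result.
std::vector<TokenType> collect(unsigned count, ...) {
    std::vector<TokenType> out;
    va_list ap;
    va_start(ap, count);
    for (unsigned i = 0; i < count; ++i)
        out.push_back(static_cast<TokenType>(va_arg(ap, int)));
    va_end(ap);
    return out;
}
```

Called as collect(2, T_MUL, T_POW), this reads both enum values back correctly.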
Odd. It's not really doing anything weird. You can get rid of the variadic parameters by doing some operator overloading, if you want:
Rule(const Token &to,const Token &la){
	this->reduces_to=to;
	this->lookahead=la;
}
Rule &operator<<(const Token::TokenType &t){
	this->constraints.push_back(t);
	return *this;
}
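A self-contained sketch of the same idea (Token::TokenType replaced by a stand-in enum), showing why returning *this lets the constraints be chained:

```cpp
#include <vector>

// Stand-in for the thread's Token::TokenType.
enum TokenType { T_NULL = 0, T_MUL, T_DIV, T_POW };

// operator<< appends one lookahead constraint and returns *this,
// so several constraints can be pushed in a single expression.
struct Rule {
    std::vector<TokenType> constraints;
    Rule &operator<<(TokenType t) {
        constraints.push_back(t);
        return *this;
    }
};
```

With this, `Rule r; r << T_MUL << T_DIV;` replaces the variadic constructor call.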
I also had to #include <cstdlib>, because the function atol was undefined. After making the suggested changes, it works like a charm, except that the ^ operator has the wrong precedence.
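On the ^ precedence problem: in a precedence-climbing parser, the usual fix is to give ^ a level above * and /, and make it right-associative by recursing with the same (rather than one higher) minimum precedence. A toy evaluator over single-digit operands, not the thread's parser, just to show the technique:

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <string>

// Toy precedence-climbing evaluator: '^' binds tighter than '*' '/'
// and is right-associative. Operands are single digits for brevity.
struct Expr {
    std::string s;
    std::size_t i = 0;
    char peek() { return i < s.size() ? s[i] : '\0'; }
    double parse(int min_prec) {
        double lhs = std::isdigit(peek()) ? s[i++] - '0' : 0;
        for (;;) {
            char op = peek();
            int prec = (op == '+' || op == '-') ? 1
                     : (op == '*' || op == '/') ? 2
                     : (op == '^') ? 3 : 0;
            if (prec == 0 || prec < min_prec)
                return lhs;
            ++i;
            // Right-associative '^' recurses with the same precedence;
            // left-associative operators recurse with prec + 1.
            double rhs = parse(op == '^' ? prec : prec + 1);
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/': lhs /= rhs; break;
                case '^': lhs = std::pow(lhs, rhs); break;
            }
        }
    }
};
```

With this, "2^3*2" evaluates as (2^3)*2 and "2^3^2" as 2^(3^2).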

I'm still confused on what you define in the second argument of every rule's constructor.

Finally, if I were to use a similar concept to convert eC code to C++, I could only use this for expression handling, because the rest is done by the lexer, right? And if I were to add string support, I would have to create a new token and a few new rules for how to use it, right?
I'm still confused on what you define in the second argument of every rule's constructor.
That's the lookahead.

Finally, if I were to use a similar concept to convert eC code to C++, I could only use this for expression handling
You're not thinking about putting this code to practical use, are you?
That's the lookahead.
Then what would you define if you'd say:
Rule(..., Token::MUL,...
And in what way would it differ from Token::DIV, Token::POW or Token::null?

You're not thinking about putting this code to practical use, are you?
I'm still kind of new to the subject, and it's the only practical thing I've seen so far. In the end, the only thing the code has to do is convert some text to C++ code and send that to G++.
And in what way would it differ from Token::DIV, Token::POW or Token::null?
The first two define rules that require a / or a ^ to be the lookahead for the rule to be applicable. The last one makes the rule applicable or not regardless of what the lookahead is.
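In other words (a hypothetical sketch, since the real Rule class isn't shown here), the null token acts as a wildcard and any other lookahead as an exact match:

```cpp
// Hypothetical sketch of the applicability test described above;
// TokenType stands in for the thread's Token::TokenType.
enum TokenType { T_NULL = 0, T_MUL, T_DIV, T_POW };

struct Rule {
    TokenType lookahead;
    // A rule with lookahead T_NULL applies regardless of the next
    // token; otherwise the next token must match exactly.
    bool applicable(TokenType next) const {
        return lookahead == T_NULL || lookahead == next;
    }
};
```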

I'm still kind of new to the subject, and it's the only practical thing I've seen so far. In the end, the only thing the code has to do is convert some text to C++ code and send that to G++.
This is useful for parsing expressions because it's rather small, but for a full-fledged language, Yacc/Bison are better and easier to use.
I've done a quick Google search on Bison and Yacc, but I can't seem to find any tutorials that I actually understand.
I recommend checking out from your library a book titled "Lex and Yacc", I forget by whom. That has some good topics for beginners and more advanced users alike.

EDIT: Did I seriously just type "checking out from book titled"?

-Albatross
Or read this manual: http://www.gnu.org/software/bison/manual/bison.html
That's what I used when I didn't know the first thing about parsers.
I am old school, too, so I used to reach for lex/yacc and then flex/bison, and of course, the Dragon books, but recently, it looks like Antlr is a pretty good tool for parsers.

http://www.antlr.org/

Even though it's Java based, it can emit human-readable C code and it has a great tool called ANTLRWorks (Java app so runs everywhere) which will help you debug/test your grammar, including viewing your grammar in a tree diagram and step through your parse, etc...
That tool seems nice too, I'll look into it with a bit more detail later on.

As of now, yacc and ANTLR seem to be good options (better than hand-written parsers, anyway) to go with.
/me jumps in after reading just the first page...
helios wrote:
Off-topic:
Although I said it as a joke, I think that idea I mentioned the other day in a different thread, about a program being able to modify the compiler's parsing algorithm from its own code, is fairly interesting. I don't think I've ever seen it implemented, probably because it's a recipe for abuse, but it might be useful as a didactic tool for teaching compiler theory.
Did you know that Tcl does that? Google around the "unknown" command in Tcl.
:-)
Actually, later I realized that's (IIRC) more or less what Lisp macros are.
Oh yes, I forgot about those. (I've never used LISP, but I've spent a lot of time with Scheme.)
For this same project, my project partner and I are going to look into different programming languages (which will vary in paradigm and abstraction level). We are sure that we are going to look into the following languages:
C/C++
Common Lisp
Python
BASIC
ASM
Scala
The languages are supposed to vary in purpose, paradigm, functionality (or its application) or abstraction level. Any suggestions on what to add to the list?
BASIC
ASM
I already see two things that don't need to be there.
Could you make a suggestion on what languages to compare, then? I really want to put some effort into this.

We have looked into different compiler/parser generators and (partly on the recommendation of a university student I spoke to a week ago) we chose ANTLR.

The final project will consist of two parts:
Theoretical research, where we look at the definition of a programming language and of paradigms, and at the actual differences between them. The factors we are going to compare include syntax, functionality, execution speed and compiler output.
Practical research, where we create our own programming language / compiler. We will use, as said before, ANTLR for the compiler and C++ / Qt for the IDE.
I've got a few problems with ANTLR. How do I change the output language to C++? It works for C, but with C++, Cpp or any other name it complains about a non-existent file.
Next, what do I do with the generated .c(pp) and .h files to do a test run?
Does GCC allow for compiling C files?
Does Code::Blocks accept C files rather than C++?
I hate my teacher (well, not really my teacher at the moment, but he soon will be, since my current teacher is leaving :/) for still not having replied about approval of this project. I figured out the C-file problems, but I still don't know what to do with the .cpp and .h files.

I thought that I'd do something (for the translator) along the lines of:
Make a string
Run the lexer/parser and add every (translated) statement to the string
When no errors occurred
    Save the string to a file
    Compile the file using a compiler
Else
    Display an error
Delete the string
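The steps above could be sketched like this (file name and compiler command are placeholders; the actual translation string would come from the ANTLR-generated parser):

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Write the accumulated translation to a file, then hand it to an
// external compiler. Returns false if either step fails, which is
// where "Display an error" would go.
bool compile_translation(const std::string &translated,
                         const std::string &path = "translated.cpp") {
    {
        std::ofstream out(path.c_str());
        if (!out)
            return false;
        out << translated;
    }   // scope ends so the file is flushed and closed before compiling
    std::string cmd = "g++ " + path + " -o translated";
    return std::system(cmd.c_str()) == 0;
}
```

The string itself goes out of scope (is "deleted") when the function returns.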

My project-partner and I will have to fiddle with the actual translation from eC to C++ (which will be hacky in some cases, but we have a very clear view on most statements at the moment).

The "add every (translated) statement" part is all embedded C or C++ code (depending on our choice of translator language) that will be added via ANTLR actions to keep things simple on our side.


EDIT:
Off-topic:
Where has helios been all these days? Haven't "seen" him 'round lately.