Hey everyone, I'm trying to make a simple lexer that splits up a given string into tokens. As you can see below, I have a Token class, and a Lexer class.
I want the Lexer class to interact with the Token class and create new objects. But when I try to return the object and use the returned object to create a new one in main, I get a weird error. If someone could help guide me to the right direction, that would be great. Thanks!
In lexer.cpp, on line 54, you create a variable named token that goes out of scope (and is destroyed) on line 55, at the next }.
Either create the variable outside the loop and assign to it inside the loop, or return the object directly from inside the loop. Think about what to do if the keyword is not found. Do you return some kind of "empty" token, or throw an exception, or something else?
How would I directly create an object outside the loop and assign it inside the loop if you don't mind me asking?
Token token; // assuming there is a default constructor
if(...) {
    for(...) {
        if(...) {
            token = Token(&currentCharacter, TokenType[i].c_str());
            break; // no need to continue searching since we have already found what we're looking for
        }
    }
}
return token;
But in this case I suspect it's better to simply return the object from inside the loop. Then there would be no need for a default constructor.
But this raises the question of what to do if currentCharacter is not equal to the newline character, or if the loop finishes without returning. You should probably do something, because reaching the end of a non-void function without returning a value is undefined behavior (UB).
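To illustrate returning from inside the loop while still covering the "not found" case, here is a minimal sketch. It assumes C++17 (for std::optional), and the Token struct, the kKeywords table and the findKeyword function are hypothetical stand-ins, since the real classes are not shown in this thread.

```cpp
#include <array>
#include <optional>
#include <string>

// Hypothetical stand-in for the Token class; it owns copies of its strings.
struct Token {
    std::string value;
    std::string type;
};

// Hypothetical keyword table (the real TokenType array is not shown here).
const std::array<std::string, 3> kKeywords = {"PRINT", "LET", "GOTO"};

// Returns the matching token directly from inside the loop.
// If nothing matches, returns std::nullopt instead of falling off the end.
std::optional<Token> findKeyword(const std::string& word) {
    for (const std::string& keyword : kKeywords) {
        if (word == keyword) {
            return Token{word, "KEYWORD"};  // no default-constructed Token needed
        }
    }
    return std::nullopt;  // every path returns a value, so no UB
}
```

The caller then checks the result before using it, for example with if (auto token = findKeyword(word)). Throwing an exception or returning a dedicated "unknown" token would work just as well; which one fits best depends on how the rest of the lexer reports errors.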
Thanks once again for the answer, I really appreciate it.
I'm building a simple BASIC to C compiler as an exercise to understand C/C++ a bit more.
What you see above is extremely early in development.
There will be many more statements checking whether the current character equals something else. If it doesn't match anything, the lexer will return an error and stop the program; otherwise, if the current character reaches the end of the source, it will return '\0' and stop the program.
1) token.h, lines 11-12: this copies only the pointer, not the contents it points to. That means the passed value and type must exist and remain valid for the lifetime of the Token object. Is that what you intend, or do you really mean to copy the contents?
2) lexer.h: as there is only one TokenType shared by all instances of the Lexer class, it can be made static const.
3) When parsing, it is common to use an integer value for the token type rather than a string literal. So each token parsed is assigned a numeric token type: e.g. one value for each keyword, one for a string literal, one for a numeric literal, one for each special symbol ( , ; [ etc.), one for end of line, one for unknown, and so on. These are often defined as an enum. You have something like (example only):
So .getToken() returns a value of type Tokens, and a program (e.g. the BASIC program) is then parsed into a sequence of Tokens. When parsing, you have getToken(), peekToken(), etc.
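Putting the three points together, a rough sketch might look like the following. It is only an illustration under assumptions, not the code from this thread: the enumerator names, the Token layout, the keywords_ table and the scan helper are all hypothetical, getToken() here returns a small Token struct that carries the Tokens value together with its text, and it needs C++17 for the inline static member.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Point 3: hypothetical token kinds; a real BASIC lexer would need many more.
enum class Tokens { Print, Let, Identifier, Number, Semicolon, Eol, End, Unknown };

// Point 1: the token owns a copy of its text instead of storing a pointer.
struct Token {
    Tokens type;
    std::string text;
};

class Lexer {
public:
    explicit Lexer(std::string source) : src_(std::move(source)) {}

    // Consumes and returns the next token.
    Token getToken() { return scan(pos_); }

    // Returns the next token without consuming it.
    Token peekToken() const {
        std::size_t lookahead = pos_;  // scan from a copy of the position
        return scan(lookahead);
    }

private:
    struct Keyword { const char* spelling; Tokens type; };

    // Point 2: one keyword table shared by all Lexer instances.
    static inline const std::array<Keyword, 2> keywords_ = {{
        {"PRINT", Tokens::Print}, {"LET", Tokens::Let}}};

    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool isAlnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

    // Reads one token starting at pos and advances pos past it.
    Token scan(std::size_t& pos) const {
        while (pos < src_.size() && (src_[pos] == ' ' || src_[pos] == '\t')) ++pos;
        if (pos >= src_.size()) return {Tokens::End, ""};

        const std::size_t start = pos;
        const char c = src_[pos++];
        if (c == '\n') return {Tokens::Eol, "\n"};
        if (c == ';') return {Tokens::Semicolon, ";"};
        if (isDigit(c)) {
            while (pos < src_.size() && isDigit(src_[pos])) ++pos;
            return {Tokens::Number, src_.substr(start, pos - start)};
        }
        if (isAlpha(c)) {
            while (pos < src_.size() && isAlnum(src_[pos])) ++pos;
            std::string word = src_.substr(start, pos - start);
            for (const Keyword& kw : keywords_) {
                if (word == kw.spelling) return {kw.type, word};
            }
            return {Tokens::Identifier, word};
        }
        return {Tokens::Unknown, std::string(1, c)};  // every path returns
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

With this shape, a parser can switch on token.type instead of comparing strings, and peekToken() lets it look one token ahead before deciding which rule to apply.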