Yeah, I know... I've made this mistake many times... The space is necessary because if you leave it out, the compiler thinks it's the >> operator... -.-
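For anyone who hasn't hit this yet, here's a minimal sketch of the kind of declaration I mean. The function name make_grid is made up for illustration, and I'm assuming a compiler in pre-C++11 mode, where the space is required.

```cpp
#include <vector>

// Hypothetical helper, only here to show the nested template syntax.
// Note the space in "> >": without it, a pre-C++11 compiler lexes ">>"
// as the right-shift operator and rejects the declaration.
std::vector<std::vector<int> > make_grid(int rows, int cols) {
    return std::vector<std::vector<int> >(rows, std::vector<int>(cols, 0));
}
```

Writing it with the space still compiles fine on newer compilers, so it's the safe form either way.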
This is known as the maximal munch principle (also called maximum munch), applied during lexical analysis. It basically says that the lexer matches each token greedily: at every position it consumes as many adjacent characters as can still form a valid token.
So, since > and >> are both valid tokens, the lexer will grab >> any time it can, rather than two separate > tokens. The whitespace forces the first token to end after a single >, so you get two > tokens instead.
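Here's a rough sketch of that greedy matching, just to make it concrete. It's a toy, not how any real compiler does it: the operator table and the names longest_operator_at and tokenize are my own inventions, and it only knows a handful of operators.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical operator table for illustration; a real C++ lexer has many more.
static const char* const kOperators[] = {">>=", ">>", ">=", ">", "<<", "<", "="};

// Returns the longest operator that starts at src[pos], or "" if none does.
std::string longest_operator_at(const std::string& src, std::size_t pos) {
    std::string best;
    for (const char* op : kOperators) {
        std::string candidate(op);
        if (candidate.size() > best.size() &&
            src.compare(pos, candidate.size(), candidate) == 0) {
            best = candidate;  // keep the longest match seen so far
        }
    }
    return best;
}

// Splits src into tokens, always preferring the longest operator (maximal munch).
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[pos]);
        if (std::isspace(c)) {  // whitespace only separates tokens
            ++pos;
            continue;
        }
        std::string op = longest_operator_at(src, pos);
        if (!op.empty()) {
            tokens.push_back(op);
            pos += op.size();
            continue;
        }
        // Otherwise take a run of identifier characters, or one stray character.
        std::size_t start = pos;
        if (std::isalnum(c) || c == '_') {
            while (pos < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[pos])) || src[pos] == '_')) {
                ++pos;
            }
        } else {
            ++pos;
        }
        tokens.push_back(src.substr(start, pos - start));
    }
    return tokens;
}
```

Feeding it "vector<vector<int>> v" gives a single >> token, while "vector<vector<int> > v" gives two separate > tokens, which is exactly the difference the space makes.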
I recently wrote a C++ lexer in Perl for my website. It matches valid tokens with regular expressions, and I had to make sure it would prefer the longer tokens over the shorter ones... It kind of makes sense if you think about it.
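This isn't my Perl code, but here's a small C++ sketch of the same pitfall using std::regex, whose default ECMAScript grammar tries alternatives left to right much like Perl does. The helper name match_at_start is made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns what `pattern` matches at the very start
// of `input`, or "" if it does not match there.
std::string match_at_start(const std::string& pattern, const std::string& input) {
    std::regex re(pattern);  // default grammar is ECMAScript
    std::smatch m;
    // match_continuous anchors the match at the beginning of the input.
    if (std::regex_search(input, m, re, std::regex_constants::match_continuous)) {
        return m.str(0);
    }
    return "";
}
```

With the pattern ">>|>" the longer alternative is tried first, so ">>" at the start of the input comes back as one token. With ">|>>" the shorter alternative wins and the second > is left behind, so the order of the alternatives is what gives you the longest-token behavior.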
I thought about it a little more... it's an intriguing process.
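The lexer's sole purpose is to split the source text into tokens. It does not know which tokens would be valid in any given context. To make it smart enough to know better, the lexer would need feedback from the parser, which effectively means merging the two. So that explains why the lexer wasn't just fixed to know better... it would make the already ridiculously complex parser even more difficult to maintain. (For what it's worth, C++11 did change the rule so that a >> closing nested template argument lists is treated as two > tokens, but that is a special case on the parser side, not a smarter lexer.)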