Regular expressions

I was going to make a basic text editor with syntax highlighting (sorry, ed - you still hold a place in my heart). I have a function that creates a text file using getchar() in a loop, watching for an ESC press (0x1B, and it works on Ubuntu :)), and I'm working on a function to edit files, too. However, I know barely anything about regular expressions. I did read a couple of tutorials, but they didn't make much sense to me.

How could I approach syntax highlighting? Would regular expressions be the way forward? I don't really care about auto indent, but I'd like to be able to highlight brackets and things like that.

Oh and probably I should add "Sorry for hogging the entire beginners page"; I've been trying to help other people, too...
Well, I'm not sure if regexes can count brackets, but they sure help to find identifiers. A valid standard identifier has to match the regular expression [A-Za-z_][A-Za-z_0-9]*
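
To make that pattern concrete, here is a minimal sketch using std::regex from the C++11 standard library. The function name find_identifiers is made up for illustration, and the \b word-boundary anchors are an addition to the pattern above so that a match cannot begin in the middle of a token.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every substring of `text` that has the shape of an identifier.
// Keywords such as "int" are returned too, because they have the same
// shape; telling them apart needs a separate keyword list.
std::vector<std::string> find_identifiers(const std::string& text) {
    // \b stops a match from starting inside a token such as "9abc".
    static const std::regex ident(R"(\b[A-Za-z_][A-Za-z_0-9]*\b)");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), ident), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

Calling find_identifiers on a line of source text gives the words a highlighter would then compare against its keyword list.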

The way I'd match brackets is this:
If the cursor is not on a bracket, do nothing.
Store the current bracket and its opposite.
Start a counter at one.
If the current bracket is an opening bracket, count forward; otherwise count backwards.
When counting forward:
Until the counter is zero, a set number of characters have been read, or we're at the EOF: move to the next character. If it is the opening bracket, increment the counter; if it is the closing bracket, decrement the counter; otherwise do nothing.
When counting backwards:
Until the counter is zero, a set number of characters have been read, or we're at the BOF: move to the previous character. If it is the closing bracket, increment the counter; if it is the opening bracket, decrement the counter; otherwise do nothing.

Notice that the same algorithm can be used for (), {}, and [] just by using an array.
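
The steps above could be sketched roughly like this. The name match_bracket and the max_scan limit (standing in for "a number of characters have been read") are assumptions made for the example, and the two small strings play the role of the array of bracket pairs.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the bracket that matches the one at `pos`, or
// std::string::npos if `pos` is not on a bracket or no match turns up
// within `max_scan` characters (a hypothetical limit).
std::size_t match_bracket(const std::string& text, std::size_t pos,
                          std::size_t max_scan = 10000) {
    static const std::string openers = "({[";
    static const std::string closers = ")}]";
    if (pos >= text.size()) return std::string::npos;

    const char current = text[pos];
    char opposite;
    bool forward;
    std::size_t kind = openers.find(current);
    if (kind != std::string::npos) {
        opposite = closers[kind];  // on an opening bracket: scan forward
        forward = true;
    } else if ((kind = closers.find(current)) != std::string::npos) {
        opposite = openers[kind];  // on a closing bracket: scan backwards
        forward = false;
    } else {
        return std::string::npos;  // not standing on a bracket
    }

    std::size_t depth = 1;  // accounts for the bracket under the cursor
    std::size_t i = pos;
    for (std::size_t scanned = 0; scanned < max_scan; ++scanned) {
        if (forward) {
            if (i + 1 >= text.size()) break;  // reached EOF
            ++i;
        } else {
            if (i == 0) break;  // reached BOF
            --i;
        }
        if (text[i] == current) {
            ++depth;  // another bracket of the same kind nests deeper
        } else if (text[i] == opposite && --depth == 0) {
            return i;  // counter hit zero: this is the partner
        }
    }
    return std::string::npos;
}
```

Like the description it follows, this sketch looks only at raw characters, so brackets inside string literals or comments are counted as well.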
Ok, thanks, I'll try that.
Well, half of the functionality of syntax highlighters is to handle malformed input.

That aside, helios, your algorithm does not handle cases where string literals
contain bracket characters.
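
One possible way around that, as a rough sketch only: make a single forward pass that marks which characters sit inside a quoted literal, and have the bracket scan skip those positions. The name literal_mask is made up here, and the code assumes C-style backslash escapes while ignoring comments and raw string literals.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Marks which characters of `text` lie inside a "..." or '...' literal,
// quotes included. Assumes C-style backslash escapes; comments and raw
// string literals are not handled, so this is only an approximation.
std::vector<bool> literal_mask(const std::string& text) {
    std::vector<bool> inside(text.size(), false);
    char quote = 0;  // 0 means "outside a literal", else the opening quote
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (quote == 0) {
            if (c == '"' || c == '\'') {
                quote = c;  // a literal starts here
                inside[i] = true;
            }
        } else {
            inside[i] = true;
            if (c == '\\' && i + 1 < text.size()) {
                inside[++i] = true;  // the escaped character is part of it
            } else if (c == quote) {
                quote = 0;  // closing quote ends the literal
            }
        }
    }
    return inside;
}
```

A bracket scan can then treat any position where the mask is true as an ordinary character, whichever direction it is moving in.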

You can download NEdit. It implements syntax highlighting using regular expressions.
Although this approach is in theory the most flexible, it has some limitations.
For example, you can't really highlight function names differently from variables,
partly because of the nature of C++, and partly because knowing which names are
functions and which are variables requires context, which regular expressions don't provide.
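
To illustrate that limitation, here is a small sketch of regex-based token classification with std::regex. It is not NEdit's pattern set; the six-word keyword list, the TokenKind enum, and the classify function are all made up for the example.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { Keyword, Identifier, Number };

// Splits `line` into the words and integers it contains and labels each
// one. The keyword list is a tiny hypothetical sample, not real C++.
std::vector<std::pair<std::string, TokenKind>> classify(const std::string& line) {
    // Group 1: keyword, group 2: identifier, group 3: integer literal.
    static const std::regex token(
        R"(\b(?:(int|return|if|else|while|for)|([A-Za-z_][A-Za-z_0-9]*)|([0-9]+))\b)");
    std::vector<std::pair<std::string, TokenKind>> result;
    for (std::sregex_iterator it(line.begin(), line.end(), token), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        TokenKind kind = TokenKind::Number;
        if (m[1].matched) kind = TokenKind::Keyword;
        else if (m[2].matched) kind = TokenKind::Identifier;
        result.emplace_back(m.str(), kind);
    }
    return result;
}
```

For a line such as `int foo = bar(42);`, both foo and bar come back as plain identifiers. The pattern sees only the shape of each word, so it has no way of knowing that one names a variable and the other a function.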
Sigh... I always forget about them.
I'm calling mine cedit.

Given that the second letter of what the c stands for is 'r', can you guess what it means?