escape sequences and C++11 regular expressions

Hello forum,

I am extracting a line with the getline function and it contains the following

 
  \r # Aloha \n  


From the line above, I only need to detect whether it contains a substring of the form "#, then a space, then an alphanumeric character".

Will a regular expression do the job?


Thanks
Yes, sure. This should work: # \w+ if you really feed it line by line; otherwise you will need a more complex expression.
https://regex101.com/r/pC9qV7/1
Hi

I am trying the following regular expression and I am getting the exception:

terminate called after throwing an instance of 'std::regex_error'
  what():  regex_error


The regular expression is :

 
  std::regex reg("[[:cntrl:]]*#[[:space:]][[:alnum:]]+([[:space:]]|[[:alnum:]])*[[:cntrl:]]*");



Any hint to debug this issue?


Thanks
Which compiler do you use?
If you use MinGW, make sure that the GCC version is at least 4.9.0.
I am running on Linux with GCC 4.8.4.

Is there any specific requirement for regular expressions to work? I have made sure that the compiler I am using supports the C++11 standard.

I have set the compiler flags as follows:

CC = g++
CFLAGS = -O2 -g -std=c++11

INCLUDE =

LDFLAGS =
COBJS = $(patsubst %.cpp,%.o,$(wildcard *.cpp))

EXE= regularexpression

all: $(COBJS) 
	$(CC) $(CFLAGS) -o $(EXE) $(COBJS) $(LDFLAGS)

%.o : %.cpp 
	$(CC) $(CFLAGS) -o $@ -c $< $(INCLUDE) 


clean:
	rm -f $(EXE) *.o *~


Is the expression correct?

I want to match the following sequence:

zero or more escape (control) characters, then a mandatory '#', then a mandatory space, then at least one alphanumeric character, then any number of spaces or alphanumeric characters, then zero or more escape (control) characters

Thanks
Is there any specific requirement for regular expressions to work?
You need GCC 4.9.0 or newer. Regular expressions were not implemented properly until that version.
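
As for debugging the exception itself, here is a minimal sketch of catching std::regex_error instead of letting the program terminate. The helper name try_compile is hypothetical, made up for this example. On a standard library with a working <regex>, the pattern from the earlier post should compile and a deliberately broken one should be reported. If a pattern you know is valid is rejected, that points at the library rather than at the pattern.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): returns true if the
// pattern compiles; otherwise prints the library's error and returns false.
bool try_compile(const std::string& pattern)
{
    try {
        std::regex re(pattern);  // throws std::regex_error on a bad pattern
        return true;
    } catch (const std::regex_error& e) {
        // e.code() is a std::regex_constants::error_type value
        std::cerr << "regex_error: " << e.what()
                  << " (code " << static_cast<int>(e.code()) << ")\n";
        return false;
    }
}
```

Calling try_compile with each candidate expression before using it narrows the problem down to either the pattern or the toolchain.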
Thanks for the suggestion!

I updated the compiler and the regular expression is working fine. I have defined the following expression:

std::regex reg("^[[:cntrl:]]*#[[:blank:]][[:alpha:]]+([[:blank:]]+[[:alpha:]]+[[:blank:]]*)*[[:cntrl:]]*$");


to match the following patterns

# printer
# aloha nevada
# huston we have a problem
# fdf  aflfjlf llljfslfls l   


Do you think the expression I have written is overcomplicated for such a trivial task?

Thanks
So patterns
# Hello !
  # stuff

Should not be matched?
The first pattern is not matched. I checked the second one though. Please try to break the following expression:

  std::regex reg("^([[:cntrl:]])*([[:blank:]])*#([[:blank:]])+[[:alpha:]]+([[:blank:]]*[[:alpha:]]+[[:blank:]]*)*([[:cntrl:]])*([[:blank:]])*$");


Thanks
That was the question. I still do not get the exact rules by which you need to match patterns. A larger sample of what should and should not be matched would be good.

By the way, do you really need to match control characters? There should be none in normal text.
This is a simplified regex; it should work like yours:
^[[:cntrl:]]*\s*#\s+[[:alpha:]]+[\w\s]*$
I am reading from a file, and the getline function captures the escape characters that may sit at the beginning or end of the line.

I am still confused about the usage of () and []. I am not sure when to use which one.
captures the escape characters that may sit at the beginning or end of the line.
Which characters? Technically there should not be any in a text file. I just want examples of such lines.


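() is a capture group. Use it if you want to extract something from the string rather than simply check whether it follows a pattern. It also groups a sub-expression so that a quantifier such as * applies to the whole group.
[] is a character class. Inside it you list all the characters, any one of which may match at that position.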
https://regex101.com/ Check the quick reference in lower-right corner
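
To make the difference concrete, here is a small sketch using std::regex. The function name first_word_after_hash is hypothetical, chosen only for this example. The classes decide which single characters are allowed at each position, and the one capture group pulls the matched word out of the line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first word after "# ", or an empty
// string if the line does not look like a "# word" line.
std::string first_word_after_hash(const std::string& line)
{
    // [ \t] and [a-zA-Z] are classes: each matches ONE character from the set.
    // ( ... ) is a capture group: the text it matched is available as m[1].
    static const std::regex re("^[ \\t]*#[ \\t]+([a-zA-Z]+)");
    std::smatch m;
    if (std::regex_search(line, m, re)) {
        return m[1].str();
    }
    return "";
}
```

For a line such as "# Size x" the group captures "Size"; m[0] would hold the whole matched prefix.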
I have a program that behaves differently on different operating systems when reading files. I have a file in the following format:

# Size x
124.0
# Size y
124.0
# Size z
125.0


When I read the lines using the getline function, I found that the \n and \r characters are read along with the text. They are taken care of in Visual Studio, but not on Linux. I really do not know why. For example, when reading the first line using getline(), I get "# Size x\r" inside the string. I found this while debugging.

This is one of the reasons I have included the control characters in the regular expression.


Is there a better way to address this issue?
I really do not know why
Windows uses \r\n to denote the end of a line, Unix uses \n, and old Macs use \r. In your case you might just check whether the last character in the line is \r after getline and remove it if so. No other changes are needed.

This regex is enough to match your line with or without \r at the end: $\s*#\s+[a-zA-Z] . (line start, any amount of whitespace, #, at least one whitespace character, one alphabetic character, then anything.) If there are other constraints on the line, you can add them to the regex.
You started with a $ sign.

Does it match the beginning or the end of the string?
My mistake, it should be ^ here.
Hello

I checked your expression

 
^\s*#\s+[a-zA-Z] 


And it did not match the pattern I want to match at https://regex101.com/
There should be no space at the end of the regex.
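
For completeness, a short sketch of how this expression might be used from C++. The function name is_comment_line is hypothetical. Note the assumption that std::regex_search is the right call here: the pattern only describes the start of the line, and std::regex_match would require the whole line to match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the line starts with optional whitespace,
// then '#', then at least one whitespace character, then a letter.
bool is_comment_line(const std::string& line)
{
    // Backslashes are doubled because this is a C++ string literal;
    // there is no trailing space in the pattern.
    static const std::regex re("^\\s*#\\s+[a-zA-Z]");
    return std::regex_search(line, re);
}
```

With this, "# Size x\r" and "  # stuff" are accepted, while "124.0" and "#nospace" are rejected.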