Using regex with a uint8 array

Hi,

I have a uint8 array which contains the output from a serial port. I'd like to parse the array to look for particular messages - which I thought I'd do using a regex.

I've written a short example which appears to work when I define the array as a character array to begin with (note: I'm using Google Test to check the output).

std::regex rx("x");

const char data_char[5] = { 'H', 'e', 'l', 'l', 'o' };
// data_char has no terminating '\0', so pass the length explicitly
std::string str(data_char, sizeof data_char);

int test1 = 4;
if (std::regex_match(str.begin(), str.end(), rx))
{
    test1 = 5;
}

EXPECT_EQ(test1, 4);


This appears to pass or fail as I expect (when I either change the 4 to a 5 in EXPECT_EQ or change the rx regex to "Hello"). However, if I try a similar approach with a uint8 array I get unusual results: the regex doesn't appear to work.

std::regex rx ("x");

uint8_t data_uint8[5] = {0x48, 0x65, 0x6C, 0x6C, 0x6F};
const char* data_ptr = reinterpret_cast<char*>(*data_uint8);
std::string str2(data_ptr,5);

int test2 = 4;
if (std::regex_match(str2.begin(), str2.end(), rx))
{
    test2 = 5;
}

EXPECT_EQ(test2, 5);


I was wondering if anyone can explain what I'm doing wrong? Also, is there a more efficient way of applying the regex to the raw data without having to cast or convert to a string?
Thanks @George P. That seems to have fixed the problem - a slight issue with my casting. The following works:

uint8_t *raw_ptr = data_uint8;

char *data_ptr = reinterpret_cast<char*>(raw_ptr);

Though I'd like to understand why the below doesn't work:

char *data_ptr = reinterpret_cast<char*>(*data_uint8);
I think you're getting confused with the use of *.

raw_ptr is a pointer to uint8_t, so the reinterpret_cast converts a pointer to uint8_t into a pointer to char. OK.

*data_uint8 doesn't do what you think it does. It treats data_uint8 as a pointer and dereferences it, which yields the first element (0x48). That value is then cast to pointer to char, so data_ptr ends up holding the address 0x48 rather than the address of the array. Uh oh!!

What you're after is:

 
char *data_ptr = reinterpret_cast<char*>(data_uint8);


Note no * before data_uint8
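
To make the difference concrete, here is a small sketch contrasting the two expressions. The helper names first_element and as_string are made up for illustration; the buffer is the one from the original post.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Buffer from the original post: the bytes of "Hello".
static std::uint8_t data_uint8[5] = {0x48, 0x65, 0x6C, 0x6C, 0x6F};

// *data_uint8 decays the array to a pointer and dereferences it,
// so the expression is the first element (a uint8_t), not an address.
static_assert(std::is_same<decltype(*data_uint8), std::uint8_t&>::value,
              "*data_uint8 is an lvalue of type uint8_t");

// Hypothetical helper: returns the value that *data_uint8 evaluates to.
std::uint8_t first_element() {
    return *data_uint8;  // same as data_uint8[0], i.e. 0x48 ('H')
}

// Hypothetical helper: casts the array itself (no *), which decays to a
// pointer to the first byte, then copies all five bytes into a string.
std::string as_string() {
    const char* p = reinterpret_cast<const char*>(data_uint8);
    return std::string(p, sizeof data_uint8);
}
```

Here first_element() returns 0x48 and as_string() returns "Hello". Casting the former to char* would give a pointer to address 0x48, which is what the original line was doing.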

Consider:

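#include <cstdint>
#include <cstdio>

int main() {

	std::uint8_t data_uint8[5] = {0x48, 0x65, 0x6C, 0x6C, 0x6F};
	const char* data_ptr = reinterpret_cast<const char*>(data_uint8);

	// %p expects a void*, so convert both pointers explicitly
	std::printf("%p %p\n", static_cast<const void*>(data_uint8),
	            static_cast<const void*>(data_ptr));
}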


which on my system displays:


000000000019FA00 000000000019FA00


So both data_uint8 and data_ptr point to the same data.
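
On the second question (avoiding the std::string copy): since both pointers address the same bytes, the regex functions can be given a pair of const char* as iterators directly. The following is only a sketch, assuming the messages are plain ASCII; buffer_contains, buffer_matches and sample are hypothetical names, not something from the original post.

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>

// Sample bytes, same as data_uint8 in the original post ("Hello").
static const std::uint8_t sample[5] = {0x48, 0x65, 0x6C, 0x6C, 0x6F};

// Hypothetical helper: true if the pattern occurs anywhere in the raw
// bytes. Nothing is copied; the regex walks the buffer through pointers.
bool buffer_contains(const std::uint8_t* data, std::size_t size,
                     const std::regex& rx) {
    const char* first = reinterpret_cast<const char*>(data);
    const char* last = first + size;
    // regex_search looks for a match anywhere in [first, last)
    return std::regex_search(first, last, rx);
}

// Hypothetical helper: true only if the whole buffer matches the pattern,
// which is what regex_match in the original post checks.
bool buffer_matches(const std::uint8_t* data, std::size_t size,
                    const std::regex& rx) {
    const char* first = reinterpret_cast<const char*>(data);
    const char* last = first + size;
    return std::regex_match(first, last, rx);
}
```

For example, buffer_contains(sample, sizeof sample, std::regex("ell")) is true, while buffer_matches with the same pattern is false because "ell" does not cover the whole buffer. When looking for a message inside a longer serial stream, regex_search is probably the one you want.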



"which on my system displays"

64-bit addressing for the pointers, right? :)

A 32-bit build wouldn't show so many extra zeros.
Yep - compiled as 64 bit.
Thanks both!
I forgot to mention this earlier....
PLEASE learn to use code tags, they make reading and commenting on source code MUCH easier.

http://www.cplusplus.com/articles/jEywvCM9/
http://www.cplusplus.com/articles/z13hAqkS/

HINT: you can edit your post and add code tags.

Some formatting & indentation would not hurt either
