SOCK_RAW or SOCK_STREAM when sending embedded \0x00?

Until now, I have used a SOCK_STREAM socket for reading from and writing to a server socket.
I used this code for creating the socket:
BasexSocket& BasexSocket::CreateSocket (std::string host, std::string port) {
	// std::cout << __PRETTY_FUNCTION__ << std::endl;
	if (host.empty() || port.empty()) {
		Master_sfd = -1; return *this;
	}

	struct addrinfo hints;
	struct addrinfo *result = NULL, *rp;
	// Initialize hints
	memset(&hints, 0, sizeof(struct addrinfo));
	// hints.ai_family   = AF_UNSPEC; 
	hints.ai_family   = AF_INET;   // IPv4 only; AF_UNSPEC would allow both AF_INET and AF_INET6
	// hints.ai_socktype = RAW;
	hints.ai_socktype = SOCK_STREAM; 
	hints.ai_flags    = AI_NUMERICSERV;                      // Port must be specified as number
	int rc;
	rc = getaddrinfo( host.c_str(), port.c_str(), &hints, &result);
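	if (rc != 0) {
		// getaddrinfo reports errors through its return code, not errno, so perror() would mislead
		warnx("getaddrinfo: %s", gai_strerror(rc));
		Master_sfd = -1;
		return *this;
	}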

	for (rp = result; rp != NULL; rp = rp->ai_next) {
		Master_sfd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
		if (Master_sfd == -1) continue;
		if (connect(Master_sfd, rp->ai_addr, rp->ai_addrlen) != -1) break; // Try to connect. Use the first successful connection
		close(Master_sfd);
	}
	if (rp == NULL) {
		warnx("Cannot connect to BaseX server");
		Master_sfd = -1;                    // every descriptor tried has already been closed
	} else {
		set_nonblock_flag(Master_sfd, 1);   // only meaningful on a connected socket
	}

	freeaddrinfo(result);
	return *this;
}

This is my write function:
int BasexSocket::writeData(const std::string & input) {
/*	std::cout << __PRETTY_FUNCTION__ << std::endl;*/
	int send_len = input.size();
	int bytes_sent = send(Master_sfd, input.c_str(), send_len, 0);
	if (bytes_sent != send_len) {
		debug_dump(input);
		perror("Error writing to socket");
		warnx("Writing data failed");
		return -1;
	}
	return bytes_sent;
};

When input does not contain embedded \0x00's there are no problems.
With embedded zeros, the following output is produced:

Error writing to socket: Bad file descriptor
5 bytes:  [ 49 6e 66 6f 00 ]
libBasexTest: Writing data failed

In R, I had to use a RAW socket, so I tried to change the socktype from STREAM to RAW (lines 13-14 of CreateSocket, the hints.ai_socktype assignment). But now even reading from the socket produces an error.

Question:
Is it possible to read/write embedded zeros from stream sockets?
What are the implications of changing the socket type to SOCK_RAW?

Beej's Guide to Network Programming provides no information on using RAW sockets. Does anybody know of useful resources?
I am not familiar with these tools and not even 100% sure I understand your question. But on the off chance that you did not know, or that it helps:

The zero character is often used as the 'end of string' marker in text processing.
Even if the data IS text, or mostly text, if it has zeros in it you want to set up your communications for 'binary' data transfer. That will allow you to send not only zeros but other unprintable codes to the target. The target has to be able to accept data this way, though. If it assumes text and dumps it to the screen, it's going to choke.

It could be that raw is what you need, that it means binary for your library. I am not sure about that part. If so, then some other configuration is not quite right, and I would seek a working example of a binary transfer that you can compile as-is, get that to demonstrate that it works, then see how they did it.
The return value from send() is the number of bytes sent or SOCKET_ERROR (-1). The number of bytes sent can be less than that specified. This is not an error and should be handled correctly. An error has only occurred if the return value from send() is SOCKET_ERROR.

With every call to a socket API function (e.g. send(), on Winsock or POSIX), it's important that the return value is handled properly. No assumption should be made.

In writeData(), you first need to check whether the return value is SOCKET_ERROR and, if it is, display the error message and return -1. If it's not SOCKET_ERROR and the return value is less than the specified number of bytes to send, then you need to send the remaining byte(s) until the return value is either SOCKET_ERROR or all of the specified bytes have been sent.
Maybe something like this (NOT tried):

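int BasexSocket::writeData(const std::string& input) {
	int send_len = input.size();
	const char* cinput = input.c_str();

	// send() may accept fewer bytes than requested, so loop until everything is sent.
	// Note: on a non-blocking socket, -1 with errno EAGAIN/EWOULDBLOCK means "try again later".
	do
		// -1 is the POSIX error return (SOCKET_ERROR on Winsock)
		if (int bytes_sent = send(Master_sfd, cinput, send_len, 0); bytes_sent == -1) {
			debug_dump(input);
			perror("Error writing to socket");
			warnx("Writing data failed");
			return -1;
		} else {
			send_len -= bytes_sent;	// bytes still to send
			cinput += bytes_sent;	// advance past what was sent
		}
	while (send_len);

	return input.size();
}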

@seeplus,
Ok,
I see the logic behind your suggestion. (I'll try to implement it one of these days. This will probably also mean that I'll have to adapt my read function in a similar way.)

I saw that when using a SOCK_STREAM socket, introducing an embedded zero caused a SOCKET_ERROR (line 7 of my writeData(), perror("Error writing to socket");, resulted in "Error writing to socket: Bad file descriptor").

Changing to SOCK_RAW introduced another error (bad socket file descriptor), so I guess that I have to adapt the way in which I create a RAW socket. I have already done a lot of Googling, but until now I have not found much information on how to set up or use RAW sockets. (Maybe I am not using the correct search terms.)

@jonnin,
I know that the communication between client and server will contain a lot of embedded zeros, \FF, \0d and lots of other non-printable data. This is also influenced by the OS that is used (Windows and Linux handle CRLF differently).
In R I have already figured out what logic has to be used to deal with these problems. The main problem I have with C++ is that I have to learn a new syntax and that I have to find my way through all the new libraries.
> When input does not contain embedded \0x00's there are no problems.
SOCK_STREAMs are stream protocols. That means, the data is a contiguous stream of bytes with no delimiters. The sender and receiver must agree to what's being exchanged.

In this case, attempting to send "" sends nothing. It's the same as not calling send. The receiver must be able to handle that case.

One fix in the protocol would be to have the sender send the terminating null, and the receiver chop it from the string. However, that stops you from sending strings with \00. Also, the receive function must do a linear search on the bytes read to find the end of the string. Remember that later sends from one side can turn up in the recv, (because of that contiguous stream thing I mentioned), so you need to check what you've read if you're passing untagged data like that.

So a better fix would be to send the length as a 4-byte unsigned integer, then the data. The receiver just needs to check the length of the received block to be sure it's got everything, rather than having to peek into the message.
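
The length-prefix idea can be sketched without any socket code. This is only an illustration of the general technique, not the BaseX wire format; the names frame() and unframe() and the choice of big-endian byte order are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: prepend a 4-byte big-endian length to the payload.
std::string frame(const std::string& payload) {
    assert(payload.size() <= UINT32_MAX);            // length must fit in 4 bytes
    const auto n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.reserve(4 + payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((n >> shift) & 0xff));
    out += payload;                                   // embedded \0 bytes are copied as-is
    return out;
}

// Hypothetical helper: if `buffer` starts with a complete frame, remove that
// frame from the buffer and return its payload. Otherwise return std::nullopt,
// so the caller can recv() more bytes, append them, and try again.
std::optional<std::string> unframe(std::string& buffer) {
    if (buffer.size() < 4) return std::nullopt;       // length field incomplete
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);
    if (buffer.size() - 4 < n) return std::nullopt;   // payload incomplete
    std::string payload = buffer.substr(4, n);
    buffer.erase(0, 4 + static_cast<std::size_t>(n));
    return payload;
}
```

The receiver keeps appending whatever recv() delivers to one buffer and calls unframe() until it returns std::nullopt, which also copes with two messages arriving in a single read.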

> Is it possible to read/write embedded zeros from stream sockets?
Yes.

> What are the implications of changing the socket type to SOCK_RAW?
> In R, I had to use a RAW socket so I tried to change the socktype from STREAM into RAW (lines 13-14). But now even reading from the socket produces an error.
Raw sockets are for implementing your own protocol in user space, or reimplementing another protocol.

In the context of what you've posted, you definitely do not want to use raw sockets.
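
To illustrate that a stream socket carries zero bytes unchanged, here is a minimal sketch, assuming a POSIX system. It uses a local socketpair() instead of a real server, and the names roundTrip() and demo() are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Sends `data` through a local SOCK_STREAM socket pair and returns what arrives.
// Returns an empty string if the pair cannot be created or an I/O call fails.
std::string roundTrip(const std::string& data) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return {};

    std::string received;
    // send() is given a pointer and a byte count; it never looks for a terminator.
    const ssize_t sent = send(fds[0], data.data(), data.size(), 0);
    if (sent > 0) {
        received.resize(static_cast<std::size_t>(sent));
        std::size_t got = 0;
        while (got < received.size()) {               // recv() may return fewer bytes
            const ssize_t n = recv(fds[1], &received[got], received.size() - got, 0);
            if (n <= 0) break;
            got += static_cast<std::size_t>(n);
        }
        received.resize(got);
    }
    close(fds[0]);
    close(fds[1]);
    return received;
}

// Example check: the 5 bytes from the question, "Info" plus a zero byte.
void demo() {
    const std::string msg("Info\0", 5);
    assert(roundTrip(msg) == msg);                    // the zero byte survives the trip
}
```

The socket type is still SOCK_STREAM here; nothing about the zero byte requires SOCK_RAW.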
> One fix in the protocol would be to have the sender send the terminating null, and the receiver chop it from the string. However, that stops you from sending strings with \00.

The BaseX client/server protocol (https://docs.basex.org/wiki/Server_Protocol) has been in use for many years, and they probably won't change it because someone who is just programming for fun runs into problems ;-). And as I have learned from R, once you know how to handle those \0x00 and \0xFF, it is quite easy to implement the protocol.

> Is it possible to read/write embedded zeros from stream sockets?
> yes

How?

I already saw one rather stupid error in my current approach. I am sending std::strings to the socket. A C-string is always \0-terminated, so by definition I will not be able to send embedded zeros.
I will have to use something else.
Is it correct that using std::vector<unsigned char> instead of std::string as the basic object that is transferred will solve my problems?

Ben
(I may be using the wrong search terms, but I have been searching a lot for code examples for sending/receiving binary content to or from a stream. I haven't found any examples that explicitly teach how to do this. However, I have found a lot of examples of people asking how to send embedded zeros, where most answers only say that this is possible without saying how, or the answer demonstrates how to print binary data.
I learn a lot from reading those answers, but sometimes it is quite frustrating that they don't provide an answer to the original question ;-(.)
You can embed \0 in a std::string. This is allowed. .data() (or .c_str() ) returns a pointer to the beginning of the underlying array serving as character storage. .size() returns the number of chars. There will always be a \0 following the last char stored. .data() + .size() will point to this terminating \0. If you require a pointer and number_of_chars for a function (eg send() ), then .data()/.c_str() and .size() is what is required. \0's in the actual std::string data will be processed.
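
A small sketch of the point above, using the 5-byte payload from the question. The function names makePayload() and demo() are made up for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Builds the 5-byte payload from the question: 'I' 'n' 'f' 'o' '\0'.
std::string makePayload() {
    // The (pointer, length) constructor copies all 5 bytes.
    // std::string("Info\0") would stop at the first '\0' and hold only 4.
    return std::string("Info\0", 5);
}

void demo() {
    const std::string s = makePayload();
    assert(s.size() == 5);                      // the embedded '\0' is counted
    assert(std::strlen(s.c_str()) == 4);        // C string functions stop at it
    assert(s.data()[s.size()] == '\0');         // extra terminator after the data
    assert(std::string("Info\0").size() == 4);  // the usual pitfall when building the string
}
```

So the place where embedded zeros usually get lost is the construction of the std::string from a string literal, not the std::string itself and not send().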
\FF is prefixed to every embedded \00 or \FF in the command that is sent to the server and in the value that is returned by the server.

According to the protocol, this is an example of a complete command that can be processed by the server:
"abc\00Def\FF\00Ghi\00\00".
After receiving \00\00, the server will handle this as a 2-part command:
- "abc". The first \00 terminates the first part of the command.
- "Def\00Ghi". In the second part the \00 is prefixed by a \FF. This means that the \00 should be treated as an embedded zero.

A possible return is "Z\00Y\FF\00X\00W\00\00" or "A\00\01". BaseX is a database. You can never tell how many tuples are returned, but the protocol shows that every tuple ends with \00.
When the return ends with \00\00, the client interprets this as success and will handle the return as
Z, Y\00X, W
Ending with \00\01 means that the command could not be executed, so "A" is the error message.

My first attempt in R to use this protocol was a disaster. Handling all those embedded zeros in the separated command was very error-prone and made the client very slow.
Then I found out that in R it is very easy to prefix all the embedded zeros or \FFs in the command with a \FF. Removing the \FFs that are inserted in the return is a little bit trickier, but neither is very difficult.

It is also possible to store binary data in the BaseX database. \00 and \FF that might exist in that data are treated in the same way.

This is why I am trying to present the command as a binary array to the server. (And I know that it was not very smart of me to use the std::string for this...)
There's nothing in this to indicate that std::string isn't appropriate. writeData() and readData() don't need to know anything about the protocol, \0, \ff or anything else. They should just send the specified std::string or receive data into the specified std::string. The protocol handling should really be performed by another set of functions. IMO there might be, say, a makeDataProt() that takes a std::vector<std::string> argument and produces a std::string in the required format, e.g.:

 
std::string makeDataProt(const std::vector<std::string>&);


and for obtaining protocol data, something like:

 
std::vector<std::string> obtainDataProt(const std::string&);


Then you'd use these to format to/from std::string to use with writeData() and readData().

So for obtainDataProt() with "abc\00Def\FF\00Ghi\00\00" (obtained from readData()) the returned std::vector would be:

element [0] - abc
element [1] - Def\00Ghi

readData() would return only when a \00\00 or \00\01 sequence had been received without appending these to the returned std::string.
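
One possible shape for obtainDataProt(), as a rough sketch only. It assumes the input still ends with the status byte (as in the "abc\00Def\FF\00Ghi\00\00" example), which is skipped here; checking that byte for \00 or \01 is left to the caller. The demo() function is made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a reply into its parts and removes the \xff escape prefixes.
// Assumption: the last byte of `raw` is the status byte and is not data.
std::vector<std::string> obtainDataProt(const std::string& raw) {
    std::vector<std::string> parts;
    if (raw.empty()) return parts;

    std::string current;
    const std::size_t body = raw.size() - 1;        // exclude the status byte
    for (std::size_t i = 0; i < body; ++i) {
        const unsigned char c = static_cast<unsigned char>(raw[i]);
        if (c == 0xff && i + 1 < body) {
            current.push_back(raw[++i]);            // escaped byte: keep it literally
        } else if (c == 0x00) {
            parts.push_back(current);               // unescaped \00 ends a part
            current.clear();
        } else {
            current.push_back(raw[i]);
        }
    }
    return parts;                                   // an unterminated tail is dropped
}

void demo() {
    const std::string raw("abc\0Def\xff\0Ghi\0\0", 14);
    const auto parts = obtainDataProt(raw);
    assert(parts.size() == 2);
    assert(parts[0] == "abc");
    assert(parts[1] == std::string("Def\0Ghi", 7));
}
```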
For sending, consider something like (NOT tried):

ptrdiff_t writeData(const std::string& input) {
	auto send_len { std::ssize(input) };	// std::ssize requires C++20
	const char* cinput { input.c_str() };

	// send() may accept fewer bytes than requested, so loop until everything is sent
	do
		// -1 is the POSIX error return (SOCKET_ERROR on Winsock)
		if (const auto bytes_sent { send(Master_sfd, cinput, static_cast<size_t>(send_len), 0) }; bytes_sent == -1) {
			debug_dump(input);
			perror("Error writing to socket");
			warnx("Writing data failed");
			return -1;
		} else {
			send_len -= bytes_sent;
			cinput += bytes_sent;
		}
	while (send_len);

	return std::ssize(input);
}

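// Prefixes 0xff to every embedded 0x00. Note: the protocol also requires
// 0xff bytes to be prefixed; this version does not handle them.
std::string convTo(std::string str) {
	for (size_t pos {}; (pos = str.find('\0', pos)) != std::string::npos; pos += 2)	// skip the inserted 0xff and the 0x00 itself
		str.insert(pos, 1, static_cast<char>(0xff));

	return str;
}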

std::string makeDataProt(const std::string& str) {
	auto str1 { convTo(str) };

	str1.append(2, '\0');
	return str1;
}

std::string makeDataProt(const std::vector<std::string>& text) {
	std::string str;

	for (const auto& s : text)
		str += convTo(s) + '\0';

	return str + '\0';
}


where the std::string returned from makeDataProt() would be the param for writeData().
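
Since the protocol described earlier in the thread prefixes \FF to both \00 and \FF, a variant of convTo() that handles both bytes could look like this. It is a sketch only; the name escapeBoth() is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variant of convTo(): prefixes 0xff to every 0x00 and every 0xff byte.
std::string escapeBoth(std::string str) {
    // The (pointer, length) constructor keeps the '\0' in the search set.
    static const std::string special("\0\xff", 2);

    for (std::size_t pos = 0;
         (pos = str.find_first_of(special, pos)) != std::string::npos;
         pos += 2)                                  // skip the inserted byte and the original
        str.insert(pos, 1, '\xff');

    return str;
}

void demo() {
    // 'a' 0xff 0x00 becomes 'a' 0xff 0xff 0xff 0x00
    assert(escapeBoth(std::string("a\xff\0", 3)) == std::string("a\xff\xff\xff\0", 5));
}
```

Each insert() shifts the tail of the string, so for large binary payloads a single pass that appends to a new string may be cheaper.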

For a test:

#include <cctype>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

int main() {
	std::vector<std::string> text { "qwery", "foobar" };

	text[1].insert(3, 1, '\0');

	const auto str1 {makeDataProt(text) };

	for (const unsigned char c : str1)
		if (std::isprint(c))
			std::cout << char(c);
		else
			std::cout << '\\' << std::hex << std::setw(2) << std::setfill('0') << unsigned(c);
}


displays:


qwery\00foo\ff\00bar\00\00

I've explained the networking stuff.

In the BaseX protocol, strings are encoded. Here are encode and decode functions that convert a string to a byte stream and back. You should use them directly in your send/recv code to simplify it.
#include <stdexcept>
#include <string>

#include <assert.h>

// 0x00 is encoded as 0xff 0x00, and 0xff is encoded as 0xff 0xff
// there's always a terminating 0x00

std::string decode(const std::string& encoded, std::size_t* end = 0) {
	if (encoded.empty())
		throw std::runtime_error("an encoded string is never empty");

	std::string decoded;

	for (std::size_t i = 0; i < encoded.size() - 1; ++i) {
		char ch = encoded[i];
		char next = encoded[i + 1];

		switch (ch) {
			case '\xff':
				if (next == '\x00' || next == '\xff') {
					decoded.push_back(next);
					++i;
				} else
					throw std::runtime_error("expecting \\x00 or \\xff after \\xff");
				break;
			case '\x00':
				// end of string
				if (end)
					*end = i; // note where we stopped
				return decoded; // we're done
			default:
				decoded.push_back(ch);
		}
	}

	// test here because we may terminate early
	if (encoded.back() != '\x00')
		throw std::runtime_error("expecting \\x00 at end of string");

	if (end)
		*end = encoded.size() - 1;
	return decoded;
}

std::string encode(const std::string& str) {
	std::string encoded;

	for (auto&& ch : str) {
		if (ch == '\xff') {
			encoded.push_back('\xff');
			encoded.push_back('\xff');
		} else if (ch == '\x00') {
			encoded.push_back('\xff');
			encoded.push_back('\x00');
		} else {
			encoded.push_back(ch);
		}
	}
	encoded.push_back('\x00');

	return encoded;
}

int main() {
	assert(std::string("hello", 5) == decode({"hello\x00", 6}));
	assert(std::string("hello\x00", 6) == decode({"hello\xff\x00\x00", 8}));
	assert(std::string("hello\xff", 6) == decode({"hello\xff\xff\x00", 8}));

	assert(encode({"hello", 5}) == std::string({"hello\x00", 6}));
	assert(encode({"hello\x00", 6}) == std::string({"hello\xff\x00\x00", 8}));
	assert(encode({"hello\xff", 6}) == std::string({"hello\xff\xff\x00", 8}));
}
@kbw,

I checked my code for RbaseX. I forgot to code the exception for the case that there is a \ff that is not followed by a \ff or a \00. I'll have to add a test in my package for that case.

In R I implemented the encode function in the following way:
add_FF <- function(cache_in) {
  FF <- which(255 == cache_in)
  Z  <- which(0 == cache_in)
  addFF <- c(FF, Z)
  if (length(addFF) > 0) {
    val <- c(cache_in, rep(as.raw(255), length(addFF)))
    id  <- c(seq_along(cache_in), addFF-0.5)
    val <- val[order(id)]
    return(val)
  } else
    return(cache_in)
}


cache_in <- "He\FFll\00o"
Line 2 of add_FF (the which(255 == cache_in) call) returns an array with the indices of the occurrences of \FF: {3}. Line 3 returns the same for \00: {6}.
addFF = {3, 6}
val = {H, e, FF, l, l, 00, o, FF, FF}
id = {1, 2, 3, 4, 5, 6, 7, 2.5, 5.5}
Order val based on the value of id:
val = {H, e, FF, FF, l, l, FF, 00, o}

To my knowledge in R this is the most efficient way to insert values into an array.

I'm just curious to know if C++ has a similar function for inserting values into an array?

PS,

What's the use of the check in line 29 of your code (if (end))? At that point 'end' has the value 0, which is interpreted as false, so the next line will never be executed?
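
On the 'end' question: as far as the posted code shows, end is a pointer parameter whose default is 0 (a null pointer), so the check is false only when the caller leaves that argument out. A caller that passes the address of a variable gets the terminator position back. The sketch below uses a made-up helper decodeAt() in the same spirit, not kbw's decode() itself, to show why that is useful when several encoded strings sit back to back in one buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes one string starting at `start`. Only if the
// caller passed a non-null `end` is the index of the terminating 0x00 stored.
std::string decodeAt(const std::string& buf, std::size_t start, std::size_t* end = nullptr) {
    std::string out;
    std::size_t i = start;
    for (; i < buf.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(buf[i]);
        if (c == 0x00) break;                        // unescaped terminator
        if (c == 0xff && i + 1 < buf.size()) ++i;    // skip the escape prefix
        out.push_back(buf[i]);
    }
    if (end) *end = i;                               // skipped when end == nullptr
    return out;
}

// Walks a buffer that holds several encoded strings back to back.
std::vector<std::string> decodeAll(const std::string& buf) {
    std::vector<std::string> result;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::size_t stop = 0;
        result.push_back(decodeAt(buf, pos, &stop)); // the pointer is non-null here
        pos = stop + 1;                              // continue after the terminator
    }
    return result;
}

void demo() {
    const auto v = decodeAll(std::string("ab\0c\xff\0d\0", 8));
    assert(v.size() == 2);
    assert(v[1] == std::string("c\0d", 3));
}
```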
In c++ to insert char(s) into a std::string there is .insert():
https://en.cppreference.com/w/cpp/string/basic_string/insert

There is also replace:
https://en.cppreference.com/w/cpp/string/basic_string/replace

See details of all the member functions available:
https://en.cppreference.com/w/cpp/string/basic_string
To my knowledge, all these functions are related to one specific index that is known. My function scans a string for the occurrences of a search item and inserts a char before all occurrences.

Ben
(The sites of cppreference and cplusplus are visited daily. They provide a lot information for learning C++)
cppreference isn't a great resource for learning C++ from scratch, it is after all a reference.

Arguably a better online resource for learning is Learn C++, and it is free as well.

https://www.learncpp.com/

Networking isn't a beginner's C++ topic, though, so it isn't addressed at Learn C++.
> I'm just curious to know if C++ has a similar function for inserting values
>> scan a string for the occurrence of a search item and insert a char before all occurrences.

This is one way:

#include <string>
#include <regex>

// prefix '\xff' to all null or '\xff' characters in str
std::string encode( const std::string& str )
{
    static const std::regex null_or_ff( "[\\x00\\xff]" ) ; // no comma: inside [...] a comma would be matched literally
    // https://en.cppreference.com/w/cpp/regex/regex_replace
    // $& - the entire matched string (ie. the matched null or '\xff' character)
    return std::regex_replace( str, null_or_ff, "\xff$&" ) ; 
}

// remove prefix '\xff' characters
std::string decode( const std::string& str )
{
    static const std::regex ff_prefixed_char( "\\xff(.)" ) ;
    // $1 - the substring matched by capture 1 (ie. the character following \xff)
    return std::regex_replace( str, ff_prefixed_char, "$1" ) ; 
}

http://coliru.stacked-crooked.com/a/c3ea396b92af92e3
> all these functions are related to a (1) specific index that is known


Yes. These are basically 'primitive' functions upon which others can be built. If you don't know the index then .find()/.find_first_of() etc can be used.

The Algorithm functions can also be used with std::strings - although they can't change the size of the std::string.
https://en.cppreference.com/w/cpp/algorithm
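
As a small illustration of combining an algorithm with a std::string: std::count_if can first count the bytes that need a prefix, much like the which() calls in the R version, and the result is then built in one pass. This is a sketch; the name encodeReserved() is made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Prefixes 0xff to every 0x00 or 0xff byte, building the result in one pass.
std::string encodeReserved(const std::string& in) {
    const auto isSpecial = [](char ch) {
        const unsigned char c = static_cast<unsigned char>(ch);
        return c == 0x00 || c == 0xff;
    };
    // std::count_if only reads the string; it tells us how many prefixes are needed.
    const auto extra = static_cast<std::size_t>(
        std::count_if(in.begin(), in.end(), isSpecial));

    std::string out;
    out.reserve(in.size() + extra);       // one allocation instead of repeated inserts
    for (char ch : in) {
        if (isSpecial(ch)) out.push_back('\xff');
        out.push_back(ch);
    }
    assert(out.size() == in.size() + extra);
    return out;
}
```

With the R example input H e FF l l 00 o this yields H e FF FF l l FF 00 o, the same ordering the order(id) trick produces.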

> scans a string for the occurrence of a search item and inserts a char before all occurrences.


For a non-regex version, see my convTo() function above. It will insert \xff before each occurrence of \x00, although it's easy to change the function so that these chars are specified as parameters:

std::string insBefore(std::string str, char toFind = 0, char toIns = static_cast<char>(0xff)) {
	for (size_t pos {}; (pos = str.find(toFind, pos)) != std::string::npos; pos += 2)
		str.insert(pos, 1, toIns);

	return str;
}

