How to convert data in char array to hex


I have a null-terminated char array (recvbuf) which contains 517 characters (charLength). I need this data in hex, so this is what I've tried so far.

Solution 1:

for (size_t i = 0; i < charLength; i++)
{
    cout << hex << setw(2) << setfill('0') << (int) recvbuf[i] << ' ';
}

This outputs:

16 03 01 02 00 01 00 01 fffffffc 03 03 ffffff9e 3b 08 ffffffc2... And so on.

My understanding of hex is limited, but because not every field came out 2 characters wide, I suspected something was wrong somewhere.

So I tried another solution (Solution 2), which gave me the desired "look", but I don't understand it or even know if it's correct. I just read somewhere that this is how it's done.

Solution 2:

for (size_t i = 0; i < charLength; i++)
{
    cout << hex << setw(2) << setfill('0') << (int) static_cast <unsigned char>(recvbuf[i]) << ' ';
}

This outputs:

16 03 01 02 00 01 00 01 fc 03 03 7d 4e a0 cb... And so on.

I was hoping to get some feedback and clarity on all of this.
What I'm wondering is why the two solutions give different results, and which one is correct. If there is anything else to think about as well, I'd be very happy if someone brought it up.

Have a good one

V.


seeplus (6476)

char is signed (on your system, at least; whether plain char is signed is implementation-defined). You're casting to int, which is signed, so if the sign bit of the char is set, it is extended into the extra bits of the int. Hence when displaying the int as hex you get the leading ffffff, which is the 2's complement representation of the negative value, shown as hex.

In the second version, you first cast to unsigned char (no sign bit) and then cast to int, which now doesn't extend the sign bit because it's converting from an unsigned type. Hence you get the display required.

Note that you're mixing C-style casts and C++ casts.

It's easier if recvbuf is of type unsigned char rather than char.

volang (292)

Thanks seeplus, but it didn't help me understand this much better. Maybe you can give me another example?

Here is another thing I wonder about:

This creates a field which is: 4294967292

cout << (unsigned int) recvbuf[i] << ' ';

22 3 1 2 0 1 0 1 4294967292

This creates a field which is: -4

cout << (int) recvbuf[i] << ' ';

22 3 1 2 0 1 0 1 -4

So how is it possible that, when these values are different, the output is the same when I convert them to hex?

cout << hex << (unsigned int) recvbuf[i] << ' ';

16 3 1 2 0 1 0 1 fffffffc

cout << hex << (int) recvbuf[i] << ' ';

16 3 1 2 0 1 0 1 fffffffc

Should I cast to signed or unsigned int?


TheIdeasMan (6797)

Hi,

Read up on two's complement:

https://en.wikipedia.org/wiki/Two%27s_complement

As seeplus says, it's easier if the buffer is unsigned char from the start. What happens when you do that?

Good Luck !! :+)

Edit:

I am sure it says in the C++ standard that char values are bijectively mapped to the values 0 to 255. This means there is a 1 to 1 mapping of these values. So a cast from char to unsigned char should be fine.


volang (292)

As seeplus says, it's easier if the buffer is unsigned char from the start. What happens when you do that?

Thanks for your answer, but I can't make the buffer unsigned from the start, because the function it's passed to expects a char *.

I'll check the link meanwhile :)

seeplus (6476)

The problem you're seeing is that you're using a signed char and then casting to a larger type, so you're still getting sign-bit extension. First cast to unsigned char, and then cast to what you want.

The output representation of the same set of bits depends on how those bits are interpreted, which is what the casts are saying. If the sign bit is extended and the result is displayed as unsigned, you'll get a very large positive number. If it's displayed as signed, you'll get a negative number.

You need to read up on integer bit representations and the difference between signed and unsigned.

jlb (4973)

It would also be helpful to see the contents of the input character array without any manipulation.

jonnin (11348)

So how is it possible when these values are different, that the output is the same when I convert them to hex?

This is critical; if you do not understand it, you need to stop until you do.
If you look up a table of ranges (what values variables of the different C++ types can hold),
you will see something (now that you are looking for it!): the unsigned version's biggest value is about twice that of the signed version. E.g. for a single byte you have -128 to 127, or 0 to 255. The same bits exist for both (8 bits of 1s and 0s), but when you tell the compiler that the type is signed, it knows that the bit pattern for 128 is really a negative number, as is the pattern for 253, etc. Because you told it to do that.
Here you are seeing a combined effect of using different-sized types and using signed and unsigned. If you promote a negative value to a larger type, the high-order bits all get set in the process of keeping it negative, and read as unsigned, that makes it a gigantic value in your case. They are the same bit pattern, but if you look at the ranges, the large value DOES NOT EXIST in the range of the signed type, so at any moment it can only be one of the two: the large unsigned value or the small negative value, not both. You can force-cast the bits from one type to the other to see both values, but at any given moment it is either signed or not, and has only one of the two values.

There are a number of ways to fix this. There should be a way to clear the high-order bits with a mask (& 255), for example, but oddly, I was unable to get the casting to cooperate because cout is cout.
Or go old school and avoid the insane mess cout makes of simple tasks:

#include <cstdio>
using namespace std;
int main()
{
   char c = -4;
   printf("%X " , (unsigned char)c);  
}


volang (292)

Let's see if I understand this correctly now.

The char is 8 bits. After I cast it to (unsigned int), the resulting value is > 255 (or > 127), which I can take as an indication that I should have made the cast to (int) instead? (Because 4294967292 can't be stored in 8 bits, but -4 can.)

They are the same number

You mean the binary stored in memory?


jonnin (11348)

When you cast it to int it is no longer 8 bits, it is 32 (which I can tell because the unsigned value you got, 4294967292, is just under 2^32).
Let's see if I can explain it.
First, the algorithm:
The algorithm for 2's complement, which is how negative numbers are stored, is binary NOT (flip every bit), then +1.
So, negating 1 in 8 bits:
0000 0001 //1
not:
1111 1110
add 1:
1111 1111 //255 unsigned is -1 in signed!

Now, let's add those.
1111 1111
0000 0001
1+1 is 0, carry the 1, ... all the way up, but it overflows:
The answer is 9 bits:
1 0000 0000
but the 1 is lost in the overflow, so it's ... zero! -1 + 1 is zero! (OK, it's not that exciting, but this is why it is done the way it is.)

Now, closer to home.
-4.
4 is
0000 0100
not:
1111 1011
add 1:
1111 1100 (-4)
Adding 0000 0100 (4) to that clearly overflows to 0 again.
With me so far?
But now, one more step.
-4 cast to 32 bits:
1111 1111 1111 1111 1111 1111 1111 1100 //-4 in 32 bits
0000 0000 0000 0000 0000 0000 0000 0100 //4 in 32 bits: see the overflowing add again?
It must be this, or the overflow-to-zero and negation algorithm would not work in 32 bits.
But what is the value above as unsigned? Well, it's 2^32 - 1 - 3, i.e. 2^32 - 4 = 4294967292 (the 3 is the two low bits, 11 in binary, that are 0 here).
(Note the -1: e.g. the max value in 8 bits is 2^8 - 1, so 256 - 1 -> 255; the same goes for 16-, 32- and 64-bit ints.)

When you force -4 into an unsigned bit pattern, that is why you get a large value. They actually go backwards, as you may have noticed above: -1 is 2^bits - 1 (255 for 1 byte, etc.), -2 is 255 - 1 (254), -3 is 255 - 2 (253), and so on, all the way counting back down. In general, -n is 2^bits - n.

If you are going to fool with bitsets, bits, bytes, hex or other low-level number formats / bitwise algorithms, you need to know this stuff. The need to do those things is greatly reduced in modern code, but you will see enough of it to make it worth your while to learn it.

All that to say: yes, the same binary in memory can be multiple values. It could mean one value as a signed number, another as an unsigned number, yet another as a float or double, and yet another as a group of characters. Putting a type on it in code tells the machine what it means, enough for it to do the right things and give you meaningful results. Because of how integers are actually stored, for many operations it does not matter (it's done the same way on the chip): add, subtract and multiply produce the same bit pattern either way, and it's mostly things like the print statements that really 'know' whether a value is signed or unsigned. Other operations, such as comparison, division and right shifts, do pay attention to signedness.


volang (292)

This helped a lot, thanks.

But in your previous example:


char c = -4;
printf("%X " , (unsigned char) c);

I don't understand the real purpose of the cast. Why is it necessary? What's wrong without it?

jonnin (11348)

Try it and see :)
%X takes an (unsigned) int argument, and at least in the online compiler, the char was promoted to int again and we got FFFFFFFC again. It may not be required on all compilers; I'm not sure. You can see what yours does if you want to go that route... Using C is frowned upon, but for a handful of cases it's a lot easier to deal with. This is one of them.

I actually got so frustrated with this that I rolled a lookup table for all 256 bytes to get what I wanted. From dropping the standard leading 0 in hex, to upper/lower case of the hex digits, to other nonsense, it was just a pain to get every compiler to do exactly what I wanted. That is an option too, but it's kind of a nuclear one!


dutch (2548)

I dont understand the real purpose of the cast, why is it necessary?

If you convert a signed value to a larger type, whether that larger type is signed or unsigned, the sign bit is "extended", i.e., it is copied to all the extra bits in the larger type. If you don't want that to happen, then you need to ensure that the original value is unsigned before converting it to a larger type. In this case, converting char (whose signedness depends on your system) to unsigned char stops the sign extension from happening.

volang (292)

If you convert a signed value to a larger type

Thanks for your reply, but where did it get converted to a larger type? In printf?

jonnin (11348)

Yeah, %X is for an (unsigned) int, not a char.

You can force %X to treat the argument as a byte (I clearly forgot this earlier; I don't use it much anymore):

#include <cstdio>
using namespace std;
int main()
{
   char c = -4;
   printf("%hhX " , c);  
}

And you *still* need to fool with it if you want leading 0s.
That looks like this gem:
printf("%02hhX " , c); which is almost as incomprehensible as the original cout, but at least it's shorter. I think that one will do everything you wanted without casting or any foolishness, but it's worth a comment on what the parts do if you use it; even veteran coders may not read that one easily. 02 gives the leading zeros, hh treats the argument as a byte (unsigned char for %X), and X is uppercase hex.
If you are doing C++ you can make a function to hide the ugliness, or even use sprintf to put the result back into a C++ string if it matters, or find a way to do it in C++ that works consistently.


dutch (2548)

but where did it get converted to a larger type? In printf?

It's not converted by the printf function but instead by the compiler itself when it compiles the printf function call. It is adding some hidden casts.

This is due to the printf function being "variadic", i.e., it takes a variable number of arguments and has a signature something like

int printf(const char *format, ...);

For the variadic values, integral values are promoted to int for anything smaller than int (with the requisite sign extension if it's from a signed value).
long and long long are passed as is.
Floats are promoted to doubles. That's why you can print a double with just %f even though it should be %lf as would be needed in scanf. In printf, %f is the same as %lf because printf never sees a float!

helios (17511)

It's not converted by the printf function but instead by the compiler itself when it compiles the printf function call. It is adding some hidden casts.

The compiler might be doing that at the call site as a special case for printf(), but it's just as possible undefined behavior is being triggered that just happens to be equivalent to sign extension.
Never ever lie to printf() about what you're passing.

againtry (2313)

#include <cstdint>
#include <iostream>
#include <sstream>
#include <iomanip>

int main()
{
    uint64_t word;

    const char* sample = "This sample 12! @#$9 stream of characters.\0";
    std::stringstream strm;
    strm << sample;

    uint64_t position{0};

    // STREAM SIZE
    strm.seekg(0, std::ios::end);
    uint64_t size_word;
    size_word = strm.tellg();
    std::cout << "File size: " << strm.tellg() << '\n';

    // HEADING
    std::cout
    << "----------------------\n"
    << "  POS  DEC  HEX  CHAR \n"
    << "----------------------\n";

    // READ STREAM AND DISPLAY CONTENTS
    while(position < size_word)
    {
        strm.seekg(position, std::ios::beg);

        word = strm.get();

        std::cout
        << std::setw(5) << std::right << std::dec << position
        << std::setw(5) << std::right << std::dec << word
        << std::setw(5) << std::right << std::hex << word << "  ";

        std::cout << std::setw(4) << std::right << (char)(word) << '\n';
        position++;
    }
    return 0;
}


File size: 42
----------------------
  POS  DEC  HEX  CHAR 
----------------------
    0   84   54     T
    1  104   68     h
    2  105   69     i
    3  115   73     s
    4   32   20      
    5  115   73     s
    6   97   61     a
    7  109   6d     m
    8  112   70     p
    9  108   6c     l
   10  101   65     e
   11   32   20      
   12   49   31     1
   13   50   32     2
   14   33   21     !
   15   32   20      
   16   64   40     @
   17   35   23     #
   18   36   24     $
   19   57   39     9
   20   32   20      
   21  115   73     s
   22  116   74     t
   23  114   72     r
   24  101   65     e
   25   97   61     a
   26  109   6d     m
   27   32   20      
   28  111   6f     o
   29  102   66     f
   30   32   20      
   31   99   63     c
   32  104   68     h
   33   97   61     a
   34  114   72     r
   35   97   61     a
   36   99   63     c
   37  116   74     t
   38  101   65     e
   39  114   72     r
   40  115   73     s
   41   46   2e     .

seeplus (6476)

#include <cstddef>
#include <iostream>
#include <iomanip>
#include <iterator>
#include <string_view>

int main()
{
	constexpr char sample[] {"This sample 12! @#$9 stream of characters."};

	// std::size counts the terminating '\0', so subtract 1 to report only the characters.
	std::cout
		<< "Size: " << std::size(sample) - 1 << '\n'
		<< "----------------------\n"
		<< "  POS  DEC  HEX  CHAR \n"
		<< "----------------------\n";

	// C++20 range-for with an initializer; iterating a string_view stops before the '\0'.
	for (std::size_t position {}; const unsigned char ch : std::string_view {sample})
		std::cout
			<< std::setw(5) << std::right << std::dec << position++
			<< std::setw(5) << std::right << std::dec << static_cast<unsigned>(ch)
			<< std::setw(5) << std::right << std::hex << static_cast<unsigned>(ch) << "  "
			<< std::setw(4) << std::right << ch << '\n';
}


againtry (2313)

lol
