Big Endian vs Little Endian Char Array

Hello,

I'm reading a file where some parts of the file are Little Endian and others are Big Endian. I am having no problems reading Big Endian; however, I am having problems reading Little Endian.

I am reading my file into a character array.

It looks something like this:

(Just assume the file is open and readable too.)

#include <fstream>
#include <iostream>

int main (void)
{
    /* open the file */
    std::ifstream file("filename", std::ios::binary);

    /* buffer to store file contents */
    unsigned char buffer[4];

    /* read the file into the buffer */
    file.read(buffer, 4);

    /* output the buffer to the console */
    std::cout << buffer << std::endl;

    return 0;
}


Since the data is in Little Endian, I expect that this wouldn't show the correct value. However, how would I convert that to Big Endian? When I convert it to integer and then into Little Endian it seems to return zero.

Here is the function that I used to convert the integer. I found it on Stack Overflow:

void endian_swap(unsigned int& x)
{
    x = (x>>24) | 
        ((x<<8) & 0x00FF0000) |
        ((x>>8) & 0x0000FF00) |
        (x<<24);
}


I also tried to do something like this:

char swap[4];

swap[0] = buffer[3];
swap[1] = buffer[2];
swap[2] = buffer[1];
swap[3] = buffer[0];
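
One way to rule out the swap itself as the cause of the zero is to test it on a known value, independent of any file. The following is only a minimal sketch: `swap32` is a hypothetical name, and it assumes the value has already been loaded into a 32-bit unsigned integer. It performs the same byte reversal as `endian_swap`, but returns the result and uses a fixed-width type.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the four bytes of a 32-bit value.
// Same shifts and masks as endian_swap, with a fixed-width type so the
// masks are guaranteed to cover the whole value.
std::uint32_t swap32(std::uint32_t x)
{
    return (x >> 24) |                  // byte 3 -> byte 0
           ((x << 8) & 0x00FF0000u) |   // byte 1 -> byte 2
           ((x >> 8) & 0x0000FF00u) |   // byte 2 -> byte 1
           (x << 24);                   // byte 0 -> byte 3
}
```

If this behaves as expected on a value such as 0x11223344, the problem is more likely in how the bytes from the buffer were turned into an integer than in the swap.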

How was this file written and how do you determine which is which?
"endian swap" is a bit of a confusing way to think of it, IMO. Especially since whether or not you need to swap depends on the endianness of the system that your program is running on... making this more difficult to port.

My advice:

If the data is little endian, read it as little endian.
If it's big endian... read it as big endian.

Don't bother with swapping, just read it the right way from the get-go.

For 32-bit signed integers, you can do this (untested, but I'm pretty sure it'll work):
// note:  include <cstdint> for fixed-sized types like int32_t, uint8_t, etc.

// read a 32-bit little-endian signed integer
int32_t read32_le(std::istream& stream)
{
    uint8_t b[4];
    stream.read(b,4);
    
    return static_cast<int32_t>(
        (b[0])      |
        (b[1] << 8) |
        (b[2] << 16)|
        (b[3] << 24) );
}

// read a 32-bit big-endian signed integer
int32_t read32_be(std::istream& stream)
{
    uint8_t b[4];
    stream.read(b,4);
    
    return static_cast<int32_t>(
        (b[3])      |
        (b[2] << 8) |
        (b[1] << 16)|
        (b[0] << 24) );
}
It's a WAV file and I'm reading the header from it. I believe that the diagram that I am using to look at the pieces of the file is correct. All of the other information has been correct.

It's on this page:
https://ccrma.stanford.edu/courses/422/projects/WaveFormat/

Technically the part that I am trying to read is after the first four bytes. So it would be:

/* seek to the chunk */
file.seekg(4, std::ios::cur);

/* read the file into the buffer */
file.read(buffer, 4);


Edit:

@Disch,

That makes a lot more sense. I tested your functions within my program. It worked very well! I only had to change two lines (very slightly).

I just changed:
stream.read(b,4);

to:
stream.read((char*)b, 4);

Thank you so much!
The default byte ordering assumed for WAVE data files is little-endian. Files written using the big-endian byte ordering scheme have the identifier RIFX instead of RIFF.


It is not mixed; it is either one or the other.
@Admkrk

I believe that only applies to the data section inside the WAV file. I'm using a file exported by VLC player. (It was originally an MP3, but I don't think that matters.)

It is possible that the WAV file that I am using isn't formatted correctly. The header is formatted as RIFF, but it has a mixture of both Little Endian and Big Endian.

When I used Disch's functions, they retrieved the right size.

However, I think that it is more likely that they were only talking about the "data" portion of the wave file. I haven't analyzed this as I'm reading that almost directly into OpenAL.
It could be a corrupted wave file or just the header. It's been a long time since I dug into wave files (before I started programming), but there is no way it should be mixed. Some file types are one or the other and some can be either. BMP is little endian and JPEG is big endian, for example, while wave can be both.

If I remember right, the header describes the data part so if they don't match, something is wrong.
Ah, I did some reading and found out that character arrays (8-bit) aren't affected by endianness. Each character is stored in only 1 byte, and endianness only describes the order of the bytes within a multi-byte value.

I believe that you are correct, in that the file can only be of one endianness. I think that since the other data that I was reading was character data ("RIFF", "WAVE", "fmt ") it didn't matter.

It was only where the integer was stored that I wasn't getting the right values.
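
To illustrate the point, here is a small sketch that interprets the same four bytes from a buffer in both orders. `from_le` and `from_be` are hypothetical names and the byte values in the note below are made up for the example. The character fields need no such step, since each character is one byte.

```cpp
#include <cstdint>

// Interpret four bytes as a 32-bit value, least significant byte first.
std::uint32_t from_le(const unsigned char* b)
{
    return  static_cast<std::uint32_t>(b[0])        |
           (static_cast<std::uint32_t>(b[1]) << 8)  |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}

// Interpret the same four bytes, most significant byte first.
std::uint32_t from_be(const unsigned char* b)
{
    return  static_cast<std::uint32_t>(b[3])        |
           (static_cast<std::uint32_t>(b[2]) << 8)  |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[0]) << 24);
}
```

For the bytes 24 08 00 00 (hex), `from_le` gives 0x00000824 and `from_be` gives 0x24080000, which shows how a size field read in the wrong order comes out wildly off while the surrounding text fields still look fine.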