Pattern Searches in Binary File

I have loaded a binary file into memory (using ios::binary). I need to search for 10101 (or 0x15), which I did by searching each value returned by file.get(). What I have just realised is that 10101 may be part in the first 8 bits of file.get and part in the next 8 bits of file.get, ofcourse I will miss this by my search method. Does anyone know of a better search method?

Any help is appreciated.
Last edited on
You'll have to do some fancy bit-shifting.
You'll need an unsigned short (two bytes) so that you can check for matches that cross byte boundries:

[byte 1][byte 0] (see 4 below)
1100010101001101 10101 match

Then check for each possible bitmask by bitshifting, &&ing, and comparing. Once you've checked the first byte through, move byte 1 to byte 0 and read the next byte into byte 1, rinse and repeat.

A couple notes:
1. You don't need to check any patterns that fall completely in byte 1's bits -- as you'll do that on the next loop.

2. Be careful of the termination condition. I suggest that when you hit EOF you just put a zero in byte 1, do your tests, then quit.

3. If you've found a match, the next possible match is two-bits away:
10101001101 10101 match

10101001101 10101 next possible match

4. Remember that the most-significant bit of byte 0 should be adjacent to the least-significant bit of byte 1. (Hence the 'backwards' representation of the bytes above.)

Good luck!
move byte 1 to byte 0 and read the next byte into byte 1

Wait. What if he needs to use big endian? In that case he should do
1
2
number<<=8;
number|=nextbyte;

Be careful if the bytes are signed chars, though. If they are and you find a negative value, C++ will convert it to short repeating the 8th bit on bits 8-15 (so 10000001 becomes 1111111110000001, instead of 0000000010000001, which is what you need). If that happens, just and it with 0x00FF.
Topic archived. No new replies allowed.