C++ code to delete specific hex bytes from a file and insert other bytes without overwriting data

Sep 2, 2011 at 2:52pm
I've been searching the internet for days for something that would help me achieve what I need, but so far I've been unable to find it. I'm trying to "sanitize" the content of a binary file, and I'm having a very hard time doing it: I need to locate several sequences of hex bytes and replace them with other bytes (mostly carriage return and line feed). I have two main problems:
1. When the replacement sequence has fewer bytes than the original, the leftover bytes remain as garbage, because I can't find a way to delete them;
2. When the replacement sequence has more bytes than the original, it overwrites the bytes that follow, because I can't find a way to insert bytes without overwriting existing ones.

This code may make you laugh, and it might even be the wrong approach for what I want, but here it is:

#include <algorithm>   // std::search
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

using namespace std;

int main() 
{
    typedef istream_iterator<unsigned char> input_iter_t;
    typedef ostream_iterator<unsigned char> output_iter_t;


    // First replacement stream
    const off_t SIZE_001 = 13;
    char search_001[SIZE_001] = { 0x00, 0x2E, 0x2E, 0x2E, 0x00, 0x2E, 0x2E, 0x2E, 0x00, 0x2E, 0x2E, 0x2E, 0x00 };
    char replace_001[SIZE_001] = { 0x0D, 0x0A, 0x0D, 0x0A };

    // Second replacement stream
    const off_t SIZE_002 = 2;
    char search_002[SIZE_002] = { 0x00 };
    char replace_002[SIZE_002] = { 0x0D, 0x0A };    


    string filein;

//    cout << "Enter the file name:\n";   //Tells user to input a file name
//    cin >> filein;                      //User inputs incoming file name

    fstream f("test.exe", ios::binary | ios::in | ios::out);
//    fstream f(filein.c_str(), ios::binary | ios::in | ios::out);


    while ( !f.eof() )  // read the file until we reach end of file
    {
          // First replacement
          if (search(input_iter_t(f), input_iter_t(), search_001, search_001 + SIZE_001) != input_iter_t())
          {
             f.seekp(-SIZE_001, ios::cur);
             f.write(replace_001, SIZE_001);
             f.seekp(0, ios::beg); // set put pointer to beginning of file
          }
    }

    f.clear();
    f.seekg(0, ios::beg); // return to beginning of file

    while ( !f.eof() )  //read the file again until we reach end of file
    {
          // Second replacement
          if (search(input_iter_t(f), input_iter_t(), search_002, search_002 + SIZE_002) != input_iter_t())
          {
             f.seekp(-SIZE_002, ios::cur);
             f.write(replace_002, SIZE_002);
             f.seekp(0, ios::beg); // set put pointer to beginning of file
          }
    }

}


For example, in the first replacement I try to replace 13 bytes with 4, and I want the extra 9 bytes deleted. Because replace_001 is padded with zeros up to 13 bytes, it turns:
00 2E 2E 2E 00 2E 2E 2E 00 2E 2E 2E 00
into:
0D 0A 0D 0A 00 00 00 00 00 00 00 00 00
instead of just:
0D 0A 0D 0A

I know I can't expect this code to do what I want, but I need some help because I can't find the right approach. The second problem is replacing n bytes with more than n bytes, which overwrites data I want to keep. For example, when replacing:
00
with:
0D 0A
I overwrite the byte that follows, which was not intended, so in this case I really need to insert without overwriting. Again, this method cannot work for what I ultimately want.

I'm desperately in need of help because I cannot figure out a way to insert or delete bytes.

Thank you for your time.
Last edited on Sep 2, 2011 at 2:52pm
Sep 2, 2011 at 3:15pm
I can't find a way to insert bytes without overwriting existing ones
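And you won't find a way. It's impossible with standard file streams: writing into the middle of a file always overwrites the existing bytes; it never shifts them.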

Situation: You have a file named "file.bin", and you want to apply a function to it that produces output of a different size.
Solution: Instead of writing the results of the function back to the input file, create a new file named, for example, "file.bin.temp" and write the results there. Once the processing is done, delete "file.bin" and rename "file.bin.temp" to "file.bin".
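A minimal sketch of that suggestion, assuming a C++11 compiler. `rewrite_file` is a hypothetical helper name, and the transform you pass in stands for whatever byte processing you need; the fixed parts are the ".temp" file and the remove/rename at the end, as described above.

```cpp
#include <cstdio>     // std::remove, std::rename
#include <fstream>
#include <iterator>   // std::istreambuf_iterator
#include <string>
#include <vector>

// Hypothetical helper: read all of `path`, pass the bytes to `transform`,
// write whatever it returns to "<path>.temp", then replace the original
// with the temp file. The result may be shorter or longer than the input.
// Returns false as soon as a step fails.
template <typename Transform>
bool rewrite_file(const std::string& path, Transform transform)
{
    std::vector<char> input;
    {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        input.assign(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
    }

    const std::vector<char> output = transform(input);

    const std::string temp_path = path + ".temp";
    {
        std::ofstream out(temp_path, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(output.data(), static_cast<std::streamsize>(output.size()));
        out.close();               // flush before renaming
        if (!out) return false;    // write or close failed
    }

    // Delete the original, then give the temp file its name
    // (std::rename may refuse to overwrite an existing file on some systems).
    if (std::remove(path.c_str()) != 0) return false;
    return std::rename(temp_path.c_str(), path.c_str()) == 0;
}
```

If anything fails before the remove step, the original file is left untouched.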
Sep 2, 2011 at 3:50pm
Thanks helios.

I figured it was the wrong approach, since the only references I found on the net suggested what you just said, although I'm having a hard time finding references on working with hexadecimal bytes. Anyway, I guess the best way would be to do the replacement while writing the data to the output file, instead of holding it all in a memory buffer.

I'm not even a beginner when it comes to programming and won't be able to write this on my own, so I'd like to ask for some help. How could I apply what you suggested?

Sorry for asking, but I would appreciate it if you or anyone could share a small example that does this:
- Read the input file (binary);
- Write to the output file (binary) while detecting the intended byte sequence(s), so that it adds the extra byte(s) at the position where a sequence is detected and/or filters out (skips) the unwanted byte(s).
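Filtering while writing, without holding the whole file in memory, can already insert and skip bytes with a byte-at-a-time copy. This is only a sketch: `filter_copy` is a hypothetical name, and its two rules (0x00 becomes 0D 0A, 0x2E is dropped) are placeholders chosen to show one insertion and one deletion. Rules that look at a single byte work this way; a multi-byte sequence such as the 13-byte one needs lookahead or a buffer.

```cpp
#include <fstream>
#include <string>

// Hypothetical example of filtering while writing: copy `in_path` to `out_path`
// one byte at a time, turning every 0x00 into 0x0D 0x0A (inserting a byte)
// and dropping every 0x2E (skipping a byte). Both rules are placeholders.
bool filter_copy(const std::string& in_path, const std::string& out_path)
{
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;

    char c;
    while (in.get(c)) {                        // stops cleanly at end of file
        const unsigned char byte = static_cast<unsigned char>(c);
        if (byte == 0x00) {
            out.put('\x0D');                   // longer replacement: write 2 bytes
            out.put('\x0A');
        } else if (byte == 0x2E) {
            // skip: nothing is written for this byte
        } else {
            out.put(c);                        // copy unchanged
        }
    }
    return static_cast<bool>(out);
}
```

You would call it with the input path and a temporary output path, then swap the files with the remove/rename step.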


Thank you.
Last edited on Sep 2, 2011 at 3:54pm
Sep 2, 2011 at 4:03pm
Read the input file (binary)
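#include <cstddef>
#include <fstream>
#include <vector>

std::ifstream file("file.bin",std::ios::binary|std::ios::ate);     // open positioned at the end
std::vector<char> buffer(static_cast<std::size_t>(file.tellg()));  // position at the end = file size
file.seekg(0);                                                     // back to the start
file.read(buffer.data(),buffer.size());                            // read the whole file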
If the file doesn't fit in memory, you'll have to read it in chunks. This snippet should give you enough information to be able to do that.
You may have problems if the file is too large (>=2 GiB), depending on your platform.
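Reading in chunks could look roughly like this sketch; `read_in_chunks`, the callback, and the 64 KiB default chunk size are assumptions, not anything from the thread. `gcount()` tells you how many bytes the last `read` actually delivered, which matters for the final, shorter chunk.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read `path` in fixed-size chunks instead of all at once, handing each chunk
// to `process(const char* data, std::size_t size)`. The 64 KiB chunk size is an
// arbitrary assumption. Returns false if the file could not be read to the end.
template <typename Process>
bool read_in_chunks(const std::string& path, Process process,
                    std::size_t chunk_size = 64 * 1024)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;

    std::vector<char> chunk(chunk_size);
    while (file) {
        file.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = file.gcount();   // may be short on the last chunk
        if (got > 0) process(chunk.data(), static_cast<std::size_t>(got));
    }
    return file.eof() && !file.bad();
}
```

If you search for a multi-byte sequence chunk by chunk, a match can straddle two chunks, so you would need to keep the last (sequence length - 1) bytes of each chunk and prepend them to the next one.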

Write to the output file (binary)
#include <fstream>

std::ofstream file("file.bin.temp",std::ios::binary);
//some_buffer must be of type const char * or char *
file.write(some_buffer,size_of_some_buffer);
Sep 2, 2011 at 4:05pm
Thank you very much.

I'll give it a try.

Last edited on Sep 2, 2011 at 7:34pm
Sep 2, 2011 at 7:35pm
...and after a while it turns out I am unable to do anything.
I got the above snippet to compile, but I don't really know what to do with it...
- Where and how do I declare the hex sequence to search for and its replacement?
- How do I search for the hex sequence and, if it is found, replace it?
- How do I know the buffer size for the output file?
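For the first two questions, a sketch assuming C++11 (`find_all` is a hypothetical name; `search_001` and `replace_001` reuse the names from the first post): a hex sequence can be declared as a `std::vector<unsigned char>` initialized with 0x.. literals, so its size comes from the initializer, and `std::search` from `<algorithm>` finds where it occurs in the buffer read from the file.

```cpp
#include <algorithm>   // std::search
#include <cstddef>
#include <vector>

// Hex sequences declared as byte vectors; size() is known automatically.
const std::vector<unsigned char> search_001 = {0x00, 0x2E, 0x2E, 0x2E, 0x00, 0x2E, 0x2E,
                                               0x2E, 0x00, 0x2E, 0x2E, 0x2E, 0x00};
const std::vector<unsigned char> replace_001 = {0x0D, 0x0A, 0x0D, 0x0A};

// Return the offset of every non-overlapping occurrence of `pattern` in `data`.
std::vector<std::size_t> find_all(const std::vector<unsigned char>& data,
                                  const std::vector<unsigned char>& pattern)
{
    std::vector<std::size_t> offsets;
    if (pattern.empty()) return offsets;
    auto it = data.begin();
    while (true) {
        it = std::search(it, data.end(), pattern.begin(), pattern.end());
        if (it == data.end()) break;
        offsets.push_back(static_cast<std::size_t>(it - data.begin()));
        it += static_cast<std::ptrdiff_t>(pattern.size());   // continue after the match
    }
    return offsets;
}
```

Knowing the offsets, the replacement itself is done by building a new buffer rather than writing into the old one.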

Let's say that the input file/buffer contains this (in hex):
00 2E 2E 2E 00 2E 2E 2E 00 2E 2E 2E 00 20 61 30 78 78 00 20 00

And I wanted the output buffer/file to be (in hex):
0D 0A 0D 0A 20 61 30 78 78 0D 0A 20 0D 0A

I wanted to:
1. Find: 00 2E 2E 2E 00 2E 2E 2E 00 2E 2E 2E 00
and replace with: 0D 0A 0D 0A
2. Find: 00
and replace with: 0D 0A
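Putting it together for this example, here is a sketch that builds the output in a new buffer in a single pass. It assumes, as in the original code, that the 13-byte sequence becomes 0D 0A 0D 0A and every remaining 00 becomes 0D 0A; `replace_all` and `Bytes` are hypothetical names. Because the longer rule is tried first, the 00 bytes inside the 13-byte sequence are not replaced one by one.

```cpp
#include <algorithm>   // std::equal
#include <cstddef>
#include <utility>     // std::pair
#include <vector>

using Bytes = std::vector<unsigned char>;

// Apply find/replace rules in one left-to-right pass and return the new buffer.
// At each position the rules are tried in order, so list the longer sequence
// first (it starts with 0x00, which the shorter rule would otherwise grab).
Bytes replace_all(const Bytes& in, const std::vector<std::pair<Bytes, Bytes>>& rules)
{
    Bytes out;
    out.reserve(in.size());
    std::size_t pos = 0;
    while (pos < in.size()) {
        bool matched = false;
        for (const auto& rule : rules) {
            const Bytes& find = rule.first;
            if (!find.empty() && find.size() <= in.size() - pos &&
                std::equal(find.begin(), find.end(),
                           in.begin() + static_cast<std::ptrdiff_t>(pos))) {
                out.insert(out.end(), rule.second.begin(), rule.second.end());
                pos += find.size();     // skip the whole matched sequence
                matched = true;
                break;
            }
        }
        if (!matched) out.push_back(in[pos++]);   // no rule here: copy the byte
    }
    return out;
}
```

With the 21 input bytes above, this returns 0D 0A 0D 0A 20 61 30 78 78 0D 0A 20 0D 0A. The output vector grows as bytes are appended, which also answers the buffer-size question: write `out.size()` bytes (passing `reinterpret_cast<const char*>(out.data())` to `write`) to the temporary file.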

This site has plenty of resources, but I can't find any specific example of hexadecimal "search and replace", and I don't understand enough to do it myself. Even the code I posted at the beginning originally came from a post on this forum; I just tweaked some bits to fit my needs until I realized it couldn't do what I wanted.

Anyway, I know I'm asking too much.

Sorry for wasting your time.