Change all 2-byte combinations in a file matching certain criteria

Hey everyone. I'm a beginner programmer learning C++ mainly for video game reverse engineering, as well as for writing tools to assist with game modding.

I'm trying to mod UV maps in 3D models from an old PSX game called WWF SmackDown in order to convert the models to a slightly different format that allows importing them into the game's sequel, WWF SmackDown 2, without glitches.

I need to change the X and Y position values of UV coordinates based on these criteria:
1. If X > 0xBF and Y > 0xBF, then decrease X by 0xC0
2. If X > 0x7F and X ≤ 0xBF and Y > 0xBF, then decrease X and Y by 0x40
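
As a rough sketch, the two rules above can be written as a small function that works on one pair at a time. The `UV` struct and the `adjust_uv` name are made up for illustration and are not taken from the model format. The `X ≤ 0xBF` part of rule 2 is implied by the `else`, since rule 1 has already caught every X above 0xBF whenever Y > 0xBF.

```cpp
#include <cstdint>

// Hypothetical helper type: one UV coordinate, one byte per axis.
struct UV {
    std::uint8_t x;
    std::uint8_t y;
};

// Apply the two rules to a single (X, Y) pair and return the result.
UV adjust_uv(UV uv)
{
    if (uv.x > 0xBF && uv.y > 0xBF) {
        // Rule 1: only X moves, by 0xC0.
        uv.x = static_cast<std::uint8_t>(uv.x - 0xC0);
    } else if (uv.x > 0x7F && uv.y > 0xBF) {
        // Rule 2: X is in 0x80..0xBF here, both bytes move by 0x40.
        uv.x = static_cast<std::uint8_t>(uv.x - 0x40);
        uv.y = static_cast<std::uint8_t>(uv.y - 0x40);
    }
    return uv;
}
```

For example, `adjust_uv(UV{0xFE, 0xE1})` should come back as X = 0x3E, Y = 0xE1, and a pair with Y ≤ 0xBF is returned unchanged.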

I can do this manually with a hex editor but it's a long and error-prone process so I'd like to automate it.

Since the X and Y position values are stored as one byte each and always appear together, I believe this comes down to reading the file into an array, searching the array for 2-byte combinations that match my criteria, and modifying each byte individually. How can I do that?

Specifically the last two steps: finding byte combinations in an array and manipulating each byte of the found combinations individually.

I have some very basic code running that opens a small 16-byte test file (not a real model file yet), stores its contents in an array and then prints them as hex numbers.

#include <fstream>
#include <iostream>
#include <iomanip>

int main()
{
	std::ifstream inf{ "Sample.dat", std::ios::binary };

	if (!inf)
	{
		std::cerr << "Error: file could not be opened for reading!\n";
		return 1;
	}

	unsigned char array[16]{ 0 };
	int arrayIndex{ 0 };

	while (inf >> array[arrayIndex])
	{
		std::cout.fill('0');
		std::cout << std::hex << std::setw(2) << static_cast<int>(array[arrayIndex]);
		++arrayIndex;
	}

	return 0;
}


The "Sample.dat" file contains the following bytes:
 
AB 12 FD BC 44 15 AA 6B DF 4D 3A EE E1 D7 06 78


And I actually have one more problem as well. As an experiment I tried adding one more simple loop that would go through the array and change every byte that has a value of 0xAB (that would only be the very first byte in my file) like so:

#include <fstream>
#include <iostream>
#include <iomanip>

int main()
{
	std::ifstream inf{ "Sample.dat", std::ios::binary };

	if (!inf)
	{
		std::cerr << "Error: file could not be opened for reading!\n";
		return 1;
	}

	unsigned char array[16]{ 0 };
	int arrayIndex{ 0 };

	while(inf >> array[arrayIndex])
	{
		if (array[arrayIndex] == 0xAB)
		{
			array[arrayIndex] -= 0xC0;
		}
		++arrayIndex;
	}

	while (inf >> array[arrayIndex])
	{
		std::cout.fill('0');
		std::cout << std::hex << std::setw(2) << static_cast<int>(array[arrayIndex]);
		++arrayIndex;
	}

	return 0;
}


But I immediately encountered a problem - the program wouldn't output anything to the console. I used a debugger and found out that what seems to be happening is that the first loop works correctly but the second loop doesn't start because the condition inf >> array[arrayIndex] evaluates to false. I guess because I reach the end of the file in the first loop? So another question is how do I loop through the same file again without closing and reopening it?

I tried adding inf.seekg(0, std::ios::beg); after the first loop but that doesn't help.

So this is where I'm stuck. I tried googling a bunch of stuff but nothing seems to be the solution to what I need. Any help would be greatly appreciated, even hints and pointers to subjects to research.
I need to change the X and Y position values of UV coordinates based on these criteria:
1. If X > 0xBF and Y > 0xBF, then decrease X by 0xC0
2. If X > 0x7F and X ≤ 0xBF and Y > 0xBF, then decrease X and Y by 0x40


How do you know when you have an xy position byte pair?

I can do this manually with a hex editor but it's a long and error-prone process so I'd like to automate it.


So how do you know which bytes to change? If you can specify how to determine which bytes to change based upon the above formula then reading the file, changing the bytes and re-writing the file is quite easy and straightforward.

Note:
hex dec
40   64
7F  127
BF  191
C0  192

X ≤ 0xBF and X < 0xC0 are equivalent conditions.
I did google "C++ read binary" and got: https://stackoverflow.com/questions/5420317/reading-and-writing-binary-file

Thank you for the link! I haven't learnt <vector> and <iterator> yet but will definitely try to look into them asap and see if that helps.

How do you know when you have an xy position byte pair?

I don't as I haven't been able to fully understand the UV data structure in the models. The current idea is to change all the 2-byte combinations in a file that match the criteria. If that also messes up other data besides UV coordinates, then I'll go from there and try to solve that problem.

So how do you know which bytes to change? If you can specify how to determine which bytes to change based upon the above formula then reading the file, changing the bytes and re-writing the file is quite easy and straightforward.

The way I do it manually is I first import a model to Blender using a tool built for this specific model format. (Unfortunately, this tool doesn't allow exporting UV data back into the source file, otherwise I would've definitely tweaked the UVs in Blender. And I haven't been able to contact the tool author so asking him for help is not an option either). Then I export the UV map as an image and overlay it onto a 0xFF by 0xFF grid in Photoshop. Then I manually check the coordinates of every vertex that needs tweaking. Let's say I have a vertex with coordinates X = 0xFE and Y = 0xE1. I then use a hex editor to replace all instances of 0xFEE1 in a file with 0x3EE1 (because I need to decrease X by 0xC0).

So I'm trying to figure out how to do something similar automatically, only without the need to check the exact coordinates first. The program should first go through the file and search for 0xC0C0. If it finds it, then it changes it to 0x00C0. Then it goes through the rest of the file looking for more instances of 0xC0C0. When the end of the file is reached, the program then searches for 0xC0C1 and changes it to 0x00C1. And so on, until all the possible combinations of coordinate values have been checked.

Not sure if I'm making myself clear here :) Let me know if I'm missing something or what I'm trying to achieve is impossible or complicated.
X ≤ 0xBF and X < 0xC0 are equivalent conditions.

You're right! I don't have a X < 0xC0 condition though :) Or do you mean to say that using < instead of ≤ is a better practice?
OK. Based upon:

1. If X > 0xBF and Y > 0xBF, then decrease X by 0xC0
2. If X > 0x7F and X ≤ 0xBF and Y > 0xBF, then decrease X and Y by 0x40


and assuming that the bytes are X Y then as a simple starter, consider:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
	std::fstream samp("sample.txt", std::ios::binary | std::ios::in | std::ios::out);

	if (!samp)
		return (std::cout << "Cannot open file\n"), 1;

	samp.seekg(0, std::ios::end);

	std::string data(samp.tellg(), 0);

	samp.seekg(0, std::ios::beg);
	samp.read(data.data(), data.size());

	for (size_t i = 0; i < data.size() - 1; ++i)
		// Process bytes here
		if (data[i] > 0xbf && data[i + 1] > 0xbf)
			data[i] -= 0xc0;
		else if (data[i] > 0x7f && data[i] <= 0xbf && data[i + 1] > 0xbf) {
			data[i] -= 0x40;
			data[i + 1] -= 0x40;
		}

	samp.seekg(0, std::ios::beg);
	samp.write(data.data(), data.size());
}


This will read the whole of the file as binary into the string data. Then process data as required, then write the contents of data back to the file as binary.
This will read the whole of the file as binary into the string data. Then process data as required, then write the contents of data back to the file as binary.

Yes, this seems like exactly what I need!

However, the if statements for byte processing don't seem to work as expected. The true statements do not get executed even though the condition is definitely met. I've tried it both with a real file and a small test file with the following bytes:

BF B7 C0 C1 FF DE 35 66 66 66 66 66 66 66 66 66

As you can see, the matching combinations are definitely there.

When I step through the execution with a debugger, it simply jumps over the true statement. Am I missing something here?
which statement is true and being skipped (exact code and value when it does the wrong thing)?
which statement is true and being skipped (exact code and value when it does the wrong thing)?


Here:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
	std::fstream samp("sample.txt", std::ios::binary | std::ios::in | std::ios::out);

	if (!samp)
		return (std::cout << "Cannot open file\n"), 1;

	samp.seekg(0, std::ios::end);

	std::string data(samp.tellg(), 0);

	samp.seekg(0, std::ios::beg);
	samp.read(data.data(), data.size());

	for (size_t i = 0; i < data.size() - 1; ++i)
		// Process bytes here
		if (data[i] > 0xbf && data[i + 1] > 0xbf) // this evaluates to true...
			data[i] -= 0xc0; // ...but this statement is skipped
		else if (data[i] > 0x7f && data[i] <= 0xbf && data[i + 1] > 0xbf) {
			data[i] -= 0x40;
			data[i + 1] -= 0x40;
		}

	samp.seekg(0, std::ios::beg);
	samp.write(data.data(), data.size());
}


The values were:
data[i] = 0xC0 'А'
data[i + 1] = 0xC1 'Б'

Visual Studio debugger showing the line is skipped - https://www.screencast.com/t/KZ3JNLR6
do me a favor and cast data[i] as unsigned char on both sides of that.

my hunch is that string is signed char based and it's thinking your values > 127 are negative.
you may also want to make the constants unsigned. Not sure, they are probably integer sized rather than bytes and so ok without it.

do you have warnings about this junk, size mismatches, signed mismatches??
do me a favor and cast data[i] as unsigned char on both sides of that.

Yes, that fixed it! Thank you!

I suspected it might have had something to do with comparing the characters of a string, and I tried casting to int before, but that didn't work. Can you please explain why it had to be unsigned char?

Edit: Oh, actually unsigned int seems to work as well. So then the question is what is the problem with signed integral types?

Editx2: Just saw your edit.

my hunch is that string is signed char based and it's thinking your values > 127 are negative.

You're right, that seems to be the case.
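
To illustrate the signed/unsigned question, here is a small sketch. It assumes plain `char` is signed on the compiler in use (which matches what the debugger showed) and uses `signed char` explicitly so the behaviour does not depend on that setting. The function names are made up. The byte 0xC0 stored in a signed char is -64, and it is promoted to the int -64 before the comparison, so it can never be greater than 0xbf. Casting to `unsigned char` first gives 192. Casting straight to `unsigned int` turns -64 into a very large number, which happens to pass the `>` tests but, if I'm reading the conversion rules right, would make the `<= 0xbf` test in the second rule fail.

```cpp
// The comparison as written in the std::string version.
bool above_bf_raw(signed char c)
{
    int promoted = c;          // 0xC0 arrives here as -64
    return promoted > 0xbf;    // -64 > 191 is false
}

// The fix suggested above: go through unsigned char first.
bool above_bf_fixed(signed char c)
{
    return static_cast<unsigned char>(c) > 0xbf;   // 192 > 191 is true
}

// Casting straight to unsigned int: negative values become huge,
// so an upper-bound test like this one no longer holds for 0x80..0xBF.
bool at_most_bf_via_uint(signed char c)
{
    return static_cast<unsigned int>(c) <= 0xbf;
}
```

So `unsigned char` (or `std::uint8_t`) is the cast that keeps every byte in the 0..255 range the conditions were written for.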

do you have warnings about this junk, size mismatches, signed mismatches??

Yes, actually I did have a warning about this line before I added casting as unsigned char:

std::string data(samp.tellg(), 0);

It said: "'argument': conversion from 'std::streamoff' to 'const unsigned int', possible loss of data"

I probably should've included it when describing my problem but I kinda forgot about it since it wasn't related to the if statement. It disappeared after I added casting.
cast data[i] as unsigned char on both sides of that.


This is why all code should always be tested and debugged! :)
I've done some more tests and everything seems to be working great. I only had to modify the loop to change the code that is executed after every iteration from ++i to i+=2 as I only want to check every 2 bytes.

Unfortunately, the program does modify quite a bit more data than just UV coordinates in the actual model files, which I guess was to be expected. But I can try working around it by feeding it smaller file chunks as I can more or less tell at least approximate regions where UV data is stored. So this should still be useful for me and save me some time.
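
One way to limit the changes to a known region, sketched under the assumption that the UV block's start and end offsets are known at least roughly: pass those offsets in and only touch aligned pairs inside them. The name `patch_uv_range` and its parameters are made up for illustration. It steps by 2 as described above and returns how many pairs were changed, which makes it easier to spot a region that was hit harder than expected.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Apply both rules to (X, Y) pairs inside [begin, end) only.
// Pairs are taken at begin, begin + 2, begin + 4, ...
std::size_t patch_uv_range(std::vector<std::uint8_t>& data,
                           std::size_t begin, std::size_t end)
{
    end = std::min(end, data.size());   // never run past the buffer
    std::size_t changed = 0;

    for (std::size_t i = begin; i + 1 < end; i += 2) {
        std::uint8_t& x = data[i];
        std::uint8_t& y = data[i + 1];

        if (x > 0xBF && y > 0xBF) {
            x = static_cast<std::uint8_t>(x - 0xC0);
            ++changed;
        } else if (x > 0x7F && y > 0xBF) {
            x = static_cast<std::uint8_t>(x - 0x40);
            y = static_cast<std::uint8_t>(y - 0x40);
            ++changed;
        }
    }
    return changed;
}
```

Calling `patch_uv_range(data, 2, 6)` on a buffer would touch only the pairs at offsets 2 and 4 and leave everything else as it was.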

Thank you for the help, seeplus and jonnin!
As you are working on unsigned data, it may be easier to read the file into a vector rather than a string and cast just twice for read/write. Each access to data then doesn't need to be cast. Consider:

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
	std::fstream samp("sample.txt", std::ios::binary | std::ios::in | std::ios::out);

	if (!samp)
		return (std::cout << "Cannot open file\n"), 1;

	// Seek to the end to find the file size
	samp.seekg(0, std::ios::end);

	std::vector<std::uint8_t> data(static_cast<std::size_t>(samp.tellg()));

	samp.seekg(0, std::ios::beg);
	samp.read(reinterpret_cast<char*>(data.data()), data.size());

	// i + 1 < size() avoids unsigned wrap-around if the file is empty
	for (std::size_t i = 0; i + 1 < data.size(); ++i)
		// Process bytes here
		if (data[i] > 0xbf && data[i + 1] > 0xbf)
			data[i] -= 0xc0;
		else if (data[i] > 0x7f && data[i] <= 0xbf && data[i + 1] > 0xbf) {
			data[i] -= 0x40;
			data[i + 1] -= 0x40;
		}

	// Move the write position back to the start before writing
	samp.seekp(0, std::ios::beg);
	samp.write(reinterpret_cast<char*>(data.data()), data.size());
}
