searching for text in a program

If I create an MFC program and call it "Program1", how can I write a second program that renames all the titles in the first program from "Program1" to "Program2"? I've written something that would work with a regular char, but I THINK I've run into a snag with the multi-byte character set.

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <cstring>   // memcmp, memcpy
using namespace std;


int _tmain(int argc, _TCHAR* argv[])
{
	int length;
	char * buffer;

	ifstream is;
	is.open ("Program1.exe", ios::binary );
	if(!is.is_open())
	  return 1;

	// get length of file:
	is.seekg (0, ios::end);
	length = static_cast<int>(is.tellg());
	is.seekg (0, ios::beg);

	// allocate memory:
	buffer = new char [length];

	// read data as a block:
	is.read (buffer,length);
	is.close();

	// Look for 8-bit (ASCII) "Program1"; stop 8 bytes before the end
	// so the comparison never reads past the buffer.
	for(int i = 0; i + 8 <= length; i++)
		if(memcmp(buffer + i, "Program1", 8) == 0)
		{
			for(int j = 0; j < 8; j++)
				cout << buffer[i+j];
			cout << endl;
			// memcpy instead of sprintf: sprintf also writes a '\0'
			// after "Program2", clobbering the byte that follows the match.
			memcpy(buffer + i, "Program2", 8);
		}

	ofstream outfile ("new.exe",ofstream::binary);
	outfile.write (buffer,length);
	outfile.close();


	cout << "Press ENTER to continue...";
	cin.ignore( 1, '\n' );

	delete[] buffer;

	return 0;
}
I would use a string (or wstring) instead of a char*; then you can use the string's member functions like .find() and .substr() to change things a lot more easily.
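A minimal sketch of the std::string approach, assuming the replacement has exactly the same length as the original (so no bytes after a match move). std::string can hold '\0' bytes, so it works on binary data; the helper names read_file and replace_all_same_length are hypothetical, and this version only finds 8-bit strings.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file into a std::string (embedded '\0' bytes are kept).
std::string read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Replace every occurrence of `from` with `to` in `data`.
// Refuses (returns 0) unless both have the same non-zero length, so
// offsets inside the executable never shift. Returns the replacement count.
std::size_t replace_all_same_length(std::string& data,
                                    const std::string& from,
                                    const std::string& to)
{
    if (from.empty() || from.size() != to.size())
        return 0;
    std::size_t count = 0;
    std::size_t pos = data.find(from);
    while (pos != std::string::npos) {
        data.replace(pos, from.size(), to);
        ++count;
        pos = data.find(from, pos + to.size());
    }
    return count;
}
```

Usage: read "Program1.exe" with read_file, call replace_all_same_length(exe, "Program1", "Program2"), then write exe.data() / exe.size() to "new.exe" with a std::ofstream opened in std::ios::binary.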

Also, you do know that a .exe is not a legible text file...?
Sure, that's why I used the ios::binary option, but any text in the program must be stored somehow as a string in the .exe. The two catches that I foresaw were:

1) The characters might not be stored as 8-bit chars (I did find some 8-bit strings, but not the ones I wanted)

2) Something I'm still not sure of: perhaps the characters are shifted by some non-8-bit increment
Sounds to me like you're just wasting your time.
You can't know whether the internal string belongs to something else, like debugging information: the kind of thing that ruins the executable if you mess with it.

I think, though I'm not sure, that all static data begins at a file offset that is a multiple of 4 or 8 (depending on the system), but it would be ridiculous to store data at bit offsets that aren't multiples of 8.
With all due respect, I still want to try. I also think that it's unfair to shut down a whole thread by simplifying a problem from "difficult" to "impossible so don't even try". I'm looking for one specific string that does exist, and as you can clearly tell, I'm preserving the original by making a copy so "trial and error" iterations aren't a problem.

Thanks.
You couldn't just do a global replace on the source and recompile?
Nope, wouldn't need this post if I could.
I think you'll have better luck using a hex editor, then.
I found that the string encoding can vary within the executable, usually between 8-bit (ASCII/UTF-8) and UTF-16, sometimes even UTF-32. The string I needed to search for was UTF-16, which for plain ASCII text turns each byte into a WORD: each character is stored as its ASCII byte followed by 0x00, because Windows stores UTF-16 little-endian.
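A sketch of searching for the UTF-16 form, assuming the text is plain 7-bit ASCII and the file uses UTF-16 little-endian (as Windows does). The helper names ascii_to_utf16le and replace_utf16le are hypothetical; matches are not checked for even alignment, so a stray hit at an odd offset is possible.

```cpp
#include <cstddef>
#include <string>

// Build the UTF-16LE byte pattern for an ASCII string:
// each character becomes its ASCII byte followed by 0x00.
std::string ascii_to_utf16le(const std::string& ascii)
{
    std::string bytes;
    bytes.reserve(ascii.size() * 2);
    for (char c : ascii) {
        bytes.push_back(c);     // low byte: the ASCII code
        bytes.push_back('\0');  // high byte: 0x00 for ASCII
    }
    return bytes;
}

// Overwrite every UTF-16LE occurrence of `from` with `to`.
// Both must have the same length; returns the number of replacements.
std::size_t replace_utf16le(std::string& data,
                            const std::string& from, const std::string& to)
{
    const std::string f = ascii_to_utf16le(from);
    const std::string t = ascii_to_utf16le(to);
    if (f.empty() || f.size() != t.size())
        return 0;
    std::size_t count = 0;
    for (std::size_t pos = data.find(f); pos != std::string::npos;
         pos = data.find(f, pos + f.size())) {
        data.replace(pos, f.size(), t);
        ++count;
    }
    return count;
}
```

An 8-bit "Program1" elsewhere in the file is left alone, since its bytes don't match the widened pattern.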


@helios
Seriously? I really don't want to start a flamewar here, but first you tell me "You can't" and "you're wasting your time", and now you tell me to use a hex editor? I think it's poor etiquette to post just for the sake of posting when people are searching for a solution to a specific problem. firedraco posed a possible solution; seymore15074 asked a background question which might lead to a future solution. You basically said "your method sucks".

All I wanted to know was how a program stores a string, which I already know is NOT an ASCII 8-bit character string. I've already done a hex search for "Program1", which failed for the most part (it found one matching pattern, but not the one I wanted, and my program found the same string).

I know you can find WORDs in a hex editor, but that wasn't the question I was posing in this thread, especially since I didn't know I should be looking for WORDs. Also, you can only do so much in a hex editor, and I might want to expand my program in the future. In response to your "You can't" and "you're wasting your time": it can be done, and was done.

Thanks to those who actually gave constructive feedback.



Ok, I see what you mean. You are saying that in the .exe, the string isn't just the bytes "Program1" but is in another encoding. So, before you can do a global replace, you need to know what format it is in.

I think that might depend on the platform you are using. What OS is it? Maybe there is more information available online about it.

You could also use a hex dump of the file and try to find the characters manually. For example, if you find the ASCII P character and then a null byte and then the ASCII r, and so on, you could then try to replace that accordingly.
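A minimal hex-dump sketch in the style of a hex editor, so the "P, null byte, r, ..." pattern becomes visible around a given offset. The function name hex_dump and its row layout (offset, 16 hex bytes, printable ASCII with '.' for everything else) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Format `count` bytes of `data` starting at `offset` as rows of:
// 8-digit hex offset, up to 16 hex bytes, then the printable ASCII.
std::string hex_dump(const std::string& data, std::size_t offset, std::size_t count)
{
    std::string out;
    const std::size_t end = std::min(data.size(), offset + count);
    char buf[16];
    for (std::size_t row = offset; row < end; row += 16) {
        std::snprintf(buf, sizeof buf, "%08lx ", static_cast<unsigned long>(row));
        out += buf;
        std::string ascii;
        for (std::size_t i = row; i < row + 16; ++i) {
            if (i < end) {
                unsigned char b = static_cast<unsigned char>(data[i]);
                std::snprintf(buf, sizeof buf, " %02x", b);
                out += buf;
                ascii += std::isprint(b) ? static_cast<char>(b) : '.';
            } else {
                out += "   ";  // pad a short final row so columns line up
            }
        }
        out += "  " + ascii + "\n";
    }
    return out;
}
```

A UTF-16 "Program1" shows up in the ASCII column as "P.r.o.g.r.a.m.1.", which is the pattern described above.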

That being said, this still doesn't sound like a good idea.
Yes. Seriously. Time is precious. You're just wasting it in overcomplicated solutions of dubious effectiveness.
If you want to know the internal structure of an executable file, that is already extensively documented, both for PE and for ELF. There's no need for trial and error.
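As an illustration of reading the documented PE structure instead of guessing, here is a sketch that lists a PE file's sections (e.g. .text, .rdata, .rsrc) with their file offsets. The offsets follow the PE/COFF layout (e_lfanew at 0x3C, "PE\0\0" signature, 20-byte COFF header, 40-byte section headers); the names pe_sections, Section and read_le are hypothetical, and validation is minimal. In an MFC program the title typically comes from a string resource, which would live in .rsrc.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read an n-byte little-endian unsigned integer at `off`.
static std::uint32_t read_le(const std::string& d, std::size_t off, int n)
{
    std::uint32_t v = 0;
    for (int i = n - 1; i >= 0; --i)
        v = (v << 8) | static_cast<unsigned char>(d[off + i]);
    return v;
}

struct Section {
    std::string name;          // e.g. ".text", ".rdata", ".rsrc"
    std::uint32_t raw_offset;  // PointerToRawData: file offset of the data
    std::uint32_t raw_size;    // SizeOfRawData
};

// List the section table of a PE image held in `d`.
// Returns an empty vector if the headers don't look like PE.
std::vector<Section> pe_sections(const std::string& d)
{
    std::vector<Section> out;
    if (d.size() < 0x40 || d[0] != 'M' || d[1] != 'Z')
        return out;                                   // no DOS header
    const std::size_t pe = read_le(d, 0x3C, 4);       // e_lfanew
    if (pe + 24 > d.size() || d.compare(pe, 4, std::string("PE\0\0", 4)) != 0)
        return out;                                   // no PE signature
    const std::size_t nsec = read_le(d, pe + 6, 2);   // NumberOfSections
    const std::size_t opt = read_le(d, pe + 20, 2);   // SizeOfOptionalHeader
    std::size_t sh = pe + 24 + opt;                   // first section header
    for (std::size_t i = 0; i < nsec && sh + 40 <= d.size(); ++i, sh += 40) {
        std::string name = d.substr(sh, 8);
        name = name.substr(0, name.find('\0'));       // names are NUL-padded
        out.push_back({name, read_le(d, sh + 20, 4), read_le(d, sh + 16, 4)});
    }
    return out;
}
```

Restricting a search to the byte range of one section avoids accidentally patching code or debugging data elsewhere in the file.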
And yes. Your method does suck. There are situations where using a program wastes more time than it saves, and this is one of them (well, at least at this point. If you were actually reading the file structure, that'd be a different matter).

I was just trying to propose an alternative that'd save you some time. You don't have to be a jerk.
You're still missing my point. I was trying to find out what I was looking for, not how to look for it. Hex editors are no good when you don't know what you're looking for.
It depends on how they are stored in your file. If you didn't compile in Unicode, chances are that the strings are stored as straight ASCII. Search and replace "Program1" with "Program2".

They could be stored in Unicode or UTF-8 as well, in which case you will need to search and replace the proper binary string. For Unicode, use wchar_t like helios suggested (on Windows, wchar_t is 16 bits and strings are UTF-16 little-endian). For UTF-8, they will look exactly like they do in ASCII (for this particular string).

One caveat: you can make the string shorter, but you cannot make it longer. (Not with a simple search and replace.)
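A sketch of the "shorter is fine, longer is not" rule, assuming the target is a NUL-terminated string: a shorter replacement is padded with '\0' bytes so nothing after it moves. Length-prefixed strings (such as Windows STRINGTABLE entries) would also need their length field updated, which this does not do. The function name overwrite_padded is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Overwrite `from` at offset `pos` in `data` with `to`, padding with '\0'
// when `to` is shorter. Returns false (and changes nothing) if `to` is
// longer, or if `from` is not actually present at `pos`.
// Assumes a NUL-terminated string; length prefixes are not updated.
bool overwrite_padded(std::string& data, std::size_t pos,
                      const std::string& from, const std::string& to)
{
    if (to.size() > from.size() || pos > data.size())
        return false;
    if (data.compare(pos, from.size(), from) != 0)
        return false;
    std::string padded = to;
    padded.resize(from.size(), '\0');  // fill the leftover bytes with NULs
    data.replace(pos, from.size(), padded);
    return true;
}
```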


However, if you want modifiable strings, you should do one of the following:

1. Store them as resource strings. Then you can simply modify the exe's resources. (ELF files don't have "resources" defined the same way that Windows PE32 files do, but that doesn't mean you can't have them. Check out http://ktown.kde.org/~frerich/elfrc.html for some cool stuff.)

2. Get them from a DLL. Then you can simply change the DLL to use different strings. (On Unix, DLLs have a .so extension.)

Both of these methods are useful for internationalization. (There are variations, but generally the user program is aware that certain strings may be modifiable.)

Good luck.