Is this code machine code?

I need information about the code you see when you change a file's extension from .exe to .txt and open it: you get some strange code.

How can I understand this code?

Code example:

~øÛ8ÚQ¸ø¹K,ïRv>I£?>É5ž?Oźh­¿3´³ñÈť……o£î6ÖûRÓøè¸4?ÄçsÓ?É>źoN?Ág?z?ØO8ô/Iv ªLEú"/> ś«óí?ÒqeÊ??ÞéOw?¬Nûç)?Ü]?1t/Ê'>c~°?+ImˆÞ%Ò?NÍ]R\N>GQ¬¿áè¶?ï5?}DZ   ??^´#ì1'?<âá^zÂ?Ù^xãð$ >"š×»°áxÙYRÀÍ?þíãG`>‡-Ü?¶å[b®Ñê?·¹ÑrÃ?Æ-xó5cNHå4…Wq@.Jmk«sê?ói¹ò??ÞÚ.>òøs?g%Q°ûW5TËé='??Hà?±¬>¬‡|ȁü[¦ðáO7? Ť?Юi
l‡ť³~?Zð?^c)?ø Áñ-d×FsjjÊ%Qž5×V7èÂť7ѺY
äÚ§àśho?"1× šýƒìí?8IôÂö?È"Ž©Ť[]s"Ö5·m\'×uÈ~ýJ?KLåÇhö&9ÆÆåû5wu£=ÊYªùÞwn"S£×?ñc?Õ ÞóèÇ,n%?´@$û¿ÂâîZf,ûÚ&C?ºjä?_eb^?8ùôåöXä½þ)Ž??»dÛ]]}4dOÂ'zy¯-<msxžDk


It probably is, although executables contain more than just machine code.
Use a disassembler to transform it into human-readable mnemonics.
closed account (zwA4jE8b)
Like Athar said, use a disassembler.

The .exe in text format is the text editor reading the binary and translating it into the appropriate characters, which are not really representative of "code"; it's just what the .exe's pattern of 1's and 0's happens to translate into via the text editor.
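To make that concrete, here is a minimal sketch that shows each byte's numeric value next to the character an editor might display for it. The helper name `describeBytes` is made up for this illustration, and "printable" here means whatever `std::isprint` accepts in the default C locale.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte as two hex digits followed by the
// character it maps to, or '.' when the byte is not a printable character.
std::string describeBytes(const std::string& bytes)
{
	std::string out;
	for (unsigned char b : bytes) {
		char buf[8];
		std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(b));
		out += buf;
		out += std::isprint(b) ? static_cast<char>(b) : '.';
		out += '\n';
	}
	return out;
}
```

Calling it on the first bytes of an executable would give one line per byte: the value stays the same whether or not the editor has a sensible glyph for it.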
Hmm, if you knew the architecture it's compiled for, could you copy and paste all that into a program that converts ASCII or Unicode characters into their respective codes, convert that to binary and then find out exactly what it's doing?
Yes, you could :)
char str[9];    /* 8 binary digits plus the terminating null */
int c;
while ((c = getchar()) != EOF)
        printf("%s\n", itoa(c, str, 2));    /* itoa is non-standard */
O: is that more or less the first stage in how a disassembler works?
That looks like it's simply printing out the numeric values of the bytes in the file...just a different way to represent the exact same data :p
Printing them out in binary though, so it's doing exactly what xander was talking about. And no, I don't think that's how a disassembler works XD.
@xander337, you're thinking about representations more than about what they mean. A disassembler is not much different from any parser. Except that there's a gazillion instructions you have to be aware of...
A disassembler would be very easy to write IMO. All you need is a look-up table to convert from opcodes to mnemonics; then you just read the opcodes and convert them. Constructing the look-up table would be the hardest part - but all you would have to do is find a reference for your target instruction set and hand-write the table (or write a script to do it, if you can).
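As a rough sketch of the look-up-table idea, the snippet below maps a handful of x86 opcodes that are exactly one byte long and take no operands. The names `kOneByteOpcodes` and `disassembleOneByteOps` are invented for this example, and the table is nowhere near complete: anything it does not know is printed as raw data rather than decoded.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A tiny opcode-to-mnemonic table. It covers only a few x86 instructions
// that are a single byte long with no operands.
const std::map<std::uint8_t, std::string> kOneByteOpcodes = {
	{0x90, "nop"}, {0xC3, "ret"}, {0xCC, "int3"},
	{0xF4, "hlt"}, {0xF8, "clc"}, {0xF9, "stc"},
};

// Translate each byte through the table; bytes that are not in the table
// are emitted as raw data ("db 0x..") because this sketch cannot decode them.
std::vector<std::string> disassembleOneByteOps(const std::vector<std::uint8_t>& code)
{
	static const char hex[] = "0123456789abcdef";
	std::vector<std::string> lines;
	for (std::uint8_t byte : code) {
		auto it = kOneByteOpcodes.find(byte);
		if (it != kOneByteOpcodes.end()) {
			lines.push_back(it->second);
		} else {
			std::string raw = "db 0x";
			raw += hex[byte >> 4];
			raw += hex[byte & 0x0F];
			lines.push_back(raw);
		}
	}
	return lines;
}
```

A plain table like this only works while every instruction is one byte. As soon as an opcode is followed by operand bytes, the loop has to know how many bytes to consume, which is where the real work starts.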

A good disassembler would be more difficult if, for example, you wanted the disassembled code to actually be legible. A decompiler would be much more difficult since you'd have to convert from assembly to a higher-level language.
Well, that's what I meant when I said first stage. Of course it'd just be opcodes and you'd have to convert it all to mnemonics to be able to read it, but that's why I said "if you knew the architecture it's compiled for" in my first post.
Ohh, well if you wanted to target any architecture then I guess you could use an XML schema.
Made a little disassembler (got flamed for incorrectly using this term, so let's call it a low-level decompiler). Not exactly a model of efficiency, and I went all out on typecasting, but it works well enough anyways :)
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

std::string itob(unsigned char);

int main()
{
	// open in binary mode so that no bytes are translated or dropped
	std::ifstream input("input.exe", std::ios::binary);
	std::ofstream output("output.txt");
	if (!input || !output) {
		std::cerr << "could not open input.exe or output.txt\n";
		return EXIT_FAILURE;
	}
	char code;
	while (input.get(code))
		output << itob(static_cast<unsigned char>(code));
	return EXIT_SUCCESS;
}

// convert one byte to its 8-digit binary representation
std::string itob(unsigned char n)
{
	std::string bin(8, '0');
	for (int bit = 0; bit < 8; ++bit) {
		// test the bits from most significant to least significant
		if (n & (0x80 >> bit))
			bin[bit] = '1';
	}
	return bin;
}


And when I fed it your executable it produced this:
11111101110001010001100101110110010100101110110111110100100111111111111011111111101011111111
00111111010001100111101111110110101001011010011111111100111111111111101101111100111011111111
00111111111111101011111110011111110001011111001001111011011111100110010001011000101011111111
10111111111111110001110010111111111111110011111110111111111100111010100111111110111011111111
10001111010010111110011111111011000111111110111111101011100100111011011001011111111001110101
11011010010101110010011101111101000111101000111111111010111111111111011000100101101011111111
11111111111111111111011110100011110001100111111111111100101111011110101111111011110111100010
01001111111111010001011110001011001101001011111110001111100000111110101101111111101101111000
10111111111001011111110110111110001101011100011100111010010001101001010111111000111111110111
01001010110110111010111110011111111110100111111111111110111011111011100111111111100111100101
10100011010111110101101010011110110011111111111111110010001111111111101111100101101110011111
10111111111111111111111101001101011011001111110111111101101011111110111101100011101001111111
11111111111110110111001001000110111001111010101101010100101101000111010110101101101111101111
01100110101101000110111111111110001011000111111111111111000100100111111110001010110111011101
11100111000101101011101101101110010011111101011111110100101011111110010111001100110100010011
01110011101011110111111010111110110110011110111110111010001010100111111111100011111111111111
01100110111010010111111111111110010010110101100110101100100110100001111111111010101111111011
11111001011100010101111011111111100010110001010011111111111111100100101110110111011111101110
100110010010011111001111111010111100110110111110011011011110011111100010001001101011
That's not a disassembler. That just prints out the contents of a file in base 2, which is even less useful than in ASCII.
Excuse my incorrect terminology then, I was just doing what xander was talking about earlier in the thread :/
"if you wanted to target any architecture then I guess you could use an XML schema"
A disassembler is not hard, but it's not trivial either. A fair amount of code is needed to parse the arguments of instructions. For x86 there are some prefixes that need special treatment. Also, if one thread ( http://www.cplusplus.com/forum/lounge/56399/2/#msg307263 ) has taught me anything, it's that different architectures are different.
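As one example of that special treatment, a decoder typically has to step over prefix bytes before it reaches the opcode. The sketch below assumes the usual legacy prefix set of 80386-and-later processors (on the original 8086 only the four segment overrides, LOCK and REP/REPNE exist). The function names are made up for illustration, and the sketch only skips the prefixes; a real disassembler would also record them, because they change how the following instruction is decoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True for the legacy x86 prefix bytes: segment overrides, operand-size and
// address-size overrides, LOCK, and REPNE/REP.
bool isLegacyPrefix(std::uint8_t byte)
{
	switch (byte) {
	case 0x26: case 0x2E: case 0x36: case 0x3E:  // ES, CS, SS, DS overrides
	case 0x64: case 0x65:                        // FS, GS overrides (386+)
	case 0x66: case 0x67:                        // operand-size, address-size (386+)
	case 0xF0: case 0xF2: case 0xF3:             // LOCK, REPNE, REP
		return true;
	default:
		return false;
	}
}

// Return the index of the first non-prefix byte at or after `start`.
std::size_t skipPrefixes(const std::vector<std::uint8_t>& code, std::size_t start)
{
	std::size_t i = start;
	while (i < code.size() && isLegacyPrefix(code[i]))
		++i;
	return i;
}
```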
But if you had an XML file that described each assembly language you could probably write a parser that worked on many different ones. That's how GtkSourceView can highlight so many different languages.
Right, but how would this XML look? It may be that writing it would be as hard as writing the whole disassembler.
I don't know, I wouldn't have thought it would be that difficult if you designed it well and put some thought into it.
Do give it a try.

To illustrate, here's a description of one form of the mov instruction in the x86 architecture. This is a transfer between register and register or register and memory in either direction. Note that I myself have only dealt with the 8086, so my knowledge of modern extensions is limited.

7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
1 0 0 0 1 0 d w| mod  reg  r/m |               offset


w is a bit that indicates the size of operands. If w = 0, the size is byte. In 16 bit architectures if w = 1, the size is word*.

reg is a value of 3 bits indicating which register is one of the operands. Depending on w, the same number indicates different registers (0 could be al, ax or eax).

mod is a value of 2 bits indicating the type of the second operand.

r/m is a value of 3 bits indicating what the second operand is. If mod = 11, it is a register number, encoded the same way as reg. In other cases the second operand is memory. The address in memory is built from a sum of several registers and the offset**. Then r/m indicates which registers to add up and mod indicates how many bytes of offset there are (could be 0, 1 or 2). Also, there is a thing called a segment override prefix, which needs to be handled too.

d is a bit that indicates the direction. If d = 0, reg contains the source operand and if d = 1, the destination.

* In 32 bit architectures, normally if w = 1, size is double word, but, I think, if there is a prefix, the size can be word too. No idea what 64 bits do. Probably more prefixes.

** It was like that in 8086. More modern architectures have more complex rules.
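
The field layout described above can be sketched in code. This is only an illustration under narrow assumptions: it handles the 16-bit (8086-style) register names, only the register-to-register case (mod = 11), and no prefixes. The function and table names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Register names indexed by the 3-bit reg field (or r/m when mod = 11).
const char* const kByteRegs[8] = {"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"};
const char* const kWordRegs[8] = {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"};

// Decode the register-to-register form (mod = 11) of opcodes 0x88..0x8B.
// Returns an empty string for memory operands, which this sketch does not handle.
std::string decodeMovRegReg(std::uint8_t opcode, std::uint8_t modrm)
{
	assert((opcode & 0xFC) == 0x88);      // top six bits must be 100010

	const bool d = (opcode & 0x02) != 0;  // direction bit
	const bool w = (opcode & 0x01) != 0;  // operand-size bit
	const unsigned mod = modrm >> 6;
	const unsigned reg = (modrm >> 3) & 0x07;
	const unsigned rm  = modrm & 0x07;

	if (mod != 3)
		return "";

	const char* const* names = w ? kWordRegs : kByteRegs;
	// d = 0: reg is the source; d = 1: reg is the destination.
	const char* dst = d ? names[reg] : names[rm];
	const char* src = d ? names[rm] : names[reg];
	return std::string("mov ") + dst + ", " + src;
}
```

For instance, the bytes 89 D8 split into opcode 100010 with d = 0, w = 1 and a second byte of mod = 11, reg = 011, r/m = 000, so `decodeMovRegReg(0x89, 0xD8)` yields `mov ax, bx`.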