Being new to programming I don't understand how people hack programs. Once the program is compiled into machine code how can they alter the code or gain access to parts of the program they are not meant to.
I don't want a tutorial on how to hack just a brief explanation of how people do it and how to code against it.
Machine language is sequnce of commands represented by numbers. They are more restrictive that nigher level language, but if you know them, you can write any program directly in machine language.
Assembly is a mnemonic language which makes it easier to understand machine commands. For example something like E70204 might be represented in assembly as add eax ebx
It is essentually lower level programming language. So you can alter code, reassemble program and enjoy the changes.
Alternatively you can look for program interaction with memory. There is a bunch of conventions programs usually adhere, some common idioms and deep understanding how computer works helps to find weak points. For example if you can access stack allocated array by arbitrary index, you can rewrite function return address and make CPU to continue execute commands starting from arbitrary address. You can write anything in machine language to the array you have access to, and make program execute it: almost complete control of the target.
The theory behind hacking is far more simple then most people realize or are willing the except. Computers and the programs running on them are all finite-state machines, so we can expect that if we input 'x' into function 'y()' then the result will be 'z'.
Computers and the programs running on them are all finite-state machines
Technically, no. Computers are Turing machines (if we relax the "infinite" requirement of memory to "arbitrarily large"), while FSMs operate on fixed amounts of memory. A program considered as a whole can run the entire range of power of computation, from the program that takes no input and produces no output, to a virtual machine.
Computers are also technically non-deterministic for various reasons ranging from hardware bugs to power failures (including spikes, drops, and blackouts) and quantum effects.
Technically, no. Computers are Turing machines (if we relax the "infinite" requirement of memory to "arbitrarily large"), while FSMs operate on fixed amounts of memory. A program considered as a whole can run the entire range of power of computation, from the program that takes no input and produces no output, to a virtual machine.
But real world computers have some fixed amount of memory, and therefore can only assume finitely many states.
Sure it may be theoretically possible that you simply add more memory as needed, without bound, while running an algorithm; giving you the power of a T.M. But even then there is some point where it becomes physically impossible as you need more hardware than the universe can provide materials for.
Anyone knows about a good resource or book on this topic?
You have an entire hard drive full of resources on this topic. As for tools to use I recommend IDA Pro or Ollydbg for de-compiling binary files depending on how much hand holding you are comfortable with. Another tool I find useful at times is DLL Export Viewer from Nirsoft.
@ helios: This is an odd statement to make, it might be true in theory but not in fact. Programs limit themselves in scope and specialize themselves to certain tasks. Sure an entire PC system might be Turing capable, but practicality and cost force the individual components to be otherwise. We can reduce that scope even further and say that the components that make up the individual processes (DLL's, .SO's, Frameworks, etc.) are even more limited. I can see what you're saying if you view the potential of the entire system as a whole, but that's a bit misleading and it isn't how this is done.
Can you also point me to any good links for the resources? I would prefer if the tutorial / book teaches more practical aspects than delving too much into assembly language syntax.
I don't have any to recommend on hand to be honest. I learned ASM while working with PIC's for me EE degree, from there I learned about (actually still learning, I am not that good at this stuff) the calling conventions and the individual x86 commands the hard way. It looks like a lot to examine because ASM is very tedious, there aren't any real functions like we have in C\C++. There are things like RET but even that requires the stack to be restored either before or after the call.
This is an odd statement to make, it might be true in theory but not in fact. Programs limit themselves in scope and specialize themselves to certain tasks. Sure an entire PC system might be Turing capable, but practicality and cost force the individual components to be otherwise. We can reduce that scope even further and say that the components that make up the individual processes (DLL's, .SO's, Frameworks, etc.) are even more limited. I can see what you're saying if you view the potential of the entire system as a whole, but that's a bit misleading and it isn't how this is done.
Even a primitive computer with only a handful of the most basic assembly instructions is Turing complete, if given unlimited memory and allowed to run for an unlimited amount of time. ( At least that is what the Digital Modelling Theses says ) In fact C++ as well as most other programming languages are Turing complete.
However no real physical computer can achieve unlimited memory or run for an unlimited amount of time. So no existing computer is Turing complete, and I don't think it is possible to ever create a real Turing complete computer.
Some problems are solvable only in theory as some problems require too much memory or time to compute.
I program is a file. Change the file and you change the program. I used to do this with games in the early 90's to circumvent the copy protection. All it took was a debugger and some knowledge of assembly code. Sometimes cracking the copy protection was more fun than the game itself. :)
But you need access to the program file for that. The more common method these days is to give a program specially tailored input to cause it to overwrite part of its code, change the return address of a function to execute injected code, etc.
But real world computers have some fixed amount of memory, and therefore can only assume finitely many states.
Sure it may be theoretically possible that you simply add more memory as needed, without bound, while running an algorithm; giving you the power of a T.M. But even then there is some point where it becomes physically impossible as you need more hardware than the universe can provide materials for.
That we can't infinitely construct more memory for a machine is a problem with the universe, not the machine itself. If we transported a computer, untouched, to a universe with an infinite amount of matter, would its Turing completeness change? If so, how would that make any sense?
Sure an entire PC system might be Turing capable, but practicality and cost force the individual components to be otherwise. We can reduce that scope even further and say that the components that make up the individual processes (DLL's, .SO's, Frameworks, etc.) are even more limited. I can see what you're saying if you view the potential of the entire system as a whole, but that's a bit misleading and it isn't how this is done.
I'm not really following. The language of dynamic libraries is exactly as powerful as the language of the machine, since its one and the same. As for frameworks, the machine model exposed by most if not all programming languages is also Turing complete.