Self-modifying programs

So, for a course I'm in I have to partake in a discussion on how self-modifying code is possible in the von Neumann architecture, and what implications it has as a programmer. Now, I know how it's possible, and the concept behind it. I also know that self-modifying code is kind of a thing of the past due to security measures now.

Anyways, I don't know what implications it could have.
Aha! Moschops, I'm always pleased to see your posts. Sadly, I can't go into this discussion and start bringing up Skynet :/
If you mean programs that write to their code segment, this would imply, I assume, that all programs would have to be written in asm, which is a huge handicap. I can't say what could be gained. Some weird optimisations maybe?
From what I've found during research, it is "easiest" in assembly, but it is also possible in some higher-level languages (COBOL, for example, has an ALTER statement that modifies some code).

I've also read that used correctly it can lead to some optimization, but I can't see how really. Do you think you could explain why you think it might optimize?
I'm not sure if homoiconic languages would fit the "implications of self-modifying programs". That seems more like a tool for writing self-modifying code. I need more of a reason for why or why not to use self-modifying programs.
Computer viruses, but I guess that's a thing of the past.
Maybe dynamic operating systems, morphing to adapt to the machine?
@JLBorges,
This is a bit of a problem. This sort of self modifying code is simply the use of indirect calls. It is as much 'self modifying' as using function pointers in C. You could call all programs self modifying, since they all write to their memory although that definition is a bit useless. I don't see a line clearly.

@ResidentBiscuit,
I was mostly thinking about optimization for space, although that is a bit silly. Maybe some time could be saved by replacing conditional jumps with unconditional ones that have their targets modified at runtime*? I suppose some jumps could be removed entirely (consider if(c) a = b+c; else a = b-c;. If you could replace that add with sub, you'd save a jump*)

*The code required to do this would take longer than the jump. Unless there's an instruction for conditional move. There might be..
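
For what it's worth, the add/sub idea can be approximated without touching the code segment at all. The following is only a sketch under assumptions: `choose_sign` and `combine` are hypothetical names, and it assumes `c` has at least as many elements as `b`. The decision is made once and stored as a sign, so the loop body contains no conditional jump of its own, which is roughly what patching the add into a sub would buy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: +1 means "add", -1 means "subtract".
// The decision is made once, outside the hot loop.
int choose_sign(bool add) {
    return add ? 1 : -1;
}

// Computes b[i] + c[i] or b[i] - c[i] for every element.
// The stored sign plays the role of the patched add/sub instruction,
// so the loop body needs no per-element conditional.
// Assumes c.size() >= b.size().
std::vector<int> combine(const std::vector<int>& b,
                         const std::vector<int>& c,
                         int sign) {
    std::vector<int> a(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) {
        a[i] = b[i] + sign * c[i];
    }
    return a;
}
```

Whether this beats a well-predicted branch depends on the compiler and the processor, so it is an illustration of the idea rather than a recommended optimization.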
I don't see a line clearly.


Well, here is the topic for this.
"Under the von Neumann architecture, a program and its data are both stored in memory. It is therefore possible for a program, thinking a memory location holds a piece of data when it actually holds a program instruction, to accidentally (or on purpose) modify itself. What implications does this present to a programmer?"

I don't understand assembly well enough to really see what these implications could be.
Check out the article in Wikipedia at http://en.wikipedia.org/wiki/Self-modifying_code

Basically:
Accidental modification can lead to the program crashing in a location completely unrelated to the code that had the error. (eg a write through an uninitialized pointer may overwrite an instruction, and the crash only happens later when that INSTRUCTION is executed, so it appears random.)

And since the code in the source file will not be changed it's a nightmare to find.

Modification of code also opens up easier attack vectors: if code is writable, an attacker can write actual code over it and have it executed, sidestepping protections such as the no-execute bit, which only help when executable pages cannot be written.

Actual usage:
You can hide code inside data, eg scramble the code and reassemble it at runtime. Many copy-protection programs did that. This way the actual code doing the checks did not show up in a disassembly of the executable.

You can also store compressed executable code along with the code to decompress in-place. The first time the code is executed it's decompressed. After that it's already present. This plays havoc with modern OS paging, but it's an extreme solution for systems with tight storage memory and large RAM (or extremely slow storage).

Nowadays it's mostly used to change pointers to functions when loading dynamic libraries.
eg Load the library and set the pointers to the function versions you want (optimized for this processor).
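
A minimal sketch of that pointer-setting step, with loud assumptions: `sum_generic`, `sum_tuned`, `resolve_sum` and the `cpu_has_fast_path` flag are all hypothetical stand-ins. A real loader would query the processor and bind library symbols, none of which is shown here.

```cpp
#include <numeric>
#include <vector>

// Type of the function being dispatched.
using SumFn = long (*)(const std::vector<int>&);

// Baseline version: a plain loop that works everywhere.
long sum_generic(const std::vector<int>& v) {
    long total = 0;
    for (int x : v) total += x;
    return total;
}

// Stand-in for a processor-specific version. In this sketch it only
// calls std::accumulate; a real one might use wider instructions.
long sum_tuned(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Hypothetical resolver: picks the implementation once, at load time.
// cpu_has_fast_path is an assumed flag, not a real CPU query.
SumFn resolve_sum(bool cpu_has_fast_path) {
    return cpu_has_fast_path ? &sum_tuned : &sum_generic;
}
```

The caller stores the result once, eg `SumFn sum = resolve_sum(flag);`, and from then on every call goes through `sum` without re-checking the hardware.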

I guess you could also apply it the same way to change the functions in-place based on hardware / configuration information.

Another use today is the JIT compilation in Java and Javascript interpreters. The bytecode gets changed into actual machine code and executed. Since they use metrics to optimize the code (based on loop counters and branches taken), the same thing could be done to optimize native code.
You run the code for some time with instrumentation instructions (eg counting branches) and then change the actual code by removing the instrumentation and changing the branching code.
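
The instrument-then-strip idea can be imitated with a pointer swap instead of real code patching. This is a sketch only: `Counter`, `step_instrumented`, `step_plain` and the `kWarmupCalls` threshold are hypothetical names and values, and a real JIT would regenerate machine code rather than switch between two precompiled functions.

```cpp
// Hypothetical threshold: how many calls to observe before
// dropping the instrumentation.
constexpr int kWarmupCalls = 3;

struct Counter;
using StepFn = int (*)(Counter&, int);

int step_plain(Counter& c, int x);
int step_instrumented(Counter& c, int x);

// Holds the profiling data and the currently active version of the code.
struct Counter {
    int calls = 0;                     // filled in by the instrumented version
    StepFn step = &step_instrumented;  // start with instrumentation enabled
};

// The "optimized" version: same result, no bookkeeping.
int step_plain(Counter&, int x) {
    return x * 2;
}

// The instrumented version: counts calls, then replaces itself
// with the plain version once enough calls have been observed.
int step_instrumented(Counter& c, int x) {
    if (++c.calls >= kWarmupCalls) {
        c.step = &step_plain;
    }
    return x * 2;
}
```

Calls are made as `c.step(c, x)`. After the third call the counter stops moving, because the instrumented version has swapped itself out.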

But the difficulty and security problems are what have all but removed self-modifying code from our current programming habits.

And the slowdown due to having to flush the cache when the code changes.
> This sort of self modifying code is simply the use of indirect calls.
> It is as much 'self modifying' as using function pointers in C.

It is very (very) different.

To give the closest possible C++ analogy, it is as if every construct in code was represented as a std::list<boost::any>; the list being the sequence generated by an in-order traversal of the abstract syntax tree. This list is both code and data - it can be modified like any std::list<> could be modified and it can be executed like any piece of code could be executed.

Now take this one step further. The engine that executes these data-cum-code is itself written using a bunch of such lists. Which being code can be executed. Which being data can be modified. A meta circular evaluator in short.
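
A very reduced sketch of the list analogy, with assumptions stated up front: `std::variant<int, char>` stands in for `boost::any` to keep it standard-only (C++17), the "language" is just integers with `+` and `-`, and `Program`, `run` and `flip_additions` are hypothetical names. It shows the same list being executed by one function and edited as ordinary data by another; it is nowhere near a meta-circular evaluator.

```cpp
#include <list>
#include <stdexcept>
#include <variant>

// A token is either a number or an operator character.
// std::variant replaces boost::any here (an assumption of this sketch).
using Token = std::variant<int, char>;
using Program = std::list<Token>;

// Executes a flat infix list such as {2, '+', 3, '-', 1}, left to right.
int run(const Program& program) {
    auto it = program.begin();
    if (it == program.end()) throw std::invalid_argument("empty program");
    int acc = std::get<int>(*it++);
    while (it != program.end()) {
        char op = std::get<char>(*it++);
        if (it == program.end()) throw std::invalid_argument("missing operand");
        int rhs = std::get<int>(*it++);
        switch (op) {
            case '+': acc += rhs; break;
            case '-': acc -= rhs; break;
            default: throw std::invalid_argument("unknown operator");
        }
    }
    return acc;
}

// Treats the same list as data: rewrites every '+' into '-'.
void flip_additions(Program& program) {
    for (Token& t : program) {
        if (std::holds_alternative<char>(t) && std::get<char>(t) == '+') {
            t = '-';
        }
    }
}
```

With `Program p{2, '+', 3};`, `run(p)` gives 5; after `flip_additions(p)` the very same list gives -1.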
Self-modifying programs

Isn't that referred to as metaprogramming?

Using template tricks or something like that?
@JLBorges,
While Lisp is certainly neat, it achieves self-modification by labelling data as code. I should probably not get into this discussion as it reminds me of http://www.cplusplus.com/forum/beginner/58499/ more than I'd like it to.

@codekiddy,
Well, templates are a bit static. It's dynamic self modification that ResidentBiscuit needs.
Yeah, it's more about modification that may not be known until runtime. For example, some kind of modification that depends on user input, or on the hardware architecture of the target machine, which might not be known in advance.