x86/64 hooking

So, lately I've been getting into hooking since my job is directly related to it. This question is for the x86 gurus.
Suppose there's a 3-byte instruction somewhere; say, an opcode plus an offset. If I were to do a LOCK XCHG from a different thread at that position with a 32-bit value, would this situation be possible?
1. Thread A fetches opcode.
2. Thread B overwrites memory atomically.
3. Thread A fetches overwritten offset.
4. Summoning of nasal demons complete.
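A toy, single-threaded model of steps 1 to 4 makes the worry concrete. It treats the instruction fetch as two separate reads (opcode, then offset) and puts the whole 4-byte overwrite between them. The byte values are made up for illustration. Whether a real x86 front end can split a fetch this way is exactly what is being asked, so the model only shows what Thread A would decode if it could.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What Thread A ends up decoding: one opcode byte plus a 2-byte offset. */
typedef struct {
    uint8_t opcode;
    uint8_t offset[2];
} fetched_insn;

/* Step 1: read only the opcode byte. */
static uint8_t fetch_opcode(const uint8_t *mem) { return mem[0]; }

/* Step 3: read the two offset bytes that follow the opcode. */
static void fetch_offset(const uint8_t *mem, uint8_t out[2]) {
    memcpy(out, mem + 1, 2);
}

/* Step 2: Thread B replaces all 4 bytes. In this single-threaded model
   nothing can interleave with the memcpy, so it stands in for an atomic
   32-bit store. */
static void overwrite_word(uint8_t *mem, const uint8_t new_bytes[4]) {
    memcpy(mem, new_bytes, 4);
}

/* Run steps 1-3 in the order from the question and return the result. */
static fetched_insn torn_fetch(uint8_t *mem, const uint8_t new_bytes[4]) {
    fetched_insn insn;
    insn.opcode = fetch_opcode(mem);   /* 1. Thread A fetches the old opcode */
    overwrite_word(mem, new_bytes);    /* 2. Thread B overwrites the word */
    fetch_offset(mem, insn.offset);    /* 3. Thread A fetches the new offset */
    return insn;                       /* 4. old opcode + new offset */
}
```

With old bytes AA 11 22 90 and new bytes BB 33 44 90, `torn_fetch` returns opcode AA with offset 33 44: an instruction that exists in neither version of the code.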
closed account (13bSLyTq)
What do you mean? You want to atomically patch the memory address. Yes, you may need to patch it using 0xF4EB, then patch Address - 0xA, then do the jump. This way you not only hook it successfully with your own callback, but it is also easy.
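One reading of the 0xF4EB suggestion, written out as arithmetic: stored as a little-endian 16-bit value, 0xF4EB becomes the bytes EB F4, a 2-byte short JMP whose signed displacement 0xF4 is -12. Short jumps are relative to the next instruction, so the target is Address + 2 - 12 = Address - 0xA, which matches "patch Address - 0xA". The helper names and the sample address are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of a 16-bit value as x86 stores it in memory (little-endian). */
static void u16_to_le_bytes(uint16_t v, uint8_t out[2]) {
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)(v >> 8);
}

/* Target of a 2-byte short JMP (EB rel8) placed at `addr`: the signed
   8-bit displacement is added to the address of the next instruction,
   i.e. addr + 2. Unsigned wraparound makes negative displacements work. */
static uint64_t short_jmp_target(uint64_t addr, uint8_t rel8) {
    return addr + 2 + (uint64_t)(int64_t)(int8_t)rel8;
}
```

For a hypothetical hook site at 0x401000, writing 0xF4EB there yields EB F4, and `short_jmp_target(0x401000, 0xF4)` is 0x400FF6, ten bytes before the site.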

Then you can do whatever instruction you want.
I want to overwrite the whole instruction.

Yes, you may need to patch it using 0xF4EB, then patch Address - 0xA, then do the jump
That would definitely not work. The instruction would be left in an inconsistent state for a brief moment.
closed account (13bSLyTq)
Okay, so first add a few 0x90s (NOPs), then use 0xF4EB; that way the instruction would be done. Then patch it atomically.
closed account (o1vk4iN6)
^
You are changing the behavior of the code by putting all those NOPs...


@helios
Are you sure that situation is even possible? You could say the same thing about a move instruction: if the value is not aligned, it would take 2 fetches to get it. What if an atomic write happens in between those fetches and changes that value? Then you have half of the old value and half of the new one.
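Whether a given patch site is even in the "two fetches" situation described here can at least be checked up front: is the patched word naturally aligned, and does it straddle a cache line? The sketch below assumes 64-byte lines (common on current x86 parts, but not queried from the CPU here). Passing both checks says nothing definitive about how instruction fetch behaves, which is the open question of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; an assumption, not read from CPUID. */
#define ASSUMED_LINE_SIZE 64u

/* True if `addr` is a multiple of `len` (len must be a power of two). */
static bool is_naturally_aligned(uintptr_t addr, size_t len) {
    assert(len != 0 && (len & (len - 1)) == 0);
    return (addr & (len - 1)) == 0;
}

/* True if the byte range [addr, addr + len) spans two lines of size `line`. */
static bool crosses_line(uintptr_t addr, size_t len, size_t line) {
    assert(len != 0 && line != 0);
    return addr / line != (addr + len - 1) / line;
}
```

A 4-byte write at 0x103C stays inside one 64-byte line, while the same write at 0x103E spills into the next one.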

Maybe I'd try creating a test: have 3 threads running the code that gets modified, and have another thread atomically swap the value between two instruction/offset pairs, chosen so that if the new offset is used with the old opcode, the code ends up in a bad state.
Okay, so first add a few 0x90s (NOPs), then use 0xF4EB; that way the instruction would be done. Then patch it atomically.
The hookee must pass from unhooked state to hooked state in a single step. Leaving it in a third state momentarily is completely unacceptable.
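One detail of the original question matters for this "single step" requirement: a 32-bit store over a 3-byte instruction also rewrites the first byte of the next instruction, so the new 32-bit value has to carry that byte over unchanged. A minimal sketch, using a C11 compare-and-swap loop in place of the LOCK XCHG from the question (GCC and Clang compile it to LOCK CMPXCHG on x86), operating on ordinary data. A real hook would also have to make the code page writable first (e.g. VirtualProtect on Windows), and none of this settles whether instruction fetch observes the store atomically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Replace the low 3 bytes of a 32-bit slot in one atomic step while
   keeping the 4th byte, which belongs to the next instruction.
   Little-endian byte order is assumed, as on x86: new3[0] lands at the
   lowest address. Returns the value that was replaced. */
static uint32_t patch3_keep4th(_Atomic uint32_t *slot, const uint8_t new3[3]) {
    uint32_t old = atomic_load(slot);
    uint32_t desired;
    do {
        /* byte 3 of the slot (highest address) is bits 24..31 */
        desired = (old & 0xFF000000u)
                | (uint32_t)new3[0]
                | ((uint32_t)new3[1] << 8)
                | ((uint32_t)new3[2] << 16);
        /* on failure, `old` is refreshed with the current value */
    } while (!atomic_compare_exchange_weak(slot, &old, desired));
    return old;
}
```

The compare-and-swap is used instead of a plain exchange so that, if someone else changes the 4th byte between the read and the write, the loop retries with the fresh value rather than silently clobbering it.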

xerzi: Well, no, I'm not sure if the situation is possible. That's the point of this thread.
closed account (13bSLyTq)
Hi,

For example, when I hook X86SwitchTo64BitMode, we need to do an atomic patch using InterlockedExchange. So yes, it is possible.

Anyway, we are not leaving it unfinished, lol. If you are that worried about it, simply use a 0xE9 (a near JMP), but make sure to build a stub before jumping back to the "working" instructions, and make sure to perform the calculation in the callback. It would work.
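The 0xE9 route is a 5-byte near JMP whose 32-bit displacement is measured from the end of the instruction. Since 5 bytes is more than the 3-byte instruction in the original question, the jump also clobbers part of the following instruction, which is one reason the stub mentioned above has to re-execute whatever was overwritten before jumping back. A sketch of the encoding, with hypothetical names and addresses; on x64 the callback may simply be too far away for a rel32 jump, so the function reports that instead of truncating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode a near JMP (E9 rel32) placed at `site` that lands on `target`.
   The displacement is relative to the end of the instruction (site + 5).
   Returns false if the distance does not fit in a signed 32-bit value. */
static bool encode_jmp_rel32(uint64_t site, uint64_t target, uint8_t out[5]) {
    int64_t disp = (int64_t)(target - (site + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;
    uint32_t rel = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;                 /* opcode */
    out[1] = (uint8_t)rel;         /* rel32, little-endian */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
    return true;
}
```

A jump from 0x401000 to 0x402000 encodes as E9 FB 0F 00 00, since 0x402000 - 0x401005 = 0xFFB.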
Can someone explain to us laymen why the race condition is necessary? I've dabbled in modifying assembly code and WHEN it gets modified never seems to be an issue.
Can someone explain to us laymen why the race condition is necessary?
What do you mean "necessary"? An RC is a bad thing.

I've dabbled in modifying assembly code and WHEN it gets modified never seems to be an issue.
Well, timing only really becomes an issue in a concurrent context.
I guess my question could be better asked as "Why can't this offset address be modified before Thread A fetches the Op Code?" or "Why does it have to be concurrent?"

I can tell you right away that my question stems from my comparative lack of experience. Even when I've had to work around run-time integrity checks, I can get away with putting a break/pause point at the beginning of the function in question; I've never had to modify anything right when it is called. Thanks for the response.
Why can't this offset address be modified before Thread A fetches the Op Code?
Because user processes don't have such a fine-grained control of the scheduler or the CPU.

"Why does it have to be concurrent?"
The simplest form of remote hooking (i.e. hooking from another process) involves injecting a DLL by starting a remote thread. At this point, there are several ways of guaranteeing atomicity. One of them involves successively suspending and resuming all threads until none of their instruction pointers are within any of the address ranges that interest you.
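The safety check at the heart of that suspend-and-retry approach can be sketched as a pure function: with every other thread suspended, collect each thread's instruction pointer (on Windows, for instance, from GetThreadContext), and only patch if none of them falls inside a range being rewritten; otherwise resume everything and try again. Enumerating, suspending and resuming threads is left out, and the names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [begin, end) about to be rewritten. */
typedef struct {
    uintptr_t begin;
    uintptr_t end;
} addr_range;

/* True if any of the `n_ips` instruction pointers lies inside any of
   the `n_ranges` ranges. In a real hooker the IPs would come from each
   suspended thread's context; here they are plain inputs. */
static bool any_ip_in_ranges(const uintptr_t *ips, size_t n_ips,
                             const addr_range *ranges, size_t n_ranges) {
    for (size_t i = 0; i < n_ips; i++)
        for (size_t r = 0; r < n_ranges; r++)
            if (ips[i] >= ranges[r].begin && ips[i] < ranges[r].end)
                return true;
    return false;
}
```

If this returns true, the caller resumes all threads, waits briefly, and repeats the suspend-and-check cycle; because ranges are half-open, an IP equal to `end` counts as outside the patched bytes.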
It doesn't have to be concurrent; it's just the less dumb solution.

Even when I've had to work around run-time integrity checks, I can get away with putting a break/pause point at the beginning of the function in question; I've never had to modify anything right when it is called.
Well, that's the thing. With few exceptions, you can only assume that the function you're hooking can be synchronized if it's yours. If that's the case, there's no point in dabbling in run time hooking. Just modify the source code and be done with it.