IDA -> Inline Assembly [Possible]??

I am trying to write something in Visual Studio using inline assembly, but I'm not quite sure how to convert from IDA View disassembly. At first I thought that if IDA View looked like this (pulled from Google)...

CODE:004024B5     ; init ExtendedRegisters
CODE:004024B5
CODE:004024B8 164                 xor     ecx, ecx
CODE:004024BA 164                 mov     edx, 10h
CODE:004024BF 164                 call    c_memset        ; eax = buffer
CODE:004024BF                                             ; edx = count
CODE:004024BF                                             ; ecx = int
CODE:004024BF
CODE:004024C7 164                 xor     ecx, ecx
CODE:004024C9 164                 mov     edx, 44h
CODE:004024CE 164                 call    c_memset


Then Visual Studio conversion would be...

VOID ExtendedRegisters()
{
  __asm {
    //Save our registers
    push ecx
    push edx

    //Set registers and call function
    xor ecx, ecx
    mov edx, 10h
    call c_memset

    //Set registers and call function
    xor ecx, ecx
    mov edx, 44h
    call c_memset

    //Revert our registers to original data (pop in the reverse order of the pushes)
    pop edx
    pop ecx
  }
}


Basically I was thinking that whatever IDA shows, I must set the same registers in the same order. However, after looking at some examples online I realized I'm pretty lost. So could you guys either give me an IDA2InlineASM script that converts an IDA function into inline assembly that can be put in C++, or just post a basic example of a small function in IDA View converted to Visual Studio 2008 inline assembly, with a short explanation of how you converted it? For example: do you push and pop the registers in reverse order, which ones do I need to set, and why is some of it left out? Any little bit of info like that helps.
You have not set eax, nor do the values in edx and ecx correctly match what you want to do by calling memset.
I think you're missing my point; like I said, I pulled that code from Google. I can PM you my code when I get back home if you want; that was just an example. Basically I'm asking how to convert a function I'm reading in IDA View to inline assembly in Visual Studio C++.
I think you are missing Zaita's point. Just because you found something, somewhere on the internet doesn't make it smart. And when messing with disassemblers you need to be very careful about register values --compilers will often move things around because they know something about what values a register holds. Simply cut-n-paste'ing code is dangerous.

https://www.openrce.org/repositories/users/dennis/launch_image_in_memory.html
Notice how it takes care to provide an address in EAX?

Also, c_memset is non-standard, and the only references I can find for it are for IDA and Haskell. The standard cdecl calling convention expects the caller to push the arguments (right to left) and to pop them off again after the call:
void fooey()
  {
  char s[ 5 ];
  // in the following, I don't care about clobbering EAX and ECX
  // --because VC++ doesn't expect them to be preserved
  __asm {
    // count = 5
    mov  eax, 5
    push eax
    // value = 0
    mov  eax, 0  // people used to use xor eax, eax because on
    push eax     // really old processors it made a difference in speed
    // destination = s
    lea  eax, byte ptr s
    push eax
    // memset( s, 0, 5 )
    call memset
    // (eax has the return value, which is s)
    // clean up: the caller removes the three arguments it pushed
    pop  ecx
    pop  ecx
    pop  ecx

    mov  byte ptr [eax],   'A'  // s[ 0 ] = 'A'
    mov  byte ptr [eax+1], 'B'  // s[ 1 ] = 'B'
    mov  byte ptr [eax+2], 'C'  // s[ 2 ] = 'C'

    // puts( s )
    push eax
    call puts
    pop  eax
    }
  }
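For comparison, here is roughly what that __asm block does, written as ordinary portable C++. This is a sketch: 'fooey_portable' is a made-up name, and it returns the string only so the result can be checked. The memset call takes its three arguments in the reverse order of the three pushes above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Portable sketch of fooey()'s __asm block (fooey_portable is a hypothetical name).
std::string fooey_portable()
{
    char s[ 5 ];
    std::memset( s, 0, sizeof s );  // memset( s, 0, 5 ): the three pushes, read in reverse order
    s[ 0 ] = 'A';
    s[ 1 ] = 'B';
    s[ 2 ] = 'C';                   // s[ 3 ] and s[ 4 ] are still 0, so s is a terminated string
    std::puts( s );                 // prints ABC followed by a newline
    return std::string( s );
}
```

Seeing the plain C++ first makes it easier to check that the assembly pushes exactly the arguments memset expects.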


You can learn everything you want to know about inline assembly from Microsoft
http://msdn.microsoft.com/en-us/library/4ks26t93(VS.71).aspx

Another useful link (has an example of a pure assembly function)
http://www.cplusplus.com/forum/beginner/3280/

Also (just to be sure), you have to know at least basic Intel/MASM assembly if you plan to use it.

Good luck!

[edit] I've never used VC++ __asm before. The 'lea eax, byte ptr s' line might be wrong. Perhaps it is just lea eax, s ?
In VC++ you have to specify what kind of pointer it is: byte, word or dword.
xor eax, eax may still sometimes be faster than mov eax, 0, afaik.
And I think if you are not using naked functions you don't even have to push (save) EAX yourself, because the compiler generates a prolog and an epilog.
PM me the code.
Going to try and explain myself a little bit better in two separate posts. The first post (this one) is going to try and explain a few things that I guess I didn't make clear before. The second post (below this) is going to ask the same question, reformatted, so that hopefully you guys understand my question more as well as including an IDA-View code snippet of the ACTUAL code that I'm trying to convert to C++.

Hehe, alright, maybe I didn't explain myself properly, because no one seems to understand what I'm asking; let me try again. First of all, Duoas, that did help me a little bit, but again it isn't exactly what I was asking. The code that I posted above was something I found in IDA View on Google (no IDA on my work computer) and was just a snippet of a much larger disassembly of some random file.

That IDA assembly I posted has NOTHING to do with what I want to accomplish; as a matter of fact I don't even have a clue what file it's disassembling. The reason I posted the IDA View is to show you guys how I have been converting to inline assembly (finding the function in IDA, copying the assembly, then pasting it in the same order into a Visual Studio __asm block, except obviously changing the formatting so it will compile). I know it's wrong; I was just showing you what I've been doing so far. I also know that c_memset is not a valid assembly expression and I would have to replace it with whatever the address of the function in the exe is.
Inline Assembly Questions


Question 1
I've always been a little confused about how C++ knows what function you're writing the assembly for. I have an idea, which I'll show with the inline assembly example below. Note that this is just assembly written off the top of my head as an example, and most likely is not coded properly or won't compile.

//Declare MovePlayer function's address in the program
#define MovePlayerToXYZCoords 0x40DD40


//Contains assembly for 0x4000A0 which is a function that
//sets these registers (eax, ecx and edx) which contain the
//new X, Y and Z locations and then passes them to the
//MovePlayer function which uses the registers to move you
//to the new location.
VOID MoveLocation(float x, float y, float z)
{

//From my very little knowledge of assembly it seems
//that unless I set parameters as local variables, the
//assembly won't compile. So I just set some local variables
//pulled from the parameters here so it will compile.
float WarpXLoc = x;
float WarpYLoc = y;
float WarpZLoc = z;
__asm {
//Initialize
    push eax
    push ecx
    push edx

//Mov X, Y, and Z into registers with local variables in brackets
//because from what I've read they must be.
    mov eax,[x]
    mov ecx,[y]
    mov edx,[z]

//Call move function with new registers
call MovePlayerToXYZCoords

//Revert (pop in the reverse order of the pushes)
    pop edx
    pop ecx
    pop eax
  }
}


So my guess is that C++ doesn't actually NEED to know what offset the assembly is from. The reason is that when you set whatever registers you plan to set and then call the function you're passing them to, the function you're calling will use these registers to do whatever it is supposed to do. I may be wrong; this is just a guess and wasn't read anywhere or anything like that.

Question 2
What happens when you find a function you're trying to write assembly for and it uses the same register to declare two different variables that are both needed? Look below for an example, using a replacement for the MoveLocation function posted in Question 1.

VOID MoveLocation(float x, float y, float z)
{
float WarpXLoc = x;
float WarpYLoc = y;
float WarpZLoc = z;

__asm {
    push eax
    
//Mov X, Y and Z locations into the eax register. This
//is my guess on how to use the same register for 
//three different variables, I may be wrong and please
//tell me the correct method if I am wrong.
    mov eax,[x]
    push eax

    mov eax,[y]
    push eax

    mov eax,[z] 
    push eax

//Call MovePlayer function
    call MovePlayerToXYZCoords
    
//Revert
    pop eax
  }
}


Because if my guess in Question 1 is right, when you use inline assembly and call a function, the called function takes the passed registers to do whatever it does. So what happens if eax is used for X, Y, and Z? Wouldn't the eax register get overwritten each time, so that you're really just passing eax with the z parameter in the end?

Question 3
So here's my final question, the one that I really came on the forums to ask. I need to convert the following IDA assembly code to inline assembly in Visual Studio. How would I go about doing this? Look below for the code, I removed all the ".text:0xoffset's" on the left to make it easier to read. I've also marked up the 3 areas I will be editing here.

mov     eax, dword_980D84
test    eax, eax
jz      short loc_40A365
mov     ecx, dword_7B8440
push    ebx
push    0
//Variable One (0x24A1 = opcode)
push    24A1h
call    sub_658FB0
mov     ecx, dword_980D84
//Variable Two (2 = size)
push    2               ; Size
push    offset word_97F2B8 ; int
push    4               ; int
//Variable Three (ax = data)
mov     word_97F2B8, ax
call    sub_64A430
//Not used in my assembly, a little unsure what 
//exactly it does (0 = memory?)
push    0               ; Memory
mov     bl, al
call    j_j__free
mov     ecx, dword_A49E3C
add     esp, 4
dec     ecx
test    bl, bl
mov     dword_A49E3C, ecx
pop     ebx
jnz     short locret_40A380
mov     ecx, dword_7B8444
push    offset dword_980D84
call    sub_440B90
retn
Xor eax, eax may be still sometimes faster than mov eax, 0 afaik.

Only if you count the amount of time it takes to load the instruction, since the XOR is two bytes and the MOV EAX is five. (So XOR is still better...)
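To make the size difference concrete, here are the raw 32-bit x86 encodings of the two instructions as C++ byte arrays. This is an illustration only; an assembler may also encode xor eax, eax as 33 C0, which is the same length.

```cpp
#include <cassert>

// Raw 32-bit x86 machine code for two ways of zeroing EAX.
const unsigned char xor_eax_eax[] = { 0x31, 0xC0 };                   // xor eax, eax  (opcode 31, ModRM C0)
const unsigned char mov_eax_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; // mov eax, 0    (opcode B8 + 32-bit immediate)

static_assert( sizeof xor_eax_eax == 2, "XOR form is two bytes" );
static_assert( sizeof mov_eax_0 == 5, "MOV form is five bytes" );
```

One thing to keep in mind: unlike MOV, XOR also modifies the flags, so the two are not interchangeable right before a conditional jump that relies on flags set earlier.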

[SHOUTING]
Going to try to explain my answer a little bit better, but on just one single post because I think you are intelligent enough to notice it without resorting to spamming.

(NB. You are lucky you are getting any response at all. You've just done the internet equivalent of biting the hand that feeds you, then adding insult to injury. If you don't think you have got a satisfactory answer, don't just turn up the volume and repeat the question. The reason I am responding at all is because of the clear effort you made to explain yourself better --so I assume that the inordinate rudeness was an accidental faux pas.)


Answer 1
If you stick inline assembly in the middle of a function, C and C++ presume that the assembly code is part of that very function, and simply insert the assembly code (assembled to machine code, of course) into the function exactly where your __asm statement(s) are. There is no magic to it.

But, if I understand your example correctly, you are wondering about the local variables x, y, and z.

Typically, functions have specific entry and exit frame code, or (as jmc referred to it), a prolog and an epilog. Basically it means that a set amount of code is inserted at the beginning and end of a function. It works something like this simplified example:
int quux( int x, int y )
  {
  int z = x + y;
  return z / (x - y);
  }

The prolog adjusts the stack's frame pointer (EBP on Intel architectures) and makes room on the stack for all local variables. A "frame" is simply a reference to a specific position on the program's stack. (For purposes of illustration, I will not optimize that local variable out of existence, and I will assume we are using the default cdecl calling convention.)
  ; when the function is called, the stack looks like this:
  (return address) <-- ESP  ; (lower addresses)
  (x)                       ; (function arguments were pushed right to left)
  (y)
  ...              <-- EBP  ; (higher addresses)
_quux:
  ; The function prolog
  push ebp       ; preserve the last function's frame
  mov  ebp, esp  ; and set this function's frame
  sub  esp, 4    ; make room for our local variable: sizeof( z )
  ; now the stack looks like this
  (z)              <-- ESP
  (previous EBP)   <-- EBP
  (return address)
  (x)                       ; (function arguments were pushed right to left)
  (y)
  ...
  ; The function body
  mov  eax, [ebp+8]   ; x
  add  eax, [ebp+12]  ; + y
  mov  [ebp-4], eax  ; --> z

  mov  ecx, [ebp+8]   ; x
  sub  ecx, [ebp+12]  ; - y
  mov  eax, [ebp-4]   ; z
  cdq                 ; sign-extend EAX into EDX (high-order dword of the signed dividend)
  idiv ecx            ; z / (x - y) --> quotient eax, remainder edx
  ; The function epilog
  ; (eax contains the function's return value)
  mov  esp, ebp  ; restore the top of the stack to what it was before allocating space for local variables
  pop  ebp       ; restore the previous function's frame
  ret            ; return to previous function (the cdecl calling convention says
                 ; that the caller must remove the arguments x and y from the stack)
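If the offsets are hard to picture, here is a toy model of the stack in C++. It replays the caller's pushes and the prolog and lets you check where x and y end up relative to EBP. This is a sketch, not how any real compiler works: the starting ESP/EBP values and the return address are made up, and 'ToyStack' and 'build_quux_frame' are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of the 32-bit x86 stack: addresses are plain integers and
// each push moves ESP down by 4 bytes, just like the real thing.
struct ToyStack
{
    std::map<std::uint32_t, std::uint32_t> mem;  // address -> 4-byte value
    std::uint32_t esp = 0x1000;                  // made-up top of stack
    std::uint32_t ebp = 0x2000;                  // made-up caller frame pointer

    void push( std::uint32_t value ) { esp -= 4; mem[ esp ] = value; }
    std::uint32_t at( std::uint32_t address ) const { return mem.at( address ); }
};

// Replays a cdecl call to quux( x, y ) up to the end of its prolog.
ToyStack build_quux_frame( std::uint32_t x, std::uint32_t y )
{
    ToyStack s;
    s.push( y );            // caller pushes arguments right to left...
    s.push( x );
    s.push( 0xDEADBEEF );   // ...then CALL pushes a (made-up) return address
    s.push( s.ebp );        // prolog: push ebp
    s.ebp = s.esp;          //         mov  ebp, esp
    s.esp -= 4;             //         sub  esp, 4   (room for z at ebp-4)
    return s;
}
```

With build_quux_frame( 7, 3 ), the value at ebp+8 is 7 (x), at ebp+12 is 3 (y), at ebp+4 is the return address, and at ebp is the saved caller EBP, matching the [ebp+8] and [ebp+12] operands used in the function body.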

So, when the compiler comes across something like
mov eax, [z]
it knows that you want to move the contents of memory at address 'z' into the EAX register. The compiler looks in its current symbol table and discovers that 'z' is a local variable at (in my example) ebp-4. So it replaces [z] with [ebp-4] and assembles that into machine code.

When you call a function, like
call _quux
again the compiler looks in its current symbol table to find the actual address of the function 'quux', and again produces the correct machine code.

Exact details of how the stack frame is constructed and used depend on the function's calling convention. There are many besides the old cdecl.

(Hopefully you understand now why I said that you must know basic assembly to use this stuff.)


Answer 2
The registers EAX, ECX, and EDX are typically understood to be "use as you will". Other registers (including EBX) may (or may not) have restrictions on them. Exactly which registers you can clobber and which registers you must preserve depend on your compiler. You will have to read the documentation to know for sure.

Your assumption is wrong: the variables aren't being declared. They already exist. They are just being used.

In the example you posted, EAX is used as a temporary value. (It wasn't strictly necessary --the code could have been written as just
  push [x]
  push [y]
  push [z]
)
In any case, EAX is changed each time you assign it a value (using the MOV instruction, or any other instruction that modifies it).
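Here is a toy C++ model of the code from Question 2 that shows why this works. A float variable stands in for EAX's 32 bits, and a vector stands in for the stack. Each push copies the register's current value, so reusing the register afterwards does not disturb what was already pushed. 'reuse_eax' and 'PushedArgs' are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <vector>

// What the "register" and the "stack" hold after Question 2's three mov/push pairs.
struct PushedArgs
{
    float eax_after;           // value left in the register at the end
    std::vector<float> stack;  // pushed values, in push order
};

PushedArgs reuse_eax( float x, float y, float z )
{
    PushedArgs r{};
    float eax;
    eax = x; r.stack.push_back( eax );  // mov eax, [x] / push eax
    eax = y; r.stack.push_back( eax );  // mov eax, [y] / push eax  (overwrites the register, not the stack)
    eax = z; r.stack.push_back( eax );  // mov eax, [z] / push eax
    r.eax_after = eax;                  // only the last value is still in the register
    return r;
}
```

So the register does end up holding only z, but the stack holds all three values, and a cdecl function reads its arguments from the stack, not from EAX.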

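Now, if you were using the __fastcall calling convention (MS VC++), you could just use ECX and EDX for the first two arguments, provided they are DWORD-sized (or smaller) integers or pointers. (As far as I know, float arguments still go on the stack under __fastcall, so this example uses ints.) The function
void __fastcall MovePlayerToXYZCoords( int x, int y, int z ) would be called with: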
  mov ecx, [x]
  mov edx, [y]
  push [z]
  call @MovePlayerToXYZCoords@12
  ; The function cleaned up its own stack, so we don't need to pop anything here
  ; Also, the function returns nothing, but if it returned an ordinal value it would be in EAX (not on the stack) 
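The '@12' in that name comes from MSVC's 32-bit decoration for a C-linkage __fastcall function: an '@', the name, another '@', and the number of bytes in the parameter list, in decimal. Below is a small sketch of that rule. It assumes each argument is rounded up to a 4-byte slot, and 'fastcall_decorated' is a hypothetical helper. C++-linkage names get a different, more complicated mangling.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Sketch of the 32-bit MSVC decoration of a C-linkage __fastcall name:
// '@' + name + '@' + total bytes of the parameter list, in decimal.
// Assumption: each argument is rounded up to a 4-byte slot.
std::string fastcall_decorated( const std::string& name,
                                std::initializer_list<std::size_t> arg_sizes )
{
    std::size_t bytes = 0;
    for ( std::size_t size : arg_sizes )
        bytes += ( size + 3 ) / 4 * 4;   // round each argument up to a multiple of 4
    return "@" + name + "@" + std::to_string( bytes );
}
```

For example, fastcall_decorated( "MovePlayerToXYZCoords", { 4, 4, 4 } ) gives "@MovePlayerToXYZCoords@12". The byte count includes the two arguments passed in registers.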



Answer 3
All those things named 'dword_12345' are the names the IDA disassembler gives to things it doesn't know the names of. This is typical of global variables (the names the original programmer gave them are not stored in the compiled executable) and of temporary values produced by the compiler (which don't have a name to begin with).

You cannot use these IDA names in your inline assembly.

The C++ compiler would have no clue what you are talking about. And, even if you did by some lucky miracle get some name right:
1) it could change the next time you compile and/or disassemble
2) the name wouldn't be listed in the local symbol table the compiler keeps for user names. (It keeps a separate table for symbols that it makes up.)

Your first task will be to figure out what each of those names are.
dword_980D84 = player_count (global variable)
dword_7B8440 = ?
sub_658FB0   = adjust_player_hitpoints (function name) [It should be obvious that I am making these names up]
...

The things with names like 'loc_40A365' are address labels. If there isn't a label at the given address, you'll have to add one.
  jz short return_label
  ...

  return_label:
  ; do your epilog
  retn


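Make sure you look up the meanings of the instructions you are using. RETN means "near return". In 32-bit Windows code, with its flat memory model, ordinary calls and returns are near, so that is the normal return instruction. Just make sure it matches how your routine is actually called.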

Beyond that I cannot guess much to what the code you posted means or does.

Hopefully this helps you get nearer to understanding your goal.
I have not read your entire reply yet, I am just about to. Right now I just read that first comment you made about how I am lucky you responded at all. I am extremely sorry if you took my response the wrong way, I absolutely did not mean to sound rude in any way. I just simply thought to myself that I must have worded the question wrong, because even though it sounded right to me it seemed as though the responses people gave me had thought I was asking about something I wasn't. Please don't hold my response against me, because again, I absolutely meant no disrespect to anyone at all.

I want everyone that may have taken offense to what I said to please not hold it against me. I highly respect everyone on this website that devotes their personal time to helping others, sorry if it came out differently than that. Thanks for the big write up Duoas, I'm going to read it now. To everyone else that offered their help, I appreciate that as well - I wasn't saying I didn't appreciate it, merely that I think you may have misunderstood what I was asking.
Yes, that did help me a ton thanks.
Full IDA view of the function is found here - http://pastebin.com/d376b6aea
Thanks a lot for that write-up Duoas, much appreciated and well written. I'm going to be doing a little more research and learning on assembly so I can better understand everything, I provided you a link with the full process (not much was missing) but if you do not want to convert that to inline for me then it's no problem, I need to learn more myself regardless. Any further help appreciated, but what's provided is already more than I expected.
That link was useful. If you get hung up anywhere I (or someone else here) will be glad to help.

It looks like you can pretty much copy most of the function into some __asm blocks.

Just a few pointers:

Make sure that your C routine is reached with a near call. (In 32-bit VC++, everything lives in one flat address space, so ordinary calls are already near. As far as I know, the old 16-bit __near keyword isn't needed there. I can't get the darn thing to install yet, so I haven't used it in a while...) Oh yeah, almost forgot: you'll have to find all the places where the old function was called and fix them to call your new function's address. You can only do this after you have compiled in the new code.

Rename the labels (that start with 'loc_') to something better, like l_send_message: and l_return:

For the labels that start with 'sub_', find in the IDA listing the line with the address. For example, for 'sub_658FB0', find the routine that starts at 658FB0 (the front of the line in IDA should read
.text:658FB0:
).
At this point, you can see the code for that function, so you can try to figure out what its actual name is. Once you know, replace the 'sub_658FB0' with the actual name --case sensitive. You may also have to mangle the name to match the calling convention and your compiler's type info additions.

For the stuff that starts 'dword_', 'word_', and 'byte_', you should be able to find those in the .data segment (using the same way you did for the routine). However, chances are that you can just drop the address in there directly (assuming you don't change any variables or constants in the program --otherwise you'll have to use its proper name).
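IDA's auto-generated names put the address directly after the prefix, so you can split them mechanically when you are hunting for the matching line in the listing. Below is a rough C++ sketch of that. 'IdaName' and 'parse_ida_name' are hypothetical helpers, and only the prefixes that appear in this thread are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: the pieces of an IDA auto-generated name.
struct IdaName
{
    std::string prefix;     // "sub", "loc", "locret", "dword", "word" or "byte"
    std::uint32_t address;  // the hex address that follows the prefix
};

// Splits names such as "sub_658FB0" or "dword_980D84" into prefix and address.
// Returns false for anything else (for example "j_j__free").
bool parse_ida_name( const std::string& name, IdaName& out )
{
    static const char* const prefixes[] = { "sub", "locret", "loc", "dword", "word", "byte" };
    for ( const char* p : prefixes )
    {
        const std::string prefix = std::string( p ) + "_";
        if ( name.compare( 0, prefix.size(), prefix ) != 0 )
            continue;                                   // not this prefix
        const std::string hex = name.substr( prefix.size() );
        if ( hex.empty() || hex.size() > 8 ||
             hex.find_first_not_of( "0123456789ABCDEFabcdef" ) != std::string::npos )
            return false;                               // not a plain 32-bit hex address
        out.prefix  = p;
        out.address = static_cast<std::uint32_t>( std::stoul( hex, nullptr, 16 ) );
        return true;
    }
    return false;
}
```

For example, "sub_658FB0" splits into "sub" and 0x658FB0, which is the routine you would look for in the listing. "dword_980D84" splits into "dword" and 0x980D84, which is data you would look for in the .data segment.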

Don't put the retn statement in. Let the compiler do that itself for the wrapper function.

Test your modified program as thoroughly as you can. You will probably have to try a few times to get it right.

Good luck!