symbols?

Hi guys,

I'm looking for a good resource, or even just an explanation, on symbols. I'm currently reading about calling conventions and they seem to click: put simply, a calling convention is just how a function gets called, i.e. how the parameters are passed (the first few in registers for __fastcall, on the stack for __cdecl) and who cleans up the stack, the caller or the callee. Then I came across __declspec(dllimport) and __declspec(dllexport), and the explanations I have looked at talk a lot about symbols. What are these symbols, and what does __declspec do to them?

From my best understanding, which is still somewhat unclear, a symbol is just the function name that the compiler uses. For example, since C++ supports function overloading, it will add something like __int__char or __char__char__ to the end of the function name. In C this is not done, and in C++ we can explicitly turn it off with extern "C".

thanks
A symbol is a name that's visible to the linker (either the build-time linker or the run-time dynamic linker). A symbol may point to either code or data.

__declspec has nothing to do with the calling convention. It tells the compiler what sort of symbol a declaration should produce and how to mangle the name.
__declspec(dllimport) is used when the declared function or variable is defined in a dynamically-linked library that you link against through its import library. It is not needed for a statically-linked library.
__declspec(dllexport) is used when compiling a library and the symbol in question should be visible to a user of the library.
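To make that concrete, here's a minimal sketch of the usual header pattern: one macro that expands to dllexport while the library is being built and to dllimport for everyone who uses it. The names MYLIB_API, MYLIB_SHARED, MYLIB_BUILD, mylib_add and check_mylib are all made up for illustration; on non-Windows compilers the macro expands to nothing, so the snippet still builds there.

```cpp
#include <cassert>

// mylib.h (hypothetical header; every name here is invented for this example)
#if defined(_WIN32) && defined(MYLIB_SHARED)
  #ifdef MYLIB_BUILD
    #define MYLIB_API __declspec(dllexport)  // compiling the DLL itself: export the symbol
  #else
    #define MYLIB_API __declspec(dllimport)  // compiling a user of the DLL: import the symbol
  #endif
#else
  #define MYLIB_API                          // static build or non-Windows: no attribute needed
#endif

// extern "C" avoids C++ name mangling, so the exported name stays predictable.
extern "C" MYLIB_API int mylib_add(int a, int b);

// mylib.cpp (would be compiled with MYLIB_SHARED and MYLIB_BUILD defined when building a DLL)
extern "C" MYLIB_API int mylib_add(int a, int b) { return a + b; }

// A caller only includes the header and calls the function as usual.
void check_mylib() { assert(mylib_add(2, 3) == 5); }
```

The calling code looks the same either way; the attribute only changes which symbol the compiler and linker expect to find.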
Those look the same but are not. They are for invoking code in a library: import means you want to use a DLL, and export means your code will produce one. They are NOT calling conventions. They are Windows-specific; Unix does not use the same approach to libraries.

If I understood the question, the symbols are indeed the names of the functions, and yes, they are 'mangled' to make unique versions because of overloading and other similar reasons. The mangled names sometimes include ID numbers and other nonsense. I think the DLL import/export attributes use a known mangling scheme, and using them may trigger the compiler to apply that method of symbol manipulation. But this is where I forget how all that works; it's not something one cares about much from year to year, only once in a while when doing assembly or other very low-level work, or if you have to write a compiler yourself, etc.
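A small sketch of the mangling point, with made-up function names (describe, describe_c, check_mangling). The mangled names in the comments are what GCC and Clang typically produce under the Itanium C++ ABI; MSVC uses a different scheme, so treat them as an illustration rather than something to rely on.

```cpp
#include <cassert>
#include <string>

// Two C++ overloads: same source name, different parameter lists.
// The compiler gives each one its own symbol so the linker can tell them apart.
// With GCC/Clang these are typically _Z8describeic and _Z8describecc.
std::string describe(int, char)  { return "int,char"; }
std::string describe(char, char) { return "char,char"; }

// extern "C" switches C++ mangling off, so the symbol is essentially just "describe_c"
// (some 32-bit Windows conventions still add a leading underscore).
// The name carries no type information, which is why C-linkage functions cannot be overloaded.
extern "C" const char *describe_c(int, char) { return "C linkage"; }

void check_mangling() {
    assert(describe(1, 'a') == "int,char");     // exact match for (int, char)
    assert(describe('a', 'b') == "char,char");  // exact match for (char, char)
    assert(std::string(describe_c(1, 'a')) == "C linkage");
}
```

If you want to see the real names for your toolchain, a symbol dumper such as nm (GNU) or dumpbin /symbols (MSVC) run on the object file will list them.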


Oh OK, that is somewhat clicking. So what I know is this:

Symbols, which essentially represent classes/structs, variables and functions, are kept in a symbol table by the compiler. This symbol table tells us whether each symbol is defined or undefined; an example with code is below.

The compiler compiles main.c and recognizes that it calls a function named hello. hello is declared but not defined in that file; making sure a definition exists somewhere is the linker's job, so the compiler adds an undefined symbol for _hello to the symbol table.

The compiler compiles one.c and adds hello to its symbol table, this time as a defined symbol.

Now the linker links the two object files, and since hello() is defined in one of them we should get no undefined symbol error.

// one.h
#include <stdio.h>

void hello(void);

// one.c
#include "one.h"

void hello(void){ printf("hello"); }

// main.c
#include "one.h"

int main(void){ hello(); }
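The defined/undefined bookkeeping from the walk-through above can be sketched as a toy model. This is not how a real linker or object format is laid out; ObjectFile, unresolved_symbols and link_demo are invented names, and the symbol tables are filled in by hand.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an object file's symbol table (not a real object format).
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;    // symbols this file provides
    std::set<std::string> undefined;  // symbols this file references but does not define
};

// Toy "link" step: gather every definition, then report references nobody satisfied.
std::set<std::string> unresolved_symbols(const std::vector<ObjectFile> &objects) {
    std::set<std::string> defined;
    for (const auto &obj : objects)
        defined.insert(obj.defined.begin(), obj.defined.end());

    std::set<std::string> unresolved;
    for (const auto &obj : objects)
        for (const auto &sym : obj.undefined)
            if (defined.count(sym) == 0)
                unresolved.insert(sym);
    return unresolved;
}

void link_demo() {
    ObjectFile main_o{"main.o", {"main"}, {"hello"}};   // calls hello, does not define it
    ObjectFile one_o{"one.o", {"hello"}, {"printf"}};   // defines hello, calls printf
    ObjectFile libc{"libc", {"printf"}, {}};            // the C library provides printf

    // All three together: every reference finds a definition.
    assert(unresolved_symbols({main_o, one_o, libc}).empty());

    // Leave out one.o and hello is the classic "undefined symbol" error.
    assert(unresolved_symbols({main_o, libc}) == std::set<std::string>{"hello"});
}
```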


To expand on this (and this may get quite heavy): I'm sure each function or symbol has an offset marking where it begins in the object file, so how is this offset translated into a physical (well, virtual) memory address? I have heard the term relocation being thrown around.

Also, let's say that instead of linking two .obj files, hello() is defined inside a dynamic library. Is the process different here? Obviously we use __declspec(dllexport), but how does this relate?

thanks guys
"I'm sure each function or symbol has an offset as to where it begins in that object file, so how is this offset translated into a physical (well virtual) memory address?"
The linker first needs to decide where to place it in the executable file, then it can figure out how it will map regions of the file to memory.

The simplest case (only usable for regular programs) is where the linker just sets up the binary so the OS will load the pages at fixed locations in virtual memory. For example, the process' entry point might always be at 0x4000000, which would be specified somewhere in the executable format. Obviously, since all the code is at fixed locations, once the code has been copied or mapped to memory there's nothing else to do, because all memory references, jumps, etc. were generated by the linker knowing where everything was going to be at run-time.

It's also possible to generate relocatable code, which is what's used for dynamic libraries. Dynamic libraries must be relocatable, because there's no guarantee that any particular page will be unoccupied by some other piece of code or data when the process attempts to load the library.
Relocatable code works quite simply. Following the idea that "all problems in computer science can be solved by adding another level of indirection", instead of generating direct references to memory addresses, the linker generates references to a table of memory addresses. When the executable or dynamic library is loaded by the OS, the OS's load routine must decide where to load it and then fill the relocation table with the correct addresses, according to the rules established by the binary format.
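As a rough illustration of that extra level of indirection, here's a toy model where the "module" never calls a function directly but always goes through a table slot that a pretend loader fills in. All the names (address_table, call_through_table, twice, toy_load) are invented for the sketch. Real formats keep tables that are similar in spirit, such as the import address table in PE or the GOT in ELF, but with much more machinery around them.

```cpp
#include <cassert>

// Toy model only; not a real loader.
using Fn = int (*)(int);

// The table of addresses. It starts out empty because, at "link time",
// nobody knows yet where the target will end up in memory.
static Fn address_table[1] = { nullptr };

// Code in the module never hard-codes the target address;
// it reads whatever address is currently stored in the table slot.
int call_through_table(int x) {
    assert(address_table[0] != nullptr);  // the loader must have run first
    return address_table[0](x);
}

// The routine whose final address is only known at load time.
int twice(int x) { return 2 * x; }

// The pretend loader: once it has decided where things live, it fills in the table.
void toy_load() { address_table[0] = &twice; }
```

After toy_load() has run, call_through_table(21) ends up in twice() without the calling code ever having contained its address.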

I've actually had to implement the relocation routine when I needed to load a DLL without passing through Windows' LdrLoadDll(), because the process was intercepting those calls. This is what that looks like:
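__declspec(dllexport) std::uint32_t __stdcall load_dll(void *p){
    auto manual_inject = (MANUAL_INJECT *)p;
    auto pIBR = manual_inject->base_relocation;
    // delta = where the image actually got loaded minus where it was linked to load.
    auto delta = (uintptr_t)manual_inject->image_base - (uintptr_t)manual_inject->nt_headers->OptionalHeader.ImageBase;

    //Relocate the image.

    // Walk the relocation blocks; a block with VirtualAddress == 0 ends the list.
    while (pIBR->VirtualAddress){
        if (pIBR->SizeOfBlock >= sizeof(IMAGE_BASE_RELOCATION)){
            // Each block is a header followed by an array of 16-bit entries.
            auto count = (pIBR->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD);
            auto list = (WORD *)(pIBR + 1);

            for (decltype(count) i = 0; i < count; i++){
                // Zero entries are padding and are skipped.
                if (list[i]){
                    // Low 12 bits: offset within the page at pIBR->VirtualAddress.
                    auto ptr = (uintptr_t *)((char *)manual_inject->image_base + ((uintptr_t)pIBR->VirtualAddress + (list[i] & 0xFFF)));
                    // Patch the stored address so it points into the real load location.
                    *ptr += delta;
                }
            }
        }

        // Move on to the next block.
        pIBR = (IMAGE_BASE_RELOCATION *)((char *)pIBR + pIBR->SizeOfBlock);
    }

    //...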
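For anyone who wants to play with the idea without a real DLL, here's a portable toy version of a single relocation block, assuming the PE layout where each 16-bit entry holds a type in its top 4 bits and a page offset in its low 12 bits. apply_block, relocation_demo and the k... constants are made-up names, the "image" is just a byte buffer, and the addresses are arbitrary. Unlike the loop above, the sketch looks at the type bits and only handles the 64-bit case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr unsigned kRelAbsolute = 0;   // padding entry, nothing to patch (IMAGE_REL_BASED_ABSOLUTE)
constexpr unsigned kRelDir64    = 10;  // patch a full 64-bit address (IMAGE_REL_BASED_DIR64)

// Adds `delta` to every 64-bit address slot named by `entries`.
// `page_rva` plays the role of the block's VirtualAddress (offset of the page in the image).
void apply_block(std::vector<unsigned char> &image, std::uint32_t page_rva,
                 const std::vector<std::uint16_t> &entries, std::uint64_t delta) {
    for (std::uint16_t entry : entries) {
        unsigned type = entry >> 12;                      // top 4 bits: relocation type
        std::size_t offset = page_rva + (entry & 0x0FFFu); // low 12 bits: offset in the page
        if (type != kRelDir64) continue;                  // skips kRelAbsolute padding and unhandled types
        assert(offset + sizeof(std::uint64_t) <= image.size());
        std::uint64_t value;
        std::memcpy(&value, &image[offset], sizeof value);  // memcpy sidesteps alignment issues
        value += delta;
        std::memcpy(&image[offset], &value, sizeof value);
    }
}

void relocation_demo() {
    const std::uint64_t preferred = 0x140000000ULL;      // base the toy image was "linked" for
    const std::uint64_t actual    = 0x7FF600000000ULL;   // base it was "loaded" at instead
    std::vector<unsigned char> image(0x2000, 0);

    // The linker stored an absolute address assuming the preferred base.
    std::uint64_t stored = preferred + 0x1234;
    std::memcpy(&image[0x1010], &stored, sizeof stored);

    // One real entry (type 10, offset 0x010 in page 0x1000) plus one padding entry.
    std::vector<std::uint16_t> entries = {
        static_cast<std::uint16_t>((kRelDir64 << 12) | 0x010),
        static_cast<std::uint16_t>(kRelAbsolute << 12)
    };
    apply_block(image, 0x1000, entries, actual - preferred);

    std::uint64_t patched;
    std::memcpy(&patched, &image[0x1010], sizeof patched);
    assert(patched == actual + 0x1234);  // same offset, new base
}
```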