OK, assembly written for x64 generally can't run on x86 CPUs, because the x64 instruction set is a superset of x86's. That aside, as far as I can see, there are only 3 issues that stop assembly from being cross-platform between OSs.
1. Syntax of assemblers, e.g. ml64.exe accepts different syntax than whatever assembler x64 Linux uses.
2. Calling conventions vary between OSs.
3. API functions.
Now say you have a C++ application from which you want to call some assembly functions (extern "C").
1. As far as I see it, this is a non-issue, as syntax consistency can easily be achieved by using (or writing) an assembler that accepts the same syntax on both platforms.
2. This is a tough one, but say on x64 Linux the 4th (quadword) parameter is in rcx, while on x64 Windows the 1st parameter is in rcx. So you can write C++ code that stores all of the function's arguments inside a struct. Then you can call an assembly function (via 'extern "C"') and pass a pointer to that struct as both the 1st and 4th arguments (put some junk in the other 2 arguments). Then when writing the assembly, you know that rcx will point to the struct that holds all the parameter data.
The other thing to consider is 'scratch' registers (which the caller must save) vs non-scratch registers. This can be overcome by simply making sure that both platforms' sets of non-scratch registers get preserved by the callee. Inefficient, but it should work.
3. Simply don't use these. Plenty of useful assembly code (SSE, AVX optimizations, etc.) can avoid using any API functions. Or if the function for example needs heap space, then that space can be allocated outside the assembly function (in cross-platform C++) and a pointer to it can be passed in the struct mentioned in '2'.
Is there anything else to consider? Has anyone tried something like this?
Assembly in some cases is cross-platform. The problem is how it's handled per assembler and compiler. For instance, I have some cpuid code that works across most x86 processors, regardless of platform.
Calling conventions don't really differ that much in C across platforms. Each platform has an ABI (which dictates the calling convention); Windows has its own, and on a given platform most compilers follow the same one, including Intel, GCC, Clang, etc. Calling C code from assembly shouldn't differ much between platforms, although I've not tried it so I could be wrong.
Most of the time, you'll have to make OS API calls which is going to be the bigger problem. There is no way to avoid that.
2. This is actually not all that difficult, if you're willing to make certain assumptions (e.g. that the callee will neither throw nor use TLS [e.g. no rand()] nor do anything weird like that). You just need to understand that the caller in platform X leaves the machine in, for example, a state of the form state(param1, param2, ..., caller-state1, caller-state2, ...) and the platform-agnostic callee expects something of the form state(caller-state1, param1, caller-state2, param2, ...), and conversely the callee will leave the machine in a state one form and the caller will expect a state of some other form. All you do is put a platform-specific function in the middle -- called a "shim" in this context -- that transforms the state in one direction on entry and in the opposite direction on exit. "Anti-shims", so to speak, can be used when calling regular C code from the platform-agnostic code.
The advantage of shims is that the caller code can be left untouched.
All of these things are done already. For code that doesn't do anything with the thread context, writing x86-* Assembly for the same architecture and different OSs is not too difficult; it's the whole point of things like NASM.
"For instance, I have some cpuid code that works across most x86 processors, regardless of platform. "
Is this inline assembly?
"Most of the time, you'll have to make OS API calls which is going to be the bigger problem. There is no way to avoid that. "
This doesn't apply to me. Most of the time, what I want assembly for is to modify large byte arrays in a complex manner. SSE, AVX, and tricks with self-modifying code (to avoid branch misprediction caused by unpredictable conditional jumps) are all easy to do in assembly and quite difficult or sometimes impossible to do in C++.
tricks with self-modifying code (to avoid branch misprediction caused by unpredictable conditional jumps)
This can be done neatly and efficiently with templates if the code doesn't need to be "patched" too often, at the cost of somewhat increased code size.
"This can be done neatly and efficiently with templates if the code doesn't need to be "patched" too often, at the cost of somewhat increased code size. "
It depends on the use case. Templates only tend to work for basic cases. A lot of SSE instructions, for example, take an 8-bit immediate value. To turn that 8-bit immediate into a dynamic value, self-modifying code is a good solution, as it limits code size and does not require a conditional jump because a conditional move can be used (of course pipeline issues must be considered). If (for example) you have a large loop with 2 SSE instructions which have 'dynamic' imm8 values, then there are 2^16 cases to consider, so without self-modifying code there is no good solution.
Yeah, I use that page quite often. However, those intrinsics are actually more difficult to debug and work with than assembly (mostly because compilers tend to keep most values in memory rather than in registers, so without a memory window it's hard to keep track of what is going on). What I'm often doing at the moment is writing code in assembly first and then converting it to intrinsics.
I see what you mean. However, for that use case, I think a code generator gives you the same advantages while being easier to maintain. For example, https://github.com/kobalicek/asmjit
because it's assembly.... it's basically just a wrapper for hexadecimal. You can't get any lower than hexadecimal, lol.
I guess by "hexadecimal" you're really referring to machine code, which is the raw data that the CPU reads. Hexadecimal is a numeric base; true, it's commonly used to represent machine code (purely because it's a convenient compromise between human-readability and compatibility with binary), but the relationship goes no further.
Machine code is actually not the lowest level: most CPUs don't execute machine code directly, because their ISA (Instruction Set Architecture) includes compound instructions; this is called a CISC (Complex Instruction Set Computing) architecture. An example of a compound instruction is PUSHA on x86, which PUSHes all the general-purpose registers onto the stack in one instruction. The CPU uses what's called a microprogram (written in microcode) to translate compound instructions into simpler operations that the hardware can execute.
However, in terms of what you as a programmer can do, machine code is the lowest level, because the specification for Intel's microcode is kept under lock and key. It's possible to update the microprogram stored on the CPU, much like you can flash your BIOS, but the update is encrypted and checksummed to ensure you can't reverse-engineer it or write your own. The microcode also differs between vendors, between CPU generations (e.g. P3 vs P4), and even between revisions of the same CPU. If you get a job at Intel they might teach you how to write microcode, but otherwise it's pretty much out of the question.
[edit] It probably is called ROM, but other firmware is stored on "ROM" too; in practice it's probably EEPROM. Seems kind of dumb to call it ROM when it can actually be written to. They should call it "Read-Mostly Memory".