I need a programming correction: what is machine language?

From school I learned that machine language is zeros and ones.
But is that a language, or is the machine language Assembly (the CPU language)?
Assembler is a level up from machine language.
An assembler translates human readable instructions into machine language.
Usually one assembler instruction represents one machine instruction, but assemblers support macros that can generate multiple machine instructions.

The next level up from an assembler is the compiler. In the early days of C and C++ compilers, the compiler would generate an intermediate file of assembler instructions that would then be assembled into machine instructions. Today many compilers generate the machine code directly.
I can convert C/C++ to assembler... but how can I compile (on Windows and Linux) the code to an exe?
On GCC at least, the step to convert to assembly is:
g++ -S main.cpp -o main.s

To then convert that assembly into the final executable, you'd do:
g++ main.s -o out
I couldn't get this thing to show machine code, but it's a good visual representation of source -> assembler. Hope you find it useful.
https://godbolt.org/z/mPAJak
Windows used to have MASM (Microsoft's assembler), and there may be others. I haven't tried to find an assembler in a while, but I would guess that one is still active.

Good answers, but I thought I'd chime in with an accumulation and summary.

Many will say assembler and machine code are not the same, and while that is technically true, there is a one-to-one correspondence between assembler mnemonics and machine code. The only real difference is that assembler is the human-readable version of machine code: machine code is the numeric form, and assembler is a pseudo-word form of the same instruction. It would be torture to write directly in machine code, but assembler is more comprehensible even though it means exactly the same thing.
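
To make that correspondence concrete, here is a minimal sketch of a lookup from a numeric opcode to its mnemonic. It covers only four single-byte x86 opcodes, and the function name `mnemonic_for` is made up for this illustration; a real disassembler has to handle far more encodings, most of them several bytes long.

```cpp
#include <cstdint>
#include <string>

// Toy table: the numeric form (machine code) on the left,
// the pseudo-word form (assembler mnemonic) on the right.
// Only four single-byte x86 opcodes are listed here.
std::string mnemonic_for(std::uint8_t opcode) {
    switch (opcode) {
        case 0x90: return "nop";   // do nothing
        case 0xC3: return "ret";   // return from a near call
        case 0xCC: return "int3";  // debugger breakpoint
        case 0xF4: return "hlt";   // halt until the next interrupt
        default:   return "(not in this toy table)";
    }
}
```

Calling `mnemonic_for(0xC3)` gives back `"ret"`; an assembler performs the same mapping in the other direction.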

Under the hood, the CPU is a device with analogies to mechanical counterparts. An oversimplified illustration is simple gearing. Imagine you have one gear with 10 teeth, which contacts a gear with 30 teeth. One turn of the 30 tooth gear will turn the 10 tooth gear 3 times. This is a multiplier by 3. Extend this notion with gears for each of the digits 2 through 9 (1 seems a waste), and you have a simple multiplication engine.

The CPU's electronic implementation is, of course, not gears, but its circuits are electronic (and base 2) analogues of them. There are such components for the important operations, like addition, subtraction, multiplication, division, comparison, etc.
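
As a rough illustration of such a component, the sketch below builds an 8-bit adder out of nothing but AND, OR and XOR operations, chained bit by bit the way a simple hardware adder is. The names `BitSum`, `full_adder` and `add8` are invented for this example, and real CPUs use faster adder designs than this simple chain.

```cpp
#include <cstdint>

// Result of adding three single bits: a sum bit and a carry bit.
struct BitSum {
    bool sum;
    bool carry;
};

// One full adder, written only with gate-like operations.
BitSum full_adder(bool a, bool b, bool carry_in) {
    const bool partial = (a != b);                          // XOR gate
    const bool sum = (partial != carry_in);                 // XOR gate
    const bool carry = (a && b) || (partial && carry_in);   // AND and OR gates
    return BitSum{sum, carry};
}

// Chain eight full adders, feeding each carry into the next bit.
std::uint8_t add8(std::uint8_t x, std::uint8_t y) {
    std::uint8_t result = 0;
    bool carry = false;
    for (int bit = 0; bit < 8; ++bit) {
        const bool a = ((x >> bit) & 1u) != 0;
        const bool b = ((y >> bit) & 1u) != 0;
        const BitSum s = full_adder(a, b, carry);
        if (s.sum) {
            result = static_cast<std::uint8_t>(result | (1u << bit));
        }
        carry = s.carry;
    }
    // The carry out of the top bit is dropped, so the result wraps modulo 256.
    return result;
}
```

With this, `add8(2, 3)` yields 5 and `add8(255, 1)` wraps around to 0, just as an 8-bit register would.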

Now, imagine a physical, gear oriented machine operated by a motor which can do those things. "Programming" such a gear oriented machine amounts to shifting levers, like gears on a manual transmission, to activate one component or another to do work.

The machine language/assembler instructions amount to doing that kind of thing electronically, selecting particular primitive components in the CPU to do the required work. We tend to think of them as language components, but in the circuits of the CPU they are similar to throwing switches to select one component or another for a particular operation.

Of course, that includes moving parameters into the appropriate registers (which corresponds to twirling some knob on a physical, gear oriented machine to do something similar).

Each CPU design has its own implementation. The instructions used for the typical Intel/AMD CPU differ from those of ARM or Power CPUs. There is a general sense of similarity among these assembler languages, but each CPU family has its own specific language. The ARM and Power CPUs differ significantly from Intel because they are "RISC" designs and Intel is not (at least, not in its assembler/machine language set, for historical reasons). Internally, Intel/AMD CPUs have "RISC" behaviors, but their instruction sets are not strictly RISC. What this means is that ARM code usually takes more instructions than Intel code to do the same work, while the ARM may have slight efficiency gains even though the programs are usually larger.

Historically, C was designed to serve as a portable assembler. This is a matter of history from the early 1970s. The objective was to create a language to write the UNIX operating system that would be a CPU independent assembler level language (with higher level extensions for convenience and development efficiency). This relationship is recognizable where some of the most primitive statements and clauses in the C language translate into 1 or 2 machine language instructions (like "if" or "switch", or the simple math operations like multiplication, subtraction, increment, etc).
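
A small sketch of what that looks like in practice. The three functions below each use one primitive operation; the instruction names in the comments are only what an optimizing x86-64 compiler commonly chooses, so treat them as an assumption and check the output of your own compiler (for example with `g++ -S`). The function names are made up for the example.

```cpp
// Each function is one primitive C/C++ operation.
// The comments are an assumption about typical optimized x86-64 output;
// the exact instructions vary by compiler, flags and CPU target.

int add_one(int x) {
    return x + 1;        // commonly a single lea or add instruction
}

int times_three(int x) {
    return x * 3;        // commonly a single lea computing x + x*2
}

int larger_of(int a, int b) {
    if (a > b) {         // commonly a cmp followed by cmov or a conditional jump
        return a;
    }
    return b;
}
```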

Today's compilers are a lot more advanced than their counterparts from the origins of C. Most (as in nearly any you would use today) are optimizing compilers that understand how to translate many language sources into extremely efficient assembly/machine code, quite often with better results than human authors can obtain. As a result, writing directly in assembler is rare now, and assembler is instead more often found as "inline assembler" within C or C++ code.

Several compilers now generate an intermediate language from C and C++ source, which is subsequently translated into assembler at the link stage. The intermediate language is "universal" to all CPUs. Hardly anyone bothers to read this intermediate language.

While you can see the assembler result from compilation if you request it (in some compilers), I find it more useful to merely use a "disassembly" window in a debugger, so I can see it "live" while the software executes. It is useful to understand the assembler implications of the code we write, but because of optimization that has become somewhat more confusing than it used to be. You can see very unexpected results from your source code, representing a kind of "black magic" the optimizer is able to employ when generating code.

Thank you so much, Niccolo.
I need to ask one more thing: we have the *.asm (assembler) file, now what must I do to get the *.exe file?
Compiling with GCC creates the *.exe file... but don't Windows or Linux have their own compiler?
now what must I do to get the *.exe file?
You "compile" it using a program that eats assembly language (this is called assembling instead of compiling, and the tool is called an assembler). This is what I was saying: MASM is the program you want for Windows, as far as I know. It makes a .exe file on Windows from a .asm file. You can also get g++ on Windows, and it will do it too. There are different ways to get g++; I use Cygwin. Some of their stuff is a little quirky, but it does all I need these days.

Thank you, jonnin.
Can I get a MASM tutorial, or even an assembler tutorial, that explains the differences between CPUs and OSes?
MASM is going to work a lot like a C++ compiler. You can find some tutorials with it, including the gun, which is a copy of Notepad (a very high performance version) from Windows written in C++, and some other small examples.

Different CPUs are a giant topic at this level. Each CPU is going to have its own assembler/machine language, and they are not at all compatible. Most of us usually talk in x86 here, and that is what MASM speaks and what most home and office PCs use if running Windows or Linux. I do not know what computers run non-x86 right now, e.g. no clue what's inside a Mac.

OSes do things differently, and programs interface to the OS as well as the hardware, so you have some differences at the assembly level. I do not know the details anymore; I haven't done this in > 20 years.

What little asm I have done in the later years is a little block here and there in a c++ program.

If you want to write a full program, you'll have to get on the web and dig around. If you just want to embed a few statements: each compiler does that syntax a little differently, it seems, but it's also all similar, and the support for halfway statements is also all over the place (halfway statements meaning access to a C++ variable directly in an asm block, where the compiler bridges the gap; Visual C++ has a good version of this). MASM would not like these small handy shortcuts; it's a little different.

What is your goal? In my experience, there are only 2 things most c++ programmers do in assembly, if they do any at all:
1) speed up something super critical (either by examining the generated asm and changing the C++ until the generated code is cleaner, or by fixing the asm directly)
2) do something you can't do easily in c++ (eg access the 80+ bit floats)

Writing full programs in assembly is rare: it takes between 3 and 10 statements to do what 1 C++ statement can do on average, and a few things are even worse.

I dunno how much asm speedup is used anymore either. I've gone from < 30 MHz computers where that was important to today's 3,000,000,000 Hz+ computers where it's usually fast enough without it.
I need to learn... I know a full program in Assembly is too much.
I have the GNU GCC/G++ compiler... it's free ;)
If I use Assembly in C++ (GNU compiler), must I use the x86 version of MASM?
You do not have to use MASM. g++ works fine on Windows if you have it.
If you use inline assembler (which is the common and practical approach used today), you do not need MASM. Inline assembler works on most compilers; although the exact syntax of the inline code differs on each compiler, the result is the same.

Typically one creates a C or C++ function, then does something like this:

 __asm__ ("movl %eax, %ebx\n\t"
          "movl $56, %esi\n\t"
          "movl %ecx, $label(%edx,%ebx,$4)\n\t"
          "movb %ah, (%ebx)");


That said, what do you intend to do with assembler?

If you're writing in C/C++, you generally have zero need for assembler - until such time that you have such a need, and by then your skills should be sufficiently advanced that the line of questions you're asking would already be understood.

Put another way, your inquiry seems way out of bounds for the level of study you appear to have. I sense you may be leading yourself into a direction you don't need.

If you do need assembler, it is likely you'll need inline assembler for some specific optimization. Since you've never exposed any specific optimization concern, I get the sense you have no actual need for assembler.

Anyone writing in C or C++ at an early level never considers assembler, which leaves me with the puzzle as to why you're asking. If we better understand your motivation for the information, we may have far better information to provide.
Niccolo: I tested the code, but I got the usual kind of problems I always have.
That's why I can't understand how to use Assembly :(

error messages:
1- junk `(%edx,%ebx,$4)' after expression
2 - operand type mismatch for `mov'

I just want to learn, and maybe create something with assembler... it's more for learning...
But how can I learn if not all code is compatible? :(
It's not that big a deal. You learn the basic assembly commands, for example mov eax, variable, and may need to change them just a little for different compilers, but the core command parts are more or less the same. It's a little zenny, but you are worrying too much.
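
For GCC specifically, here is a minimal sketch of inline assembler that should build without the errors above. It assumes an x86 or x86-64 target and GCC's (or Clang's) extended asm in its default AT&T syntax; on any other target the function falls back to plain C++. The name `add_with_asm` is made up for the example.

```cpp
// Adds two integers, using one x86 "add" instruction where available.
int add_with_asm(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // AT&T syntax: "addl source, destination".
    // %0 is a (read and written, kept in a register), %1 is b (read only).
    // "cc" tells the compiler that the flags register is modified.
    __asm__ ("addl %1, %0"
             : "+r"(a)
             : "r"(b)
             : "cc");
    return a;
#else
    // Assumption: on non-x86 targets we simply let the compiler do the work.
    return a + b;
#endif
}
```

The compiler picks the registers and connects them to the C++ variables, which is the "bridging" that makes inline assembler easier than a standalone .asm file.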
jonnin, you are right, I'm worrying too much, but it isn't my fault... assembler, even C/C++ inline assembler, has several versions :(
1 - can you give me code that prints "hello world"?
2 - can you share a link to a tutorial?
@Cambalinho, your latest post shows me you are punching way above your weight at the moment.

Before you work assembler, you should more fully understand the compiler tools, C/C++, and have a reason to bother with assembler.

In the last decade or two, the use of assembler became nearly obsolete because it has been proven that C/C++ compilers optimize to such a degree that a human programmer usually can't even match the performance offered by the compiler.

While @jonnin is correct to say the core commands of assembler are more or less the same, every target CPU has a different "language". The way we do something on Intel/AMD differs completely from the way it is done on ARM CPUs. Even with Intel/AMD, a 64 bit CPU works very differently than a 32 bit CPU. The assembler code must be written for each target, which is one reason we don't usually bother.

You asked for a "hello world" example, but strangely printing is actually quite difficult in assembler. The functions used to print on an operating system are C function calls. To use them from assembler you have to be able to make calls to C functions from assembler. This is not for beginners.
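
For the curious, here is a rough sketch of what printing can look like when the C library is bypassed. It is an assumption-laden example: it only uses assembler on Linux with an x86-64 CPU and GCC or Clang, where system call number 1 is "write"; anywhere else it falls back to the C function `fwrite`. The name `print_bytes` is made up, and the code is tied to one OS and one CPU, which is exactly the portability problem described above.

```cpp
#include <cstddef>
#include <cstdio>

// Writes len bytes of text to standard output.
// Returns the number of bytes written (negative on a Linux system call error).
long print_bytes(const char* text, std::size_t len) {
#if defined(__linux__) && defined(__x86_64__) && \
    (defined(__GNUC__) || defined(__clang__))
    long result;
    __asm__ volatile (
        "syscall"
        : "=a"(result)             // rax out: bytes written, or a negative error code
        : "a"(1L),                 // rax in: system call number 1 (write)
          "D"(1L),                 // rdi: file descriptor 1 (standard output)
          "S"(text),               // rsi: address of the bytes to print
          "d"(len)                 // rdx: how many bytes to print
        : "rcx", "r11", "memory"); // the syscall instruction overwrites rcx and r11
    return result;
#else
    // Assumption: on other systems, use the C library instead.
    return static_cast<long>(std::fwrite(text, 1, len, stdout));
#endif
}
```

A call such as `print_bytes("hello world\n", 12)` prints the text. Every register choice here comes from the Linux x86-64 calling convention, so none of it carries over to Windows or to ARM.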

One typical use of inline assembler is to perform math rapidly in ways that might be slower when written in C. If you perform the multiplication of a matrix by a vector, for example, it may be faster in assembler. This would be written as a C function where the body of the code is inline assembler. If you are not performing linear algebra, this is a bad choice for an example. There are also multiple ways to do it. The CPU has resources, like a floating point processor, the MMX extensions, the SSEn extensions, and the AVX extensions - and even though these are all on the same CPU, each one has a different way to accomplish this in assembler (some are extremely different).
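
For reference, this is a minimal sketch of the plain C++ version of that matrix by vector multiplication, the function whose body one might later replace with inline assembler or leave to the optimizer. The type names `Vec4` and `Mat4` and the fixed 4x4 size are assumptions chosen for the example.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // four rows of four floats

// Multiplies a 4x4 matrix by a 4-element vector.
// Each output element is the dot product of one matrix row with the vector.
Vec4 multiply(const Mat4& m, const Vec4& v) {
    Vec4 out{};  // starts as all zeros
    for (std::size_t row = 0; row < 4; ++row) {
        for (std::size_t col = 0; col < 4; ++col) {
            out[row] += m[row][col] * v[col];
        }
    }
    return out;
}
```

Watching what an optimizing compiler generates for this loop, before writing any assembler by hand, is a good way to see which of those CPU extensions it already uses.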

I see that you use GCC, and while there is nothing wrong with GCC, you may find a more enlightening study if you can use Visual Studio with the same source code, but open the debugger's disassembly window to watch the assembler/machine code generated by C code. Trace through it, reading the assembler which is the result of C code and you gain tremendous understanding as you experiment and study.

In Linux, using GCC, I think various IDEs can do this too - I just don't know them well enough to advise.

The summary point is that I sense you've ventured into a curiosity that may be too far a diversion from more pertinent study of C and C++, even though I and many from my generation came from assembler/C first.

Yes, I'm using GCC, so what is the best tutorial for me to start with?
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html looks like a good starting point, but it assumes you know something about Intel x86 assembly and does not explain what the asm statements are doing.

I would start here to get a handle on raw assembly:
https://software.intel.com/en-us/articles/introduction-to-x64-assembly

The above warns you that 64 bit asm isn't supported in many C++ compilers for some unholy reason, so you may want to focus on the 32 bit instructions and registers.

Between these two resources you should be able to do something simple.
Why not try some practical but simple programs? Some homework:

Reverse the endianness of an integer and use C++ to print the variable in both endian formats in hex.

Swap two integers using registers and memory commands, no temps.

Move an FPU sized floating point value's bytes out of the FPU and access it via C++ (even if just printing it as raw bytes). Some C++ compilers have types that allow use of these variables (e.g. long double may on some systems), but I do not know if your G++ does or not.

Code a crc32 in C++ and again in inline asm, and compare the run time and answer (should be identical) over a gigabyte sized file.
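
To have something to check the asm versions against, here is a sketch of plain C++ reference answers for three of these exercises. The function names are made up; the crc32 assumes the common CRC-32 variant used by zip and PNG (reflected polynomial 0xEDB88320), and it is the simple bit-by-bit form rather than a fast table-driven one.

```cpp
#include <cstddef>
#include <cstdint>

// Reverses the byte order of a 32-bit value (0x12345678 becomes 0x78563412).
std::uint32_t reverse_bytes(std::uint32_t v) {
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

// Swaps two integers without a temporary, using three XOR operations.
void swap_no_temp(std::uint32_t& a, std::uint32_t& b) {
    if (&a == &b) {
        return;  // XOR-swapping a variable with itself would zero it
    }
    a ^= b;
    b ^= a;
    a ^= b;
}

// Bit-by-bit CRC-32 (assumed variant: the one used by zip and PNG).
std::uint32_t crc32_bitwise(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right by one; apply the polynomial when a 1 bit falls off.
            const std::uint32_t mask = (crc & 1u) ? 0xEDB88320u : 0u;
            crc = (crc >> 1) ^ mask;
        }
    }
    return ~crc;
}
```

The standard check value for this CRC-32 variant is 0xCBF43926 for the nine ASCII characters "123456789", which makes a handy test before timing anything on a large file.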

