In many applications, mixing Assembly and C is routine (pun intended). There are many reasons for it, but, in general, you want to use Assembly when you want to deal with the hardware directly or perform a task with maximum speed and minimum use of resources, while you use C to perform some high level stuffs that don’t attend the former requirements. In either case, you’ll need one integrated system.
There are three ways to mix Assembly and C:
- Using Assembly-defined functions into C
- Using C-defined functions into Assembly
- Using Assembly code in C
We’ll explore them all in this tutorial.
Using Assembly-defined functions in C
Let’s first take the example of a function that takes no parameters and doesn’t return anything, like one that just prints something on screen.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
(If you don’t quite understand the above syntax, read my previous tutorial)
Now let’s create a C program to call this function:
1 2 3 4 5 6 7 |
|
Notice the use of extern
keyword. It tells the compiler that the definition of a given function or variable is defined in somewhere else other than the current file. It’s the linker job to connect this declaration with the actual definition.
Now let’s compile and link our both programs at the same time in order to obtain an executable file:
1
|
|
That’s all! Pretty easy, right? Now let’s advance to a more challenging scenario: A function that returns a value. As I said on previous tutorial, by convention, Assembly functions return values on AX
register. This is also true for C programs. Check out this example:
1 2 3 4 5 |
|
This function only puts the value ‘10’ into the EAX register. Now on C side:
1 2 3 4 5 6 7 8 |
|
It’s worth noting that, on Assembly side, I’m moving a two words value into the EAX register. I could move a four words value to the RAX register instead, but it would print 0. Why? Here’s the reason:
As you may know, RAX is the 64 bits version of the AX register, hence it can store 64 bits simultaneously. Those bits are stored from left to right, i.e., let’s suppose we move the decimal value ‘10’ into the RAX register. It would appear that way:
01010000000000…0 (0101 + 60 zeroes).
The EAX holds the 32 most significant bits (the lower half), therefore, if I access this sequence through EAX, I would only see zero values! And this is what the int
datatype is implicitly converted to, since it’s a datatype with size equals to 32 bits. In order to avoid this problem, I should either stick with EAX, EBX… registers or use long int
on C side.
Lesson learnt: One must check if the size of registers match the size of types in C.
Now the last scenario: A function that takes parameters and returns a value, like that one that returns the sum of two values:
1 2 3 4 5 6 7 8 9 |
|
Now the Assembly definition:
1 2 3 4 5 6 |
|
You may be asking: Hey, what’s wrong? Why am I using the edi
and esi
registers?
Here’s the trick: In GCC compiler, instead of the parameters being pushed into the stack by the callee to be read from the calling function, they are stored in registers. It’s the calling function job to push them into the stack if they need to. Those registers are used in the following order:
- _di: Holds the first argument
- _si: Holds the second argument
- _dx: Holds the third argument
- _cx: Holds the fourth argument
- r8d: Holds the fifth argument
- r9d: Holds the sixth argument
And so on… In the above example, the value 2
is stored in the edi
register and the value 3
is stored in the esi
register. Therefore, we simply sum them (through the addl
instruction) and move the result to eax
register.
Using C-defined functions into Assembly
Here’s the first example: Using the printf
C function into Assembly:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Now compile the Assembly program with GCC:
1
|
|
The GCC will automatically link with the function definition. In the same way we used the extern
keyword in C, we use the .extern
directive to tell the Assembler that printf
is defined externally.
That is equivalent to the following C program:
1 2 3 4 5 6 |
|
When compiling Assembly programs with GCC, the starting symbol is no longer _start
but main
. main
is a function, therefore it must have the ret
instruction in the end of it.
The printf
in C takes two or more parameters: The format and the value(s). As said previously, the first parameters goes to rdi
register while the second parameter goes to rsi
register. Note: Before calling the function, the value of rax
must be zero!
Our second example is using the scanf
function. Like printf
, it takes two more parameters: The format and the destinating addresses where the standard input will be stored. Note: The second and so on parameters are no longer values, but memory addresses (pointers).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
First, we declare three “variables” in data section:
- a: A two words (32 bits) region of memory that initially stores the value zero;
- b: A two words (32 bits) region of memory that initially stores the value zero;
- format: A region of memory that stores the ASCII string “%d %d”.
We then pass the address of format
as first parameter, the address of a
as second parameter and the address of b
as third parameter. Before calling scanf
, we set RAX
to 0 (just like in the printf example). After it, we move the value stored in a
address to eax
register and the value stored in b
address to ebx
register. We then sum them both and store the result in eax
.
After executing the program, if we echo the program execution status:
1
|
|
We’ll able to see the sum of both typed numbers.
The above example is equivalent to the following C program:
1 2 3 4 5 6 7 8 9 10 |
|
Using Assembly code in C
Our third category is pretty straight-forward. See the example:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Now you can compile it normally:
1
|
|
The compiler will simply insert the assembly code in the appropriated place in the compiled code.
Conclusion
We’ve just learnt very very powerful tools! Learning how to mix Assembly and C give us a deep insight of how the C compiler actually works. I strongly recommend this website for further learning. Play with it around, try some snippets, and see how it’s translated into Assembly.