GCC compatible inline assembly with Clang

This was originally intended as a follow-up to my LLDB and Clang cheat sheet, to jot down what I learned about writing inline assembly on macOS and how it differs from the Visual Studio implementation on Windows that was taught in my programming class. Only I never got around to writing it, because work and studies took up all my time. So, three and a half years later, here’s how to write inline assembly on macOS with Clang.

In this post, I will introduce the concept of inline assembly and its usage. I will then describe what I learned about inline assembly before delving into what I found out when I replicated the Windows material from my class notes on my mac. This is not a post on assembly language so it will only be referenced at a relatively high level and the examples will be fairly simplistic. There is some exceptionally good material to learn inline assembly from but it is hard to find and mostly referenced in obscure places. This post references the best resources I found all in one place.

Introduction to inline assembly

When C programs are compiled, they are first converted into assembly language and then assembled and linked to create an executable. The process is described in the book ‘Computer Systems: A Programmer’s Perspective’ by Randal Bryant and David O’Hallaron, and has four stages:

Preprocessor – The C source code is preprocessed to remove comments, expand macros (for example, replacing #define constants with their values) and insert the contents of the included header files.
Compiler – The modified source is compiled into assembly language.
Assembler – The assembly code is assembled and relocatable object programs are output. Unlike the previous stages, these files are in binary format.
Linker – Links the relocatable objects and outputs the final executable binary. This stage will link the code from the standard library and other libraries with your code.
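To watch the four stages individually, a tiny source file is enough. The sketch below is only an illustration: the file names (stages.c and friends), the SCALE macro and the scale function are made up for this post, while -E, -S and -c are the usual driver flags accepted by both Clang and GCC for stopping after each stage.

```c
/* stages.c (hypothetical file name): a tiny unit for watching each stage.
 *
 *   cc -E stages.c -o stages.i   stop after the preprocessor; SCALE becomes 3
 *   cc -S stages.c -o stages.s   stop after the compiler; output is assembly text
 *   cc -c stages.c -o stages.o   stop after the assembler; output is a
 *                                relocatable object file (binary)
 *
 * Linking stages.o with an object file that defines main produces the
 * final executable.
 */
#define SCALE 3

/* Multiplies value by the SCALE constant. */
int scale(int value)
{
    return value * SCALE;
}
```

Opening stages.s in a text editor is a good way to see what the compiler generated before the assembler turns it into binary.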

Conversion to assembly language is a part of the compilation process. However, the major compilers also allow assembly to be written inline, within a C program. The main reason for using inline assembly is to write highly optimized code where efficiency is of the utmost importance and the optimizations provided by the compiler are insufficient. It’s likely many programmers will never encounter inline assembly, but if you work with an operating system kernel you may come across asm macros.

x86 and x86-64 assembly

There are differences between 32-bit and 64-bit x86 assembly, most notably for this blog post:

The calling convention on 32-bit systems involved pushing the function arguments onto the stack before calling the function. The 64-bit convention used on macOS (the System V AMD64 ABI) passes the first six integer or pointer arguments in the registers rdi, rsi, rdx, rcx, r8 and r9, and pushes the remainder onto the stack in reverse order, such that they are popped off in order.
The addition of new registers: r8–r15.
The addition of the 64-bit registers: 32-bit systems use registers such as eax which are 32 bits in size. These are still available as the lower 32 bits of their 64-bit equivalents. The 64-bit registers start with an r so for a full example: rax, eax, ax, al for the a register’s 64, 32, 16 and 8 bit variations. Microsoft published a useful table for each register.
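The relationship between rax, eax, ax and al can be sketched in plain C, without any assembly. This is only an analogy: the struct and function names are made up for illustration, and the truncating casts stand in for reading the narrower register names.

```c
#include <stdint.h>

/* The four views of the "a" register, modelled as plain integers. */
struct register_views {
    uint64_t rax;   /* all 64 bits */
    uint32_t eax;   /* lower 32 bits */
    uint16_t ax;    /* lower 16 bits */
    uint8_t  al;    /* lower 8 bits */
};

/* Returns the narrower views that would be visible if value were in rax. */
struct register_views view_a_register(uint64_t value)
{
    struct register_views views;
    views.rax = value;
    views.eax = (uint32_t)value;    /* truncation keeps the low 32 bits */
    views.ax  = (uint16_t)value;    /* low 16 bits */
    views.al  = (uint8_t)value;     /* low 8 bits */
    return views;
}
```

For instance, with 0x1122334455667788 in rax, the eax view is 0x55667788, ax is 0x7788 and al is 0x88.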

Inline assembly on Windows

On Windows, the Visual Studio IDE is capable of handling inline assembly. The syntax is fairly straightforward. Intel assembly syntax is used and enclosed in an __asm block. All variables in scope of the block are accessible using inline assembly and can even be accessed using their high level variable name. Visual Studio will generate the appropriate assembly code to allow this to work, making its inline assembly much like a hybrid between C and Assembly.

A very simple example to demonstrate these features is shown below. This code sets two integer variables (x and y) in C and then uses inline assembly to set the value of the y variable to equal the value in the x variable. The code uses 32-bit x86 assembly language, which is also the only target for which the Microsoft compiler supports __asm blocks.

#include <stdio.h>

int main (int argc, char **argv)
{
    int x = 500;
    int y = 0;
    
    printf ("Before assembly, x = %d, y = %d \n", x, y);
    
    __asm
    {
        mov eax, x
        mov y, eax
    }
    
    printf ("After assembly, x = %d, y = %d \n", x, y);
    return 0;
}

Inline assembly on macOS

The default compiler on modern versions of macOS is Clang with the LLVM toolchain which can be installed with the Xcode developer tools. Clang is compatible with GCC inline assembly. The use of Extended Asm allows the reading and writing of C variables within inline assembly blocks.

When writing inline assembly on a mac, AT&T syntax is used by default, rather than the Intel syntax used on Windows. The main differences to be aware of are as follows. AT&T register operands are preceded by %, as opposed to their undelimited Intel equivalents. The order of the source and destination operands is reversed, i.e. mnemonic source, destination. And mnemonic suffixes of b, w, l and q are used in AT&T assembly to specify the size of memory operands, e.g. movw to move a 16-bit word or movl to move a 32-bit long. Intel syntax, on the other hand, prefixes memory operands with a size specifier such as dword ptr.
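As a small illustration of these differences, the sketch below loads a constant into eax and copies it into a C variable, with a rough Intel equivalent in each comment. The function name is made up, the __asm__ spelling is used so the snippet also compiles in strict ISO C modes, and the #else branch is only there so the snippet still builds on a non-x86 machine. The doubled percent signs and the operand lists belong to the extended asm format.

```c
/* Loads the constant 42 via %eax and returns it. */
int load_then_copy(void)
{
    int destination = 0;
#if defined(__x86_64__)
    __asm__("movl $42, %%eax;"  /* Intel: mov eax, 42  (source comes first in AT&T) */
            "movl %%eax, %0;"   /* Intel: mov <destination register>, eax */
            : "=r"(destination) /* output */
            :                   /* no inputs */
            : "%eax"            /* clobbered register */
    );
#else
    destination = 42;           /* plain C fallback for other architectures */
#endif
    return destination;
}
```

The l suffix on movl marks a 32-bit move, which matches the size of an int on macOS.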

Another requirement on macOS is to use 64-bit assembly language. Modern versions of macOS removed support for 32-bit applications, and I wasn’t even able to find a way to compile 32-bit applications when I studied this module. As a result, all of the examples in this section use x86_64 assembly.

Extended inline assembly

All of the examples in this post will use extended assembly. An example of the general format is shown below. The keen-eyed will notice that this is the same program demonstrated in the Windows section, translated to GCC inline assembly.

#include <stdio.h>

int main(int argc, char **argv)
{
    int x = 500;
    int y = 0;

    printf("Before assembly, x = %d, y = %d\n", x, y);

    asm("movl %1, %%eax;"
        "movl %%eax, %0;"
        :"=r"(y)        /* output */
        :"r"(x)         /* input */
        :"%eax"         /* clobbered register */
    );

    printf("After assembly, x = %d, y = %d\n", x, y);
    return 0;
}

The assembly code is contained in an asm statement, though __asm__ is also valid if asm conflicts with something in your code. The use of extended assembly is indicated by the presence of the output, input and clobber lists, each beginning with a colon character after the assembly code. An output-list entry has the format "=r"(variable) for a variable that will be written, and an input-list entry has the format "r"(variable) for a variable that will be read. The clobber list is simply a list of the registers that the assembly code clobbers (changes). Multiple entries in a list are separated by commas.

In this example, a few things should stand out that differentiate inline assembly from ordinary assembly code. Firstly, %0 and %1 are the variables which were passed in using the output and input lists. It is possible to specify a register here, such as :"a"(variable) to store an input variable in the a register (rax), but in this case the compiler chooses a register, which may ultimately lead to better optimization. Because there is no way to know which register the compiler chose, the placeholders %0, %1 and so on are used, numbered in the order the operands are listed: outputs first, then inputs. This leads to the next point: registers have two percent signs, for example %%eax. The single percent syntax is already in use for the operands, so GCC inline assembly uses two to refer to literal registers. Furthermore, each instruction is enclosed in quotes.
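To make the operand numbering concrete, here is a sketch with two outputs and two inputs, all in compiler-chosen registers. The function name is hypothetical. It also uses the & modifier ("=&r"), which is not covered elsewhere in this post: it marks an output as early-clobbered, telling the compiler not to place it in the same register as an input, which matters here because the outputs are written before the inputs have been fully read.

```c
/* Computes a + b and a - b in a single asm statement. */
void sum_and_difference(int a, int b, int *sum, int *difference)
{
    int s, d;
#if defined(__x86_64__)
    __asm__("movl %2, %0;"          /* %0 (s) = %2 (a) */
            "addl %3, %0;"          /* s += %3 (b) */
            "movl %2, %1;"          /* %1 (d) = a */
            "subl %3, %1;"          /* d -= b */
            : "=&r"(s), "=&r"(d)    /* outputs are numbered first: %0, %1 */
            : "r"(a), "r"(b)        /* inputs follow: %2, %3 */
            : "cc"                  /* the arithmetic changes the flags */
    );
#else
    s = a + b;                      /* plain C fallback for other architectures */
    d = a - b;
#endif
    *sum = s;
    *difference = d;
}
```

Calling sum_and_difference(500, 200, &s, &d) should leave 700 in s and 300 in d.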

In another very simple example, a constant (the decimal value five hundred) will be written to a variable using a compiler-chosen register. Constants have a $ prepended because this is the syntax used to identify an immediate operand in AT&T assembler. Furthermore, the decimal value 500 could also be entered as hexadecimal using the syntax $0x1f4.

#include <stdio.h>

int main(int argc, char **argv)
{
    int x = 0;

    printf("Before assembly, x = %d\n", x);

    asm("movl $500, %0;"
        :"=r"(x)        /* x is 'write only' output */
    );

    printf("After assembly, x = %d\n", x);
    return 0;
}
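The hexadecimal form mentioned above looks like this. It is the same move wrapped in a function (the name is made up for illustration) so the result is easy to check; the #else branch only exists so the snippet builds on non-x86 machines.

```c
/* Returns 500, loaded as the hexadecimal immediate 0x1f4. */
int load_hex_constant(void)
{
    int x = 0;
#if defined(__x86_64__)
    __asm__("movl $0x1f4, %0;"  /* 0x1f4 is 500 in decimal */
            : "=r"(x)           /* x is 'write only' output */
    );
#else
    x = 0x1f4;                  /* plain C fallback for other architectures */
#endif
    return x;
}
```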

Calling functions

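Functions can be called using the call instruction. As described above, registers are used to pass parameters, so a simple example that calls printf looks like the following. Note the leading underscore in _printf: macOS prefixes C symbol names with an underscore. Strictly speaking, the ABI also expects %al to hold the number of vector registers used when calling a variadic function such as printf, and the call may overwrite any caller-saved register, so treat this as a demonstration rather than production code.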

#include <stdio.h>

int main(int argc, char **argv)
{
    char *message = "Hello world!\n";

    asm("movq %%rax, %%rdi;"
        "call _printf;"
        :
        : "a" (message)
        : "%rdi"        /* clobbering %rdi */
    ); /* note we need to use %rdi etc for passing params in x86_64 */

    return 0;
}

Saving registers across calls

Not all registers are preserved across calls. Some, such as the rdi register used for passing the first argument to a function, are caller-saved: the called function is free to overwrite them. In the following example, only the first call to printf is guaranteed to receive “Hello world!” as its argument.

#include <stdio.h>

int main(int argc, char **argv)
{
    char *message = "Hello world!\n";

    asm("movq %%rax, %%rdi;"
        "call _printf;"
        "call _printf;" /* %rdi no longer contains message! */
        "call _printf;" /* %rdi no longer contains message! */
        :
        : "a" (message)
        : "%rdi"        /* clobbering %rdi */
    ); /* note this compiles but doesn't work, because %rdi is
        caller-saved and is not preserved across the call to printf. */

    return 0;
}

Returning values

The printf calls above took parameters, but their return value was ignored. In this example, I have written a user-defined function, sub, which takes two integers and returns the value of the first minus the second. This demonstrates that the return value is placed in the rax register. In this case, it is moved to rdx as a final step, because the result variable is bound to that register by the "=d" constraint.

#include <stdio.h>

int sub(int x, int y);

int main(int argc, char **argv)
{
    int result = 0;

    asm("movq $500, %%rdi;"
        "movq $200, %%rsi;"
        "call _sub;"
        "movq %%rax, %%rdx;"
        : "=d"(result)
        :
        : "%rdi", "%rsi", "%rax"
    ); /* Remember the return value clobbers %rax! */

    printf("result = %d\n", result);

    return 0;
}

int sub(int x, int y)
{
    return x - y;
}

Another example, using 32-bit registers and a compiler-chosen register is also shown for demonstration:

#include <stdio.h>

int sub(int x, int y);

int main(int argc, char **argv)
{
    int result = 0;

    asm("movl $200, %%edi;"
        "movl $500, %%esi;"
        "call _sub;"
        "movl %%eax, %0;"
        : "=r"(result)
        :
        : "%edi", "%esi", "%eax"
    ); 
    /* Also works with 32 bit registers! 
    I swapped the order of params too so result is -300 this time.
    This time we allow the compiler to pick a register. */

    printf("result = %d\n", result);

    return 0;
}

int sub(int x, int y)
{
    return x - y;
}

Some basic instructions

In the last example, I used a C function to subtract an integer from another integer which demonstrated a point about return values nicely but is contrived. In real life, the inbuilt instructions for math operations would be used instead. In this section some basic math instructions will be introduced. Division is a little more complex and not shown here because it was out of scope of the module.

Subtraction

#include <stdio.h>

int main(int argc, char **argv)
{
    int x = 500, y = 200, z;

    asm("movl %%eax, %%ecx;"    /* z = x */
        "subl %%ebx, %%ecx;"    /* z -= y, leaving the input registers untouched */
        : "=c"  (z)
        : "a"   (x), "b" (y)
        : "cc"              /* the flags are modified */
    );

    printf("x = %d\ny = %d\nz = %d\n", x, y, z);
    return 0;
}

Addition

#include <stdio.h>

int main(int argc, char **argv)
{
    int x = 500, y = 200, z;

    asm("movl %%eax, %%ecx;"    /* z = x */
        "addl %%ebx, %%ecx;"    /* z += y, leaving the input registers untouched */
        : "=c"  (z)
        : "a"   (x), "b" (y)
        : "cc"              /* the flags are modified */
    );

    printf("x = %d\ny = %d\nz = %d\n", x, y, z);
    return 0;
}

Multiplication

#include <stdio.h>

int main(int argc, char **argv)
{
    int x = 500, y = 200, z;

    asm("movl %%eax, %%ecx;"    /* z = x */
        "imull %%ebx, %%ecx;"   /* z *= y, leaving the input registers untouched */
        : "=c"  (z)
        : "a"   (x), "b" (y)
        : "cc"              /* the flags are modified */
    );
    /* Note: imul can also work with [constant], [reg], [reg]
    For example: "imull $500, %%ebx, %%ecx"
    Multiplies %ebx by 500 and puts the result in %ecx */

    printf("x = %d\ny = %d\nz = %d\n", x, y, z);
    return 0;
}
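The three-operand form of imul mentioned in the comment above can be sketched as follows, this time with compiler-chosen registers. The function name is made up for illustration, and the #else branch only exists so the snippet builds on non-x86 machines.

```c
/* Multiplies value by the constant 500 with a single three-operand imul. */
int multiply_by_500(int value)
{
    int product;
#if defined(__x86_64__)
    __asm__("imull $500, %1, %0;"   /* %0 = %1 * 500 */
            : "=r"(product)
            : "r"(value)
            : "cc"                  /* imul changes the flags */
    );
#else
    product = value * 500;          /* plain C fallback for other architectures */
#endif
    return product;
}
```

Because the instruction reads its source and writes its destination in one step, no separate mov is needed, and the input register is left unchanged.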

Tying it all together

In this final example, the arguments passed to the program via argv are printed. Many of the techniques shown in this post, and more, are demonstrated here. This is not a perfect snippet: I did not manage to get everything to work exactly like I wanted in the week I had to learn all this. The main compromise I had to make was to hard code the number of arguments to print. To stay true to the original post (and because I’m lazy), I will not be updating this code.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *format = "Argument %d: %s\n";
    char *message = "All arguments printed\n";
    int i;

    asm("movl $0x0, %0;"
        "movl $0x3, %%r15d;" // hardcode number of args (see note above^)
        "jmp CHECK_I;"

        "INCREMENT_I:"
        "pop %%rax;"
        "movq %%rax, %3;"
        "pop %%rax;"
        "movl %%eax, %0;"
        "pop %%rax;"
        "movl %%eax, %%r15d;"
        "incl %0;"

        "CHECK_I:"
        "movl %0, %%eax;"
        "cmpl %%r15d, %%eax;"
        "jge END_LOOP;"
        "leaq L_.str(%%rip), %%rdi;"
        "movl %0, %%esi;"
        "movq $0, %%rdx;"
        "movq %3, %%rax;"
        "movslq %0, %%rcx;"
        "movq (%%rax,%%rcx,8), %%rdx;"
        "xor %%rax, %%rax;"
        "movslq %%r15d, %%rax;"
        "push %%rax;"
        "xor %%rax, %%rax;"
        "movslq %0, %%rax;"
        "push %%rax;"
        "xor %%rax, %%rax;"
        "movq %3, %%rax;"
        "push %%rax;"
        "call _printf;"
        "jmp INCREMENT_I;"

        "END_LOOP:"
        "xor %%rax, %%rax;"
        "leaq L_.str.1(%%rip), %%rdi;"
        "call _printf;"

        : "=r"  (i), "=r" (argv)
        : "r"   (i), "r" (argv)
        : "%eax", "%esi", "%rcx", "%rax", "%rdx", "%rdi", "%r15d"
    );

    return 0;
}

ARM-based macs

With the release of the M1 chip, Apple is making a move away from Intel and towards using their own silicon in macs. In terms of inline assembly, this means none of the examples above will work anymore. However, Clang is entirely capable of supporting ARM assembly language and the extended assembly format remains largely valid, but the code must be written using ARM assembly. This incompatibility highlights the fact that assembly language is not very portable and why using inline assembly should be avoided where possible in favor of portable C/C++ (or maybe Rust). Full disclosure, as a poor student, I am still using the same mac I was before I started at university. As such, while I can cross-compile a program using inline assembly, I can’t actually try running it on an M1 mac – sad.

The very first example is shown below, translated to ARM. For a full explanation of the ARM registers, see the documentation.

#include <stdio.h>

int main(int argc, char **argv)
{
    int x = 500;
    int y = 0;

    printf("Before assembly, x = %d, y = %d\n", x, y);

    asm(/* Not working :(
        "mov r0, %w1\n\t"
        "mov %w0, r0\n\t" */
        "mov %w0, %w1"  /* Seems to work */
        :"=r"(y)        /* output */
        :"r"(x)         /* input */
        :"r0"           /* clobbered register */
    );

    printf("After assembly, x = %d, y = %d\n", x, y);
    return 0;
}

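I had a fair few problems getting this to compile and in the end, the single line mov from x to y is the only thing that actually didn’t throw any errors. The commented-out version most likely fails because 64-bit ARM names its general-purpose registers x0 to x30 (or w0 to w30 for their 32-bit halves) rather than r0. Of course, this is still untested and may not work anyway. Some interesting differences with ARM are how new lines are inserted (\n\t between instructions), how the operand size is indicated in the operand itself (the w in %w0 selects the 32-bit view of the register) and that the source and destination operands are the same way around as in Intel assembly. For a tutorial on ARM assembly, see Azeria’s blog series.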
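One way to live with the portability problem is to select the assembly at compile time and keep a plain C fallback. The sketch below is an assumption-laden illustration rather than tested-on-M1 code: the function name is made up, the 64-bit ARM branch uses a single add instruction with the same %w operand modifier as above, and __aarch64__ and __x86_64__ are the predefined macros Clang and GCC set for those targets.

```c
/* Adds two integers using whichever inline assembly suits the target. */
int add_portable(int x, int y)
{
    int z;
#if defined(__aarch64__)
    __asm__("add %w0, %w1, %w2"     /* destination first, like Intel syntax */
            : "=r"(z)
            : "r"(x), "r"(y)
    );
#elif defined(__x86_64__)
    __asm__("movl %1, %0;"          /* z = x */
            "addl %2, %0;"          /* z += y */
            : "=&r"(z)              /* & keeps z out of the input registers */
            : "r"(x), "r"(y)
            : "cc"
    );
#else
    z = x + y;                      /* portable fallback */
#endif
    return z;
}
```

Having to write the same addition three times is a fair summary of why portable C is usually the better choice.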

Conclusion

Inline assembly is super cool! 🙂

Learning how code in a high level language, such as C, translates into machine code is useful because it can help during debugging. Understanding the low level details can lead to becoming a better programmer in general. Inline assembly can be a useful tool for those writing highly optimized code. However, its lack of portability and the added complexity of the language make it impractical for many general purpose situations.

In Cyber Security, understanding assembly helps to uncover how janky computers really are ‘under the hood’. Higher level languages hide the details behind a beautiful veil of magic where functions take parameters and return results in near English language. Learning assembly can show the sometimes ugly truth of what’s really going on and how your variables are even represented. Therefore, understanding assembly language is a vital skill in many security jobs such as reverse engineering and exploit development. Inline assembly can make starting to write assembly code more accessible.
