x86-64 assembly from scratch

This week, I’m learning x86-64 assembly.

Why I am interested in assembly

After having worked as a full-stack web developer for a few years, I am currently looking for a job with a focus on cybersecurity.

I need to work on specific skills to be effective in this kind of job. One of the skills that seems very useful is assembly. It is used in forensic analysis, reverse engineering and a whole lot of other areas.

Starting off

Because my computer is a 64-bit machine, the easiest way to get started is probably to use the assembly that works on my computer. That’s x86-64 (which is apparently referred to as x64 or AMD64 sometimes).

I am new to assembly. I have seen some assembly code and have studied 68K assembly at University. But that was only on paper and I’ve never actually compiled and run assembly code. Let’s remedy that!

Looking for x64 linux assembly on DuckDuckGo, I find this Stackoverflow thread that has some good tips, both on what to do and what not to do.

What not to do:

The Intel documentation is a bad place to start learning assembler.

What to do:

Take a look here (http://asm.sourceforge.net) it’s the best place for Linux Assembly development, you will find resources, docs and links.

This website, asm.sourceforge.net, seems a bit complicated though. I’ll try to find some easier guides. Guides aimed at total beginners.

This series of tutorials by 0xAX is a goldmine. It documents the learning steps for x86-64 assembly. It looks like a great place to start learning assembly.

It lists basic information like which registers are available and their different forms (64 bits, 32 bits, etc.) with the corresponding names (e.g. rax, eax, ax, al for the first register).
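To picture how these names relate, here is a small C sketch. It is my own illustration, not something from the tutorial, and the helper names are hypothetical. It models eax, ax and al as the low 32, 16 and 8 bits of a 64-bit value that stands in for rax.

```c
#include <stdint.h>

/* Hypothetical helpers: each one returns the part of a 64-bit value
   that the matching sub-register name of rax would expose. */
uint32_t as_eax(uint64_t rax) { return (uint32_t)rax; } /* low 32 bits */
uint16_t as_ax(uint64_t rax)  { return (uint16_t)rax; } /* low 16 bits */
uint8_t  as_al(uint64_t rax)  { return (uint8_t)rax; }  /* low 8 bits  */
```

For example, with 0x1122334455667788 standing in for rax, as_eax gives 0x55667788, as_ax gives 0x7788 and as_al gives 0x88.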

It gives you the command you need to install the NASM assembler on Ubuntu:

sudo apt-get install nasm

And the commands you need to assemble and link an assembly file so you actually get a working program:

nasm -f elf64 -o hello.o hello.asm
ld -o hello hello.o

nasm is new to me. I’d like to know more.

DuckDuckGo leads me to the nasm documentation. I highly recommend glancing through it: it is an easy read and contains plenty of tips for assembly development with nasm.

One thing this documentation teaches me is that there are 3 different ways to define strings with nasm:

db 'Hello world', 10 ; with single quotes
db "Hello world", 10 ; with double quotes
db `Hello world\n`   ; with backticks, \n, \r, \t, etc are available

This documentation also gives a trick to get the length of a string. It looks like a pointer subtraction: $ is the current assembly position, i.e. the address right after the string when the equ line immediately follows it, and message is the address of the beginning of the string:

message db  'hello, world'
msglen  equ $-message
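The same idea can be sketched in C, as a rough analogy rather than what nasm does internally. The function name message_length is hypothetical. The array holds the twelve characters with no trailing \0, just like the db line, and the length is the distance between the start of the data and the position just past it.

```c
#include <stddef.h>

/* Twelve characters and no trailing NUL, like `db 'hello, world'`. */
const char message[12] = {'h', 'e', 'l', 'l', 'o', ',', ' ',
                          'w', 'o', 'r', 'l', 'd'};

/* Distance from the start of the data to the position just past it,
   which is what `$-message` computes at assembly time. */
size_t message_length(void)
{
    const char *start = message;
    const char *end = message + sizeof message; /* plays the role of `$` */
    return (size_t)(end - start);
}
```

One difference worth keeping in mind: in nasm the subtraction happens while assembling, so msglen is a constant and takes up no space in the program.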

Calling conventions

42, the school at which I studied until the end of January 2017, has a course on assembly and it says the following:

Be aware of the “calling conventions”

Not knowing what “calling conventions” are, I go back to DuckDuckGo which leads me to wiki.osdev.org:

What I understand from this graphic is that the “calling convention” defines which registers play which role when you call a function. For instance, with the System V AMD64 convention that Linux uses, if I call a function with 2 parameters, I’ll put the first parameter value in rdi and the second parameter value in rsi. The function must put the return value in rax.

This means that, where I would do this in C:

int return_value_of_my_function = my_function(42);
printf("%d\n", return_value_of_my_function);

In assembly, I would do this (pseudo-code):

rdi <= 42
call my_function
rdi <= "%d\n"
rsi <= rax
call printf

Stack alignment

In 42’s e-learning platform, the video dedicated to assembly talks about “stack alignment”, without giving too much information about what it is. Searching for “stack alignment” on DuckDuckGo yields no easy-to-understand explanations.

Given that I’m not hindered by stack alignment for now, I’ll keep going and come back to it only if it actually poses a problem.

Update 08/2017: Since I wrote this post, I found this StackOverflow answer on “stack alignment” which is a great introduction to the topic. Worth a read!

Hello world

Now that I have the basics down, I’d like to create a real program. One that I can run in my terminal. Nothing better than a good old “Hello world”!

To write a functional “Hello world”, I’ll need to call the write system call. How can I do a syscall with nasm? A Stackoverflow question about syscalls suggests reading the psABI-x86-64. And here is what that document says:

A system-call is done via the syscall instruction. The kernel destroys registers %rcx and %r11.

The number of the syscall has to be passed in register %rax.

Moreover, when I had a look at the ELF file format, I saw that read-only data is saved in the .rodata section and that executable code is saved in the .text section. So here is the first version of my assembly “Hello world”:

section .rodata
    msg:    db 'hello, world', 10
    msglen: equ $-msg

section .text
    main:
        ; write(1, msg, msglen)
        mov rdi, 1
        mov rsi, msg
        mov rdx, msglen
        mov rax, 1
        syscall

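What’s happening here? I put the three arguments of write in rdi (the file descriptor, 1 for standard output), rsi (the address of the buffer) and rdx (the number of bytes to write). Then I put 1 into rax and use the syscall instruction. This calls write, which has syscall number 1, as can be seen in this syscall table.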
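For comparison, here is a minimal Linux-only C sketch of the same idea: calling write by its syscall number instead of through the usual write() function. The name raw_write is hypothetical. It relies on the generic syscall() wrapper from libc and the SYS_write constant from sys/syscall.h, so it does not hard-code the number 1.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Linux-only sketch: invoke the write system call by number through
   libc's generic syscall() wrapper instead of calling write(). */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hello, world\n", 13) should print the message and return the number of bytes written. On failure, for instance with an invalid file descriptor, syscall() returns -1 and sets errno.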

Then I compile with:

nasm hello.s -f elf64 -o hello.o && ld hello.o -o hello

But I’m getting a warning:

ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080

I made a mistake here. I assumed that the first function to be called would be main, as in C. But it seems like this is not the case with assembly.

Looking for “cannot find entry symbol _start asm” on DuckDuckGo, I find out that my code should look like this:

section .text
        global _start

    _start:
        ; code goes here

The global keyword indicates that the symbol _start is accessible from outside the current module, hello.s in this case. And if you have a look at man elf, you can actually see that this global keyword is translated into a STB_GLOBAL flag.

Anyway, so I replace main with _start and add global _start:

section .rodata
    msg:    db 'Hello world', 10
    msglen: equ $-msg

section .text
        global _start

    _start:
        ; write(1, msg, msglen)
        mov rdi, 1
        mov rsi, msg
        mov rdx, msglen
        mov rax, 1
        syscall

Then, I recompile and execute the program with ./hello, which gives me the following output:

Hello world
Segmentation fault (core dumped)

That’s not good!

But I’m not the only one who runs into this issue, as this Stackoverflow question on segmentation faults with NASM shows:

Because ret is NOT the proper way to exit a program in Linux, Windows, or Mac!!!! For Windows it is ExitProcess and Linux is is system call - int 80H using sys_exit, for x86 or using syscall using 60 for 64Bit or a call to exit from the C Library if you are linking to it.

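The quote talks about ret, but my code doesn’t even have one: once write returns, the CPU simply keeps executing whatever bytes come after my last instruction, which ends in a crash. Either way, the fix is the same. Let’s apply it by using the exit syscall: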

section .rodata
    msg:    db 'Hello world', 10
    msglen: equ $-msg

section .text
        global _start

    _start:
        ; write(1, msg, msglen)
        mov rdi, 1
        mov rsi, msg
        mov rdx, msglen
        mov rax, 1
        syscall
        ; exit(0)
        mov rdi, 0
        mov rax, 60
        syscall

This time, after I’ve compiled and run the program, everything works, yay!

Now, with libc

In the previous examples, I was compiling without the libc. I’d like to be able to use printf instead of the write syscall now, so I’ll need the libc.

Let’s start by compiling (more specifically linking) with gcc instead of ld. gcc will automatically add the libc into the mix:

nasm hello.s -f elf64 -o hello.o && gcc -Wall -Wextra -Werror -o hello hello.o

Oh! This time, it looks like main is missing:

hello.o: In function `_start':
hello.s:(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:(.text+0x0): first defined here
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status

Let’s go back to using main instead of _start:

section .rodata
    msg:    db 'Hello world', 10
    msglen: equ $-msg

section .text
        global main
    main:
        ; write(1, msg, msglen)
        mov rdi, 1
        mov rsi, msg
        mov rdx, msglen
        mov rax, 1
        syscall
        ; return 0
        mov rax, 0
        ret

Note how I also replaced exit(0) with return 0: the startup code that gcc links in (the _start from crt1.o that appears in the error above) takes care of calling main and then exiting with the value that main returns.

After compiling and running ./hello, I get “Hello world”. All is good!

And now, with printf

Now that I have the libc in place, I’ll replace write with printf.

On 42’s e-learning platform, there was an example printf call that showed I had to use extern printf. The extern keyword is described in the NASM documentation:

EXTERN (…) is used to declare a symbol which is not defined anywhere in the module being assembled, but is assumed to be defined in some other module and needs to be referred to by this one.

This is what I get once I’ve added these bits of code:

section .rodata
    format: db 'Hello %s', 10
    name:   db 'Conrad'

section .text
        global main
        extern printf
    main:
        ; printf(format, name)
        mov rdi, format
        mov rsi, name
        call printf
        ; return 0
        mov rax, 0
        ret

To make things a bit more interesting, I set printf’s first argument to a format as opposed to a regular string.

After compiling and running the program, I get a Segmentation fault once again. Because printf has variable arguments, I’m thinking maybe I got the calling convention wrong. As is often the case, Stackoverflow has a question and an answer about the printf calling convention:

Yes, RAX (actually AL) should hold the number of XMM registers used.

I have no idea what XMM registers are at this point, so I don’t think I’ve used any. I’ll just try to add mov rax, 0 before calling printf and see what happens:

Hello Conrad
Conrad

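Better! The segfault is gone. But for some reason, my first name is output twice. After fiddling around a bit, I realise that printf expects a C string that has \0 at the end. Without it, printf keeps reading past the end of format and straight into name, which sits right after it in .rodata, so the format it actually sees is 'Hello %s', a newline, then 'Conrad'.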
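Here is a small C sketch of why a missing terminator shows up this way. The array and function names are hypothetical, and I am assuming that a zero byte happens to follow name in memory, which is what the output above suggests. The bytes of format and name are laid out back to back, as they are in my .rodata section, and strlen shows where a C function thinks the first string ends.

```c
#include <stddef.h>
#include <string.h>

/* The bytes of `format` and `name` back to back, with a single zero
   byte only at the very end (assumed to follow `name` in memory). */
const char rodata[] = {'H', 'e', 'l', 'l', 'o', ' ', '%', 's', '\n',
                       'C', 'o', 'n', 'r', 'a', 'd', '\0'};

/* Length of the string a C function sees when handed the address
   of `format`: it only stops at the first zero byte. */
size_t format_length_seen(void)
{
    return strlen(rodata);
}
```

The format I meant to pass is 9 bytes long, but the function returns 15: the 6 bytes of Conrad have become part of the format string.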

After adding \0, this is what my assembly code looks like:

section .rodata
    format: db 'Hello %s', 10, 0
    name:   db 'Conrad', 0

section .text
        global main
        extern printf
    main:
        ; printf(format, name)
        mov rdi, format
        mov rsi, name
        ; no XMM registers
        mov rax, 0
        call printf
        ; return 0
        mov rax, 0
        ret

I compile and run ./hello:

Hello Conrad

It works!

A “simple” function, bzero

Now, let’s try to clone some “simple” functions from the libc, but in assembly. This will be a good way to learn about new x86-64 instructions, since I will need loops and index incrementation.

The prototype of bzero is void bzero(void *s, size_t n). This means that the rdi register will be a pointer to s and the rsi register will be the number of bytes to set to 0. The simplest way to clone bzero would probably be something like this:

while (--rsi >= 0) {
    rdi[rsi] = 0;
}

For --rsi, there seems to be a well-suited DEC instruction, as can be seen in the x64 instruction set published by Intel.

For the >= 0 comparison, the Jcc instructions seem appropriate. They set the instruction pointer to a specific address depending on predefined conditions:

The condition codes used by the Jcc, CMOVcc, and SETcc instructions are based on the results of a CMP instruction.

For rdi[rsi] = 0, I think the MOV instruction should work. However, I don’t know how to tell MOV to write to the address rdi + rsi. The nasm docs on indexing know though: it is mov word [addr], val.

Knowing this, the assembly version of bzero would be something like this:

while (1) {
    rsi--; // DEC
    if (rsi < 0) return; // CMP, JL
    rdi[rsi] = 0; // MOV, JMP
}

I end up with the following assembly version of bzero:

section .text
    global my_bzero
    my_bzero:
    .loop:
        ; rsi--
        dec rsi
        ; if (rsi < 0) return
        cmp rsi, 0
        jl .ret
        ; rdi[rsi] = 0
        mov byte [rdi+rsi], 0
        jmp .loop
    .ret:
        ret

I’ve replaced mov word [...], 0 with mov byte [...], 0 after realising that word meant 16 bits instead of 8 bits. My goal is to copy one byte (8 bits) at a time. And I’ve named my function my_bzero so it doesn’t conflict with the libc’s bzero.
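To double-check the logic, the assembly can be mirrored line by line in C. This is only a model of my function, under the hypothetical name model_bzero. The index is a signed 64-bit integer on purpose, because jl is a signed comparison and so the assembly treats rsi as a signed value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of my_bzero. The index is signed on purpose:
   `jl` is a signed comparison, so the assembly treats rsi as signed. */
void model_bzero(void *s, size_t n)
{
    unsigned char *rdi = (unsigned char *)s;
    int64_t rsi = (int64_t)n;

    for (;;) {
        rsi--;          /* dec rsi */
        if (rsi < 0)    /* cmp rsi, 0 / jl .ret */
            return;
        rdi[rsi] = 0;   /* mov byte [rdi+rsi], 0 */
    }
}
```

Calling model_bzero(buf, 3) on a 4-byte buffer clears the first three bytes and leaves the fourth alone, and a length of 0 writes nothing because the very first decrement already makes the index negative.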

Now, I’ll test my bzero with the following C program:

#include <stdio.h>

#define ZEROED_LEN (10)

void my_bzero(void* addr, long unsigned len);

int main() {
    char test[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    my_bzero(test, ZEROED_LEN);
    printf("%d\n", test[0]);
    printf("%d\n", test[1]);
    printf("%d\n", test[2]);
    printf("%d\n", test[3]);
    printf("%d\n", test[4]);
    printf("%d\n", test[5]);
    printf("%d\n", test[6]);
    printf("%d\n", test[7]);
    printf("%d\n", test[8]);
    printf("%d\n", test[9]);

    return 0;
}

I can change the ZEROED_LEN constant to check for a variety of behaviors.

Everything works as expected. Nice!

“Repeat String Operations”

42’s course on assembly suggests learning about “Repeat String Operations”. In the x64 instruction set, I see references to instructions: REP, REPE, REPZ, REPNE and REPNZ.

These instructions are supposed to be useful for functions like strlen, so that’s what I’ll try to clone from libc.

The x64 instruction set shows that the operands of these instructions are of type m8, m16, etc.

However, in this course about “String instructions” by Ben Howard, there are examples without operands:

MOV     DI, DX ;Starting address in DX (assume ES = DS)
MOV     AL, 0  ;Byte to search for (NUL)
MOV     CX, -1 ;Start count at FFFFh
CLD            ;Increment DI after each character
REPNE SCASB    ;Scan string for NUL, decrementing CX for each char

So I try using SCAS without operands to reproduce strlen’s behavior in assembly:

section .text
    global my_strlen

    my_strlen:
        mov rsi, rdi ; backup rdi
        mov al, 0    ; look for \0
        repne scas   ; actually do the search
        sub rdi, rsi ; save the string length
        dec rdi      ; don't count the \0 in the string length
        mov rax, rdi ; save the return value
        ret

But that doesn’t compile:

hello.s:7: error: parser: instruction expected

After multiple searches, and for a reason I don’t remember, I searched for “nasm prefix instruction expected” on Google and luckily found a tip regarding the use of “repeat string operations” with nasm on Stackoverflow:

NASM doesn’t support the ‘LODS’, ‘MOVS’, ‘STOS’, ‘SCAS’, ‘CMPS’, ‘INS’, or ‘OUTS’ instructions, but only supports the forms such as ‘LODSB’, ‘MOVSW’, and ‘SCASD’ which explicitly specify the size of the components of the strings being manipulated.

Given that my goal is to compare each and every byte, one after the other, I’ll use the “byte version”, that’s SCASB:

section .text
    global my_strlen

    my_strlen:
        mov rsi, rdi ; backup rdi
        mov al, 0    ; look for \0
        repne scasb  ; actually do the search
        sub rdi, rsi ; save the string length
        dec rdi      ; don't count the \0 in the string length
        mov rax, rdi ; save the return value
        ret

This compiles fine!

Let’s test with the following C program:

#include <stdio.h>

#define ZEROED_LEN (10)

void my_bzero(void* addr, long unsigned len);
long unsigned my_strlen(const char *s);

int main() {
    char test[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    my_bzero(test, ZEROED_LEN);
    printf("%d\n", test[0]);
    printf("%d\n", test[1]);
    printf("%d\n", test[2]);
    printf("%d\n", test[3]);
    printf("%d\n", test[4]);
    printf("%d\n", test[5]);
    printf("%d\n", test[6]);
    printf("%d\n", test[7]);
    printf("%d\n", test[8]);
    printf("%d\n", test[9]);

    printf("length: %lu\n", my_strlen("test"));
    printf("length: %lu\n", my_strlen(""));
    printf("length: %lu\n", my_strlen("hello world"));

    return 0;
}

The output looks OK at first sight:

0
0
0
0
0
0
0
0
0
0
length: 4
length: 0
length: 11

However, there is actually a bug. When I remove the bzero code, I have the following test program:

#include <stdio.h>

long unsigned my_strlen(const char *s);

int main() {
    printf("length: %lu\n", my_strlen("test"));
    printf("length: %lu\n", my_strlen(""));
    printf("length: %lu\n", my_strlen("hello world"));
    printf("length: %lu\n", my_strlen("bla"));

    return 0;
}

And with that, the output is not at all OK:

length: 18446744073709551615
length: 0
length: 11

What’s happening here? I know that the char test[10] array was allocated on the stack. Maybe this is the mysterious “stack alignment” coming back to bite me?

Actually, it’s not. After looking around a bit, I realise that Ben Howard puts -1 in the CX register. When I do this, my code works too:

section .text
    global my_strlen

    my_strlen:
        mov rcx, -1
        mov rsi, rdi ; backup rdi
        mov al, 0    ; look for \0
        repne scasb  ; actually do the search
        sub rdi, rsi ; save the string length
        dec rdi      ; don't count the \0 in the string length
        mov rax, rdi ; save the return value
        ret

This is the output I get with the test program in C:

length: 4
length: 0
length: 11

Copying and pasting code from Ben Howard without understanding it is no fun though. So I’ll look for the reason why mov rcx, -1 magically fixes things. The answer is in the REP instruction algorithm inside the x64 instruction set documentation:

WHILE CountReg ≠ 0
DO
  Service pending interrupts (if any);
  Execute associated string instruction;
  CountReg ← (CountReg – 1);
  IF CountReg = 0
    THEN exit WHILE loop; FI;
  IF (Repeat prefix is REPZ or REPE) and (ZF = 0)
  or (Repeat prefix is REPNZ or REPNE) and (ZF = 1)
    THEN exit WHILE loop; FI;
OD;

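This pseudo-code shows that the loop only runs while CountReg (RCX in my case) is not 0, and that it stops as soon as the counter reaches 0. In my first version I never initialised RCX, so the scan ran with whatever value happened to be left in the register. If that value is 0, the string instruction never executes at all: RDI stays put, and sub followed by dec yields -1, which printf shows as 18446744073709551615. What I actually want is for the loop to end through the other exit, the Repeat prefix is REPNE condition with ZF = 1, meaning SCASB has found the \0. Setting RCX to -1, which is the largest possible unsigned 64-bit value, means the counter will effectively never reach 0 before the terminator is found.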
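The pseudo-code is easier to follow once it is turned into something I can run. Below is a C model of my_strlen, a sketch under the hypothetical name model_strlen, where the second parameter plays the role of the value RCX holds on entry. It assumes the direction flag is clear, so RDI moves forward.

```c
#include <stdint.h>

/* Hypothetical C model of `repne scasb` as used in my_strlen.
   `count` is the value RCX holds when the instruction starts. */
uint64_t model_strlen(const char *s, uint64_t count)
{
    const char *rdi = s;
    uint64_t rcx = count;

    while (rcx != 0) {
        int zf = (*rdi == 0); /* scasb: compare AL (0) with [rdi] */
        rdi++;                /* scasb: advance rdi */
        rcx--;
        if (rcx == 0)
            break;            /* counter exhausted */
        if (zf)
            break;            /* REPNE stops once ZF = 1 */
    }
    /* sub rdi, rsi / dec rdi: bytes scanned, minus the terminator */
    return (uint64_t)(rdi - s) - 1;
}
```

With count set to UINT64_MAX (the same bit pattern as -1), the model returns 4 for "test". With count set to 0, the loop body never runs and the result wraps around to 18446744073709551615, the same value as in my broken output.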

Summary

I’ve discovered x86-64 assembly basics: registers, libc function calls, creating custom functions and calling them from C code, and various instructions (CMP, DEC, MOV, JMP, SUB, RET, JL, POP, PUSH and REPNE SCASB).

I have a lot left to learn, most notably what stack alignment is all about. And of course, there are tons of instructions and register types (XMM for instance) that could be helpful.