x86-64 Intel Syntax

Code Examples


There are many different assembly languages, depending on the processor you want to talk to. The code examples here are written in x86-64 assembly using Intel syntax.

x86 is one of the most widely used assembly languages, but it is also one of the more complicated ones to write. Most desktop computers and several game consoles use it. It runs on Intel and AMD processors.

The 86 comes from the model numbers of the early Intel chips that used this instruction set, which end in 86 (like the 8086, 80286, and 80386).

The 64 refers to the number of bits each processor register holds. The original 8086 was a 16 bit processor and later x86 chips were 32 bit, so we specify "-64" to make it clear we're talking about the 64 bit version. You'll see some examples online that use the 32 bit version, and the register names they use are different. 32 bit registers usually start with the letter E (like eax), whereas 64 bit registers usually start with the letter R (like rax).

Pre-requisites

By default, macOS doesn't ship with developer tools, since most computer users aren't writing code. To assemble and link these examples, you'll need to install Xcode and its command line tools.

You'll also need the yasm assembler. If you have Homebrew installed, you can install it with brew install yasm.

Running programs

We will use the Hello World program as our example for this, but you will see the same steps for the Uppercaser program.

For a Mac (both Intel and Apple Silicon), these are the instructions to compile and run the code. We have 3 steps to run our program:

  1. Assemble it into an object file
  2. Generate our executable
  3. Run our executable

1. Assemble our program into an object file

$ yasm -f macho64 hello-world.asm

This command creates an object file, hello-world.o, which contains machine code. You can view it in a hex editor. If you open it in a normal text editor, the editor tries to display the machine code as text, which looks like nonsense.

yasm is our assembler, and the -f flag specifies the output file format. macho64 is the 64 bit Mach-O format, which macOS uses for object files and executables.

2. Generate our executable

$ ld hello-world.o -o hello-world -macosx_version_min 12.4 -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem -no_pie

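This generates our executable by linking our object file with any libraries it needs and bundling everything together into machine code. -o lets us specify what we want our executable to be called. -macosx_version_min sets the minimum macOS version the executable targets, -L adds a directory to the library search path, -lSystem links against the macOS system library, and -no_pie asks the linker for an executable that is not position-independent.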

3. Run our executable

$ ./hello-world

Note: For the Uppercaser program, you'll have to pass command line arguments (e.g. words to uppercase), so your command might look like ./uppercaser words to uppercase

All together now!

$ yasm -f macho64 hello-world.asm && ld hello-world.o -o hello-world -macosx_version_min 12.4 -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem -no_pie && ./hello-world

Anatomy of a program

We will use the Hello World program as our example for this, but you will see a similar setup in the Uppercaser program as well.

The first thing you'll see is a data section, which we use for constants.

Our Hello World program uses it to set up its string data. In the Uppercaser program that section is empty, because we don't need to set up any constants for it.

; Section for initialized data (we only use it for constants)
section .data
    ; msg is a label
    ; db = define byte(s)
    ; stores the ASCII codes of this string in memory, retrievable later by its label
    ; 10 is ASCII for a newline
    msg: db "Hello, world!", 10

    ; Define an assemble-time constant, which is calculated while the file is assembled
    ; Calculate len = string length: subtract the address of the start of the string from the current position ($)
    .len: equ $ - msg

Next you'll see that the code that we actually want to execute on program launch goes in the .text section.

; Executable code goes in the .text section
section .text
  ; The linker looks for this symbol to set the process entry point, so execution starts here
  global _main

We start our program in _main, where we write "Hello, world!" out to the terminal.

_main:
    mov     rax, 0x2000004 ; system call for write. the 0x2000000 prefix is macOS specific
    mov     rdi, 1 ; set output to stdout. 1 = stdout, which is normally connected to the terminal
    mov     rsi, msg ; address of string to output
    mov     rdx, msg.len ; rdx holds the number of bytes to write, which is msg.len
    syscall ; invoke operating system to do the write

We then exit our program.

    mov     rax, 0x2000001 ; system call for exit. the 0x2000000 prefix is macOS specific
    mov     rdi, 0 ; exit code 0
    syscall ; invoke operating system to exit
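
As a small experiment with the assemble-time constant, the sketch below returns a string's length as the program's exit code instead of printing anything. The greeting label and its text are made up for illustration. The exit system call number is the same one used above, and the sketch assumes the same yasm and ld commands from the Running programs section.

```nasm
; Sketch only: "greeting" is a hypothetical label, not part of the Hello World program
section .data
    greeting: db "Hi", 10       ; 3 bytes in memory: 'H', 'i', newline

    ; $ is the address right after those 3 bytes,
    ; so $ - greeting is the number of bytes in the string
    .len: equ $ - greeting

section .text
  global _main

_main:
    mov     rdi, greeting.len   ; exit code = length of the string
    mov     rax, 0x2000001      ; system call for exit (macOS specific number)
    syscall                     ; invoke operating system to exit
```

After running the executable, echo $? in the shell prints the exit code, which should be 3 here. The equ line has to come directly after the db line, because $ always means "the current position" at the point where it appears.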

Common Instructions

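For more common instructions, check out the Stanford CS107 list. Keep in mind that list is written in AT&T syntax, where the operands appear in the opposite order (src, dst) and register names start with %.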

Instruction  Arguments  Explanation
mov          dst, src   dst = src
add          dst, src   dst += src
sub          dst, src   dst -= src
cmp          a, b       compute a - b and set flags (result is discarded)
jmp          label      jump to label
je           label      jump if equal (ZF=1)
jne          label      jump if not equal (ZF=0)
jg           label      jump if greater, signed (ZF=0 and SF=OF)
push         src        push src onto the top of the stack
pop          dst        pop the top of the stack into dst
call         fn         push rip (the return address), jump to fn
ret                     pop rip (return to the caller)
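
To see several of these instructions working together, here is a minimal sketch that adds up 3 + 2 + 1 in a loop, doubles the total in a tiny function, and returns the result as the exit code. The label names .loop and double_rdi are made up for this example. It assumes the same macOS exit system call and build commands used for Hello World.

```nasm
; Sketch only: label names are hypothetical, build steps are the same as Hello World
section .text
  global _main

_main:
    mov     rdi, 0          ; rdi = running total (and later our exit code)
    mov     rcx, 3          ; rcx = loop counter
.loop:
    add     rdi, rcx        ; total += counter
    sub     rcx, 1          ; counter -= 1
    cmp     rcx, 0          ; compare counter with 0 and set flags
    jne     .loop           ; keep looping while counter != 0 (ZF=0)

    call    double_rdi      ; push the return address, jump to double_rdi

    mov     rax, 0x2000001  ; system call for exit (macOS specific number)
    syscall                 ; exit with the code in rdi

double_rdi:
    add     rdi, rdi        ; rdi = rdi * 2
    ret                     ; pop the return address and jump back to it
```

The loop leaves 6 in rdi and the function doubles it, so echo $? after running the program should print 12.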

Registers

General-purpose registers in x86-64 are 64 bits wide.



Commonly used registers from Stanford CS107
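
The 32 bit names (like eax) refer to the lower half of the matching 64 bit register (like rax). One detail that often surprises people is that writing to the 32 bit name also clears the upper 32 bits of the full register. The sketch below checks this and reports the result through the exit code. The .done label and the values are made up for illustration, and it assumes the same build commands used for Hello World.

```nasm
; Sketch only: shows that eax is the low 32 bits of rax
section .text
  global _main

_main:
    mov     rax, 0x1122334455667788 ; fill all 64 bits of rax
    mov     eax, 7                  ; write only the 32 bit name; upper 32 bits of rax become 0
    mov     rdi, 0                  ; exit code 0 = "rax is exactly 7"
    cmp     rax, 7                  ; compare the full 64 bit register with 7
    je      .done
    mov     rdi, 1                  ; exit code 1 = "the upper bits survived"
.done:
    mov     rax, 0x2000001          ; system call for exit (macOS specific number)
    syscall                         ; exit with the code in rdi
```

Running it and then echo $? should print 0, since rax holds exactly 7 after the write to eax.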


Resources

Here are some resources I ended up using while writing the programs in this section.

The "Getting Started Writing Assembly Language" series

Guides

Hello World

Command line params

Syscalls

Debugging