Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
0e958d3
commit f789294
Showing
13 changed files
with
2,675 additions
and
5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
--- | ||
layout: article | ||
title: "Compilation Process Deep Dive: How a C Program Becomes an Executable" | ||
--- | ||
Take this C source file `hello.c`: | ||
|
||
``` | ||
#include <stdio.h> | ||
int main(void) | ||
{ | ||
puts("Hello, world!"); | ||
return 0; | ||
} | ||
``` | ||
|
||
You should be familiar with the following invocation: | ||
|
||
gcc hello.c -o hello | ||
|
||
This is the most basic way to compile a C source file into a binary file you can execute. | ||
|
||
Most of you are also familiar with breaking this process into two steps: | ||
|
||
**Compilation**: | ||
|
||
gcc -c hello.c -o hello.o | ||
|
||
**Linking**: | ||
|
||
gcc hello.o -o hello | ||
|
||
|
||
This has advantages for large projects because the compilation can be done in parallel, and as you edit the code, only the files that you change need to be recompiled. | ||
|
||
While *two* steps are enough for practical purposes (e.g. decreasing build time), it is not the full story. | ||
In reality, the C compiler performs at least _four_ distinct processes behind the scenes: preprocessing, compilation, assembly, and linking. | ||
|
||
The command `gcc -c hello.c -o hello.o` encompasses the preprocessing, compilation, and assembly steps, while the command `gcc hello.o -o hello` encompasses the linking step. | ||
|
||
We can invoke each step manually like so: | ||
|
||
0. Preprocessing | ||
|
||
cpp hello.c -o hello.i | ||
|
||
The | ||
[C preprocessor](https://en.wikipedia.org/wiki/C_preprocessor) | ||
removes comments, collapses whitespace, expands macros, and pulls in the contents of `#include`d headers. | ||
|
||
The output is traditionally given the suffix `.i` which stands for intermediate. | ||
|
||
0. Compilation | ||
|
||
cc -S hello.i -o hello.s | ||
|
||
This is where C language constructs like variables, types and control-flow are flattened into undifferentiated data and code. | ||
|
||
After this point, we have no way to tell with certainty that this assembly output came from a C program. A compiler for a different language could plausibly generate identical assembly output. | ||
|
||
0. Assembly | ||
|
||
as hello.s -o hello.o | ||
|
||
The instructions are replaced with their machine code equivalents. This part is reversible, but | ||
the assembler also rips out the last remnants of structure left over from the original C program source. Static data and functions lose their names and are referred to by only their address, and any exported or imported variables and functions become generic entries in a symbol table. All other labels (e.g. the target of a jump within the same function) are gone without a trace. | ||
|
||
After this step, the output is no longer human readable text. | ||
|
||
0. Linking | ||
|
||
ld hello.o -l:crt1.o -lc -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o hello | ||
|
||
Even though _our_ functions have been compiled into machine code that our CPU could in theory execute, | ||
there is still work to be done. The linker collects the dependencies of our program | ||
(the C startup runtime `-l:crt1.o` that provides the `_start` symbol and the C standard library `-lc` which provides the `puts` symbol) and bundles them into one file. | ||
The linker makes connections between object files by cross-referencing their symbol tables | ||
to resolve previously unresolved symbols with their now known locations. | ||
|
||
In reality, symbol resolution is an instance of | ||
the classic engineering trade-off between | ||
execution speed and memory footprint. | ||
Our C program, like most, is at least partially | ||
[dynamically](https://en.wikipedia.org/wiki/Dynamic_linker) | ||
linked at runtime (`-dynamic-linker /lib64/ld-linux-x86-64.so.2`). | ||
|
||
The output is an executable ELF file that the kernel loader can load into memory and execute on a CPU. |
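
To tie the four stages together, here is a small sketch that is not part of the original `hello.c`. The names `GREETING`, `SQUARE`, `square_plus_one`, `checked_square`, and `greet` are made up for illustration, and the comments describe what each stage typically does with the construct on a GCC/Linux toolchain.

```c
#include <stdio.h>   /* preprocessing: replaced by the text of the header */

#define GREETING "Hello, world!"  /* preprocessing: macro definition, absent from the .i output */
#define SQUARE(x) ((x) * (x))     /* preprocessing: expanded textually at each use */

/* compilation: `static` gives this function internal linkage, so no other
 * object file can refer to it by name */
static int square_plus_one(int n)
{
    return SQUARE(n) + 1;   /* after preprocessing: ((n) * (n)) + 1 */
}

/* assembly: `checked_square` is recorded as a defined global symbol in the
 * object file's symbol table */
int checked_square(int n)
{
    return square_plus_one(n) - 1;
}

/* linking: the call to puts is normally left as an undefined symbol in the
 * object file and resolved against the C standard library */
int greet(void)
{
    return puts(GREETING);
}
```

Compiling a file like this with `gcc -c` and running `nm` on the resulting object file should list `greet` and `checked_square` as defined symbols and `puts` as undefined, which is the information the linker works from.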
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,217 @@ | ||
--- | ||
layout: article | ||
title: Assembly demo | ||
--- | ||
### Assembly code written during a live lecture | ||
|
||
``` | ||
.text | ||
.globl _start | ||
_start: | ||
#write(1, QUESTION, sizeof(QUESTION) - 1); | ||
mov $1, %rdi #stdout fileno | ||
lea question, %rsi #pointer to string | ||
mov $question_len, %rdx #length | ||
mov $1, %rax #no. for write | ||
syscall #do it! | ||
cmp $0, %rax #check return value | ||
jl error #if negative, error out | ||
#read(0, buffer, sizeof(buffer)); | ||
mov $0, %rdi #stdin fileno | ||
lea buffer, %rsi #pointer to buffer | ||
mov $buffer_len, %rdx #length | ||
mov $0, %rax #no. for read | ||
syscall #do it! | ||
push %rax #save return value | ||
cmp $1, %rax #check return value | ||
jle error #if <= 1, error out | ||
#write(1, MESSAGE, sizeof(MESSAGE) - 1); | ||
mov $1, %rdi #stdout fileno | ||
lea hellomsg, %rsi #pointer to string | ||
mov $hello_len, %rdx #length | ||
mov $1, %rax #no. for write | ||
syscall #do it! | ||
cmp $0, %rax #check return value | ||
jl error #if negative, error out | ||
#write(1, buffer, (size_t)len); | ||
mov $1, %rdi #stdout fileno | ||
lea buffer, %rsi #pointer to buffer | ||
pop %rdx #saved length | ||
mov $1, %rax #no. for write | ||
syscall #do it! | ||
cmp $0, %rax #check return value | ||
jl error #if negative, error out | ||
mov $0, %rdi #exit status of 0 | ||
mov $60, %rax #no. for exit | ||
syscall #do it! | ||
error: | ||
mov $2, %rdi #stderr fileno | ||
lea errormsg, %rsi #pointer to string | ||
mov $error_len, %rdx #length | ||
mov $1, %rax #no. for write | ||
syscall #do it! | ||
mov $1, %rdi #exit status of 1 | ||
mov $60, %rax #no. for exit | ||
syscall #do it! | ||
.data | ||
question: | ||
.ascii "What is your name?\n" | ||
.equ question_len, . - question | ||
errormsg: | ||
.ascii "error!\n" | ||
.equ error_len, . - errormsg | ||
buffer: | ||
.equ buffer_len, 100 | ||
.space buffer_len, 0 | ||
hellomsg: | ||
.ascii "Hello, " | ||
.equ hello_len, . - hellomsg | ||
``` | ||
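
For comparison, the same sequence of system calls can be sketched in C using the POSIX wrappers named in the assembly comments. This is an illustrative translation rather than code from the lecture: the function name `greet_fds` and its descriptor parameters are assumptions, added so the logic can be exercised on something other than the terminal.

```c
#include <unistd.h>

#define QUESTION "What is your name?\n"
#define MESSAGE  "Hello, "
#define ERRORMSG "error!\n"

/* Mirrors the `error:` label: report on stderr and yield exit status 1. */
static int fail(void)
{
    (void)write(STDERR_FILENO, ERRORMSG, sizeof(ERRORMSG) - 1);
    return 1;
}

/* Hypothetical helper following the assembly step by step.
 * Returns the exit status the assembly would pass to exit: 0 or 1. */
int greet_fds(int in_fd, int out_fd)
{
    char buffer[100];
    ssize_t len;

    /* write(1, QUESTION, sizeof(QUESTION) - 1); */
    if (write(out_fd, QUESTION, sizeof(QUESTION) - 1) < 0)
        return fail();

    /* read(0, buffer, sizeof(buffer)); <= 1 covers errors, EOF, and a bare newline */
    len = read(in_fd, buffer, sizeof(buffer));
    if (len <= 1)
        return fail();

    /* write(1, MESSAGE, sizeof(MESSAGE) - 1); */
    if (write(out_fd, MESSAGE, sizeof(MESSAGE) - 1) < 0)
        return fail();

    /* write(1, buffer, (size_t)len); echo only the bytes that were read */
    if (write(out_fd, buffer, (size_t)len) < 0)
        return fail();

    return 0;
}
```

A `main` that simply returns `greet_fds(STDIN_FILENO, STDOUT_FILENO)` would behave like the assembly program. The C version gets `_start`, the stack, and the exit call from the C runtime, which is the work the assembly has to do by hand.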
|
||
### Similar prewritten example for both x86-64 and aarch64: | ||
|
||
#### x86-64: | ||
|
||
``` | ||
#include <syscall.h> | ||
#define STDIN_FILENO 0 | ||
#define STDOUT_FILENO 1 | ||
.globl _start //make _start a global symbol so linker can find it | ||
_start: //_start is entry point for all executables | ||
mov $SYS_write, %rax //%rax holds syscall number, 1 represents `write` | ||
mov $STDOUT_FILENO, %rdi //%rdi holds first syscall arg, 1 represents `stdout` | ||
lea prompt(%rip), %rsi //%rsi holds second arg, the address of the prompt string in the data section | ||
mov $prompt_len, %rdx //%rdx holds third arg, prompt_len is a symbol equal to the calculated size | ||
syscall //perform a system call, the return value arrives in %rax | ||
mov %rax, %rdi //copy return value to %rdi so it can serve as the exit code | ||
cmp $0, %rdi //check if return is negative | ||
jl .out //if it is, exit program early with exit code based on return value | ||
mov $SYS_read, %rax //0 represents `read` | ||
mov $STDIN_FILENO, %rdi //0 represents `stdin` | ||
lea buffer(%rip), %rsi //read into buffer | ||
mov $buffer_len, %rdx //at most buffer_len bytes | ||
syscall //perform syscall | ||
mov %rax, %rdi //check for error as above | ||
cmp $0, %rdi | ||
jl .out | ||
mov %rax, %r12 //save returned length to only print that many bytes (syscall clobbers %rcx and %r11) | ||
mov $SYS_write, %rax //back to writing, send "Hello, " to stdout | ||
mov $STDOUT_FILENO, %rdi | ||
lea msg(%rip), %rsi | ||
mov $msg_len, %rdx | ||
syscall | ||
mov %rax, %rdi //check for error | ||
cmp $0, %rdi | ||
jl .out | ||
mov $SYS_write, %rax //need to set %rax and %rdi again because both now hold the return code of the last call | ||
mov $STDOUT_FILENO, %rdi | ||
lea buffer(%rip), %rsi //whatever they input | ||
mov %r12, %rdx //and however long it was | ||
syscall //send that | ||
mov %rax, %rdi //check for errors | ||
cmp $0, %rdi | ||
jl .out | ||
mov $0, %rdi //if there was not an error, set return code to 0 | ||
.out: //otherwise we were sent here and %rdi already contains error code to return | ||
mov $SYS_exit, %rax //60 represents exit | ||
syscall //exit program | ||
//exit syscall does not return, so _start function does not need to return to caller | ||
.data //data section for strings | ||
prompt: .ascii "What is your name? " | ||
.equ prompt_len, .-prompt //.equ defines a symbol, `.` represents current location in binary, and subtracting the value of prompt gives how many bytes prompt contained | ||
buffer: .space 64 | ||
.equ buffer_len, .-buffer | ||
msg: .ascii "Hello, " | ||
.equ msg_len, .-msg | ||
//second, standalone example: assemble it as a separate file, since it defines its own _start | ||
.data | ||
message: | ||
.ascii "Hello, World!\n" | ||
len = . - message | ||
.text | ||
.global _start | ||
_start: | ||
mov $1, %rdi | ||
mov $message, %rsi | ||
mov $len, %rdx | ||
mov $1, %rax | ||
syscall | ||
mov $13, %rdi | ||
mov $60, %rax | ||
syscall | ||
``` | ||
#### aarch64: | ||
|
||
``` | ||
#include <syscall.h> | ||
#define STDIN_FILENO 0 | ||
#define STDOUT_FILENO 1 | ||
.globl _start //make _start a global symbol so linker can find it | ||
_start: //_start is entry point for all executables | ||
mov x8, #SYS_write //x8 holds syscall number, 64 represents `write` | ||
mov x0, #STDOUT_FILENO //x0 holds first syscall arg, 1 represents `stdout` | ||
ldr x1, =prompt //x1 holds second arg, =prompt gets address of prompt string from data section | ||
mov x2, #prompt_len //x2 holds third arg, prompt_len is a symbol equal to the calculated size | ||
svc #0 //perform a system call | ||
cmp x0, #0 //check if return is negative | ||
b.lt .out //if it is, exit program early with exit code based on return value | ||
mov x8, #SYS_read //63 represents `read` | ||
mov x0, #STDIN_FILENO //0 represents `stdin` | ||
ldr x1, =buffer //read into buffer | ||
mov x2, #buffer_len //at most buffer_len bytes | ||
svc #0 //perform syscall | ||
cmp x0, #0 //check for error as above | ||
b.lt .out | ||
mov x3, x0 //save returned length to only print that many bytes | ||
mov x8, #SYS_write //back to writing, send "Hello, " to stdout | ||
mov x0, #STDOUT_FILENO | ||
ldr x1, =msg | ||
mov x2, #msg_len | ||
svc #0 | ||
cmp x0, #0 //check for error | ||
b.lt .out | ||
mov x0, #1 //need to set x0 back to 1 because it was replaced with return code of last call | ||
ldr x1, =buffer //whatever they input | ||
mov x2, x3 //and however long it was | ||
svc #0 //send that | ||
cmp x0, #0 //check for errors | ||
b.lt .out | ||
mov x0, #0 //if there was not an error, set return code to 0 | ||
.out: //otherwise we were sent here and x0 already contains error code to return | ||
mov x8, #SYS_exit //93 represents exit | ||
svc #0 //exit program | ||
//exit syscall does not return, so _start function does not need to return to caller | ||
.data //data section for strings | ||
prompt: .ascii "What is your name? " | ||
.equ prompt_len, .-prompt //.equ defines a symbol, `.` represents current location in binary, and subtracting the value of prompt gives how many bytes prompt contained | ||
buffer: .space 64 | ||
.equ buffer_len, .-buffer | ||
msg: .ascii "Hello, " | ||
.equ msg_len, .-msg | ||
``` | ||
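
The `SYS_write`, `SYS_read`, and `SYS_exit` macros that both listings pull in from `<syscall.h>` are the same ones visible to C code. As a rough sketch (the function name `raw_write` is an assumption for illustration), the C library's `syscall()` wrapper can issue a system call by number, so one source file builds on both architectures even though the number is 1 on x86-64 and 64 on aarch64:

```c
#define _DEFAULT_SOURCE
#include <sys/syscall.h>  /* SYS_write and friends, values differ per architecture */
#include <unistd.h>       /* syscall() */

/* Issue the write system call by number instead of through write().
 * SYS_write expands to 1 on x86-64 and 64 on aarch64. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

One difference from the assembly: the raw instruction hands back a negative error number in the return register, while the `syscall()` wrapper returns -1 and stores the error in `errno`.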
|
||
### Example makefile for assembly with preprocessing | ||
|
||
``` | ||
.PHONY: all clean | ||
all: asm_hello | ||
asm_hello: asm_hello.o | ||
	ld -o asm_hello asm_hello.o | ||
asm_hello.o: asm_hello.s | ||
	as asm_hello.s -o asm_hello.o | ||
asm_hello.s: asm_hello.S | ||
	cpp asm_hello.S -o asm_hello.s | ||
clean: | ||
	-rm asm_hello.s asm_hello.o asm_hello | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
--- | ||
layout: article | ||
title: Everything is a file (in Linux) | ||
--- | ||
This elegant design principle dates back to | ||
the beginning of (Unix) time, a.k.a. | ||
[the 70s](https://en.wikipedia.org/wiki/January_1,_1970). | ||
However, this simple principle is an | ||
oversimplification - consider the existence | ||
of directories. | ||
In reality, the slogan | ||
["Everything is a file"](https://en.wikipedia.org/wiki/Everything_is_a_file) | ||
is a convenient shorthand for the more accurate | ||
but less catchy notion that (almost) all | ||
resources available to a process on a | ||
[Unix-like](https://en.wikipedia.org/wiki/Unix-like) | ||
operating system can be referenced by a | ||
[file descriptor](https://en.wikipedia.org/wiki/File_descriptor). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
layout: article | ||
title: Introduction to Git | ||
--- | ||
* We used [this](https://kdlp.underground.software/course/slides/git.html) slide deck | ||
|
||
* Git is distributed [version control](https://en.wikipedia.org/wiki/Version_control) software | ||
|
||
* [Git](https://git-scm.com/) is not [GitHub](https://github.com) | ||
|
||
* GitHub is one implementation of an interface for Git | ||
|
||
* There are variously featured alternatives, such as [GitLab](https://gitlab.com/), [Bitbucket](https://bitbucket.org/), and [cgit](https://git.zx2c4.com/cgit/) | ||
|
||
* The KDLP team maintain a custom-themed cgit instance [here](https://kdlp.underground.software/cgit) | ||
|
||
* Git is built on a [tree-like data structure](https://en.wikipedia.org/wiki/Tree_(data_structure)) that contains the entire change history of a project | ||
|
||
* **Git proficiency is one of the most useful and valuable software engineering skills a computer science student can learn in preparation to enter the industry** | ||
|
||
* Charlie did a demo in the terminal. Here's a rough outline of the various git commands he covered: | ||
|
||
* `git clone`: Cloning the [ILKD_assignments](https://kdlp.underground.software/cgit/ILKD_assignments/) repository | ||
|
||
* `git commit`: Committing new local changes to the repository | ||
|
||
* `git merge`: Combining two change histories into one | ||
|
||
* `git reset`: Undoing previous changes, and going nuclear with `--hard` | ||
|
||
* `git rebase`: Rewriting the git history | ||
|
||
* (single commit rewrite cases can be handled with `git commit --amend`) | ||
|
||
* When things don't go right, you may have to resolve merge conflicts by manually editing source files and re-committing | ||
|
||
* This should not be something you have to do for this course. However, for anyone who is interested, here is an article on [merge conflicts](https://css-tricks.com/merge-conflicts-what-they-are-and-how-to-deal-with-them/) |