title | author | tags |
---|---|---|
Meltdown and Spectre | Edward Higgins | bugs low-level |
At the start of 2018, a set of three serious bugs in CPU hardware was revealed to affect the vast majority of processors manufactured since the mid-90s. This week, Ed explained how these bugs work, covering some advanced features of CPUs along the way.
You can find the original slides for this talk here.
- Meltdown:
  - Exploits side effects of out-of-order execution to read arbitrary kernel memory
  - Affects all modern Intel CPUs from recent years
- Spectre:
  - Induces a victim to speculatively perform operations which leak information
  - Affects many high-performance CPUs from recent years, including Intel, AMD and ARM chips, and others
Next instruction fetched from memory
-
Instruction is decoded and the location any of indirectly referenced memory is interpreted
-
Instruction is executed, and written to wherever is specified
-
Each instruction takes several clock ticks
- Instructions take several clock cycles
- Modern computers use multiple types of memory
- The various levels present different trade-offs between speed and capacity
- When the CPU needs data from memory, it moves it into a lower-level (faster) cache
- Example:
Memory | Size | Latency |
---|---|---|
L1 Cache | 64 KB | 4-12 |
L2 Cache | 256 KB | 26-31 |
L3 Cache | 4 MB | 43-60 |
RAM | 8 GB | 100s |
- The micro-ops of the F-D-E cycle can be done in parallel, since each is handled by different hardware
- Breaking the cycle down into more micro-ops allows more instructions to be processed at once
- Modern CPUs allow the micro-ops of many operations to be done out of order
- This allows commands such as loads & stores to be issued before:
  - preceding branches resolve
  - preceding operations complete
- Branches include:
  - Conditionals
  - Direct calls & jumps
  - Indirect calls & jumps
  - Returns
- Branch outcomes are predicted by the branch prediction unit (BPU), which includes:
  - Return stack buffer (RSB): a history of recent return addresses
  - Branch target buffer (BTB): recent outcomes from conditionals/calls
- Transient instructions are instructions that:
  - Are executed out of order
  - Leave measurable side effects
- They occur all the time in normal operation
- They are exploitable if their operation depends on a secret value

...
if (x < 0)
    OutOfBounds();
var = array[x];
...
The same check in x86 assembly:
...
cmp rax, 0 ; Compare register rax to 0
jl OutOfBounds ; If rax < 0, jump elsewhere
mov rcx, [rbx + rax] ; Now, move some memory into rcx
...
- Usually, multiple programs run on the same hardware
- The state of the CPU can be changed by these programs
- Such changes may be detectable by other programs, through:
  - Branch history
  - BTB
  - Caches (e.g. Flush+Reload)
- Meltdown allows non-privileged users to read privileged memory, in three steps:
1. The content of a restricted memory location is loaded into a register, raising an exception
2. A transient instruction accesses an uncached memory address based on the contents of that register, fetching it into cache
3. A side-channel attack (e.g. Flush+Reload) is used to determine which memory has been moved to cache, revealing the value of the restricted memory
- Line 5 attempts to retrieve the secret byte from address `rcx` into `al`
- The CPU checks the permission bits of the address, and raises an exception
- While that is happening, line 8 speculatively fetches some offset from the probe array, caching it
- Once line 5 retires, the exception is raised and the CPU registers and pipeline are flushed

1 ; rcx = secret address
2 ; rbx = probe array
3
4 retry:
5 mov al, byte [rcx]
6 shl rax, 0xc
7 jz retry
8 mov rbx, qword [rbx + rax]
- Using a Flush+Reload attack, the access time to each entry of the probe array can be measured
- By timing access to each entry in the probe array, the entry corresponding to the value of the secret byte becomes apparent (in this case it was 84)
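The decoding step described above can be sketched in C. This is a minimal illustration, not code from the talk: it assumes the attacker has already collected one access time per probe entry, and the names `decode_secret_byte` and `hit_threshold` are hypothetical. The threshold separating a cache hit from a miss is machine-specific and would have to be calibrated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_ENTRIES 256 /* one probe entry per possible byte value */

/*
 * Return the index of the fastest probe entry if its access time is below
 * hit_threshold (a hypothetical, machine-specific cutoff), or -1 if no
 * entry looks cached.
 */
int decode_secret_byte(const uint64_t timings[PROBE_ENTRIES],
                       uint64_t hit_threshold)
{
    assert(timings != NULL);

    size_t fastest = 0;
    for (size_t i = 1; i < PROBE_ENTRIES; i++) {
        if (timings[i] < timings[fastest]) {
            fastest = i;
        }
    }
    return timings[fastest] < hit_threshold ? (int)fastest : -1;
}
```

If every entry takes a few hundred cycles except entry 84, which is served from cache, the function returns 84.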
- User processes don't know physical addresses; they use a virtualised address space
- User processes may need to access the kernel, so kernel memory is mapped within this space
- Since the kernel manages everything, the entire physical memory is mapped within the kernel address space
- In the past, the address of this physical memory mapping was easy to figure out for a given kernel
- Within the past 15 years, ASLR has been implemented in all major OSs to randomize these addresses
- Randomization is limited to 40 bits, so on a machine with 8GB of memory, only 128 tests are needed to find the start of the physical memory mapping
- Once found, the attacker can proceed to dump the entire physical memory
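The figure of 128 tests follows from dividing the randomized range by the amount of installed memory: 2^40 / 2^33 = 2^7. A small sketch of that arithmetic, assuming the attacker steps through the range in memory-sized strides (the function name `probes_needed` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of candidate offsets to test when the mapping may start anywhere
 * in a 2^random_bits range and the attacker steps through it in strides of
 * mem_bytes (assumed to be a power of two no larger than the range).
 */
uint64_t probes_needed(unsigned random_bits, uint64_t mem_bytes)
{
    assert(random_bits < 64);
    assert(mem_bytes != 0 && (mem_bytes & (mem_bytes - 1)) == 0);

    uint64_t range = UINT64_C(1) << random_bits;
    assert(mem_bytes <= range);
    return range / mem_bytes;
}
```

For example, `probes_needed(40, UINT64_C(8) << 30)` gives 128, matching the 8GB case above; doubling the memory halves the number of tests.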
- Since steps 1 and 2 are much faster than step 3, performance can be improved by only reading 1 bit at a time
- In this case, only one read of the probe array is needed:
  - if it's cached, the bit is a 1
  - otherwise, it's a 0
- Using this technique, an attacker can read any portion of physical memory at 500KB/s, with an error rate of <0.04%
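To illustrate the bit-at-a-time variant, the sketch below rebuilds a byte from eight single-bit observations. It assumes each observation has already been reduced to "cached" or "not cached"; the name `assemble_byte` and the choice of bit 0 as the least significant bit are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rebuild a byte from eight single-bit probes. cached[i] is true when the
 * probe line was found in cache after transiently testing bit i of the
 * secret (bit 0 is the least significant bit).
 */
uint8_t assemble_byte(const bool cached[8])
{
    assert(cached != NULL);

    uint8_t value = 0;
    for (size_t bit = 0; bit < 8; bit++) {
        if (cached[bit]) {
            value |= (uint8_t)(1u << bit);
        }
    }
    return value;
}
```

With hits on bits 2, 4 and 6 only, the result is 4 + 16 + 64 = 84.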
- The fix for Meltdown involves remapping the virtual address space every time a program makes a system call to the kernel
- This means that kernel memory won't be in unprivileged processes' address spaces, but it slows down certain operations
- This has been patched in all major OSs ("Kernel Page Table Isolation", or KPTI, for Linux)
- Make sure your computers are up-to-date to minimize the risks
- Spectre allows an attacker to trick a victim process into revealing secret memory from its own address space
- It involves training the branch predictor so that the victim speculatively executes code it otherwise wouldn't
- There are 2 approaches:
  - a) Training the outcome of a conditional branch in the victim
  - b) Training the target address of a victim's call
Consider some victim code:
1 if (x < array1_size) 2 y = array2[array1[x] * 256];
-
Calling this code (e.g. through an API) with allowed
x
multiple times trains the CPU to speculatively execute line 2. -
Now, calling with some malicious
x
, line 2 can cache memory based on the target value, as previously mentioned -
Selecting appropriate values for
x
allows an attacker to read arbitrary memory from the victim's address space -
For example:
- Accessing secrets from a cryptographic library
- Accessing arbitrary browser data from a sandboxed JS environment
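The effect of a mispredicted bounds check can be modelled with a toy simulation. This is not an exploit and does not touch real caches: `memory`, `cached`, `victim_mispredicted` and the other names are hypothetical, and speculation is modelled simply as running the load without the check. A single flat buffer stands in for the victim's address space so that the out-of-bounds read stays well-defined C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY1_SIZE 16
#define MEMORY_SIZE 64

/* Toy victim address space: array1 is memory[0..15]; a secret lives past it. */
static uint8_t memory[MEMORY_SIZE];
/* cached[v] models "array2[v * 256] is in cache". */
static bool cached[256];

/* Clear the toy memory and model flushing the probe array from cache. */
void reset_model(void)
{
    memset(memory, 0, sizeof memory);
    memset(cached, 0, sizeof cached);
}

/* Architectural behaviour: the bounds check is honoured. */
void victim(size_t x)
{
    if (x < ARRAY1_SIZE) {
        cached[memory[x]] = true;
    }
}

/* Modelled misprediction: the load runs before the check resolves. */
void victim_mispredicted(size_t x)
{
    assert(x < MEMORY_SIZE); /* keep the toy model itself in bounds */
    cached[memory[x]] = true;
}

/* Attacker's view: which probe entry became cached? Returns -1 if none. */
int recover_leaked_value(void)
{
    for (int v = 0; v < 256; v++) {
        if (cached[v]) {
            return v;
        }
    }
    return -1;
}
```

With a secret byte of 84 stored at `memory[40]`, calling `victim(40)` leaves nothing cached, while `victim_mispredicted(40)` marks entry 84, which `recover_leaked_value()` then reports.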
- In some cases, a victim will make an indirect call or jump while the attacker has control over some CPU registers
- E.g. a function making a function call while dealing with externally provided data
- The attacker can train the BTB to branch to some gadget code instead of the correct destination
- This way, data from addresses calculated from those registers can be leaked
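BTB training can be pictured with a toy model in which the predictor is a small table indexed by part of the branch address. Everything here is a simplifying assumption for illustration: the table size, the indexing by low address bits, and the function names are hypothetical, and real predictors are considerably more complex.

```c
#include <assert.h>
#include <stdint.h>

#define BTB_ENTRIES 16 /* hypothetical size; real BTBs are larger */

/* Predicted target for each slot; 0 means "no prediction". */
static uint64_t btb[BTB_ENTRIES];

/* Assumption: the slot is chosen from the low bits of the branch address. */
static unsigned btb_slot(uint64_t branch_addr)
{
    return (unsigned)(branch_addr % BTB_ENTRIES);
}

/* Record the target a branch actually went to (what "training" does). */
void btb_train(uint64_t branch_addr, uint64_t target)
{
    assert(target != 0);
    btb[btb_slot(branch_addr)] = target;
}

/* Target the CPU would speculatively jump to for this branch. */
uint64_t btb_predict(uint64_t branch_addr)
{
    return btb[btb_slot(branch_addr)];
}
```

In this model, an attacker branch at `0x7f0004` shares a slot with a victim branch at `0x401234`, so training the former with a gadget address makes `btb_predict` return that gadget for the latter.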
- Spectre is much harder to fix than Meltdown; KPTI and similar fixes won't work
- Fixes can include allowing indirect branches to be isolated from speculative execution
- Likely to be an issue for a while