what is the heap? please read about heap vs. stack if unfamiliar.
your computer has storage (in this case, random access memory) which is always available for use by C executables. any given program starts at a location in that storage, and the stack specific to that program grows from that location onward. the stack is fast to access, but it's typically limited to around 8 MB of space (the default on many linux systems). in comparison, the heap is for dynamically allocated memory, which can be much larger than 8 megabytes. as a downside, it's slightly slower to allocate from (and you have to free the allocated memory manually)
note: allocating on the stack is faster than on the heap (freeing also takes some time in heap allocators), but once a chunk of information is allocated on the heap, it can be used with the same speed as information on the stack. realistically, this heavily depends on usage pattern because modern systems utilize caches (which allow for very fast access if all the data you use are placed close together in memory). for example, a linked list may have its nodes living at arbitrary spots on the heap, which will usually thrash the cache often, but a contiguous array stores its elements next to each other, so iterating through its elements is fast.
malloc(size_t size)
Allocates size bytes and returns a pointer to the allocated memory
calloc(size_t nmemb, size_t size)
Allocates nmemb*size bytes, zeroes out the memory, and returns a pointer to it
free(void *ptr)
Frees the memory chunk pointed to by ptr
what you need to know: dynamically allocating memory stores the 'information' on the heap instead of the stack (pictured below). to "free" this information stored on the heap, you use the free() function.
note: this diagram is outdated for modern use. there are multiple program images present in the memory address space (main program + libraries), each with its own data, text, bss, etc. sections. the heap-and-stack model in the diagram above is managed by the brk (and sbrk) system calls, which move the so-called program break. the heap extends beyond the bss section of the program (where all initialised globals live) and grows toward the stack. on modern systems, however, the location of the stack & heap is completely arbitrary, and they're usually in disconnected areas. this is especially true on 64-bit systems, where large addresses (at least 48 bits, sometimes up to 52 bits of meaningful data) allow for huge gaps between memory segments. the memory layout is now usually randomized through a mechanism called ASLR (address-space layout randomization) to mitigate security issues where an attacker jumps to an arbitrary location whose address they know. with ASLR, they (mostly) can't. however, that's only damage control, and a proper system must ensure this kind of situation can't happen in the first place.
haha! probably not what you think! actually, i will get back to this. first, let me diagram a simplified version of allocating memory on the heap:
in this example, i've allocated memory on the heap, and the address to each respective chunk of memory is stored in x and y. if i use free(x) or free(y), it does not remove the allocated memory from the heap (this is the counterintuitive part)! a computer can either store or retrieve a value, not delete one (although storing a value can replace the previous value, inherently destroying the previous contents of the memory cell; this is still different from removing a chunk of memory from the heap). instead of removing the allocated chunk of memory from the heap, the heap manager will put the same chunk of memory into a cache and label it as "available for usage". this is scuffed! it is also optimal! the reuse and recycling of memory is what helps your computer not explode over time.
note: x and y are pointers to arbitrary locations (each with a unique address) of 0x10 bytes of allocated memory on the heap
note: because information on the heap is stored in RAM (usually volatile memory), everything in RAM is lost when the computer is turned off. this is a way of "deleting" memory chunks stored on the heap on a technicality.
each of these new locations on the heap has a unique address. dumbed down, you can imagine computer memory as a sequence of storage compartments called memory cells. each memory cell has a unique address (indicating its relative position in memory). most computers have billions of individual memory cells (each with its own unique address), and the information stored inside each cell is called the contents of the cell. every memory cell has contents, but normally we only know the information of the cells whose contents we replace manually (variable assignment, etc.) or that are, for example, replaced by an automated process (stored program concept).
when you call free(x), it does not remove the memory chunk (with its unique address) from the heap. instead, it stores the memory chunk in a cache (the specific cache bin depends on the byte size of the chunk) and labels the chunk as 'available for usage', allowing the same chunk and address to be used by another memory allocation process (pointers, etc.).
note: the algorithm for free() first checks whether the chunk before or after the freed chunk is also freed (exists in a cache). if so, the chunks will merge to create a bigger freed chunk. if there are no neighboring freed chunks, the free() algorithm will mark the freed chunk as "available for usage" and place it in the appropriate bin/cache based on its size.
note: different types of caches/bins exist. chunks in the tcache and fast bins are considered "in use", so they will not merge with adjacent freed chunks.
processes run with one or more threads at the same time. all threads in a process have access to the same heap, but each individual thread has its own stack (cool). shared heap access can be an issue if one thread is writing to an address on the heap while another thread is reading from the same address at the same time. a simple solution is to use a lock: before touching shared state, a thread checks a busy flag (and waits if another thread holds it), so only one thread operates at a time. unfortunately, this means a lot of stall time is spent waiting for other operations to finish. while each individual stall is minimal, the heap is always in use by all threads, so it can quickly add up to a slower program.
tcache (per-thread cache) is designed to use per-thread arenas, which is the manager's solution to the problem described above. in this context, an arena is a big blob of neighboring memory chunks. for example, the heap is an arena (typically referred to as the main arena). the 'per-thread arena' model is based on the creation of a new arena when memory is allocated for the first time within a specific thread. therefore, one can expect no collisions among threads writing on the heap, because each thread is working within its own arena.
each thread has 64 different singly-linked tcache bins. each bin can hold at most seven memory chunks. if a bin of a specific byte size runs out of space (or if the chunk size exceeds the max tcache bin size), the manager will fall back to the normal locked heap path with the fast bin (or small bin, large bin, unsorted bin, depending on size) recycling process.
if the memory chunk is of size 16 bytes, it will get cached in the 0..24-byte bin (24). likewise, a 24-byte chunk also goes in the 0..24-byte bin (24). in comparison, a 32-byte chunk would get stored in the 24..40-byte bin (40). only seven different chunks can be stored within each sized bin.
in the context of the x pointer created above, a call of free(x) stores the memory chunk in a cache (the specific cache bin depends on the byte size of the chunk) and labels the chunk as 'available for usage', allowing the same chunk and address to be used by another memory allocation process.
note: the memory chunk stored in cache after being freed is now labeled as available for usage by the computer.
the most important part for this exploit is understanding that x still points to the same memory chunk address on the heap despite x being freed. therefore, when y = malloc(0x10); is called and new memory of the same byte size as x is allocated, the manager will check the tcache to see if a chunk of that size exists (which is now true) and then 'assigns' that chunk on the heap to the new pointer (y in this case) even though x still points to the same chunk...
...because x still points to the same address that y's chunk exists at, they both point to the same memory chunk.
therefore, changing the value of the chunk that x is pointing to will directly change the value that y is pulling from. tada!