PltHook
This document is not a comprehensive record of everything I learned while developing PltHook. It is meant to:
- Summarize key concepts, challenges, and their solutions, and collect links to study materials.
- Serve as a memo for myself.
- Give professors a clear view of my progress.
- Provide a good starting point for people who would like to work on this after me.
Id | Task | Status | Comments |
---|---|---|---|
1 | Understand dynamic linking mechanism | Finished | |
2 | Write code to access PLT&GOT table | Finished | Used plthook project as a reference and implemented my own version of plthook. |
3 | Replace function pointer in GOT with a customized C/C++ handler. | Finished | |
4 | Replace function pointer in GOT with a customized assembly handler. | Finished | |
5 | Replace function pointer in GOT with a customized assembly handler that JMPs to the original function. | Finished | |
6 | Check stack to see if it's possible to get caller address directly from stack | Finished | |
7 | Hook all external function calls before main function | WIP | |
8 | Build a demo to intercept single-threaded program. (Intercept only one PLT table) | Finished | |
9 | Record timestamps to thread-local variables | WIP | |
10 | Build a demo to intercept multi-threaded programs. (Intercept only one PLT table) | | |
11 | Build a demo to intercept all PLT tables | | |
12 | Aggregate data by symbol name | | |
13 | Run on Parsec and calculate overhead | | |
The ultimate goal of the plthook library is to intercept all external function calls without modifying the user's program. To achieve this, we need a solid understanding of the dynamic linking mechanism of the OS.
ELF is the executable file format on *nix systems. On disk, all executables are stored in this format. In memory, they are loaded as ELF images. To understand the dynamic linking mechanism, we first need to understand ELF well.
Wikipedia and chapters 3 and 7 of the book 程序员的自我修养—链接、装载与库 (roughly, "A Programmer's Self-Cultivation: Linking, Loading, and Libraries") provide a comprehensive introduction to this part.
- Sections and Segments
The first important concept in an ELF file is the distinction between sections and segments. Data that serves the same purpose is grouped into a section. For example, machine code is stored in .text and symbol table entries are stored in .symtab. This information matters to linkers, because they need to act differently on different kinds of data. However, once the code is loaded into memory as an ELF image, the operating system no longer cares about the content type. It only cares about the permissions of each region. To reduce fragmentation, sections with the same permissions are grouped into a segment, so a segment can contain zero or more sections.
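As a rough illustration of the two views, the sketch below reads the ELF header of a file and reports how many program headers (segments, the loader's view) and section headers (the linker's view) it declares. It assumes a 64-bit ELF file on Linux with `<elf.h>` available; `read_elf_header`, `elf_segment_count`, and `elf_section_count` are names made up for this example and are not part of PltHook.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the ELF header of `path` into `header`.
 * Returns 0 on success, -1 if the file cannot be read or is not a
 * 64-bit ELF file. */
static int read_elf_header(const char *path, Elf64_Ehdr *header) {
    FILE *file = fopen(path, "rb");
    if (file == NULL) return -1;
    size_t items = fread(header, sizeof *header, 1, file);
    fclose(file);
    if (items != 1) return -1;
    if (memcmp(header->e_ident, ELFMAG, SELFMAG) != 0) return -1;
    if (header->e_ident[EI_CLASS] != ELFCLASS64) return -1;
    return 0;
}

/* Number of program headers (segments) declared in the file, or -1. */
int elf_segment_count(const char *path) {
    Elf64_Ehdr header;
    return read_elf_header(path, &header) == 0 ? header.e_phnum : -1;
}

/* Number of section headers declared in the file, or -1.
 * This can be 0 when the section header table has been stripped. */
int elf_section_count(const char *path) {
    Elf64_Ehdr header;
    return read_elf_header(path, &header) == 0 ? header.e_shnum : -1;
}
```

Passing `"/proc/self/exe"` inspects the running program itself. `readelf -l` and `readelf -S` print the same two tables in detail.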
- Relocation Table
Addresses of external symbols can only be determined at link time or later. Compilers use placeholder addresses for those symbols during compilation, so we need a way to find every place in a compiled program that refers to them. For relocatable files (e.g. *.o), r_offset is the offset of the location to patch relative to the start of its section. For executables and shared objects (e.g. *.so, or a file with no extension), r_offset is the virtual address of the first byte of the location to patch. r_info contains the symbol table index of the symbol and the relocation type (the type specifies how to calculate the actual address). We can use the relocation tables to enumerate all external symbols. If .text contains a reference that requires relocation, a .rel.text table is generated (.rela.text on x86-64).
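The way r_info packs two values can be sketched with the macros from `<elf.h>`. This is a minimal example assuming x86-64 Linux, where relocation entries are `Elf64_Rela` and PLT-related GOT slots use the `R_X86_64_JUMP_SLOT` type; the three function names are hypothetical helpers for illustration.

```c
#include <elf.h>
#include <stdint.h>

/* Symbol table index: stored in the upper 32 bits of r_info. */
uint32_t relocation_symbol_index(const Elf64_Rela *rela) {
    return (uint32_t)ELF64_R_SYM(rela->r_info);
}

/* Relocation type: stored in the lower 32 bits of r_info. */
uint32_t relocation_type(const Elf64_Rela *rela) {
    return (uint32_t)ELF64_R_TYPE(rela->r_info);
}

/* Entries of this type describe GOT slots that are filled in for
 * calls going through the PLT, which are the ones a PLT hook cares about. */
int is_plt_slot(const Elf64_Rela *rela) {
    return relocation_type(rela) == R_X86_64_JUMP_SLOT;
}
```

For such an entry, r_offset gives the address of the GOT slot (relative to the load base for position-independent code), and the symbol index leads to the function name through .dynsym and .dynstr.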
Chapter 6 of the book 程序员的自我修养—链接、装载与库 provides a comprehensive introduction to this part.
ELF files are loaded into memory as ELF images.
Use /proc/self/maps to check the loading address.
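Reading the load address programmatically can be sketched as below. This assumes Linux, where each line of `/proc/self/maps` starts with a `start-end` address range in hexadecimal; `find_mapping_start` is a hypothetical helper, not something PltHook necessarily contains.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the start address of the first mapping in
 * /proc/self/maps whose line contains `needle`, or 0 if none is found.
 * An empty needle matches the first mapping. */
unsigned long find_mapping_start(const char *needle) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) return 0;

    char line[8192];
    unsigned long start = 0;
    unsigned long found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line looks like: "start-end perms offset dev inode path". */
        if (strstr(line, needle) != NULL && sscanf(line, "%lx-", &start) == 1) {
            found = start;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

For example, passing the file name of a shared library returns the address of its first mapping, which is normally its load base.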
GCC uses AT&T syntax, but AT&T syntax does not seem to have a complete reference manual. A workable approach is to take Intel's x86 assembly manual as the reference and convert the code into AT&T syntax.
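The conversion mostly comes down to operand order and decoration. The sketch below shows one instruction in both syntaxes and uses the AT&T form in GCC extended inline assembly. It assumes x86-64 with GCC or Clang; the function name is made up for the example.

```c
/* Intel syntax:  add rax, rbx      destination first, bare register names
 * AT&T syntax:   addq %rbx, %rax   source first, % on registers,
 *                                  q suffix for a 64-bit operand size */
long add_with_inline_asm(long a, long b) {
    long result = a;
    __asm__("addq %1, %0"      /* result += b */
            : "+r"(result)     /* %0: read and written, any register */
            : "r"(b)           /* %1: input, any register */
            : "cc");           /* the flags register is modified */
    return result;
}
```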
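I should use a jump that can reach an arbitrary address (an indirect jump through a register or memory operand), because a direct jump in 64-bit mode only encodes a 32-bit relative displacement.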
When the CPU executes an unconditional transfer, the offset of destination is moved into the instruction pointer, causing execution to continue at the new location.
The hook function doesn't know the signature of the original function, so it is impossible to call the original function from C/C++.
The solution is to use inline assembly.
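A minimal sketch of that idea, assuming x86-64 Linux with GCC or Clang: the hook is written in assembly and forwards with a `jmp` instead of a `call`, so the argument registers, the stack, and the caller's return address are left exactly as they were and the signature never needs to be known. The names `original_target`, `hook_trampoline`, and `call_through_hook` are hypothetical, and the single global target is a simplification for the demo.

```c
/* Address the hook forwards to. A real hook would look this up per
 * symbol; a single global keeps the sketch short. Hidden visibility
 * lets the assembly below address it relative to %rip. */
__attribute__((visibility("hidden"))) void *original_target = 0;

/* Declared with one concrete signature only so the C demo can call it;
 * the assembly body itself is signature-agnostic. */
long hook_trampoline(long a, long b);

__asm__(
    ".pushsection .text\n"
    ".globl hook_trampoline\n"
    ".type hook_trampoline, @function\n"
    "hook_trampoline:\n"
    "    jmp *original_target(%rip)\n"  /* tail-jump to the stored address */
    ".size hook_trampoline, . - hook_trampoline\n"
    ".popsection\n"
);

static long add_numbers(long a, long b) { return a + b; }

/* Demo: route a call to add_numbers through the trampoline. */
long call_through_hook(long a, long b) {
    original_target = (void *)add_numbers;
    return hook_trampoline(a, b);
}
```

Any bookkeeping done before the `jmp` would have to save and restore every register it touches, since the original function still expects its arguments in place.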
After a GOT entry is replaced with a hook function, the hook has no way to know which original function it was called in place of.
Solution 1: Generate a source file on the fly. Create one function per external symbol and hard-code the original function's address in each hook function. Then compile it into a .so file, load it, and replace each GOT entry with the corresponding loaded function.
Solution 2: Hard-code machine code in memory and let each GOT entry point to its corresponding code.
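Solution 2 can be sketched as follows, assuming x86-64 Linux. Each stub is a few bytes of machine code with one original address baked in, so a separate stub per symbol always knows where to go. The stub uses `%r11`, which the System V x86-64 ABI treats as a scratch register that carries no arguments. `make_jump_stub` and `call_via_stub` are hypothetical names, and a real stub would also record the symbol or a timestamp before jumping.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std=c11 */
#endif
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define STUB_PAGE_SIZE 4096

/* Builds a tiny x86-64 stub that jumps to `target`:
 *     movabs $target, %r11    49 BB <imm64>
 *     jmp    *%r11            41 FF E3
 * Returns the stub address, or NULL on failure. */
void *make_jump_stub(void *target) {
    unsigned char code[13] = {0x49, 0xBB, 0, 0, 0, 0, 0, 0, 0, 0,
                              0x41, 0xFF, 0xE3};
    uint64_t address = (uint64_t)(uintptr_t)target;
    memcpy(code + 2, &address, sizeof address); /* patch in the imm64 */

    /* Write while the page is writable, then switch it to executable. */
    void *page = mmap(NULL, STUB_PAGE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return NULL;
    memcpy(page, code, sizeof code);
    if (mprotect(page, STUB_PAGE_SIZE, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, STUB_PAGE_SIZE);
        return NULL;
    }
    return page;
}

static long multiply(long a, long b) { return a * b; }

/* Demo: call multiply through a freshly generated stub. */
long call_via_stub(long a, long b) {
    void *stub = make_jump_stub((void *)multiply);
    if (stub == NULL) return -1;
    long result = ((long (*)(long, long))stub)(a, b);
    munmap(stub, STUB_PAGE_SIZE);
    return result;
}
```

In a hook library the GOT entry would be set to the stub address, and many stubs would share one page rather than using a page each as this demo does.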
Inline hook (Related) https://blog.csdn.net/arvon2012/article/details/7766439