We recently worked on a Linux system security-related project in which we needed to hook important Linux kernel function calls such as for opening files and launching processes. We needed it to enable system activity monitoring and preemptively block suspicious processes.
Eventually, we developed an efficient approach to hooking any function in the kernel by name and executing our code around its calls, with the help of ftrace, a Linux kernel tracing feature. In this first part of our three-part series, we describe the four approaches we tried for hooking Linux functions before arriving at our final solution, along with the main pros and cons of each.
Four possible solutions
There are several ways that you can try to intercept critical functions of the Linux kernel:
- Using the Linux Security API
- Modifying the system call table
- Using the kprobes tool
- Splicing
Below, we talk in detail about each of these options.
Using the Linux Security API
At first, we thought that hooking functions with the Linux Security API would be the best choice, since the interface was designed for this exact purpose. Critical points in the kernel code contain calls to security functions that trigger callbacks installed by the security module. The module can examine the context of a specific operation and decide whether to permit or prohibit it.
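For illustration, registering a callback for file opens through this API looks roughly like the sketch below. Note that the hook-registration interface has changed across kernel versions, and the module and function names here are hypothetical:

```c
#include <linux/lsm_hooks.h>
#include <linux/fs.h>
#include <linux/errno.h>

/* Called at the security checkpoint for file opens; returning a
 * negative errno makes the kernel prohibit the operation. */
static int example_file_open(struct file *file)
{
    /* Examine the context (file, current task, ...) and decide. */
    return 0;  /* permit */
}

static struct security_hook_list example_hooks[] = {
    LSM_HOOK_INIT(file_open, example_file_open),
};

static int __init example_lsm_init(void)
{
    /* Runs during early boot: security modules are built into the
     * kernel and can't be loaded dynamically. */
    security_add_hooks(example_hooks, ARRAY_SIZE(example_hooks), "example");
    return 0;
}
```

This fragment has to be compiled into the kernel itself, which is exactly the limitation discussed below.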
Unfortunately, the Linux Security API has two major limitations:
- Security modules can’t be loaded dynamically, so we would need to rebuild the kernel since these modules are part of it.
- With some minor exceptions, a system can’t have multiple security modules.
While kernel developers have different opinions on whether a system can contain multiple security modules, the fact that a module can’t be loaded dynamically is a given. To ensure the constant security of the system from the start, the security module has to be part of the kernel.
So in order to use the Linux Security API, we would need to build a custom Linux kernel and integrate our additional module alongside AppArmor or SELinux, which are used in popular distributions. This option, however, didn’t suit our client, so we looked for another solution.
Modifying the system call table
Since monitoring was required mostly for actions performed by user applications, we could implement it on the system call level. All Linux system call handlers are stored in the table sys_call_table. Changing the values in this table leads to changing the system behavior. As a result, we can hook any system call by saving the old handler value and adding our own handler to the table.
This approach also has some pros and cons. The main advantages of changing values in the system call table are the following:
- Full control over all system calls, which are the primary kernel interface for user applications. Thus, you won’t miss any important actions performed by a user process.
- Minor performance overhead. Updating the system call table is a one-time investment. The only other expenses are the inevitable monitoring payload and the extra function call needed to invoke the original system call handler.
- Minor kernel requirements. In theory, this approach can be used for nearly any system because you don’t need specific kernel features in order to modify the system call table.
Still, this approach also has several drawbacks:
Technically complex implementation. While replacing the values in the table isn’t difficult, there are several additional tasks that require certain qualifications and some non-obvious solutions:
- Finding the system call table
- Bypassing kernel write protection of the table’s memory region
- Ensuring safe performance of the replacement process
Solving these problems means developers have to spend extra time implementing, supporting, and understanding the process.
Some handlers can’t be replaced. In Linux kernels prior to version 4.16, system call processing for the x86_64 architecture has some additional optimizations. Some of these optimizations require the system call handler to be implemented in assembly. These kinds of handlers are either hard or impossible to replace with custom handlers written in C. Furthermore, the fact that different kernel versions use different optimizations increases the technical complexity of the task even more.
Only system calls are hooked. Since this approach lets you replace system call handlers, it limits entry points significantly. All additional checks can be performed only immediately before or after a system call, and all we have are the system call arguments and return values. As a result, sometimes we may need to double-check both the access permissions of the process and the validity of system call arguments. Plus, in some cases the need to copy user process memory twice creates additional overhead. For instance, when an argument is passed through a pointer, there will be two copies: the one you make for yourself and the one made by the original handler. System calls also sometimes provide low granularity of events, so you may need to apply additional filters to get rid of noise.
At first, we tried to alter the system call table so we could cover as many systems as possible, and we even implemented this approach successfully. But there were several specific features of the x86_64 architecture and a few hooked-call limitations that we didn’t know about. Ensuring support for the system calls that launch new processes – clone() and execve() – turned out to be critical for us. This is why we continued to search for other solutions.
Using the kprobes tool
One of our remaining options was to use kprobes – a dedicated API designed for Linux kernel tracing and debugging. Kprobes allows you to install pre-handlers and post-handlers for any kernel instruction, as well as function-entry and function-return handlers. Handlers get access to the registers and can alter them, which would give us a chance to both monitor a process and alter its behavior.
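As an illustration, installing a pre-handler on a kernel function with kprobes takes only a few lines in a module. This is a sketch; the traced symbol name is just an example and differs between kernel versions:

```c
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/ptrace.h>

static struct kprobe kp = {
    /* Example target: any non-inlined kernel function will do. */
    .symbol_name = "do_sys_open",
};

/* Runs before the probed instruction. Only raw registers are
 * available, so arguments have to be dug out of them manually. */
static int pre_handler(struct kprobe *p, struct pt_regs *regs)
{
    pr_info("probed %s, ip = %px\n", p->symbol_name,
            (void *)instruction_pointer(regs));
    return 0;
}

static int __init probe_init(void)
{
    kp.pre_handler = pre_handler;
    return register_kprobe(&kp);
}

static void __exit probe_exit(void)
{
    unregister_kprobe(&kp);
}

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");
```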
The main benefits of using kprobes for tracing Linux kernel functions are the following:
- A mature API. Kprobes has been improving constantly since 2002. The utility has a well-documented interface and the majority of pitfalls have already been discovered and dealt with.
- The possibility to trace any point in the kernel. Kprobes is implemented via breakpoints (the int3 instruction) embedded in the executable kernel code. Thus, you can set the trace point literally in any part of any function as long as you know its location. Plus, you can implement kretprobes by switching the return address on the stack and trace any function’s return (except for ones that don’t return control at all).
Kprobes also has its disadvantages, however:
Technical complexity. Kprobes is only a tool for setting a breakpoint at a particular place in the kernel. To get function arguments or local variable values, you need to know where exactly on the stack and in what registers they’re located and get them out of there manually. Also, to block a function call you need to manually modify the state of the process so you can trick it into thinking that it’s already returned control from the function.
Jprobes is deprecated. Jprobes is a specialized kprobes version meant to make it easier to perform a Linux kernel trace. Jprobes can extract function arguments from the registers or the stack and call your handler, but the handler and the traced function should have the same signatures. The only problem is that jprobes is deprecated and has been removed from the latest kernels.
Nontrivial overhead. Even though it’s a one-time procedure, positioning breakpoints is quite costly. While breakpoints don’t affect the rest of the functions, their processing is also relatively expensive. Fortunately, the costs of using kprobes can be reduced significantly by using a jump-optimization implemented for the x86_64 architecture. Still, the cost of kprobes surpasses that of modifying the system call table.
Kretprobes limitations. The kretprobes feature is implemented by substituting the return address on the stack. To get back to the original address after processing is over, kretprobes needs to keep that original address somewhere. Addresses are stored in a buffer of a fixed size. If the buffer is overloaded, like when the system performs too many simultaneous calls of the traced function, kretprobes will skip some operations.
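The size of that buffer is set through the maxactive field when registering a kretprobe. Again a sketch with an example symbol name, not working production code:

```c
#include <linux/kprobes.h>
#include <linux/ptrace.h>

/* Runs when the traced function returns. */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    pr_info("returned %ld\n", regs_return_value(regs));
    return 0;
}

static struct kretprobe rp = {
    .kp.symbol_name = "do_sys_open",  /* example target */
    .handler = ret_handler,
    /* Size of the return-address buffer: if more than this many
     * calls are in flight at once, some returns will be missed. */
    .maxactive = 32,
};

/* register_kretprobe(&rp) in module init, unregister_kretprobe(&rp)
 * in module exit. */
```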
Disabled preemption. Kprobes relies on interrupts and fiddles with the processor registers, so for synchronization, all handlers have to be executed with preemption disabled. As a result, there are several restrictions on the handlers: you can’t sleep in them, meaning you can’t allocate large amounts of memory, perform input/output, wait on semaphores or timers, and so on.
Still, if all you need is to trace particular instructions inside a function, kprobes surely can be of use.
Splicing
There’s also a classic technique for hooking kernel functions: replacing the instructions at the beginning of a function with an unconditional jump to your handler. The original instructions are moved to a different location and are executed right before jumping back to the intercepted function. Thus, with the help of only two jumps, you can splice your code into a function.
This approach works the same way as the kprobes jump optimization. Using splicing, you can get the same results as you get using kprobes but with much lower expenses and with full control over the process.
The advantages of using splicing are pretty obvious:
- Minimum requirements for the kernel. Splicing doesn’t require any specific options in the kernel and can be implemented at the beginning of any function. All you need is the function’s address.
- Minimum overhead costs. The traced code needs to perform only two unconditional jumps to hand over control to the handler and get control back. These jumps are easy to predict for the processor and also are quite inexpensive.
However, this approach has one major disadvantage – technical complexity. Replacing the machine code in a function isn’t that easy. Here are only a few things you need to accomplish in order to use splicing:
- Synchronize the hook installation and removal (in case the function is called during the instruction replacement)
- Bypass the write protection of memory regions with executable code
- Invalidate CPU caches after instructions are replaced
- Disassemble replaced instructions in order to copy them as a whole
- Check that there are no jumps in the replaced part of the function
- Check that the replaced part of the function can be moved to a different place
Of course, you can use the livepatch framework and borrow some hints from kprobes, but the final solution still remains too complex, and every new implementation of it will carry too many latent problems.
If you’re ready to deal with these demons hiding in your code, then splicing can be a pretty useful approach for hooking Linux kernel functions. But since we didn’t like this option, we left it as an alternative in case we couldn’t find anything better.
Is there a fifth approach?
When we were researching this topic, our attention was drawn to Linux ftrace, a framework you can use to trace Linux kernel function calls. And while performing Linux kernel tracing with ftrace is common practice, this framework also can be used as an alternative to jprobes. And, as it turned out, ftrace suits our needs for tracing function calls even better than jprobes.
Ftrace allows you to hook critical Linux kernel functions by their names, and hooks can be installed without rebuilding the kernel. In the next part of our series, we’ll talk more about ftrace: what it is, how it works, and a detailed example so you can understand the process better. We’ll also tell you about the main ftrace pros and cons.
There are many ways you can try to hook critical functions in the Linux kernel. We’ve described the four most common approaches for accomplishing this task and explained the benefits and drawbacks of each. In the next part of our three-part series, we’ll tell you more about the solution that our team of experts came up with in the end – hooking Linux kernel functions with ftrace.
Have any questions? You can learn more about our Linux kernel development experience here or contact us by filling out the form below.