Ftrace is a Linux utility that ’s usually used for tracing kernel functions. But as we looked for a useful solution that would allow us to enable system activity monitoring and block suspicious processes, we discovered that Linux ftrace can also be used for hooking function calls.
This is the final part of our three-part series dedicated to hooking Linux kernel functions with ftrace. In this article, we focus on the main ftrace pros and cons and describe some unexpected surprises we’ve faced when hooking Linux kernel functions with this utility. Read the first part of our series to learn about four other approaches that can be used for hooking Linux kernel function calls. Are you wondering: What is an ftrace? How does ftrace work? If so, go to the second part of our series to get answers to these questions, see an ftrace example, and learn more about how you can use ftrace to configure Linux kernel function hooking.
Ftrace makes hooking Linux kernel functions much easier and has several crucial advantages.
- A mature API and simple code. Leveraging ready-to-use interfaces in the kernel significantly reduces code complexity. You can hook your kernel functions with ftrace by making only a couple of function calls, filling in two structure fields, and adding a bit of magic in the callback. The rest of the code is just business logic executed around the traced function.
- Ability to trace any function by name. Linux kernel tracing with ftrace is quite a simple process – writing the function name in a regular string is enough to point to the one you need. You don’t need to struggle with the linker, scan the memory, or investigate internal kernel data structures. As long as you know their names, you can trace your kernel functions with ftrace even if those functions aren’t exported for the modules.
But just like the other approaches that we’ve described in this series, ftrace has a couple of drawbacks.
Kernel configuration requirements. There are several kernel requirements needed to ensure successful ftrace Linux kernel tracing:
- The list of kallsyms symbols for searching functions by name
- The ftrace framework as a whole for performing tracing
- Ftrace options crucial for hooking functions
All these features can be disabled in the kernel configuration since they aren’t critical for the system’s functioning. Usually, however, the kernels used by popular distributions still contain all these kernel options as they don’t affect system performance significantly and may be useful for debugging. Still, you’d better keep these requirements in mind in case you need to support some particular kernels.
Overhead costs. Since ftrace doesn’t use breakpoints, it has lower overhead costs than kprobes. However, the overhead costs are higher than for splicing manually. In fact, dynamic ftrace is a variation of splicing which executes the unneeded ftrace code and other callbacks.
Functions are wrapped as a whole. Just as with usual splicing, ftrace wraps the functions as a whole. And while splicing technically can be executed in any part of the function, ftrace works only at the entry point. You can see this limitation as a disadvantage, but usually it doesn’t cause any complications.
Double ftrace calls. As we’ve explained before, using the parent_ip pointer for analysis leads to calling ftrace twice for the same hooked function. This adds some overhead costs and can disrupt the readings of other traces because they’ll see twice as many calls. This issue can be fixed by moving the original function address five bytes further (the length of the call instruction) so you can basically spring over ftrace.
Let’s take a closer look at some of these disadvantages.
The kernel has to support both ftrace and kallsyms. This requires enabling two configuration options:
Next, ftrace has to support a dynamic register modification, which is the responsibility of the following option:
To access the FTRACE_OPS_FL_IPMODIFY flag, the kernel you use has to be based on version 3.19 or higher. Older kernel versions can still modify the register %rip, but from version 3.19, this register can be modified only after setting the flag. In older versions of the kernel, the presence of this flag will lead to a compilation error. For newer versions, the absence of this flag means a non-operating hook.
Last but not least, we need to pay attention to the ftrace call location inside the function. The ftrace call must be located at the beginning of the function, before the function prologue (where the stack frame is formed and the space for local variables is allocated). The following option takes this feature into account:
While the x86_64 architecture does support this option, the i386 architecture doesn’t. The compiler can’t insert an ftrace call before the function prologue due to ABI limitations of the i386 architecture. As a result, by the time you perform an ftrace call the function stack has already been modified, and changing the value of the register isn’t enough for hooking the function. You’ll also need to undo the actions executed in the prologue, which differ from function to function.
This is why ftrace function hooking doesn’t support a 32-bit x86 architecture. In theory, you can still implement this approach by generating and executing an anti-prologue, for instance, but it’ll significantly boost the technical complexity.
At the testing stage, we faced one particular peculiarity: hooking functions on some distributions led to the permanent hanging of the system. Of course, this problem occurred only on systems that were different from those used by our developers. We also couldn’t reproduce the problem with the initial hooking prototype on any distributions or kernel versions.
According to debugging, the system got stuck inside the hooked function. For some unknown reason, the parent_ip still pointed to the kernel instead of the function wrapper when calling the original function inside the ftrace callback. This launched an endless loop wherein ftrace called our wrapper again and again while doing nothing useful.
Fortunately, we had both working and broken code and eventually discovered what was causing the problem. When we unified the code and got rid of the pieces we didn’t need at the moment, we narrowed down the differences between the two versions of the wrapper function code.
This is the stable code:
And this is the code that caused the system to hang:
How can the logging level possibly affect system behavior? Surprisingly enough, when we took a closer look at the machine code of these two functions, it became obvious that the reason behind these problems was the compiler.
It turns out that the pr_devel() calls are expanded into no-op. This printk-macro version is used for logging at the development stage. And since these logs pose no interest at the operating stage, the system simply cuts them out of the code automatically unless you activate the DEBUG macro. After that, the compiler sees the function like this:
And this is where optimizations take the stage. In our case, the so-called tail call optimization was activated. If a function calls another and returns its value immediately, this optimization lets the compiler replace a function call instruction with a cheaper direct jump to the function’s body. This is what this call looks like in machine code:
And this is an example of the broken call:
The first CALL instruction is the exact same __fentry__() call that the compiler inserts at the beginning of all functions. But after that, the broken and the stable code act differently. In the stable code, we can see the real_sys_execve call (via a pointer stored in memory) performed by the CALL instruction, which is followed by fh_sys_execve() with the help of the RET instruction. In the broken code, however, there’s a direct jump to the real_sys_execve() function performed by JMP.
The tail call optimization allows you to save some time by not allocating a useless stack frame that includes the return address that the CALL instruction stores in the stack. But since we’re using parent_ip to decide whether we need to hook, the accuracy of the return address is crucial for us. After optimization, the fh_sys_execve() function doesn’t save the new address on the stack anymore, so there’s only the old one leading to the kernel. And this is why the parent_ip keeps pointing inside the kernel and that endless loop appears in the first place.
This is also the main reason why the problem appeared only on some distributions. Different distributions use different sets of compilation flags for compiling the modules. And in all the problem distributions, the tail call optimization was active by default.
We managed to solve this problem by turning off tail call optimization for the entire file with the wrapper functions:
- #pragma GCC optimize("-fno-optimize-sibling-calls")
For further hooking experiments, you can use the full kernel module code from GitHub.
While developers typically use ftrace to trace Linux kernel function calls, this utility showed itself to be rather useful for hooking Linux kernel functions as well. And even though this approach has some disadvantages, it gives you one crucial benefit: overall simplicity of both the code and the hooking process.
Apriorit professionals can handle tasks of any complexity level, from writing a simple Linux device driver to building complex kernel solutions. Fill in the form below to learn more about how the Apriorit team can help you reach your goals.