mhook

How we have increased mhook’s performance, enhanced its capabilities and eliminated certain bugs.

Contents:

Are There Any Similar Libraries?

Problems with Mhook Library

Increasing Performance: Case #1

Increasing Performance: Case #2

Getting Project Files for Different IDEs

Hooking a Function with a Conditional Jump in the First 5 Bytes

Bug: Infinite Recursion

Bug: Deadlock

Bug: the Hook Leads to the Wrong Function

Conclusion

Written by: Artur Bulakaiev, Software Developer in the Driver Development team.

 

Mhook is an open source API hooking library for intercepting function calls (setting hooks). In other words, it is a library for embedding code into the threads of other applications. It comes in handy when you need to monitor and log system function calls or run your own code before, after, or instead of a system call. To intercept a function call, mhook replaces the first 5 bytes at the address to be intercepted with an unconditional jump (jmp #addr) to the interception function (the hook). Mhook then copies those 5 bytes to a specially allocated area called a trampoline. When the hook needs to call the original function, it jumps to the trampoline, which executes the 5 saved bytes and then jumps back to the rest of the intercepted function. To learn more about how to use mhook for API hooking, you can read our mhook tutorial.
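The following is a minimal sketch, not mhook's actual code, of how such a 5-byte jump is encoded on x86/x64: opcode E9 followed by a signed 32-bit offset counted from the end of the 5-byte instruction. The helper name makeJmpRel32 is hypothetical, and addresses are passed as plain integers so the example runs on any platform.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper (not part of mhook): builds the 5-byte "jmp rel32"
// that redirects execution from `source` (the patched function) to `target`
// (the hook). Returns false if the hook is too far away for a rel32 offset.
bool makeJmpRel32(std::uint64_t source, std::uint64_t target,
                  std::array<std::uint8_t, 5>& out)
{
    // rel32 is measured from the end of the 5-byte instruction.
    const auto delta = static_cast<std::int64_t>(target - (source + 5));
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max())
    {
        return false; // more than about 2 GB away
    }

    const auto rel32 = static_cast<std::int32_t>(delta);
    out[0] = 0xE9;                               // jmp rel32 opcode
    std::memcpy(&out[1], &rel32, sizeof(rel32)); // assumes a little-endian host, as on x86/x64
    return true;
}
```

For example, a jump written at 0x1000 that targets 0x2000 encodes the offset 0x2000 - (0x1000 + 5) = 0xFFB, giving the bytes E9 FB 0F 00 00.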

Are There Any Similar Libraries?

Detours is another API hooking library. In the mhook vs. Detours contest, it is hard to name a clear winner, though we at Apriorit use mhook because of its convenience. Mhook supports both x64 and x86 platforms for free, while Detours supports only x86 under its noncommercial license; to get official x64 support, you have to purchase a license. The main advantage of Detours is its support for transactional hooking and unhooking.

Problems with Mhook Library

We often use mhook in projects related to cybersecurity and reverse engineering. While using the mhook library, we faced the following issues:

  • low performance when there are many threads in the system and when setting multiple hooks in a row;
  • the need to manually create projects for every integrated development environment;
  • inability to hook functions whose first five bytes are not suitable for writing the jump to the hook;
  • infinite recursion (bug);
  • deadlock (bug);
  • a hook leads to the wrong function (bug).

As a result, we improved the original mhook and made our version public. In this article, we describe the problems we faced while working with the original library and how we solved them in our mhook enhancements.

Increasing Performance: Case #1

Increasing the performance of the mhook library using the NtQuerySystemInformation function.

Issue Description

Mhook becomes very slow when there are many threads in the system.

Causes

When setting a hook, mhook uses information about processes and threads to suspend all threads of the current process except the calling thread, and then redirects the target function to the hook specified by the developer. Although CreateToolhelp32Snapshot takes a thread snapshot quickly, the Thread32Next function becomes very slow as the number of threads in the system grows. Microsoft does not publish the source code of these functions, but you can find similar implementations in the ReactOS project. It seems that each Thread32Next call triggers NtMapViewOfSection, which is a fairly resource-intensive operation.

Solution

Instead of CreateToolhelp32Snapshot, Thread32First, and Thread32Next from tlhelp32.h, we used the NtQuerySystemInformation function.

Our tests showed that with CreateToolhelp32Snapshot, calling Thread32Next is about 10 times more expensive in terms of resource usage than taking the snapshot. With NtQuerySystemInformation, taking the snapshot is cheap (cheaper than in the original implementation), and iterating over threads is almost free (about 10 times cheaper than the snapshot), since it basically comes down to calculating pointers. Overall, the NtQuerySystemInformation-based approach is about 10 times faster than the CreateToolhelp32Snapshot-based one. In a system with about 3,000 threads, setting one hook takes about 0.02 seconds, while the original method could take as long as 0.14 seconds per hook.

Figure: Speed of setting and removing one hook

The measured code:

#include <windows.h>
#include <tlhelp32.h>
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>
#include "mhook-lib/mhook.h"

using namespace std;
using namespace chrono_literals;

// Pointer to the original GetSystemMetrics; after Mhook_SetHook it points to the trampoline
auto TrueSystemMetrics = GetSystemMetrics;

// This is the function that will replace GetSystemMetrics once the hook is in place
int WINAPI HookGetSystemMetrics(int index)
{
	MessageBoxW(nullptr, L"test", L"test", 0);
	return TrueSystemMetrics(index);
}

// Measures the time needed to set and remove one hook
void testPerformance()
{
	auto startTime = chrono::steady_clock::now();

	Mhook_SetHook((PVOID*)&TrueSystemMetrics, (PVOID)HookGetSystemMetrics);
	Mhook_Unhook((PVOID*)&TrueSystemMetrics);

	auto timePassed = chrono::duration_cast<chrono::duration<double>>(chrono::steady_clock::now() - startTime);

	cout << "Time passed: " << timePassed.count() << " s" << endl;
}

int main()
{
	// Count the threads currently running in the system
	DWORD initialThreadCount = 0;

	HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
	if (snap != INVALID_HANDLE_VALUE)
	{
		THREADENTRY32 te;
		te.dwSize = sizeof(te);

		if (Thread32First(snap, &te))
		{
			do
			{
				++initialThreadCount;
			} while (Thread32Next(snap, &te));
		}

		CloseHandle(snap);
	}

	cout << "Initial threads count: " << initialThreadCount << endl;

	testPerformance();

	vector<thread> threadsToTest;

	const int kThreadsCount = 1000;
	const int kThreadsCountStep = 100;
	atomic<bool> testFinished{ false }; // shared with the worker threads

	// Add threads in steps of kThreadsCountStep and measure after each step
	for (int k = kThreadsCountStep; k <= kThreadsCount; k += kThreadsCountStep)
	{
		for (int i = 0; i < kThreadsCountStep; ++i)
		{
			threadsToTest.push_back(thread([&]()
			{
				while (!testFinished)
				{
					this_thread::sleep_for(10ms);
				}
			}));
		}
		cout << "Thread count increased by " << k << endl;
		testPerformance();
	}

	testFinished = true;

	for (auto& t : threadsToTest)
	{
		t.join();
	}

	cout << "End" << endl;
	cin.get();

	return 0;
}
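As described above, with NtQuerySystemInformation the thread snapshot arrives as one buffer of variable-length records, so iterating over threads comes down to calculating pointers. The sketch below illustrates that walk. It assumes the SystemProcessInformation information class, where each process record is followed by its thread records and NextEntryOffset links to the next process; the ProcessRecord and ThreadRecord layouts and the threadsToSuspend function are hypothetical simplifications, since the real SYSTEM_PROCESS_INFORMATION and SYSTEM_THREAD_INFORMATION structures contain many more fields.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified record layouts (not the real Windows structures).
struct ProcessRecord
{
    std::uint32_t nextEntryOffset; // bytes to the next process record; 0 = last one
    std::uint32_t numberOfThreads; // number of ThreadRecord entries that follow
    std::uint32_t processId;
};

struct ThreadRecord
{
    std::uint32_t threadId;
};

// Returns the IDs of all threads of process `pid` except `selfTid`, i.e. the
// threads that would have to be suspended before patching. The walk is pure
// pointer arithmetic over one buffer; no system call is made per thread.
std::vector<std::uint32_t> threadsToSuspend(const unsigned char* snapshot,
                                            std::uint32_t pid,
                                            std::uint32_t selfTid)
{
    std::vector<std::uint32_t> ids;
    const unsigned char* p = snapshot;
    for (;;)
    {
        ProcessRecord proc;
        std::memcpy(&proc, p, sizeof(proc)); // memcpy avoids alignment assumptions
        if (proc.processId == pid)
        {
            const unsigned char* threads = p + sizeof(ProcessRecord);
            for (std::uint32_t i = 0; i < proc.numberOfThreads; ++i)
            {
                ThreadRecord thr;
                std::memcpy(&thr, threads + i * sizeof(ThreadRecord), sizeof(thr));
                if (thr.threadId != selfTid)
                {
                    ids.push_back(thr.threadId);
                }
            }
        }
        if (proc.nextEntryOffset == 0)
        {
            break;
        }
        p += proc.nextEntryOffset;
    }
    return ids;
}
```

Compared with calling Thread32Next once per thread, the only per-thread work here is reading a few bytes from memory that has already been filled in.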

Increasing Performance: Case #2

Increasing the performance of the mhook library using the Mhook_SetHookEx method.

Issue Description

Setting multiple hooks in a row is slow.

Causes

As we have already mentioned, mhook has to suspend all threads of the current process to set a hook. If you set 100 hooks in a row, the threads are suspended 100 times and resumed 100 times, which is obviously inefficient.

Solution

We added the Mhook_SetHookEx function, which sets several hooks during a single thread suspension. It takes an array of HOOK_INFO structures containing the same information that is passed to Mhook_SetHook.

Declarations of Mhook_SetHookEx and Mhook_UnhookEx in mhook:

struct HOOK_INFO
{
	PVOID *ppSystemFunction;	// pointer to pointer to function to be hooked
	PVOID pHookFunction;    	// hook function
};

// returns number of successfully set hooks
int Mhook_SetHookEx(HOOK_INFO* hooks, int hookCount);

// removes several hooks; takes an array of the ppSystemFunction pointers used to set them
int Mhook_UnhookEx(PVOID** hooks, int hookCount);

Compared with setting the same hooks one after another, this modification provides a substantial performance increase that grows with the number of hooks set at once.

For example, here is the comparison of the performance when setting three hooks with both methods:

Figure: Performance comparison when setting three hooks with Mhook_SetHook and Mhook_SetHookEx

On average, setting three hooks with Mhook_SetHookEx is about 2.8 times faster than setting the same three hooks one by one with the traditional Mhook_SetHook function.

The code for this test is basically the same as the previous one. All you need to do is set several hooks in the testPerformance function, first with Mhook_SetHook and then with Mhook_SetHookEx.
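A simple cost model, an illustration rather than a measurement, shows why the gain for three hooks stays just below 3x. Let S be the cost of one suspend/resume cycle and P the cost of patching one function while the threads are suspended (both hypothetical parameters). Setting n hooks one by one costs n(S + P), while a single Mhook_SetHookEx call costs S + nP, so the speedup approaches n as S dominates P.

```cpp
// Hypothetical cost model, not measured data:
// suspendCost = one suspend/resume cycle, patchCost = patching one function.
double batchingSpeedup(int hookCount, double suspendCost, double patchCost)
{
    const double oneByOne = hookCount * (suspendCost + patchCost); // Mhook_SetHook in a loop
    const double batched = suspendCost + hookCount * patchCost;    // one Mhook_SetHookEx call
    return oneByOne / batched;
}
```

Under this model, the observed 2.8x for three hooks is what you would expect if suspending and resuming threads costs much more than patching a single function.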

Getting Project Files for Different IDEs

In this section, we describe how we managed to get project files for different IDEs instead of relying on a single .sln file.

Issue Description

You need to manually create projects for all integrated development environments (IDEs) except Visual Studio. It is difficult to work with different versions of Visual Studio.

Causes

The original mhook has only a .sln file for Visual Studio 2010, and there is no system for generating projects automatically.

Solution

We decided to use CMake, a popular cross-platform build automation tool. It allows us to easily generate project files for different IDEs without depending on Visual Studio or a particular version of it.

Hooking a Function with a Conditional Jump in the First 5 Bytes

Hooking functions whose first 5 bytes are not suitable for a hook.

Issue Description

Some functions do not have suitable first five bytes for writing the jump to the hook. For example, when building with Visual Studio 2015 for x64 in the Release configuration with the /MT switch, the first five bytes of the free function are not suitable:

00007FF680497214 48 85 C9          test    rcx,rcx
00007FF680497217 74 37             je      _free_base+3Ch (07FF680497250h)
00007FF680497219 53                push    rbx
00007FF68049721A 48 83 EC 20       sub     rsp,20h
00007FF68049721E 4C 8B C1          mov     r8,rcx
00007FF680497221 33 D2             xor     edx,edx

Causes

This situation occurs when the first 5 bytes of the function's machine code contain a conditional or unconditional jump, or a call to another Windows API function. Such instructions use offsets relative to their own address, so if mhook copies them to the trampoline unchanged, their targets become incorrect. The original mhook can handle unconditional jumps but not conditional ones.

Solution

We solved this issue for the free function shown above, which has a je instruction within its first five bytes. In this case, the conditional jump has to be copied to the trampoline, and then the instruction and its jump offset have to be changed so that it still points to the same location as before the transfer.

The free function uses a short je, which stores a one-byte offset relative to the next instruction. The trampoline can be located farther away than a one-byte offset can reach. That's why we replaced the instruction with a near je that takes a rel32 argument (a 32-bit offset relative to the next instruction).

The new offset is calculated by subtracting the address of the instruction that follows the relocated jump in the trampoline from the original jump target.

This solution works for short je and short jne, since their opcodes and the opcodes of the corresponding near jumps are almost the same: 74 becomes 0F 84, and 75 becomes 0F 85.
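Below is a minimal sketch of this relocation, assuming the instruction boundaries in the first bytes have already been decoded; the hypothetical relocateShortJcc handles only a single je or jne and is an illustration of the technique, not the code from our mhook version. It recovers the absolute target from the rel8 offset and then encodes the near form, whose rel32 offset, counted from the end of the 6-byte instruction in the trampoline, reaches the same target.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: relocates a short je (74 rel8) or jne (75 rel8) located
// at `oldAddr` into a trampoline at `newAddr`, re-encoding it as the near form
// (0F 84 rel32 or 0F 85 rel32) so that it still reaches the same target.
// Returns false for other opcodes or if the target is out of rel32 range.
bool relocateShortJcc(const std::uint8_t* insn, std::uint64_t oldAddr,
                      std::uint64_t newAddr, std::array<std::uint8_t, 6>& out)
{
    const std::uint8_t opcode = insn[0];
    if (opcode != 0x74 && opcode != 0x75)
    {
        return false;
    }

    // rel8 is signed and counted from the end of the 2-byte short instruction.
    const auto rel8 = static_cast<std::int8_t>(insn[1]);
    const std::uint64_t target = oldAddr + 2 + static_cast<std::int64_t>(rel8);

    // rel32 is counted from the end of the 6-byte near instruction in the trampoline.
    const auto delta = static_cast<std::int64_t>(target - (newAddr + 6));
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max())
    {
        return false;
    }

    const auto rel32 = static_cast<std::int32_t>(delta);
    out[0] = 0x0F;
    out[1] = static_cast<std::uint8_t>(opcode + 0x10); // 74 -> 84, 75 -> 85
    std::memcpy(&out[2], &rel32, sizeof(rel32));      // little-endian, as on x86/x64
    return true;
}
```

With the free example above, the je at 00007FF680497217 (74 37) targets 00007FF680497250. If the relocated instruction were placed at a hypothetical trampoline address 00007FF680400000, its bytes would be 0F 84 4A 72 09 00. Because the near form is 4 bytes longer than the short one, the trampoline also needs room for the extra bytes.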

Bug: Infinite Recursion

Eliminating a continuous recursion.

Issue Description

When we tried to set hooks for certain system functions, various issues occurred, such as a call stack overflow.

Causes

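Mhook itself calls system functions while installing a hook: after writing the jump that leads from the system function to the hook, but before finishing the trampoline that leads back to the system function. If such a call reaches the function being hooked, it enters the hook while the way back to the original code is not ready yet.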

Solution

We moved the code that writes the trampoline higher up, so that no system functions are called between writing the jump and modifying the trampoline.
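The ordering rule behind this fix can be modeled in portable C++. In this hypothetical sketch, HookSite, entry, trampoline, and installHook are illustrative names, not mhook's API: callers always go through entry (the function's first bytes), and the hook reaches the original code only through trampoline. The trampoline is completed before callers are redirected, so nothing can enter the hook while the way back is still missing.

```cpp
#include <functional>
#include <utility>

// Hypothetical model of one hooked function.
struct HookSite
{
    std::function<int(int)> original;   // the original function body
    std::function<int(int)> entry;      // what callers reach (the first bytes)
    std::function<int(int)> trampoline; // the hook's way back to the original code
};

// Install order after the fix: finish the trampoline first, redirect callers
// last, with no other calls in between.
void installHook(HookSite& site, std::function<int(int)> hook)
{
    site.trampoline = site.original; // 1. the path back to the original is ready
    site.entry = std::move(hook);    // 2. only now do callers land in the hook
}
```

In the real library both steps are memory writes into executable code, but the invariant is the same: at every moment, any call that reaches the hook finds a working trampoline.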

Bug: Deadlock

Eliminating a deadlock.

Issue Description

After we migrated to NtQuerySystemInformation, deadlocks started to appear in mhook.

Causes

When migrating to NtQuerySystemInformation, we allocated a dynamic buffer on the heap to store thread information. CreateToolhelp32Snapshot handles this internally and returns only a HANDLE.

The whole process worked as follows:

  1. Allocating the buffer to get information about the threads;
  2. Suspending all threads;
  3. Setting the hook;
  4. Freeing the buffer with information about the threads;
  5. Resuming the threads.

This sequence contains a hard-to-detect bug. If any thread is suspended while holding the heap lock used by free, our attempt to free the buffer results in a deadlock, because the thread that holds the lock is suspended and can never release it.

To reproduce this bug, you can create several threads that allocate and free heap memory while the main thread repeatedly sets and removes hooks:

#include <windows.h>
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <vector>
#include "mhook-lib/mhook.h"

using namespace std;
using namespace chrono_literals;

// Pointer to the original GetSystemMetrics; after Mhook_SetHook it points to the trampoline
auto TrueSystemMetrics = GetSystemMetrics;

// This is the function that will replace GetSystemMetrics once the hook is in place
int WINAPI HookGetSystemMetrics(int index)
{
	MessageBoxW(nullptr, L"test", L"test", 0);
	return TrueSystemMetrics(index);
}

int main()
{
	vector<thread> threadsToTest;

	const int kThreadsCount = 100;
	atomic<bool> testFinished{ false }; // shared with the worker threads

	// Worker threads constantly take the heap lock inside malloc/free
	for (int i = 0; i < kThreadsCount; ++i)
	{
		threadsToTest.push_back(thread([&]()
		{
			while (!testFinished)
			{
				free(malloc(100));
				this_thread::sleep_for(10ms);
			}
		}));
	}

	// Meanwhile, the main thread repeatedly sets and removes a hook
	const int kTriesCount = 1000;
	for (int i = 0; i < kTriesCount; ++i)
	{
		Mhook_SetHook((PVOID*)&TrueSystemMetrics, (PVOID)HookGetSystemMetrics);
		Mhook_Unhook((PVOID*)&TrueSystemMetrics);

		this_thread::sleep_for(10ms);
		cout << "No deadlocks, stage " << i + 1 << " passed" << endl;
	}

	testFinished = true;

	for (auto& t : threadsToTest)
	{
		t.join();
	}

	cout << "Test passed" << endl;

	return 0;
}

Solution

We found several solutions to this problem. First, we simply moved the buffer release to after all threads had been resumed. Later, we decided to use VirtualAlloc/VirtualFree instead of malloc/free, since they do not go through the heap and its lock. Because the buffer is allocated only a few times per hook installation and outside the loop, this does not cause any measurable performance loss.
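The first fix, releasing the buffer only after the threads are resumed, maps naturally onto C++ scope rules: objects are destroyed in reverse order of declaration. In this hypothetical sketch, SnapshotBuffer, SuspendAllOtherThreads, and setHook stand in for the real allocation, thread suspension, and hook installation; they only record events so that the order can be checked. The VirtualAlloc/VirtualFree variant is not shown.

```cpp
#include <string>
#include <vector>

// Hypothetical event log standing in for the real Windows calls.
struct EventLog
{
    std::vector<std::string> events;
};

// Stands for the heap buffer that receives the thread snapshot.
struct SnapshotBuffer
{
    EventLog& journal;
    explicit SnapshotBuffer(EventLog& j) : journal(j) { journal.events.push_back("alloc"); }
    ~SnapshotBuffer() { journal.events.push_back("free"); }
};

// Stands for suspending every thread except the current one.
struct SuspendAllOtherThreads
{
    EventLog& journal;
    explicit SuspendAllOtherThreads(EventLog& j) : journal(j) { journal.events.push_back("suspend"); }
    ~SuspendAllOtherThreads() { journal.events.push_back("resume"); }
};

void setHook(EventLog& journal)
{
    SnapshotBuffer buffer(journal);          // declared first, destroyed last
    SuspendAllOtherThreads suspend(journal); // declared second, destroyed first
    journal.events.push_back("patch");
}   // threads are resumed here, and only then is the buffer freed
```

Declaring the buffer before the suspension guard guarantees that the free call can never run while another thread is frozen inside the allocator.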

Bug: the Hook Leads to the Wrong Function

Eliminating the bug when different hooks lead to the same handler.

Issue Description

When you install hooks for functions from different modules and the distance in memory between these modules exceeds 2 GB, the addresses of hook handlers are recorded incorrectly. For example, set a hook for a function from module 1, then for a function from module 2, which is located more than 2 GB away from module 1 in memory. Then install two more hooks for functions from module 1. As a result, the last two hooks lead to the same handler, which is wrong.

Causes

In the BlockAlloc function, when a new memory block is added for a module, the allocated memory is placed into a circular list. However, the original code does not set the list head's pointer to the previous element, so it remains zero.

And then the following happens:

  1. When mhook searches for a free memory block to set a hook, it finds a block. Since the block's pointer to the previous element is zero, the previous element's pointer to the next element is not updated to skip the current block.
  2. The code adds the current block to the list of memory blocks used for hooks. However, the block also remains in the list of free memory blocks, because the previous element's next pointer still points to it.
  3. Every time you try to set a hook in the same module, the first block found is the one from step 1, even though it is already assigned to a hook.

Thus, all other hooks in this module will lead to the handler of the last hook set in this module.

Solution

We made the list head's pointer to the previous element point to the last element of the list, as it should in a circular list.
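The sketch below shows the invariant this fix restores, using a hypothetical Block type and helper functions rather than mhook's real structures: in a circular doubly linked list, every node's prev pointer, including the head's, is valid, so unlinking a block always updates its neighbors and the block can no longer be found in the free list.

```cpp
// Hypothetical stand-in for a trampoline memory block (not mhook's structure).
struct Block
{
    int id;
    Block* prev = nullptr;
    Block* next = nullptr;
};

// Inserts `b` at the front of a circular doubly linked list. The head's prev
// is set to the last element, which is exactly what the original code missed.
void pushFront(Block*& head, Block* b)
{
    if (head == nullptr)
    {
        b->prev = b; // a single element links to itself in both directions
        b->next = b;
    }
    else
    {
        Block* last = head->prev; // never null in a correct circular list
        b->next = head;
        b->prev = last;
        last->next = b;
        head->prev = b;
    }
    head = b;
}

// Removes `b` from the list. This works only if every prev pointer is valid:
// with a null prev, the previous element would keep pointing to `b`.
void unlinkBlock(Block*& head, Block* b)
{
    if (b->next == b)
    {
        head = nullptr; // `b` was the only element
    }
    else
    {
        b->prev->next = b->next;
        b->next->prev = b->prev;
        if (head == b)
        {
            head = b->next;
        }
    }
    b->prev = nullptr;
    b->next = nullptr;
}
```

Taking the first free block with unlinkBlock(freeList, freeList) now really removes it, so the next search returns a different block instead of the one already assigned to a hook.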

Conclusion

With our improvements, we made setting a hook about 10 times faster on systems with many threads and made setting three hooks at once nearly 3 times faster. In addition, we can now easily generate project files for different IDEs without depending on Visual Studio, and we can hook functions whose first 5 bytes are not directly suitable for writing a jump to the hook. Finally, we eliminated several bugs: a deadlock, infinite recursion, and hooks leading to the wrong function.

You can download the improved mhook version by following this link: https://github.com/apriorit/mhook

The Apriorit System Programming team will continue supporting it, so feel free to create an issue or send a pull request to take part in further development of this mhook version.

