Have you ever felt a desire to take some mechanism apart to find out how it works? That desire is the driving force behind reverse engineering. This skill is useful for analyzing product security, finding out the purpose of a suspicious .exe file without running it, recovering lost documentation, developing a new solution based on legacy software, etc.
Reverse engineering is critical in cybersecurity, software development, and legacy system modernization. By analyzing software at a low level, businesses can identify security vulnerabilities, ensure software compatibility, recover lost documentation, or even develop new solutions based on existing technologies.
However, performing reverse engineering efficiently requires deep expertise, specialized tools, and adherence to legal and ethical standards.
In this article, you’ll learn how to use reverse engineering to enhance system security, migrate your software, or optimize your app’s performance. We’ll explore key methodologies for reverse engineering Windows software and demonstrate them with a practical step-by-step example of analyzing a Windows application.
What is Windows reverse engineering?
Reverse engineering is the process of uncovering the internal structure of a piece of hardware or software. Applied to Windows applications, it is a complex and highly specialized task.
Businesses usually turn to reverse engineers when they don’t have proper documentation of their own software or can’t access the source code of third-party software. Reverse engineers can take compiled code and dissect it to help complete the following tasks:
- Enhance security – Reverse engineering your Windows software can help you identify vulnerabilities and check if your software is prone to breaches or malicious reverse engineering. Additionally, reverse engineering malware like viruses, trojans, and ransomware is a good way to learn how it works and develop effective countermeasures.
- Ensure software compatibility – Reverse engineering allows you to analyze software dependencies, internal structures, and undocumented platform features to make sure your Windows application is compatible with new operating system versions or third-party integrations. By dissecting communication protocols and file formats, you can modernize legacy applications without full redevelopment.
- Maintain regulatory compliance – In finance, the public sector, the healthcare sector, and many other spheres, businesses are required to maintain a deep understanding of how their software operates to meet security and compliance requirements. Reverse engineering can help to uncover hidden security issues, validate security measures, and analyze encryption methods, all of which are critical for complying with laws, regulations, and standards like HIPAA, the GDPR, and PCI DSS.
- Protect intellectual property – Reverse engineering allows businesses to detect unauthorized modifications to their software and verify the integrity of proprietary code. This can help to prevent copyright infringement or find out if such infringement has occurred.
- Extend software lifespans – Understanding the internal architecture of legacy applications can help you maintain, upgrade, or migrate software without full redevelopment. Reverse engineering helps you extract logic from undocumented systems, research file formats storing critical data (such as email databases or disk images), and recover lost functionality in order to continue business operations.
- Optimize performance – Reverse engineering provides deep insights into software execution and allows your team to identify performance bottlenecks. By analyzing binary code, your team can optimize resource use and improve system reliability without rewriting the entire application.
While reverse engineering can help your business tackle all these tasks, your team may face some challenges along the way.
First, you must pay attention to the legal boundaries of reverse engineering to make sure that you are not breaking any laws. Many end-user license agreements (EULAs) restrict reverse engineering, but laws such as the US Digital Millennium Copyright Act permit it in specific cases, such as achieving interoperability with other products. Legal requirements vary across jurisdictions, so your reverse engineering team must ensure compliance before initiating any reverse engineering project.
Second, Windows applications can be difficult to analyze. Reverse engineering Windows software requires extensive knowledge of Windows OS internals, including system calls, memory management, and APIs.
Additionally, software protection mechanisms like obfuscation, anti-debugging, and virtualization can make reverse engineering more difficult and time-consuming if a specialist doesn’t know how to bypass them.
In the next section, we explore the specific knowledge needed to perform reverse engineering tasks effectively and the tools that make this process possible. We also overview the most important techniques used for analyzing and modifying Windows applications at the binary level.
What do you need for Windows reverse engineering?
Imagine you’re examining a watch to determine whether it’s mechanical, quartz, or automatic. Understanding the field means knowing these watch types, recognizing that a quartz watch has a battery, and being familiar with a watch’s internal structure. Applying this knowledge requires the right tools (such as a screwdriver) to open the watch and inspect its components.
Similarly, reverse engineering a piece of software requires specialized knowledge and tools. Your team should not only have expertise in application structures, programming languages, and compilers but also know how to solve specific reverse engineering tasks using specific tools. This often demands practical experience with various tools, as well as a deep theoretical understanding of areas like malware analysis, network protocols, and file formats.
To illustrate the depth of knowledge required for reverse engineering, let’s look at some key tasks and the expertise they demand:
Table 1. Reverse engineering tasks and knowledge required
| Task | Required expertise | Business impact |
|---|---|---|
| Reverse engineering network applications | Advanced knowledge in inter-process communication, network protocols, packet analysis, and data exchange patterns | Uncovers vulnerabilities in communication channels; ensures software adheres to industry-standard protocols |
| Decrypting cryptographic algorithms | Expertise in cryptography and algorithms such as RSA, AES, and hashing techniques | Ensures data security; identifies weaknesses in encrypted systems |
| Researching and analyzing file structures | Deep insight into file systems and understanding of how software interacts with stored data | Helps identify malware; uncovers the internal structure of files |
Special techniques can save a lot of time when reversing particular types of software. For example, if your team is researching a file format, they can write a test that saves values with unique, easily recognizable contents through the application and logs the offset and size of each value in the resulting storage file. Recurring patterns in these offsets will hint at the internal structure of the file.
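Here is a minimal sketch of the search half of this technique. It assumes the test has already saved a known 32-bit value through the application and that the storage file has been read into memory. The function name findMarkerOffsets and the marker value are hypothetical, and the sketch assumes little-endian storage, which is common on Windows x86 but not guaranteed for every file format:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns every offset at which the 32-bit marker appears in the data,
// assuming the value is stored in little-endian byte order.
std::vector<std::size_t> findMarkerOffsets(const std::vector<std::uint8_t>& data,
                                           std::uint32_t marker)
{
    // Split the marker into the four bytes we expect to see on disk.
    const std::uint8_t pattern[4] = {
        static_cast<std::uint8_t>(marker & 0xFF),
        static_cast<std::uint8_t>((marker >> 8) & 0xFF),
        static_cast<std::uint8_t>((marker >> 16) & 0xFF),
        static_cast<std::uint8_t>((marker >> 24) & 0xFF),
    };

    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 4 <= data.size(); ++i)
    {
        if (data[i] == pattern[0] && data[i + 1] == pattern[1] &&
            data[i + 2] == pattern[2] && data[i + 3] == pattern[3])
        {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

Running the search for several different markers and comparing the distances between the reported offsets is one way to spot fixed-size records or headers.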
But specialized knowledge and techniques are not enough. When starting the process of reverse engineering, software developers often use dedicated tools, such as a disassembler, to reveal the underlying algorithms and program logic embedded within the software. Disassembly allows your team to examine the assembly instructions of a compiled executable, offering insights into how the software functions at the lowest level. This foundational knowledge is essential for understanding the behavior of the software in greater detail.
In the next section, we explore why disassembling and decompiling are the key skills for reverse engineering.
How to analyze executables: disassembly and decompiling techniques
The first thing a reverse engineer usually does with a piece of software is reconstruct the code that has been compiled. This helps them better understand the program’s internals and identify potential vulnerabilities and behaviors.
There are many different executable file formats, operating systems, and compilers that give different outputs. This diversity of technologies requires reverse engineers to have expertise in various techniques and know when to use them depending on the type of software.
To understand decompiled code, a reverse engineer needs knowledge of assembly language, function calling conventions, how the call stack works, and the concept of stack frames.
Knowing the assembler output for different code samples may help your reversing team in uncovering the original functionality. Let’s consider some reverse engineering examples for the Windows x86 platform.
Let’s say we have the following code:
#include <iostream>

int main()
{
    int count = 0;
    for (int i = 0; i < 10; ++i)
    {
        count++;
    }
    std::cout << count;
    return 0;
}
If we compile this code to an executable file, we’ll see this in the disassembler:
004113DE loc_4113DE:
004113DE mov eax, [ebp-14h]
004113E1 add eax, 1
004113E4 mov [ebp-14h], eax
004113E7 loc_4113E7:
004113E7 cmp [ebp-14h], 0Ah
004113EB jge short loc_4113F8
004113ED mov eax, [ebp-8]
004113F0 add eax, 1
004113F3 mov [ebp-8], eax
004113F6 jmp short loc_4113DE
004113F8 loc_4113F8:
004113F8 mov ecx, ds:?cout@std
004113FE push eax
00411400 call ds:basic_ostream@operator<<(int)
00411404 xor eax, eax
00411406 retn
As we can see, the regular loop has turned into assembly code with comparisons and jumps. Notice that the compiler doesn’t use the x86 loop instruction, which keeps its counter in the ecx register. In addition, the local variables i and count are referred to here as [ebp-14h] and [ebp-8], respectively.
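To make the mapping between the listing and the source easier to follow, the same control flow can be written in C++ with explicit labels and jumps. This is only an illustrative sketch: the function name countWithJumps is hypothetical, and the variable initialization and the first jump to the check are assumed, since that part of the function isn’t shown in the listing.

```cpp
// Mirrors the debug-build control flow: jump to the check first,
// run the body, then jump back to the increment.
int countWithJumps()
{
    int count = 0;          // [ebp-8]
    int i = 0;              // [ebp-14h]
    goto check;             // assumed: this part precedes loc_4113DE in the binary

increment:                  // loc_4113DE
    i = i + 1;              // mov eax, [ebp-14h] / add eax, 1 / mov [ebp-14h], eax
check:                      // loc_4113E7
    if (i >= 10)            // cmp [ebp-14h], 0Ah / jge short loc_4113F8
        goto done;
    count = count + 1;      // mov eax, [ebp-8] / add eax, 1 / mov [ebp-8], eax
    goto increment;         // jmp short loc_4113DE
done:                       // loc_4113F8
    return count;
}
```

Recognizing this increment, check, body, jump-back shape is usually enough to identify a for loop in unoptimized code.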
Let’s see what happens if we compile this code using the release build:
00401000 main proc near
00401000 mov ecx, ds:?cout@std
00401006 push 0Ah
00401008 call ds:basic_ostream@operator<<(int)
0040100E xor eax, eax
00401010 retn
00401010 main endp
This code doesn’t look anything like the debug-build assembly. That’s because of how the code was optimized. The loop was removed, since it does nothing other than increment the count variable to 10. So the optimizer kept only the final value of count and passed it (push 0Ah) directly as the argument to the cout output operator.
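The optimizer can do this because the loop depends only on constants. As a rough illustration, C++14 lets us ask the compiler to prove the result before the program runs. Note that constexpr evaluation is a language feature rather than the optimizer itself, and the function name countLoop is hypothetical:

```cpp
// The loop depends only on constants, so its result can be computed
// at compile time.
constexpr int countLoop()
{
    int count = 0;
    for (int i = 0; i < 10; ++i)
    {
        count++;
    }
    return count;
}

// Checked during compilation: the whole loop collapses to the constant 10 (0Ah).
static_assert(countLoop() == 10, "loop result is a compile-time constant");
```

If the loop bound came from user input instead, the value wouldn’t be known at compile time and the release build would have to keep some form of the computation.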
The compilers that we use nowadays are very good at optimizing code. That’s why when reverse engineering it’s better to understand the idea behind the code (the principles of the code) rather than to try getting the original code itself. If you understand the idea behind the code, you can write your own version that fits the original task.
In most cases, a reverse engineer needs to understand how compilers work to be able to analyze and reconstruct code written for any type of processor architecture. This is because disassemblers typically only show the assembly instructions, which can be difficult to understand and analyze directly.
However, tools like IDA, Ghidra, and Radare can decompile a binary into a pseudo-C representation. Pseudo-C code resembles C code, but it might not be compilable due to architecture-specific details or optimizations made by the compiler. Still, it offers a much clearer view of the program’s logic than raw assembly instructions, which simplifies analysis by providing a higher-level approximation of the original source code.
Consider the following code snippet written for Windows:
#include <windows.h>
#include <stdio.h>

struct MyDate
{
    WORD wYear;
    WORD wMonth;
    WORD wDay;
};

__declspec(noinline) void CheckTheDate(MyDate & date)
{
    if (date.wYear >= 2024 && date.wMonth == 1 && date.wDay == 1)
        printf("It's a new year, let's celebrate");
}

int main()
{
    SYSTEMTIME sysTime = {0};
    GetSystemTime(&sysTime);
    MyDate date = {sysTime.wYear, sysTime.wMonth, sysTime.wDay};
    CheckTheDate(date);
    return 0;
}
Compiling this code in release mode and loading the executable into a decompiler like IDA results in pseudo-C code that partially resembles the original source. IDA will represent it like this:
int __fastcall main(int argc, const char **argv, const char **envp)
{
    __int16 v4[4]; // [rsp+20h] [rbp-28h] BYREF
    struct _SYSTEMTIME SystemTime; // [rsp+28h] [rbp-20h] BYREF

    SystemTime = 0i64;
    GetSystemTime(&SystemTime);
    v4[0] = SystemTime.wYear;
    v4[1] = SystemTime.wMonth;
    v4[2] = SystemTime.wDay;
    sub_140001070(v4);
    return 0;
}
The decompiler can recognize the _SYSTEMTIME structure defined in the Windows header files. However, for programmer-defined structures like MyDate, it might misinterpret the data as an array. In this case, IDA sees v4 in the main function as an array of 16-bit integers (__int16 v4[4]) instead of the intended MyDate structure.
In order to fix this, IDA allows you to create user-defined structures. Let’s add the following definition:
struct MyDate
{
    WORD year;
    WORD month;
    WORD day;
};
After that, we change the type of v4 from an __int16 array to MyDate. The pseudocode of main() now looks like this:
int __fastcall main(int argc, const char **argv, const char **envp)
{
    MyDate v4; // [rsp+20h] [rbp-28h] BYREF
    struct _SYSTEMTIME SystemTime; // [rsp+28h] [rbp-20h] BYREF

    SystemTime = 0i64;
    GetSystemTime(&SystemTime);
    v4.year = SystemTime.wYear;
    v4.month = SystemTime.wMonth;
    v4.day = SystemTime.wDay;
    sub_140001070(&v4);
    return 0;
}
v4 is not an array anymore, so we can continue into sub_140001070.
__int64 __fastcall sub_140001070(_WORD *a1)
{
    __int64 result; // rax

    result = 2024i64;
    if ( *a1 >= 0x7E8u && a1[1] == 1 && a1[2] == 1 )
        return sub_140001010("It's a new year, let's celebrate");
    return result;
}
This demonstrates how decompiler output relies on the underlying assembly instructions. The decompiler analyzes instructions like cmp word ptr [rcx+2], 1. It also has no information about the type of date, so it decides that there’s some work being done with the WORD array.
But after we change the definition of the function from __int64 __fastcall sub_140001070(_WORD *a1) to __int64 __fastcall sub_140001070(MyDate *date), the code will start looking like this:
__int64 __fastcall sub_140001070(MyDate *date)
{
    __int64 result; // rax

    result = 2024i64;
    if ( date->year >= 0x7E8u && date->month == 1 && date->day == 1 )
        return sub_140001010("It's a new year, let's celebrate");
    return result;
}
By providing additional information like structure definitions, the reverse engineer guides the decompiler towards a more accurate representation of the initial source code.
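The reason the decompiler first showed a WORD array is that, at the byte level, a structure of three 16-bit fields and a three-element array of 16-bit values are indistinguishable. The sketch below illustrates this under a few assumptions: std::uint16_t stands in for WORD so the code builds without windows.h, and the function names isNewYearRaw and isNewYearTyped are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for MyDate; std::uint16_t is assumed to match the 16-bit WORD type.
struct MyDate
{
    std::uint16_t year;
    std::uint16_t month;
    std::uint16_t day;
};

// Three 16-bit fields with no padding: the same bytes as a three-element array.
static_assert(sizeof(MyDate) == 3 * sizeof(std::uint16_t), "no padding expected");
static_assert(offsetof(MyDate, month) == 2, "month is read from [base+2]");
static_assert(offsetof(MyDate, day) == 4, "day is read from [base+4]");

// What the decompiler shows first: indexed access into a WORD array.
bool isNewYearRaw(const std::uint16_t* a1)
{
    return a1[0] >= 0x7E8u && a1[1] == 1 && a1[2] == 1;
}

// What it shows after the type is applied: named field access.
bool isNewYearTyped(const MyDate* date)
{
    return date->year >= 2024 && date->month == 1 && date->day == 1;
}
```

Both functions read the same offsets, so the assembly alone can’t tell them apart. Only the type information supplied by the reverse engineer turns a1[1] into date->month.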
Modern reverse engineering is less about working over the assembly code and more about reconstructing pseudo-C into real source code. Yet, to accurately reverse engineer software, your team still needs to understand how exactly the decompiler decides to generate pseudo-C from the specific assembly instructions and what these instructions could have been in the initial source code.
It’s very useful to know what assembly code you’ll get when you compile different operators, structures, and other language constructs. Understanding the resulting assembly code is a good way to start the C++ reverse engineering process, but we won’t get into the technical details here.
How to reverse engineer a Windows app: A practical example
Now, we’ll see an example of how to reverse engineer a piece of software. Let’s imagine you have a suspicious executable file. You need to find out what this program does and if it’s safe for users.
Considering the risks, it’s best not to run this executable directly on your main operating system. Instead, use a virtual machine, which provides an isolated environment that helps limit potential damage. Let’s start the application in our virtual machine.
As we can see, this executable file creates a Windows service named TestDriver. The service type is kernel driver, so we know it’s a driver. But where does it take the driver file from in order to run? We can use Process Monitor from the Sysinternals Suite to find out. When we open Process Monitor, we can set up filters to show only the file activity from the process we’re interested in. Its activity log looks like this:
The driver file is created by the process that we’re reversing, and this process puts this file in the user’s temp directory. There’s no need to look for the file in the temp folder, since we see that the process deletes it right after use.
So what does the process do with this file? If it unpacks it, we may try to find it in the process’s resource section, since this is a common place to store such data. Let’s look there.
We’ll use another tool — Resource Hacker — to examine the resources. Let’s run it:
Bingo! As we can see from the found resource content, this is probably a Windows executable file, since it starts with an MZ signature and has the string “This program cannot be run in DOS mode.” Let’s check if it’s our driver file. For that purpose, we extract the resource using Resource Hacker, store it as a file, and open it in the disassembler.
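The same check can be done programmatically once the resource bytes are in memory. Below is a minimal sketch, assuming the extracted resource has been loaded into a byte vector; the function name looksLikePeFile is hypothetical. It relies on two facts about the PE format: the file starts with the MZ signature, and the 32-bit value at offset 0x3C (e_lfanew) points to the "PE\0\0" signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the buffer starts with "MZ" and the 32-bit little-endian
// offset stored at 0x3C (e_lfanew) points to a "PE\0\0" signature.
bool looksLikePeFile(const std::vector<std::uint8_t>& data)
{
    constexpr std::size_t kLfanewOffset = 0x3C;

    if (data.size() < kLfanewOffset + 4 || data[0] != 'M' || data[1] != 'Z')
        return false;

    // Assemble e_lfanew from its four little-endian bytes.
    const std::size_t peOffset =
        static_cast<std::size_t>(data[kLfanewOffset]) |
        (static_cast<std::size_t>(data[kLfanewOffset + 1]) << 8) |
        (static_cast<std::size_t>(data[kLfanewOffset + 2]) << 16) |
        (static_cast<std::size_t>(data[kLfanewOffset + 3]) << 24);

    // Reject offsets that would read past the end of the buffer.
    if (peOffset > data.size() || data.size() - peOffset < 4)
        return false;

    return data[peOffset] == 'P' && data[peOffset + 1] == 'E' &&
           data[peOffset + 2] == 0 && data[peOffset + 3] == 0;
}
```

A positive result only means the data has a PE header. It doesn’t tell us whether the file is a driver, a DLL, or a regular executable, which is why we still open it in the disassembler.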
INIT:0001403E ; NTSTATUS __stdcall DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
INIT:0001403E public DriverEntry
INIT:0001403E DriverEntry proc near
INIT:0001403E DriverObject = dword ptr 8
INIT:0001403E RegistryPath = dword ptr 0Ch
INIT:0001403E
INIT:0001403E mov edi, edi
INIT:00014040 push ebp
INIT:00014041 mov ebp, esp
INIT:00014043 call sub_14005
INIT:00014048 pop ebp
INIT:00014049 jmp sub_110F0
INIT:00014049 DriverEntry endp
As we know, DriverEntry is the entry point for kernel-mode drivers in Windows systems, and IDA identified that the file we extracted from the resources section is indeed an executable and has the DriverEntry function. We can continue our research, as it looks like we’ve found the driver used by the application.
How to reverse engineer a Windows driver
To begin reverse engineering the driver, we examine functions that are called from DriverEntry one by one. If we go to sub_14005, we find nothing interesting, since it just sets up the security cookie variable, so we continue to sub_110F0 and find this code:
0001102C push offset SourceString ; "\\DosDevices\\C:\\hello.txt"
00011031 lea ecx, [ebp+DestinationString]
00011034 push ecx ; DestinationString
00011035 call ds:RtlInitUnicodeString
0001103B mov [ebp+ObjectAttributes.Length], 18h
00011042 mov [ebp+ObjectAttributes.RootDirectory], 0
00011049 mov [ebp+ObjectAttributes.Attributes], 240h
00011050 lea edx, [ebp+DestinationString]
00011053 mov [ebp+ObjectAttributes.ObjectName], edx
00011056 mov [ebp+ObjectAttributes.SecurityDescriptor], 0
0001105D mov [ebp+ObjectAttributes.SecurityQualityOfService], 0

0001109C push eax ; FileHandle
0001109D call ds:ZwCreateFile

000110B4 push offset aHelloFromDrive ; "Hello from driver\r\n"
000110B9 push 1Eh ; int
000110BB lea ecx, [ebp+Buffer]
000110BE push ecx ; char *
000110BF call sub_11150

00011120 call ds:ZwWriteFile
00011126 mov [ebp+var_2C], eax
00011129 mov ecx, [ebp+Handle]
0001112C push ecx ; Handle
0001112D call ds:ZwClose
Some lines are omitted here for the sake of simplicity, but after the ZwClose call, the driver code just exits without registering any callbacks. This means it doesn’t have any logic besides what we see in the DriverEntry function.
In the first code snippet, a Unicode string is created that points to the path C:\hello.txt. After that, the OBJECT_ATTRIBUTES structure is filled with regular values. We know that this structure is often needed when calling functions like ZwCreateFile.
In the second listing, we see that ZwCreateFile is indeed called, which makes us pretty sure that the driver creates the file, and we know where this file is located after it’s created.
From the third and fourth listings, we can see that the driver takes the “Hello from driver” string and writes it to the buffer (this happens in the sub_11150 function) and that the buffer is written to the file using the ZwWriteFile function. At the end, the driver closes the file using the ZwClose API.
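The “regular values” in OBJECT_ATTRIBUTES can be decoded as well. The 18h stored in Length is 24 bytes, which matches six 4-byte fields on 32-bit Windows, and the 240h stored in Attributes is a combination of bit flags. The sketch below decodes such a value. The helper name decodeObjAttributes is hypothetical, and the flag values are reproduced here from the Windows headers rather than included from them, so check them against your WDK version:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Flag values for OBJECT_ATTRIBUTES.Attributes, reproduced from the
// Windows headers (an assumption to verify against your WDK).
struct ObjFlag
{
    std::uint32_t value;
    const char* name;
};

const ObjFlag kObjFlags[] = {
    {0x002, "OBJ_INHERIT"},
    {0x010, "OBJ_PERMANENT"},
    {0x020, "OBJ_EXCLUSIVE"},
    {0x040, "OBJ_CASE_INSENSITIVE"},
    {0x080, "OBJ_OPENIF"},
    {0x100, "OBJ_OPENLINK"},
    {0x200, "OBJ_KERNEL_HANDLE"},
    {0x400, "OBJ_FORCE_ACCESS_CHECK"},
};

// Returns the names of all known flags set in the given attribute mask.
std::vector<std::string> decodeObjAttributes(std::uint32_t attributes)
{
    std::vector<std::string> names;
    for (const ObjFlag& flag : kObjFlags)
    {
        if (attributes & flag.value)
            names.push_back(flag.name);
    }
    return names;
}
```

For 240h, this yields OBJ_CASE_INSENSITIVE and OBJ_KERNEL_HANDLE, a combination commonly used when a driver opens a file from kernel mode.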
Let’s summarize. We found out that the original program extracts the driver file from its resources, puts it in the temp folder of the current user, creates the Windows service for this driver, and runs it. After that, the program stops and deletes the service and the original driver file from the temp directory.
From this behavior and from analyzing the disassembly, it appears that the driver doesn’t do anything except create a file on the C drive named hello.txt and write the string “Hello from driver”.
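Once the idea behind the code is clear, we can write our own version of it. The sketch below is a user-mode approximation in standard C++, not kernel code: it uses std::ofstream instead of ZwCreateFile and ZwWriteFile, the function name writeHelloFile is hypothetical, and overwriting an existing file is an assumption, since the listings don’t show the CreateDisposition argument.

```cpp
#include <fstream>
#include <string>

// User-mode approximation of the driver's logic: create (or overwrite) a file
// and write a single greeting line to it. Returns true on success.
bool writeHelloFile(const std::string& path)
{
    // Binary mode keeps "\r\n" exactly as written on every platform.
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file)
        return false;

    file << "Hello from driver\r\n";
    return static_cast<bool>(file.flush());
}
```

Calling writeHelloFile("C:\\hello.txt") on a Windows machine would approximate what the driver does, provided the process has permission to write to the root of the C drive.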
Now we need to check if we’re correct. Let’s run the program and check the C drive:
Wonderful! We’ve reverse engineered this simple computer program, and now we know that it’s safe to use.
As you can see, to successfully reverse engineer Windows software, your team needs to know where to look and also have experience with various tools. In the next section, we explore how our experienced reverse engineers can help you dissect your own software for your own business purposes.
How Apriorit can help
At Apriorit, we bring over 20 years of experience in reverse engineering various types of software, including Windows applications. Our team of reverse engineers can help you uncover vulnerabilities, optimize your legacy Windows systems, restore documentation, get access to lost source code, and give you a roadmap to making your app compatible with modern operating system versions. We provide a comprehensive range of reverse engineering-related services that include:
- Static and dynamic code analysis. We can disassemble and decompile Windows executable files (EXE, DLL) to analyze assembly code or reconstructed high-level code. This helps to uncover hidden algorithms, program logic, and potential vulnerabilities, providing you with a detailed understanding of how software works.
- Malware reverse engineering. If you suspect your software has been compromised, our reverse engineers can help you identify malicious code, trace the origins of the attack, and dissect the malware to recommend an effective strategy for protecting your software in the future.
- Vulnerability assessment. Our team specializes in identifying and analyzing security vulnerabilities within Windows applications. By reverse engineering your software, we can uncover potential exploits, malware entry points, and weaknesses. As a result, you’ll get a report with recommendations for eliminating these vulnerabilities.
- Software debugging and patching. If you need to modify or repair a Windows application, our experts can debug compiled code and implement necessary patches. We can fix bugs, resolve incompatibilities, and enhance functionality without requiring access to the original source code.
- Anti-reverse engineering analysis. We can help you protect your application from malicious reverse engineering. Our team will assess whether your code uses appropriate anti-reversing mechanisms like encryption and obfuscation. If it doesn’t, we can implement them.
- Legacy system integration. If you are dealing with legacy systems, we can reverse engineer outdated Windows applications to ensure compatibility with modern platforms, new operating system versions, and third-party integrations.
If you don’t know where to start, our experts can help you define your goals and choose relevant approaches. We’ll also give you a detailed breakdown of the timeline, costs, and resources required for reverse engineering.
Conclusion
Reverse engineering Windows software requires a solid technical background and hands-on reverse engineering experience. To perform reverse engineering, your team needs to combine skills in disassembling, network monitoring, debugging, API integration, programming in several languages, working with compilers, and more.
Each of these activities requires specific tools, so your team needs to know when and how to use them to get optimal results. Reverse engineers also have to be very careful when reversing software in order not to break copyright laws or harm your system. Building this kind of expertise in-house can take years, which is why outsourcing reverse engineering tasks can save you significant time and resources.
At Apriorit, we have an experienced team of reverse engineers. Whether you’re looking to improve your program’s security, ensure compatibility, or simply analyze your code structure, our reversing team is prepared to meet the challenge.