This article covers common tasks, main tools, and basic principles of reverse engineering software, specifically Windows software. We also walk through a small step-by-step example of reverse engineering an application to illustrate the points described.
Development Coordinator of Network Security Team
What is software reversing?
Reverse engineering is the process of extracting information about how hardware or software works: its architecture and internal structure. The question typical of reverse engineering is “How does it work?” Obviously, if you have documentation, the whole process becomes much simpler, but it often happens that there is no documentation at all and you need to find another way to get the result.
So when might we need software reverse engineering, and how can it help us?
There are many applications of software reverse engineering in computer science:
- researching network communication protocols;
- discovering the algorithms used by malware such as computer viruses, worms, trojan horses, etc.;
- researching file formats used to store any kind of information, for example, mail databases and disk images;
- checking how well your own software resists reverse engineering;
- improving software compatibility with platforms and third-party software;
- using undocumented platform features.
Let’s see how hard it is and what we need to learn in order to reverse engineer software.
What do we need for reversing?
To start reverse engineering software, you need:
- knowledge of the field where you want to apply reverse engineering;
- tools that allow you to apply your knowledge while trying to get the information you need.
Let’s consider an example that has nothing to do with software. Say you have a watch and you want to find out whether it is mechanical or battery-powered. The first requirement, knowledge, means that you should know there are at least two types of watches: mechanical and battery-powered ones. Besides that, you should know that if there is a battery, it is located inside the watch and you can see it if you open the back lid. You should also have basic knowledge of the watch’s internal structure, what the battery looks like, and what tools you need to open the lid. The second requirement, tools, means that you need a screwdriver or another special tool that gives you the chance to apply your knowledge.
In software reverse engineering, the range of required knowledge and tools is huge, and, of course, both should fit the field in which you perform the reversing.
Theoretical knowledge. Software reverse engineering process
Different software reverse engineering tasks require different knowledge.
If you reverse engineer network applications, you should know the principles of inter-process communication, the structure of the network itself, the structure of the network packets you are going to see, the connections, and the order in which packets appear, etc.
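A practical way to apply this knowledge is to write your hypothesis about a packet layout down as code and check it against captured traffic. The sketch below assumes a purely hypothetical header (a 2-byte big-endian message type followed by a 4-byte big-endian payload length); `packet_view` and `parse_packet` are invented names, not part of any real protocol or library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout under study (an assumption, not a real protocol):
 *   offset 0: uint16 message type, big-endian
 *   offset 2: uint32 payload length, big-endian
 *   offset 6: payload bytes */
typedef struct {
    uint16_t type;
    uint32_t length;
    const uint8_t *payload;
} packet_view;

/* Returns 0 on success, -1 if the buffer does not fit the assumed layout. */
int parse_packet(const uint8_t *buf, size_t size, packet_view *out)
{
    if (size < 6)
        return -1;
    out->type = (uint16_t)((buf[0] << 8) | buf[1]);
    out->length = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
                  ((uint32_t)buf[4] << 8) | (uint32_t)buf[5];
    /* A length that runs past the captured data means the hypothesis is wrong
       (or the message was split across several packets). */
    if (out->length > size - 6)
        return -1;
    out->payload = buf + 6;
    return 0;
}
```

If parsing fails on real captures, for example because the length field points past the end of the data, the assumed layout or byte order is probably wrong, which is useful information in itself.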
If you are reversing cryptographic algorithms, you should have a background in cryptography and know the basics of the most popular algorithms used in the field; recognizing them may save you a lot of research time.
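For example, many algorithms use distinctive constants that survive compilation, so searching a binary for them is a cheap first step. The sketch below scans a buffer for a few well-known 32-bit values stored in little-endian order, as on x86. The table is a small illustrative selection and `scan_crypto_constants` is a hypothetical name; a hit only suggests, and does not prove, that the algorithm is present.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t value;
    const char *hint;
} crypto_constant;

/* A few well-known constants; finding one is a hint, not proof. */
static const crypto_constant known_constants[] = {
    {0x67452301u, "MD5/SHA-1 initial state"},
    {0x6a09e667u, "SHA-256 initial state"},
    {0xedb88320u, "CRC-32 (reflected polynomial)"},
    {0x9e3779b9u, "TEA/XTEA delta"},
};

/* Scans buf for the constants stored little-endian and calls report()
   (if not NULL) for every hit. Returns the number of hits. */
size_t scan_crypto_constants(const uint8_t *buf, size_t size,
                             void (*report)(size_t offset, const char *hint))
{
    size_t hits = 0;
    size_t n_known = sizeof known_constants / sizeof known_constants[0];
    for (size_t off = 0; off + 4 <= size; off++) {
        uint32_t v = (uint32_t)buf[off] | ((uint32_t)buf[off + 1] << 8) |
                     ((uint32_t)buf[off + 2] << 16) | ((uint32_t)buf[off + 3] << 24);
        for (size_t i = 0; i < n_known; i++) {
            if (v == known_constants[i].value) {
                if (report)
                    report(off, known_constants[i].hint);
                hits++;
            }
        }
    }
    return hits;
}
```

A hit gives you an address to start from in the disassembler: the code that references that offset is a good candidate for the algorithm’s implementation.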
If you are researching a file structure, you need to know how different systems and components work with files, as well as basic file concepts. Experience with such tasks is useful, since special techniques can save the researcher a lot of time when reversing specific types of file interaction. For example, a test that writes unique values to the file while logging the offsets and sizes of the data written to the actual storage file may help you find common patterns in how offsets are calculated. That will give you a hint about the internal structure of the file.
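A minimal sketch of the analysis half of such a test, assuming the target application has already stored `count` records whose payloads were the unique 32-bit markers `MARKER_BASE + 0`, `MARKER_BASE + 1`, and so on. The marker value and `locate_markers` are hypothetical; the code only finds where each marker landed in the storage file (read into memory), and those offsets are what you then compare.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary base value, chosen to be unlikely to occur in the file by chance. */
#define MARKER_BASE 0xC0DE0000u

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* offsets[i] receives the first offset of marker i, or SIZE_MAX if it is absent.
   Returns how many markers were found. */
size_t locate_markers(const uint8_t *file, size_t size, size_t count, size_t *offsets)
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = SIZE_MAX;
        for (size_t off = 0; off + 4 <= size; off++) {
            if (read_le32(file + off) == MARKER_BASE + (uint32_t)i) {
                offsets[i] = off;
                found++;
                break;
            }
        }
    }
    return found;
}
```

For instance, if markers 0, 1, and 2 are found at offsets 20, 36, and 52, a reasonable working hypothesis is fixed 16-byte records whose payload starts 20 bytes into the file; repeating the test with a different payload size shows whether the stride depends on it.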
Of course, there is also general knowledge that helps in most software reverse engineering tasks: knowledge of common application structure, programming languages, compilers, etc.
At the start of the software reverse engineering process, developers usually use a disassembler to find the actual algorithms and program logic in place. There are many executable file formats, different compilers (each producing different instruction sequences for the same source code), and various operating systems. All of this makes it impossible to use a single technique for reversing every type of software.
What helps you understand disassembled code better? Knowledge of assembly language, function calling conventions, stack structure, the stack frame concept, etc.
Knowing what assembly code a compiler produces for different code samples often helps you recover the original ideas. Let’s consider some examples for the Windows x86 platform.
Let’s say we have the code:
If we compile this code into an executable file, we will see something like this in the disassembler:
As we can see, the regular loop has turned into assembly code with comparisons and jumps. Notice that it does not use the classic assembly loop with the counter in the ecx register (the LOOP instruction). In addition, the local variables are referred to as [ebp-14h] and [ebp-8]. What will happen if we compile this code in a Release build?
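The original listings are not reproduced here, but the transformation can be illustrated in C. In the hypothetical sketch below, `sum_loop` is an ordinary loop and `sum_lowered` spells out, with `goto`, one common compare-and-jump shape that a Debug build produces for such a loop. The instruction comments are illustrative only: which local ends up at [ebp-8] and which at [ebp-14h], and the exact instruction order, depend on the compiler.

```c
/* An ordinary counting loop, as a developer would write it. */
int sum_loop(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* The same logic lowered by hand into compare-and-jump form.
   No LOOP instruction or ECX counter is involved. */
int sum_lowered(int n)
{
    int sum = 0;          /* mov dword ptr [ebp-8], 0                  */
    int i = 0;            /* mov dword ptr [ebp-14h], 0                */
    goto check;           /* jmp check                                 */
body:
    sum += i;             /* mov eax, [ebp-8] / add eax, [ebp-14h] / mov [ebp-8], eax */
    i++;                  /* mov eax, [ebp-14h] / add eax, 1 / mov [ebp-14h], eax     */
check:
    if (i < n)            /* mov eax, [ebp-14h] / cmp eax, [ebp+8] (n) */
        goto body;        /* jl body                                   */
    return sum;
}
```

Both functions return the same values; seeing the two side by side makes it easier to map labels and jumps in a listing back to a `for` or `while` statement.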
This piece of code does not look like the first one at all. This is how optimization works: modern compilers are very good at optimizing code. That is why in reverse code engineering it is often a good idea not to try to recover the exact code the developer wrote, but to understand the idea and principles of the code you are researching and then write your own prototype, provided it fits the original task, of course.
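As an illustration of prototyping the idea rather than the literal code: the sum 0 + 1 + ... + (n - 1) computed by a simple counting loop has the closed form n(n - 1)/2, and optimizing compilers (clang, for example) can replace such a loop with straight-line arithmetic, so a Release listing may contain no loop at all. The sketch below is a hypothetical prototype that captures that idea; the 64-bit type is a choice made here to reduce the risk of overflow.

```c
#include <stdint.h>

/* Prototype of the idea behind a summation loop: 0 + 1 + ... + (n - 1) == n*(n - 1)/2.
   n*(n - 1) is always even, so the division is exact. */
int64_t sum_prototype(int64_t n)
{
    if (n <= 0)
        return 0;   /* the loop body would never run */
    return n * (n - 1) / 2;
}
```

When you recognize a formula like this in optimized code, it is usually faster to confirm the intent with a prototype than to reconstruct the loop the developer originally wrote.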
You can find techniques for reversing a class from assembly code in our publication. That article is based on Delphi because of its simplicity, but you can reverse engineer C++ code in the same way.
It is very useful to know what assembly code you get when you compile different operators, structures, and other language constructs. I do not cover them here to avoid overloading the article with technical details, but studying them is a good way to start with C++ reverse engineering, for example.
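As one example of what a language construct becomes in assembly, consider a C++ virtual call. With typical compilers, an object with virtual methods starts with a pointer to a table of function pointers (the vtable), and a virtual call becomes two loads and an indirect call. The C code below is a hand-written model of that layout, not compiler output; `shape`, `shape_vtable`, and `call_area` are invented names.

```c
#include <stddef.h>

struct shape;

/* Model of a vtable: one function pointer per virtual method. */
typedef struct {
    double (*area)(const struct shape *self);
} shape_vtable;

/* The vtable pointer sits at offset 0. On 32-bit x86 with MSVC's thiscall
   convention, `this` arrives in ECX, so a virtual call typically looks like
   mov eax, [ecx] followed by call dword ptr [eax+slot*4]. */
typedef struct shape {
    const shape_vtable *vptr;
    double w, h;
} shape;

static double rect_area(const shape *self) { return self->w * self->h; }
static double tri_area(const shape *self) { return self->w * self->h / 2.0; }

static const shape_vtable rect_vtable = { rect_area };
static const shape_vtable tri_vtable = { tri_area };

/* Equivalent of `s->area()` in C++: dispatch through the table. */
double call_area(const shape *s)
{
    return s->vptr->area(s);
}
```

In a disassembly, a table of function addresses in a read-only section, whose address is stored at offset 0 of freshly constructed objects, is a strong hint that you are looking at a class with virtual methods.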
Tools to use for software reverse engineering
I recommend reading our article on application architecture research. In it, you will find descriptions of tools such as Process Monitor and Process Explorer, which are indispensable in reverse engineering.
I would like to add a few more tools to that list. The tools below are commonly used for Windows software reverse engineering (if you’d like to learn about tools and details for other platforms, read our article on how to reverse engineer iOS software).
You can find more details and usage examples in our article on the best software reverse engineering tools.
What is a disassembler? It is a program that translates an executable file into assembly language. The most popular disassembler is IDA Pro.
It is a very convenient and powerful disassembly tool with a huge number of features that help you solve the task much faster. It can show the function call tree, parse the executable’s imports and exports and show the relevant information about them, and even show the code in C, which makes life much easier for those who are not very good at reading assembly.
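The core job of any disassembler is mapping byte sequences to instructions. The toy decoder below is a hypothetical sketch, not how IDA Pro works internally: it recognizes only a few 32-bit x86 encodings found in typical function prologues and epilogues (push ebp / mov ebp, esp / sub esp, N and the reverse), which is also where [ebp-N] references to local variables come from. A real disassembler handles the full opcode map, prefixes, and ModRM/SIB addressing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decodes one instruction from a tiny subset of 32-bit x86 into `out`.
   Returns its length in bytes, or 0 if the encoding is not in the subset. */
size_t decode_one(const uint8_t *code, size_t size, char *out, size_t out_size)
{
    if (size >= 1) {
        switch (code[0]) {
        case 0x55: snprintf(out, out_size, "push ebp"); return 1;
        case 0x5D: snprintf(out, out_size, "pop ebp");  return 1;
        case 0x90: snprintf(out, out_size, "nop");      return 1;
        case 0xC3: snprintf(out, out_size, "ret");      return 1;
        }
    }
    /* 8B /r = mov r32, r/m32. ModRM 0xEC: reg = ebp, rm = esp; 0xE5: reg = esp, rm = ebp. */
    if (size >= 2 && code[0] == 0x8B && code[1] == 0xEC) {
        snprintf(out, out_size, "mov ebp, esp");
        return 2;
    }
    if (size >= 2 && code[0] == 0x8B && code[1] == 0xE5) {
        snprintf(out, out_size, "mov esp, ebp");
        return 2;
    }
    /* 83 /5 ib = sub r/m32, imm8 (sign-extended). ModRM 0xEC selects /5 with rm = esp. */
    if (size >= 3 && code[0] == 0x83 && code[1] == 0xEC) {
        int imm = code[2] < 0x80 ? code[2] : code[2] - 256;
        snprintf(out, out_size, "sub esp, %d", imm);
        return 3;
    }
    return 0;
}

/* Disassembles the buffer one instruction per line, stopping at the first
   unknown byte. `out` must hold at least one byte. Returns bytes decoded. */
size_t disassemble_all(const uint8_t *code, size_t size, char *out, size_t out_size)
{
    size_t pos = 0;
    char line[32];
    out[0] = '\0';
    while (pos < size) {
        size_t len = decode_one(code + pos, size - pos, line, sizeof line);
        if (len == 0)
            break;
        size_t used = strlen(out);
        snprintf(out + used, out_size - used, "%s\n", line);
        pos += len;
    }
    return pos;
}
```

Fed the bytes 55 8B EC 83 EC 14 8B E5 5D C3, it prints the familiar frame setup, a 20-byte (14h) local allocation, and the teardown.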
All Sysinternals tools are useful in Windows software reverse engineering. Here are a few of them:
- TCPView: shows detailed listings of all TCP and UDP endpoints of all processes. Very useful when reversing network protocols.
- TDIMon: similar to TCPView, but monitors operations at the socket level.
- PortMon: monitors the system’s physical serial and parallel ports and all traffic going through them.
- WinObj: shows all objects in the system in a hierarchical structure. It may be useful when reversing an application that works with synchronization primitives such as mutexes and semaphores, and also when reverse engineering kernel-mode drivers.
I would also like to mention Wireshark, one of the most powerful network sniffers available.
ApiMon is a very useful tool for discovering which functions (APIs) the analyzed application calls and what behavior it expects from those functions. It has a powerful database and lets you see calls to a huge number of API functions, not only from kernel32 and ntdll but also from COM, the managed environment, etc. ApiMon also provides very convenient filtering mechanisms.
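To observe calls, an API monitor has to get between the application and the functions it imports. One common technique on Windows is to patch entries of a module’s import address table so they point to a logging wrapper. The portable sketch below only simulates that idea with an ordinary table of function pointers; `import_table`, `real_add`, `logging_add`, and `install_hook` are invented names, and no real Windows structures are involved.

```c
#include <stdio.h>

typedef int (*add_fn)(int, int);

/* Stand-in for an import table: the application calls through these pointers. */
typedef struct {
    add_fn add;
} import_table;

static int real_add(int a, int b) { return a + b; }

static add_fn original_add;    /* saved original entry, so the wrapper can forward */
static unsigned calls_logged;  /* how many calls the "monitor" has seen */

/* The monitor's wrapper: forward to the original, then log arguments and result. */
static int logging_add(int a, int b)
{
    int result = original_add(a, b);
    calls_logged++;
    printf("add(%d, %d) -> %d\n", a, b, result);
    return result;
}

/* "Hook" the entry: remember the original pointer and redirect the slot. */
void install_hook(import_table *t)
{
    original_add = t->add;
    t->add = logging_add;
}
```

The application’s behavior does not change, because the wrapper forwards every call, but every call now leaves a trace; filtering in a real monitor amounts to deciding which wrappers log what.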
If you develop software, you cannot overestimate the benefit of using a debugger and being able to see what the program is doing right now. You get the same benefit when debugging live applications that you are trying to reverse. So which debuggers are useful for reversing?
There are many of them, but the most popular are OllyDbg and WinDbg.
OllyDbg is probably the best debugger you can find for software reverse engineering. It was built primarily for reversing and has all the tools needed for it: a built-in disassembler that can analyze and identify some key data structures, import and export analysis, a built-in assembling and patching engine, etc. Its ability to parse API functions and their parameters makes it very easy to reverse an application’s interaction with the system. The stack view shows a lot of information about the call stack. Another important advantage is that you can use it with debug-protected applications, where ordinary debuggers simply can’t do anything.
Despite its rather simple interface, WinDbg has very powerful debugging tools. It has a built-in disassembler and a large number of commands that let you learn almost everything about the process or system you are debugging. Probably its most valuable feature is kernel-mode debugging, which is a big advantage when reverse engineering drivers, kernel-mode drivers in particular.
This list is far from complete, but it gives you a starting point for discovering the tools that you need and that fit your task.
Real-life software reverse engineering example
Now let’s see how to reverse engineer software using a small example. Say you have a suspicious executable file, and you want to find out what this program does and whether it is safe for users.
Given the task, it is a good idea not to run the program on your work computer but to use a virtual machine instead. Let’s start the application.
As we can see, this process creates a Windows service named TestDriver. The service is of the kernel type, so we know it is a driver. But where does the process get the driver file to run it? We can use FileMon from Sysinternals to find out. We open FileMon, set up the filters so that it shows only the process we need, and look at its log.
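FileMon’s filters answer this interactively. If you export a monitor log instead, the key question here (which files did the process create and later delete?) can also be answered with a few lines of code. The sketch below assumes a hypothetical, simplified log format with only process, operation, and path fields; real FileMon or Process Monitor logs have more columns and their own operation names, and `log_entry` and `find_dropped_files` are invented names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified monitor log entry. */
typedef struct {
    const char *process;
    const char *operation;   /* "CREATE" or "DELETE" in this sketch */
    const char *path;
} log_entry;

/* Collects paths that `process` created and later deleted (dropped temporary
   files). Writes up to max_out paths into out and returns how many were written. */
size_t find_dropped_files(const log_entry *log, size_t n, const char *process,
                          const char **out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < max_out; i++) {
        if (strcmp(log[i].process, process) != 0 ||
            strcmp(log[i].operation, "CREATE") != 0)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(log[j].process, process) == 0 &&
                strcmp(log[j].operation, "DELETE") == 0 &&
                strcmp(log[j].path, log[i].path) == 0) {
                out[count++] = log[i].path;
                break;
            }
        }
    }
    return count;
}
```

A file that appears in the temp directory and disappears shortly afterwards is exactly the kind of artifact worth tracing back to its source.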
Now we know that the file is created by the process we are reversing and that the process puts it in the current user’s temp directory. There is no point in looking for the file in the temp folder, since we can see that the process deletes it right after use. So where does the process get it from? If the process unpacks the file, we can try to find it in the executable’s resource section, since that is a common place to store such data. Let’s try it with another tool, Resource Hacker, which lets us examine the resources. Let’s run it.
Bingo! Judging by the content of the resource we found, it is probably a Windows executable, since it contains the string “This program cannot be run in DOS mode.” Let’s check whether it is our driver file. To do so, we extract the resource with Resource Hacker and open it in the disassembler.
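Before opening an extracted resource in a disassembler, you can also check its headers programmatically. Every PE file starts with the “MZ” DOS header, the 32-bit field at offset 0x3C (e_lfanew) points to the “PE\0\0” signature, and the optional header that follows the 20-byte file header has a Subsystem field at offset 68; the value 1 (IMAGE_SUBSYSTEM_NATIVE) is used by kernel-mode drivers, among a few other native images. The sketch below, with the hypothetical function name `pe_subsystem`, applies those checks to a blob held in memory.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns -1 if the blob does not look like a PE image, otherwise the
   optional header's Subsystem value (1 = IMAGE_SUBSYSTEM_NATIVE). */
int pe_subsystem(const uint8_t *blob, size_t size)
{
    if (size < 0x40 || blob[0] != 'M' || blob[1] != 'Z')
        return -1;
    uint32_t pe = le32(blob + 0x3C);                /* e_lfanew */
    /* Need: signature (4) + file header (20) + Subsystem offset (68) + 2 bytes. */
    if (pe > size || size - pe < 4 + 20 + 68 + 2)
        return -1;
    if (blob[pe] != 'P' || blob[pe + 1] != 'E' || blob[pe + 2] != 0 || blob[pe + 3] != 0)
        return -1;
    return le16(blob + pe + 4 + 20 + 68);           /* same offset for PE32 and PE32+ */
}
```

A result of 1 does not prove the resource is our driver, but together with the service type seen earlier it makes the disassembler the obvious next step.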
As we know, DriverEntry is the entry point of kernel-mode drivers in Windows. Since it looks like we found the right driver, we can continue our research.
How to reverse engineer a driver
We examine the functions called from DriverEntry one by one. In sub_14005, we find nothing interesting, so we continue with sub_110F0 and find the following code:
Some lines are omitted for simplicity. In the first listing, we see a Unicode string being created; this string points to the C:\hello.txt path. After that, we see an OBJECT_ATTRIBUTES structure being filled with typical values; we know that this structure is needed when calling functions like ZwCreateFile. In the second listing, we see that ZwCreateFile is called, which makes us fairly sure that the driver creates the file, and we know where this file is located after creation.
From the third and fourth listings, we can see that the driver takes the string and writes it to a buffer (this happens in the sub_11150 function) and that this buffer is then written to the file with the ZwWriteFile function. At the end, the driver closes the file with the ZwClose API.
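To double-check this reading of the listings, it can help to write a small prototype of the recovered logic. The sketch below is a user-mode analogue, not the driver’s code: the kernel calls are mapped onto standard C file I/O (ZwCreateFile becomes fopen, ZwWriteFile becomes fwrite, ZwClose becomes fclose), `fill_buffer` stands in for sub_11150, and the output path is a parameter so the sketch can run anywhere. `fill_buffer` and `write_hello` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for sub_11150: copy the message into the caller's buffer.
   Returns the number of bytes to write, or 0 if the buffer is too small. */
static size_t fill_buffer(char *buf, size_t buf_size, const char *msg)
{
    size_t len = strlen(msg);
    if (len > buf_size)
        return 0;
    memcpy(buf, msg, len);
    return len;
}

/* User-mode analogue of the recovered driver logic:
   create the file, write the message, close the file. Returns 0 on success. */
int write_hello(const char *path)
{
    char buffer[64];
    size_t len = fill_buffer(buffer, sizeof buffer, "Hello from driver");
    FILE *f = fopen(path, "wb");                  /* roughly ZwCreateFile */
    if (f == NULL)
        return -1;
    size_t written = fwrite(buffer, 1, len, f);   /* roughly ZwWriteFile */
    if (fclose(f) != 0 || written != len)         /* roughly ZwClose */
        return -1;
    return 0;
}
```

If the prototype’s output matches what the real program leaves on disk, that is good evidence the listings were read correctly.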
Let’s summarize. We found out that the original program extracts the driver file from its resources, puts it in the current user’s temp folder, creates a Windows service for this driver, and runs it. After that, the program stops and deletes the service and also deletes the driver file from the temp directory. From this behavior and from the disassembly, we know that the driver does nothing except create a file named hello.txt on drive C and write the string “Hello from driver” to it. Let’s check whether we are right: we run the program and look at drive C:
Wonderful! We reversed this simple program, and now we know that it is safe to use.
We could have obtained the same result in many other ways: using a debugger, using ApiMon, writing some tests, etc. This shows that you can find your own way to reverse engineer software, as long as it works for you.
Good luck with reversing!