This article considers common reverse engineering tasks, the main tools for reverse engineering, and basic principles of how to reverse engineer a piece of software, specifically Windows software. We also offer a small step-by-step example of reverse engineering an application to illustrate these points.

Written by:

Sergii Bratus,
Development Coordinator of Network Security Team

What is software reversing?

Reverse engineering is the process of uncovering the principles behind a piece of hardware or software including its architecture and internal structure. The question that drives reverse engineering is “How does it work?” Obviously, if you have documentation the whole process becomes much simpler, but it often happens that there’s no documentation and you need to find another way to learn how a piece of software works.

When might we need to reverse engineer a piece of software and how might doing so help us?

There are many uses of reverse engineering in the field of computer science, including:

  • Researching network communication protocols
  • Analyzing the algorithms used by malware such as computer viruses, worms, trojan horses, etc.
  • Researching the file format for storage of any kind of information, for example email databases and disk images
  • Checking the ability of your own software to resist reverse engineering
  • Improving software compatibility with platforms and third-party software
  • Using undocumented platform features

Let’s see how to reverse engineer software.

What do we need for reverse engineering?

To start reverse engineering software, you need to have:

  1. Knowledge in the field where you want to apply the reverse engineering; and
  2. Tools that will allow you to apply your knowledge while trying to get the needed information.

Let’s consider an example that isn't connected to software. Say you have a watch and you want to find out whether it’s mechanical or runs on a battery. Point 1 means you should know that there are at least two types of watches, mechanical and battery-powered. You should also know that if there's a battery, it's located inside the watch, and that you can see it if you open the back. Some basic knowledge of a watch’s internal structure helps too: what the battery looks like and what tools you need to open the watch case. Point 2 means that you need a screwdriver or other special tool that lets you apply that knowledge.

Just as reverse engineering a watch requires a specific skillset and tools, software reverse engineering has its own field-specific knowledge and tools.

Theoretical Knowledge. Software Reverse Engineering Process

For different software reverse engineering tasks, you need different types of knowledge.

If you reverse engineer any network applications, you should know the principles of inter-process communications, the structure of networks, connections, network packets, and so on.

If you're reversing cryptographic algorithms, you should have knowledge of cryptography and know the most popular algorithms that are used in the field – this knowledge may help you save time on researching.
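As a rough illustration of how knowing popular algorithms saves time, here is a minimal sketch that scans raw bytes for a few widely documented constants. The names find_crypto_constants, KnownConstant, and Hit are made up for this example, the table is deliberately tiny, and a hit is only a hint that an algorithm may be present, not proof.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A well-known 32-bit constant and the algorithm it usually hints at.
struct KnownConstant {
    std::uint32_t value;
    const char*   hint;
};

// A few widely documented constants; a practical table would be much longer.
static const KnownConstant kKnownConstants[] = {
    {0x67452301u, "MD5/SHA-1 initial state"},
    {0x6A09E667u, "SHA-256 initial state"},
    {0xEDB88320u, "CRC-32 reflected polynomial"},
};

// One match: where it was found and what it might mean.
struct Hit {
    std::size_t offset;
    std::string hint;
};

// Scan a byte buffer for the constants above, assuming they are stored
// little-endian, as they would be in x86 code or data.
std::vector<Hit> find_crypto_constants(const std::vector<std::uint8_t>& data) {
    std::vector<Hit> hits;
    for (std::size_t i = 0; i + 4 <= data.size(); ++i) {
        const std::uint32_t v = static_cast<std::uint32_t>(data[i])
                              | static_cast<std::uint32_t>(data[i + 1]) << 8
                              | static_cast<std::uint32_t>(data[i + 2]) << 16
                              | static_cast<std::uint32_t>(data[i + 3]) << 24;
        for (const KnownConstant& k : kKnownConstants) {
            if (v == k.value) {
                hits.push_back({i, k.hint});
            }
        }
    }
    return hits;
}
```

Feeding the function the contents of an executable's code and data sections gives a list of offsets worth inspecting first in the disassembler.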

If you're researching file structures, you need to know basic file concepts and how different systems or components work with files. Special techniques can save a lot of time when reversing particular kinds of file interaction. For example, you can write a test that stores unique, easily recognizable values through the application while logging the offsets and sizes of the data that ends up in the actual storage file. The patterns in those offsets will give you a hint about the internal structures of these files.
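The marker technique can be sketched as follows. This is only an illustration under simple assumptions: the markers are stored unencoded and uncompressed, the storage file fits in memory, and locate_markers is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// For each marker, collect every offset at which it occurs in the storage bytes.
// Markers that never occur get no entry in the result.
std::map<std::string, std::vector<std::size_t>>
locate_markers(const std::vector<std::uint8_t>& storage,
               const std::vector<std::string>& markers) {
    // Compare bytes as unsigned values so markers with high bytes also match.
    const auto same_byte = [](std::uint8_t a, char b) {
        return a == static_cast<std::uint8_t>(b);
    };

    std::map<std::string, std::vector<std::size_t>> found;
    for (const std::string& m : markers) {
        if (m.empty()) continue;
        auto it = storage.begin();
        while (true) {
            it = std::search(it, storage.end(), m.begin(), m.end(), same_byte);
            if (it == storage.end()) break;
            found[m].push_back(static_cast<std::size_t>(it - storage.begin()));
            ++it;  // keep looking after this match
        }
    }
    return found;
}
```

Comparing the offsets of consecutive markers (for instance, a constant gap between them) is what hints at record headers, length prefixes, or alignment rules.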

Of course, there's common knowledge that will help in most software reverse engineering tasks – knowledge of common application structures, programming languages, compilers, and so on.

When beginning to reverse engineer software, developers as a rule use a disassembler to find the algorithms and program logic. There are many different executable file formats, compilers (each of which produces different output), and operating systems. This diversity of technologies precludes the use of one single technique for reversing all types of software.

To understand the disassembled code, you'll need some knowledge of assembly language, function calling conventions, stack structure, the concept of stack frames, and more.

Knowing the assembly output for different code samples often helps in uncovering the original logic. Let’s consider some examples for the Windows x86 platform.

Let’s say we have the following code:

int count = 0;
for (int i = 0; i < 10; ++i)
{
    count++;
}

If we compile this code to an executable file, we'll see something like this in the disassembler:

004113DE loc_4113DE:
004113DE     mov     eax, [ebp-14h]
004113E1     add     eax, 1
004113E4     mov     [ebp-14h], eax
004113E7 loc_4113E7:
004113E7     cmp     [ebp-14h], 0Ah
004113EB     jge     short loc_4113F8
004113ED     mov     eax, [ebp-8]
004113F0     add     eax, 1
004113F3     mov     [ebp-8], eax
004113F6     jmp     short loc_4113DE
004113F8 loc_4113F8:
                

As we can see, this ordinary loop has turned into assembly code with comparisons and jumps. Notice that the code doesn't use the x86 loop instruction with its counter in the ecx register. Also note that the local variables i and count are referred to as [ebp-14h] and [ebp-8] respectively. What will happen if we compile this code using the release build?

00401000 main     proc near
00401000     mov     ecx, ds:?cout@std
00401006     push    0Ah
00401008     call    ds:basic_ostream@operator<<(int)
0040100E     xor     eax, eax
00401010     retn
00401010 main     endp

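This piece of code doesn’t look like the first one at all. This is how code optimization works: the compiler evaluated the loop at build time, so all that remains of it is the constant 0Ah (10), the final value of count, pushed as the argument to operator<<. (The call shows that the full program also printed count, although the snippet above omits that line.) The compilers that we use nowadays are very good at optimizing code. That's why when reverse engineering it’s often better to try to understand the idea behind the code (its principles) rather than to try to recover the original code itself. If you understand the idea behind the code, then you can just write your own prototype that fits the original task.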
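To see why the release listing contains only push 0Ah, it may help to express the same loop in a form the compiler must be able to evaluate at build time. The sketch below assumes C++14 or later. The function names are invented for illustration, and count_up_to is a hypothetical variant that is not part of the original program.

```cpp
// The loop from the example, wrapped in a constexpr function.
constexpr int count_iterations() {
    int count = 0;
    for (int i = 0; i < 10; ++i) {
        count++;
    }
    return count;
}

// The whole loop can be evaluated during compilation, which is the same
// reasoning an optimizer applies when it replaces the loop with 0Ah.
static_assert(count_iterations() == 10, "the loop folds to the constant 10");

// Hypothetical variant: with a bound known only at run time the compiler can
// no longer substitute a single constant, although it may still simplify
// the loop in other ways.
int count_up_to(int limit) {
    int count = 0;
    for (int i = 0; i < limit; ++i) {
        count++;
    }
    return count;
}
```

When a disassembly shows a bare constant where you expected a computation, checking whether all of its inputs were known at build time is a reasonable first guess.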

Techniques for reversing a class from assembly code are described in a separate article. That article is based on Delphi because of its simplicity, but you can always try to reverse engineer C++ code in the same way.

It's very useful to know what assembly code you'll get when you compile different operators, structures, and other language constructs. Understanding the resulting assembly code is a good way to start the C++ reverse engineering process, but we won't get into the technical details here.
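One construct worth knowing is the structure: in compiled code a field access is just a base address plus a fixed offset. The sketch below uses a made-up Record type, and the offsets in the comments assume the natural alignment used by mainstream x86 and x64 compilers. Other ABIs or packing pragmas would change them.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record, not taken from any real program.
struct Record {
    std::uint8_t  type;   // offset 0, followed by 3 bytes of padding
    std::uint32_t id;     // offset 4 with natural alignment
    std::uint16_t flags;  // offset 8, followed by 2 bytes of tail padding
};

// Offsets as the compiler sees them; in a disassembly, r->id typically
// appears as a memory operand of the form [reg+4].
constexpr std::size_t kIdOffset    = offsetof(Record, id);
constexpr std::size_t kFlagsOffset = offsetof(Record, flags);

// What a reverser reconstructs from such an operand: "a 32-bit value
// at base + 4", without knowing the field name.
std::uint32_t read_id_raw(const void* base) {
    std::uint32_t id;
    std::memcpy(&id, static_cast<const unsigned char*>(base) + kIdOffset, sizeof id);
    return id;
}
```

Seeing several accesses at different fixed offsets from the same register is often the first sign that the register holds a pointer to a structure or class instance.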

Tools to use for Reverse Engineering Software

I recommend that you read the article on application architecture research for a description of such tools as Process Monitor and Process Explorer, which are absolutely indispensable in the process of reverse engineering.

However, I would like to add a few more tools to the list in that article. These tools are commonly used for Windows software reverse engineering (tools and details for other platforms are covered in a separate article).

You can get more details and usage examples in our best software reverse engineering tools article.

Disassemblers

What is a disassembler? In short, it's a program that translates an executable file to assembly language. The most popular disassembler is IDA Pro.

IDA Pro screen

IDA Pro is a very convenient and powerful tool for disassembly. It has a huge number of features that speed up the work. It can show the function call tree, parse the imports and exports of the executable, and show information about them. With a decompiler plugin it can even show the code as C-like pseudocode, making life much easier for those who aren't very good at reading assembly.
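To make the idea of a disassembler concrete, here is a toy decoder that understands only the handful of x86 encodings used in the debug-build loop listing shown earlier. The byte values follow the standard encodings of those instructions, but the function and type names are invented for this sketch. It handles only small positive immediates, and it is nowhere near what a real disassembler such as IDA Pro does.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Insn {
    std::uint32_t address;  // virtual address of the instruction
    std::string   text;     // decoded mnemonic and operands
};

// Interpret a byte as a signed 8-bit value (displacements, short jumps).
static int signed8(std::uint8_t b) { return b < 0x80 ? b : b - 0x100; }

// Print numbers in the style of the listing: 8, 14h, 0Ah.
static std::string hex_imm(unsigned v) {
    if (v < 10) return std::to_string(v);
    char buf[16];
    std::snprintf(buf, sizeof buf, "%Xh", v);
    std::string s = buf;
    if (s[0] >= 'A') s.insert(0, "0");  // 0Ah rather than Ah
    return s;
}

// Format an ebp-relative operand with an 8-bit displacement, e.g. [ebp-14h].
static std::string ebp_operand(std::uint8_t disp8) {
    const int d = signed8(disp8);
    return std::string("[ebp") + (d < 0 ? "-" : "+")
         + hex_imm(static_cast<unsigned>(d < 0 ? -d : d)) + "]";
}

std::vector<Insn> disassemble(const std::vector<std::uint8_t>& code, std::uint32_t base) {
    std::vector<Insn> out;
    std::size_t i = 0;
    while (i < code.size()) {
        const std::uint32_t addr = base + static_cast<std::uint32_t>(i);
        const std::size_t left = code.size() - i;
        const std::uint8_t op = code[i];
        if ((op == 0x8B || op == 0x89) && left >= 3 && code[i + 1] == 0x45) {
            // 8B 45 d8: mov eax, [ebp+d8]    89 45 d8: mov [ebp+d8], eax
            const std::string mem = ebp_operand(code[i + 2]);
            out.push_back({addr, op == 0x8B ? "mov eax, " + mem : "mov " + mem + ", eax"});
            i += 3;
        } else if (op == 0x83 && left >= 3 && code[i + 1] == 0xC0) {
            // 83 C0 ib: add eax, imm8
            out.push_back({addr, "add eax, " + hex_imm(code[i + 2])});
            i += 3;
        } else if (op == 0x83 && left >= 4 && code[i + 1] == 0x7D) {
            // 83 7D d8 ib: cmp dword ptr [ebp+d8], imm8
            out.push_back({addr, "cmp " + ebp_operand(code[i + 2]) + ", " + hex_imm(code[i + 3])});
            i += 4;
        } else if ((op == 0x7D || op == 0xEB) && left >= 2) {
            // 7D rel8: jge short    EB rel8: jmp short
            // The target is the address of the next instruction plus rel8.
            const std::uint32_t target = static_cast<std::uint32_t>(
                static_cast<std::int64_t>(addr) + 2 + signed8(code[i + 1]));
            char buf[48];
            std::snprintf(buf, sizeof buf, "%s short loc_%X",
                          op == 0x7D ? "jge" : "jmp", static_cast<unsigned>(target));
            out.push_back({addr, buf});
            i += 2;
        } else {
            // Anything else is emitted as a raw byte, as disassemblers do for unknown data.
            char buf[16];
            std::snprintf(buf, sizeof buf, "db %02Xh", static_cast<unsigned>(op));
            out.push_back({addr, buf});
            i += 1;
        }
    }
    return out;
}
```

Calling disassemble with the 26 bytes of the loop and the base address 0x4113DE yields the same nine instructions as the listing, including the two jump targets. That shows that the labels in a disassembly are computed from relative offsets rather than stored in the file.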

Sysinternals utils

All tools from Sysinternals are useful in reverse engineering Windows software.

TCPView – shows all TCP and UDP endpoints on the system together with the owning process, addresses, and connection state; very good when reversing network protocols.

TDIMon – like TCPView, but monitors TCP and UDP activity as it happens at the Transport Driver Interface (TDI) level.

PortMon – physical system port monitor; monitors serial and parallel ports and all traffic that goes through them.

WinObj – shows all objects in the system in a hierarchical structure; may be useful when reversing an application that works with synchronization primitives such as mutexes and semaphores and also when reverse engineering kernel mode drivers.

I would also like to mention Wireshark, one of the most powerful network sniffers.

API Monitor

API Monitor screen

API Monitor is a very useful tool for discovering which functions (APIs) are called by the analyzed application and what behavior the application expects from those functions. This tool has a powerful database and allows you to see calls to a huge number of API functions, not only from kernel32 and ntdll but also from COM, the managed environment, and more. API Monitor also provides very convenient filtering mechanisms.

Debuggers

Being able to see in a debugger what a program is doing right now is invaluable for any software developer. You get the same benefit when debugging live applications that you're trying to reverse. Which debugger is most useful for reverse engineering?

There are a lot of them, but the most popular are OllyDbg and WinDbg.

OllyDbg

OllyDBG screen

OllyDbg is probably the best debugger that you can find for reverse engineering software. It was essentially built for the needs of reversing and has all the tools needed for that purpose, including a built-in disassembler that can analyze and identify key data structures, an import and export analysis feature, and a built-in assembling and patching engine. Its ability to parse API functions and their parameters makes it very easy to reverse interactions with the system. The stack view shows a lot of information about the call stack. One more important advantage is that you can use OllyDbg with applications that are protected against debugging, where ordinary debuggers just can’t do anything.

WinDbg

Windbg screen

Despite its simple interface, WinDbg has powerful tools for debugging. It has a built-in disassembler, many commands that let you learn almost everything about the process or system that you're debugging, and (probably its most valuable feature) support for kernel-mode debugging, which is a big advantage when reverse engineering drivers.

This list is not even close to complete, but you can discover the tools that you need and that fit your task.

Real Life Software Reverse Engineering Example

Now we’ll see how to reverse engineer a piece of software using a small example. Let’s say that you have a suspicious executable file. You want to find out what this program does and whether it's safe for users.

Considering the scenario, it would be a good idea not to run this executable on your work computer, but to use a virtual machine instead. Let’s start the application in our virtual machine.

Process creates a service

As we can see, this process creates a Windows service named TestDriver. This service has the type kernel, so we know it's a driver. But where does the process get the driver file that it runs? We can use FileMon from Sysinternals to find out. After opening FileMon, we set up filters to show only the process we're interested in and then look at its log:

FileMon information

Now we can see that the driver file is created by the process that we're reversing, and that this process puts the file in the user's temp directory. There’s no point in looking for the file in the temp folder, since we see that the process deletes it right after use. So where does the process get the file's content? If it unpacks the file, we may find the file in the executable’s resource section, since this is a common place to store such data. Let’s look there using another tool, Resource Hacker:

Examine resources with Resource Hacker

Bingo! Judging by its content, the resource is probably a Windows executable file, since it contains the string “This program cannot be run in DOS mode.” Let’s check whether it's our driver file. To do that, we extract the resource using Resource Hacker and open it in the disassembler.
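Spotting the DOS stub string is a good hint, and the guess can be checked a little more strictly before opening the extracted bytes in a disassembler. The sketch below tests only the two signatures that every PE file carries (MZ at offset 0 and PE\0\0 at the offset stored at 0x3C). The name looks_like_pe is made up for this example, and passing the check does not mean the file is valid or safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value; the caller guarantees off + 4 <= data.size().
static std::uint32_t read_le32(const std::vector<std::uint8_t>& data, std::size_t off) {
    return static_cast<std::uint32_t>(data[off])
         | static_cast<std::uint32_t>(data[off + 1]) << 8
         | static_cast<std::uint32_t>(data[off + 2]) << 16
         | static_cast<std::uint32_t>(data[off + 3]) << 24;
}

// Return true if the buffer starts with a DOS header whose e_lfanew field
// (offset 0x3C) points at a "PE\0\0" signature inside the buffer.
bool looks_like_pe(const std::vector<std::uint8_t>& data) {
    if (data.size() < 0x40) return false;                 // too small for a DOS header
    if (data[0] != 'M' || data[1] != 'Z') return false;   // DOS signature
    const std::uint32_t pe_off = read_le32(data, 0x3C);   // e_lfanew
    if (pe_off > data.size() || data.size() - pe_off < 4) return false;
    return data[pe_off] == 'P' && data[pe_off + 1] == 'E'
        && data[pe_off + 2] == 0 && data[pe_off + 3] == 0;
}
```

The same check works on a resource dumped to disk or on bytes carved out of any other container, which is handy when a program hides more than one embedded file.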

Disassembler screen

As we know, DriverEntry is the entry point for kernel-mode drivers in Windows systems. We can continue our research, as it looks like we've found the right driver.

How to reverse engineer the driver

To begin reverse engineering the driver, we examine the functions that are called from DriverEntry one by one. The function sub_14005 contains nothing interesting, so we continue with sub_110F0 and find the following code:

Code piece 1

---------------------------------

Code piece 2

---------------------------------

Code piece 3

---------------------------------

Code piece 4

Keep in mind that some lines are omitted here for the sake of simplicity. In the first listing, we see the creation of a Unicode string that points to the path C:\hello.txt. After that, we see the OBJECT_ATTRIBUTES structure being filled with regular values; we know that this structure is often needed when calling functions like ZwCreateFile. In the second listing, we see that ZwCreateFile is called, which makes us fairly sure that the driver creates the file, and we know where this file will be located once it's created.

From the third and fourth listings, we can see that the driver takes the Unicode string and writes it to the buffer (this happens in the sub_11150 function), and that this buffer is then written to the file using the ZwWriteFile function. At the end, the driver closes the file using the ZwClose API.
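As suggested earlier, once the idea behind the code is clear you can write your own prototype of it. The sketch below is a user-mode stand-in for the create, write, and close sequence seen in the listings, not the driver's actual code. The function name write_greeting and the path parameter are assumptions added so the sketch can be tried without touching the C drive. It writes the text as plain narrow characters, whereas the driver may store it differently.

```cpp
#include <fstream>
#include <string>

// User-mode analogue of the observed driver behavior: create a file,
// write one string to it, and close it. Returns true on success.
bool write_greeting(const std::string& path) {
    // Roughly corresponds to ZwCreateFile: create or overwrite the file.
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file) return false;

    // Roughly corresponds to ZwWriteFile: write the buffer contents.
    file << "Hello from driver";

    // Roughly corresponds to ZwClose: flush and release the handle.
    file.close();
    return !file.fail();
}
```

Running such a prototype next to the original and comparing the resulting files is a cheap way to confirm that the reconstructed logic matches what the driver really does.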

Let’s summarize. We found out that the original program extracts the driver file from its resources, puts it in the temp folder of the current user, creates a Windows service for this driver, and runs it. After that, the program stops and deletes the service and also deletes the driver file from the temp directory. From this behavior and from analyzing the disassembly, it appears that the driver doesn’t do anything except create a file named hello.txt on the C drive and write the string “Hello from driver” to it. Now we need to check if we're correct. Let’s run the program and check the C drive:

Application screen

Wonderful! We've reverse engineered this simple program and now we know that it’s safe to use.

We could have achieved this result in many different ways: using a debugger, using API Monitor, writing tests, and so on. You can find your own ways to reverse engineer software that work for you.

Good luck with reversing!
