
This article is a tutorial for reversers, written by one of our software reverse engineers. It describes how to restore classes when reverse engineering software.



Classes Restoration as a part of Software Reverse Engineering Process

Class restoration is a difficult, step-by-step code reversing procedure. It requires a deep understanding of OOP and knowledge of how a specific compiler lays out classes. The goal of this article is to show how to recover a class, its members, and its methods. We'll start with Delphi, as classes are rather easy to work with there.

As mentioned above, class restoration is a step-by-step process. It starts with finding the constructor, because that is where memory for the object is allocated. The constructor's contents also tell us more about the class.

Finding a constructor in Delphi is quite simple: we look for a string containing the class name. Below is an example of the structure we can find for TList:

CODE:0040D598 TList dd offset TList_VTBL
CODE:0040D59C dd 7 dup(0)
CODE:0040D5B8 dd offset aTlist ; "TList"
CODE:0040D5BC SizeOfObject dd 10h
CODE:0040D5C0 dd offset off_4010C8
CODE:0040D5C4 dd offset TObject::SafeCallException
CODE:0040D5C8 dd offset nullsub_8
CODE:0040D5CC dd offset TObject::NewInstance
CODE:0040D5D0 dd offset TObject::FreeInstance
CODE:0040D5D4 dd offset sub_40EA08
CODE:0040D5D8 TList_VTBL dd offset TList::Grow
CODE:0040D5DC dd offset unknown_libname_107
CODE:0040D5E0 aTlist db 5,'TList'
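The class name at the end of this listing is stored as a length-prefixed string (`db 5,'TList'`): one length byte followed by the characters. A minimal Python sketch of decoding such a string is shown below. The function name `decode_short_string` is made up for this illustration, and the bytes are typed in by hand rather than read from a real binary.

```python
def decode_short_string(raw: bytes) -> str:
    """Decode a length-prefixed string: one length byte, then the characters."""
    if not raw:
        raise ValueError("empty buffer")
    length = raw[0]
    body = raw[1:1 + length]
    if len(body) != length:
        raise ValueError("buffer is shorter than the declared length")
    return body.decode("ascii")


# Bytes equivalent to `aTlist db 5,'TList'` from the listing (typed by hand).
name_bytes = bytes([5]) + b"TList"
class_name = decode_short_string(name_bytes)
print(class_name)
```

Searching a binary for such strings is one way to locate candidate class descriptors before following their XREFs.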

We can call this structure an "object descriptor". A pointer into this structure is passed to the constructor, which takes from it the data needed to create the object. Now we can find all the places where the constructor is called by following the XREFs to 40D598. Below is an example of such a call:

CODE:0040E72E mov eax, ds:TList
CODE:0040E733 call CreateClass
CODE:0040E738 mov ds:dword_4A45F8, eax

So far, CreateClass is only our guess at the constructor's name. The body of the function helps us check whether it really is a constructor:

CODE:00402F48 CreateClass proc near ; CODE XREF: @BeginGlobalLoading+17p
CODE:00402F48 ; @CollectionsEqual+48p ...
CODE:00402F48 test dl, dl
CODE:00402F4A jz short loc_402F54
CODE:00402F4C add esp, 0FFFFFFF0h
CODE:00402F4F call __linkproc__ ClassCreate
CODE:00402F54
CODE:00402F54 loc_402F54: ; CODE XREF: CreateClass+2j
CODE:00402F54 test dl, dl
CODE:00402F56 jz short locret_402F62
CODE:00402F58 pop large dword ptr fs:0
CODE:00402F5F add esp, 0Ch
CODE:00402F62
CODE:00402F62 locret_402F62: ; CODE XREF: CreateClass+Ej
CODE:00402F62 retn
CODE:00402F62 CreateClass endp

Thus, if we see a call to __linkproc__ ClassCreate inside, then we can conclude that the function is a constructor. Now let's look at how the object is actually created:

CODE:00403200 __linkproc__ ClassCreate proc near ; CODE XREF: CreateClass+7p
CODE:00403200 ; sub_40AA58+Ap ...
CODE:00403200
CODE:00403200 arg_0 = dword ptr 10h
CODE:00403200
CODE:00403200 push edx
CODE:00403201 push ecx
CODE:00403202 push ebx
CODE:00403203 call dword ptr [eax-0Ch]
CODE:00403206 xor edx, edx
CODE:00403208 lea ecx, [esp+arg_0]
CODE:0040320C mov ebx, fs:[edx]
CODE:0040320F mov [ecx], ebx
CODE:00403211 mov [ecx+8], ebp
CODE:00403214 mov dword ptr [ecx+4], offset loc_403225
CODE:0040321B mov [ecx+0Ch], eax
CODE:0040321E mov fs:[edx], ecx
CODE:00403221 pop ebx
CODE:00403222 pop ecx
CODE:00403223 pop edx
CODE:00403224 retn
CODE:00403224 __linkproc__ ClassCreate endp

So, the command

CODE:0040E72E mov eax, ds:TList

loads into EAX the dword stored at the TList address, that is, the pointer to TList_VTBL. As we are working with Delphi, Borland's __fastcall convention applies (parameters are passed in a specific order: EAX, EDX, ECX, then the stack). Thus, the CreateClass function receives the pointer to the virtual method table as its first parameter. EAX is not changed after that, so the same value reaches __linkproc__ ClassCreate, where we see:

CODE:00403203 call dword ptr [eax-0Ch]

You may wonder where this call goes. The pointer to TList_VTBL (0x40D5D8) is still in EAX, and 0x40D5D8 - 0xC = 0x40D5CC, which holds

CODE:0040D5CC dd offset TObject::NewInstance

Above you can see the ancestor's instance creation method. Thus, TList inherits from TObject. Let's look at it in detail:

CODE:00402F0C TObject::NewInstance proc near ; DATA XREF: CODE:004010FCo
CODE:00402F0C ; CODE:004011DCo ...
CODE:00402F0C push eax
CODE:00402F0D mov eax, [eax-1Ch]
CODE:00402F10 call __linkproc__ GetMem
CODE:00402F15 mov edx, eax
CODE:00402F17 pop eax
CODE:00402F18 jmp TObject::InitInstance
CODE:00402F18 TObject::NewInstance endp

The value in EAX is still the same, so 0x40D5D8 - 0x1C = 0x40D5BC. Thus, the object size stored at 0x40D5BC is passed to GetMem:

CODE:0040D5BC SizeOfObject dd 10h

Thus, the total size of the object instance, including the pointer to the VTBL, is 0x10 bytes.
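The two negative offsets used so far can be checked with a few lines of Python. This is only a sketch of the arithmetic: the `DESCRIPTOR` dictionary is a hand-copied subset of the TList listing, the function pointers are kept as symbolic names, and the helper `slot` is a name invented for this example.

```python
# Hand-copied subset of the TList descriptor listing (address -> value).
# Function pointers are kept as symbolic names for readability.
DESCRIPTOR = {
    0x40D5BC: 0x10,                    # SizeOfObject
    0x40D5CC: "TObject::NewInstance",
    0x40D5D0: "TObject::FreeInstance",
    0x40D5D8: "TList::Grow",           # first VTBL entry
}

VTBL = 0x40D5D8  # value held in EAX throughout the constructor


def slot(vtbl: int, offset: int):
    """Return (address, value) of the descriptor slot at vtbl + offset."""
    address = vtbl + offset
    return address, DESCRIPTOR[address]


# call dword ptr [eax-0Ch]
new_instance_addr, new_instance = slot(VTBL, -0x0C)
# mov eax, [eax-1Ch]
size_addr, instance_size = slot(VTBL, -0x1C)

print(hex(new_instance_addr), new_instance)
print(hex(size_addr), hex(instance_size))
```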

The TObject::InitInstance function does nothing special: it fills the object members with zeros and stores the pointer to the VTBL in the newly created object instance. After that, we return from CreateClass with the pointer to the object instance in EAX. This, then, is how constructors are called:

CODE:0040E72E mov eax, ds:TList
CODE:0040E733 call CreateClass
CODE:0040E738 mov ds:dword_4A45F8, eax
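The effect of TObject::NewInstance followed by TObject::InitInstance can be modelled in Python as "allocate, zero, store the VTBL pointer". The sketch below is a simplified model and not the actual RTL code. It assumes a 32-bit little-endian layout with the VTBL pointer at offset 0, and the function name `new_instance` is chosen only for this example.

```python
import struct


def new_instance(vtbl_address: int, instance_size: int) -> bytearray:
    """Model of NewInstance + InitInstance for a 32-bit object."""
    obj = bytearray(instance_size)                # GetMem, then zero fill
    struct.pack_into("<I", obj, 0, vtbl_address)  # store the VTBL pointer
    return obj


obj = new_instance(0x40D5D8, 0x10)
vtbl_ptr, member1, member2, member3 = struct.unpack("<4I", obj)
print(hex(vtbl_ptr), member1, member2, member3)
```

Right after construction the three remaining dwords are zero, which is why the constructor alone tells us nothing about the members.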

Object structure restoration

We already know that the object size is 0x10. The pointer to the VTBL takes 0x4 bytes, so the remaining 0xC bytes are left for object members, and we have to find them. Objects are not created without a reason, and their members can be filled either in the constructor (fully or partly) or later, by Set-methods. For TList, the instance is simply filled with zeros by rep stosd (in TObject::InitInstance), so the constructor gives us no information about the class members. Therefore, we are going to trace the object's lifecycle after its creation.

In our example, the pointer to the class instance is stored in the global variable dword_4A45F8 (it appears as pTList in the listings below). Thus, we simply set a breakpoint on reads from dword_4A45F8 and look at the object methods being called. The first hit:

CODE:0041319D mov eax, [ebp+var_4]
CODE:004131A0 mov edx, ds:pTList
CODE:004131A6 mov [eax+30h], edx ; copied a pointer to the instance of an object
CODE:004131A9 jmp short loc_4131BD
.............
CODE:004131BD
CODE:004131BD loc_4131BD: ; CODE XREF: sub_4130BC+EDj
CODE:004131BD xor eax, eax
CODE:004131BF push ebp
CODE:004131C0 push offset loc_413276
CODE:004131C5 push dword ptr fs:[eax]
CODE:004131C8 mov fs:[eax], esp
CODE:004131CB mov eax, [ebp+var_4]
CODE:004131CE mov edx, [eax+18h]
CODE:004131D1 mov eax, [ebp+var_4]
CODE:004131D4 mov eax, [eax+30h] ;?implicit passing of a pointer to the object itself?
CODE:004131D7 call Classes::TList::Add(void *)

Now look into Classes::TList::Add:

CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near
CODE:0040EA28 ; CODE XREF: @RegisterClass+9Bp
CODE:0040EA28 ; @RegisterIntegerConsts+20p ...
CODE:0040EA28 push ebx
CODE:0040EA29 push esi
CODE:0040EA2A push edi
CODE:0040EA2B mov edi, edx
CODE:0040EA2D mov ebx, eax ; a kind of This
CODE:0040EA2F mov esi, [ebx+8] ; reads object member №2 (offset 8)
CODE:0040EA32 cmp esi, [ebx+0Ch] ; compares with object member №3 (offset 0Ch)
CODE:0040EA35 jnz short loc_40EA3D
CODE:0040EA37 mov eax, ebx
CODE:0040EA39 mov edx, [eax] ; reads TList->pVTBL
CODE:0040EA3B call dword ptr [edx]
CODE:0040EA3D
CODE:0040EA3D loc_40EA3D: ; CODE XREF: Classes::TList::Add(void *)+Dj
CODE:0040EA3D mov eax, [ebx+4] ; reads object member №1 (offset 4)
CODE:0040EA40 mov [eax+esi*4], edi
CODE:0040EA43 inc dword ptr [ebx+8]
CODE:0040EA46 mov eax, esi
CODE:0040EA48 pop edi
CODE:0040EA49 pop esi
CODE:0040EA4A pop ebx
CODE:0040EA4B retn
CODE:0040EA4B __fastcall Classes::TList::Add(void *) endp

Thus, we have found the remaining 3 members. Each of them is 4 bytes in size, which accounts for the remaining 0xC bytes. While restoring classes in IDA Pro, we will use structures to simplify the work.

After using the structure listed below:

00000000 TList_obj struc ; (sizeof=0X10)
00000000 pVTBL dd ?
00000004 Property1 dd ?
00000008 Property2 dd ?
0000000C Property3 dd ?
00000010 TList_obj ends

everything becomes more understandable:

CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near
CODE:0040EA28 ; CODE XREF: @RegisterClass+9Bp
CODE:0040EA28 ; @RegisterIntegerConsts+20p ...
CODE:0040EA28 push ebx
CODE:0040EA29 push esi
CODE:0040EA2A push edi
CODE:0040EA2B mov edi, edx
CODE:0040EA2D mov ebx, eax
CODE:0040EA2F mov esi, [ebx+TList_obj.Property2]
CODE:0040EA32 cmp esi, [ebx+TList_obj.Property3]
CODE:0040EA35 jnz short loc_40EA3D
CODE:0040EA37 mov eax, ebx
CODE:0040EA39 mov edx, [eax+TList_obj.pVTBL]
CODE:0040EA3B call dword ptr [edx] ;TList::Grow
CODE:0040EA3D
CODE:0040EA3D loc_40EA3D: ; CODE XREF: Classes::TList::Add(void *)+Dj
CODE:0040EA3D mov eax, [ebx+TList_obj.Property1]
CODE:0040EA40 mov [eax+esi*4], edi
CODE:0040EA43 inc [ebx+TList_obj.Property2]
CODE:0040EA46 mov eax, esi
CODE:0040EA48 pop edi
CODE:0040EA49 pop esi
CODE:0040EA4A pop ebx
CODE:0040EA4B retn
CODE:0040EA4B __fastcall Classes::TList::Add(void *) endp

Thus, it is easy to figure out that:

CODE:0040EA3B call dword ptr [edx]

is TList::Grow,

because

CODE:0040D5D8 TList_VTBL dd offset TList::Grow
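The TList_obj structure can also be mirrored outside IDA, for example with Python's ctypes, to double-check the offsets and the total size. This is just a sketch: the class name `TListObj` is made up here, and every field is assumed to be a plain 32-bit value, as in the IDA structure.

```python
import ctypes


class TListObj(ctypes.Structure):
    """Mirror of the IDA structure TList_obj (four 32-bit fields)."""
    _fields_ = [
        ("pVTBL", ctypes.c_uint32),
        ("Property1", ctypes.c_uint32),
        ("Property2", ctypes.c_uint32),
        ("Property3", ctypes.c_uint32),
    ]


# Field name -> byte offset inside the structure.
offsets = {name: getattr(TListObj, name).offset for name, _ in TListObj._fields_}
print(hex(ctypes.sizeof(TListObj)), offsets)
```

The offsets 4, 8 and 0Ch are exactly the displacements used in the `[ebx+...]` operands of Classes::TList::Add.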

Now, let's consider the class members in detail. Take a look at the code below:

CODE:0040EA3D mov eax, [ebx+TList_obj.Property1]
CODE:0040EA40 mov [eax+esi*4], edi
CODE:0040EA43 inc [ebx+TList_obj.Property2]

We see that Property2 counts the elements in the list, as it is incremented each time a new element is added.

Property1 is a pointer to the array of list elements, and Property2 also serves as the index of the next free slot in this array. Property3 is the maximum number of elements the list can currently hold (the TList::Grow method is called only if Property2 == Property3). We figured this out using logic alone. Now that everything is clear, it's time to look in the Delphi Help and give the members their proper names:

CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near
CODE:0040EA28 ; CODE XREF: @RegisterClass+9Bp
CODE:0040EA28 ; @RegisterIntegerConsts+20p ...
CODE:0040EA28 push ebx
CODE:0040EA29 push esi
CODE:0040EA2A push edi
CODE:0040EA2B mov edi, edx
CODE:0040EA2D mov ebx, eax
CODE:0040EA2F mov esi, [ebx+TList_obj.Count]
CODE:0040EA32 cmp esi, [ebx+TList_obj.Capacity]
CODE:0040EA35 jnz short loc_40EA3D
CODE:0040EA37 mov eax, ebx
CODE:0040EA39 mov edx, [eax+TList_obj.pVTBL]
CODE:0040EA3B call dword ptr [edx]
CODE:0040EA3D
CODE:0040EA3D loc_40EA3D: ; CODE XREF: Classes::TList::Add(void *)+Dj
CODE:0040EA3D mov eax, [ebx+TList_obj.Items]
CODE:0040EA40 mov [eax+esi*4], edi
CODE:0040EA43 inc [ebx+TList_obj.Count]
CODE:0040EA46 mov eax, esi
CODE:0040EA48 pop edi
CODE:0040EA49 pop esi
CODE:0040EA4A pop ebx
CODE:0040EA4B retn
CODE:0040EA4B __fastcall Classes::TList::Add(void *) endp
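With the members named, the logic of Classes::TList::Add can be written out in a high-level form. The Python model below follows the disassembly instruction by instruction. The class name `TListModel` is invented for this sketch, and the growth policy in `grow` is a placeholder assumption, because the body of TList::Grow is not shown in this article.

```python
class TListModel:
    """High-level model of the restored TList members and its Add method."""

    def __init__(self):
        self.items = []      # Items: array of element pointers
        self.count = 0       # Count: number of elements stored
        self.capacity = 0    # Capacity: number of allocated slots

    def grow(self):
        # Placeholder policy (an assumption): add 4 slots per call.
        self.items.extend([None] * 4)
        self.capacity += 4

    def add(self, item):
        index = self.count              # mov esi, [ebx+TList_obj.Count]
        if index == self.capacity:      # cmp esi, [ebx+TList_obj.Capacity]
            self.grow()                 # call dword ptr [edx]
        self.items[index] = item        # mov [eax+esi*4], edi
        self.count += 1                 # inc [ebx+TList_obj.Count]
        return index                    # mov eax, esi


lst = TListModel()
indices = [lst.add(value) for value in ("a", "b", "c", "d", "e")]
print(indices, lst.count, lst.capacity)
```

Add returns the index at which the element was stored, which matches the `mov eax, esi` before the function epilogue.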

As we have restored the class structure, we can start working with its methods.

Class methods restoration

In order to restore software components, we should consider the following method types:

  • public, private, or protected;
  • virtual or non-virtual;
  • static.

Unfortunately, we cannot find static methods, as they look like ordinary procedures after compilation, and we cannot tell which class such a function belongs to. There is not much sense in searching for them anyway: if a function is called from the class methods, we'll see it while analyzing the code; otherwise, the search is a waste of time. Virtual functions, however, are easy to find, as all of them are in the VTBL.

Another interesting question is how to find non-virtual methods. According to OOP, when we call an object method, it implicitly receives the pointer to the object as its first parameter. Thus, if the method was declared as __fastcall, the pointer to the object is passed in EAX, while for __cdecl or __stdcall methods it is the first parameter on the stack. And where is the pointer to the object stored? In our example the answer is simple: in dword_4A45F8. We can find a lot of non-virtual methods just by following the XREFs to 4A45F8. After that, we can set a breakpoint on 4A45F8 and trace where the pointer to the instance is copied in order to find out where other method calls take place.
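Where the hidden This pointer lives for each calling convention can be summarised in a small lookup. The Python sketch below assumes that every parameter is a 32-bit integer or pointer. The helper names `parameter_locations` and `this_location` are invented for this example, and `stack[i]` simply means "the i-th parameter passed on the stack" without claiming a specific stack offset.

```python
REGISTER_ORDER = ("EAX", "EDX", "ECX")  # Borland __fastcall register order


def parameter_locations(convention: str, count: int) -> list:
    """Return where each of `count` 32-bit parameters is passed."""
    if convention == "fastcall":
        in_registers = list(REGISTER_ORDER[:count])
        on_stack = max(0, count - len(REGISTER_ORDER))
        return in_registers + [f"stack[{i}]" for i in range(on_stack)]
    if convention in ("cdecl", "stdcall"):
        return [f"stack[{i}]" for i in range(count)]
    raise ValueError(f"unknown calling convention: {convention}")


def this_location(convention: str) -> str:
    """The object pointer is always the first (hidden) parameter."""
    return parameter_locations(convention, 1)[0]


print(this_location("fastcall"), this_location("stdcall"))
print(parameter_locations("fastcall", 5))
```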

As we use a global variable, everything looks rather simple in our example. But what should we do if the pointer is kept in a local variable, or if the code cannot be executed (for example, when researching a driver or code that is not allowed to run)? In this case, we need a specific approach consisting of the following steps:

1) Find all the places where the constructor is called.

For each call:

2) Trace where the pointer to the object instance is written (a local variable).
3) Examine the function that called the constructor to detect all object method calls.
4) If none are found, move on to the next constructor call; otherwise, look at all XREFs to the found method. This way we can also find calls that are made in the constructor. As we know that the first parameter is the pointer to an object, we can go to each XREF and see where else the pointer to the object is used. Then we repeat the described algorithm at every level of the code until we reach a method or a dead end.
5) Review the next found method.
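The steps above amount to a worklist search over cross-references. The Python sketch below illustrates the idea on a hand-written model, not on a real IDA database. The three input tables and the function name `discover_methods` are assumptions made for this example: `passes_object_to` lists the callees that receive the tracked object pointer inside each function, `callers_of` stands in for the XREF list of each method, and `forwards_own_this` marks functions that pass their own first parameter on as the object pointer, which makes them methods themselves.

```python
from collections import deque


def discover_methods(seeds, passes_object_to, callers_of, forwards_own_this):
    """Worklist search for methods, starting from the constructor call sites."""
    found = []
    visited = set()
    queue = deque(seeds)

    def record(method):
        if method not in found:
            found.append(method)
            queue.append(method)                      # look inside the method
            queue.extend(callers_of.get(method, []))  # and at all its XREFs

    while queue:
        func = queue.popleft()
        if func in visited:
            continue
        visited.add(func)
        if func in forwards_own_this:
            record(func)
        for callee in passes_object_to.get(func, []):
            record(callee)
    return found


# Hand-written model loosely based on the listings in this article.
passes_object_to = {
    "sub_4130BC": ["TList::Add", "TList::Get"],
    "TThreadList::Add": ["TList::IndexOf", "TList::Add"],
    "TList::Remove": ["TList::IndexOf", "TList::Delete"],
}
callers_of = {
    "TList::Add": ["sub_4130BC", "TThreadList::Add"],
    "TList::IndexOf": ["TThreadList::Add", "TList::Remove"],
}
forwards_own_this = {"TList::Remove"}

methods = discover_methods(["sub_4130BC"], passes_object_to,
                           callers_of, forwards_own_this)
print(methods)
```

In a real session the three tables would be filled in by hand or by a script while walking the XREFs. The search stops when no new functions are reached, which corresponds to the "dead end" in step 4.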

For example, once we have found the Classes::TList::Add method, one of its XREFs leads us to the TThreadList::Add method:

CODE:0040F020 TThreadList::Add proc near ; CODE XREF: TCanvas::`...'+9Ep
CODE:0040F020 ; Graphics::_16725+C4p
CODE:0040F020
CODE:0040F020 var_4 = dword ptr -4
CODE:0040F020
CODE:0040F020 push ebp
CODE:0040F021 mov ebp, esp
CODE:0040F023 push ecx
CODE:0040F024 push ebx
CODE:0040F025 mov ebx, edx
CODE:0040F027 mov [ebp+var_4], eax
CODE:0040F02A mov eax, [ebp+var_4]
CODE:0040F02D call TThreadList::LockList
CODE:0040F032 xor eax, eax
CODE:0040F034 push ebp
CODE:0040F035 push offset loc_40F073
CODE:0040F03A push dword ptr fs:[eax]
CODE:0040F03D mov fs:[eax], esp
CODE:0040F040 mov eax, [ebp+var_4]
CODE:0040F043 mov eax, [eax+4]
CODE:0040F046 mov edx, ebx
CODE:0040F048 call TList::IndexOf
CODE:0040F04D inc eax
CODE:0040F04E jnz short loc_40F05D
CODE:0040F050 mov eax, [ebp+var_4]
CODE:0040F053 mov eax, [eax+4]
CODE:0040F056 mov edx, ebx
CODE:0040F058 call Classes::TList::Add(void *)

Thus, we have found the TList::IndexOf method.

We can also see that TList is a member of the TThreadList object (the pointer at offset 4). There is nothing more to look at here. We'll assume that there are no more XREFs to Classes::TList::Add, so let's move on to the TList::IndexOf method and work through its XREFs.

One of them brings us here:

CODE:0040EE38 TList::Remove proc near ; CODE XREF: TThreadList::Remove+28p
CODE:0040EE38 ; TCollection::RemoveItem+Bp ...
CODE:0040EE38 push ebx
CODE:0040EE39 push esi
CODE:0040EE3A mov ebx, eax
CODE:0040EE3C mov eax, ebx
CODE:0040EE3E call TList::IndexOf
CODE:0040EE43 mov esi, eax
CODE:0040EE45 cmp esi, 0FFFFFFFFh
CODE:0040EE48 jz short loc_40EE53
CODE:0040EE4A mov edx, esi
CODE:0040EE4C mov eax, ebx
CODE:0040EE4E call TList::Delete
CODE:0040EE53
CODE:0040EE53 loc_40EE53: ; CODE XREF: TList::Remove+10j
CODE:0040EE53 mov eax, esi
CODE:0040EE55 pop esi
CODE:0040EE56 pop ebx
CODE:0040EE57 retn
CODE:0040EE57 TList::Remove endp

Thus, TList::Delete and TList::Remove were successfully found.

And so on for all XREFs and variables containing pointers to the class instance.

Below you can find an example of looking through the variable:

CODE:0041319D mov eax, [ebp+var_4]
CODE:004131A0 mov edx, ds:pTList
CODE:004131A6 mov [eax+30h], edx ;a pointer to the instance of an object is being copied
CODE:004131A9 jmp short loc_4131BD

Further down, we see:

CODE:00413236 mov eax, [eax+30h]
CODE:00413239 mov edx, [ebp+var_10]
CODE:0041323C call TList::Get

How can we tell public methods from private ones? We can start working on that only when all methods have been found. Private methods can be called only from inside other methods of the same object, so we should examine the XREFs to each method.
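This XREF check is easy to automate once the methods are named. The Python sketch below marks a method as a candidate for private when every known caller belongs to the same class. The function name `private_candidates` and the XREF table are hypothetical, and the result is only a hint: a public method that happens to be called only internally looks exactly the same.

```python
def private_candidates(callers_of: dict, class_prefix: str) -> list:
    """Methods whose known callers all belong to the same class."""
    result = []
    for method, callers in callers_of.items():
        # Methods without any known caller are skipped: nothing can be said.
        if callers and all(c.startswith(class_prefix) for c in callers):
            result.append(method)
    return sorted(result)


# Hypothetical XREF table: method -> functions that call it.
callers_of = {
    "Object1::Method1": ["sub_401000", "Object2::Method1"],
    "Object1::Method2": ["Object1::Method1"],
    "Object1::Method3": ["Object1::Method1", "Object1::Method2"],
    "Object1::Method4": [],
}

candidates = private_candidates(callers_of, "Object1::")
print(candidates)
```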

While searching for methods, we advise numbering them. As soon as you find a method, name it Object1::Method1, Object1::Method2, and so on. After detecting all the methods, you can start restoring the number and types of their arguments.

Determining the number of method arguments

For __cdecl and __stdcall, there is little to pay attention to. You simply need to know how many arguments IDA Pro detected and subtract the first one (the first one is a pointer to the object instance, and the others are method arguments). The situation with __fastcall is more complicated. First, we need to remember the order of arguments: EAX, EDX, ECX, stack.

The analysis begins by checking how many stack arguments IDA has detected. If there is at least one, we add 3 to that number (for the three arguments passed in registers). As the first argument is reserved for This, we need to subtract one. Thus, we get the total number of method arguments.

If there are no stack arguments, we should look at the beginning of the function. In Delphi, a __fastcall function typically begins by copying the EAX, EDX, and ECX registers like this:

mov esi, edx ; first parameter
mov ebx, eax ; pThis
mov edi, ecx ; second parameter

According to the number of registers being copied, we can define the number of arguments. For example:

CODE:0040EBE0 TList::Get proc near ; CODE XREF: @GetClass+1Dp
CODE:0040EBE0 ; @UnRegisterModuleClasses+24p ...
CODE:0040EBE0
CODE:0040EBE0 var_4 = dword ptr -4
CODE:0040EBE0
CODE:0040EBE0 push ebp
CODE:0040EBE1 mov ebp, esp
CODE:0040EBE3 push 0
CODE:0040EBE5 push ebx
CODE:0040EBE6 push esi
CODE:0040EBE7 mov esi, edx
CODE:0040EBE9 mov ebx, eax
CODE:0040EBEB xor eax, eax

Two registers are copied (EAX and EDX), so there are two arguments, and one of them is pThis. Thus, TList::Get has one argument. Another example:

CODE:004198CC push ebp
CODE:004198CD mov ebp, esp
CODE:004198CF add esp, 0FFFFFF8Ch
CODE:004198D2 push ebx
CODE:004198D3 push esi
CODE:004198D4 push edi
CODE:004198D5 mov [ebp+var_C], ecx
CODE:004198D8 mov [ebp+var_8], edx
CODE:004198DB mov [ebp+var_4], eax

There are three arguments, and one of them is pThis, so the method has two arguments.
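The counting rule for __fastcall methods can be condensed into a few lines of Python. This is a sketch of the rule as described in this section, not an IDA script. The function name `method_arg_count` and its two inputs (the number of stack arguments detected by IDA and the number of registers among EAX, EDX and ECX that the prologue copies) are names chosen for this example.

```python
def method_arg_count(stack_args: int, registers_copied: int) -> int:
    """Number of declared method arguments, excluding the hidden This."""
    if not 0 <= registers_copied <= 3:
        raise ValueError("registers_copied must be between 0 and 3")
    if stack_args > 0:
        total = stack_args + 3     # all three registers must be in use
    else:
        total = registers_copied   # count the registers saved in the prologue
    return max(total - 1, 0)       # subtract This


print(method_arg_count(0, 2))  # TList::Get: EAX and EDX are copied
print(method_arg_count(0, 3))  # second example: EAX, EDX and ECX are copied
print(method_arg_count(2, 3))  # a method with two stack arguments
```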

Note that this gives the number of arguments of the original method as declared in Delphi. In IDA, naturally, when declaring the function type we should list all the arguments, including This.

Try to determine the types of arguments on your own :)
