Here is the first article on our site. It was written by our reverse engineer and is, in effect, a lesson for reverse engineers.
Restoring classes is a complicated procedure that requires knowledge of OOP
and of the way a specific compiler implements it. Our task is to recover the
class, its methods and its members. Let's begin with Delphi, because finding a
class there is relatively easy.
Class restoration starts with finding the constructor, because that is where
the memory for the object is allocated; the constructor can also give us some
insight into what the class consists of.
It's easy to find a constructor in Delphi: we just need to search for a string
containing the class name. For example, for TList the following structure can be
found:
This is, so to speak, an "object descriptor". A pointer to it is passed to the
constructor, which takes from it the data required to create the object. Using
XREFs to 40D598 we can find all the places where the constructor is called. Here
is an example of one such call:
The call sequence loads into EAX the address of TList, i.e. TList_VTBL. Since
this is Delphi, Borland's __fastcall convention is used (parameters are passed
in the order EAX, EDX, ECX, stack...). This means that the pointer to the
virtual method table is passed to the function CreateClass as its first
parameter. EAX is not changed afterwards and is passed on to
__linkproc__ClassCreate, where we see:
CODE:00403203 call dword ptr [eax-0Ch]
Where does this call go? EAX still holds the pointer to TList_VTBL = 0x40D5D8.
0x40D5D8 - 0xC = 0x40D5CC, and at that address we find:
CODE:0040D5CC dd offset TObject::NewInstance
This is the ancestor's (TObject's) NewInstance, which allocates the object for
the constructor. So TList inherits from TObject. Let's look deeper:
EAX still holds the same value, so 0x40D5D8 - 0x1C = 0x40D5BC.
Thus the object size, which is stored at 0x40D5BC, is passed to GetMem:
CODE:0040D5BC SizeOfObject dd 10h
So the total size of the object instance is 0x10 bytes.
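Both lookups are plain pointer arithmetic relative to the VTBL address. As a sketch, here they are replayed in Python against a tiny stand-in for the IDA database: a dict from address to value that holds only the two entries quoted in the listings. The offsets 0xC and 0x1C are the ones observed in this binary; other Delphi or C++Builder versions may lay out the data in front of the VTBL differently, so treat them as parameters rather than constants. The names `database` and `vmt_field` are our own.

```python
TLIST_VTBL = 0x40D5D8

# Offsets below the VTBL pointer, as observed in this particular binary.
NEW_INSTANCE_OFFSET = 0xC    # slot reached by: call dword ptr [eax-0Ch]
INSTANCE_SIZE_OFFSET = 0x1C  # dword that is passed to GetMem

# Minimal stand-in for the database: only the two entries from the listings.
database = {
    0x40D5CC: "TObject::NewInstance",
    0x40D5BC: 0x10,
}


def vmt_field(db: dict, vtbl: int, offset: int):
    """Return (address, value) of the field stored at vtbl - offset."""
    address = vtbl - offset
    return address, db[address]


addr, target = vmt_field(database, TLIST_VTBL, NEW_INSTANCE_OFFSET)
print(hex(addr), target)     # 0x40d5cc TObject::NewInstance
addr, size = vmt_field(database, TLIST_VTBL, INSTANCE_SIZE_OFFSET)
print(hex(addr), hex(size))  # 0x40d5bc 0x10
```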
The function TObject::InitInstance does nothing special: it just fills the
object's members with zeros and stores the pointer to the VTBL in the newly
created instance. Then CreateClass returns, and the pointer to the new instance
is returned in EAX. That is why constructor calls look like this:
We already know the object size: it is 0x10, of which 0x4 bytes are taken by
the pointer to the VTBL. The remaining 0xC bytes hold the object's members, so
we need to find them. Here some intuition is required. First of all, objects are
not created for no reason, and members can be filled either in the constructor
(fully or partly) or after creation by Set methods. Our TList is filled with
zeros in the constructor via rep stosd (in TObject::InitInstance), so the
constructor gives us no information about the class members. Therefore let's
trace the object's life cycle after its creation.
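Why the constructor tells us nothing about the members becomes obvious once we model what InitInstance leaves behind. This Python sketch is a simplification that ignores GetMem and the rest of CreateClass: it allocates the 0x10-byte instance, zero-fills it the way rep stosd does, and stores the VTBL pointer in the first dword. A little-endian 32-bit layout is assumed, as on x86; the function name `init_instance` is our own.

```python
import struct


def init_instance(size: int, vtbl: int) -> bytearray:
    """Simplified model of TObject::InitInstance for a 32-bit x86 target."""
    instance = bytearray(size)                  # zero-filled, standing in for rep stosd
    struct.pack_into("<I", instance, 0, vtbl)   # first dword: pointer to the VTBL
    return instance


obj = init_instance(0x10, 0x40D5D8)
members = struct.unpack_from("<3I", obj, 4)     # the remaining 0xC bytes
print(members)  # (0, 0, 0): nothing here tells us what the members mean
```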
In our example the pointer to the instance of the class is stored in the
global variable dword_4A45F8. So we can simply set a breakpoint on reads from
dword_4A45F8 and watch how the object's methods are called. The first hit:
CODE:0041319D mov eax, [ebp+var_4]
CODE:004131A0 mov edx, ds:pTList
CODE:004131A6 mov [eax+30h], edx ; copied a pointer to the instance of an object
CODE:004131A9 jmp short loc_4131BD
.............
CODE:004131BD
CODE:004131BD loc_4131BD: ; CODE XREF: sub_4130BC+EDj
CODE:004131BD xor eax, eax
CODE:004131BF push ebp
CODE:004131C0 push offset loc_413276
CODE:004131C5 push dword ptr fs:[eax]
CODE:004131C8 mov fs:[eax], esp
CODE:004131CB mov eax, [ebp+var_4]
CODE:004131CE mov edx, [eax+18h]
CODE:004131D1 mov eax, [ebp+var_4]
CODE:004131D4 mov eax, [eax+30h] ; "implicit" passing of a pointer to the object itself
CODE:004131D7 call Classes::TList::Add(void *)
Now look into Classes::TList::Add:
CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near
CODE:0040EA28 ; CODE XREF: @RegisterClass+9Bp
CODE:0040EA28 ; @RegisterIntegerConsts+20p ...
CODE:0040EA28 push ebx
CODE:0040EA29 push esi
CODE:0040EA2A push edi
CODE:0040EA2B mov edi, edx
CODE:0040EA2D mov ebx, eax ; a kind of This
CODE:0040EA2F mov esi, [ebx+8] ; accessing the object member at +8
CODE:0040EA32 cmp esi, [ebx+0Ch] ; accessing the object member at +0Ch
CODE:0040EA35 jnz short loc_40EA3D
CODE:0040EA37 mov eax, ebx
CODE:0040EA39 mov edx, [eax] ; accessing TList->pVTBL
CODE:0040EA3B call dword ptr [edx] ; first entry of the VTBL
CODE:0040EA3D
CODE:0040EA3D loc_40EA3D: ; CODE XREF: Classes::TList::Add(void *)+Dj
CODE:0040EA3D mov eax, [ebx+4] ; accessing the object member at +4
CODE:0040EA40 mov [eax+esi*4], edi
CODE:0040EA43 inc dword ptr [ebx+8]
CODE:0040EA46 mov eax, esi
CODE:0040EA48 pop edi
CODE:0040EA49 pop esi
CODE:0040EA4A pop ebx
CODE:0040EA4B retn
CODE:0040EA4B __fastcall Classes::TList::Add(void *) endp
So all 3 remaining members have been found, each of them 4 bytes in size. To
simplify working with classes in IDA Pro we use structures; classes are just
structures anyway :)
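In IDA this layout is declared in the structure window. As a rough equivalent, here is the same 0x10-byte layout written as a Python ctypes structure. The field names Property1 to Property3 are the placeholders used in the text, while `pVTBL` and `TListRaw` are our own names. Every field is `c_uint32`, so pointers stay 4 bytes wide, as on 32-bit Delphi, even when the script runs on a 64-bit Python.

```python
import ctypes


class TListRaw(ctypes.Structure):
    # All fields are 32-bit, so no padding is inserted and offsets match the target.
    _fields_ = [
        ("pVTBL", ctypes.c_uint32),      # +0x0  written by InitInstance
        ("Property1", ctypes.c_uint32),  # +0x4  read as [ebx+4] in Add
        ("Property2", ctypes.c_uint32),  # +0x8  read as [ebx+8], incremented in Add
        ("Property3", ctypes.c_uint32),  # +0xC  compared as [ebx+0Ch] in Add
    ]


print(hex(ctypes.sizeof(TListRaw)))  # 0x10, the same as SizeOfObject
for name, _ in TListRaw._fields_:
    print(name, hex(getattr(TListRaw, name).offset))
```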
From the code of Add we can say that Property2 is the counter of list
elements, because it is incremented each time an element is added. Property1 is
a pointer to the array of list elements, and Property2 is used as an index into
this array. Property3 is the maximum number of elements in the list (its
capacity), since the method TList::Grow is called exactly when Property2 ==
Property3. We worked all this out by pure logic. Now that everything is clear,
we can consult the Help and name the members after the documented TList
properties: Property1 is List, Property2 is Count and Property3 is Capacity.
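With the members named, the logic of Classes::TList::Add can be restated in Python. This is a behavioral sketch read off the disassembly, not the VCL source. The class name `TListModel` is ours, and the fields are named after the documented TList properties List, Count and Capacity (Property1, Property2 and Property3 in the text). A Python list stands in for the pointer array, and the growth step in `grow()` is a placeholder assumption, because the listing only shows that the first VTBL slot (Grow) is called when Count reaches Capacity.

```python
class TListModel:
    """Behavioral model of TList as recovered from the Add listing."""

    GROW_STEP = 4  # placeholder: the real growth policy is not visible in Add

    def __init__(self):
        self.List = []      # Property1, [ebx+4]: array of item pointers
        self.Count = 0      # Property2, [ebx+8]: number of items
        self.Capacity = 0   # Property3, [ebx+0Ch]: allocated slots
        self.grow_calls = 0

    def grow(self):
        # Virtual method in VTBL slot 0, reached via call dword ptr [edx].
        self.grow_calls += 1
        self.Capacity += self.GROW_STEP
        self.List.extend([None] * self.GROW_STEP)

    def add(self, item):
        index = self.Count              # mov esi, [ebx+8]
        if index == self.Capacity:      # cmp esi, [ebx+0Ch] / jnz
            self.grow()
        self.List[index] = item         # mov eax, [ebx+4] / mov [eax+esi*4], edi
        self.Count += 1                 # inc dword ptr [ebx+8]
        return index                    # mov eax, esi


lst = TListModel()
indices = [lst.add(x) for x in "abcde"]
print(indices, lst.Count, lst.Capacity, lst.grow_calls)  # [0, 1, 2, 3, 4] 5 8 2
```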
We have restored the structure; now let's look at the class methods.
Looking for the class methods
Methods can be public or private (protected), virtual or non-virtual, and
static. Static methods can't be found, because after compilation they look like
ordinary procedures, and it is impossible to tell which class such a function
belongs to. But does such a search make sense at all? If the function is called
somewhere in the class methods, it will be examined anyway while that code is
being analyzed; otherwise, searching for it is a waste of time.
Virtual functions are easy to find too: they are all in the VTBL.
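Because the virtual methods sit one after another in the VTBL, listing them amounts to reading consecutive dwords from the VTBL address until an entry no longer points to the start of a function. Below is a sketch of that heuristic over a dict-based stand-in for the database; the stopping rule, the set of function starts and every address except 0x40D5D8 are illustrative assumptions. In this binary TObject's own virtual methods, such as NewInstance, sit at negative offsets (VTBL-0xC), so a forward walk only sees the slots from 0 upward, starting with the one called in TList::Add.

```python
def walk_vtbl(memory: dict, vtbl: int, function_starts: set, max_slots: int = 256) -> list:
    """Collect the method addresses stored at vtbl, vtbl+4, vtbl+8, ...

    Stops at the first dword that is missing or does not point to a known
    function start (a heuristic: data after the VTBL usually does not).
    """
    methods = []
    for slot in range(max_slots):
        value = memory.get(vtbl + 4 * slot)
        if value not in function_starts:
            break
        methods.append(value)
    return methods


# Made-up example: two code pointers followed by a non-code dword.
funcs = {0x40EA50, 0x40EB10}
mem = {0x40D5D8: 0x40EA50, 0x40D5DC: 0x40EB10, 0x40D5E0: 0x7}
print([hex(a) for a in walk_vtbl(mem, 0x40D5D8, funcs)])  # ['0x40ea50', '0x40eb10']
```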
But how should we look for non-virtual ones? Let's recall OOP: when object
methods are called, the pointer to the object itself is implicitly passed to
them. In effect, each method takes the pointer to the object as its first
parameter. So if the method is declared __fastcall, the pointer to the object is
passed in EAX, while for __cdecl or __stdcall methods it is the first parameter
on the stack. Now, where is the pointer to our object stored? Exactly: in
dword_4A45F8. XREFs to 4A45F8 lead us to many non-virtual methods. We can also
set a breakpoint on 4A45F8 and trace where the pointer to the instance is
copied, to find other places where methods are called. Everything is easy in our
example because a global variable is used. But what should we do if a local
variable is used, or if the code can't be executed (for example, when we
research a driver's code or are not allowed to run the code)? Here we need a
specific method.
Step by step:
1) Find all the places where the constructor is called.
For each call:
2) Trace where the pointer to the object instance is written (a local
variable).
3) Look through the function that called the constructor for all calls of the
object's methods.
4) If there are no such calls, move on to the next constructor call.
Otherwise, look at all XREFs to the method that has been found. This way we can
find calls that are not located near the constructor. Since we know that the
first parameter is the pointer to the object, we can go to each XREF and see
where else that pointer was used. In this way we climb up through the levels of
the code until we reach a dead end or a method that has already been found.
5) Move on to the next method that has been found.
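The five steps form a worklist algorithm. Below is a minimal sketch over three hypothetical lookup tables that, in practice, would come from reading the code (steps 2 and 3) and from IDA's XREFs: which functions call the constructor, which object methods each function calls with the tracked pointer, and which functions call each method. Whenever a new method is found, both the method itself (where the pointer is This) and its callers are queued. This mirrors the TList::Remove example below, where examining a method reveals further methods. All table contents and names are made up for illustration.

```python
from collections import deque


def discover_methods(ctor_callers, method_calls_in, xrefs_to):
    """Worklist version of steps 1-5; returns methods in discovery order.

    ctor_callers:    functions that call the constructor (step 1)
    method_calls_in: function -> object methods it calls with the pointer (steps 2-3)
    xrefs_to:        method -> functions that call it (step 4)
    """
    found = []
    seen_methods = set()
    visited = set()
    queue = deque(ctor_callers)
    while queue:
        func = queue.popleft()
        if func in visited:
            continue                                # already examined: dead end
        visited.add(func)
        for method in method_calls_in.get(func, []):
            if method in seen_methods:
                continue                            # method already found
            seen_methods.add(method)
            found.append(method)
            queue.append(method)                    # inside the method the pointer is This
            queue.extend(xrefs_to.get(method, []))  # other places the pointer is used
    return found


# Hypothetical tables, not taken from the binary.
ctor_callers = ["f1"]
method_calls_in = {"f1": ["M1"], "f2": ["M1", "M2"], "M2": ["M3"], "f3": ["M3", "M4"]}
xrefs_to = {"M1": ["f1", "f2"], "M2": ["f2"], "M3": ["M2", "f3"]}
print(discover_methods(ctor_callers, method_calls_in, xrefs_to))  # ['M1', 'M2', 'M3', 'M4']
```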
For example, suppose we have found the Classes::TList::Add method. Following
one of its XREFs, we find a call to Classes::TList::Add here:
We see that we are inside a method of the TThreadList object and TList is one
of its members, so there is nothing to look at here. Let's assume there are no
more XREFs to Classes::TList::Add. We go to the TList::IndexOf method and look
at its XREFs. One of them leads us here:
CODE:0040EE38 TList::Remove proc near ; CODE XREF: TThreadList::Remove+28p
CODE:0040EE38 ; TCollection::RemoveItem+Bp ...
CODE:0040EE38 push ebx
CODE:0040EE39 push esi
CODE:0040EE3A mov ebx, eax
CODE:0040EE3C mov eax, ebx
CODE:0040EE3E call TList::IndexOf
CODE:0040EE43 mov esi, eax
CODE:0040EE45 cmp esi, 0FFFFFFFFh
CODE:0040EE48 jz short loc_40EE53
CODE:0040EE4A mov edx, esi
CODE:0040EE4C mov eax, ebx
CODE:0040EE4E call TList::Delete
CODE:0040EE53
CODE:0040EE53 loc_40EE53: ; CODE XREF: TList::Remove+10j
CODE:0040EE53 mov eax, esi
CODE:0040EE55 pop esi
CODE:0040EE56 pop ebx
CODE:0040EE57 retn
CODE:0040EE57 TList::Remove endp
So TList::Delete and TList::Remove have been found as well.
And so on for all XREFs and all variables that contain a pointer to the
instance of the class.
Here is an example of following such a variable:
CODE:0041319D mov eax, [ebp+var_4]
CODE:004131A0 mov edx, ds:pTList
CODE:004131A6 mov [eax+30h], edx ; a pointer to the instance of an object is being copied
CODE:004131A9 jmp short loc_4131BD
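Following a variable like this can be mechanized in a naive way: walk the instructions in order, remember which registers and memory operands currently hold a copy of the pointer, and report every call made while EAX holds it (the __fastcall case). The sketch works on instructions already split into (mnemonic, destination, source) tuples and compares operands as plain text, so it ignores changes of a base register and all control flow. It is an illustration, not a real data-flow analysis. The function and constant names are our own; `ds:pTList` is how the listing refers to the global holding the pointer.

```python
CLOBBERED_BY_CALL = {"eax", "edx", "ecx"}   # scratch registers in Borland __fastcall
WRITES_DESTINATION = {"xor", "lea", "add", "sub", "and", "or"}


def calls_on_object(instructions, pointer_source: str) -> list:
    """Return call targets reached while EAX holds the tracked object pointer."""
    holders = set()   # operands (as text) currently holding the pointer
    calls = []
    for mnemonic, dst, src in instructions:
        if mnemonic == "mov":
            if src == pointer_source or src in holders:
                holders.add(dst)          # pointer copied into dst
            else:
                holders.discard(dst)      # dst overwritten with something else
        elif mnemonic == "call":
            if "eax" in holders:
                calls.append(dst)         # pointer passed as the first argument
            holders -= CLOBBERED_BY_CALL
        elif mnemonic in WRITES_DESTINATION:
            holders.discard(dst)
    return calls


# Instructions from the listings above (SEH setup trimmed to one line).
listing = [
    ("mov", "eax", "[ebp+var_4]"),
    ("mov", "edx", "ds:pTList"),
    ("mov", "[eax+30h]", "edx"),
    ("xor", "eax", "eax"),
    ("mov", "eax", "[ebp+var_4]"),
    ("mov", "edx", "[eax+18h]"),
    ("mov", "eax", "[ebp+var_4]"),
    ("mov", "eax", "[eax+30h]"),
    ("call", "Classes::TList::Add(void *)", None),
]
print(calls_on_object(listing, "ds:pTList"))  # ['Classes::TList::Add(void *)']
```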
How can we tell public methods from private ones? We can try to do that only
once the whole set of methods has been found. Private methods are called only
from inside other methods of the object, so we should look at the XREFs.
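Once the full set of methods is known, the XREF test can be written down directly. A method whose callers are all methods of the same class is a candidate for private (or protected). A method called from outside the class is public. A method with no XREFs at all stays undecided, since it may be virtual or called indirectly. This is a hedged sketch with hypothetical inputs. Keep in mind that a private member of a Delphi class is visible to the whole unit, and a protected one to descendants, so the result is only a list of candidates.

```python
def classify_methods(methods: set, callers_of: dict) -> dict:
    """Label each method 'public', 'private?' or 'unknown' from its XREFs."""
    result = {}
    for method in sorted(methods):
        callers = set(callers_of.get(method, ()))
        if not callers:
            result[method] = "unknown"    # no XREFs: virtual or indirect call?
        elif callers <= methods:
            result[method] = "private?"   # called only from the class's own methods
        else:
            result[method] = "public"     # called from outside the class
    return result


# Hypothetical example.
methods = {"Object1::Method1", "Object1::Method2", "Object1::Method3"}
callers_of = {
    "Object1::Method1": ["sub_4130BC"],
    "Object1::Method2": ["Object1::Method1"],
}
print(classify_methods(methods, callers_of))
```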
While looking for methods, we advise numbering them first: as you find each
method, name it Object1::Method1, Object1::Method2 and so on, and only when all
the methods have been found begin restoring the types and number of their
arguments.
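The numbering convention is easy to automate: keep the methods in the order they were discovered and derive the placeholder names from that order. This is a sketch; the usage line reuses the start addresses of TList::Add and TList::Remove from the listings. Applying the names inside IDA (for example with IDAPython) is deliberately left out.

```python
def placeholder_names(discovered, object_name="Object1"):
    """Map each method address to Object1::Method1, Object1::Method2, ... in discovery order."""
    names = {}
    for address in discovered:
        if address not in names:   # a method found twice keeps its first number
            names[address] = f"{object_name}::Method{len(names) + 1}"
    return names


names = placeholder_names([0x40EA28, 0x40EE38, 0x40EA28])
for address, name in names.items():
    print(hex(address), name)
```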
Determining the number of method arguments
For __cdecl and __stdcall there is little to say: just look at how many
arguments IDA has found and subtract 1 (that 1 is the pointer to the object
instance; the others are the method's arguments). __fastcall is more
complicated. First, we need to remember the order in which arguments are
passed: EAX, EDX, ECX, stack.
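To make that order concrete, here is a minimal Python sketch of where the n-th argument of a Borland __fastcall function is passed. It models only the EAX, EDX, ECX, stack order stated above; the exact position of each stack argument relative to ESP is deliberately not modeled, and the function name `fastcall_arg_location` is our own. For a method, index 0 is This.

```python
REGISTERS = ("EAX", "EDX", "ECX")   # Borland __fastcall register order


def fastcall_arg_location(index: int) -> str:
    """Return where the argument with 0-based `index` is passed.

    Index 0 is the first parameter, i.e. This for a method. Stack
    arguments are only numbered; their ESP offsets are not modeled.
    """
    if index < 0:
        raise ValueError("argument index must be non-negative")
    if index < len(REGISTERS):
        return REGISTERS[index]
    return f"stack argument #{index - len(REGISTERS) + 1}"


for i in range(5):
    print(i, fastcall_arg_location(i))
```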
The analysis begins with the number of arguments passed via the stack that IDA
has counted. If there is at least one, we add 3 to it (the 3 registers plus the
stack arguments). Since the first argument is reserved for This, we subtract 1.
The result is the net number of the method's arguments.
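The counting rules fit in a few lines of Python. The convention strings are plain labels chosen for this sketch. The function covers only the cases described so far: __cdecl/__stdcall, and __fastcall when IDA reports at least one stack argument.

```python
def method_arg_count(convention: str, stack_args: int) -> int:
    """Number of the method's own arguments (This excluded), per the rules above."""
    if convention in ("__cdecl", "__stdcall"):
        if stack_args < 1:
            raise ValueError("a method always receives This")
        return stack_args - 1              # drop This
    if convention == "__fastcall":
        if stack_args < 1:
            raise ValueError("no stack arguments: inspect the register copies instead")
        return stack_args + 3 - 1          # EAX, EDX, ECX plus stack, minus This
    raise ValueError(f"unsupported convention: {convention}")


print(method_arg_count("__stdcall", 3))   # 2
print(method_arg_count("__fastcall", 1))  # 3
```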
If there are no stack arguments, we look at the beginning of the function.
Delphi tries to preserve argument values, so a __fastcall function typically
begins by copying EAX, EDX and ECX into other registers, like this:
mov esi, edx ; first parameter
mov ebx, eax ; pThis
mov edi, ecx ; second parameter
Depending on how many registers are copied, one can conclude how many
arguments there are. In the fragment above, for example, 3 registers are
copied; 1 of them holds pThis, so the method takes 2 arguments.
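The register-copy heuristic can be sketched the same way. Scan the first instructions of the function for `mov reg, eax/edx/ecx` and see which argument registers are saved. Since arguments are assigned in the order EAX, EDX, ECX, the highest saved register tells how many register arguments there are. The text parsing and the size of the scanned window are assumptions of this sketch. A register that is passed on without being copied is missed; TList::Remove above, for instance, appears to hand EDX straight to TList::IndexOf.

```python
import re

ARG_REGISTERS = ("eax", "edx", "ecx")   # __fastcall order
MOV_FROM_ARG = re.compile(r"^\s*mov\s+(\w+)\s*,\s*(eax|edx|ecx)\b", re.IGNORECASE)


def args_from_prologue(prologue_lines, window=6):
    """Estimate the method's own argument count (This excluded), or None if unknown."""
    saved = set()
    for line in prologue_lines[:window]:
        code = line.split(";", 1)[0]        # strip IDA comments
        match = MOV_FROM_ARG.match(code)
        if match:
            saved.add(match.group(2).lower())
    if not saved:
        return None                         # nothing copied: cannot tell
    used = max(ARG_REGISTERS.index(r) for r in saved) + 1
    return used - 1                         # minus This in EAX


prologue = [
    "mov esi, edx ; first parameter",
    "mov ebx, eax ; pThis",
    "mov edi, ecx ; second parameter",
]
add_prologue = [  # start of Classes::TList::Add(void *)
    "push ebx", "push esi", "push edi",
    "mov edi, edx",
    "mov ebx, eax ; a kind of This",
    "mov esi, [ebx+8]",
]
print(args_from_prologue(prologue))      # 2
print(args_from_prologue(add_prologue))  # 1
```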
Keep in mind that this is the number of arguments of the original method as
declared in Delphi; when declaring the function type in IDA, naturally, we must
list all the arguments including This.