RISC processors are used in many small devices such as PDAs, mobile phones, smart coffee machines and so on. There is a wide variety of RISC architectures, but the most common one today is ARM. I am going to talk about ARM7, since that is the one I have worked with.
Disassembling and The Analysis of ARM Processors
Let's begin with the ARM architecture. An ARM processor has a total of 37
registers: 31 general-purpose 32-bit registers and 6 status registers. The set
of available registers depends on the processor state. In ARM state the
processor executes 32-bit instructions, in Thumb state 16-bit ones.
In ARM state 18 registers are available: the directly accessible R0-R15, CPSR
(Current Program Status Register) and SPSR (Saved Program Status Register).
Three of the directly accessible registers have special purposes:
(R13) SP - stack pointer;
(R14) LR - link register, a special register that holds the return address
when a procedure is called. I.e. the return address is not pushed onto the
stack automatically - it just stays in the register.
(R15) PC - a pointer to the current instruction. It can be written with an
ordinary MOV, which changes the address of the next instruction to be executed.
In Thumb state 13 registers are available: R0-R7, R13-R15, CPSR, SPSR. Switching between the states doesn't change the contents of the registers. Entry into Thumb state can be achieved by executing a BX instruction with the
state bit (bit 0) set in the operand register. Entry into ARM state can be
achieved by executing a BX instruction with the state bit (bit 0) clear in the
operand register. The instruction sets of the two states differ, but many instructions are similar.
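The state switch can be sketched in a few lines of Python. This is only an illustration, assuming that bit 0 of the BX operand register selects the state (set for Thumb, clear for ARM) and that the remaining bits give the branch address; the function name bx_target is hypothetical.

```python
def bx_target(operand):
    """Return (branch address, state) for a BX with the given register value.

    Assumption: bit 0 of the operand is the state bit, the rest is the address.
    """
    state = "Thumb" if operand & 1 else "ARM"
    address = operand & 0xFFFFFFFE  # drop the state bit
    return address, state


print(bx_target(0x8001))
print(bx_target(0x8000))
```

When reading a disassembly, an odd value loaded into a register just before BX is therefore a good hint that the target is Thumb code.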
Thumb instructions are 2 bytes long, ARM instructions are 4 bytes long. A
description of the Thumb and ARM instruction sets can be found
here.
It is especially interesting that many instructions operate on several
registers at once. For example:
ADD R3, SP, #4
That maps to
R3:=SP+4
Or, for example, an instruction that stores registers on the stack:
PUSH {R2-R4, R7, LR}
This is not an analogue of pushad in x86 assembly. The ARM assembler simply
lets you push an arbitrary list of registers onto the stack in this way.
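To make the register list concrete, here is a small Python sketch that decodes a 16-bit Thumb PUSH opcode. The encoding (1011 010R followed by an 8-bit register mask, where R adds LR) comes from the Thumb instruction set rather than from this article, and the opcode 0xB59C is my own encoding of the PUSH shown above, so treat both as assumptions.

```python
def decode_thumb_push(opcode):
    """Return the register names stored by a 16-bit Thumb PUSH, or None."""
    # PUSH is encoded as 1011 010R rrrrrrrr:
    # bits 0-7 select R0-R7, bit 8 (R) adds LR to the list.
    if opcode & 0xFE00 != 0xB400:
        return None
    regs = ["R%d" % i for i in range(8) if opcode & (1 << i)]
    if opcode & 0x100:
        regs.append("LR")
    return regs


regs = decode_thumb_push(0xB59C)  # assumed encoding of PUSH {R2-R4, R7, LR}
print(regs, "uses", 4 * len(regs), "bytes of stack")
```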
Data in memory can be either little endian (as on Intel) or big endian
(as on Motorola). So, before analyzing code, you need to determine which
byte order the image uses.
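The difference is easy to see with Python's struct module. The four bytes below are an arbitrary example chosen for illustration; they are not taken from a real image.

```python
import struct

# Four example bytes as they might appear in a dump.
raw = bytes([0x70, 0x41, 0x97, 0x00])

little = struct.unpack("<I", raw)[0]  # little endian: lowest byte first
big = struct.unpack(">I", raw)[0]     # big endian: highest byte first

print(hex(little), hex(big))
```

If pointers read one way land inside the expected ROM or RAM ranges and pointers read the other way do not, that usually settles the question.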
There are plenty of tools for developing programs for ARM:
http://heanet.dl.sourceforge.net/sourceforge/gnude/gnude-arm-win.exe - the GNU
compiler, with everything that implies: everything is done from the command
line, and debugging goes through gdb.
http://www.goldroad.co.uk/grARM.html - a simple ARM assembler.
http://www.arm.com/support/downloads/index.html - the official ARM development
tools. They can only be purchased.
http://www.iar.com/ - an alternative development environment for ARM. A 30-day
trial version is offered.
Specifics of ARM assembly generated by C++ compilers
Naturally, when analyzing firmware you do not deal with code written in pure
assembly but with code generated by a C++ compiler for ARM, and of course it
comes as a surprise to those who are used to x86 assembly.
Function calls
There is no choice of calling conventions (cdecl, stdcall and so on) at all!
All functions use a convention similar to Borland's fastcall: arguments are
passed in registers first, and if there are not enough registers, the rest are
passed via the stack.
For example:
ROM:0001F4E2 MOV R0, SP
ROM:0001F4E4 MOV R2, #6
ROM:0001F4E6 ADD R1, R4, #0
ROM:0001F4E8 BL memcmp
The order in which parameters are passed follows the register numbers, i.e. R0
is the first, R1 is the second, R2 is the third. That is, for
int memcmp (
const void *buf1,
const void *buf2,
size_t count
);
buf1 = R0
buf2 = R1
count = R2
The value returned by the function is passed back in R0:
ROM:0001F4E2 MOV R0, SP
ROM:0001F4E4 MOV R2, #6
ROM:0001F4E6 ADD R1, R4, #0
ROM:0001F4E8 BL memcmp
ROM:0001F4EC CMP R0, #0
ROM:0001F4EE BNE loc_1F4F4
Here is a call that also passes an argument via the stack:
ROM:000BCDEC MOV R2, #0
ROM:000BCDEE STR R2, [SP]
ROM:000BCDF0 MOV R2, #128
ROM:000BCDF2 MOV R3, #128
ROM:000BCDF4 MOV R1, #14
ROM:000BCDF6 MOV R0, #0
ROM:000BCDF8 BL FillBoxColor
So, R0-R3 contain the coordinates, and the fifth parameter (the color) is
stored on the stack.
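The placement rule can be sketched in Python as follows. This is a simplified model that assumes every argument fits into one 32-bit word; the helper name place_arguments is hypothetical, and the argument values are the ones from the FillBoxColor call above.

```python
def place_arguments(args):
    """Split arguments into register slots R0-R3 and stack slots, in order.

    Simplification: every argument is assumed to occupy one 32-bit word.
    """
    registers = {"R%d" % i: value for i, value in enumerate(args[:4])}
    stack = list(args[4:])  # stack[0] goes to [SP], stack[1] to [SP, #4], ...
    return registers, stack


# The FillBoxColor call from the listing: four coordinates and a color.
registers, stack = place_arguments([0, 14, 128, 128, 0])
print(registers, stack)
```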
The number of arguments can be determined only analytically, i.e. we have to
analyze the function call and the function prologue. Some information about the
number of arguments can be inferred from which registers are saved on the stack
at the start of the function. For example, in Thumb state the processor works
with registers R0-R7 and the special-purpose ones. So, having noticed a
function which begins with
ROM:00059ADA getTextBounds
ROM:00059ADA PUSH {R4-R7, LR},
you can assume that it receives its arguments in R0, R1, R2, R3 and, if there
are more of them, on the stack. Looking further at a call:
ROM:0005924E ADD R0, SP, #0x14
ROM:00059250 ADD R1, SP, #0x6C
ROM:00059252 ADD R2, SP, #0x68
ROM:00059254 ADD R3, SP, #0x64
ROM:00059256 BL getTextBounds
we see that only R0-R3 are set up. That means that 4 parameters are
passed.
Transitions
As usual, transitions (jumps) can be conditional or unconditional. The jumps
themselves can be relative or via a register. Register jumps are often used
for switching between Thumb and ARM state. Short unconditional jumps are
implemented with the B instruction (Branch), and long ones with the register
jump BX (Branch with exchange). Function calls are performed via BL (Branch
with link), i.e. a jump that stores the return address in LR. It is also
possible to change the execution address by writing to the PC register:
ADD PC, #0x64
But C compilers usually do not work this way. They write to PC only when
implementing switch statements.
Branches
Also known as switch statements. They are implemented in a rather original way:
ROM:0027806E CMP R2, #0x4D ; 'M'
ROM:00278070 BCS loc_27807A
ROM:00278072 ADR R3, word_27807C
ROM:00278074 ADD R3, R3, R2
ROM:00278076 LDRH R3, [R3, R2]
ROM:00278078 ADD PC, R3
ROM:0027807A
ROM:0027807A loc_27807A
ROM:0027807A B loc_278766
ROM:0027807C word_27807C DCW 0xAA, 0xBE, 0xC6, 0x180, 0x186 ; 0
ROM:0027807C DCW 0x190, 0x1A0, 0x1A8, 0x1DE, 0x1E4 ; 5
ROM:0027807C DCW 0x1B0, 0x212, 0x276, 0x1FE, 0x294 ; 10
First the case number is checked. It must be less than 0x4D. If the case
number is greater than or equal to that (BCS), control goes to the default
case, i.e. to loc_27807A.
Next, the address of the jump table word_27807C is taken. The table holds
offsets, not branch addresses! The offset for the given case index is then read
from the table (the index is added twice because each entry is 2 bytes long)
and added to PC. So for case 0 the jump goes to the address
0x278078 (current value of PC) + 0xAA (offset from the table) + 0x4 (!!!) =
0x278126.
We have to add 4 because of a peculiarity of ARM processors: in Thumb state an
instruction that reads the PC register sees a value that is higher by 4 than
its own address, a side effect of instruction prefetch (the documentation
mentions this together with the remark " to ensure it is word aligned ").
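The calculation can be reproduced with a short Python sketch. It uses the first five table entries and the addresses from the listing above; the assumption that the table is stored little endian, and the names used in the code, are mine.

```python
import struct

ADD_PC_ADDR = 0x278078  # address of the ADD PC, R3 instruction
# First five 16-bit entries of word_27807C, assumed to be little endian.
table_bytes = struct.pack("<5H", 0xAA, 0xBE, 0xC6, 0x180, 0x186)


def switch_target(case):
    """Return the address the switch jumps to for a given case index."""
    # Each entry is 2 bytes, which is why the code adds R2 to the base twice.
    offset = struct.unpack_from("<H", table_bytes, 2 * case)[0]
    # In Thumb state PC reads as the instruction address plus 4.
    return ADD_PC_ADDR + 4 + offset


print(hex(switch_target(0)))
```

Running the same function over every index in the table gives the full list of case handlers, which can then be marked as code in the disassembler.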
Access to memory
In Thumb state processor can address to memory in +/-256 bytes limit.
Therefore access to memory occurs not directly, but via register loading. I.e.
it is impossible to address directly to 0x974170, but it can be done via the
register. For example:
ROM:00277FF6 LDR R0, =unk_974170
ROM:00277FF8 LDR R0, [R0]
We have received value to the address 0x974170. But we haven't finished yet!
The address of a variable (0x974170) is stored nearby within the 256 bytes
limit:
ROM:00278044 off_278044 DCD unk_974170
That is, in fact, opcode of LDR command contains an operand offset for LDR
command relatively to the current address.
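How the disassembler finds off_278044 can be sketched in Python. The encoding of the Thumb PC-relative LDR (01001, a 3-bit register number, an 8-bit word offset) is taken from the Thumb instruction set, and the opcode 0x4813 is reconstructed from the addresses in the listing, so both are assumptions rather than data from the original dump.

```python
def thumb_ldr_literal_address(insn_addr, opcode):
    """Return the address of the literal read by a Thumb LDR Rd, [PC, #imm]."""
    # Encoding: 01001 ddd iiiiiiii (Rd in bits 8-10, word offset in bits 0-7).
    if opcode & 0xF800 != 0x4800:
        raise ValueError("not a PC-relative LDR")
    imm8 = opcode & 0xFF
    # PC reads as the instruction address + 4, with bit 1 cleared so that
    # the base is word aligned.
    base = (insn_addr + 4) & ~3
    return base + imm8 * 4


# Assumed opcode of the LDR R0, =unk_974170 at ROM:00277FF6.
print(hex(thumb_ldr_literal_address(0x277FF6, 0x4813)))
```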
There is a tricky optimization: if an address can be derived from another
address that is already used in the current function, the compiler can obtain
it by arithmetic or by indirect access. It means that if a function, for
example, uses one variable at address 0x100000 and another one at address
0x100150, the compiler can either load two separate addresses or generate the
following code:
LDR R0, =0x100000
ADD R0, #0xFF
ADD R0, #0x51
LDR R0, [R0]
In x86 this would be read as a reference to a substructure within another
structure. But here it is just an ordinary optimization. What for? To minimize
memory accesses, since arithmetic works faster than loading data. As a matter
of fact, ARM assembly code as a whole is full of register calculations.
Actually, as many as 16 registers were provided just for this: to access
memory and the stack less often. For this reason stack variables are found
only in very big functions. Working with the stack is no different from the
same procedure in x86.
Code investigation in IDA
ARM binary images have to be loaded as plain binary files, since they do not
have a unified structure. When loading, you have to specify the processor type.
If the processor for which the code was written is absent from the list of
processor modules, you can load the image file and specify the generic ARM
processor type (little endian) or ARMB (big endian). Next, it is necessary to
create the ROM and RAM segments. There is no unified approach here. It depends
on the image and on the architecture of each particular ARM processor. For
example, for ARM7 the memory map looks roughly like this:
0x0 - 0x8000 - internal RAM of the processor
0x8000 - 0x1000000 - ROM
0x1000000 - 0x..... - SRAM (depends on how much of it the device has)
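When triaging pointers found in a dump, it helps to check which region they fall into. Below is a minimal Python sketch of the map above; the segment names are illustrative, and the upper bound of SRAM is left open because it depends on the device.

```python
# (name, start, end) with end exclusive; None means "unknown upper bound".
SEGMENTS = [
    ("RAM", 0x0, 0x8000),
    ("ROM", 0x8000, 0x1000000),
    ("SRAM", 0x1000000, None),  # size depends on the device
]


def segment_of(address):
    """Return the name of the segment that contains the address, or None."""
    for name, start, end in SEGMENTS:
        if address >= start and (end is None or address < end):
            return name
    return None


print(segment_of(0x277FF6), segment_of(0x100), segment_of(0x1000010))
```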
Now we can start analyzing the code. In many devices (in particular, in mobile
phones) the entry point of the firmware is 0x8000. The processor starts in ARM
state, so the code at address 0x8000 is ARM code. IDA's processor module is
rather primitive, and when it tries to analyze such state switches it very
often turns large amounts of Thumb code into ARM code (and vice versa). To
switch the state of the code manually, press ALT-G and enter 0 in the Value
field for ARM state or 1 for Thumb.