RISC processors are used in many small devices such as PDAs, mobile phones, and smart coffee machines. There are many assembly languages for RISC processors, but the most common one today is ARM. I am going to talk about ARM7, since that is the family I have worked with.

 

Disassembling and Analysis of ARM Processors

Let's begin with the ARM architecture. An ARM processor has a total of 37 registers: 31 general-purpose 32-bit registers and 6 status registers. The set of available registers depends on the processor state. In ARM state the processor executes 32-bit instructions, in Thumb state 16-bit ones.

In ARM state 18 registers are available: the directly accessible R0-R15, CPSR (current program status register), and SPSR (saved program status register). Three of the directly accessible registers have special purposes:

(R13) SP - stack pointer;

(R14) LR - link register, a special register that holds the return address when a procedure is called. That is, the call itself does not push the return address onto the stack; it simply stays in the register.

(R15) PC - a pointer to the current instruction. It can be written with an ordinary MOV, which changes the address of the next instruction to be executed.

In Thumb state 13 registers are available: R0-R7, R13-R15, CPSR, SPSR. Switching between the states doesn't change the contents of the registers. Entry into Thumb state can be achieved by executing a BX instruction with the state bit (bit 0) set in the operand register. Entry into ARM state can be achieved by executing a BX instruction with the state bit (bit 0) clear in the operand register. The instruction sets of the two states differ, but many instructions are similar. Thumb instructions are 2 bytes long, ARM instructions are 4 bytes long. A description of the Thumb and ARM instructions can be found in the ARM documentation.

It's especially interesting that many instructions operate on several registers at once. For example:
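The role of the state bit can be illustrated with a short Python sketch. It is only a model of what BX does with its operand, not an emulator; the function name bx_target and the sample operands are hypothetical.

```python
def bx_target(operand):
    """Model of BX: bit 0 of the operand selects the state, the rest is the address."""
    state = "Thumb" if operand & 1 else "ARM"
    # Bit 0 is not part of the branch address, so it is cleared.
    address = operand & ~1 & 0xFFFFFFFF
    return address, state


# Hypothetical operands: an odd value enters Thumb state, an even one ARM state.
print(bx_target(0x0001F4E3))
print(bx_target(0x00008000))
```

When reading a disassembly, an odd constant loaded into a register just before a BX is therefore a strong hint that the target is Thumb code.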

ADD R3, SP, #4

This corresponds to

R3:=SP+4

Or, for example, an instruction that stores registers on the stack:

PUSH {R2-R4, R7, LR}

This is not an analogue of pushad in x86 assembler. In ARM assembler you can simply push an arbitrary list of registers onto the stack this way.

The data in memory can be either little endian (as on Intel) or big endian (as on Motorola). So, when investigating code, you first need to determine the byte order of the data.
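To make the effect of such a PUSH concrete, here is a minimal Python sketch. It assumes the usual full-descending stack, where the lowest-numbered register ends up at the lowest address. The register values and the initial SP are made up for illustration.

```python
# Order in which a Thumb PUSH can store registers (low registers, then LR).
REG_ORDER = ["R0", "R1", "R2", "R3", "R4", "R5", "R6", "R7", "LR"]


def push(regs, memory, reg_list):
    """Store the listed registers below SP and move SP down by 4 bytes per register."""
    ordered = sorted(reg_list, key=REG_ORDER.index)
    sp = regs["SP"] - 4 * len(ordered)
    for i, name in enumerate(ordered):
        memory[sp + 4 * i] = regs[name]
    regs["SP"] = sp
    return sp


# Hypothetical register contents before PUSH {R2-R4, R7, LR}.
regs = {"R2": 0x22, "R3": 0x33, "R4": 0x44, "R7": 0x77, "LR": 0x1F4ED, "SP": 0x1000}
memory = {}
push(regs, memory, ["R2", "R3", "R4", "R7", "LR"])
for address in sorted(memory):
    print(hex(address), hex(memory[address]))
```

Five registers take 20 bytes, so SP moves from 0x1000 to 0xFEC and LR lands in the highest slot.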
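A quick way to see why the byte order matters is to decode the same four bytes both ways. The sketch below uses Python's struct module; the byte sequence is a made-up example and read_word is a hypothetical helper.

```python
import struct


def read_word(raw, byte_order):
    """Decode 4 bytes as an unsigned 32-bit word; byte_order is 'little' or 'big'."""
    fmt = "<I" if byte_order == "little" else ">I"
    return struct.unpack(fmt, raw)[0]


# The same bytes give two very different values depending on the byte order.
raw = bytes([0x70, 0x41, 0x97, 0x00])
print(hex(read_word(raw, "little")))
print(hex(read_word(raw, "big")))
```

If pointers in a dump look implausible (for example, they fall far outside the device's memory), trying the other byte order is a cheap first check.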

There are a number of toolchains for developing ARM programs:

http://heanet.dl.sourceforge.net/sourceforge/gnude/gnude-arm-win.exe - the GNU compiler, with everything that implies: all work is done from the command line, and debugging is done with gdb.

http://www.goldroad.co.uk/grARM.html - a simple ARM assembler.

http://www.arm.com/support/downloads/index.html - the official ARM development tools. These are available for purchase only.

http://www.iar.com/ - an alternative IDE for ARM. A 30-day trial version is offered.

To learn more about tools for Windows, read our Software reverse engineering tools article.

Specifics of ARM assembler generated by C++ ARM compilers

Naturally, when analyzing firmware you deal not with code written in pure assembler but with code generated by a C++ ARM compiler, and it comes as a surprise to those accustomed to x86 assembler.

Function calls

There are no separate calling conventions (cdecl, stdcall and so on) at all! All functions use a convention similar to Borland's fastcall: arguments go into registers first, and if there are not enough registers, the rest are passed on the stack.

For example:

ROM:0001F4E2 MOV R0, SP
ROM:0001F4E4 MOV R2, #6
ROM:0001F4E6 ADD R1, R4, #0
ROM:0001F4E8 BL memcmp

The order in which parameters are passed follows the register numbers, i.e. R0 is the first, R1 the second, R2 the third. That is, for

int memcmp (
const void *buf1,
const void *buf2,
size_t count
);
buf1 = R0
buf2 = R1
count = R2

The value returned by the function is passed in R0:

ROM:0001F4E2 MOV R0, SP
ROM:0001F4E4 MOV R2, #6
ROM:0001F4E6 ADD R1, R4, #0
ROM:0001F4E8 BL memcmp
ROM:0001F4EC CMP R0, #0
ROM:0001F4EE BNE loc_1F4F4

Here is a call that also passes a parameter on the stack:

ROM:000BCDEC MOV R2, #0
ROM:000BCDEE STR R2, [SP]
ROM:000BCDF0 MOV R2, #128
ROM:000BCDF2 MOV R3, #128
ROM:000BCDF4 MOV R1, #14
ROM:000BCDF6 MOV R0, #0
ROM:000BCDF8 BL FillBoxColor

So, R0-R3 contain the coordinates, and the fifth parameter (color) is stored on the stack.

The number of arguments can be determined only analytically, i.e. by analyzing the function call and the function prologue. Some information about the number of arguments can be inferred from which registers are saved to the stack at the start of the function. For example, in Thumb state the processor works with registers R0-R7 and the special-purpose ones. So, on seeing a function that begins with
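The rule "registers first, then the stack" can be sketched in a few lines of Python. This is a simplified model that assumes every argument fits into one 32-bit word; place_arguments is a hypothetical helper, and the values are the ones from the FillBoxColor call above.

```python
def place_arguments(args):
    """Put the first four word-sized arguments in R0-R3 and the rest on the stack."""
    registers = {f"R{i}": value for i, value in enumerate(args[:4])}
    # stack[0] corresponds to [SP], stack[1] to [SP, #4], and so on.
    stack = list(args[4:])
    return registers, stack


registers, stack = place_arguments([0, 14, 128, 128, 0])
print(registers)
print(stack)
```

Reading a call site in the opposite direction (which of R0-R3 are written, and what is stored at [SP]) gives an estimate of the callee's argument count.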

ROM:00059ADA getTextBounds
ROM:00059ADA PUSH {R4-R7, LR},

you can assume that it takes arguments in R0, R1, R2, R3 and on the stack. Then, at a call site:

ROM:0005924E ADD R0, SP, #0x14
ROM:00059250 ADD R1, SP, #0x6C
ROM:00059252 ADD R2, SP, #0x68
ROM:00059254 ADD R3, SP, #0x64
ROM:00059256 BL getTextBounds

we see that only R0-R3 are used, which means that 4 parameters are passed.

Transitions

As usual, transitions (jumps) can be conditional or unconditional. The jumps themselves can be relative or register-based; register jumps are often used to switch between Thumb and ARM state. Short unconditional jumps are implemented with the B (branch) instruction, and long ones with the register jump BX (branch with exchange). Function calls are performed with BL (branch with link), i.e. a jump that stores the return address in LR. It is also possible to change the execution address by writing to the PC register:

ADD PC, #0x64

But C compilers usually do not generate such code. They write to PC only in switch statements.

Branches

Also known as switch statements. They are implemented in a rather original way:

ROM:0027806E CMP R2, #0x4D ; 'M'
ROM:00278070 BCS loc_27807A
ROM:00278072 ADR R3, word_27807C
ROM:00278074 ADD R3, R3, R2
ROM:00278076 LDRH R3, [R3, R2]
ROM:00278078 ADD PC, R3
ROM:0027807A
ROM:0027807A loc_27807A
ROM:0027807A B loc_278766
ROM:0027807C word_27807C DCW 0xAA, 0xBE, 0xC6, 0x180, 0x186; 0
ROM:0027807C DCW 0x190, 0x1A0, 0x1A8, 0x1DE, 0x1E4; 5
ROM:0027807C DCW 0x1B0, 0x212, 0x276, 0x1FE, 0x294; 10

First, the case number is checked. It must be less than 0x4D. If the case number is greater or equal, control passes to the default case, i.e. to loc_27807A.

Next, the address of the jump table word_27807C is taken. The table holds offsets, not jump addresses! The offset for the given case index is then loaded and added to PC. So for case 0 the jump goes to the address

0x278078 (current value of PC) + 0xAA (offset from the table) + 0x4 (!!!) = 0x278126.

We have to add 4 because of a peculiarity of ARM processors: in Thumb state, an instruction that reads the PC register gets a value 4 bytes greater than the address of the instruction itself, so the result of the addition is higher by 4.
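The calculation above is easy to repeat for every table entry. The following Python sketch does so for the first five offsets of word_27807C. The constant and function names are hypothetical, and only the part of the table shown in the listing is included.

```python
ADD_PC_ADDRESS = 0x278078   # address of the "ADD PC, R3" instruction
DEFAULT_CASE = 0x27807A     # loc_27807A, taken when the case number is >= 0x4D
CASE_LIMIT = 0x4D

# First five offsets from the word_27807C table in the listing.
TABLE = [0xAA, 0xBE, 0xC6, 0x180, 0x186]


def resolve_case(case, table):
    """Return the address a given case number jumps to."""
    if case >= CASE_LIMIT:
        return DEFAULT_CASE
    # In Thumb state PC reads as the instruction address plus 4.
    return ADD_PC_ADDRESS + 4 + table[case]


for case in range(len(TABLE)):
    print(case, hex(resolve_case(case, TABLE)))
```

Running the same loop over a whole table extracted from the image gives the list of addresses to mark as code for each case.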

Access to memory

In Thumb state an instruction can encode only a small offset: a PC-relative LDR, for example, reaches at most 1020 bytes forward. Therefore memory is accessed not directly but through an address loaded into a register. That is, it is impossible to address 0x974170 directly, but it can be done via a register. For example:

ROM:00277FF6 LDR R0, =unk_974170
ROM:00277FF8 LDR R0, [R0]

We have loaded the value at address 0x974170. But that is not all! The address of the variable (0x974170) is itself stored nearby, within reach of the LDR instruction:

ROM:00278044 off_278044 DCD unk_974170

That is, the opcode of the LDR instruction actually contains the offset of this stored address relative to the current address.

There is a clever optimization: if an address can be derived from another address already used in the current function, it can be obtained by arithmetic or indirect access. For example, if a function uses one variable at address 0x100000 and another at address 0x100150, the compiler can either load two separate addresses or generate the following code:
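How the offset inside the opcode turns into the address of the stored pointer can be sketched in Python. The sketch assumes the 16-bit Thumb encoding of a PC-relative LDR (top five bits 01001, then a 3-bit destination register and an 8-bit word offset). The opcode value 0x4813 is not taken from the listing; it is reconstructed here so that the result matches off_278044.

```python
def decode_ldr_pc(address, opcode):
    """Decode a Thumb 'LDR Rd, [PC, #imm]' and return (Rd, address of the literal)."""
    if opcode >> 11 != 0b01001:
        raise ValueError("not a Thumb PC-relative LDR")
    rd = (opcode >> 8) & 0x7
    offset = (opcode & 0xFF) * 4      # the 8-bit field counts 32-bit words
    base = (address + 4) & ~3         # PC reads as address + 4, rounded down to a word
    return rd, base + offset


rd, literal = decode_ldr_pc(0x277FF6, 0x4813)
print(rd, hex(literal))
```

The word found at the returned address is the pointer that ends up in the register, here the address of unk_974170.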

LDR R0, =0x100000
ADD R0, #0xFF
ADD R0, #0x51
LDR R0, [R0]

In x86 this would be read as a reference to a substructure within another structure. But here it is an ordinary optimization. What for? To minimize memory accesses, since arithmetic is faster than loading data. In fact, ARM assembler code is full of register calculations of this kind. This is what the 16 registers are for: to go to memory and the stack less often. For this reason stack variables appear only in very large functions. Working with the stack is otherwise no different from x86.
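The reason the offset 0x150 is added in two steps is that a Thumb ADD with an immediate operand holds only 8 bits, i.e. at most 0xFF. A small Python sketch of that splitting follows; split_offset is a hypothetical helper, and a real compiler may choose the pieces differently.

```python
def split_offset(offset, max_imm=0xFF):
    """Split an offset into immediates that each fit an 8-bit ADD operand."""
    parts = []
    while offset > 0:
        step = min(offset, max_imm)
        parts.append(step)
        offset -= step
    return parts


parts = split_offset(0x100150 - 0x100000)
print([hex(p) for p in parts])
print(hex(0x100000 + sum(parts)))
```

When a chain of such ADD instructions follows an LDR of a base address, summing the immediates recovers the address that is actually being accessed.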

Code investigation in IDA

When loading ARM binary images into IDA, load them as binary files, since they have no unified structure. On loading you have to specify the processor type. If the processor the code was written for is not in the list of processor modules, load the image and specify the generic processor type: ARM (little endian) or ARMB (big endian). Next, create the ROM and RAM segments. There is no unified approach here; it depends on the image and on the architecture of the particular ARM processor. For example, for ARM7 the memory map looks roughly like this:

0x0 - 0x8000 - processor RAM
0x8000 - 0x1000000 - ROM
0x1000000 - 0x..... - SRAM (the upper bound depends on how much of it the device has)
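When sorting out pointers found in the image, it helps to check which region each one falls into. Below is a minimal Python sketch of the approximate map above. The upper bound of SRAM is device-specific, so it is left open here, and region_of is a hypothetical helper.

```python
# (start, end, name); end is exclusive, None means the bound is not known.
MEMORY_MAP = [
    (0x0000000, 0x0008000, "RAM"),
    (0x0008000, 0x1000000, "ROM"),
    (0x1000000, None, "SRAM"),
]


def region_of(address):
    """Return the name of the region containing the address, or None."""
    for start, end, name in MEMORY_MAP:
        if address >= start and (end is None or address < end):
            return name
    return None


for address in (0x100, 0x8000, 0x1F4E2, 0x1000000):
    print(hex(address), region_of(address))
```

The bounds must be adjusted for each device; the values here only mirror the example map.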

Now we can start analyzing the code. In many devices (mobile phones in particular) the firmware entry point is 0x8000. The processor starts in ARM state, so the code at address 0x8000 is ARM code. IDA's processor module is rather primitive, and when it tries to analyze state switches it very often converts a lot of Thumb code to ARM (and vice versa). To switch the state of the code manually, press ALT-G and enter 0 in the Value field for ARM state or 1 for Thumb.

Want to get more reversing tips - read our article about MacOS reverse engineering.

 
