Reverse Engineering IoT Firmware: Where to Start

Internet of Things (IoT) devices are already a significant part of our day-to-day lives, work environments, hospitals, government facilities, and vehicle fleets. They include Wi-Fi printers, smart door locks, alarm systems, and so on. In 2020, the average US resident had access to more than ten connected devices. But users who choose IoT devices for their usefulness also need to be sure these devices are secure.

Since IoT devices are usually connected to internal home or corporate networks, compromising such devices can provide criminals with access to the entire system. During the first six months of 2021, there were around 1.5 billion attacks on smart devices, with attackers looking to steal data, mine cryptocurrency, or build botnets.

One way to ensure decent security for IoT devices is to perform reverse engineering activities that will help you better understand the way particular devices are built and allow you to perform further analysis of a device and its firmware.

In this article, we show a practical example of reverse engineering firmware for a smart air purifier, highlighting the importance of researching its architecture. This article will be helpful for development teams working on cybersecurity projects who want to learn about the nuances and steps for reverse engineering IoT devices.

Contents:

The importance of researching the firmware architecture

1. Determine the architecture

2. Choose the disassembler tool

3. Load the firmware

4. Study Xtensa architecture features

5. Reverse engineer Xtensa code in IDA

Conclusion

The importance of researching the firmware architecture

The process of reverse engineering IoT firmware varies significantly depending on the device under research.

IoT devices evolve quite fast, and the dominant architecture in the market changes all the time. Less than ten years ago, the most popular choices were x86 or ARM, and less often MIPS or PowerPC. But now there is a great variety of microcontroller architectures you need to know to reverse engineer embedded devices: Tricore, rh850, i8051, PowerPC VLE, etc.

Going deep into learning a single architecture isn’t enough to succeed in IoT reverse engineering. And if it’s necessary for developers to start reverse engineering as fast as possible, they should start by learning the basics of the firmware’s architecture and structure.

This is exactly what we want to describe in this article: the way reverse engineers can study new architectures and the format of firmware they have never seen before.

For this article, we used a firmware dump of the Xiaomi Air Purifier 3H. We chose it because the dump comes from an ESP32 chip, whose CPU is based on the Tensilica Xtensa architecture. This is a pretty exotic choice of architecture, but it’s common in IoT devices that require Wi-Fi communication. You can find the firmware we will reverse engineer as an example for this article (ESP-32FW.bin) on this GitHub page.

The challenge for this case is that there’s no existing decompiler for the firmware architecture and disassemblers barely support it. However, this is a pretty accurate example of what reverse engineers face nowadays.

The IoT firmware reverse engineering process consists of the following five stages:

Key stages of the reverse engineering process for target firmware

1. Determine the architecture

The first question to ask before reverse engineering an IoT device is which architecture its firmware targets.

The most straightforward way to find out is to read the datasheet for the CPU and learn the answer from there. But there are situations when all you have is the firmware itself. In this case, you can use one of two options:

1. String search may allow you to find some leftover compilation strings that contain information about the compiler name and architecture.

2. Binary pattern search requires you to know instructions that are often used in different types of microcontroller architectures. You can search the firmware for binary patterns common for a specific architecture and then try to load the firmware into a disassembler that supports such an architecture to validate your guess.
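As a rough illustration of the string search option, the Python sketch below extracts printable ASCII runs from a firmware image and keeps those that mention a toolchain or architecture keyword. The keyword list and both function names are our assumptions for illustration; they are not part of any existing tool.

```python
import re

# Hypothetical keyword list; extend it for the architectures you expect to meet.
KEYWORDS = (b"xtensa", b"esp-idf", b"gcc")

def printable_strings(data: bytes, min_len: int = 6):
    """Yield (offset, bytes) for each run of at least min_len printable ASCII characters."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.start(), match.group()

def architecture_hints(data: bytes):
    """Return (offset, text) for strings that mention a compiler or architecture keyword."""
    hints = []
    for offset, text in printable_strings(data):
        lowered = text.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            hints.append((offset, text.decode("ascii")))
    return hints
```

Running architecture_hints() over the raw bytes of a dump gives a short list of candidate strings to inspect by hand. A hit is only a hint, so the guess still has to be validated in a disassembler.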

Once you determine the architecture type, you can start choosing the toolset for further reversing. For ESP-32FW.bin, we already know that it’s going to be the Tensilica Xtensa architecture, so we need to select the disassembler we’re going to use for the research.

2. Choose the disassembler tool

After researching an appropriate disassembler that could support Xtensa, we ended up with three options: IDA, Ghidra, and Radare.

We decided to try using Ghidra and IDA first because we already have vast experience successfully applying these tools for different reverse engineering projects. And since IDA doesn’t have a decompiler for Xtensa, only a CPU module for the disassembler, we decided to first try working with Ghidra (we used version 10.0).

Ghidra doesn’t support Xtensa by default, so we needed to install the Tensilica Xtensa module for Ghidra first.

The disassembler for Xtensa works, but there are some issues with the decompiler, as you can see in the screenshot below:

Screenshot 1. Warning regarding unimplemented instructions in the Ghidra decompiler

After some time disassembling, we realized that Ghidra’s processor module for Xtensa had trouble determining the instruction length in multiple cases. Therefore, we dropped Ghidra and went to IDA (we used version 7.7).

It was challenging at first to find Xtensa in the list of processor modules, but finally we found it here:

Screenshot 2. Xtensa in IDA’s list of processor modules

The processor module in IDA appeared to be stable enough, so we decided to stick with IDA.

3. Load the firmware

The first step is to load the firmware at the right image base address so that all pointers to global variables resolve to valid addresses. To do this, it’s necessary to learn where the code is in the binary.

We start by loading the firmware at the base address 0 and try to mark as much code as possible. To be able to properly mark the code in IDA, we need to learn the typical instruction sequences common to Xtensa firmware. To find out which instructions to use in the function prologs, we took a sample from GitHub: esp8266/Arduino: ESP8266 core for Arduino.

It appears the compiler uses the following instruction: entry a1, XX

This instruction translates into byte sequences such as 36 41 00 / 36 61 00 / 36 81 00 depending on the value of the XX argument.

By implementing a simple IDA script to search for such a pattern, it’s possible to mark about 90% of the code:

Screenshot 3. Results of marking the code in IDA
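The prolog search can be sketched outside of IDA in plain Python. The sketch below assumes the byte layout implied by the sequences above (36 41 00, 36 61 00, 36 81 00): the first byte is 0x36, the low nibble of the second byte is the register number (1 for a1), and the remaining 12 bits hold the frame size divided by 8. The function name is hypothetical, and this is not the script we used.

```python
def find_entry_prologues(image: bytes, base: int = 0):
    """Return (address, frame_size) for every position that decodes as `entry a1, N`."""
    hits = []
    for pos in range(len(image) - 2):
        b0, b1, b2 = image[pos], image[pos + 1], image[pos + 2]
        # 0x36 is the opcode byte; the low nibble of the next byte selects a1.
        if b0 == 0x36 and (b1 & 0x0F) == 0x01:
            # The 12-bit immediate is split across the two following bytes.
            imm12 = (b2 << 4) | (b1 >> 4)
            # The frame size is stored in units of 8 bytes.
            hits.append((base + pos, imm12 * 8))
    return hits
```

A three-byte pattern also shows up by chance inside data, so it is worth discarding hits with implausibly large frame sizes before turning them into functions.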

Once we’ve found the code, it’s time to explore and see whether it looks correct.

Looking at the screenshot below, it’s obvious that something is wrong. The string resources are referenced properly, but call8 instructions point to strings, not the code. And some of the call8 instructions point to non-existent addresses. Usually this means that the image base is wrong and the firmware must be loaded to some other base address, not 0.

Screenshot 4. Discovering that call8 instructions point to strings and non-existent addresses

A common way to determine the base address is to:

  1. Pick a string.
  2. Use the low part of this string’s address to find the code that references it.
  3. Find the difference between the real string address and the address we see in the code. Thus, we can understand how to shift the address of the code to match the current address of the string.
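These three steps can be turned into a simple voting scheme. The sketch below is one possible automation under our own assumptions: pointers are stored as 4-byte-aligned little-endian words, the base address is aligned so that the low 16 bits of a file offset and of the matching address agree, and string_offsets is a list of file offsets of strings collected beforehand. The function name is hypothetical.

```python
import struct
from collections import Counter

def guess_image_base(image: bytes, string_offsets, low_bits: int = 16):
    """Vote for the base address that makes the most words point at known strings."""
    mask = (1 << low_bits) - 1
    # Index the string file offsets by their low bits (step 2 in the list above).
    by_low = {}
    for offset in string_offsets:
        by_low.setdefault(offset & mask, []).append(offset)

    votes = Counter()
    for pos in range(0, len(image) - 3, 4):
        (word,) = struct.unpack_from("<I", image, pos)
        for offset in by_low.get(word & mask, ()):
            if word >= offset:
                # Step 3: the difference between the address and the file offset.
                votes[word - offset] += 1
    return votes.most_common(1)[0][0] if votes else None
```

The candidate with the most votes is only a guess. As the next paragraphs show, a segmented image can have several correct bases, one per segment.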

In this case, we found that the base address must be 0x3F3F0000, but even when using it the call8 instructions are still invalid. This could mean that the binary data is segmented and that the code from the flash memory is being mapped to RAM in pieces. Thus, it will be necessary to split the firmware into pieces and load these pieces into IDA in appropriate segments.

We looked at the strings in the firmware and discovered it was indeed segmented:

Screenshot 5. Proof of firmware segmentation

After additional research, we discovered the ESP IDF framework. Since our target firmware contains some version of this framework, we can try to use its source code to learn about the firmware structure.

We found an interesting bootloader_utility_load_partition_table() function in the bootloader_utility.c source code file within ESP IDF, which means the firmware must contain a partition table.

Screenshot 6. The bootloader_utility_load_partition_table() function that shows the firmware must contain a partition table

To identify the partition table, we continued exploring the source code and finally found the esp_partition_table_verify() function, which is called by the bootloader_utility_load_partition_table() function:

Screenshot 7. Discovering the esp_partition_table_verify() function

So there must be ESP_PARTITION_MAGIC and ESP_PARTITION_MAGIC_MD5:

#define ESP_PARTITION_MAGIC 0x50AA
#define ESP_PARTITION_MAGIC_MD5 0xEBEB

Binary search for AA 50 gave us good results:

Screenshot 8. Successful results of binary search for AA 50

Both ESP_PARTITION_MAGIC and ESP_PARTITION_MAGIC_MD5 can be seen nearby. And most likely sub_3F3F4848 is esp_partition_table_verify().
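A search like this is easy to script. The sketch below looks for places where the two constants, stored in little-endian byte order, appear close to each other. The 64-byte window and the function names are arbitrary choices of ours.

```python
def find_all(data: bytes, pattern: bytes):
    """Return every offset at which pattern occurs in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets

def find_magic_pairs(data: bytes, window: int = 64):
    """Return (magic_offset, md5_magic_offset) pairs that lie within window bytes of each other."""
    magic = find_all(data, (0x50AA).to_bytes(2, "little"))      # bytes AA 50
    md5_magic = find_all(data, (0xEBEB).to_bytes(2, "little"))  # bytes EB EB
    return [(m, e) for m in magic for e in md5_magic if abs(m - e) <= window]
```

Requiring both constants nearby filters out most accidental AA 50 matches, which a two-byte pattern produces often in a large image.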

Since we already know where the esp_partition_table_verify function is, we are able to find the bootloader_utility_load_partition_table function and the ESP_PARTITION_TABLE_OFFSET file offset:

Screenshot 9. Finding bootloader_utility_load_partition_table and ESP_PARTITION_TABLE_OFFSET
Screenshot 10. Finding the value of the offset

ESP_PARTITION_TABLE_OFFSET is the file offset in the ESP32-FW.bin file. Now we just need to know the structure of the partition table entries. The source code of the ESP IDF framework helps us again:

typedef struct {
    uint32_t offset;
    uint32_t size;
} esp_partition_pos_t;
 
/* Structure which describes the layout of the partition table entry.
 * See docs/partition_tables.rst for more information about individual fields.
 */
typedef struct {
    uint16_t magic;
    uint8_t  type;
    uint8_t  subtype;
    esp_partition_pos_t pos;
    uint8_t  label[16];
    uint32_t flags;
} esp_partition_info_t;

We’ve imported these structures to IDA and applied them to the partition table data:

Screenshot 11. Importing structures to IDA and applying them to the partition table data

As you can see, esp_partition_pos_t.offset is the file offset for each partition, and we can now split ESP32-FW.bin into the partitions. 
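The same structures can be applied outside of IDA. The following sketch parses consecutive esp_partition_info_t entries with Python's struct module and slices the image into partitions. It assumes the entries are 32 bytes each with no padding (which matches the field sizes above) and that the table ends at the first entry whose magic is not ESP_PARTITION_MAGIC. The function names are hypothetical, and table_offset must be the value recovered from the bootloader code.

```python
import struct

ESP_PARTITION_MAGIC = 0x50AA
# magic, type, subtype, pos.offset, pos.size, label[16], flags
ENTRY_FORMAT = "<HBBII16sI"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 32 bytes

def parse_partition_table(image: bytes, table_offset: int):
    """Read partition entries until one does not start with ESP_PARTITION_MAGIC."""
    partitions = []
    pos = table_offset
    while pos + ENTRY_SIZE <= len(image):
        magic, ptype, subtype, offset, size, label, flags = struct.unpack_from(
            ENTRY_FORMAT, image, pos)
        if magic != ESP_PARTITION_MAGIC:
            break
        partitions.append({
            "label": label.split(b"\x00")[0].decode("ascii", "replace"),
            "type": ptype,
            "subtype": subtype,
            "offset": offset,
            "size": size,
            "flags": flags,
        })
        pos += ENTRY_SIZE
    return partitions

def split_partitions(image: bytes, partitions):
    """Map each partition label to its raw bytes, using offset as a file offset."""
    return {p["label"]: image[p["offset"]:p["offset"] + p["size"]] for p in partitions}
```

Each value returned by split_partitions() can then be written to its own file and examined separately.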

But how can we load each of the partitions to the appropriate address? It appears there’s an image_load() function that is responsible for mapping the firmware partitions onto address space:

Screenshot 12. Mapping firmware partitions onto address space

And each partition has a header:

typedef struct {
    uint8_t magic;              /*!< Magic word ESP_IMAGE_HEADER_MAGIC */
    uint8_t segment_count;      /*!< Count of memory segments */
    uint8_t spi_mode;           /*!< flash read mode (esp_image_spi_mode_t as uint8_t) */
    uint8_t spi_speed: 4;       /*!< flash frequency (esp_image_spi_freq_t as uint8_t) */
    uint8_t spi_size: 4;        /*!< flash chip size (esp_image_flash_size_t as uint8_t) */
    uint32_t entry_addr;        /*!< Entry address */
    uint8_t wp_pin;            /*!< WP pin when SPI pins set via efuse (read by ROM bootloader,
                                * the IDF bootloader uses software to configure the WP
                                * pin and sets this field to 0xEE=disabled) */
    uint8_t spi_pin_drv[3];     /*!< Drive settings for the SPI flash pins (read by ROM bootloader) */
    esp_chip_id_t chip_id;      /*!< Chip identification number */
    uint8_t min_chip_rev;       /*!< Minimum chip revision supported by image */
    uint8_t reserved[8];       /*!< Reserved bytes in additional header space, currently unused */
    uint8_t hash_appended;      /*!< If 1, a SHA256 digest "simple hash" (of the entire image) is appended after the checksum.
                                 * Included in image length. This digest
                                 * is separate to secure boot and only used for detecting corruption.
                                 * For secure boot signed images, the signature
                                 * is appended after this (and the simple hash is included in the signed data). */
} __attribute__((packed))  esp_image_header_t;

Next, each partition is split into segments. And after the header, you can see a structure that is followed by the actual data:

typedef struct {
    uint32_t load_addr;     /*!< Address of segment */
    uint32_t data_len;      /*!< Length of data */
} esp_image_segment_header_t;

Here, esp_image_segment_header_t.load_addr is the virtual address for the segment data in the CPU address space.

The segments within the partition look like this:

esp_image_header_t

esp_image_segment_header_t

<segment data>

esp_image_segment_header_t

<segment data>

...

Now, having full information about the segments, we can split the partitions into segments and load them to the appropriate addresses in IDA. We can do this extraction work manually or try to automate it via the IDA loader plugin.

Nevertheless, it appears that such a loader is already implemented for Ghidra.
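For manual extraction, the layout above can be walked with a few lines of Python. The sketch below assumes the header occupies 24 bytes, with esp_chip_id_t stored as a 2-byte value, and that ESP_IMAGE_HEADER_MAGIC is 0xE9. Both facts come from our reading of ESP IDF rather than from the listing above, so verify them against the framework version in your firmware. The function name is hypothetical.

```python
import struct

ESP_IMAGE_HEADER_MAGIC = 0xE9  # assumption taken from ESP IDF, check your version
# magic, segment_count, spi_mode, spi_speed/spi_size, entry_addr, wp_pin,
# spi_pin_drv[3], chip_id (assumed 2 bytes), min_chip_rev, reserved[8], hash_appended
HEADER_FORMAT = "<BBBBIB3sHB8sB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes
SEGMENT_FORMAT = "<II"  # load_addr, data_len
SEGMENT_HEADER_SIZE = struct.calcsize(SEGMENT_FORMAT)

def parse_image_segments(partition: bytes):
    """Return (entry_addr, [(load_addr, data), ...]) for one application partition."""
    fields = struct.unpack_from(HEADER_FORMAT, partition, 0)
    magic, segment_count, entry_addr = fields[0], fields[1], fields[4]
    if magic != ESP_IMAGE_HEADER_MAGIC:
        raise ValueError("partition does not start with an ESP image header")

    segments = []
    pos = HEADER_SIZE
    for _ in range(segment_count):
        load_addr, data_len = struct.unpack_from(SEGMENT_FORMAT, partition, pos)
        pos += SEGMENT_HEADER_SIZE
        segments.append((load_addr, partition[pos:pos + data_len]))
        pos += data_len
    return entry_addr, segments
```

Each (load_addr, data) pair corresponds to one segment to create in IDA, with data loaded at load_addr.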

4. Study Xtensa architecture features

Now that we have all the segments loaded to the appropriate addresses, we can start the reverse engineering.

But to do it efficiently, we need to learn more about the Xtensa architecture, including:

  1. Argument order in instructions
  2. Execution specifics of conditional jumps
  3. Compiler calling convention
  4. Stack organization

The first thing to explore is the argument order in instructions. For example: MOV R1, R2. You can find these kinds of instructions in all architectures, yet this may mean either copying R1 to R2 or copying R2 to R1. Thus, it’s crucial to know which operand is the source and which is the destination register in the instructions. You can find the Xtensa instruction set description on GitHub.

As for the MOV instruction, in Xtensa, it means that R2 is copied to R1. Thus, the first argument will be the destination in most simple instructions, such as math-related ones. For example, the instruction addi a14, a1, 0x38 would mean that a14 = a1 + 0x38.

But for instructions that store data, it will be the opposite. For example, the instruction s32i.n  a5, a1, 0x10 means that the value of a5 must be stored at the address (a1 + 0x10).

The second thing to learn is the way conditional jumps are done. There are two ways to do it:

  1. Use a dedicated comparison instruction that sets the flags register, followed by a conditional jump instruction.
  2. Use a single instruction that does all those actions at once.

Xtensa does the latter: beqz a10, loc_400E1C54

A single instruction is used to check if a10 equals zero, and then it either jumps to loc_400E1C54 or doesn’t.

The third step is to examine the calling convention used by the compiler: the way arguments are passed to the function and how the value is returned.

Xtensa passes arguments in quite an unusual way. Arguments are put into registers before the call instruction. But the registers in which they appear within the function are not the same as those they were in before the call:

Argument index    Register before the call    Register after the call
0                 a10                         a2
1                 a11                         a3
2                 a12                         a4

Here’s an example of how to pass arguments to a function on the assembler level:

movi.n  a12, 0x14
l32r    a11, off_40080490
mov.n   a10, a1
l32r    a8, memcpy
callx8  a8

Here we have three arguments:

  • a10 is a destination address
  • a11 is a source address
  • a12 is the size to copy

Yet as soon as the code enters the memcpy function, these values are automatically transferred into the a2, a3, and a4 registers respectively.

The same trick is used for returned values. Inside the memcpy function, the value is stored in the a2 register, yet after returning from the function, the value appears in a10.

Here’s what return 0 looks like:

movi.n  a2, 0
retw.n

And this is what checking the returned value looks like:

call8   jsmi_parse_params
bnez.n  a10, loc_400E1B15

bnez.n checks the value of the a10 register upon returning from the call.
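The register renaming follows from the windowed register convention: a call8 or callx8 makes the register window rotate by eight registers, so the caller's a10 becomes the callee's a2, and the callee's a2 is the caller's a10 again after the return. The tiny helper below only encodes this arithmetic. Its name is hypothetical, and the rotation values of 4 and 12 for the call4 and call12 variants come from our understanding of the Xtensa ISA rather than from this firmware.

```python
def callee_register(caller_register: str, rotation: int = 8) -> str:
    """Map a caller-side register name to the name the callee uses after the window rotates."""
    index = int(caller_register.lstrip("a"))
    if not rotation <= index <= 15:
        raise ValueError(f"{caller_register} is not visible to the callee")
    return f"a{index - rotation}"
```

For the memcpy example above, callee_register("a10"), callee_register("a11"), and callee_register("a12") give a2, a3, and a4, matching the table.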

Finally, it’s necessary to learn how the stack is organized.

Xtensa uses the a1 register to create the stack frame. Each function starts with the entry instruction: entry a1, 0xC0, where 0xC0 is the size of the stack frame, i.e. the amount of stack the function requires for the stack variables.

And often, the functions start with initializing stack variables:

movi.n  a5, 0
s32i.n  a5, a1, 0x10
s32i.n  a5, a1, 0x14
s32i.n  a5, a1, 0x18
s32i.n  a5, a1, 0x1C
s32i.n  a5, a1, 0x20
s32i.n  a5, a1, 0x24
s32i.n  a5, a1, 0x28
s32i.n  a5, a1, 0x2C
s32i.n  a5, a1, 0x30
s32i.n  a5, a1, 0x34

The zero value from the a5 register is written to stack variables that are addressed relative to the a1 register.

After gaining all necessary knowledge about the Xtensa architecture, we can finally start reversing its code.

5. Reverse engineer Xtensa code in IDA

Xtensa isn’t the most popular architecture, and tool support for it is less complete than for ARM, MIPS, and PowerPC. Therefore, there are some limitations in the IDA processor module that we need to overcome.

The major limitations of the Xtensa processor module in IDA are:

  • No automatic comments for function arguments
  • Stack frame is not created automatically
  • Some ESP32 functions belong to IROM, so there are calls to hardcoded addresses
  • Some Xtensa instructions are not disassembled

Let’s discuss some tricks to overcome these challenges.

5.1. Type system and comments for function arguments

A type system for Xtensa is available starting from IDA 7.7. Having an available type system in IDA is very important, as it makes reversing convenient. In particular, it allows you to import the definitions of C structures and specify the function prototypes used by IDA to put automatic comments near the instructions that transfer the function arguments.

However, if you don’t have a type system, there’s a workaround.

First, let’s look at what functions look like when there’s a type system:

Screenshot 13. The way functions look when there’s an available type system

The function prototype is set with the names and types of the arguments so that IDA can use this information to comment the arguments at the call site:

Screenshot 14. Function prototype

But in IDA versions before 7.7, there’s no such thing for Xtensa. An alternative is to use the repeatable comments feature in IDA. If you set a repeatable comment at the very beginning of the function, it will be shown at all of the call sites.

Screenshot 15. Setting a repeatable comment
Screenshot 16. The repeatable comment is shown at all of the call sites

Thus, we can use this feature to define function arguments:

Screenshot 17. Using the repeatable comments feature to define function arguments

The call site will look like this:

Screenshot 18. Call site

You may select the register name in the comment and IDA will highlight it in the code. Thus, you can easily find an argument value.

5.2. Recover the stack frame

To recover the stack frame, you’ll need to manually specify the stack size and then show IDA where it’s used by pressing K at each instruction that works with the stack.

Let’s explore the config_router_safe function, for example:

Screenshot 19. The config_router_safe function

It’s obvious that the stack frame size here is 0xC0. We use this value in the stack settings for the function (Alt+P):

Screenshot 20. Using the 0xC0 value (the stack frame size)

Visually, nothing will happen, but if you go to the stack frame for the function by pressing Ctrl+K, you’ll notice that stack space is now allocated:

Screenshot 21. Stack space is allocated

The next thing to do is specify the stack shift using the entry instruction. Before doing that, we suggest enabling the stack pointer visualization as shown in the screenshot below:

Screenshot 22. Enabling the stack pointer visualization

Now, the code should look like this:

Screenshot 23. Code after enabling the stack pointer visualization

000 is the current stack pointer shift value, and we need to shift it by 0xC0. To do that, set the cursor at the entry instruction and press Alt+K to see the following window, where you can specify the desired difference between the old and new stack pointer:

Screenshot 24. Shifting the current stack pointer value by 0xC0

As the result of this operation, the code will look like this:

Screenshot 25. Code after shifting the current stack pointer shift value

Now, if you start pressing K at each instruction that works with the a1 register, IDA will create stack variables:

Screenshot 26. IDA creates new stack variables

It’s also possible to write an IDA script to automate these actions.

5.3. Calls to IROM

It’s not uncommon to see calls into some low-level API situated in the IROM part of the CPU and not in the firmware. In such a case, the firmware is just linked with a special linker definitions file containing defined IROM function addresses.

During reversing, IROM function calls look like this:

Screenshot 27. IROM function calls

40058E4C is an address within IROM. But it’s impossible to tell from the firmware alone which function is being called. So it’s necessary to inspect the ESP32 toolchain to find the linker definitions.

The IDE for the ESP32 chip is Espressif IDE. And searching for the IROM addresses within the IDE files brings us to: C:\Espressif\frameworks\esp-idf-v4.4.2\components\esptool_py\esptool\flasher_stub\ld\rom_32.ld

Screenshot 28. The ESP32 ROM address table

These values can be easily converted into the enum data type:

Screenshot 29. Converting values into the enum data type
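The conversion itself can be scripted. The sketch below assumes the linker file defines each ROM function on a line of the form PROVIDE ( name = 0xADDRESS ); and emits a C enum that can be imported into IDA. The line format, the function name, and the enum name are assumptions, so adjust the regular expression if your toolchain version formats the file differently.

```python
import re

# Assumed line shape: PROVIDE ( name = 0xADDRESS );
PROVIDE_RE = re.compile(r"PROVIDE\s*\(\s*(\w+)\s*=\s*(0x[0-9A-Fa-f]+)\s*\)")

def linker_script_to_enum(ld_text: str, enum_name: str = "esp32_rom_functions") -> str:
    """Build a C enum declaration from the PROVIDE lines of a linker script."""
    members = [f"    {name} = {address}," for name, address in PROVIDE_RE.findall(ld_text)]
    return "enum " + enum_name + " {\n" + "\n".join(members) + "\n};"
```

Passing the contents of the linker file to linker_script_to_enum() returns the enum text, which can be saved to a header file for import.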

Then, we need to import it into IDA so that the enum can be applied to the IROM address values:

Screenshot 30. Applying enum to the IROM address values

If we add the repeatable comment near the IROM address, it’ll make everything much easier to read:

Screenshot 31. Code after adding the repeatable comment near the IROM address

5.4. Unrecognized instructions

It often happens that a processor module has been implemented for one specific variant of an instruction set. Manufacturers then release new CPUs whose instruction set is 99% compatible but adds ten or more new instructions that nobody initially expected. So tools like IDA, Ghidra, and Radare may not be able to disassemble some of the new instructions.

The proper way to overcome this challenge is to extend the processor module and add support for new instructions. This requires profound knowledge of disassembler APIs, which are not that easy to comprehend.

Let’s discuss a possible workaround for a case when you just want IDA to create the function despite the existence of some unrecognized instruction. Say IDA doesn’t know about the RER instruction and fails to create the function if it contains RER opcodes:

Screenshot 32. IDA fails to create the function in case it contains RER opcodes

You can press P as many times as you like. Nothing will happen except for errors appearing in the console window:

Screenshot 33. Errors in the console window

However, it doesn’t mean that IDA can’t create instructions which follow RER instructions. You can skip three bytes of the RER instruction and create the code afterwards:

Screenshot 34. Creating code after skipping three bytes of the RER instruction
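To know how many bytes to skip for an unrecognized instruction, it helps to remember how Xtensa encodes instruction length. To our understanding of the base ISA with the code density option, the low nibble of the first byte (op0) decides it: values 0 to 7 mean a three-byte instruction, and values 8 to 13 mean a two-byte narrow instruction such as movi.n or retw.n. The helper below is a sketch of that rule with a hypothetical name. It deliberately rejects the remaining values, which some Xtensa configurations use for wider instructions.

```python
def xtensa_instruction_length(first_byte: int) -> int:
    """Return the length in bytes of the instruction that starts with first_byte."""
    op0 = first_byte & 0x0F
    if op0 <= 0x7:
        return 3  # standard 24-bit instruction, e.g. entry (first byte 0x36)
    if op0 <= 0xD:
        return 2  # narrow 16-bit instruction, e.g. retw.n (first byte 0x1D)
    raise ValueError("op0 values 14 and 15 are configuration-specific")
```

An unrecognized instruction whose first byte has a low nibble under 8 is therefore three bytes long, which is consistent with skipping three bytes for RER.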

Next, you can select the whole piece of code from entry till retw.n and press P:

Screenshot 35. Selecting the whole piece of code from entry till retw.n

After that, IDA will create the function:

Screenshot 36. IDA creates a function

Usually, extended instructions that were not recognized by the disassembler don’t make too much difference during reversing. What can cause problems are new instructions that perform actions like a call, a jump, or a load/store, as the code flow is lost and the references to data are missing.

Conclusion

Researching unknown hardware architectures before moving to business logic is essential for projects that involve reverse engineering IoT firmware. Even though it can take reverse engineers a few weeks to learn the architecture, such profound research helps to improve the speed of further work in the long run.

At Apriorit, we have a professional reverse engineering team with rich experience using various reverse engineering tools and techniques. Having expertise in various fields including cybersecurity, cryptography, and embedded software, we can help your business with a reverse engineering project of any complexity.

Enhance your IoT project’s security with professional reverse engineering services from Apriorit’s top engineers. Contact us to start discussing your project!
