Guide on IoT Firmware Reverse Engineering [Complete Process]

Internet of Things (IoT) devices are already a significant part of our day-to-day life, work environments, hospitals, government facilities, and vehicle fleets. They are represented by Wi-Fi printers, smart door locks, alarm systems, and so on. In 2020, the average US resident had access to more than ten connected devices. But users who choose IoT devices for their usefulness also need to be sure these devices are secure.

Since IoT devices are usually connected to internal home or corporate networks, compromising such devices can provide criminals with access to the entire system. During the first six months of 2021, there were around 1.5 billion attacks on smart devices, with attackers looking to steal data, mine cryptocurrency, or build botnets.

One way to mitigate security risks of IoT is to perform reverse engineering activities to research the way particular devices are built and perform further analysis of a device and its firmware.

In this article, we show a practical example of reverse engineering firmware for a smart air purifier, highlighting the importance of researching its architecture. This article will be helpful for development teams working on cybersecurity projects who want to learn about the nuances and steps for reverse engineering IoT devices.

The importance of researching the firmware architecture

The process of reverse engineering IoT firmware varies significantly depending on the device under research.

IoT devices evolve quite fast, and the dominant architecture on the market changes all the time. Less than ten years ago, the most popular choices were x86 and ARM, with MIPS and PowerPC less common. But now there is a great variety of microcontroller architectures you need to know to reverse engineer embedded devices: Tricore, rh850, i8051, PowerPC VLE, etc.

Going deep into learning a single architecture isn’t enough to succeed in IoT reverse engineering. And if it’s necessary for developers to start reverse engineering as fast as possible, they should start by learning the basics of the firmware’s architecture and structure.

This is exactly what we want to describe in this article: the way reverse engineers can study new architectures and the format of firmware they have never seen before.

For this article, we used a firmware dump of the Xiaomi Air Purifier 3H. We chose it because the dump comes from an ESP32 CPU, which is based on the Tensilica Xtensa architecture. This is a pretty exotic choice of architecture, but it’s common in IoT devices that require Wi-Fi communication. You can find the firmware we will reverse engineer as an example for this article (ESP-32FW.bin) on this GitHub page.

The challenge for this case is that there’s no existing decompiler for the firmware architecture and disassemblers barely support it. However, this is a pretty accurate example of what reverse engineers face nowadays.

The IoT firmware reverse engineering process consists of the following five stages:

Key stages of the reverse engineering process for target firmware

1. Determine the architecture

The first question to ask before reverse engineering an IoT device is how to identify the architecture its firmware was built for.

The most straightforward way to find out is to read the datasheet for the CPU and learn the answer from there. But there are situations when all you have is the firmware itself. In this case, you can use one of two options:

1. String search may allow you to find some leftover compilation strings that contain information about the compiler name and architecture.

2. Binary pattern search requires you to know instructions that are often used in different types of microcontroller architectures. You can search the firmware for binary patterns common for a specific architecture and then try to load the firmware into a disassembler that supports such an architecture to validate your guess.
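The string search option can be sketched in a few lines of Python. This is a minimal illustration, not the tooling we used: the keyword list in ARCH_HINTS and the function names are assumptions made for the example, and a real search would use a much longer list.

```python
import re

# Hypothetical keyword list; extend it for the toolchains you expect to meet.
ARCH_HINTS = ("xtensa", "esp32", "arm-none-eabi", "mips", "powerpc", "gcc")

def printable_strings(data: bytes, min_len: int = 6):
    """Yield (offset, text) for every run of printable ASCII of at least min_len bytes."""
    for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield m.start(), m.group().decode("ascii")

def architecture_hints(data: bytes):
    """Return the strings that mention a compiler or architecture keyword."""
    return [(off, s) for off, s in printable_strings(data)
            if any(k in s.lower() for k in ARCH_HINTS)]

# Usage (the file name is the one used in this article):
# with open("ESP-32FW.bin", "rb") as f:
#     for off, s in architecture_hints(f.read()):
#         print(hex(off), s)
```

Any hit is only a hint. Loading the firmware into a disassembler for the suspected architecture is still the way to validate the guess.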

Once you determine the architecture type, you can start choosing the toolset for further reversing. For ESP-32FW.bin, we already know that it’s going to be the Tensilica Xtensa architecture, so we need to select the disassembler we’re going to use for the research.

2. Choose the disassembler tool

After researching an appropriate disassembler that could support Xtensa, we ended up with three options: IDA, Ghidra, and Radare.

We decided to try using Ghidra and IDA first because we already have vast experience successfully applying these tools for different reverse engineering projects. And since IDA doesn’t have a decompiler for Xtensa, only a CPU module for the disassembler, we decided to first try working with Ghidra (we used version 10.0).

Ghidra doesn’t support Xtensa by default, so we needed to install the Tensilica Xtensa module for Ghidra first.

The disassembler for Xtensa works, but there are some issues with the decompiler, as you can see in the screenshot below:

*Screenshot 1. Warning regarding unimplemented instructions in the Ghidra decompiler*

After some time disassembling, we realized that Ghidra’s processor module for Xtensa had trouble determining the instruction length in multiple cases. Therefore, we dropped Ghidra and went to IDA (we used version 7.7).

It was challenging at first to find Xtensa in the list of processor modules, but finally we found it here:

*Screenshot 2. Xtensa in IDA’s list of processor modules*

The processor module in IDA appeared to be stable enough, so we decided to stick with IDA.

3. Load the firmware

The first step is to load the firmware at the right image base address so that all pointers to global variables resolve to valid addresses. To do this, it’s necessary to learn where the code is in the binary.

We start by loading the firmware at the base address 0 and try to mark as much code as possible. To be able to properly mark the code in IDA, we need to learn the typical instruction sequences common to Xtensa firmware. To find out which instructions appear in function prologues, we took a sample from GitHub: esp8266/Arduino: ESP8266 core for Arduino.

It appears the compiler uses the following instruction: entry a1, XX

This instruction translates into byte sequences such as 36 41 00 / 36 61 00 / 36 81 00 depending on the value of the XX argument.

By implementing a simple IDA script to search for such a pattern, it’s possible to mark about 90% of the code:

*Screenshot 3. Results of marking the code in IDA*
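The pattern search itself doesn’t depend on IDA. Below is a standalone sketch of the same idea in plain Python, not the IDA script we used. It assumes the little-endian ENTRY encoding, where the first byte is 0x36, the low nibble of the second byte holds the register number, and the remaining 12 bits hold the frame size divided by 8. The function name and the max_frame cutoff are illustrative assumptions.

```python
def find_entry_prologues(data: bytes, base: int = 0, max_frame: int = 0x1000):
    """Return (address, frame_size) for byte patterns that decode as 'entry a1, N'."""
    hits = []
    for off in range(len(data) - 2):
        b0, b1, b2 = data[off], data[off + 1], data[off + 2]
        # 0x36 is the ENTRY opcode byte; the low nibble of b1 selects register a1.
        if b0 == 0x36 and (b1 & 0x0F) == 0x01:
            imm12 = (b1 >> 4) | (b2 << 4)
            frame_size = imm12 * 8
            # Very large frames are more likely data than code, so drop them.
            if 0 < frame_size <= max_frame:
                hits.append((base + off, frame_size))
    return hits
```

A three-byte pattern will also match inside data, so the results are candidates to review rather than confirmed function starts.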

Once we’ve found the code, it’s time to explore and see whether it looks correct.

Looking at the screenshot below, it’s obvious that something is wrong. The string resources are referenced properly, but call8 instructions point to strings, not the code. And some of the call8 instructions point to non-existent addresses. Usually this means that the image base is wrong and the firmware must be loaded to some other base address, not 0.

*Screenshot 4. Discovering that call8 instructions point to strings and non-existent addresses*

A common way to determine the base address is to:

1. Pick a string.
2. Use the low part of this string’s address to find the code that references it.
3. Find the difference between the string’s real address and the address we see in the code. This difference shows how far to shift the image so that the code’s references match the actual location of the string.
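The steps above can be sketched as a small voting routine. This is an illustration under stated assumptions rather than the exact procedure we followed: it assumes pointers are stored as 4-byte aligned little-endian words and that the image base is aligned to 64 KB, so the low 16 bits of a pointer equal the low 16 bits of the string’s file offset. The function name is hypothetical.

```python
import struct
from collections import Counter

def guess_image_base(data: bytes, string_offsets, low_mask: int = 0xFFFF):
    """Vote for image bases implied by words whose low bits match a string's file offset."""
    votes = Counter()
    for pos in range(0, len(data) - 3, 4):            # assumes 4-byte aligned pointers
        (word,) = struct.unpack_from("<I", data, pos)
        for off in string_offsets:
            # A pointer to the string must share its low bits with the file offset.
            if word > off and (word & low_mask) == (off & low_mask):
                votes[word - off] += 1
    return votes.most_common()
```

The more string offsets you pass in, the more clearly the correct base stands out. If the image is segmented, different regions can vote for different bases.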

In this case, we found that the base address must be 0x3F3F0000, but even when using it the call8 instructions are still invalid. This could mean that the binary data is segmented and that the code from the flash memory is being mapped to RAM in pieces. Thus, it will be necessary to split the firmware into pieces and load these pieces into IDA in appropriate segments.

We looked at the strings in the firmware and discovered it was indeed segmented:

*Screenshot 5. Proof of firmware segmentation*

After additional research, we discovered the ESP IDF framework. Since our target firmware contains some version of this framework, we can try to use its source code to learn about the firmware structure.

We found an interesting bootloader_utility_load_partition_table() function in the bootloader_utility.c source code file within ESP IDF, which means the firmware must contain a partition table.

*Screenshot 6. The bootloader_utility_load_partition_table() function that shows the firmware must contain a partition table*

To identify the partition table, we continued exploring the source code and finally found the esp_partition_table_verify() function, which is called by the bootloader_utility_load_partition_table() function:

*Screenshot 7. Discovering the esp_partition_table_verify() function*

So the firmware must contain the ESP_PARTITION_MAGIC and ESP_PARTITION_MAGIC_MD5 values:

#define ESP_PARTITION_MAGIC 0x50AA
#define ESP_PARTITION_MAGIC_MD5 0xEBEB

Binary search for AA 50 gave us good results:

*Screenshot 8. Successful results of binary search for AA 50*
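The same search is easy to reproduce outside the disassembler. The sketch below looks for the little-endian bytes of ESP_PARTITION_MAGIC and keeps only the hits that have the ESP_PARTITION_MAGIC_MD5 bytes close by. The function name and the 64-byte window are assumptions chosen for the example.

```python
MAGIC = b"\xAA\x50"        # ESP_PARTITION_MAGIC (0x50AA) as stored in little-endian order
MAGIC_MD5 = b"\xEB\xEB"    # ESP_PARTITION_MAGIC_MD5 (0xEBEB)

def find_magic_pairs(data: bytes, window: int = 64):
    """Return offsets of MAGIC that have MAGIC_MD5 within `window` bytes on either side."""
    hits = []
    pos = data.find(MAGIC)
    while pos != -1:
        nearby = data[max(0, pos - window):pos + window]
        if MAGIC_MD5 in nearby:
            hits.append(pos)
        pos = data.find(MAGIC, pos + 1)
    return hits
```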

Both ESP_PARTITION_MAGIC and ESP_PARTITION_MAGIC_MD5 can be seen nearby. And most likely sub_3F3F4848 is esp_partition_table_verify().

Since we already know where the esp_partition_table_verify function is, we are able to find the bootloader_utility_load_partition_table function and the ESP_PARTITION_TABLE_OFFSET file offset:

*Screenshot 9. Finding bootloader_utility_load_partition_table and ESP_PARTITION_TABLE_OFFSET*

*Screenshot 10. Finding the value of the offset*

ESP_PARTITION_TABLE_OFFSET is the file offset in the ESP32-FW.bin file. Now we just need to know the structure of the partition table entries. The source code of the ESP IDF framework helps us again:

typedef struct {
    uint32_t offset;
    uint32_t size;
} esp_partition_pos_t;
 
/* Structure which describes the layout of the partition table entry.
 * See docs/partition_tables.rst for more information about individual fields.
 */
typedef struct {
    uint16_t magic;
    uint8_t  type;
    uint8_t  subtype;
    esp_partition_pos_t pos;
    uint8_t  label[16];
    uint32_t flags;
} esp_partition_info_t;

We’ve imported these structures to IDA and applied them to the partition table data:

*Screenshot 11. Importing structures to IDA and applying them to the partition table data*

As you can see, esp_partition_pos_t.offset is the file offset for each partition, and we can now split ESP32-FW.bin into the partitions.
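Splitting the file can be done with a short script built on the two structures above. This is a sketch: the function name is hypothetical, and table_offset has to be the ESP_PARTITION_TABLE_OFFSET value recovered from the bootloader code, which is deliberately left as a parameter here.

```python
import struct

# esp_partition_info_t: magic, type, subtype, pos.offset, pos.size, label[16], flags
ENTRY = struct.Struct("<HBBII16sI")
ESP_PARTITION_MAGIC = 0x50AA

def parse_partition_table(data: bytes, table_offset: int):
    """Read consecutive partition entries until the magic no longer matches."""
    parts = []
    pos = table_offset
    while pos + ENTRY.size <= len(data):
        magic, ptype, subtype, offset, size, label, flags = ENTRY.unpack_from(data, pos)
        if magic != ESP_PARTITION_MAGIC:
            break
        parts.append({
            "label": label.split(b"\x00")[0].decode("ascii", "replace"),
            "type": ptype,
            "subtype": subtype,
            "offset": offset,   # file offset of the partition
            "size": size,
            "flags": flags,
        })
        pos += ENTRY.size
    return parts
```

Each partition can then be cut out of the dump as data[p["offset"]:p["offset"] + p["size"]].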

But how can we load each of the partitions to the appropriate address? It appears there’s an image_load() function that is responsible for mapping the firmware partitions onto address space:

*Screenshot 12. Mapping firmware partitions onto address space*

And each partition has a header:

typedef struct {
    uint8_t magic;              /*!< Magic word ESP_IMAGE_HEADER_MAGIC */
    uint8_t segment_count;      /*!< Count of memory segments */
    uint8_t spi_mode;           /*!< flash read mode (esp_image_spi_mode_t as uint8_t) */
    uint8_t spi_speed: 4;       /*!< flash frequency (esp_image_spi_freq_t as uint8_t) */
    uint8_t spi_size: 4;        /*!< flash chip size (esp_image_flash_size_t as uint8_t) */
    uint32_t entry_addr;        /*!< Entry address */
    uint8_t wp_pin;            /*!< WP pin when SPI pins set via efuse (read by ROM bootloader,
                                * the IDF bootloader uses software to configure the WP
                                * pin and sets this field to 0xEE=disabled) */
    uint8_t spi_pin_drv[3];     /*!< Drive settings for the SPI flash pins (read by ROM bootloader) */
    esp_chip_id_t chip_id;      /*!< Chip identification number */
    uint8_t min_chip_rev;       /*!< Minimum chip revision supported by image */
    uint8_t reserved[8];       /*!< Reserved bytes in additional header space, currently unused */
    uint8_t hash_appended;      /*!< If 1, a SHA256 digest "simple hash" (of the entire image) is appended after the checksum.
                                 * Included in image length. This digest
                                 * is separate to secure boot and only used for detecting corruption.
                                 * For secure boot signed images, the signature
                                 * is appended after this (and the simple hash is included in the signed data). */
} __attribute__((packed))  esp_image_header_t;

Next, each partition is split into segments. After the image header, each segment starts with the following structure, which is followed by the actual segment data:

typedef struct {
    uint32_t load_addr;     /*!< Address of segment */
    uint32_t data_len;      /*!< Length of data */
} esp_image_segment_header_t;

Here, esp_image_segment_header_t.load_addr is the virtual address for the segment data in the CPU address space.

The segments within the partition look like this:

esp_image_header_t

esp_image_segment_header_t

<segment data>

esp_image_segment_header_t

<segment data>

...

Now, having full information about the segments, we can split the partitions into segments and load them to the appropriate addresses in IDA. We can do this extraction work manually or try to automate it via the IDA loader plugin.

Nevertheless, it appears that such a loader is already implemented for Ghidra.
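For readers who want to do the extraction by hand, the layout described above can be walked with a short script. This is a sketch under assumptions: it treats esp_chip_id_t as a 2-byte value, which gives a 24-byte packed header, and it takes ESP_IMAGE_HEADER_MAGIC to be 0xE9 as defined in ESP-IDF. The function name is hypothetical.

```python
import struct

# esp_image_header_t (packed): magic, segment_count, spi_mode, spi_speed/spi_size,
# entry_addr, wp_pin, spi_pin_drv[3], chip_id (assumed 2 bytes), min_chip_rev,
# reserved[8], hash_appended
IMAGE_HEADER = struct.Struct("<BBBBIB3sHB8sB")
SEGMENT_HEADER = struct.Struct("<II")      # esp_image_segment_header_t
ESP_IMAGE_HEADER_MAGIC = 0xE9

def parse_image(partition: bytes):
    """Return (entry_addr, [(load_addr, segment_bytes), ...]) for one app partition."""
    fields = IMAGE_HEADER.unpack_from(partition, 0)
    magic, segment_count, entry_addr = fields[0], fields[1], fields[4]
    if magic != ESP_IMAGE_HEADER_MAGIC:
        raise ValueError("not an ESP image: magic is 0x%02X" % magic)
    segments = []
    pos = IMAGE_HEADER.size
    for _ in range(segment_count):
        load_addr, data_len = SEGMENT_HEADER.unpack_from(partition, pos)
        pos += SEGMENT_HEADER.size
        segments.append((load_addr, partition[pos:pos + data_len]))
        pos += data_len
    return entry_addr, segments
```

Each returned pair gives the address at which to create a segment in the disassembler and the bytes to place there.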

4. Study Xtensa architecture features

Now that we have all the segments loaded to the appropriate addresses, we can start the reverse engineering.

But to do it efficiently, we need to learn more about the Xtensa architecture, including:

Argument order in instructions
Execution specifics of conditional jumps
Compiler calling convention
Stack organization

The first thing to explore is the argument order in instructions. For example: MOV R1, R2. You can find these kinds of instructions in all architectures, yet this may mean either copying R1 to R2 or copying R2 to R1. Thus, it’s crucial to know which operand is the source register and which is the destination register. You can find the Xtensa instruction set description on GitHub.

As for the MOV instruction, in Xtensa, it means that R2 is copied to R1. Thus, the first argument will be the destination in most simple instructions, such as math-related ones. For example, the instruction addi a14, a1, 0x38 would mean that a14 = a1 + 0x38.

But for instructions that store data, it will be the opposite. For example, the instruction s32i.n a5, a1, 0x10 means that the value of a5 must be stored at the address (a1 + 0x10).

The second thing to learn is the way conditional jumps are done. There are two ways to do it:

Use a dedicated instruction for the comparison operation which sets the flags register and then the conditional jump.
Use a single instruction that does all those actions at once.

Xtensa does the latter: beqz a10, loc_400E1C54

A single instruction is used to check if a10 equals zero, and then it either jumps to loc_400E1C54 or doesn’t.

The third step is to examine the calling convention used by the compiler: the way arguments are passed to the function and how the value is returned.

Xtensa passes arguments in quite an unusual way. Arguments are put into registers before the call instruction. But the registers in which they appear within the function are not the same as those they were in before the call:

Argument index	Register before the call	Register after the call
0	a10	a2
1	a11	a3
2	a12	a4
	…	…

Here’s an example of how to pass arguments to a function on the assembler level:

movi.n  a12, 0x14
l32r    a11, off_40080490
mov.n   a10, a1
l32r    a8, memcpy
callx8  a8

Here we have three arguments:

a10 is a destination address
a11 is a source address
a12 is the size to copy

Yet as soon as the code enters the memcpy function, these values are automatically transferred into the a2, a3, and a4 registers respectively.

The same trick is used for returned values. Inside the memcpy function, the value is stored in the a2 register, yet after returning from the function, the value appears in a10.

Here’s what return 0 looks like:

movi.n  a2, 0
retw.n

And this is what checking the returned value looks like:

call8   jsmi_parse_params
bnez.n  a10, loc_400E1B15

bnez.n checks the value of the a10 register upon returning from the call.
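The register renaming follows from the Xtensa register window: a call8 rotates the window by eight registers, so the caller’s a10 becomes the callee’s a2. The helper below is a small sketch of that mapping for use in notes or scripts. The function names are hypothetical, and the window parameter is there because Xtensa also has call4 and call12 variants, which this article doesn’t cover.

```python
def callee_register(caller_reg: int, window: int = 8) -> int:
    """Map a caller-side register number to the register the callee sees it in."""
    if not window <= caller_reg <= 15:
        raise ValueError("register is not visible to the callee")
    return caller_reg - window

def caller_register(callee_reg: int, window: int = 8) -> int:
    """Map a callee-side register number back to the caller's view."""
    if not 0 <= callee_reg <= 15 - window:
        raise ValueError("register is not shared with the caller")
    return callee_reg + window

# For call8, the first three arguments move from a10, a11, a12 to a2, a3, a4,
# and a return value placed in a2 by the callee is read from a10 by the caller.
```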

Finally, it’s necessary to learn how the stack is organized.

Xtensa uses the a1 register to create the stack frame. Each function starts with the entry instruction: entry a1,0xC0, where 0xC0 is the size of the stack frame, i.e. the amount of stack the function requires for the stack variables.

And often, the functions start with initializing stack variables:

movi.n  a5, 0
s32i.n  a5, a1, 0x10
s32i.n  a5, a1, 0x14
s32i.n  a5, a1, 0x18
s32i.n  a5, a1, 0x1C
s32i.n  a5, a1, 0x20
s32i.n  a5, a1, 0x24
s32i.n  a5, a1, 0x28
s32i.n  a5, a1, 0x2C
s32i.n  a5, a1, 0x30
s32i.n  a5, a1, 0x34

Here, the zero value held in the a5 register is written to a series of stack variables, each addressed as an offset from the a1 register.

After gaining all necessary knowledge about the Xtensa architecture, we can finally start reversing its code.

5. Reverse engineer Xtensa code in IDA

Xtensa isn’t the most popular architecture, and tool support for it is less complete than for ARM, MIPS, or PowerPC. Therefore, there are some limitations in the IDA processor module which we need to overcome.

The major limitations of the Xtensa processor module in IDA are:

No automatic comments for function arguments
Stack frame is not created automatically
Some ESP32 functions belong to IROM, so there are calls to hardcoded addresses
Some Xtensa instructions are not disassembled

Let’s discuss some tricks to overcome these challenges.

5.1. Type system and comments for function arguments

A type system for Xtensa is available starting from IDA 7.7. Having an available type system in IDA is very important, as it makes reversing convenient. In particular, it allows you to import the definitions of C structures and specify the function prototypes used by IDA to put automatic comments near the instructions that transfer the function arguments.

However, if you don’t have a type system, there’s a workaround.

First, let’s look at what functions look like when there’s a type system:

*Screenshot 13. The way functions look when there’s an available type system*

The function prototype is set with the names and types of the arguments so that IDA can use this information to comment the arguments at the call site:

But in IDA versions without a type system for Xtensa, there will be no such comments. An alternative is to use the repeatable comments feature in IDA. If you set a repeatable comment at the very beginning of the function, it will be shown at all of the call sites.

*Screenshot 15. Setting a repeatable comment*

*Screenshot 16. The repeatable comment is shown at all of the call sites*

Thus, we can use this feature to define function arguments:

*Screenshot 17. Using the repeatable comments feature to define function arguments*

The call site will look like this:

You may select the register name in the comment and IDA will highlight it in the code. Thus, you can easily find an argument value.

5.2. Recover the stack frame

To recover the stack frame, you’ll need to manually specify the stack size and then show IDA where it’s used by pressing K at each instruction that works with the stack.

Let’s explore the config_router_safe function, for example:

It’s obvious that the stack frame size here is 0xC0. We use this value in the stack settings for the function (Alt+P):

*Screenshot 20. Using the 0xC0 value (the stack frame size)*

Visually, nothing will happen, but if you go to the stack frame for the function by pressing Ctrl+K, you’ll notice that stack space is now allocated:

*Screenshot 21. Stack space is allocated*

The next thing to do is specify the stack shift using the entry instruction. Before doing that, we suggest enabling the stack pointer visualization as shown in the screenshot below:

*Screenshot 22. Enabling the stack pointer visualization*

Now, the code should look like this:

*Screenshot 23. Code after enabling the stack pointer visualization*

000 is the current stack pointer shift value, and we need to shift it by 0xC0. To do that, set the cursor at the entry instruction and press Alt+K to see the following window, where you can specify the desired difference between the old and new stack pointer:

*Screenshot 24. Shifting the current stack pointer value by 0xC0*

As the result of this operation, the code will look like this:

*Screenshot 25. Code after shifting the current stack pointer shift value*

Now, if you start pressing K at each instruction that works with the a1 register, IDA will create stack variables:

*Screenshot 26. IDA creates new stack variables*

It’s also possible to write an IDA script to automate these actions.
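A full automation would go through IDA’s scripting API, which we don’t show here. As a first step, though, the list of stack offsets can be collected from a plain-text listing of the function. The sketch below assumes the listing uses the same operand format as the snippets in this article (for example, s32i.n a5, a1, 0x10). The regular expression and the function name are illustrative.

```python
import re

# Loads and stores such as l32i, s32i.n, l8ui, s16i that use a1 as the base register.
STACK_ACCESS = re.compile(
    r"^\s*[ls](?:8|16|32)[us]?i(?:\.n)?\s+a\d+\s*,\s*a1\s*,\s*(0x[0-9A-Fa-f]+|\d+)\s*$"
)

def stack_offsets(listing_lines):
    """Return the sorted a1-relative offsets touched by load/store instructions."""
    offsets = set()
    for line in listing_lines:
        m = STACK_ACCESS.match(line)
        if m:
            token = m.group(1)
            value = int(token, 16) if token.lower().startswith("0x") else int(token)
            offsets.add(value)
    return sorted(offsets)
```

The resulting offsets show where stack variables are needed and which instructions to visit with the K shortcut.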

5.3. Calls to IROM

It’s not uncommon to see calls into some low-level API situated in the IROM part of the CPU and not in the firmware. In such a case, the firmware is just linked with a special linker definitions file containing defined IROM function addresses.

During reversing, IROM function calls look like this:

40058E4C is the address within IROM. But it’s impossible to know which function is called by looking at the firmware alone. So it’s necessary to inspect the ESP32 toolchain to find the linker definitions.

The IDE for the ESP32 chip is Espressif IDE. And searching for the IROM addresses within the IDE files brings us to: C:\Espressif\frameworks\esp-idf-v4.4.2\components\esptool_py\esptool\flasher_stub\ld\rom_32.ld

*Screenshot 28. The ESP32 ROM address table*

These values can be easily converted into the enum data type:

*Screenshot 29. Converting values into the enum data type*
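The conversion can be scripted. The sketch below assumes the linker file defines each ROM function with a line of the form PROVIDE ( name = 0xADDRESS ); and turns those lines into a C enum declaration. The function name and the default enum name are made up for the example, so check the actual file format before relying on the regular expression.

```python
import re

# Assumed line format: PROVIDE ( symbol_name = 0x40001234 );
PROVIDE = re.compile(r"PROVIDE\s*\(\s*(\w+)\s*=\s*(0x[0-9A-Fa-f]+)\s*\)\s*;")

def ld_to_enum(ld_text: str, enum_name: str = "esp32_rom_funcs") -> str:
    """Build a C enum declaration from PROVIDE(...) lines in a linker script."""
    entries = PROVIDE.findall(ld_text)
    body = "\n".join(f"    {name} = {addr}," for name, addr in entries)
    return f"enum {enum_name} {{\n{body}\n}};"

# Usage:
# with open("rom_32.ld") as f:
#     print(ld_to_enum(f.read()))
```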

Then, we need to import this enum into IDA so that it can be applied to the IROM address values:

*Screenshot 30. Applying enum to the IROM address values*

If we add the repeatable comment near the IROM address, it’ll make everything much easier to read:

*Screenshot 31. Code after adding the repeatable comment near the IROM address*

5.4. Unrecognized instructions

It often happens that a processor module has been implemented for one specific variant of an instruction set. Then manufacturers release new CPUs whose instruction set is 99% compatible but adds ten or more new instructions that nobody anticipated. So tools like IDA, Ghidra, and Radare may not be able to disassemble some new instructions.

The proper way to overcome this challenge is to extend the processor module and add support for new instructions. This requires profound knowledge of disassembler APIs, which are not that easy to comprehend.

Let’s discuss a possible workaround for a case when you just want IDA to create the function despite the existence of some unrecognized instruction. Say IDA doesn’t know about the RER instruction and fails to create the function if it contains RER opcodes:

*Screenshot 32. IDA fails to create the function in case it contains RER opcodes*

You can press P as many times as you like. Nothing will happen except that errors appear in the console window:

*Screenshot 33. Errors in the console window*

However, it doesn’t mean that IDA can’t create instructions which follow RER instructions. You can skip three bytes of the RER instruction and create the code afterwards:

*Screenshot 34. Creating code after skipping three bytes of the RER instruction*
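Knowing how many bytes to skip is the key part of this workaround. Here is a minimal sketch of how instruction lengths can be derived. It assumes a little-endian Xtensa core with the code density option and no wide instruction formats, where the low nibble of the first byte (op0) is 8 or greater for 2-byte narrow instructions and smaller for 3-byte ones. The function names are hypothetical.

```python
def xtensa_insn_length(first_byte: int) -> int:
    """Return the instruction length in bytes, judged from the op0 nibble."""
    op0 = first_byte & 0x0F
    return 2 if op0 >= 8 else 3

def instruction_offsets(code: bytes, start: int = 0, end: int = None):
    """Linear sweep: list the offsets at which instructions start in code[start:end]."""
    end = len(code) if end is None else end
    offsets = []
    pos = start
    while pos < end:
        offsets.append(pos)
        pos += xtensa_insn_length(code[pos])
    return offsets
```

With such a sweep, an unrecognized opcode doesn’t stop you from finding where the next instruction begins.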

Next, you can select the whole piece of code from entry till retw.n and press P:

*Screenshot 35. Selecting the whole piece of code from entry till retw.n*

After that, IDA will create the function:

Usually, extended instructions that were not recognized by the disassembler don’t make too much difference during reversing. What can cause problems are new instructions that perform actions like a call, a jump, or a load/store, as the code flow is lost and the references to data are missing.

Conclusion

Researching unknown hardware architectures before moving to business logic is essential for projects that involve reverse engineering IoT firmware. Even though it can take reverse engineers a few weeks to learn the architecture, such profound research helps to improve the speed of further work in the long run.

At Apriorit, we have a professional reverse engineering team with rich experience using various reverse engineering tools and techniques. Having expertise in various fields including cybersecurity, cryptography, and embedded software, we can help your business with a reverse engineering project of any complexity.
