Microcontrollers have limited protection against cybersecurity threats and attacks. As a result, the security of Internet of Things (IoT) devices and embedded systems that rely on them can be compromised.
To improve the security of IoT devices or their firmware, you need to know exactly what microcontrollers are used. Having this knowledge opens new possibilities for deeper software analysis and, therefore, more efficient improvement of your solution’s performance and security. In this article, we show a way to identify a microcontroller model using firmware analysis.
This article will be useful for embedded software developers and reverse engineering specialists looking for efficient ways to automate the process of microcontroller identification.
People use embedded systems and IoT devices every day, and these devices are filled with all sorts of microcontrollers. Microcontrollers, or microcontroller units (MCUs), are small integrated circuits designed to execute a specific operation in an embedded system.
Engineers often use MCUs in medical IoT devices, automotive systems, and even the space industry. Whether a device is measuring your heart rate, detecting smoke in the air, or managing the energy consumption of your smart car, there will be a whole set of microcontrollers involved in the process.
A basic microcontroller usually consists of three core components:
- A processor (CPU)
- Memory
- Input/output (I/O) peripherals
Microcontrollers often have limited memory and bandwidth resources and are dedicated to a specific task. Most microcontrollers are custom-made and don’t have a frontend operating system.
Having limited resources, microcontrollers can become an easy target for cybercriminals. That’s why, to ensure an IoT device’s proper protection, it’s vital to address specific vulnerabilities and weaknesses of the microcontrollers the device relies on.
When you have a custom device you’ve built from scratch, you’ll probably know all the ins and outs of it. However, if your project is poorly documented or you have to work with an IoT device you know nothing about, the first task is to identify the models of microcontrollers it uses.
How to identify a microcontroller model?
Trying to distinguish different binaries from one another can be challenging, especially if the binaries use a lot of similar code or perform a similar function. And when it comes to identifying binaries related to microcontrollers, you can easily run into functions performing similar tasks, which can make things even more obscure than in an executable binary.
However, there’s an efficient way not only to identify the model of a microcontroller but also to automate the process.
First, let’s take a look at the key steps you need to take in order to identify the model of an MCU:
To automate this process and be able to quickly and easily identify microcontroller models, you need to:
- Automate the generation of the C-style pseudo code. You can do this using tools like IDA Pro or Ghidra.
- Gather the header files for all of the microcontrollers you want to search against, either from your microcontroller’s code or from a header database.
Now let’s see how to identify unknown microcontrollers using this approach.
In this section, we discuss how you can identify the model of a microcontroller by analyzing the hardware addresses of its peripheral devices.
Since peripheral devices use hardware access, you should be able to see access to static hardware addresses in the binary code of the microcontroller, and these addresses should be defined in a C header. Also, because these addresses are hardcoded, they shouldn’t be heavily impacted by the address space layout randomization (ASLR) in the binary.
To follow our guide, you’ll need some knowledge of C, Python, and the following tools:
We’ll be working with the S32 PPC microcontrollers from NXP, specifically the MPC5 timer demo. However, keep in mind that the approach described below can be applied to any MCU you come across. The one case where this approach may give ambiguous results is when another microcontroller is very similar to your MCU; for instance, when two or more MCU models were made by the same company and have the same peripheral devices.
Now let’s get started.
Take the MPC5 microcontroller demo and look at its code in IDA Pro. You should see some references to peripheral devices in a struct-like pattern. When you dive deeper, you will see references to the MPC5744P.h header with such definitions:
These are the hardcoded references to the microcontroller’s peripheral devices.
Did you notice that they are structs? This means they will look slightly different in memory due to padding and alignment.
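To make the effect of padding and alignment concrete, here is a minimal sketch using Python's ctypes module. The DemoPeripheral struct and its fields are hypothetical and are not taken from any NXP header. The sizes it reports follow the ABI of the machine running the script, which may differ from the target MCU's.

```python
import ctypes

class DemoPeripheral(ctypes.Structure):
    # Hypothetical register block: a 1-byte field followed by a 4-byte field.
    _fields_ = [
        ("status", ctypes.c_uint8),
        ("data", ctypes.c_uint32),
    ]

# Size you would expect if the fields were packed back to back.
raw_size = ctypes.sizeof(ctypes.c_uint8) + ctypes.sizeof(ctypes.c_uint32)
# Size the compiler-style layout actually uses, including padding.
padded_size = ctypes.sizeof(DemoPeripheral)
# Where the second field really starts inside the struct.
data_offset = DemoPeripheral.data.offset

print(f"sum of field sizes: {raw_size}")
print(f"actual struct size: {padded_size}")
print(f"offset of 'data':   {data_offset}")
```

The gap between the two sizes is the padding inserted so that the second field starts at an aligned offset. This is why a field's address in the binary is the peripheral base plus an offset that depends on the struct layout.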
These struct addresses will be your key source of information, so you need a way to collect all of them without spending too much time and effort.
Finding and copying hardware addresses manually is tedious and time-consuming, so it makes sense to automate this step with some parsing.
Since struct addresses are definitions, you can use GCC to parse them with the following command:
The argument passed to this command is the path to your microcontroller’s header files.
Executing this command dumps all of the #define preprocessor values. You can then narrow down the output to your target structures with the grep utility.
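A common way to dump preprocessor definitions with GCC is the -dM -E flag pair, and the sketch below assumes that this is the kind of command meant here. The helper names build_dump_command and dump_defines are hypothetical. Depending on your headers, you may also need to pass include paths with -I.

```python
import subprocess

def build_dump_command(header_path, compiler="gcc"):
    """Return the argv list that asks the preprocessor to print every #define."""
    # -E stops after preprocessing; -dM prints the macro definitions
    # instead of the preprocessed source.
    return [compiler, "-dM", "-E", header_path]

def dump_defines(header_path, compiler="gcc"):
    """Run the compiler and return its '#define NAME VALUE' lines."""
    result = subprocess.run(
        build_dump_command(header_path, compiler),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the compiler fails
    )
    return result.stdout.splitlines()

# Example call (requires gcc and the header on disk):
# lines = dump_defines("MPC5744P.h")
print(" ".join(build_dump_command("MPC5744P.h")))
```

Wrapping the command in Python keeps the whole pipeline in one script, so the output lines can be passed straight to the filtering step.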
Once you have the output from GCC, you can use regular expressions (regex) to filter the definition names and addresses. In our example, we’ll be working with regex in Python, using Python’s built-in re module.
We chose to work with Python tools because Python is easy to read and intuitive to learn. There’s also no need to learn memory management, in contrast to some other languages, so we can accomplish our task simply and conveniently.
Here’s how you can filter data with regex in Python:
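As a minimal sketch of this filtering step, the code below assumes that each peripheral is defined as a cast of a fixed address to a volatile struct pointer. The pattern, the parse_peripheral_defines helper, and the sample lines with their addresses are all made up for illustration; adjust the pattern to the layout of your own headers.

```python
import re

# Matches lines such as:
#   #define NAME (*(volatile struct NAME_tag *) 0xADDRESS)
# This layout is an assumption about the header, not a universal rule.
DEFINE_RE = re.compile(
    r"^#define\s+(?P<name>\w+)\s+\(\*\(volatile\s+struct\s+\w+\s*\*\)\s*"
    r"(?P<addr>0[xX][0-9A-Fa-f]+)[uUlL]*\)"
)

def parse_peripheral_defines(lines):
    """Return (address, name) pairs sorted by address."""
    found = []
    for line in lines:
        match = DEFINE_RE.match(line.strip())
        if match:
            found.append((int(match.group("addr"), 16), match.group("name")))
    return sorted(found)

# Hypothetical preprocessor output with made-up names and addresses.
sample = [
    "#define TIMER_B (*(volatile struct TIMER_tag *) 0x40002000UL)",
    '#define __VERSION__ "x"',
    "#define TIMER_A (*(volatile struct TIMER_tag *) 0x40001000UL)",
]
peripherals = parse_peripheral_defines(sample)
for addr, name in peripherals:
    print(f"{addr:#010x} {name}")
```

Sorting by address at this point makes it easy to see where each peripheral's address range begins.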
Note: If you run into scenarios where you need to evaluate definitions, you will need a preprocessor. A preprocessor resolves the #define values before compilation, so you don’t have to search for them by hand through multiple headers.
Here’s how you can do it using Python’s py-c-preprocessor:
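The sketch below is not py-c-preprocessor and does not use its API. It is a small pure-Python stand-in that shows what evaluating a definition means: macro names are expanded recursively, and the resulting integer expression is computed. The resolve_define helper and the sample macros are hypothetical, and only simple integer arithmetic is supported (no casts and no function-like macros).

```python
import ast
import operator
import re

# Integer operators this sketch is willing to evaluate.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.LShift: operator.lshift, ast.BitOr: operator.or_,
}

def _eval_node(node):
    """Evaluate a tiny subset of expressions: integers and binary operators."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("unsupported expression")

def resolve_define(name, defines, max_depth=16):
    """Expand macro references in defines[name] and return its integer value."""
    def substitute(match):
        word = match.group(0)
        # Parenthesize the replacement to preserve operator precedence.
        return f"({defines[word]})" if word in defines else word

    text = defines[name]
    for _ in range(max_depth):  # bounded, so self-referencing macros terminate
        expanded = re.sub(r"\b[A-Za-z_]\w*\b", substitute, text)
        if expanded == text:
            break
        text = expanded
    # Drop C integer suffixes such as UL, which Python cannot parse.
    text = re.sub(r"\b(0[xX][0-9A-Fa-f]+|\d+)[uUlL]+\b", r"\1", text)
    return _eval_node(ast.parse(text.strip(), mode="eval"))

# Hypothetical macros with made-up values.
defines = {"PERIPH_BASE": "0x40000000UL", "TIMER_BASE": "(PERIPH_BASE + 0x1000)"}
timer_base = resolve_define("TIMER_BASE", defines)
print(hex(timer_base))
```

For real headers, a full preprocessor handles many cases that this stand-in rejects with a ValueError or SyntaxError.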
Now, let’s go back to our analysis.
If you have your hardware addresses ordered properly, you should see where the address of each new peripheral device begins and thus be able to find the peripheral struct in the binary.
However, trying to parse the binary in a hex editor might be difficult. Instead, you can open the binary and analyze it in IDA Pro.
Once you open the binary in IDA Pro, you can generate a file with C-style pseudo code by clicking File → Produce file → Create C file. The file with the C-style pseudo code will hold the addresses you’ll need for microcontroller model identification.
Once you have the C file, you can begin searching for the peripheral device addresses. If you look at the file with the C-style pseudo code, you’ll see multiple lines that reference hardcoded addresses. Note that these addresses may be slightly off from the ones in the header because of the size of the information being stored.
You should be able to simply load the header file, parse and sort the definitions, and then search for them in the file with C-style pseudo code. As in the previous step, you can use regex to parse the C file and use the parse_memory_locations_from_C_file function to filter out the addresses you need:
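Here is one way such a function could look. Only the name parse_memory_locations_from_C_file comes from the text; the signature, the regex, and the sample pseudo code are assumptions. This sketch takes the file contents as a string and treats any hex literal with five to eight digits as a candidate address.

```python
import re
from collections import Counter

# Hex literals with 5 to 8 digits, optionally followed by a C suffix like 'u'.
# The digit threshold is an assumption meant to skip small constants.
HEX_LITERAL_RE = re.compile(r"\b0[xX]([0-9A-Fa-f]{5,8})[uUlL]*\b")

def parse_memory_locations_from_C_file(c_text):
    """Count how often each address-like hex literal appears in the pseudo code."""
    return Counter(int(m.group(1), 16) for m in HEX_LITERAL_RE.finditer(c_text))

# Hypothetical decompiler output with made-up addresses.
sample_c = """
void timer_init(void) {
  MEMORY[0x40001000] = 1;
  MEMORY[0x40001004] |= 0x10u;
  v1 = MEMORY[0x40001004];
}
"""
locations = parse_memory_locations_from_C_file(sample_c)
for address, count in sorted(locations.items()):
    print(f"{address:#010x} seen {count} time(s)")
```

To run it on a real export, read the file first, for example with pathlib.Path("firmware.c").read_text(), where firmware.c is a placeholder name.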
Then you need to pass the sorted addresses you pulled from the header file and the addresses found in the C-style pseudo code file to the perform_search function.
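The sketch below shows one possible shape for this search. Only the name perform_search comes from the text; the signature, the return value, and the max_span parameter are assumptions. Because addresses in the pseudo code may be offset from the header values, each found address is attributed to the closest peripheral base at or below it, as long as it lies within max_span bytes of that base.

```python
from bisect import bisect_right

def perform_search(sorted_peripherals, found_addresses, max_span=0x4000):
    """Return {peripheral_name: reference_count} for addresses near a known base.

    sorted_peripherals: list of (base_address, name), sorted by base_address.
    found_addresses: mapping of address -> number of occurrences.
    max_span: hypothetical upper bound on a peripheral struct's size.
    """
    bases = [base for base, _ in sorted_peripherals]
    hits = {}
    for address, count in found_addresses.items():
        index = bisect_right(bases, address) - 1
        if index < 0:
            continue  # address lies below the first known peripheral
        base, name = sorted_peripherals[index]
        if address - base < max_span:
            hits[name] = hits.get(name, 0) + count
    return hits

# Made-up data for illustration.
peripherals = [(0x40001000, "TIMER_A"), (0x40002000, "TIMER_B")]
found = {0x40001000: 1, 0x40001004: 2, 0x20000000: 5, 0x50000000: 1}
hits = perform_search(peripherals, found)
print(hits)
```

Using a binary search over the sorted bases keeps the lookup fast even when a header defines hundreds of peripherals.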
Note: Technically, if you know the size of the struct, you can automatically find all structs with the same size at the exact location. However, we won’t describe this process, as it goes beyond the scope of the current guide.
Once you’ve searched for peripheral addresses in the file with the C-style pseudo code, you can try to identify your microcontroller. To do so, compare the hardware addresses from each candidate header file to the addresses found in the C file. By calculating a weighted score for each candidate microcontroller, you’ll be able to identify the one you’re dealing with.
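The text does not spell out the weighting formula, so the following is only one plausible scheme and not necessarily the one used in the original script. It multiplies the share of a header's peripherals that were found by the share of address references that the header explains. The function names and sample data are hypothetical.

```python
def weighted_score(hits, peripheral_count, total_references):
    """Score one candidate header between 0.0 and 1.0.

    hits: {peripheral_name: reference_count} for this header.
    peripheral_count: number of peripherals the header defines.
    total_references: number of address references found in the pseudo code.
    """
    if peripheral_count == 0 or total_references == 0:
        return 0.0
    coverage = len(hits) / peripheral_count            # peripherals seen
    explained = sum(hits.values()) / total_references  # references matched
    return coverage * explained

def rank_candidates(candidates, total_references):
    """candidates: {header_name: (hits, peripheral_count)}; best match first."""
    scores = {
        header: weighted_score(hits, count, total_references)
        for header, (hits, count) in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up candidates for illustration.
candidates = {
    "chip_a.h": ({"TIMER_A": 3, "ADC_0": 1}, 4),
    "chip_b.h": ({"TIMER_A": 1}, 4),
}
ranking = rank_candidates(candidates, total_references=4)
for header, score in ranking:
    print(f"{header}: {score:.4f}")
```

A product of the two ratios penalizes headers that match only a few addresses by coincidence, but other weightings may suit your data better.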
Here are the results from our example:
The microcontroller with the highest weighted score is the one you’re looking for. In our case, it’s the MPC5744P, matched by the MPC5744P.h header.
To see the full code for the script used in our guide, go to the Apriorit GitHub page.
Microcontrollers are a vital part of today’s IoT devices. Knowing which microcontrollers a particular device contains is necessary to effectively improve the device’s performance and security.
With reverse engineering tools and techniques, developers can automate the process of microcontroller identification using firmware analysis, saving their time and effort for more complex and important tasks.