How to Identify a Microcontroller Model Using Firmware Analysis

Microcontrollers have limited protection against cybersecurity threats and attacks. As a result, the security of Internet of Things (IoT) devices and embedded systems that rely on them can be compromised.

To improve the security of IoT devices or their firmware, you need to know exactly what microcontrollers are used. Having this knowledge opens new possibilities for deeper software analysis and, therefore, more efficient improvement of your solution’s performance and security. In this article, we show a way to identify a microcontroller model using firmware analysis.

This article will be useful for embedded software developers and reverse engineering specialists looking for efficient ways to automate the process of microcontroller identification.

What a microcontroller is and how to identify one

These days, people use embedded systems and IoT devices every day, and these devices are filled with all sorts of microcontrollers. Microcontrollers, or microcontroller units (MCUs), are small integrated circuits designed to perform a specific task in an embedded system.

Engineers often use MCUs in medical IoT devices, automotive systems, and even the space industry. Whether a device is measuring your heart rate, detecting smoke in the air, or managing the energy consumption of your smart car, there will be a whole set of microcontrollers involved in the process.

A basic microcontroller usually consists of three core components:

Processor
Memory
Input/output (I/O) peripherals

Microcontrollers often have limited memory and bandwidth resources and are dedicated to a specific task. Most microcontrollers are custom-made and don’t have a frontend operating system.

Having limited resources, microcontrollers can become an easy target for cybercriminals. That’s why, to ensure an IoT device’s proper protection, it’s vital to address specific vulnerabilities and weaknesses of the microcontrollers the device relies on.

When you have a custom device you’ve built from scratch, you’ll probably know all the ins and outs of it. However, if your project is poorly documented or you have to work with an IoT device you know nothing about, the first task is to identify the models of microcontrollers it uses.

How to identify a microcontroller model?

Trying to distinguish different binaries from one another can be challenging, especially if the binaries use a lot of similar code or perform a similar function. And when it comes to identifying binaries related to microcontrollers, you can easily run into functions performing similar tasks, which can make things even more obscure than in an executable binary.

However, there’s an efficient way not only to identify the model of a microcontroller but also to automate the process.

First, let’s take a look at the key steps you need to take in order to identify the model of an MCU:

[Figure: how to identify the model of a microcontroller]

To automate this process and be able to quickly and easily identify microcontroller models, you need to:

Automate the generation of C-style pseudo code. You can do this using tools like IDA Pro or Ghidra.
Gather the header files for all the microcontrollers you want to check against, either from your microcontroller’s source code or from a database of headers.

Now let’s see how to identify unknown microcontrollers using this approach.

Identifying the MCU model based on hardware addresses

In this section, we discuss how you can identify the model of a microcontroller by analyzing the hardware addresses of its peripheral devices.

Since firmware talks to peripheral devices through memory-mapped hardware registers, you should be able to see accesses to static hardware addresses in the binary code of the microcontroller, and these addresses should be defined in a C header. Also, because these addresses are hardcoded, they shouldn’t be heavily impacted by address space layout randomization (ASLR) in the binary.

To follow our guide, you’ll need some knowledge of C and Python, as well as the tools used in the steps below: GCC, IDA Pro, and Python’s re module.

We’ll be working with the S32 PPC microcontrollers from NXP, specifically the MPC5 timer demo. However, keep in mind that the approach described below can be applied to any MCUs you come across. The one case where this approach may prove ineffective is when another microcontroller is very similar to your MCU; for instance, when two or more MCU models were made by the same company and have the same peripheral devices at the same addresses.

Now let’s get started.

1. Analyze the microcontroller source code

Take the MPC5 microcontroller demo and look at its source code in IDA Pro. You should see some references to peripheral devices in a struct-like pattern. When you dive deeper, you will see references to the MPC5744P.h header with definitions like these:

#define SRAM0_START 0x40000000UL
#define ADC_0 (*(volatile struct ADC_tag *) 0xFBE00000UL)
#define ADC_1 (*(volatile struct ADC_tag *) 0xFFE04000UL)
#define ADC_2 (*(volatile struct ADC_tag *) 0xFBE08000UL)
#define ADC_3 (*(volatile struct ADC_tag *) 0xFFE0C000UL)

These are the hardcoded references to the microcontroller’s peripheral devices.

Did you notice that they are structs? This means they will look slightly different in memory due to padding and alignment.

These struct addresses will be your key source of information, so it’s worth extracting all of them automatically by parsing the header rather than copying them by hand.
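To make the point about padding and alignment concrete, here is a small sketch using Python’s ctypes module. The DemoPeripheral struct, its field names, and the base address are hypothetical and are not taken from MPC5744P.h. The only idea it illustrates is that each register lives at the peripheral’s base address plus a field offset, and that the offset depends on alignment.

```python
import ctypes

# Hypothetical register block, NOT taken from a real NXP header.
class DemoPeripheral(ctypes.Structure):
    _fields_ = [
        ("CTRL", ctypes.c_uint32),   # offset 0
        ("STATUS", ctypes.c_uint8),  # offset 4, followed by 3 padding bytes
        ("DATA", ctypes.c_uint32),   # must be 4-byte aligned, so offset 8
    ]

BASE_ADDRESS = 0x40001000  # assumed base address, for illustration only

def register_addresses(struct_type, base):
    """Return {field name: absolute address} for a memory-mapped struct."""
    return {
        name: base + getattr(struct_type, name).offset
        for name, _ctype in struct_type._fields_
    }

addresses = register_addresses(DemoPeripheral, BASE_ADDRESS)
for name, address in addresses.items():
    print(f"{name}: {address:#010x}")
```

This is why an address seen in firmware rarely equals the address in the header: the header gives the start of the struct, while the code touches a field somewhere inside it.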

2. Parse and sort information from headers

Finding and copying hardware addresses manually is tedious and time-consuming, so it makes sense to automate this process.

Since struct addresses are preprocessor definitions, you can use GCC to dump them with the following command:

Bash

gcc -E -dM /path/to/file

where /path/to/file is the path to your microcontroller’s header file. If the header includes other headers, add their directory with the -I option.

Executing this command dumps all the #define values known to the preprocessor. You can then filter the output for your target structures with the grep utility.
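If you’d rather stay in Python than pipe the output through grep, the same dump-and-filter step could look roughly like the sketch below. The function names dump_defines and keep_address_defines are our own illustrative choices, and the sample dump is made up apart from the two definitions quoted earlier from the header. The sketch assumes gcc is available on your PATH.

```python
import re
import subprocess

def dump_defines(header_path, include_dirs=()):
    """Run `gcc -E -dM` on a header and return the raw macro dump as text."""
    command = ["gcc", "-E", "-dM"]
    for directory in include_dirs:
        command += ["-I", directory]
    command.append(header_path)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

def keep_address_defines(macro_dump):
    """Keep only the #define lines that contain a hexadecimal literal."""
    hex_literal = re.compile(r"0[xX][0-9a-fA-F]+")
    return [
        line for line in macro_dump.splitlines()
        if line.startswith("#define ") and hex_literal.search(line)
    ]

# Stand-in for the output of dump_defines("/path/to/file").
sample_dump = "\n".join([
    "#define __GNUC__ 12",
    "#define SRAM0_START 0x40000000UL",
    "#define ADC_0 (*(volatile struct ADC_tag *) 0xFBE00000UL)",
])
address_lines = keep_address_defines(sample_dump)
print(address_lines)
```

Filtering on a hexadecimal literal drops most of the compiler’s built-in macros, which are also part of the -dM output.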

Once you have the output from GCC, you can use regular expressions (regex) to filter the definition names and addresses. In our example, we’ll be working with regex in Python, using Python’s built-in re module.

We chose to work with Python tools because Python is easy to read and intuitive to learn. There’s also no need to learn memory management, in contrast to some other languages, so we can accomplish our task simply and conveniently.

Here’s how you can filter data with regex in Python:

Python

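import re

def strip_defs_addrs(self, definitions_list):
    """Extract the macro name and hex address from each #define line."""
    jlist = []

    for definition in definitions_list:
        # Macro name: the identifier right after "#define ".
        variable_name = re.findall(r"(?<=#define )\w+", definition)
        # Address: the first hexadecimal literal on the line.
        address = re.findall(r"0[xX][0-9a-fA-F]+", definition)

        # Skip lines without a name or without a hex address.
        if not variable_name or not address:
            continue

        jlist.append({'definition': variable_name[0], 'address': address[0]})

    return jlist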
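The records returned by strip_defs_addrs still need to be sorted before you can treat neighboring addresses as range boundaries. A minimal sketch of that sorting step is shown below. The helper name sort_by_address is hypothetical, and keeping only the first name seen for a duplicated address is our assumption rather than something the original script is known to do.

```python
def sort_by_address(definitions):
    """Sort {'definition', 'address'} records by numeric address, dropping duplicates."""
    unique = {}
    for item in definitions:
        # Several names may share one address; keep the first name seen (assumption).
        unique.setdefault(int(item["address"], 16), item)
    return [unique[address] for address in sorted(unique)]

parsed = [
    {"definition": "ADC_1", "address": "0xFFE04000"},
    {"definition": "SRAM0_START", "address": "0x40000000"},
    {"definition": "ADC_0", "address": "0xFBE00000"},
]
ordered = sort_by_address(parsed)
for item in ordered:
    print(item["definition"], item["address"])
```

Sorting on the integer value rather than on the string matters: hex strings with different lengths or letter case don’t sort correctly as text.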

Note: If you run into scenarios where a definition has to be evaluated (for example, #define SRAM0_START BASE_ADDR + SOMEADDR), you will need a preprocessor. A preprocessor resolves the #define values before compilation, so you don’t have to search for them by hand through multiple headers.

Here’s how you can do it using Python’s py-c-preprocessor:

Python

from preprocessor import Preprocessor
p = Preprocessor()
p.include('/path/to/your/header.h')
print(p.expand('SRAM0_START'))
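If you can’t install an extra package, simple cases can also be resolved by hand. The sketch below is not py-c-preprocessor and is far less capable: it only handles macros whose body is a sum of hex or decimal literals and other macro names. The resolve function is our own illustration, and the values given to BASE_ADDR and SOMEADDR are assumed for the example.

```python
import re

# One term of a sum: a hex literal, a decimal literal, or a macro name.
TERM = re.compile(r"0[xX][0-9a-fA-F]+|\d+|[A-Za-z_]\w*")

def resolve(name, macros, _seen=()):
    """Evaluate a macro whose body is a '+' sum of integer literals and macro names."""
    if name in _seen:
        raise ValueError(f"circular definition involving {name}")
    total = 0
    for term in macros[name].split("+"):
        match = TERM.search(term)
        if match is None:
            raise ValueError(f"unsupported term {term!r} in {name}")
        token = match.group()
        if token[0].isdigit():
            # The regex stops before C suffixes such as U or UL.
            total += int(token, 16) if token[:2].lower() == "0x" else int(token)
        else:
            # The term is another macro: resolve it recursively.
            total += resolve(token, macros, _seen + (name,))
    return total

# Assumed values, for illustration only.
macros = {
    "BASE_ADDR": "0x40000000UL",
    "SOMEADDR": "0x1000UL",
    "SRAM0_START": "BASE_ADDR + SOMEADDR",
}
print(hex(resolve("SRAM0_START", macros)))
```

Anything beyond plain sums (casts, shifts, conditional definitions) is better left to a real preprocessor.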

Now, let’s go back to our analysis.

3. Dump the binary to C-style pseudo code

If you have your hardware addresses ordered properly, you should see where the address of each new peripheral device begins and thus be able to find the peripheral struct in the binary.

However, trying to parse the binary in a hex editor might be difficult. Instead, you can open the binary and analyze it in IDA Pro.

Once you open the binary in IDA Pro, you can generate a file with C-style pseudo code by selecting File → Produce file → Create C file. The file with the C-style pseudo code will hold the addresses you’ll need for microcontroller model identification.

4. Search for the peripheral addresses in the pseudo code

Once you have the C file, you can begin searching for the peripheral device addresses. If you look at the file with the C-style pseudo code, you’ll see multiple lines with memory accesses such as MEMORY[0xDEADBEEF] = 0. Note that these addresses are slightly off from the ones in the header: the pseudo code accesses individual fields inside each peripheral struct, so an address is the struct’s base address plus the offset of the field being written.

You should be able to simply load the header file, parse and sort the definitions, and then search for them in the file with C-style pseudo code. As in the previous step, you can use regex to parse the C file and use the parse_memory_locations_from_C_file function to filter out the addresses you need:

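import re

def parse_memory_locations_from_C_file(self, c_file):
    """Collect the addresses of MEMORY[...] = <number> writes in the pseudo code."""
    # Capture whatever sits inside MEMORY[...] when a numeric value is assigned.
    regex_addr_definition = r"MEMORY\[(.*)\] = \d"
    findings = []

    with open(c_file, "r") as code_file:
        for line in code_file:
            address = re.findall(regex_addr_definition, line)

            if not address:
                continue

            # Strip stray spaces such as "MEMORY[ 0xDEADBEEF]".
            findings.append(address[0].strip())

    return findings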
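To see what this kind of parsing yields, here is a standalone sketch that works on a string instead of a file. The pseudo code sample is invented for illustration (only the addresses come from our results further below), and the stricter hex-only pattern and the de-duplication are our own choices, not part of the original script.

```python
import re

# Matches direct writes such as `MEMORY[0xFC040010] = 0;` and captures the address.
MEMORY_WRITE = re.compile(r"MEMORY\[\s*(0[xX][0-9a-fA-F]+)\s*\]\s*=")

# Invented sample resembling decompiler output.
sample_pseudo_code = """
void init(void)
{
  MEMORY[0xFC040010] = 0;
  v1 = MEMORY[0xFFFB8030];
  MEMORY[0xFFFC0240] = 513;
  MEMORY[0xFC040010] = 1;
}
"""

def unique_written_addresses(text):
    """Return the distinct MEMORY[...] write targets in order of first appearance."""
    return list(dict.fromkeys(MEMORY_WRITE.findall(text)))

written = unique_written_addresses(sample_pseudo_code)
print(written)
```

Capturing only hex literals means every result can be passed straight to int(value, 16) later, and the read on the second line of the function body is ignored because it isn’t an assignment to MEMORY[...].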

Then you need to pass the sorted addresses you pulled from the header file and the addresses found in the C-style pseudo code file to the perform_search function.

def perform_search(self, sorted_header_definitions, found_psuedo_code_addrs):
    """Count header ranges that contain at least one pseudo-code address."""
    finding_count = 0

    # Each range runs from one definition's address to the next one's,
    # so the last definition has no upper bound and is skipped.
    for idx in range(len(sorted_header_definitions) - 1):
        current = sorted_header_definitions[idx]
        following = sorted_header_definitions[idx + 1]
        int_mem_def = int(current['address'], 16)
        next_int_mem_def = int(following['address'], 16)

        for finding in found_psuedo_code_addrs:
            # A hit lies strictly between two neighboring header addresses.
            if int_mem_def < int(finding, 16) < next_int_mem_def:
                print("Finding in pseudo code: ", finding, "perf",
                      current['definition'], " between ",
                      current['address'], " and ", following['address'])
                self.total_finding_score += 1
                finding_count += 1
                break  # count each header range only once

    return finding_count
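For large headers, the nested loop can be replaced with a binary search over the sorted base addresses. The sketch below uses Python’s bisect module. The locate function is a hypothetical helper, the three named peripherals and their addresses come from our results further below, and the last entry is a placeholder because only its address appears there. Unlike perform_search, this variant also accepts an address that equals a base address exactly.

```python
from bisect import bisect_right

def locate(address, sorted_bases):
    """Return (name, offset) for the peripheral whose range contains `address`.

    `sorted_bases` is a list of (base_address, name) tuples sorted by base address.
    Returns None if the address lies below the first base or in the last,
    open-ended range.
    """
    bases = [base for base, _name in sorted_bases]
    index = bisect_right(bases, address) - 1
    if index < 0 or index == len(sorted_bases) - 1:
        return None
    base, name = sorted_bases[index]
    return name, address - base

peripherals = [
    (0xFC040000, "INTC_0"),
    (0xFC050000, "SWT_0"),
    (0xFC068000, "STM_0"),
    (0xFC07C000, "NEXT_DEFINITION"),  # placeholder name: upper bound only
]

for finding in (0xFC040010, 0xFC050010, 0xFC068018):
    name, offset = locate(finding, peripherals)
    print(f"{finding:#x} -> {name} + {offset:#x}")
```

Besides being faster, this gives you the offset inside the peripheral, which is handy if you later want to match it against a specific struct field.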

Note: Technically, if you know the size of the struct, you can automatically find all structs with the same size at the exact location. However, we won’t describe this process, as it goes beyond the scope of the current guide.

5. Identify the microcontroller model

Once you’ve searched for peripheral addresses in the file with the C-style pseudo code, you can try to identify your microcontroller. To do so, compare the hardware addresses from each candidate header file with the addresses found in the C file. By calculating a weighted score for each candidate microcontroller, you’ll be able to identify the one you’re dealing with.

Here are the results from our example:

Matches found for MPC5744P.h
Finding in pseudo code: 0xFC040010 perf INTC_0 between 0xFC040000 and 0xFC050000
Finding in pseudo code: 0xFC050010 perf SWT_0 between 0xFC050000 and 0xFC068000
Finding in pseudo code: 0xFC068018 perf STM_0 between 0xFC068000 and 0xFC07C000
Finding in pseudo code: 0xFFFB0108 perf PLLDIG between 0xFFFB0100 and 0xFFFB8000
Finding in pseudo code: 0xFFFB8030 perf MC_ME between 0xFFFB8000 and 0xFFFC0000
Finding in pseudo code: 0xFFFC0240 perf SIUL2 between 0xFFFC0000 and 0xFFFD0000
Completed
 
Weighted score: MPC5646C.h 24%
Weighted score: MPC5601D.h 18%
Weighted score: MPC5744P.h 35%  <--------- This is our winner.
Weighted score: MPC5777C.h 24%

The microcontroller with the highest weighted score is the one you’re looking for. In our case, it’s the MPC5744P, which is described by the MPC5744P.h header.
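This guide doesn’t spell out how the weighted score is computed, so the sketch below shows only one plausible way to do it and should be read as an assumption: the number of matched ranges divided by the number of definitions in the header, expressed as a percentage. The rank_headers function, the header names, and the counts are all made up for illustration.

```python
def rank_headers(finding_counts, definition_counts):
    """Rank candidate headers by matched ranges per header definition, in percent."""
    scores = {}
    for header, matches in finding_counts.items():
        total = definition_counts[header]
        # Normalizing by header size keeps large headers from winning by sheer volume.
        scores[header] = 100.0 * matches / total if total else 0.0
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Made-up counts for two hypothetical candidate headers.
finding_counts = {"candidate_a.h": 6, "candidate_b.h": 3}
definition_counts = {"candidate_a.h": 20, "candidate_b.h": 30}

ranking = rank_headers(finding_counts, definition_counts)
for header, score in ranking:
    print(f"Weighted score: {header} {score:.0f}%")
```

Whatever formula you choose, apply the same one to every candidate header so the scores remain comparable.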

To see the full code for the script used in our guide, go to the Apriorit GitHub page.

Conclusion

Microcontrollers are a vital part of today’s IoT devices. Knowing which microcontrollers a particular device contains is necessary to effectively improve the device’s performance and security.

With reverse engineering tools and techniques, developers can automate the process of microcontroller identification using firmware analysis, saving their time and effort for more complex and important tasks.

At Apriorit, we have experts who know how to reverse engineer software and can help you with a project of any complexity. With our experts in embedded and IoT development and reverse engineering, you can ensure the top-notch performance and security of your company’s embedded solutions.
