
Extending IDA’s Capabilities with Python: A Practical Example of Disassembling an Xtensa Instruction

Reverse engineering rare architectures can bring multiple benefits to a project, but it’s an extremely challenging task. Even if a development team has the necessary skills for reversing, they may lack the tools to do it. On one of our projects, we needed to work with Xtensa instructions that aren’t supported by available reverse engineering tools yet.

In this article, we explore how to extend IDA’s capabilities to overcome this challenge and share how to implement an IDA plugin for disassembling an Xtensa instruction. This guide will be helpful for development leaders who face similar challenges on their projects and want to improve their knowledge of reversing tools.

Why create custom IDA plugins?

In one of our previous articles, we discussed the importance of researching the firmware architecture. In that instance, we managed to find a disassembler tool that supported the architecture of the device we needed to analyze. Now, we want to tackle a different issue: a reverse engineering tool that doesn’t support the architecture you’re working with on your project. In this article, we use IDA as an example.

Interactive Disassembler (IDA) is a software disassembler that generates assembly language source code from machine-executable code and performs automatic code analysis. This reverse engineering tool offers extensive disassembling and debugging functionality, which you can extend with numerous plugins.

IDA also uses a set of plugins called processor modules to convert raw byte code into disassembled text. Each processor module is designed for a specific hardware architecture.

Since the availability of disassembling plugins largely depends on an architecture’s popularity, some processor modules are updated more often than others, and IDA developers may take a while to add support for new instructions.

So what can you do if there’s no plugin for the architecture you’re currently working with?

The good news is that you don’t have to wait for IDA developers to implement support for a specific instruction set. You can write a custom plugin yourself, either in C++ using the IDA SDK or in Python using IDAPython.

Let’s explore an example of implementing an IDA plugin that adds support for new Xtensa architecture instructions, which IDA 7.7 doesn’t support at the time of writing. Since IDA can’t disassemble these instructions, it shows them as raw bytes:

Screenshot 1. Xtensa instructions are shown as raw bytes in IDA

But if you use other software that can disassemble the new Xtensa instructions, such as the Lauterbach Trace32 simulator, you can see that these bytes represent the Quotient Unsigned (QUOU) instruction:

Screenshot 2. Xtensa instructions look like the QUOU instruction in the Lauterbach Trace32 simulator

Once you know what these bytes are, you can find the description of the QUOU instruction and implement a plugin for IDA to extend the capabilities of the existing processor module. Let’s explore how you can do it.


Adding instructions to the IDA processor module with a new plugin

As a template, let’s use the NECromancer plugin, which extends IDA’s processor module for the NEC V850 CPU.

The idea behind this plugin is to hook the event handlers of the processor module so that your own processing routine runs before the existing one. This allows the processor module to work with unknown instructions that it fails to process by default: if your routine recognizes the bytes, it handles them; otherwise, the original processor module does.
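To make the hooking idea concrete, here is a simplified, IDA-free model of the dispatch order. It assumes, as IDP_Hooks behaves, that a hook sees each event before the processor module and that a falsy return lets the module handle the event. The names dispatch, quou_hook, and legacy_module are hypothetical; this is a sketch of the concept, not IDA’s implementation.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in: a "handler" takes raw instruction bytes and returns
# a mnemonic if it recognizes them, or None otherwise.
Handler = Callable[[bytes], Optional[str]]


def dispatch(hooks: List[Handler], module: Handler, raw: bytes) -> Optional[str]:
    """Simplified model of the event order: hooks first, then the module."""
    for hook in hooks:
        result = hook(raw)
        if result:  # a truthy result means "handled", so stop here
            return result
    # No hook claimed the event, so the original processor module runs.
    return module(raw)


def quou_hook(raw: bytes) -> Optional[str]:
    # Recognize only the QUOU pattern: third byte 0xC2, low nibble of byte 0 is 0.
    if len(raw) >= 3 and raw[2] == 0xC2 and (raw[0] & 0xF) == 0:
        return "quou"
    return None


def legacy_module(raw: bytes) -> Optional[str]:
    # Pretend the stock processor module can't decode any of these bytes.
    return None


print(dispatch([quou_hook], legacy_module, bytes([0xC0, 0x22, 0xC2])))  # quou
print(dispatch([quou_hook], legacy_module, bytes([0x00, 0x00, 0x00])))  # None
```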

Let’s take a look at an empty plugin. Here’s the minimum code you need to register the plugin in the IDA engine and hook the processor module:

Python
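from ida_idaapi import plugin_t, PLUGIN_PROC, PLUGIN_HIDE, PLUGIN_SKIP, PLUGIN_KEEP
from ida_idp import ph_get_id, PLFM_XTENSA

class XtensaESP(plugin_t):
    # PLUGIN_PROC: processor module extension; PLUGIN_HIDE: not shown in the Plugins menu
    flags = PLUGIN_PROC | PLUGIN_HIDE
    comment = ""
    wanted_hotkey = ""
    help = "Adds support for additional Xtensa instructions"
    wanted_name = "XtensaESP"

    def __init__(self):
        self.prochook = None

    def init(self):
        # Only activate for the Xtensa processor module
        if ph_get_id() != PLFM_XTENSA:
            return PLUGIN_SKIP

        # xtensa_idp_hook_t is the hook class shown below
        self.prochook = xtensa_idp_hook_t()
        self.prochook.hook()
        print("%s initialized." % XtensaESP.wanted_name)
        return PLUGIN_KEEP

    def run(self, arg):
        pass

    def term(self):
        # Remove the hook when the plugin is unloaded
        if self.prochook:
            self.prochook.unhook()

#--------------------------------------------------------------------------
def PLUGIN_ENTRY():
    return XtensaESP()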

To make sure IDA will run the plugin only when the Xtensa CPU processor module is loaded, the plugin performs the following check:

Python
if ph_get_id() != PLFM_XTENSA:
    return PLUGIN_SKIP

The plugin also needs the xtensa_idp_hook_t class, derived from IDP_Hooks, to install the handler for processor module events. Here’s what the hook class body looks like:

Python
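from ida_idp import IDP_Hooks

class xtensa_idp_hook_t(IDP_Hooks):
    def __init__(self):
        IDP_Hooks.__init__(self)

    # Returning False means "not handled", so the original processor module processes the event
    def ev_ana_insn(self, insn):
        return False

    def ev_out_mnem(self, outctx):
        return False

    def ev_out_operand(self, outctx, op):
        return False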

The key elements of this code snippet are:

  • The ev_ana_insn method, which analyzes the raw bytes and fills in the instruction structure (insn_t)
  • The ev_out_mnem method, which creates the visual representation of the instruction, i.e. generates the disassembler text for the mnemonic
  • The ev_out_operand method, which generates the text of the instruction operands

Let’s implement all three methods one by one.


1. Implementing the ev_ana_insn method

The goal of our plugin is to add support for the QUOU (Quotient Unsigned) instruction. This means you need to know how the CPU actually decodes the bytes that represent the QUOU instruction.

You can find this information in the Xtensa Instruction Set Architecture (ISA) Reference Manual [PDF]:

  • Instruction word: 1100 0010 r s t 0000 (bits 23..0, each field 4 bits wide)
  • Required configuration option: 32-bit Integer Divide Option
  • Assembler syntax: QUOU ar, as, at
  • Description: QUOU performs a 32-bit unsigned division of the content of address register as by the content of address register at and writes the quotient to address register ar. If the content of address register at is 0, QUOU raises an Integer Divide by Zero exception instead of writing a result.

In this particular case, you don’t need to know what the instruction does in detail. The goal is to understand how the CPU knows that a set of bytes is actually the QUOU instruction.

IDA showed this QUOU instruction as a sequence of bytes: 0xC0, 0x22, 0xC2. The documentation represents the first byte of the instruction, 0xC0, as follows:

[Figure: first byte, bits 7..4 = t, bits 3..0 = 0000]

Let’s explain what this means:

  1. The top four bits, marked as t, have the value 0xC.
  2. The lower four bits are always equal to 0.
  3. The value of t is the index of the register used as the third argument of the instruction.
  4. 0xC = 12, which means that the third argument is a12.

The second byte of the instruction specifies two more arguments marked as r and s:

[Figure: second byte, bits 7..4 = r, bits 3..0 = s]

In our case, the second byte is 0x22, which means r = 0x2 and s = 0x2. Thus, both the first and second operands are a2.

Finally, the third byte is 0xC2. According to the documentation, it will always be a constant:

[Figure: third byte, bits 7..0 = 1100 0010]

Since 1100 0010 = 0xC2, you can use this byte for QUOU instruction identification.
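The three byte layouts above are an instance of the generic 3-byte RRR format from the Xtensa ISA manual. In that format, the little-endian 24-bit word splits into six 4-bit fields, named op0, t, s, r, op1, and op2 from the least significant nibble upward. The following minimal sketch extracts those fields in plain Python, with no IDA required, so you can check the decoding by hand. The helper names decode_rrr and is_quou are ours, not part of any API.

```python
from typing import NamedTuple


class RRR(NamedTuple):
    op0: int
    t: int
    s: int
    r: int
    op1: int
    op2: int


def decode_rrr(buf: bytes) -> RRR:
    """Split a 3-byte little-endian Xtensa RRR instruction into its nibbles."""
    if len(buf) < 3:
        raise ValueError("RRR instructions are 3 bytes long")
    b0, b1, b2 = buf[0], buf[1], buf[2]
    return RRR(
        op0=b0 & 0xF,  # bits 3..0
        t=b0 >> 4,     # bits 7..4
        s=b1 & 0xF,    # bits 11..8
        r=b1 >> 4,     # bits 15..12
        op1=b2 & 0xF,  # bits 19..16
        op2=b2 >> 4,   # bits 23..20
    )


def is_quou(f: RRR) -> bool:
    # QUOU: op0 = 0000, op1 = 0010, op2 = 1100 (third byte 0xC2)
    return f.op0 == 0x0 and f.op1 == 0x2 and f.op2 == 0xC


fields = decode_rrr(bytes([0xC0, 0x22, 0xC2]))
print(fields)  # RRR(op0=0, t=12, s=2, r=2, op1=2, op2=12)
print(is_quou(fields), f"quou a{fields.r}, a{fields.s}, a{fields.t}")  # True quou a2, a2, a12
```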

Now, everything is ready to start implementing the ev_ana_insn method.

First, create the new instruction ID, which will allow the IDA engine to distinguish the new instructions from the existing instructions:

Python
class NewInstructions:
    (NN_quou,
    NN_last) = range(CUSTOM_INSN_ITYPE, CUSTOM_INSN_ITYPE+2)

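Note that the IDs start from the CUSTOM_INSN_ITYPE value, which IDA reserves for instruction types added by processor extension plugins, so they don’t collide with the IDs the processor module already uses.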
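Tuple-unpacking a range() gives each new instruction a consecutive ID, and the trailing NN_last entry works as a sentinel. The small sketch below shows that the sentinel gives a cheap check for "is this one of my instructions?". It uses a placeholder constant in place of ida_idp.CUSTOM_INSN_ITYPE so that it runs outside IDA, and is_custom is a hypothetical helper.

```python
CUSTOM_INSN_ITYPE = 0x8000  # placeholder; in IDA, import it from ida_idp


class NewInstructions:
    # Consecutive IDs starting at CUSTOM_INSN_ITYPE; NN_last is only a sentinel
    (NN_quou,
     NN_last) = range(CUSTOM_INSN_ITYPE, CUSTOM_INSN_ITYPE + 2)

    lst = {NN_quou: "quou"}


def is_custom(itype: int) -> bool:
    """True only for IDs allocated above (the sentinel itself is excluded)."""
    return CUSTOM_INSN_ITYPE <= itype < NewInstructions.NN_last


print(is_custom(NewInstructions.NN_quou), is_custom(NewInstructions.NN_last))  # True False
```

To add another instruction, insert its name before NN_last and increase the range length by one.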

Next, implement the instruction analysis method, which looks like this:

Python
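from ida_bytes import get_bytes

def ev_ana_insn(self, insn):
    buf = get_bytes(insn.ea, 3)
    if buf is None or len(buf) < 3:
        return False  # not enough readable bytes for a 3-byte instruction
    # QUOU: the third byte is 0xC2 and the low nibble of the first byte is 0
    if buf[2] == 0xC2 and (buf[0] & 0xF) == 0:
        insn.itype = NewInstructions.NN_quou
        insn.size = 3
        return True

    return False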

Let’s explain what this code does:

  • The get_bytes() function reads the raw bytes from the binary at the address where IDA expects the next instruction to be.
  • Then, the code checks whether these bytes actually look like the QUOU instruction.

In this case, we check that the third byte is 0xC2 and that the lower 4 bits of the first byte are 0, as defined in the documentation. Finally, you need to fill in the insn argument of the ev_ana_insn method with information about the QUOU instruction: at minimum, the instruction ID and the instruction size in bytes.

Then, the ev_ana_insn method must return True if it was able to find the instruction at the suggested address; otherwise, it must return False.
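Because ev_ana_insn only reads bytes and fills in fields of insn, you can exercise the same decision logic outside IDA. The sketch below replaces IDA’s insn_t and get_bytes() with hypothetical stand-ins (FakeInsn and a dictionary-backed fake_get_bytes) and uses a placeholder value for CUSTOM_INSN_ITYPE. It is a testing aid under those assumptions. Inside IDA, you would use the real objects.

```python
from dataclasses import dataclass
from typing import Optional

CUSTOM_INSN_ITYPE = 0x8000  # placeholder; in IDA, import it from ida_idp
NN_QUOU = CUSTOM_INSN_ITYPE

# Hypothetical memory image: address -> byte value
MEMORY = {0x1000: 0xC0, 0x1001: 0x22, 0x1002: 0xC2}


def fake_get_bytes(ea: int, size: int) -> Optional[bytes]:
    """Mimic a failed read: return None if any requested byte is missing."""
    try:
        return bytes(MEMORY[ea + i] for i in range(size))
    except KeyError:
        return None


@dataclass
class FakeInsn:
    ea: int
    itype: int = 0
    size: int = 0


def ana_quou(insn: FakeInsn) -> bool:
    """Same check as ev_ana_insn above, written against the stand-ins."""
    buf = fake_get_bytes(insn.ea, 3)
    if buf is None:
        return False
    if buf[2] == 0xC2 and (buf[0] & 0xF) == 0:
        insn.itype = NN_QUOU
        insn.size = 3
        return True
    return False


insn = FakeInsn(ea=0x1000)
print(ana_quou(insn), hex(insn.itype), insn.size)  # True 0x8000 3
print(ana_quou(FakeInsn(ea=0x1001)))               # False (only 2 bytes left)
```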

This minimal version is already enough for IDA to recognize the new instruction. However, IDA would display the instruction as if it had no arguments. To make IDA aware of the instruction arguments too, extend the ev_ana_insn() method:

Python
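from ida_bytes import get_bytes
from ida_ua import o_reg

def ev_ana_insn(self, insn):
    buf = get_bytes(insn.ea, 3)
    if buf is None or len(buf) < 3:
        return False
    if buf[2] == 0xC2 and (buf[0] & 0xF) == 0:
        insn.itype = NewInstructions.NN_quou
        insn.size = 3

        # ar: the r field, high nibble of the second byte
        insn.Op1.type = o_reg
        insn.Op1.reg = buf[1] >> 4
        # as: the s field, low nibble of the second byte
        insn.Op2.type = o_reg
        insn.Op2.reg = buf[1] & 0xF
        # at: the t field, high nibble of the first byte
        insn.Op3.type = o_reg
        insn.Op3.reg = buf[0] >> 4

        return True

    return False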

This new code defines the instruction arguments as register operands (o_reg). Their values are the r, s, and t fields extracted from the instruction bytes: r becomes the first operand (ar), s the second (as), and t the third (at).
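Going the other way is handy for building test input. For example, you might want to write known QUOU encodings into a scratch database with ida_bytes.patch_bytes() and check the plugin’s output. The hypothetical helper below packs the r, s, and t fields back into the three bytes, using the same field positions that ev_ana_insn reads.

```python
def encode_quou(r: int, s: int, t: int) -> bytes:
    """Pack 'QUOU ar, as, at' into its 3-byte little-endian encoding."""
    for name, value in (("r", r), ("s", s), ("t", t)):
        if not 0 <= value <= 15:
            raise ValueError(f"{name} must be a register index 0..15, got {value}")
    byte0 = (t << 4) | 0x0  # t in the high nibble, low nibble always 0
    byte1 = (r << 4) | s    # r in the high nibble, s in the low nibble
    byte2 = 0xC2            # constant third byte, 1100 0010
    return bytes([byte0, byte1, byte2])


print(encode_quou(2, 2, 12).hex())  # c022c2
```

Feeding the result back through the operand extraction in ev_ana_insn (buf[1] >> 4, buf[1] & 0xF, buf[0] >> 4) should return the original r, s, and t.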

Once the parsing is done, it’s time to set up the output of the disassembled text.


2. Implementing the ev_out_mnem method

To generate the disassembler text, you can completely reuse the existing code of the NECromancer plugin for the ev_out_mnem method:

Python
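from ida_idp import CUSTOM_INSN_ITYPE
from ida_lines import COLOR_INSN, COLOR_MACRO

DEBUG_PLUGIN = True

# Highlight new instructions with the macro color while debugging
NEWINSN_COLOR = COLOR_MACRO if DEBUG_PLUGIN else COLOR_INSN

class NewInstructions:
    (NN_quou,
     NN_last) = range(CUSTOM_INSN_ITYPE, CUSTOM_INSN_ITYPE+2)

    # Maps instruction IDs to their mnemonics
    lst = {NN_quou: "quou"}

# Method of xtensa_idp_hook_t
def ev_out_mnem(self, outctx):
    insntype = outctx.insn.itype

    if (insntype >= CUSTOM_INSN_ITYPE) and (insntype in NewInstructions.lst):
        mnem = NewInstructions.lst[insntype]
        # Print the mnemonic wrapped in color tags
        outctx.out_tagon(NEWINSN_COLOR)
        outctx.out_line(mnem)
        outctx.out_tagoff(NEWINSN_COLOR)

        # TODO: how can MNEM_width be determined programmatically?
        MNEM_WIDTH = 8
        # Pad to MNEM_WIDTH characters (at least one space) so the operands line up
        width = max(1, MNEM_WIDTH - len(mnem))
        outctx.out_line(' ' * width)

        return True
    return False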

Let’s explain the main points of this example:

  • ev_out_mnem takes the instruction name from the NewInstructions.lst dictionary, wraps it in color tags with outctx.out_tagon() and outctx.out_tagoff(), and calls outctx.out_line() to display the text in the IDA disassembly window.
  • The DEBUG_PLUGIN flag changes the color of the text. When it’s set to True, the new instructions use the macro color (COLOR_MACRO) instead of the default instruction color (COLOR_INSN), so they stand out. This is quite convenient when debugging the plugin.
  • After the mnemonic, ev_out_mnem outputs enough spaces to pad it to MNEM_WIDTH (eight) characters, with at least one space. This way, the operands that follow are aligned with the rest of the disassembly.
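The padding rule is easier to see in isolation. This plain-Python sketch reproduces only the string arithmetic from ev_out_mnem, without outctx, so you can see how many spaces each mnemonic gets. The function name pad_mnemonic is hypothetical.

```python
MNEM_WIDTH = 8  # same fixed width as in ev_out_mnem above


def pad_mnemonic(mnem: str, width: int = MNEM_WIDTH) -> str:
    """Return the mnemonic followed by the padding ev_out_mnem emits."""
    # At least one space, so a long mnemonic never touches its operands
    return mnem + " " * max(1, width - len(mnem))


for m in ("quou", "muluh", "verylongmnem"):
    print(repr(pad_mnemonic(m)))
# 'quou    '
# 'muluh   '
# 'verylongmnem '
```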

3. Implementing the ev_out_operand method

To generate the instruction operands, you can also adapt the ev_out_operand method from NECromancer:

Python
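from ida_idp import ph_get_regnames
from ida_ua import o_reg

# Method of xtensa_idp_hook_t
def ev_out_operand(self, outctx, op):
    insn = outctx.insn
    # Handle only register operands of the instructions added by this plugin
    if insn.itype in NewInstructions.lst and op.type == o_reg:
        # Assumes op.reg indexes the processor module's register name list
        outctx.out_register(ph_get_regnames()[op.reg])
        return True
    # Leave everything else to the original processor module
    return False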

This code checks whether the instruction is one of those you added earlier. If it is, the code gets the name of the register that the operand refers to and prints it. Otherwise, it returns False so that the processor module handles the operand as usual.

Now, all preparations are finished, and you can move to testing the plugin you’ve created.


Testing the plugin

Place the plugin in the \IDA\plugins directory so IDA can load it. Then, open the IDA database that contains the undefined bytes, place the cursor on them, and press C to create the instruction:

Screenshot 3. Creating the new QUOU instruction in IDA

Once you’ve added support for the QUOU instruction, IDA will automatically recognize this instruction whenever it encounters it. The color of the instruction here is different from the rest of the instructions because we enabled the DEBUG_PLUGIN flag for this example. If you decide to disable the flag, the color of the instruction will be the same as the rest of the code.

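Below is the full source code of our example plugin, based on NECromancer. In addition to QUOU, it also adds the MULUH (Multiply Unsigned High) instruction, whose third byte is 0xA2: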

Python
from ida_lines import COLOR_INSN, COLOR_MACRO 
from ida_idp import CUSTOM_INSN_ITYPE, IDP_Hooks, ph_get_regnames, ph_get_id, PLFM_XTENSA
from ida_bytes import get_bytes
from ida_idaapi import plugin_t, PLUGIN_PROC, PLUGIN_HIDE, PLUGIN_SKIP, PLUGIN_KEEP
from ida_ua import o_displ, o_reg, o_imm, dt_dword, OOF_ADDR
from struct import unpack

DEBUG_PLUGIN = True

NEWINSN_COLOR = COLOR_MACRO if DEBUG_PLUGIN else COLOR_INSN

class NewInstructions:
    (NN_quou,
    NN_muluh) = range(CUSTOM_INSN_ITYPE, CUSTOM_INSN_ITYPE+2)
    
    lst = {NN_quou:"quou",
           NN_muluh:"muluh"}
#--------------------------------------------------------------------------
class xtensa_idp_hook_t(IDP_Hooks):
    def __init__(self):
        IDP_Hooks.__init__(self)

    def decode_instruction(self, insn):
        buf = get_bytes(insn.ea, 3)
        if buf is None or len(buf) < 3:
            return False  # not enough readable bytes for a 3-byte instruction
        #print("%08X bytes %X %X %X" % (insn.ea , buf[2] , buf[1] , buf[0]))
        if buf[2] == 0xC2 and (buf[0] & 0xF) == 0:
            insn.itype = NewInstructions.NN_quou
            insn.size = 3
            insn.Op1.type = o_reg
            insn.Op1.reg = buf[1] >> 4
            insn.Op2.type = o_reg
            insn.Op2.reg = buf[1] & 0xF 
            insn.Op3.type = o_reg
            insn.Op3.reg = buf[0] >> 4
            return True
            
        if buf[2] == 0xA2 and (buf[0] & 0xF) == 0:
            insn.itype = NewInstructions.NN_muluh
            insn.size = 3
            insn.Op1.type = o_reg
            insn.Op1.reg = buf[1] >> 4
            insn.Op2.type = o_reg
            insn.Op2.reg = buf[1] & 0xF 
            insn.Op3.type = o_reg
            insn.Op3.reg = buf[0] >> 4
            return True
             
        return False

    def ev_ana_insn(self, insn):
        return self.decode_instruction(insn)

    def ev_out_mnem(self, outctx):
        insntype = outctx.insn.itype
        global NEWINSN_COLOR

        if (insntype >= CUSTOM_INSN_ITYPE) and (insntype in NewInstructions.lst):
            mnem = NewInstructions.lst[insntype]
            outctx.out_tagon(NEWINSN_COLOR)
            outctx.out_line(mnem)
            outctx.out_tagoff(NEWINSN_COLOR)

            # TODO: how can MNEM_width be determined programmatically?
            MNEM_WIDTH = 8
            width = max(1, MNEM_WIDTH - len(mnem))
            outctx.out_line(' ' * width)

            return True
        return False


    def ev_out_operand(self, outctx, op):
        insn = outctx.insn
        # Print register operands of our custom instructions; leave the rest to the module
        if insn.itype in NewInstructions.lst and op.type == o_reg:
            outctx.out_register(ph_get_regnames()[op.reg])
            return True
        return False
#--------------------------------------------------------------------------
class XtensaESP(plugin_t):
    flags = PLUGIN_PROC | PLUGIN_HIDE
    comment = ""
    wanted_hotkey = ""
    help = "Adds support for additional Xtensa instructions"
    wanted_name = "XtensaESP"

    def __init__(self):
        self.prochook = None

    def init(self):
        if ph_get_id() != PLFM_XTENSA:
            return PLUGIN_SKIP

        self.prochook = xtensa_idp_hook_t()
        self.prochook.hook()
        print ("%s initialized." % XtensaESP.wanted_name)
        return PLUGIN_KEEP

    def run(self, arg):
        pass

    def term(self):
        if self.prochook:
            self.prochook.unhook()
#--------------------------------------------------------------------------
def PLUGIN_ENTRY():
    return XtensaESP()

And that is how you implement a plugin for IDA using Python. With this plugin loaded, IDA can disassemble the new Xtensa instructions.
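In decode_instruction, the QUOU and MULUH branches differ only in the constant third byte. One possible refactoring, sketched here under the assumption that every instruction you add shares this 3-byte register layout, is a lookup table keyed by the third byte. The sketch is plain Python, with a placeholder for CUSTOM_INSN_ITYPE. The names THIRD_BYTE_TO_ITYPE and match_rrr are hypothetical.

```python
from typing import Optional, Tuple

CUSTOM_INSN_ITYPE = 0x8000  # placeholder; in IDA, import it from ida_idp
NN_QUOU, NN_MULUH = range(CUSTOM_INSN_ITYPE, CUSTOM_INSN_ITYPE + 2)

# Third byte -> instruction ID (both values come from the listing above)
THIRD_BYTE_TO_ITYPE = {
    0xC2: NN_QUOU,   # QUOU
    0xA2: NN_MULUH,  # MULUH
}


def match_rrr(buf: Optional[bytes]) -> Optional[Tuple[int, int, int, int]]:
    """Return (itype, r, s, t) for a supported instruction, else None."""
    if buf is None or len(buf) < 3 or (buf[0] & 0xF) != 0:
        return None
    itype = THIRD_BYTE_TO_ITYPE.get(buf[2])
    if itype is None:
        return None
    return itype, buf[1] >> 4, buf[1] & 0xF, buf[0] >> 4


print(match_rrr(bytes([0xC0, 0x22, 0xC2])))  # (32768, 2, 2, 12)
print(match_rrr(bytes([0x40, 0x23, 0xA2])))  # (32769, 2, 3, 4)
```

In the plugin, decode_instruction would call match_rrr(get_bytes(insn.ea, 3)). On a match, it would set insn.itype, insn.size = 3, and the three o_reg operands once, instead of repeating that code for each instruction.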

Conclusion

Although tools like IDA support most popular architectures, knowing how to extend reverse engineering tools broadens what you can analyze. This knowledge will also help you when working with new architectures and solving non-trivial tasks.

But to create custom IDA plugins using Python or any other language, you need an experienced specialist.

At Apriorit, we have a professional team of mature reverse engineering experts who can help you with projects of any complexity. Our developers have deep knowledge of reversing undocumented APIs, various file formats, and IoT firmware, along with extensive expertise in improving the cybersecurity and software interoperability of client solutions.
