Reverse engineering is the direct opposite of building or engineering an application: you break things down bit by bit to see how they actually work. Developers use reverse engineering techniques for tasks ranging from investigating bugs in code to maintaining legacy code.
When reverse engineering software, the operating system it was created for should be one of the first things you pay attention to. In this article, we describe the basic concepts of reverse engineering macOS software and iOS apps. This tutorial will be useful for developers who want to know more about macOS and iOS reverse engineering.
Why do we need reverse engineering? The answer is rather simple.
When you build a piece of software, you usually have all of the source code available and can look at it at any time. So figuring out how a particular process or feature works shouldn’t be too much of a challenge.
But what if you have an executable and you need to figure out how it works without access to any source code? The solution is obvious: you need to reverse engineer it.
There are several reasons why you might need to use reverse engineering:
- To research complicated software issues
- To improve software compatibility with third-party solutions and formats
- To improve interactions between software and the platform
- To provide easy maintenance of legacy code
- And more
Below, we take a closer look at the basic structure of an executable, briefly cover reversing Objective-C and Swift code, list several of the most popular tools for reverse engineering macOS and iOS apps, and give some reverse engineering tips for a number of use cases.
Let’s start with some basics that you need to know before you try to reverse engineer your first executable.
If you’ve decided to reverse engineer a binary, keep in mind that only some parts of it contain executable code. Therefore, before you even start reversing a piece of software, you need to learn the structure of an executable binary.
Executable binary format
In the world of Mach kernel-based operating systems, it’s common to use the Mach-O executable format. These executables can be inside thin or fat binary files. Here’s how these two types of binaries differ:
- A thin binary contains a single Mach-O executable
- A fat binary may contain many Mach-O executables
Fat binaries merge executable code for different CPU instruction sets into a single file.
A Mach-O executable consists of three basic components: a header, load commands, and data organized into segments and sections.
Let’s take a closer look at each component.
Every binary begins with a header. This is a key part of every executable for macOS and iOS. It’s the first part of the executable read by the loader during image loading.
A fat binary begins with a fat header, while a thin binary begins with a mach header. Every header starts with a magic number used to identify it.
A fat header describes the locations of mach headers for executables in a binary. A mach header describes general information about the current executable file.
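The distinction between the two header types can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the constants come from the public mach-o/loader.h and mach-o/fat.h headers, the function names classify and fat_slices are our own, and the demo bytes are synthetic with made-up offsets and sizes.

```python
import struct

# Magic numbers as defined in <mach-o/loader.h> and <mach-o/fat.h>.
MH_MAGIC = 0xFEEDFACE     # thin 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O
FAT_MAGIC = 0xCAFEBABE    # fat binary; the fat header is stored big-endian

CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}  # partial list


def classify(data: bytes) -> str:
    """Return 'fat', 'thin', or 'unknown' based on the leading magic number."""
    if len(data) < 4:
        return "unknown"
    big = struct.unpack(">I", data[:4])[0]
    little = struct.unpack("<I", data[:4])[0]
    if big == FAT_MAGIC:
        return "fat"
    if little in (MH_MAGIC, MH_MAGIC_64) or big in (MH_MAGIC, MH_MAGIC_64):
        return "thin"
    return "unknown"


def fat_slices(data: bytes):
    """List (cpu name, file offset, size) for every executable in a fat binary."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    slices = []
    for i in range(count):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, _sub, offset, size, _align = struct.unpack_from(
            ">IIIII", data, 8 + 20 * i)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices


# Synthetic fat header with two slices (offsets and sizes are made up).
demo = struct.pack(">II", FAT_MAGIC, 2)
demo += struct.pack(">IIIII", 0x01000007, 3, 0x4000, 0x8000, 14)
demo += struct.pack(">IIIII", 0x0100000C, 0, 0xC000, 0x8000, 14)
print(classify(demo), fat_slices(demo))
```

Each offset returned by fat_slices points to the mach header of one thin executable inside the fat file, so the same classify check can be applied again at that offset.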
A mach header is followed by load commands that describe several things crucial for image loading:
- Segments and sections of the executable and its mapping to virtual memory
- Paths to the linked dynamic libraries
- Location of tables of symbols
- Code signature
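To make the role of load commands more concrete, here is a rough Python sketch that walks the load commands of a thin 64-bit Mach-O. The command identifiers are taken from mach-o/loader.h; the function name load_commands is hypothetical, and the input is a synthetic header with two zero-filled commands rather than a real executable.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
# A few load command identifiers from <mach-o/loader.h>.
LC_NAMES = {
    0x02: "LC_SYMTAB",          # location of the symbol and string tables
    0x0B: "LC_DYSYMTAB",        # dynamic symbol table
    0x0C: "LC_LOAD_DYLIB",      # path to a linked dynamic library
    0x19: "LC_SEGMENT_64",      # segment mapped into virtual memory
    0x1D: "LC_CODE_SIGNATURE",  # location of the code signature
}


def load_commands(data: bytes):
    """Return (name, file offset, size) for each load command."""
    # struct mach_header_64: eight 32-bit fields, 32 bytes in total.
    magic, _cpu, _sub, _filetype, ncmds, _sizeofcmds, _flags, _rsv = \
        struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O")
    result, offset = [], 32
    for _ in range(ncmds):
        # Every load command starts with its type and its total size.
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("malformed load command")
        result.append((LC_NAMES.get(cmd, hex(cmd)), offset, cmdsize))
        offset += cmdsize
    return result


# Synthetic header followed by two zero-filled commands (illustration only).
cmds = struct.pack("<II", 0x02, 24) + bytes(16)   # LC_SYMTAB
cmds += struct.pack("<II", 0x1D, 16) + bytes(8)   # LC_CODE_SIGNATURE
header = struct.pack("<8I", MH_MAGIC_64, 0x0100000C, 0, 2, 2, len(cmds), 0, 0)
print(load_commands(header + cmds))
```

Because each command carries its own size, a parser can skip the command types it doesn’t understand, which is why the sketch only needs a partial name table.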
Segments are typically large pieces of an executable file mapped by a loader to some location in the virtual address space.
In the image above, you can see a lot of information about the chosen segment:
- Offset in the current executable
- Size of the region appointed for segment mapping
- Segment attributes
All segments consist of sections. A section is a part of a segment that’s intended to store a specific type of content. For example, the __text section of the __TEXT segment contains executable code, and the __la_symbol_ptr section of the __DATA segment contains a table of pointers to so-called lazy external symbols.
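The relationship between a segment and its sections can be illustrated with a short Python sketch that decodes a 64-bit segment load command together with the section records that follow it. The field layouts follow the segment_command_64 and section_64 structures from mach-o/loader.h; parse_segment is our own helper name, and the addresses in the demo are invented.

```python
import struct

SEGMENT_64 = struct.Struct("<II16sQQQQiiII")  # struct segment_command_64, 72 bytes
SECTION_64 = struct.Struct("<16s16sQQ8I")     # struct section_64, 80 bytes


def _name(raw: bytes) -> str:
    """Names are NUL-padded to 16 bytes (and unterminated if exactly 16 long)."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_segment(data: bytes, offset: int = 0):
    """Decode one LC_SEGMENT_64 command and the sections it describes."""
    (_cmd, _cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     _maxprot, initprot, nsects, _flags) = SEGMENT_64.unpack_from(data, offset)
    sections = []
    pos = offset + SEGMENT_64.size
    for _ in range(nsects):
        sectname, _seg, addr, size, sect_off, *_rest = \
            SECTION_64.unpack_from(data, pos)
        sections.append({"name": _name(sectname), "addr": addr,
                         "size": size, "offset": sect_off})
        pos += SECTION_64.size
    return {"name": _name(segname), "vmaddr": vmaddr, "vmsize": vmsize,
            "fileoff": fileoff, "filesize": filesize,
            "initprot": initprot, "sections": sections}


# Synthetic __TEXT segment with a single __text section (values are made up).
seg = SEGMENT_64.pack(0x19, SEGMENT_64.size + SECTION_64.size, b"__TEXT",
                      0x100000000, 0x4000, 0, 0x4000, 5, 5, 1, 0)
sec = SECTION_64.pack(b"__text", b"__TEXT", 0x100003F00, 0x100, 0x3F00,
                      2, 0, 0, 0x80000400, 0, 0, 0)
print(parse_segment(seg + sec))
```

The segment record holds the mapping of a file region to virtual memory, while each section record narrows that down to the address, size, and file offset of one type of content.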
Every dynamic library dependency is described by a load command containing the path to the dynamic library binary file and its version.
In addition, load commands contain the following information critical for the operation of executable code:
- Location of symbol tables
- Location of import and stub tables
- Location of the table with information for the dynamic loader
The main symbol table contains all symbols used in the current executable. Every locally or externally defined symbol or even stub (which can be generated for an external call that executes through an import table) is mentioned here. This table is divided into three parts, showing whether the symbol is debug, local, or external. Every entry in the main symbol table represents a particular part of the executable code by specifying the offset of its name in the string table, type, section ordinal, and other type-specific information.
There’s a string table that contains names of symbols defined in the main symbol table. There’s also a dynamic symbol table that links import table entries to the appropriate symbol. In addition, there’s one more table that contains information used by the dynamic loader for every external symbol.
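The link between the main symbol table and the string table can be shown with a small Python sketch. It decodes 64-bit symbol entries (the nlist_64 structure from mach-o/nlist.h) and resolves each name through the string table. The helper name read_symbols is hypothetical, the symbols are made up, and in a real file the offsets would come from the LC_SYMTAB load command.

```python
import struct

# struct nlist_64: n_strx, n_type, n_sect, n_desc, n_value (16 bytes)
NLIST_64 = struct.Struct("<IBBHQ")
N_STAB = 0xE0  # if any of these bits are set, the entry is a debug symbol
N_EXT = 0x01   # external symbol bit


def read_symbols(data: bytes, symoff: int, nsyms: int, stroff: int):
    """Return (name, kind, section ordinal, value) for each symbol entry."""
    symbols = []
    for i in range(nsyms):
        n_strx, n_type, n_sect, _desc, n_value = NLIST_64.unpack_from(
            data, symoff + i * NLIST_64.size)
        # n_strx is the offset of the NUL-terminated name in the string table.
        start = stroff + n_strx
        end = data.index(b"\0", start)
        name = data[start:end].decode("utf-8", "replace")
        if n_type & N_STAB:
            kind = "debug"
        elif n_type & N_EXT:
            kind = "external"
        else:
            kind = "local"
        symbols.append((name, kind, n_sect, n_value))
    return symbols


# Synthetic tables: three symbols followed by their string table.
strtab = b"\0_main\0_printf\0_helper\0"
symtab = NLIST_64.pack(1, 0x0F, 1, 0, 0x100003F00)    # defined, external
symtab += NLIST_64.pack(7, 0x01, 0, 0, 0)             # undefined, external
symtab += NLIST_64.pack(15, 0x0E, 1, 0, 0x100003F40)  # defined, local
blob = symtab + strtab
for entry in read_symbols(blob, 0, 3, len(symtab)):
    print(entry)
```

An external symbol with a section ordinal of zero, like _printf here, is undefined in this executable and has to be resolved by the dynamic loader.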
Code signature data
A code signature can also be rather helpful when reverse engineering a binary. While the code signature is one of the more poorly documented (but still open-source) parts of an executable, its content can be displayed by means of the codesign tool (see the image below).
Code signature data contains a number of important elements:
- Code directory
- Code signing requirements
- Description of sealed resources
- Code signature
Let’s take a closer look at each element.
The code directory is a structure that contains miscellaneous information (hash algorithm, table size, size of code pages, etc.) and a table of hashes. The table itself consists of two parts: positive and negative.
The positive part of the table of hashes contains hashes of executable code pages.
The negative part optionally contains hashes of such code signature parts as code signing requirements, resources, and entitlements, as well as a hash of the Info.plist file.
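The idea behind the positive part of the table can be sketched in Python as follows. This is a simplified model under stated assumptions: we assume SHA-256 and a 4096-byte page, whereas a real code directory records its own hash algorithm and page size, and the function names page_hashes and changed_pages are our own.

```python
import hashlib

PAGE_SIZE = 4096  # assumed; the real value is stored in the code directory


def page_hashes(code: bytes, page_size: int = PAGE_SIZE):
    """Hash the code page by page; the last page may be shorter."""
    return [hashlib.sha256(code[i:i + page_size]).digest()
            for i in range(0, len(code), page_size)]


def changed_pages(code: bytes, expected, page_size: int = PAGE_SIZE):
    """Return indexes of pages whose hash differs from the expected table."""
    actual = page_hashes(code, page_size)
    count = max(len(actual), len(expected))
    return [i for i in range(count)
            if i >= len(actual) or i >= len(expected)
            or actual[i] != expected[i]]


original = bytes(10000)               # stand-in for executable code
table = page_hashes(original)         # three pages: 4096 + 4096 + 1808 bytes
patched = bytearray(original)
patched[5000] = 0x90                  # modify one byte in the second page
print(len(table), changed_pages(bytes(patched), table))
```

Hashing per page rather than per file lets the system validate code lazily as individual pages are loaded, and it also tells a researcher which page of a patched binary no longer matches its signature.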
Code signing requirements, resources, and entitlements are just byte streams of the appropriate files located inside a bundle.
The code signature is a cryptographic signature of the code directory, represented in CMS format.
One more thing you should pay special attention to before you even start reverse engineering a macOS or iOS app is the architecture it was designed for. Intel-based Macs use x86-64 CPUs, while Apple silicon Macs use ARM64. Mobile devices use ARM CPUs: 32-bit ARMv7 and ARMv7s on older devices, and 64-bit ARM64 (ARMv8-A and its later revisions, such as ARMv8.2-A and ARMv8.3-A) on newer ones.
Knowledge of instruction sets is important when reverse engineering algorithms. In addition, it’s good to be familiar with calling conventions and some things specific to ARM-based systems on a chip (SoC), like Thumb mode and opcode formats.
Nowadays, all system frameworks and dynamic libraries are merged into a single file called the shared cache. This file is located in the following directory: /System/Library/Caches/com.apple.dyld/.
These are the basic things you need to know about before doing any reverse engineering. Now let’s talk about the tools that can help you on this journey.
Below are standard command-line tools for reverse engineering iOS and macOS apps. These tools are available out of the box on Mac:
- lldb is a powerful debugger used in Xcode. You can use this tool to reverse engineer and debug code written in C++, Objective-C, and C. lldb allows you to debug code on both actual iOS devices and simulators.
- otool is a console tool for browsing Mach-O executables. It displays specified parts of libraries and object files.
- nm is a console tool for browsing names and symbols in Mach-O executables.
- codesign is a useful tool for working with code signatures. It provides comprehensive information on code signatures and allows for creating and manipulating them.
In addition, there are several third-party reverse engineering utilities:
- IDA
- MachOView
- Class-dump
- Hopper
- Dsc_extractor
- Ghidra
Let’s look closer at each of these utilities.
IDA (Interactive DisAssembler) is one of the most famous and widely used reverse engineering tools. IDA is a disassembler and debugger that’s suitable for performing complex research of executables. It’s a cross-platform tool that runs on macOS, Windows, and Linux.
IDA can be used for disassembling software designed for macOS, Windows, and Linux platforms. The program has a free evaluation version with limited functionality. There’s also a paid version, IDA Pro, which supports a wider range of processors and plugins.
MachOView is a utility that works similarly to the otool and nm console tools. The key difference is that MachOView has a GUI, so you can browse the structure of Mach-O files in a more comfortable way. In fact, MachOView was used to make most of the screenshots you see in this article. MachOView is free to use, but unfortunately, it isn’t always stable.
Class-dump is a free command-line utility for analyzing the Objective-C runtime information stored in Mach-O files. With class-dump, you can get pretty much the same information as from otool but in the form of standard Objective-C declarations. In particular, class-dump creates declarations for classes, categories, and protocols.
Hopper is an interactive tool for disassembling, decompiling, and debugging software and applications. Similarly to IDA, Hopper has a free version with a limited set of features in addition to a paid version. Hopper was designed for Linux and macOS and works best for retrieving Objective-C specific information from the analyzed binary.
Dsc_extractor is Apple’s own open-source tool for extracting libraries and frameworks from dyld_shared_cache. When extracting data, the utility saves the locations and original names of all extracted objects.
Ghidra is an open-source reverse engineering framework developed by the NSA. It runs on macOS, Windows, and Linux. Ghidra can be used as a decompiler, as well as a tool for performing such tasks as assembling/disassembling, graphing, and scripting code. It can be customized with the help of scripts and plugins written in Java or Python.
Now, let’s look at some of the specifics of reverse engineering code written in particular programming languages. Within this article, we focus on the peculiarities of reverse engineering solutions written in Objective-C and Swift.
How to reverse engineer Objective-C code
Objective-C is commonly used for developing applications for macOS and iOS. It relies on a specific C runtime, which somewhat simplifies the process of reverse engineering.
Let’s consider a simple piece of code from an actual application:
If we compile and then decompile this code into pure C, we’ll get a result that looks something like this:
This example demonstrates the basics of object allocation and messaging. Every method call goes through the runtime function objc_msgSend, which takes the form id objc_msgSend(id self, SEL op, ...).
The first argument, named self, is a pointer to the object that receives the message (typically an instance of a class derived from NSObject). The second argument, named op, is a pointer to the so-called selector. These two arguments are mandatory. Any further arguments depend on the particular method being called.
The runtime operates on two types: SEL and IMP, which are a selector and an implementation respectively. The selector represents the human-readable name of a method. In the context of our example, the selector is initWithInt. The implementation is a pointer to a C function, and it looks like this:
As we can see, this is an almost regular C function except for its unusual name that contains “+” or “-” (which distinguishes class methods from instance methods), the name of a class, the selector string, and two extra arguments (self and _cmd).
From this point of view, the main purpose of the objc_msgSend function is to find the implementation for a given selector and object and call it by passing all specified arguments.
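The lookup performed by objc_msgSend can be modeled with a toy Python sketch. This is only a conceptual model, not how the real runtime is implemented (the real one also uses method caches and message forwarding). All class names, method names, and helpers here (ToyClass, ToyObject, lookup_imp, msg_send) are hypothetical.

```python
class ToyClass:
    """Toy stand-in for a class: a name, a superclass, and a method table."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.methods = {}  # selector string -> implementation (a function)


class ToyObject:
    """Toy stand-in for an object: a pointer to its class plus its data."""

    def __init__(self, isa):
        self.isa = isa
        self.ivars = {}


def lookup_imp(cls, selector):
    """Search the class and then its superclasses for the selector."""
    while cls is not None:
        if selector in cls.methods:
            return cls.methods[selector]
        cls = cls.superclass
    raise AttributeError(f"unrecognized selector {selector!r}")


def msg_send(receiver, selector, *args):
    """Find the implementation and call it with self and _cmd prepended."""
    imp = lookup_imp(receiver.isa, selector)
    return imp(receiver, selector, *args)


# Implementations are plain functions with two extra leading arguments.
def base_description(self, _cmd):
    return f"<{self.isa.name}>"


def number_init_with_int(self, _cmd, value):
    self.ivars["value"] = value
    return self


base = ToyClass("ToyBase")
base.methods["description"] = base_description
number = ToyClass("ToyNumber", superclass=base)
number.methods["initWithInt:"] = number_init_with_int

obj = msg_send(ToyObject(number), "initWithInt:", 42)
print(obj.ivars["value"], msg_send(obj, "description"))
```

Note that the caller never refers to number_init_with_int directly. It only passes the selector string, which is exactly why a disassembly shows calls to the dispatcher rather than to method implementations.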
The described approach brings at least two specific nuances into the reverse engineering process:
- It’s impossible to find a direct call to a method implementation. The selector is the key for searching implementations by name.
- The human-readable selector is a great hint for understanding executable code. All selector names reside in the __objc_methname section of the __TEXT segment.
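Since selector names are stored as consecutive NUL-terminated strings, listing them takes very little code once the raw bytes of the section are available. The sketch below assumes you have already extracted those bytes (for example, by using the section offset and size from the load commands); the function name and the sample selectors are made up.

```python
def selectors_from_methname(section: bytes):
    """Split the raw bytes of a method name section into selector strings."""
    return [chunk.decode("utf-8", "replace")
            for chunk in section.split(b"\0") if chunk]


def selectors_containing(section: bytes, keyword: str):
    """Filter selectors by a keyword, ignoring case."""
    return [name for name in selectors_from_methname(section)
            if keyword.lower() in name.lower()]


# Synthetic section content with three hypothetical selectors.
raw = b"init\0initWithInt:\0setValue:forKey:\0"
print(selectors_from_methname(raw))
print(selectors_containing(raw, "INIT"))
```

Filtering the list by keywords such as password, license, or decrypt is a quick way to decide where to start reading a large binary.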
How to reverse engineer Swift code
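Swift interoperates with the Objective-C runtime on Apple platforms, so reverse engineering Swift classes that derive from NSObject or are exposed through @objc is similar to reversing Objective-C code. Pure Swift code is harder to follow: its methods are typically called directly or through virtual tables rather than through objc_msgSend, and its symbol names are mangled, so you usually need to demangle them first (for example, with the swift-demangle tool that ships with Xcode).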
Now, let’s move to practical tips and techniques that can simplify the process of reverse engineering macOS and iOS apps.
Below, we provide a list of short but helpful techniques that can make reverse engineering of iOS/macOS apps a bit easier for you.
Case 1. Reversing open-source code
Before reverse engineering anything, check out the Apple Open Source website. A lot of things are available as source code on this platform. For example, the structure of the code signature part of a Mach-O file can be understood by inspecting the codesign tool, whose sources are publicly available.
Case 2. Getting an executable to reverse engineer
The simplest case is reverse engineering an executable from a macOS app bundle or an iOS IPA file. In a macOS app bundle, the executable is located in the Contents/MacOS subdirectory.
An IPA file is just a regular zip archive with a certain structure. Once you unpack it, the executable can be found inside the Payload/*.app subdirectory of the archive. In this form, any executable can be analyzed with any of the reverse engineering tools described above.
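Because an IPA file is a zip archive, the executable can be located with Python’s standard library alone. The sketch below relies on the CFBundleExecutable key of the app’s Info.plist, which names the main executable of a bundle. The function name find_ipa_executable is our own, and the demo archive is built in memory with placeholder content.

```python
import io
import plistlib
import zipfile


def find_ipa_executable(ipa) -> str:
    """Return the archive path of the main executable inside an IPA file."""
    with zipfile.ZipFile(ipa) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # We are looking for Payload/<Something>.app/Info.plist
            if (len(parts) == 3 and parts[0] == "Payload"
                    and parts[1].endswith(".app")
                    and parts[2] == "Info.plist"):
                info = plistlib.loads(archive.read(name))
                return f"Payload/{parts[1]}/{info['CFBundleExecutable']}"
    raise ValueError("no Payload/*.app/Info.plist found")


# Build a tiny synthetic IPA in memory (placeholder content, not a real app).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Payload/Demo.app/Info.plist",
                     plistlib.dumps({"CFBundleExecutable": "Demo"}))
    archive.writestr("Payload/Demo.app/Demo", b"\xcf\xfa\xed\xfe")
buffer.seek(0)
print(find_ipa_executable(buffer))
```

The same function accepts a path to an IPA file on disk instead of an in-memory buffer, after which the returned entry can be extracted and opened in a disassembler.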
A trickier case is reverse engineering parts of iOS itself. Usually, this requires a jailbroken device. If jailbreaking is not an option, it’s still possible to get the file from the filesystem using the Document Interaction functionality.
Case 3. Reversing emulator binaries
If there’s no chance to get a binary from a device, you can still get it from an iOS simulator. The tricky part is that the simulator runs x86 code, which differs from the iOS code on a real device. Nevertheless, the interfaces of daemons and frameworks are the same as on a real iOS device.
Case 4. Finding the cause of application-specific issues
If there’s some kind of application-specific issue we need to investigate, we can get a crash report and a stack trace. In such cases, we need to understand the common logic around the issue. This can look like a complicated task because of the private functions displayed in the stack trace, such as ___lldb_unnamed_function. The universal way to locate such a private function using a disassembler or MachOView is to find its offset relative to the __text section. We can usually get the function address from the stack trace, while the segment address can be found with the help of the debugger command:
Tracking strings is a great way to understand the internal structure of an executable. Function names can often be tracked by strings passed to the system log. The principal application delegate can be found by inspecting the arguments of the NSApplicationMain or UIApplicationMain function.
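The address arithmetic for locating a private function from a stack trace can be sketched as follows. This is a simplified model that assumes you know the runtime address from the trace, the address where the image was loaded, and the static addresses recorded in the Mach-O file. The function name and all numeric values are made up for illustration.

```python
def runtime_to_file_offset(runtime_addr, image_load_addr, text_vmaddr,
                           sect_addr, sect_offset, sect_size):
    """Translate an address from a stack trace into a file offset.

    runtime_addr    -- function address taken from the stack trace
    image_load_addr -- where the image was actually loaded (from the debugger)
    text_vmaddr     -- preferred address of the __TEXT segment in the file
    sect_addr, sect_offset, sect_size -- the __text section's static address,
                                         file offset, and size
    """
    slide = image_load_addr - text_vmaddr   # shift applied by the loader
    static_addr = runtime_addr - slide      # address as seen in a disassembler
    if not sect_addr <= static_addr < sect_addr + sect_size:
        raise ValueError("address is outside the __text section")
    return static_addr - sect_addr + sect_offset


# Made-up example values.
offset = runtime_to_file_offset(
    runtime_addr=0x104A3FF20,
    image_load_addr=0x104A3C000,
    text_vmaddr=0x100000000,
    sect_addr=0x100003F00,
    sect_offset=0x3F00,
    sect_size=0x100,
)
print(hex(offset))
```

The intermediate static address is the one to search for in a disassembler, while the resulting file offset is what a tool like MachOView displays next to the raw bytes.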
Case 5. Reverse engineering using private or internal functionality
Usually, we have a public API as a starting point, and thus we know the framework we should explore. In some cases, we can use debug symbols instead of a framework binary. We can find these symbols at ~/Library/Developer/Xcode/iOS DeviceSupport/.
As expected, internals are usually not exported. With Objective-C code, however, even internal methods can be called through the low-level Objective-C runtime. Alternatively, you can dump class declarations using otool or class-dump and use these internals without confusing the linker.
Case 6. Communicating with a daemon
A framework frequently acts as a proxy between the application and a daemon. One example of such a client-server tandem is MobileInstallation.framework and installd. When someone makes a call to MobileInstallation.framework, it delegates most of the work to installd using a remote procedure call (RPC).
macOS and iOS offer two main mechanisms for such calls. The first is the Mach interprocess communication (IPC) facility. The second is XPC, which also uses Mach messages behind the scenes but works at a much higher level.
XPC runs in a restricted environment by default. Any capabilities must be whitelisted by a set of entitlements; otherwise, privileged tasks aren’t permitted. This limitation often makes XPC hard to use and makes low-level Mach messaging a better option.
Reverse engineering is a helpful approach developers can use for investigating and analyzing software code to research malware, fix software issues, ensure software compatibility, simplify support for undocumented legacy code, etc. To reverse engineer a piece of software, you need to know the basic binary executable structure and have a set of tools for browsing and disassembling executables. For macOS and iOS solutions, you can use standard command-line tools available on Mac and third-party utilities.
At Apriorit, we have a dedicated team of researchers and developers who can help you investigate and improve your product. Get in touch with us to receive a preliminary estimate for your research project.