How to add support for a new architecture

Peter Matula edited this page May 4, 2023 · 8 revisions

Introduction

RetDec stands for Retargetable Decompiler. As such, it aims to support many different architectures and tries to make adding new ones as easy as possible. This article lists the steps needed to add support for a new architecture. Not all of the actions are strictly obligatory, but it is a good idea to do them all anyway.

Basic information

RetDec uses the Capstone disassembler to translate binary data into its intermediate representation (LLVM IR). Therefore, RetDec can in principle support every architecture that Capstone supports. Using a different disassembler is not unimaginable, but it would be extremely difficult in the current design.

The bulk of the work on adding a new architecture is isolated in the capstone2llvmir library and can be done without any knowledge of the rest of the decompilation process. However, in order to produce good-quality results, it might be necessary to implement architecture-specific analyses.
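
To make the idea of instruction translation more concrete, here is a minimal Python sketch of a table-driven translator. It is only an illustration: it does not use Capstone or the real capstone2llvmir API. The `Insn` record, the `TRANSLATORS` table, the two handlers, and the emitted IR-like text (with registers modeled as globals) are all hypothetical names and choices made for this sketch. It shows how a small core set of instructions can be covered first, while anything not yet implemented falls back to a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Insn:
    """Hypothetical stand-in for one disassembled instruction."""
    mnemonic: str
    operands: Tuple[str, ...]


def translate_add(insn: Insn) -> List[str]:
    # add dst, a, b: load both sources, add them, store the result to dst.
    dst, a, b = insn.operands
    return [
        f"%a = load i32, i32* @{a}",
        f"%b = load i32, i32* @{b}",
        "%r = add i32 %a, %b",
        f"store i32 %r, i32* @{dst}",
    ]


def translate_mov(insn: Insn) -> List[str]:
    # mov dst, src: copy the value of one register into another.
    dst, src = insn.operands
    return [
        f"%v = load i32, i32* @{src}",
        f"store i32 %v, i32* @{dst}",
    ]


# Core set only; more handlers can be registered here later.
TRANSLATORS: Dict[str, Callable[[Insn], List[str]]] = {
    "add": translate_add,
    "mov": translate_mov,
}


def translate(insn: Insn) -> List[str]:
    """Return IR-like text lines for `insn`, or a placeholder comment."""
    handler = TRANSLATORS.get(insn.mnemonic)
    if handler is None:
        # Assumption of this sketch: unknown instructions become a comment.
        return [f"; untranslated: {insn.mnemonic} {', '.join(insn.operands)}"]
    return handler(insn)
```

The emitted lines are illustrative text and are not validated by LLVM. The point of the table is that supporting one more instruction means adding one handler and one table entry, without touching the dispatch logic.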

List of actions

  1. When in doubt, contact us (the RetDec team). If you are serious about implementing a new architecture, we will gladly help you.
  2. Browse the RetDec issues and try to find one that requests support for the architecture you are interested in (e.g. #9).
    • If such an issue exists, carefully read the comments. Maybe there are some problems you should be aware of. Maybe someone is already working on it. If you decide to work on it yourself, please let us and others know by writing a comment or providing a link to your fork.
    • If such an issue doesn't exist yet, create it (e.g. #494), and let us/others know you are working on it.
  3. Take a look at capstone-dumper. If the architecture you plan to implement is not yet supported, implement a module for it. Take inspiration from other modules - they are quite straightforward. This is beneficial because:
    • You will get familiar with Capstone in general, and Capstone's module for the specific architecture.
    • The dumping tool will come in handy later when you work on instruction translation (it takes a lot of experiments).
    • Others will be able to inspect instructions for this architecture. (Please send us a pull request once you are finished.)
  4. Add a new module to the capstone2llvmir library.
    • Study the general design of the library.
    • Enable the new architecture in deps/capstone/CMakeLists.txt (e.g. -DCAPSTONE_ARM64_SUPPORT=ON).
    • Take heavy inspiration from the modules for other architectures. They are all very similar in design. Keep the same design and just adapt it to the specifics of the selected architecture.
    • Write unit tests as you go. Again, just adapt what is already used in the existing modules.
    • Don't try to implement all/many instructions right away. Implement only the core set and move to the other steps in this list. You can come back later and add more instructions when necessary. Keep in mind that some instructions should not be implemented at all.
    • There might be some tricky specifics in the architecture you have chosen to support (e.g. x86 FPU). If so, definitely contact the RetDec authors and discuss possible solutions - deep knowledge about the decompilation process might be necessary.
  5. Add support for the new architecture to capstone2llvmirtool.
    • This is very similar to capstone-dumper, but instead of dumping info about an instruction, it translates the instruction into a sequence of LLVM IR instructions using the capstone2llvmir library.
  6. Add and enable the new architecture throughout RetDec. (We can help you with this.)
  7. Now, it should be possible to decompile binaries for the new architecture. It is time to test on real inputs.
    • Get familiar with the regression tests framework.
    • Add at least a basic regression test for the new architecture - test that decompilation succeeds and produces some output.
    • Continue adding increasingly complex tests, inspect decompilation outputs and:
      • If they are ok, write a test that checks the output quality.
      • If there is a problem, report it (create an issue).
  8. At this point, decompilation for the new architecture is producing some results, and your goal has been achieved. If you want, you can try to solve the reported problems, or test the new module on complex real-world binaries, or add another architecture :-). From this point on, RetDec can handle binaries for the new architecture, and its output quality will get better over time as more and more issues are reported and solved.
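
The basic check described in step 7 (decompilation succeeds and produces some output) can be sketched in a few lines of Python. This is a standalone illustration using only the standard library; it is not the API of the regression tests framework. The function name `decompilation_smoke_test` and its parameters are hypothetical, and `command` stands for whatever decompiler invocation you use, so no specific tool name or flags are assumed.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def decompilation_smoke_test(
    command: Sequence[str],
    output_file: Path,
    timeout_s: float = 300.0,
) -> bool:
    """Return True if `command` exits with status 0 and leaves a
    non-empty `output_file` behind; return False otherwise."""
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,  # keep the tool's output off the console
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The command could not be started, or it ran for too long.
        return False
    if result.returncode != 0:
        return False
    # "Produces some output": the file exists and is not empty.
    return output_file.is_file() and output_file.stat().st_size > 0
```

A check like this only says that the pipeline ran to completion. Tests of output quality, as described in the later bullets of step 7, need to inspect the decompiled code itself.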