Aarch64 decompilation #533

MatejKastak · 2019-03-26T14:39:39Z

Changes include instruction translation to LLVM IR, unit tests, LLVM IR environment generation, the architecture ABI, etc.

Run the tool with reasonable Capstone basic modes for the specified architecture. Default values are as follows:

-a arm    : CS_MODE_ARM
-a arm64  : CS_MODE_ARM [looks like keystone doesn't like this]
-a mips   : CS_MODE_MIPS32
-a x86    : CS_MODE_32
-a ppc    : CS_MODE_32
-a <rest> : CS_MODE_LITTLE_ENDIAN

- register maps (_reg2type)
- instructions map (_i2fm)

Modified ARM Translator unit. Work in progress.

- register name could not be found because of the wrong cs_arch in constructor

- capstone was configured without the ARM64 support, this caused cs_open to fail

- flags from status register added to arm64 env
- program counter added to arm64 env

- basic implementation of functions needed for loading and storing operands
- translateAdd is for testing purposes

- started implementation of MEM operand type
- Store register instruction translation method, e.g. retdec-capstone2llvmir -a arm64 -t 'str x0, [x1]'

- MOV, MVN and MOVZ instructions
- operand shift functions moved and changed for ARM64
- instructions like 'movz x0, #3 LSL 16' work now
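As a rough illustration of the MOVZ semantics mentioned in the last item, the sketch below models the value the instruction writes. `movz_value` is a hypothetical helper written for this note, not a function from RetDec or Capstone, and it assumes the 64-bit variant, where the shift is 0, 16, 32, or 48.

```python
MASK64 = (1 << 64) - 1


def movz_value(imm16: int, shift: int = 0) -> int:
    """Value written by MOVZ: the 16-bit immediate shifted left, all other bits zero.

    Hypothetical model for illustration only.
    """
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("imm16 must fit in 16 bits")
    if shift not in (0, 16, 32, 48):
        raise ValueError("shift must be 0, 16, 32, or 48")
    return (imm16 << shift) & MASK64


print(hex(movz_value(3, 16)))  # 0x30000
```

Under these assumptions, an immediate of 3 shifted by 16 leaves 0x30000 in x0.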

- test framework capstone2llvmirtranslator
- first INS_ADD test
- cmake compilation

- MOV, MOVZ

- Store pair instruction {pre-index, post-index, signed-offset}
- test for all cases except 32bit operands
- pc moved to its own enum
- generateGetOperandAddr to generate address from instruction operand

- LDR {pre-index, post-index, signed-offset} instruction implemented
- STR {pre-index, post-index, signed-offset} instruction implemented
- LDR tests ported from ARM
- LDP todo
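The three addressing forms listed for LDR, STR, and STP differ only in which address is accessed and whether the base register is updated. The following is a minimal sketch of that distinction; `Access` and `resolve_address` are hypothetical names for this note and do not mirror generateGetOperandAddr or any other RetDec function.

```python
from typing import NamedTuple

MASK64 = (1 << 64) - 1


class Access(NamedTuple):
    address: int   # address used by the load or store
    new_base: int  # value of the base register after the instruction


def resolve_address(base: int, imm: int, mode: str) -> Access:
    """Hypothetical model of the three AArch64 immediate addressing forms."""
    offset_addr = (base + imm) & MASK64
    if mode == "signed-offset":  # e.g. ldr x0, [x1, #8]
        return Access(offset_addr, base)
    if mode == "pre-index":      # e.g. ldr x0, [x1, #8]!
        return Access(offset_addr, offset_addr)
    if mode == "post-index":     # e.g. ldr x0, [x1], #8
        return Access(base, offset_addr)
    raise ValueError(f"unknown addressing mode: {mode}")
```

Pre-index and post-index both write the updated address back to the base register; they differ in whether the access itself uses the old or the new address.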

- Register parent map
- Storing registers
- Loading registers
- Headers
- Need more changes to conversions, I think 'mov w0, #3' zeroes out the upper 32 bits of the x0 register. But need to investigate further.

- taken from uname -a in qemu arm64 machine: Linux debian-aarch64 4.9.0-4-arm64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) aarch64 GNU/Linux

- when writing a value to a 32bit register, the value is zero extended to the whole 64bit register
- parent register mapping enabled in tests
- 32bit version of tests
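The zero-extension rule for 32-bit writes can be sketched as follows. `Reg64` is a hypothetical model of one general-purpose register, where wN is treated as a view of the low half of xN; it is not the parent register map used in the translator.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


class Reg64:
    """Hypothetical model: xN is the full register, wN its low 32 bits."""

    def __init__(self, x: int = 0) -> None:
        self.x = x & MASK64

    def read_w(self) -> int:
        # Reading wN truncates xN to its low 32 bits.
        return self.x & MASK32

    def write_w(self, value: int) -> None:
        # Writing wN zero-extends: the upper 32 bits of xN become zero.
        self.x = value & MASK32


reg = Reg64(0xFFFFFFFFFFFFFFFF)
reg.write_w(3)
print(hex(reg.x))  # 0x3
```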

- added tests for label and imm branch

- added tests

- added tests for instruction

- real binary testing is needed
- without tests

in Architecture::setArch() ARM64 needs to be set before ARM because "arm" from ARM matches the "arm aarch64" from ARM64
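The ordering problem is a plain substring-matching issue. The sketch below assumes the architecture is detected by searching a description string for keywords; `detect_arch` and its return values are hypothetical and only illustrate why the more specific check has to come first.

```python
def detect_arch(description: str) -> str:
    """Hypothetical keyword-based detection; the ARM64 check must run first."""
    text = description.lower()
    # "arm aarch64" also contains "arm", so the 64-bit keyword is tested
    # before the generic one.
    if "aarch64" in text:
        return "arm64"
    if "arm" in text:
        return "arm"
    return "unknown"


print(detect_arch("arm aarch64"))  # arm64
print(detect_arch("arm"))          # arm
```

With the two checks swapped, the "arm aarch64" string would be classified as plain ARM.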

- Added the option to switch this behaviour
- add one ADD test with shift

- Arm supports the extension of an operand, e.g. 'add x0, x1, w2, SXTW' will sign-extend the w2 register to 64 bits and then add the values
- test for 64bit variant implemented
- need to check the optional imm (shift VM outputs weird values)
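To make the SXTW example concrete, here is a small sketch of the arithmetic. `sign_extend_32` and `add_sxtw` are hypothetical helpers for this note; the optional left shift of 0 to 4 applied after the extension follows the AArch64 extended-register form of ADD.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def add_sxtw(x1: int, w2: int, shift: int = 0) -> int:
    """Hypothetical model of 'add x0, x1, w2, SXTW #shift'."""
    if not 0 <= shift <= 4:
        raise ValueError("shift must be between 0 and 4")
    return (x1 + (sign_extend_32(w2) << shift)) & MASK64


print(add_sxtw(10, 0xFFFFFFFF))  # 9, because w2 is treated as -1
```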

- let's start testing

-> isArmOrThumb renamed to isArm32OrThumb
-> added isArm32 method
-> thumb is now set with a flag _thumbFlag

Now the enum eArch represents only the general architecture, and all subtypes of an architecture are checked with getBitSize() or _thumbFlag. The function isArm() returns true for every subarchitecture, i.e. arm32, arm64, or thumb.
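The resulting design can be sketched roughly as below. This is a hypothetical Python model, not the RetDec class: the field names and the exact conditions inside `is_arm32`, `is_thumb`, and `is_arm64` are assumptions based only on the description above.

```python
from dataclasses import dataclass


@dataclass
class Architecture:
    """Hypothetical model: one general family plus bit size and a thumb flag."""

    arch: str = "unknown"     # general family only, e.g. "arm", "mips", "x86"
    bit_size: int = 32
    thumb_flag: bool = False

    def is_arm(self) -> bool:
        # True for every ARM subarchitecture: arm32, arm64, or thumb.
        return self.arch == "arm"

    def is_arm32(self) -> bool:
        return self.is_arm() and self.bit_size == 32 and not self.thumb_flag

    def is_thumb(self) -> bool:
        return self.is_arm() and self.bit_size == 32 and self.thumb_flag

    def is_arm64(self) -> bool:
        return self.is_arm() and self.bit_size == 64

    def is_arm32_or_thumb(self) -> bool:
        return self.is_arm32() or self.is_thumb()
```

Keeping the family in a single enum value and deriving the subtype from the bit size avoids a separate enum entry for each ARM variant.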

- Added some instruction IDs to branch types

- For example 'str w0, [sp]' should store only 4bytes to stack pointer

Replace svc #0 with corresponding syscall decoded from previous assignments.
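One way to picture this step: on AArch64 Linux the syscall number is passed in x8, so the decoder can look backwards from the svc for the last assignment to that register. The sketch below is a hypothetical, simplified version of such a search. The tuple-based instruction model and `syscall_number_before` are assumptions for illustration and do not reflect how bin2llvmir represents instructions.

```python
from typing import List, Optional, Tuple

# Each instruction is modelled as (mnemonic, destination register, immediate).
Insn = Tuple[str, str, int]


def syscall_number_before(block: List[Insn], svc_index: int) -> Optional[int]:
    """Return the constant last moved into x8/w8 before block[svc_index]."""
    for mnemonic, dst, imm in reversed(block[:svc_index]):
        if dst in ("x8", "w8"):
            # Only a plain immediate move is understood by this sketch.
            return imm if mnemonic == "mov" else None
    return None


block = [("mov", "x0", 1), ("mov", "x8", 64), ("svc", "", 0)]
print(syscall_number_before(block, 2))  # 64
```

If x8 is written by anything other than an immediate move, the sketch gives up rather than guess.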

Generate vector registers so that we don't crash when pseudo instructions with them as operands are generated. For a similar purpose I changed the f16 in ARM64_REG_H* to i16, since the half type is not supported and we want to be able to at least generate pseudo instructions.

Those tests target loading and storing floating point values.

- Zero division is NOW undefined behaviour
- This caused problems in modulo idiom detection
- Also removed corresponding tests

- Correctly handle imm values as operands of this instruction

This reverts commit 7b88475. This change caused other tests to fail.

- Removed unused code from decoder/arm64.cpp
- Fixed insnWrittesPcArm64 to work better
- Fixed Cond branch tests

PeterMatula · 2019-03-27T13:31:04Z

Although I said that you don't need to document every single thing in Doxygen, please make sure that the existing comments are free of errors, so that doxygen-build doesn't fail.

MatejKastak · 2019-03-27T22:13:15Z

Yes, I completely forgot about documentation builds. It should be fixed now.

MatejKastak added 30 commits September 6, 2018 23:33

Base for the ARM64 translator

7b92167

- register maps (_reg2type)
- instructions map (_i2fm)

Modified ARM Translator unit. Work in progress.

Fix the cs_reg_name

7007ead

- register name could not be found because of the wrong cs_arch in constructor

Add ARM64 support for capstone dependency

952def8

- capstone was configured without the ARM64 support, this caused cs_open to fail

Temporary solution to call translate function

8a79401

Status register and program counter added to environment

446820d

- flags from status register added to arm64 env
- program counter added to arm64 env

Methods store/load registers/operands skeletons + add instruction

94f6426

- basic implementation of functions needed for loading and storing operands
- translateAdd is for testing purposes

Store instruction base

623aef9

- started implementation of MEM operand type
- Store register instruction translation method, e.g. retdec-capstone2llvmir -a arm64 -t 'str x0, [x1]'

Operand shifts ported from ARM and MOV instruction translation

32781de

- MOV, MVN and MOVZ instructions
- operand shift functions moved and changed for ARM64
- instructions like 'movz x0, #3 LSL 16' work now

Arm64 - tests ported from Arm

5edc4ba

- test framework capstone2llvmirtranslator
- first INS_ADD test
- cmake compilation

Basic MOV tests

8caa63e

- MOV, MOVZ

Test for STR instruction and test header comments

649d74d

STP instruction + tests, pc in new enum, get op addr function

928f3e3

- Store pair instruction {pre-index, post-index, signed-offset}
- test for all cases except 32bit operands
- pc moved to its own enum
- generateGetOperandAddr to generate address from instruction operand

LDR + STR, LDR tests from ARM, LDP stub

bf88c1e

- LDR {pre-index, post-index, signed-offset} instruction implemented
- STR {pre-index, post-index, signed-offset} instruction implemented
- LDR tests ported from ARM
- LDP todo

Implemented parent register handling

8916e12

- Register parent map
- Storing registers
- Loading registers
- Headers
- Need more changes to conversions, I think 'mov w0, #3' zeroes out the upper 32 bits of the x0 register. But need to investigate further.

LLVM data layout modified for ARM64

9b62769

- taken from uname -a in qemu arm64 machine: Linux debian-aarch64 4.9.0-4-arm64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) aarch64 GNU/Linux

Removed useless debug output

372f896

getCarryRegister for ARM64 fixed

d369bfb

Store register ZEXT_TRUNC, 32bit tests baseline + tests

8631ae7

- when writing a value to a 32bit register, the value is zero extended to the whole 64bit register
- parent register mapping enabled in tests
- 32bit version of tests

Zero extension tests for ADD and MOV 32bit variants

7626662

Implemented BL instruction

58381bb

- added tests for label and imm branch

Implemented RET instruction

f0f9195

- added tests

Implemented LDP instruction

8af607c

- added tests for instruction

Implemented ADRP instruction

ed53e67

- real binary testing is needed
- without tests

enable arm64 in decompiler.py and add arm64 architecture

8c24571

in Architecture::setArch() ARM64 needs to be set before ARM because "arm" from ARM matches the "arm aarch64" from ARM64

Arm64 ABI implementation

69b78f7

Arm64 decoder ported from Arm

f04458b

Arm64 imm operand shifts should not update flags by default.

f225f0c

- Added the option to switch this behaviour
- add one ADD test with shift

Arm64 Zero/Sign extension 32bit variant tests

8ed7de9

MatejKastak added 19 commits March 1, 2019 14:51

Arm64: FCMP, FCCMP, FCVT, {U, S}CVTF instructions + tests

35b2d35

Arm64: FCVTZS, FCVTZU instructions + tests

d980328

- let's start testing

Arm64, bin2llvmir: Decoder should not analyse stack.

7b88475

Merge branch 'master' into arm-prep

309baeb

Arm64: MOVK instruction + tests

bac43b4

Arm64: MOVN instructions + tests

b8b28ec
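For reference, the MOVK and MOVN commits above deal with two further wide-move variants. The sketch below models the values they produce for 64-bit registers; `movk_value` and `movn_value` are hypothetical helpers for this note, not translator code.

```python
MASK64 = (1 << 64) - 1


def movk_value(old: int, imm16: int, shift: int = 0) -> int:
    """MOVK: replace one 16-bit field of the register, keep the other bits."""
    if shift not in (0, 16, 32, 48):
        raise ValueError("shift must be 0, 16, 32, or 48")
    field = 0xFFFF << shift
    return ((old & ~field) | ((imm16 & 0xFFFF) << shift)) & MASK64


def movn_value(imm16: int, shift: int = 0) -> int:
    """MOVN: write the bitwise inverse of the shifted immediate."""
    if shift not in (0, 16, 32, 48):
        raise ValueError("shift must be 0, 16, 32, or 48")
    return ~((imm16 & 0xFFFF) << shift) & MASK64


print(hex(movk_value(0x1111222233334444, 0xABCD, 16)))  # 0x11112222abcd4444
print(hex(movn_value(0)))                               # 0xffffffffffffffff
```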

Merge master with arm-prep

affd7d3

Architecture: Change arm architectures to account for arm64

e9b2866

-> isArmOrThumb renamed to isArm32OrThumb
-> added isArm32 method
-> thumb is now set with a flag _thumbFlag

Architecture: Removed the wrong architecture types

c5f421f

Now the enum eArch represents only the general architecture, and all subtypes of an architecture are checked with getBitSize() or _thumbFlag. The function isArm() returns true for every subarchitecture, i.e. arm32, arm64, or thumb.

Arm64: XZR loads zero and discards result when written

dbbb137

- Added some instruction IDs to branch types
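The XZR behaviour named in this commit's title can be pictured with a small register-file model. `RegisterFile` is hypothetical and keyed by register name for simplicity; it only illustrates the rule that reads of the zero register yield 0 and writes to it are dropped.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """Hypothetical model of register access with a hard-wired zero register."""

    ZERO = "xzr"

    def __init__(self) -> None:
        self._regs = {}

    def read(self, name: str) -> int:
        if name == self.ZERO:
            return 0  # XZR always reads as zero
        return self._regs.get(name, 0)

    def write(self, name: str, value: int) -> None:
        if name == self.ZERO:
            return  # the result is discarded
        self._regs[name] = value & MASK64


regs = RegisterFile()
regs.write("xzr", 42)
print(regs.read("xzr"))  # 0
```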

Arm64: STR and LDR instructions now determine correct register size

2a5e865

- For example 'str w0, [sp]' should store only 4bytes to stack pointer

Arm64: Syscall optimization and detection

275d44e

Replace svc #0 with corresponding syscall decoded from previous assignments.

Arm64: STR and LDR tests

5012585

Those tests target loading and storing floating point values.

Arm64: Removed zero division semantics from llvmir

891db78

- Zero division is NOW undefined behaviour
- This caused problems in modulo idiom detection
- Also removed corresponding tests

Merge branch 'master' into arm-prep

b56101e

Arm64: FMOV instruction with immediate values

2ee7c91

- Correctly handle imm values as operands of this instruction

Revert "Arm64, bin2llvmir: Decoder should not analyse stack."

3c6b0d3

This reverts commit 7b88475. This change caused other tests to fail.

Merge branch 'master' into arm-prep

52ac3c8

s3rvac requested a review from PeterMatula March 27, 2019 07:49

s3rvac assigned PeterMatula Mar 27, 2019

s3rvac added new-feature T-arch-arm64 labels Mar 27, 2019

Arm64: Simplified and documented some code

e92523d

- Removed unused code from decoder/arm64.cpp
- Fixed insnWrittesPcArm64 to work better
- Fixed Cond branch tests

Arm64: Fixed documentation build

aa97f92

PeterMatula changed the base branch from master to arm64 March 28, 2019 10:43

PeterMatula merged commit f07407f into avast:arm64 Mar 28, 2019

s3rvac added a commit that referenced this pull request May 29, 2019

Add a new CHANGELOG entry (#268, #533, #550).

5352b35
