Add prototypes for dynamically-linked functions without headers #1092

richardlford · 2022-08-04T17:29:02Z

When the program involves dynamically-linked functions like _Znwj
(operator new) that return a pointer, it is necessary to have
prototypes for them, since otherwise they will be implicitly deduced
to return "int" which cannnot be dereferenced.

Previously RetDec was emitting comments telling which functions were
dynamically linked. This change moves them up before the functions are
emitted and instead emits prototypes for the functions. However,
RetDec also inserts includes of headers for functions with known
headers. So this change does not emit prototypes for functions with headers as that
would be redundant. As a result, some dynamically-linked functions
that used to show in the comments no longer appear as the included
header will declare them.

The section header comment for dynamically-linked functions is only
produced if some prototypes are written for dynamically-linked
functions.

A related PR (avast/retdec-regression-tests#121)
adds tests as well as changes needed for existing tests.

…ut headers When the program involves dynamically-linked functions like _Znwj (operator new) that return a pointer, it is necessary to have prototypes for them, since otherwise they will be implicitly deduced to return "int" which cannnot be dereferenced. Previously RetDec was emitting comments telling which functions were dynamically linked. This change moves them up before the functions are emitted and instead emits prototypes for the functions. However, RetDec also inserts includes of headers for functions for with know headers. We do not emit prototypes for functions with headers as that would be redundant. As a result, some dynamically-linked functions that used to show in the comments no longer appear as the included header will declare them. The section header comment for dynamically-linked functions is only produced if some prototypes are written for dynamically-linked functions. A related PR will have added tests as well as changes needed for existing tests.

richardlford · 2022-08-04T17:35:18Z

@PeterMatula, could you please review this change? This continues my effort to get dynamically-linked functions to decompile correctly.

PeterMatula · 2022-08-15T13:38:54Z

Thanks @richardlford for another improvement. In general I like it, however did you notice a small regression in this test? Looks like before, we applied type info to functions in Dynamically Linked Functions section, but now we don't do so for the ones in Dynamically Linked Functions Without Header. Were you aware of this? Wouldn't it be better if we tried to apply known type info to these functions, or would it potentially backfire in some cases? I can try to identify the place where the info is applied if needed.

richardlford · 2022-08-16T16:09:09Z

In general I like it, however, did you notice a small regression in this test? Looks like before, we applied type info to functions in Dynamically Linked Functions section, but now we don't do so for the ones in Dynamically Linked Functions Without Header. Were you aware of this? Wouldn't it be better if we tried to apply known type info to these functions, or would it potentially backfire in some cases? I can try to identify the place where the info is applied if needed.

@PeterMatula, I have investigated this further. Here are some findings:

For the test that you referenced, the link-time-information in support/types/windrivers.json gives the header for the dynamically-linked functions (e.g. KeStallExecutionProcessor is declared in minwin/ntosp.h), but the llvmir2hll pass does not use that information but uses the information from the retdec::llvmir2hll::Semantics class and its subclasses. The WinAPISemantics class has information on some Windows headers but does not have it for those mentioned in windrivers.json. It seems like it would be useful if the information on headers were centralized. I could take the information about headers from windrivers.json and add it to the WinAPISemantics class. Do you want me to do that? If I did, all of the dynamically-linked functions in this test would have an associated header and thus would disappear (and their headers included instead)
The former comments (before this change) about dynamically-linked functions were obtained from
retdec::common::Function::getDeclarationString. retdec::common::Function::setDeclarationString
is called in the following ways:
- In the param_return pass, and in lti.cpp, Lti::getPairFunctionFree is called and its second component
  is a retdec::ctypes::Function which can store the function declaration as a string.
- when a retdec::common::Function is being deserialized, though that does not appear that
  this happens (according to CLion).
So the former comments come from retdec::ctypes::Function. It gets its declaration either from the lti (i.e. the json files in support/types), or from demangling data (which is used if lti information is not available and the name is detected to be mangled). For the test you referenced, none of the functions are mangled.
On the other hand, the prototypes that are emitted are based on the contents of the
retdec::llvmir2hll::Function objects. If we wanted these to look more like the declarations from the lti then we would need to consult the lti when doing the translation. As it is, if we used the declarations from the lti (i.e. the "decl" field from the json), we would not get a successful compile unless we also emitted definitions of the user-defined types referenced by these declarations. That could be possible but would take some work.
The prototypes being emitted by this change are consistent with the types in the LLVM.

In summary, I think it would require a bit more work to make the function prototypes look like the lti declarations. On the other hand, it would be relatively easy to include the lti declaration as a comment just above each prototype, if you think that would be helpful.

What do you think? Do you have suggestions for how to proceed?

richardlford · 2022-08-29T18:27:43Z

@PeterMatula, ping?

PeterMatula · 2022-08-30T13:51:24Z

Sorry about the delay @richardlford, I know about it, I just didn't get around to looking into it more so I can answer. I have a look tomorrow and let you know.

PeterMatula · 2022-08-31T15:00:48Z

Sorry for taking so long @richardlford, and thanks for the improvement.

I looked into it again and realized I misunderstood what exactly was going on the first time. I thought there is a regression on an LLVM IR level, and that we are possibly losing correct data type information on function parameters. But that is not the case, all the info is there, we just generate it slightly differently. The output could be further improved with the info we have by either omitting all the linked functions we know via LTI, or by using the argument names from the same source. But this is not the goal of your PR, so let's not complicate it. You are right about this:

It seems like it would be useful if the information on headers were centralized.

Or even better, it would be best to use external knowledge like LTI information only once somewhere in the beginning and embed it all into LLVM IR either directly or via some well-documented metadata, so that we have a fully self-contained LLVM IR and further passes do not have to access specialized resources. But like I said, this is just an idea for further improvements, it does not block your work.

richardlford mentioned this pull request Aug 4, 2022

Add and modify tests for prototypes of dynamically-linked functions avast/retdec-regression-tests#121

Merged

s3rvac requested a review from PeterMatula August 5, 2022 04:10

s3rvac assigned PeterMatula Aug 5, 2022

PeterMatula merged commit dcaaad5 into avast:master Aug 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add prototypes for dynamically-linked functions without headers #1092

Add prototypes for dynamically-linked functions without headers #1092

richardlford commented Aug 4, 2022 •

edited

Loading

richardlford commented Aug 4, 2022

PeterMatula commented Aug 15, 2022

richardlford commented Aug 16, 2022 •

edited

Loading

richardlford commented Aug 29, 2022

PeterMatula commented Aug 30, 2022

PeterMatula commented Aug 31, 2022 •

edited

Loading

Add prototypes for dynamically-linked functions without headers #1092

Add prototypes for dynamically-linked functions without headers #1092

Conversation

richardlford commented Aug 4, 2022 • edited Loading

richardlford commented Aug 4, 2022

PeterMatula commented Aug 15, 2022

richardlford commented Aug 16, 2022 • edited Loading

richardlford commented Aug 29, 2022

PeterMatula commented Aug 30, 2022

PeterMatula commented Aug 31, 2022 • edited Loading

richardlford commented Aug 4, 2022 •

edited

Loading

richardlford commented Aug 16, 2022 •

edited

Loading

PeterMatula commented Aug 31, 2022 •

edited

Loading