Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prototypes for dynamically-linked functions without headers #1092

Merged
merged 1 commit into from
Aug 31, 2022

Conversation

richardlford
Copy link
Contributor

@richardlford richardlford commented Aug 4, 2022

When the program involves dynamically-linked functions like _Znwj
(operator new) that return a pointer, it is necessary to have
prototypes for them, since otherwise they will be implicitly deduced
to return "int" which cannnot be dereferenced.

Previously RetDec was emitting comments telling which functions were
dynamically linked. This change moves them up before the functions are
emitted and instead emits prototypes for the functions. However,
RetDec also inserts includes of headers for functions with known
headers. So this change does not emit prototypes for functions with headers as that
would be redundant. As a result, some dynamically-linked functions
that used to show in the comments no longer appear as the included
header will declare them.

The section header comment for dynamically-linked functions is only
produced if some prototypes are written for dynamically-linked
functions.

A related PR (avast/retdec-regression-tests#121)
adds tests as well as changes needed for existing tests.

…ut headers

When the program involves dynamically-linked functions like _Znwj
(operator new) that return a pointer, it is necessary to have
prototypes for them, since otherwise they will be implicitly deduced
to return "int" which cannnot be dereferenced.

Previously RetDec was emitting comments telling which functions were
dynamically linked. This change moves them up before the functions are
emitted and instead emits prototypes for the functions. However,
RetDec also inserts includes of headers for functions for with know
headers. We do not emit prototypes for functions with headers as that
would be redundant.  As a result, some dynamically-linked functions
that used to show in the comments no longer appear as the included
header will declare them.

The section header comment for dynamically-linked functions is only
produced if some prototypes are written for dynamically-linked
functions.

A related PR will have added tests as well as changes needed for
existing tests.
@richardlford
Copy link
Contributor Author

@PeterMatula, could you please review this change? This continues my effort to get dynamically-linked functions to decompile correctly.

@PeterMatula
Copy link
Collaborator

Thanks @richardlford for another improvement. In general I like it, however did you notice a small regression in this test? Looks like before, we applied type info to functions in Dynamically Linked Functions section, but now we don't do so for the ones in Dynamically Linked Functions Without Header. Were you aware of this? Wouldn't it be better if we tried to apply known type info to these functions, or would it potentially backfire in some cases? I can try to identify the place where the info is applied if needed.

@richardlford
Copy link
Contributor Author

richardlford commented Aug 16, 2022

In general I like it, however, did you notice a small regression in this test? Looks like before, we applied type info to functions in Dynamically Linked Functions section, but now we don't do so for the ones in Dynamically Linked Functions Without Header. Were you aware of this? Wouldn't it be better if we tried to apply known type info to these functions, or would it potentially backfire in some cases? I can try to identify the place where the info is applied if needed.

@PeterMatula, I have investigated this further. Here are some findings:

  • For the test that you referenced, the link-time-information in support/types/windrivers.json gives the header for the dynamically-linked functions (e.g. KeStallExecutionProcessor is declared in minwin/ntosp.h), but the llvmir2hll pass does not use that information but uses the information from the retdec::llvmir2hll::Semantics class and its subclasses. The WinAPISemantics class has information on some Windows headers but does not have it for those mentioned in windrivers.json. It seems like it would be useful if the information on headers were centralized. I could take the information about headers from windrivers.json and add it to the WinAPISemantics class. Do you want me to do that? If I did, all of the dynamically-linked functions in this test would have an associated header and thus would disappear (and their headers included instead)
  • The former comments (before this change) about dynamically-linked functions were obtained from
    retdec::common::Function::getDeclarationString. retdec::common::Function::setDeclarationString
    is called in the following ways:
    • In the param_return pass, and in lti.cpp, Lti::getPairFunctionFree is called and its second component
      is a retdec::ctypes::Function which can store the function declaration as a string.
    • when a retdec::common::Function is being deserialized, though that does not appear that
      this happens (according to CLion).
  • So the former comments come from retdec::ctypes::Function. It gets its declaration either from the lti (i.e. the json files in support/types), or from demangling data (which is used if lti information is not available and the name is detected to be mangled). For the test you referenced, none of the functions are mangled.
  • On the other hand, the prototypes that are emitted are based on the contents of the
    retdec::llvmir2hll::Function objects. If we wanted these to look more like the declarations from the lti then we would need to consult the lti when doing the translation. As it is, if we used the declarations from the lti (i.e. the "decl" field from the json), we would not get a successful compile unless we also emitted definitions of the user-defined types referenced by these declarations. That could be possible but would take some work.
  • The prototypes being emitted by this change are consistent with the types in the LLVM.

In summary, I think it would require a bit more work to make the function prototypes look like the lti declarations. On the other hand, it would be relatively easy to include the lti declaration as a comment just above each prototype, if you think that would be helpful.

What do you think? Do you have suggestions for how to proceed?

@richardlford
Copy link
Contributor Author

@PeterMatula, ping?

@PeterMatula
Copy link
Collaborator

Sorry about the delay @richardlford, I know about it, I just didn't get around to looking into it more so I can answer. I have a look tomorrow and let you know.

@PeterMatula PeterMatula merged commit dcaaad5 into avast:master Aug 31, 2022
@PeterMatula
Copy link
Collaborator

PeterMatula commented Aug 31, 2022

Sorry for taking so long @richardlford, and thanks for the improvement.

I looked into it again and realized I misunderstood what exactly was going on the first time. I thought there is a regression on an LLVM IR level, and that we are possibly losing correct data type information on function parameters. But that is not the case, all the info is there, we just generate it slightly differently. The output could be further improved with the info we have by either omitting all the linked functions we know via LTI, or by using the argument names from the same source. But this is not the goal of your PR, so let's not complicate it. You are right about this:

It seems like it would be useful if the information on headers were centralized.

Or even better, it would be best to use external knowledge like LTI information only once somewhere in the beginning and embed it all into LLVM IR either directly or via some well-documented metadata, so that we have a fully self-contained LLVM IR and further passes do not have to access specialized resources. But like I said, this is just an idea for further improvements, it does not block your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants