Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sanity check for length of a PE resource type name #974

Merged
merged 2 commits into from
Jul 12, 2021

Conversation

HoundThe
Copy link
Member

Aims to fix #959

Sometimes the type name pointer points to (probably?) junk data, which results in a string with absurd length and content. A simple sanity check on the string length could solve this problem.

As in example 097a0f8b8c3f2b90f5360f27da5ef8e5a7406c6f211c8d3e56a1671933508bc1. More details can be seen in the mentioned issue.

If this solution is "OK", I'll add a test case in regression test suite.

Copy link
Collaborator

@PeterMatula PeterMatula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string length solution is OK. However, I would like to discuss few things - possibly propose other solutions, that could work together with the ultimate string length limit implemented as is:

  • Are these strings typically, or by definition, null-terminated? Would it make sense to add check that would break string loading if either simple null char, or wide null char, was encountered?
  • Are these strings typically readable? Would it be too strict if we added a check that would break parsing if a non-printable (ASCII) character was encountered?

But these are just things to discuss - rule out. In general, I like that we would not be too strict/clever and do it like LIEF.

@HoundThe
Copy link
Member Author

HoundThe commented Jul 9, 2021

I am afraid that the resource string has the following format (https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#resource-directory-string): 2 byte length at start, and then the variable length Unicode string, which is probably UTF 16 (I don't see it defined there to confirm)? So no null terminators and parsing this would be probably a bit complex, but maybe doable?

But I am thinking if it makes sense, to have wide null in a string even tho it is not null terminated. I am not 100 % sure about the variable length UTF 16 encoding and what it means to have 0x0000 🤔 Might be a usable heuristic.

@PeterMatula
Copy link
Collaborator

Well, in that case 100 chars limit it is ok:

  1. Do not try to do "printable" chars heuristics, I don't think it makes sense for UTF 16.
  2. Yes, 0x0000 should be terminating char for such strings, but because there is also string size, and LIEF is not using this heuristic, I'm afraid there might be a reason. So lets just keep it like this, and solve it only ad-hoc if it causes another problem.

Btw, tests are OK. They failed because tests for #973 were merged to master and master was not merged here. Maybe we should/could configure TC to tests the state of PR after it is merged to master (@s3rvac?) but that might have other disadvantages? So I don't know.

@HoundThe I'm merging this, please add a regression test(s).

@s3rvac
Copy link
Member

s3rvac commented Jul 13, 2021

Btw, tests are OK. They failed because tests for #973 were merged to master and master was not merged here. Maybe we should/could configure TC to tests the state of PR after it is merged to master (@s3rvac?) but that might have other disadvantages? So I don't know.

Yes, this is a possibility. There are the following options:

  1. Build just the PR branch (this is the current state).
  2. Build just the PR as if it was merged to master (this is suggested in the quote).
  3. Build both the PR branch as well as the PR as if it was merged to master. This would result in two builds for every configuration on every PR.

If we choose to go with option 2 or 3, we will have to decide what to do in case there are merge conflicts (the build would fail in such case) and also consider proper changes to the handling of the retdec-regression-tests repo (e.g. when there is a branch having the same name in both the retdec and retdec-regression-tests repos, that branch is used for testing).

@PeterMatula
Copy link
Collaborator

@s3rvac I think, we are ultimately interested in state 2. Option 3 could be useful in some cases, but because builds on windows may take nearly an hour, it would often be too slow to work with. I think, merge conflicts are not a big problem - they need to be solved before merging to master anyway, therefore when they are solved, and sources presumably modified, tests need to run again anyway - i.e. they should be solved first and then we can run option 2 tests.
I created #985 to explore this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fileinfo: junk in PE resources
3 participants