Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add multiline string finder in helper script #1690

Merged
merged 16 commits into from
Jun 16, 2022
Merged

feat: add multiline string finder in helper script #1690

merged 16 commits into from
Jun 16, 2022

Conversation

b31ngd3v
Copy link
Contributor

@b31ngd3v b31ngd3v commented Jun 7, 2022

image

@b31ngd3v b31ngd3v changed the title feat: add multiline string finding in helper script feat: add multiline string finder in helper script Jun 7, 2022
Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea. Is there any way we could add a test for this functionality?

Copy link
Contributor

@BreadGenie BreadGenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

There are some scenarios where the script could possibly improve

  • Catching version string when the lines that matter are not adjacent but is after a couple lines. Example:
    r"var PACKAGE_NAME = 'gnome\-shell';\r?\n/\* The version of this package \*/\r?\nvar PACKAGE_VERSION = '([0-9]+\.[0-9]+(\.[0-9]+)?)';"
  • Catching version string when the line with version is before and the other relevant string (with package name) is after.

cve_bin_tool/helper_script.py Outdated Show resolved Hide resolved
@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

  • Catching version string when the lines that matter are not adjacent but is after a couple lines. Example:
    r"var PACKAGE_NAME = 'gnome\-shell';\r?\n/\* The version of this package \*/\r?\nvar PACKAGE_VERSION = '([0-9]+\.[0-9]+(\.[0-9]+)?)';"

what should be the maximum line gap between the product string and version string? if the line gap is too high then the signature won't be that good, right? @BreadGenie

@BreadGenie
Copy link
Contributor

BreadGenie commented Jun 8, 2022

@b31ngd3v We could give a maximum line gap but we can also do without it. Example:
r"var PACKAGE_NAME = 'gnome\-shell';(?:(?:\r?\n.*)*)var PACKAGE_VERSION = '([0-9]+\.[0-9]+(\.[0-9]+)?)';" for the above example.

The (?:(?:\r?\n.*)*) part would catch all newline and the characters in between the package name string and version string.

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

The (?:(?:\r?\n.*)*) part would catch all newline and the characters in between the package name string and version string.

@BreadGenie don't you think we should use non-greedy mode (?:(?:\r?\n.*?)*) ?

@BreadGenie
Copy link
Contributor

@b31ngd3v yep

@codecov-commenter
Copy link

codecov-commenter commented Jun 8, 2022

Codecov Report

Merging #1690 (65220dc) into main (0f24b06) will increase coverage by 3.30%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1690      +/-   ##
==========================================
+ Coverage   78.55%   81.86%   +3.30%     
==========================================
  Files         298      298              
  Lines        6216     6258      +42     
  Branches     1011     1023      +12     
==========================================
+ Hits         4883     5123     +240     
+ Misses       1115      931     -184     
+ Partials      218      204      -14     
Flag Coverage Δ
longtests 81.86% <100.00%> (+3.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cve_bin_tool/helper_script.py 85.32% <100.00%> (+26.49%) ⬆️
test/test_helper_script.py 100.00% <100.00%> (ø)
cve_bin_tool/nvd_api.py 75.20% <0.00%> (-4.80%) ⬇️
cve_bin_tool/extractor.py 55.05% <0.00%> (+1.01%) ⬆️
cve_bin_tool/version_scanner.py 77.51% <0.00%> (+1.93%) ⬆️
cve_bin_tool/cli.py 72.95% <0.00%> (+2.45%) ⬆️
cve_bin_tool/util.py 78.37% <0.00%> (+2.70%) ⬆️
cve_bin_tool/checkers/glibc.py 100.00% <0.00%> (+4.16%) ⬆️
cve_bin_tool/available_fix/debian_cve_tracker.py 83.67% <0.00%> (+6.12%) ⬆️
cve_bin_tool/available_fix/redhat_cve_tracker.py 80.35% <0.00%> (+8.92%) ⬆️
... and 8 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

jackson-databind

image

gnome-shell

image

@BreadGenie what do you think?

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

image
these ones don't look good, should we remove the ones with bare version? @BreadGenie

@BreadGenie
Copy link
Contributor

@b31ngd3v it looks nice, but do we need multiline pattern when a single line pattern exist? Or we could mark the single line pattern as a better choice in the output?

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

@b31ngd3v it looks nice, but do we need multiline pattern when a single line pattern exist? Or we could mark the single line pattern as a better choice in the output?

good idea! if a single line pattern exists, then we should only print that one.

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

image these ones don't look good, should we remove the ones with bare version?

@BreadGenie btw what do you think about this?

@BreadGenie
Copy link
Contributor

good idea! if a single line pattern exists, then we should only print that one.

Yep

@BreadGenie
Copy link
Contributor

BreadGenie commented Jun 8, 2022

image
these ones don't look good, should we remove the ones with bare version? @BreadGenie

We will only need one of those and should be used if we don't have any other choices. Multiline patterns could end up being expensive.

Also I didn't get what you meant by bare version

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

image
these ones don't look good, should we remove the ones with bare version? @BreadGenie

We will only need one of those and should be used if we don't have any other choices. Multiline patterns could end up being expensive.

Also I didn't get what you meant by bare version

by bare version, i meant the second part (after (?:(?:\r?\n.*?)*)) where the version is (in the image), is completely naked it doesn't have anything like version=3.38.4, don't you think we should remove those?

@BreadGenie
Copy link
Contributor

BreadGenie commented Jun 8, 2022

by bare version, i meant the second part (after (?:(?:\r?\n.*?)*)) where the version is (in the image), is completely naked it doesn't have anything like version=3.38.4, don't you think we should remove those?

It could work in multiline mode but it could also generate false positives, so we can remove that.

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

cleaned up the output, previously it was printing all combinations of version strings and product strings after comparing the line numbers, now it's only printing the combinations of version strings and the closest product string line to that version string after comparing the line numbers. and also if a one line signature exists then it's going to ignore all multi-line signatures.

gnome-shell [it's printing the same signature two times, because those two are in different lines in the binary]

image

jackson-databind

image

@b31ngd3v
Copy link
Contributor Author

b31ngd3v commented Jun 8, 2022

thanks @BreadGenie for helping me out with this PR

@b31ngd3v b31ngd3v requested a review from terriko June 8, 2022 22:36
@b31ngd3v b31ngd3v marked this pull request as draft June 11, 2022 02:47
@b31ngd3v b31ngd3v marked this pull request as ready for review June 15, 2022 19:38
Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be a very helpful feature, thank you! I have a feeling we might want to do some more precise tests later making sure it found specific expected patterns rather than "just not the max glob" but this should at least get us some basic coverage on the code.

@terriko terriko merged commit 6ecf198 into intel:main Jun 16, 2022
@b31ngd3v b31ngd3v deleted the 1690 branch June 16, 2022 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants