Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to suppress flagging good log4j files as potentially vulnerable if renamed #112

Open
mcainx opened this issue Mar 10, 2022 · 6 comments

Comments

@mcainx
Copy link

mcainx commented Mar 10, 2022

It would help if we had a way to suppress these matches (produced by the crawl command) when the application owner has upgraded to log4j-core-2.17.1.jar but renamed the file.

In this example, I scanned examples/good_version/log4j-core-2.17.1.jar but removed "core-" from the name.

[MATCH] invalid version - unknown CVE status detected in file log4j-2.17.1.jar. log4j versions: unknown. Reasons: JndiLookup class and package name matched
@mcainx mcainx changed the title Add a way to supress flagging good log4j files as potentially vulnerable when renamed Add a way to suppress flagging good log4j files as potentially vulnerable if renamed Mar 14, 2022
@mcainx
Copy link
Author

mcainx commented Mar 15, 2022

I could specify "crawl --disable-unknown-versions" but that might suppress the flagging of other issues we need to remediate and in this case the version IS known (and known to be non-vulnerable) via the md5 checksum matching in vulnerabilityFileWalkFunc().
The problem is that this information is lost by the time control returns to Identify() and that all resolveNameAndContentFindings() can see that there was a finding but not that is was -- an md5 match of a good (2.17.1) JndiManager.class so it issues the "invalid version" warning.

@mcainx
Copy link
Author

mcainx commented Mar 15, 2022

I "fixed" this by making vulnerabilityFileWalkFunc() save version info in a global variable when the md5 check matches a non-vulnerable version and added code in resolveNameAndContentFindings() to use this global to avoid flagging files like this as invalid / potentially vulnerable.

This is a terrible kludge but is small enough for me to port to later versions of this code if necessary.

I tested this by crawling the entire examples directory after making a copy of the "good_version" subdir containing the exact same files but renamed:

$ sum examples/good_*/*|sort
40977 814 examples/good_renamed/log4j-test-2.3.2.jar
40977 814 examples/good_version/log4j-core-2.3.2.jar
52341 1749 examples/good_renamed/log4j-test-2.17.1.jar
52341 1749 examples/good_version/log4j-core-2.17.1.jar
7413 1644 examples/good_renamed/log4j-test-2.12.4.jar
7413 1644 examples/good_version/log4j-core-2.12.4.jar

I verified the results and the only differences were that these messages were now absent

[MATCH] invalid version - unknown CVE status detected in file examples/good_renamed/log4j-test-2.17.1.jar. log4j versions: unknown. Reasons: JndiLookup class and package name matched
[MATCH] invalid version - unknown CVE status detected in file examples/good_renamed/log4j-test-2.12.4.jar. log4j versions: unknown. Reasons: JndiLookup class and package name matched
[MATCH] invalid version - unknown CVE status detected in file examples/good_renamed/log4j-test-2.3.2.jar. log4j versions: unknown. Reasons: JndiLookup class and package name matched

@mcainx
Copy link
Author

mcainx commented Mar 15, 2022

It would help if someone could fix this properly. I'm not submitting a pull request because this is just a quick hack intended to reduce our false positives.

In case this helps, here are the changes I needed to make to v1.9.0 for this

$ git diff
diff --git a/pkg/crawl/identify.go b/pkg/crawl/identify.go
index 9620e3b..12b280f 100644
--- a/pkg/crawl/identify.go
+++ b/pkg/crawl/identify.go
@@ -31,6 +31,8 @@ import (
        "go.uber.org/ratelimit"
 )

+var xx_version_hack = []string{}
+
 var log4jRegex = regexp.MustCompile(`(?i)^log4j-core-(\d+\.\d+(?:\..*)?)\.jar$`)

 type Identifier interface {
@@ -84,6 +86,7 @@ func (i *Log4jIdentifier) Identify(ctx context.Context, path string, filename st

        nestedPath := []string{path}
        log4jNameMatch, nameVulnerability, nameVersions := i.archiveNameVulnerability(nestedPath)
+xx_version_hack = []string{}
        inArchive, inZipVs, skipped, fileLevelProceed, err := i.findArchiveContentVulnerabilities(ctx, 0, walker, nestedPath)
        if err != nil {
                return 0, errors.Wrapf(err, "failed to walk archive %s", path)
@@ -116,6 +119,8 @@ func resolveNameAndContentFindings(nameMatchesLog4jJar bool, nameFinding Finding
                return NothingDetected, nil
        }

+if len(xx_version_hack) > 0 { return NothingDetected, nil }
+
        findings := nameFinding | archiveFinding
        if findings == NothingDetected {
                return findings, nil
@@ -231,6 +236,7 @@ func (i *Log4jIdentifier) vulnerabilityFileWalkFunc(depth uint, result *Finding,
                        if versionMatch {
                                version, parsed := ParseLog4jVersion(versionInFile)
                                if !parsed || !version.Vulnerable() {
+if versionInFile != "" { xx_version_hack = append(xx_version_hack, filename + ":" + versionInFile) }
                                        return true, nil
                                }
                                versions[versionInFile] = struct{}{}

identify.go.txt

@glynternet
Copy link
Contributor

Hi @mcainx,

Unfortunately we do not have cycles to implement the change that you are after. I understand that this will be a frustrating answer.
As you've shown with your comment, the quick solution can be a little bit messy. A proper solution would take more work and probably could involve breaking changes.

The delete command was implemented after the crawl command and has a lot less of the legacy code mishaps within it from when we were first iterating on vulnerability finding very fast at first.
Using the delete command but in dry mode actually surfaces only findings that map to CVEs, with some configurability through the flags. Is it possible that this can provide a workaround for you?

@mcainx
Copy link
Author

mcainx commented Apr 13, 2022

I tested this and it looks like this delete --dry-run method can help some of our application owners identify false positives, particularly if they're running the official version of the code.

We'll probably continue using this global variable kludge for large scale scanning since our aggregation processing relies on crawl --json output for dashboards and tracking.

I already maintain a fair number of local modifications to support automated large scale scanning and aggregation anyway -- like the addition of metadata to the crawl --json output (timestamp, hostname, IP addresses, log4j version number, etc) and output upload options (AWS S3 / syslog) so maintaining these additional modifications isn't a big deal.

(Yes, I do realize I could have done a lot of this aggregation support via a wrapper but we needed to make running this as simple as possible.)

FWIW, if this issue is found to affect other companies, I wouldn't mind seeing this kind of "fix" dropped into the official code as awkward as it is. At least it gets the job done and that's what counts. (Just something to keep in mind.)

@glynternet
Copy link
Contributor

Thanks @mcainx, I appreciate your understanding and am glad to see that log4j-sniffer has at least provided a decent base for you to action your remediations from!

Let's leave this issue open and see what happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants