
✨ Add Additional Details to License Check #2442

Merged: 52 commits, merged Nov 28, 2022


@shissam (Contributor) commented Nov 9, 2022

What kind of change does this PR introduce?

This is a proposed feature enhancement to the License check, which serves two purposes:

  • Provide specific details as to which license was detected

  • Provide the basis to add gradation to license analysis

PR title follows the guidelines defined in our pull request documentation.

What is the current behavior?

License check looks in the repo for specific license files and some matching content. The score is either min or max depending on whether a file is found, regardless of the license content.
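To make the current behavior concrete, here is a minimal Go sketch of a binary, filename-only check. The file-name pattern, the `legacyScore` function, and the local 0/10 bounds are assumptions for illustration, not scorecard's actual code; the point is that a match on the name alone yields the maximum score, whatever the file contains.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical local score bounds (scorecard scores run from 0 to 10).
const (
	minScore = 0
	maxScore = 10
)

// licenseFileRe is an illustrative, case-insensitive pattern for common
// license file names (LICENSE, LICENCE, COPYING, optionally with an
// extension such as .md or .txt). The real check's list of names may differ.
var licenseFileRe = regexp.MustCompile(`(?i)^(license|licence|copying)(\.[a-z]+)?$`)

// legacyScore returns maxScore and the matching name if any file name
// matches, minScore otherwise. The file content is never inspected.
func legacyScore(files []string) (int, string) {
	for _, f := range files {
		if licenseFileRe.MatchString(f) {
			return maxScore, f
		}
	}
	return minScore, ""
}

func main() {
	score, file := legacyScore([]string{"README.md", "go.mod", "LICENSE"})
	fmt.Println(score, file) // 10 LICENSE
}
```

Because only names are compared, an empty or heavily edited LICENSE file scores the same as a complete one, which is the gap this PR starts to address.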

What is the new behavior (if this is a feature change)?

  • introduce clients/licenses.go, which wraps this GitHub API (similar to clients/languages.go); the name is plural because GitLab may return multiple licenses for a repo (I have a real-world example).
  • this GitHub client would populate a structure like the following:
type License struct {
        Key      string // RepositoryLicense.GetLicense().GetKey()
        Name     string // RepositoryLicense.GetLicense().GetName()
        Path     string // RepositoryLicense.GetName()
        Size     int    // RepositoryLicense.GetSize()
        SPDXId   string // RepositoryLicense.GetLicense().GetSPDXID()
        Type     string // RepositoryLicense.GetType()
}

where (example from ossf/scorecard):

{
  Key:    apache-2.0
  Name:   Apache License 2.0
  Path:   LICENSE
  Size:   11356
  SPDXId: Apache-2.0
  Type:   file
}
  • modify checks/raw/license.go to follow one of two possible paths:
    -- pathOne: try the repoClient (as in c.RepoClient.ListLicenses()); if GitHub reports a license, USE that and return
    -- pathTwo: if the repoClient returns nil, continue with the legacy logic and return what it has always returned
  • there are a number of advantages to this approach:
    -- speedup: for GitHub, in my test c.RepoClient.ListLicenses() returns in less than 5 seconds, whereas for repos with many files (like torvalds/linux) the existing file-based check takes over a minute
    -- extensibility: this test with GitHub is promising, and GitLab does appear to have an API for getting this information (https://docs.gitlab.com/ee/api/templates/licenses.html)
    -- confidence: I assume that using the GitHub API to detect which specific license is in use gives higher confidence, since their detection algorithm (which might use NLP) has had more review. I tried a license on my site that changed the word Apache to Apacje in the complete Apache License 2.0 file, and GitHub still detected it as APL-2.0 (very cool).
    -- standardization: I am encouraged by the use of the SPDXId in the GitHub API; if GitLab uses the same standard identifiers for licenses, this would be great for the community
    -- usability: using scorecard to retrieve the specific license detected would a) make this information more readily available to the end user; and b) support development pipelines that need to detect the license (by name) as a risk-management activity, without having to use other means to acquire such license information
  • with this approach, propose changing checker.LicenseData{}:
type License struct {
        Key         string          // from clients.License.Key
        Name        string          // from clients.License.Name
        Size        int             // from clients.License.Size (Size may belong in type File struct instead)
        SPDXId      string          // from clients.License.SPDXId
        Attribution AttributionType // license sourced from LicenseRepoAPI or ScorecardEngine
}

// one file contains one license.
type LicenseFile struct {
        License License
        File    File
}

// LicenseData contains the raw results
// for the License check.
// Some repos may have more than one license.
type LicenseData struct {
        LicenseFiles []LicenseFile
}

Which issue(s) this PR fixes

Fixes #1369

A more complete proposal is at https://github.com/ossf/scorecard/issues/1369#issuecomment-1304831531

Special notes for your reviewer

I would like to discuss with the owners/maintainers ideas for adding gradation to the evaluation, using additional information from the repo API, which may lead to greater confidence in scores. This PR would take a step in that direction.
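One possible shape for such a gradation, sketched in Go: instead of returning only min or max, the score accumulates points for each piece of evidence (a license file exists, it sits at the repository root, its SPDX identifier is OSI- or FSF-recognized). The point values, the `finding` type, and the small `recognized` set are hypothetical placeholders for discussion, not settled scoring rules.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical weights; a real gradation would be agreed with maintainers.
const (
	pointsFileFound  = 6
	pointsTopLevel   = 3
	pointsRecognized = 1
)

// recognized is an illustrative, deliberately incomplete set of SPDX
// identifiers for OSI- or FSF-recognized licenses.
var recognized = map[string]bool{
	"Apache-2.0":   true,
	"MIT":          true,
	"BSD-3-Clause": true,
	"GPL-3.0-only": true,
}

// finding describes one detected license file.
type finding struct {
	Path   string // repository-relative path, e.g. "LICENSE" or "docs/LICENSE.txt"
	SPDXId string // empty when no identifier could be determined
}

// gradedScore adds points for each piece of evidence instead of returning
// only the minimum or maximum score.
func gradedScore(f *finding) int {
	if f == nil {
		return 0 // no license file found
	}
	score := pointsFileFound
	if !strings.Contains(f.Path, "/") {
		score += pointsTopLevel // file sits in the repository root
	}
	if recognized[f.SPDXId] {
		score += pointsRecognized
	}
	return score
}

func main() {
	fmt.Println(gradedScore(&finding{Path: "LICENSE", SPDXId: "Apache-2.0"})) // 10
	fmt.Println(gradedScore(&finding{Path: "docs/LICENSE.txt"}))              // 6
	fmt.Println(gradedScore(nil))                                             // 0
}
```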

Does this PR introduce a user-facing change?

format=raw would include the entire LicenseData found by scorecard, whether from the file or the repo API.
format=[json|default] would include the rationale for any gradation that is decided on.
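For format=raw, a minimal Go sketch of how one detected license might be serialized with `encoding/json`. The `rawLicense` type, `rawJSON` helper, JSON field names, and attribution strings are assumptions for illustration; they are not scorecard's actual raw-results schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawLicense is a hypothetical JSON shape for one entry of the raw results.
type rawLicense struct {
	Path        string `json:"path"`
	SPDXId      string `json:"spdxId,omitempty"` // omitted when unknown
	Name        string `json:"name,omitempty"`
	Attribution string `json:"attribution"` // e.g. "repo API" or "scorecard engine"
}

// rawJSON wraps the entries in a top-level object and encodes it.
func rawJSON(ls []rawLicense) (string, error) {
	out, err := json.Marshal(struct {
		Licenses []rawLicense `json:"licenses"`
	}{Licenses: ls})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := rawJSON([]rawLicense{{
		Path: "LICENSE", SPDXId: "Apache-2.0", Name: "Apache License 2.0", Attribution: "repo API",
	}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// {"licenses":[{"path":"LICENSE","spdxId":"Apache-2.0","name":"Apache License 2.0","attribution":"repo API"}]}
}
```

Using `omitempty` on the identifier fields keeps file-only detections (where no SPDX ID is known) from emitting empty strings.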



shissam and others added 30 commits October 20, 2022 16:01
@shissam temporarily deployed to integration-test November 27, 2022 19:56 (inactive)

@github-actions: Integration tests success for 1f55b7a (https://github.com/ossf/scorecard/actions/runs/3559735104)

@shissam temporarily deployed to integration-test November 28, 2022 14:28 (inactive)

@github-actions: Integration tests success for 1ef1703 (https://github.com/ossf/scorecard/actions/runs/3565703124)

Review thread on checker/raw_result.go (outdated, resolved)
…tion constants to be more meaningful, update documentation as necessary for changes

Signed-off-by: Scott Hissam <[email protected]>
auto-merge was automatically disabled November 28, 2022 17:54

Head branch was pushed to by a user without write access

@shissam temporarily deployed to integration-test November 28, 2022 17:54 (inactive)

@github-actions: Integration tests success for c1b7373 (https://github.com/ossf/scorecard/actions/runs/3567344961)

@laurentsimon (Contributor) left a comment:

LGTM. Thanks for explaining everything in such detail, and for the amount of work you've put into this PR!

@laurentsimon laurentsimon merged commit 28b116f into ossf:main Nov 28, 2022
raghavkaul added a commit to raghavkaul/scorecard that referenced this pull request Feb 9, 2023
* ✨ Improved Security Policy Check (ossf#2137)

* Examines and awards points for linked content (URLs / Emails)

* Examines and awards points for hints of disclosure and vulnerability practices

* Examines and awards points for hints of elaboration of timelines

Signed-off-by: Scott Hissam <[email protected]>

* Repaired Security Policy to correctly use linked content length for evaluation

Signed-off-by: Scott Hissam <[email protected]>

* gofmt'ed changes

Signed-off-by: Scott Hissam <[email protected]>

* Repaired the case in the evaluation which was too sensitive to content length over the length of the linked content for urls and emails

Signed-off-by: Scott Hissam <[email protected]>

* added unit test cases for the new content-based Security Policy checks

Signed-off-by: Scott Hissam <[email protected]>

* reverted the direct (mistaken) change to checks.md and updated the checks.yaml for generate-docs

Signed-off-by: Scott Hissam <[email protected]>

* ✨ Improved Security Policy Check (ossf#2137) (revisted based on comments)

* replaced reason strings with log.Info & log.Warn (as seen in --show-details)

* internal assertion check for nil (*pinfo) and empty pfile

* internal switched to FileTypeText over FileTypeSource

* internal implement type SecurityPolicyInformationType/SecurityPolicyInformation revised SecurityPolicyData to support only one file

* revised expected unit-test results and revised unit-test to reflect the new SecurityPolicyData type

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; e2e tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* Addressed PR comments; added telemetry for policy hits in security policy file to track hits by line number

Signed-off-by: Scott Hissam <[email protected]>

* Resolved merge conflict with checks.yaml

Signed-off-by: Scott Hissam <[email protected]>

* updated raw results to emit all the raw information for the new security policy check

Signed-off-by: Scott Hissam <[email protected]>

* Resolved merge conflicts and lint errors with json_raw_results.go

Signed-off-by: Scott Hissam <[email protected]>

* Addressed review comments to reorganize security policy data struct to support the potential for multiple security policy files.

Signed-off-by: Scott Hissam <[email protected]>

* Added logic to the security policy to process multiple security policy files only after future improvements to aggregating scoring across such files are designed. For now the security policy behaves as originally designed to stop once one of the expected policy files are found in the repo

Signed-off-by: Scott Hissam <[email protected]>

* added comments regarding the capacity to support multiple policy files and removed unneeded break statements in the code

Signed-off-by: Scott Hissam <[email protected]>

* Addressed review comments to remove the dependency on the path in the filename from the code and introduced FileSize to checker.File type and removed the SecurityContentLength which was used to hold that information for the new security policy assessment

Signed-off-by: Scott Hissam <[email protected]>

* restored reporting full security policy path and filename for policies found in the org level repos

Signed-off-by: Scott Hissam <[email protected]>

* Resolved conflicts in checks.yaml for documentation

Signed-off-by: Scott Hissam <[email protected]>

* ✨ CLI for scorecard-attestor (ossf#2309)

* Reorganize

Signed-off-by: Raghav Kaul <[email protected]>

* Working commit

Signed-off-by: Raghav Kaul <[email protected]>

* Compile with local scorecard; go mod tidy

Signed-off-by: Raghav Kaul <[email protected]>

* Add signing code

Heavily borrowed from https://github.com/grafeas/kritis/blob/master/cmd/kritis/signer/main.go

Signed-off-by: Raghav Kaul <[email protected]>

* Update deps

* Naming
* Makefile

Signed-off-by: Raghav Kaul <[email protected]>

* Edit license, add lint.yml

Signed-off-by: Raghav Kaul <[email protected]>

* checks: go mod tidy, license

Signed-off-by: Raghav Kaul <[email protected]>

* Address PR comments

* Split into checker/signer files
* Naming convention

Signed-off-by: Raghav Kaul <[email protected]>

* License, remove golangci.yml

Signed-off-by: Raghav Kaul <[email protected]>

* Address PR comments

* Use cobra

Signed-off-by: Raghav Kaul <[email protected]>

* Add tests for root command

Signed-off-by: Raghav Kaul <[email protected]>

* Filter out checks that aren't needed for policy evaluation

Signed-off-by: Raghav Kaul <[email protected]>

* Add `make` targets for attestor; submit coverage stats

Signed-off-by: Raghav Kaul <[email protected]>

* Improvements

* Use sclog instead of glog
* Remove unneeded subcommands
* Formatting

Signed-off-by: Raghav Kaul <[email protected]>

* Flags: Make note-name constant and fix messaging

Signed-off-by: Raghav Kaul <[email protected]>

* Remove SupportedRequestTypes

Signed-off-by: Raghav Kaul <[email protected]>

* go mod tidy

Signed-off-by: Raghav Kaul <[email protected]>

* go mod tidy, makefile

Signed-off-by: Raghav Kaul <[email protected]>

* Fix GH actions run

Signed-off-by: Raghav Kaul <[email protected]>

Signed-off-by: Raghav Kaul <[email protected]>
Signed-off-by: Scott Hissam <[email protected]>

* removed whitespace before stanza for Run attestor e2e

Signed-off-by: Scott Hissam <[email protected]>

* resolved code review and doc review comments

Signed-off-by: Scott Hissam <[email protected]>

* repaired the link for the maintainer's guide for supporting the coordinated vulnerability disclosure guidelines

Signed-off-by: Scott Hissam <[email protected]>

* initial implementation of ossf#1369 (comment) to provide more license details

Signed-off-by: Scott Hissam <[email protected]>

* draft implementation to provide more information on license details

Signed-off-by: Scott Hissam <[email protected]>

* repaired a misspelling

Signed-off-by: Scott Hissam <[email protected]>

* Changed to handle http errors with 404 not found as being a non-error for not being able to find a license

Signed-off-by: Scott Hissam <[email protected]>

* Return an error status similar to other gitlab checks

Signed-off-by: Scott Hissam <[email protected]>

* add new raw licenses data

Signed-off-by: Scott Hissam <[email protected]>

* updated e2e test as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <[email protected]>

* added numerous more test filenames and a shouldFail boolean as some filenames will fail that do not meet checks.md rules

Signed-off-by: Scott Hissam <[email protected]>

* license check now, primarily, uses the GH API for checking licenses

Signed-off-by: Scott Hissam <[email protected]>

* updated local checker as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <[email protected]>

* added draft license gradation for scoring, add a map to OSI and FSF licenses, added GH API for retrieving repo license, revamp license filename matching when not using a repo API for detecting license files.

Signed-off-by: Scott Hissam <[email protected]>

* repaired race condition for case insensitive map, improved regex matching, moved licenses to raw, raw now mimics GH API return values for key, name, etc., updated unit tests and raw results accordingly

Signed-off-by: Scott Hissam <[email protected]>

* completed disambiguation of SPDX Identifiers and filename extensions, reworked some of the code comments, added map generation to TestLicense, added an additional mutex for the regex group identifier index, removed spurious prints, revised unit test accordingly, updated documentation.

Signed-off-by: Scott Hissam <[email protected]>

* removed repo Key from LicenseInformation as unneeded, changed attribution constants to be more meaningful, update documentation as necessary for changes

Signed-off-by: Scott Hissam <[email protected]>

Signed-off-by: Scott Hissam <[email protected]>
Signed-off-by: Raghav Kaul <[email protected]>
Co-authored-by: raghavkaul <[email protected]>
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Jul 9, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Aug 22, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Nov 18, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
Successfully merging this pull request may close these issues.

Add gradation to license analysis
3 participants