Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The rationale behind this helper command is the need to 'condense' th… #4698

Closed
wants to merge 1 commit into from
Closed

The rationale behind this helper command is the need to 'condense' th… #4698

wants to merge 1 commit into from

Conversation

porsche-rbieniek
Copy link

…e analyzer result files

delivered by multiple product teams into one summary analyzer result to be processed by the
ORT pipeline.
In this specific use case, the exact relationship of dependencies does not matter and it is
totally acceptable to work with a flat list of dependencies in the result reports.

This commit extends the helper CLI with a command to merge multiple analyzer result files into
one analyzer result file.

The algorithm to condense the result file work the following way:

  1. Take the first and second item from the input list of analyzer result files
  2. Flatten the dependency tree per scope in an analyzer result into a simple list of dependencies, thus already eliminating eventual duplicates
  3. Merge VCS info, repository configurations and dependency lists
  4. Put the result analyzer result back in the first position of the analyzer result files list
  5. Repeat with Step 1 unless the analyzer result file list contains only one entry

After the merge, the resulting VCS info of the only analyzer result is patched into the top-level VCS info for the overall project

Signed-Off: Rainer Bieniek [email protected]

…e analyzer result files

delivered by multiple product teams into one summary analyzer result to be processed by the
ORT pipeline.
In this specific use case, the exact relationship of dependencies does not matter and it is
totally acceptable to work with a flat list of dependencies in the result reports.

This commit extends the helper CLI with a command to merge multiple analyzer result files into
one analyzer result file.

The algorithm to condense the result file work the following way:
1. Take the first and second item from the input list of analyzer result files
2. Flatten the dependency tree per scope in an analyzer result into a simple list of dependencies, thus already eliminating eventual duplicates
3. Merge VCS info, repository configurations   and dependency lists
4. Put the result analyzer result back in the first position of the analyzer result files list
5. Repeat with Step 1 unless the analyzer result file list contains only one entry

After the merge, the resulting VCS info of the only analyzer result is patched into the top-level VCS info for the overall project

Signed-Off: Rainer Bieniek <[email protected]>
@porsche-rbieniek porsche-rbieniek requested a review from a team as a code owner November 15, 2021 10:50
@sschuberth
Copy link
Member

Hi @porsche-rbieniek, thanks for your contribution. Here are some general remarks before starting the code review. Please

  • rebase as we had some test issues
  • do not continue the commit message title in the body, i.e. title and body should not form a sentence; instead, regard the commit message title like the title of a book
  • hard-wrap the commit message body lines at column 75.

@fviernau
Copy link
Member

I believe it's dangerous for the normal user to build solutions on top of this command as
it's not obvious and complex to figure out what kind of inconsistencies the output has, what information the transformation looses and what kind of issues this can cause. If we start getting bugs inflowing which are for scans based on the output of this command I would not volunteer to pick these up TBH.

Just a few things which popped up after a quick look

  1. package configurations get discarded
  2. path's become ambigious as basically mutliple source trees are merged
  • entries for path excludes and license finding curations apply not only to the
    repository they have been setup for, but to all merged repositories potentially
  1. labels get discarded
  2. package curations get discarded

I wonder how we can avoid spending time on requests coming from the use of this command or the data it produces. Any thoughts, maybe @oss-review-toolkit/core-devs ?

@sschuberth
Copy link
Member

To me, the only sane answer to @fviernau's question is that the command must be changed to not loose any information.

I believe the general idea of merging multiple analyzer result files into a single one is a reasonable desire. However, as the resulting merged file does not reflect a real analysis on an actually existing repository anymore, we'd need to first come up with a concept of how to "fake" some data in the merged result. What immediately comes to my mind (but there's probably more):

  • The top-level repository VCS data needs to be "faked" as there is no real repository that contains all the merged projects.
  • Merged path excludes would need to know which original project their refer to.
  • Project IDs might need to get de-duplicated.

Each project already contains its own VCS data, so probably no need to fake anything here, but definition file paths should probably be relative to the VCS path then...

All in all I'm wondering whether the easier approach to achieve something similar would be to create a Git superproject with submodules / a git-repo manifest that "bundles" the repos to be merged, and then just run a single analysis on that superproject / git-repo project to already get a "merged" analyzer result.

@porsche-rbieniek
Copy link
Author

@sschuberth I would agree with the idea of super- / subprojects but that doesn't reflect the working approach taken by the multiple project teams.

The use case is to merge the analyzer results of multiple (and only loosely associated) product teams into one big analyzer result as the base of the reporting delivered to the law firms. The legal department does not care about the synthesized metadata regarding the overall project. But it does care about eliminating potential duplicate work executed by the law firms.

@fviernau I also agree the concerns about all the metadata that is being squashed by the approach taken. But bear in mind the goal was not exactness on the metadata. The goal was to create an effective way to deliver a condensend list of dependencies referenced across a large number of disassociated source analyzer results.

As said, in this partiucular use case it is not important how a particular package is referenced thoughout the dependency tree. It is only important to have a complete list of dependencies w/o duplicates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants