The rationale behind this helper command is the need to 'condense' th… #4698

porsche-rbieniek · 2021-11-15T10:50:43Z

…e analyzer result files

delivered by multiple product teams into one summary analyzer result to be processed by the
ORT pipeline.
In this specific use case, the exact relationship of dependencies does not matter and it is
totally acceptable to work with a flat list of dependencies in the result reports.

This commit extends the helper CLI with a command to merge multiple analyzer result files into
one analyzer result file.

The algorithm to condense the result files works as follows:

1. Take the first and second item from the input list of analyzer result files.
2. Flatten the dependency tree per scope in an analyzer result into a simple list of dependencies, thereby eliminating any duplicates.
3. Merge VCS info, repository configurations, and dependency lists.
4. Put the resulting analyzer result back in the first position of the analyzer result files list.
5. Repeat from step 1 until the analyzer result file list contains only one entry.

After the merge, the VCS info of the one remaining analyzer result is patched into the top-level VCS info for the overall project.

Signed-Off: Rainer Bieniek [email protected]
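
For readers who want a concrete picture of the pairwise merge described above, here is a minimal Python sketch. It is not the code from this pull request: `AnalyzerResult`, `flatten`, `merge`, and `condense` are hypothetical names, the data model is heavily simplified, repository configurations are left out, and the rule that the first result's VCS info wins is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class AnalyzerResult:
    """Hypothetical, heavily simplified stand-in for an analyzer result file."""
    vcs: dict  # e.g. {"type": "Git", "url": "..."}
    # Scope name -> dependency tree, as nested {package_id: {child_id: {...}}}.
    scopes: dict = field(default_factory=dict)


def flatten(tree: dict) -> set:
    """Collapse a nested dependency tree into a flat set of package IDs."""
    flat = set()
    for package_id, children in tree.items():
        flat.add(package_id)
        flat |= flatten(children)
    return flat


def merge(first: AnalyzerResult, second: AnalyzerResult) -> AnalyzerResult:
    """Steps 2 and 3: flatten every scope, then merge the flat lists."""
    merged = {}
    for result in (first, second):
        for scope, tree in result.scopes.items():
            merged.setdefault(scope, set()).update(flatten(tree))
    # Store each flat set as a tree of depth one, so that the merged result
    # can be fed back into merge() on the next round.
    scopes = {s: {pkg: {} for pkg in sorted(ids)} for s, ids in merged.items()}
    # Assumption: the VCS info of the first result wins.
    return AnalyzerResult(vcs=first.vcs, scopes=scopes)


def condense(results: list) -> AnalyzerResult:
    """Steps 1, 4 and 5: fold the list pairwise, starting from the front."""
    return reduce(merge, results)
```

Folding with `reduce` is equivalent to repeatedly replacing the first two list entries with their merged result until one entry remains. Note that the relationship between packages is lost after the first merge, which is the trade-off this pull request accepts.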


sschuberth · 2021-11-17T11:19:26Z

Hi @porsche-rbieniek, thanks for your contribution. Here are some general remarks before starting the code review. Please

- rebase, as we had some test issues
- do not continue the commit message title in the body, i.e. title and body should not form a sentence; instead, regard the commit message title like the title of a book
- hard-wrap the commit message body lines at column 75.

fviernau · 2021-11-17T11:49:23Z

I believe it's dangerous for normal users to build solutions on top of this command, as it's not obvious, and complex to figure out, what kind of inconsistencies the output has, what information the transformation loses, and what kind of issues this can cause. If we start getting bug reports for scans based on the output of this command, I would not volunteer to pick these up TBH.

Just a few things which popped up after a quick look:

- package configurations get discarded
- paths become ambiguous, as basically multiple source trees are merged
- entries for path excludes and license finding curations potentially apply not only to the repository they have been set up for, but to all merged repositories
- labels get discarded
- package curations get discarded
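
To illustrate the path exclude concern, here is a small Python sketch with made-up repositories and file names. It uses `fnmatch` from the standard library as a rough stand-in for glob matching; the matching rules of the real tool may differ.

```python
from fnmatch import fnmatch

# Hypothetical file lists of two unrelated repositories.
paths_repo_a = ["src/main.py", "examples/demo.py"]
paths_repo_b = ["src/lib.py", "examples/shipped_plugin.py"]

# Hypothetical path exclude that was written for repository A only.
exclude_repo_a = "examples/*"

# After a merge the source trees are no longer told apart, so the exclude
# is evaluated against the paths of both repositories.
merged_paths = paths_repo_a + paths_repo_b
excluded = [path for path in merged_paths if fnmatch(path, exclude_repo_a)]
```

In this constructed case, `excluded` also contains the file from repository B, although nobody on that team asked for it to be excluded.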

I wonder how we can avoid spending time on requests coming from the use of this command or the data it produces. Any thoughts, maybe @oss-review-toolkit/core-devs ?

sschuberth · 2021-11-19T08:00:58Z

To me, the only sane answer to @fviernau's question is that the command must be changed to not lose any information.

I believe the general idea of merging multiple analyzer result files into a single one is a reasonable desire. However, as the resulting merged file does not reflect a real analysis on an actually existing repository anymore, we'd need to first come up with a concept of how to "fake" some data in the merged result. What immediately comes to my mind (but there's probably more):

- The top-level repository VCS data needs to be "faked" as there is no real repository that contains all the merged projects.
- Merged path excludes would need to know which original project they refer to.
- Project IDs might need to get de-duplicated.

Each project already contains its own VCS data, so probably no need to fake anything here, but definition file paths should probably be relative to the VCS path then...
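
One way to picture path excludes that "know which original project they refer to" is to store the originating repository next to each pattern. The following Python sketch is only an illustration under that assumption: `ScopedExclude` and `is_excluded` are hypothetical names, the URLs are placeholders, and `fnmatch` is a rough stand-in for the real glob matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class ScopedExclude:
    """Hypothetical path exclude that remembers where it came from."""
    origin: str   # URL of the repository the exclude was written for
    pattern: str  # glob-like pattern, relative to that repository


def is_excluded(origin: str, path: str, excludes: list) -> bool:
    """Apply only those excludes that belong to the file's own repository."""
    return any(
        exclude.origin == origin and fnmatch(path, exclude.pattern)
        for exclude in excludes
    )


excludes = [ScopedExclude("https://example.com/a.git", "examples/*")]

# Matches, because both the exclude and the file belong to repository A.
in_a = is_excluded("https://example.com/a.git", "examples/demo.py", excludes)

# Does not match, because the exclude was never meant for repository B.
in_b = is_excluded("https://example.com/b.git", "examples/plugin.py", excludes)
```

The same idea would require every path in the merged result to carry its repository of origin, which is part of the extra concept work mentioned above.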

All in all I'm wondering whether the easier approach to achieve something similar would be to create a Git superproject with submodules / a git-repo manifest that "bundles" the repos to be merged, and then just run a single analysis on that superproject / git-repo project to already get a "merged" analyzer result.

porsche-rbieniek · 2021-12-01T15:36:42Z

@sschuberth I would agree with the idea of super- / subprojects but that doesn't reflect the working approach taken by the multiple project teams.

The use case is to merge the analyzer results of multiple (and only loosely associated) product teams into one big analyzer result as the base of the reporting delivered to the law firms. The legal department does not care about the synthesized metadata regarding the overall project. But it does care about eliminating potential duplicate work executed by the law firms.

@fviernau I also agree with the concerns about all the metadata that is being squashed by the approach taken. But bear in mind that the goal was not exactness of the metadata. The goal was to create an effective way to deliver a condensed list of dependencies referenced across a large number of unrelated source analyzer results.

As said, in this particular use case it is not important how a particular package is referenced throughout the dependency tree. It is only important to have a complete list of dependencies w/o duplicates.

porsche-rbieniek requested a review from a team as a code owner November 15, 2021 10:50

sschuberth mentioned this pull request May 5, 2022

Helper CLI function for merging muliple analyzer results into one #5317

Closed

porsche-rbieniek closed this Jun 2, 2022
