Add support for package dependency relationships #572

wagoodman · 2021-10-19T18:23:21Z

What would you like to be added:
Support tracking the full dependency graph for packages in the form of relationships, for the ecosystems that support extracting this information.

Why is this needed:
An SBOM is useful for at least listing what makes up a software artifact. However, it is more useful to know how a dependency is related to the artifact (is it a direct dependency? or a transitive dependency? is this dependency used by several other packages, or just one?).

Below is a list of each ecosystem that we could implement this for (really it's a list of all of the parsers for all catalogers). It doesn't mean that we should implement this entire list; some ecosystems just don't raise up enough information to make adding relationships useful. This will have to be taken on a case-by-case basis.

These are catalogers for which raising up relationships has been deemed not possible or practical at this time:

Nix (store): there is no available metadata to reference
Python (setup.py): no relationship information
Python (requirements.txt): no relationship information
Python (Pipfile.lock): no relationship information to the package this is being installed for
RPM (rpm file): there is relationship data, but not with potentially other installed packages

Notes:
This assumes that #556 is implemented, allowing for package catalogers to return relationships as first class evidence.

hectorj2f · 2021-12-17T15:28:37Z

@wagoodman I started to play with the package dependencies for cyclonedx https://github.com/hectorj2f/syft/tree/hectorj2f/add_dependencies_to_cyclonedx. I am only generating the dependencies for the components as cyclonedx format recommends. Let me know if you prefer to open a PR for that.

wagoodman · 2021-12-23T16:32:36Z

Some detail here regarding which ecosystems this will be feasible for in a static-analysis sense (not reaching out to external data sources, such as maven central).

SPDX 2.2 relationships are used to describe what will be added to the artifact package in terms of new relationship types. The used relationships in the breakdown below are:

RUNTIME_DEPENDENCY_OF
DEV_DEPENDENCY_OF
BUILD_DEPENDENCY_OF
DEPENDENCY_OF

Question: we might not be able to accurately determine build-vs-runtime dependency depending on the lack of context (e.g. python). Should we just use DEPENDENCY_OF instead in these cases? (final answer: yes)

apk

Summary: direct runtime dependencies

The D: section lists pull dependencies, which is a space delimited list of package names and categorized dependencies. E.g. D:scanelf so:libc.musl-x86_64.so.1.
One problem is being able to figure out which of the dependencies are package names, and which are other requirements.
Relationships:
RUNTIME_DEPENDENCY_OF for any packages listed as pull dependencies
question: should we make package-to-file relationships for so: dependencies that are found by the resolver?
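As a rough illustration of the package-name-versus-other-requirement problem described above, here is a minimal sketch that splits a D: value on whitespace and treats any entry with a "kind:" prefix (such as so:) as a non-package requirement. The function name splitPullDeps is hypothetical, and trimming version constraints from package names is an assumption of this sketch, not something syft is known to do this way.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPullDeps separates the entries of an APK "D:" value into plain package
// names and categorized requirements (entries with a "kind:" prefix, e.g. "so:").
// Hypothetical helper; trimming version constraints is an assumption.
func splitPullDeps(value string) (pkgs, others []string) {
	for _, entry := range strings.Fields(value) {
		if strings.Contains(entry, ":") {
			// categorized requirement such as so:libc.musl-x86_64.so.1
			others = append(others, entry)
			continue
		}
		// drop any version constraint that follows the package name
		if i := strings.IndexAny(entry, "<>=~"); i >= 0 {
			entry = entry[:i]
		}
		pkgs = append(pkgs, entry)
	}
	return pkgs, others
}

func main() {
	pkgs, others := splitPullDeps("scanelf so:libc.musl-x86_64.so.1")
	fmt.Println("package names:", pkgs)
	fmt.Println("other requirements:", others)
}
```

Only the plain package names would become RUNTIME_DEPENDENCY_OF candidates here; the so: entries would still need the resolver mentioned in the question above.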

dpkg

Summary: direct runtime dependencies

The Depends and Pre-Depends sections hold information about dependencies: "Both depends and pre-depends mention the dependencies a package needs before installing but pre-depends forces the installation and configuration of the dependency packages before even starting with the package that needs the dependencies" source

Relationships:

RUNTIME_DEPENDENCY_OF for any packages listed as dependencies
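To make the dpkg case concrete, the following sketch pulls bare package names out of a Depends or Pre-Depends value. It assumes the usual control-file syntax (comma-separated clauses, "|" for alternatives, version constraints in parentheses, optional ":arch" qualifiers). The helper name dependencyNames is hypothetical, and treating every alternative as a candidate dependency is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// dependencyNames extracts bare package names from a dpkg Depends-style field.
// Every alternative in an "a | b" clause is returned as a candidate, which is
// a simplification made for this sketch.
func dependencyNames(field string) []string {
	var names []string
	for _, clause := range strings.Split(field, ",") {
		for _, alt := range strings.Split(clause, "|") {
			name := strings.TrimSpace(alt)
			// cut at the first space, "(" (version constraint) or ":" (arch qualifier)
			if i := strings.IndexAny(name, " (:"); i >= 0 {
				name = name[:i]
			}
			if name != "" {
				names = append(names, name)
			}
		}
	}
	return names
}

func main() {
	fmt.Println(dependencyNames("libc6 (>= 2.17), debconf (>= 0.5) | debconf-2.0"))
}
```

A relationship would only be emitted for names that resolve to another package found in the same dpkg database.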

golang

go.mod

Summary: flat-subset of transitive build dependencies.

go.mod contains a subset of transitive dependencies (but possibly more than direct dependencies). Cannot reason about dependency-to-dependency relationships

Relationships:

BUILD_DEPENDENCY_OF for any package listed in the go.mod
For go.mod, we currently cannot determine if a dependency is for testing or not.

go binary buildinfo section

Summary: flat-transitive build dependencies.

go binary buildinfo section contains transitive dependencies. Cannot reason about dependency-to-dependency relationships

Relationships:

BUILD_DEPENDENCY_OF for any package listed in the binary buildinfo section.
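The flat shape of this data can be sketched with the standard library's runtime/debug package, which exposes the build info of the running binary (reading another binary from disk would use debug/buildinfo instead). The module and edge types and the function names below are hypothetical; the point is only that every listed module attaches directly to the main module, with no dependency-to-dependency edges.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// module and edge are hypothetical types used only for this sketch.
type module struct{ Path, Version string }
type edge struct{ From, To, Type string }

// modulesFrom flattens the dependency list of a BuildInfo, following
// replacements so the module that was actually built is reported.
func modulesFrom(info *debug.BuildInfo) []module {
	mods := make([]module, 0, len(info.Deps))
	for _, dep := range info.Deps {
		if dep.Replace != nil {
			dep = dep.Replace
		}
		mods = append(mods, module{Path: dep.Path, Version: dep.Version})
	}
	return mods
}

// flatEdges attaches every module directly to the root module.
func flatEdges(root string, mods []module) []edge {
	edges := make([]edge, 0, len(mods))
	for _, m := range mods {
		edges = append(edges, edge{From: m.Path + "@" + m.Version, To: root, Type: "BUILD_DEPENDENCY_OF"})
	}
	return edges
}

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	for _, e := range flatEdges(info.Main.Path, modulesFrom(info)) {
		fmt.Printf("%s %s %s\n", e.From, e.Type, e.To)
	}
}
```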

java

pom.xml

Summary: flat-direct build dependencies.

There is a <dependencies> section which describes direct build dependencies

Relationships:

BUILD_DEPENDENCY_OF for any package listed in the dependency section.
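A minimal sketch of reading the direct dependencies from a pom.xml with encoding/xml might look like the following. The type and function names are hypothetical, and the sketch ignores parent POMs, property substitution, scopes, and dependencyManagement, all of which a real parser would need to consider.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// pomProject models only the fields this sketch needs.
type pomProject struct {
	GroupID      string   `xml:"groupId"`
	ArtifactID   string   `xml:"artifactId"`
	Dependencies []pomDep `xml:"dependencies>dependency"`
}

type pomDep struct {
	GroupID    string `xml:"groupId"`
	ArtifactID string `xml:"artifactId"`
}

// directDependencies returns the project coordinate and the coordinates of
// the entries found in the top-level <dependencies> section.
func directDependencies(data []byte) (string, []string, error) {
	var p pomProject
	if err := xml.Unmarshal(data, &p); err != nil {
		return "", nil, err
	}
	deps := make([]string, 0, len(p.Dependencies))
	for _, d := range p.Dependencies {
		deps = append(deps, d.GroupID+":"+d.ArtifactID)
	}
	return p.GroupID + ":" + p.ArtifactID, deps, nil
}

func main() {
	doc := `<project><groupId>org.example</groupId><artifactId>app</artifactId>
<dependencies><dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency></dependencies>
</project>`
	project, deps, err := directDependencies([]byte(doc))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, d := range deps {
		fmt.Printf("%s BUILD_DEPENDENCY_OF %s\n", d, project)
	}
}
```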

manifest

Does not contain dependency information

javascript

yarn.lock

Summary: flat pseudo-transitive runtime dependency pins

Each package has a "dependencies" section, which lists only direct dependencies. Dependencies of these dependencies are NOT tracked. Also, the dependencies are not pinned versions

Relationships:

RUNTIME_DEPENDENCY_OF for any packages listed in the yarn.lock
cannot use the "dependencies" section within each package (since it is not a pin)

package-lock.json

Summary: transitive dependency pins with full dependency-to-dependency graph

Under the "dependencies" section, each pinned version lists, in its own "requires" section, loose version requirements for any packages that the pinned dependency requires.
The requires section always has names that map back to the same name in the "dependencies" section.

Relationships:

RUNTIME_DEPENDENCY_OF for any packages listed as dependencies
full dependency graph possible (dependency-to-dependency relationships)
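Because every name under "requires" maps back to an entry under "dependencies", the dependency-to-dependency edges can be resolved to pinned versions. The sketch below illustrates that lookup for the flat, top-level "dependencies" map only; nested "dependencies" blocks are ignored, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// lockFile and lockEntry model only the fields this sketch needs.
type lockFile struct {
	Dependencies map[string]lockEntry `json:"dependencies"`
}

type lockEntry struct {
	Version  string            `json:"version"`
	Requires map[string]string `json:"requires"`
}

// edge reads as "From is a dependency of To".
type edge struct{ From, To string }

// dependencyEdges resolves each "requires" name to its pinned version.
func dependencyEdges(data []byte) ([]edge, error) {
	var lf lockFile
	if err := json.Unmarshal(data, &lf); err != nil {
		return nil, err
	}
	var edges []edge
	for name, entry := range lf.Dependencies {
		for required := range entry.Requires {
			pinned, ok := lf.Dependencies[required]
			if !ok {
				continue // no pin found at the top level; skipped in this sketch
			}
			edges = append(edges, edge{
				From: required + "@" + pinned.Version,
				To:   name + "@" + entry.Version,
			})
		}
	}
	// map iteration order is random, so sort for stable output
	sort.Slice(edges, func(i, j int) bool {
		if edges[i].To != edges[j].To {
			return edges[i].To < edges[j].To
		}
		return edges[i].From < edges[j].From
	})
	return edges, nil
}

func main() {
	doc := `{"dependencies":{"express":{"version":"4.18.2","requires":{"accepts":"~1.3.8"}},"accepts":{"version":"1.3.8"}}}`
	edges, err := dependencyEdges([]byte(doc))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, e := range edges {
		fmt.Printf("%s RUNTIME_DEPENDENCY_OF %s\n", e.From, e.To)
	}
}
```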

package.json

Summary: flat-direct runtime and dev dependency version ranges.

The "dependencies" section has a map of name:version values for direct dependencies
The "devDependencies" section has name:version values for direct dev dependencies
Note: version values are not pins, but range specifiers

Relationships:

RUNTIME_DEPENDENCY_OF for any packages listed as dependencies
DEV_DEPENDENCY_OF for any packages listed in devDependencies
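The mapping from the two package.json sections to relationship types could be sketched as follows. The type and function names are hypothetical, and because the values are range specifiers rather than pins, the "name@range" strings below identify a requirement, not a resolved package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// manifest models only the fields this sketch needs.
type manifest struct {
	Name            string            `json:"name"`
	Dependencies    map[string]string `json:"dependencies"`
	DevDependencies map[string]string `json:"devDependencies"`
}

type relationship struct{ From, To, Type string }

// relationshipsFor maps "dependencies" and "devDependencies" to relationship types.
func relationshipsFor(data []byte) ([]relationship, error) {
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	var rels []relationship
	add := func(deps map[string]string, relType string) {
		names := make([]string, 0, len(deps))
		for name := range deps {
			names = append(names, name)
		}
		sort.Strings(names) // stable output regardless of map order
		for _, name := range names {
			rels = append(rels, relationship{From: name + "@" + deps[name], To: m.Name, Type: relType})
		}
	}
	add(m.Dependencies, "RUNTIME_DEPENDENCY_OF")
	add(m.DevDependencies, "DEV_DEPENDENCY_OF")
	return rels, nil
}

func main() {
	doc := `{"name":"app","dependencies":{"lodash":"^4.17.21"},"devDependencies":{"jest":"^29.0.0"}}`
	rels, err := relationshipsFor([]byte(doc))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, r := range rels {
		fmt.Printf("%s %s %s\n", r.From, r.Type, r.To)
	}
}
```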

php

composer.lock

Summary: direct dependency version pins

Direct dependencies only within the "packages" and "packages-dev" sections
versions are pinned

Relationships:

RUNTIME_DEPENDENCY_OF for any packages listed in "packages"
DEV_DEPENDENCY_OF for any packages listed in "packages-dev"

installed.json

Summary: No relationships possible

List of installed package versions and their required packages; the required packages are only loose version specifiers

Relationships:

it's not clear that any relationships can be supported.

python

poetry

Summary: flat-transitive dependency dev and runtime relationships

Lists transitive dependencies in flat fashion (no dependency-to-dependency relationships)
Each dependency can be categorized (main and dev)
Relationships:
Can only describe that the main package relates to packages described within the poetry.lock (not a full transitive dependency graph)
DEV_DEPENDENCY_OF for packages with a "dev" category
BUILD_DEPENDENCY_OF for packages with a "main" category (question: should this be RUNTIME_DEPENDENCY_OF since these dependencies are not required in the python compile step for generating pycs?) We cannot really distinguish these in all cases, so it is safer to use DEPENDENCY_OF

pipfile

Summary: flat-transitive dependency dev and runtime relationships

Lists transitive dependencies in flat fashion (no dependency-to-dependency relationships)
Each dependency goes underneath one of two json sections: default and develop
Relationships:
Can only describe that the default package relates to packages described within the lockfile (not a full transitive dependency graph)
DEV_DEPENDENCY_OF for packages with a "develop" category
BUILD_DEPENDENCY_OF for packages with a "default" category (question: should this be RUNTIME_DEPENDENCY_OF since these dependencies are not required in the python compile step for generating pycs?) We cannot really distinguish these in all cases, so it is safer to use DEPENDENCY_OF
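A sketch of the Pipfile.lock case, assuming the lockfile's JSON layout of "default" and "develop" maps whose entries carry a "version" such as "==2.31.0". It uses DEPENDENCY_OF for the default section, following the conclusion above that build and runtime cannot be told apart reliably. The root argument and all type and function names are hypothetical, since the lockfile itself does not name the package it was generated for.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// pipfileLock models only the two sections this sketch needs.
type pipfileLock struct {
	Default map[string]lockedPkg `json:"default"`
	Develop map[string]lockedPkg `json:"develop"`
}

type lockedPkg struct {
	Version string `json:"version"`
}

type relationship struct{ From, To, Type string }

// flatRelationships attaches every locked package directly to root.
func flatRelationships(root string, data []byte) ([]relationship, error) {
	var lock pipfileLock
	if err := json.Unmarshal(data, &lock); err != nil {
		return nil, err
	}
	var rels []relationship
	add := func(section map[string]lockedPkg, relType string) {
		names := make([]string, 0, len(section))
		for name := range section {
			names = append(names, name)
		}
		sort.Strings(names)
		for _, name := range names {
			version := strings.TrimPrefix(section[name].Version, "==")
			rels = append(rels, relationship{From: name + "@" + version, To: root, Type: relType})
		}
	}
	add(lock.Default, "DEPENDENCY_OF")
	add(lock.Develop, "DEV_DEPENDENCY_OF")
	return rels, nil
}

func main() {
	doc := `{"default":{"requests":{"version":"==2.31.0"}},"develop":{"pytest":{"version":"==7.4.0"}}}`
	rels, err := flatRelationships("my-app", []byte(doc))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, r := range rels {
		fmt.Printf("%s %s %s\n", r.From, r.Type, r.To)
	}
}
```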

egg / dist metadata

Does not describe any relationships

bureado · 2021-12-28T16:31:36Z

Can you elaborate more on the why for this feature? From my reading of it, what you are trying to do is to determine why a package foo of version bar made it into the thing you are scanning with syft.

If so, I fear dumping the dependency tree might not answer that question. Parsing the package manager operations log can approximate an answer, but the only deterministic way I’m aware of to do this is to perform actual process introspection during the image build to know exactly what ended up calling e.g. dpkg -i over a file on disk.

Conversely, with a purl that is differentiated enough you can augment the syft output with dependencies and much more metadata that is publicly known and available. If syft says nano version 1.2 is in this Ubuntu container of release foo, anyone can readily obtain the dependencies of that package from public sources.

Don’t get me wrong, I’m a fan of taking as much primary source data from the package manager in the scanned instance as possible. And I think flat SBOMs can be limited in many scenarios (log4j being just the latest widely covered scenario). But I think how the feature surfaces, what it tries to solve and how it changes the syft experience for people that are expecting a flat output might be worth additional consideration.

Thank you for working on syft and for helping syft users and the industry realize better outcomes through a thoughtful approach to the existing package manager metadata.

bureado · 2021-12-28T16:35:21Z

Adding one thing to my comment above. It’s possible that the “why” for this feature is not “what other binary depended on this binary/made this binary materialize” but more of a transitive “what other software was needed to make this binary that then went into my image”, and that’s where Build-Depends and Built-Using (in the case of dpkg) would be more useful, but the in-artifact package manager metadata might not contain that information. Ideally, that information would be carried in each package's own SBOM, but in practice the trend seems to be that metadata will live in publicly queryable services. Meaning that maybe this augmentation of syft output could be a post-analyze stage?

wagoodman · 2022-02-24T22:45:25Z

@bureado thanks for your thoughts on this --we chatted a lot about this at a recent community meeting and internally as well... I wanted to expose some of these conversations here in the issue as well.

Why do this feature? That's a fair question, and one that we've been exploring before trying to take it on. Squarely put, a list of packages without how they relate won't be able to answer questions about what could have introduced a package into the artifact.

Take for example, knowing that you have log4j installed is very useful, though if your intent is to remove it you need to know how it got introduced. Is it a direct dependency of your application? Did another package bring it in? Maybe both? It happens that for java packages the syft pkg.metadata.virtualPath is a good indicator for some of this, but it's heavily encoded... and an equivalent field isn't present for all ecosystems. Bringing in relationships to raise up common descriptions of what is in the underlying data makes sense in this case.

Same can be said for vulnerability analysis. I see that I'm vulnerable to CVE-X-Y for this package; when combining this with VEX information in the future that can indicate applicability of a CVE from the publisher's perspective, knowing which path in the dependency tree the vulnerability match comes through starts to matter... this is only achievable by knowing the relationships between packages.

External data has richer relationship information. This is generally (nearly universally) true. Many ecosystems don't express full connectivity information between packages; however, their public repositories (e.g. PyPI, Maven central, rubygems.org, etc) have this information, and with some external querying you can get a better understanding of package-to-package relationships.

Sometime in the near future we want to add in features that allow syft to leverage external data in an opt-in capacity. However, we do have enough raw information from the underlying artifact to convey package-to-package connectivity in most ecosystems (and we're trying to be upfront about the limitations for each ecosystem in #572 (comment)).

Does the existence of better connectivity data externally indicate that we should not express package-to-package relationships? Or that we should hold off until we do have this ability to query external sources? My take is that we can introduce this feature but allow for configurability of it (be able to change behavior or the source of this connectivity information, or turn it off altogether).

But I think how the feature surfaces, what it tries to solve and how it changes the syft experience for people that are expecting a flat output might be worth additional consideration.

I 100% agree with this. We still want to provide a flat list of packages, so no change there. This would add additional elements in the relationships section of the SBOM. If it's the sheer number of additional relationships that would be the problem, then that future points to having configuration to turn off or augment this functionality.

Sorry for the radio silence on this @bureado , but happy to continue chatting about this.

wagoodman · 2022-02-24T22:46:17Z

from refinement:

this issue should not get picked up directly for work, but instead we should be creating new issues to account for each ecosystem... not bite them all off at once.

fproulx-boostsecurity · 2022-09-19T16:57:51Z

We'd love for this to be supported! How far is this on the roadmap ?
Or at least, I cannot make it work now.

VijayKumarMidde · 2022-09-20T02:17:22Z

+1. would love to see this feature on Syft. Is this feature on the roadmap?

Hritik14 · 2023-06-13T04:19:07Z

@wagoodman

External data has richer relationship information

This is something that is easily available now for public use (https://deps.dev). Are there any plans for incorporating the same ?

setchy · 2023-06-30T16:22:34Z

Ditto - I find that this feature would be incredibly helpful, particularly when using tools like DependencyTrack to visualize the dependency graph. Trivy has support for maintaining dependency relationships

markgalpin · 2023-07-29T00:40:44Z

@wagoodman so I was looking at the parsing of java archives, in the context of an effort to think about Vex document hierarchies and cycloneDX over a particular dataset of containers.

As far as I can tell, Syft doesn't currently provide any package-to-package "Relationship" information when parsing java archives. The archive parser recursively takes a known java archive object and checks what's inside based on the manifest files; anecdotally, the archive parser seems to be what's most commonly invoked when handed a production container running java. But there certainly IS a relationship if you are only reporting on the presence of one library because it was shipped inside the archive for another.

While opinions vary, generally from an SBOM perspective when we talk about a "dependency" we mean "if there's a problem with this, there may be a problem with the thing depending on it", or for use cases about bringing it in, as discussed elsewhere. And in THAT sense, the hierarchical information derived from the archive parsing seems like it describes valid dependencies, even if you don't go into the next level of sorting out the pom files. That doesn't mean that the extra compile-scope issues in the pom couldn't be relevant. But knowing, when processing an SBOM, that the issue reported in jc-core is because that's a library inside the netty-common uberjar... is actually pretty valuable.

Changing syft to output the hierarchy when extracting from java archives isn't that hard. I could maybe PR it (I built a POC of it after I found issue #1972 because I needed an example of maven for my purposes). Then you get into the different TYPES of relationships, should this be dependencyOf or Contains...

One thing I do think about is that from a CycloneDX perspective, I would be inclined to say that any package-to-package relationship counts as a "Dependency" for its purpose. Although anecdotally, in terms of current syft output this seems to mostly just arise in OS packages containing library package types such as python etc. Anything that makes SBOMs less flat is good for a variety of use cases.

As a note, processing NPM seems a bit harder within the current code framework. Right now for NPM the standard behavior of cataloger is to parse a package json to retrieve a single package, so as I understand the code architecture, to get the list of all npm packages for relationships to correctly display one bomref to another you'd need to do it at the end of the run, and then process the dependencies?

wagoodman · 2024-02-07T20:06:31Z

I want to revisit this statement for a bit:

SPDX 2.2 relationships are used to describe what will be added to the artifact package in terms of new relationship types. The used relationships in the breakdown below are:

RUNTIME_DEPENDENCY_OF
DEV_DEPENDENCY_OF
BUILD_DEPENDENCY_OF
DEPENDENCY_OF

Question: we might not be able to accurately determine build-vs-runtime dependency depending on the lack of context (e.g. python). Should we just use DEPENDENCY_OF instead in these cases?
...final answer: yes

I think there could be a compromise here to get the best of both worlds. The main problem with using all 4 relationship types is that it makes it a little harder for consumers to use (they need to know about all types and union the graph together). The problem with using only DEPENDENCY_OF is that it's lossy, which isn't ideal when you're trying to discern nuance.

The compromise I propose is this: In syft JSON use DEPENDENCY_OF, but annotate the Data field of the relationship with additional dependency qualities (such as whether it is a dev, runtime, or build dependency):

syft/syft/artifact/relationship.go

Lines 37 to 42 in da31eed

    
type Relationship struct {
  From Identifiable
  To   Identifiable
  Type RelationshipType
  Data interface{}
}

Even if the struct was something simple like:

type DependencyKind struct {
  Runtime bool
  Development bool
  BuildTime bool
}

would be a step forward, since it would allow for multiple options to be true without muddling the graph with more edges than necessary.

I feel that this would make a good trade-off in terms of making graph traversal easier to grok without losing information.
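To illustrate the idea, here is a sketch of how several typed observations of the same package pair could be collapsed into a single DEPENDENCY_OF edge whose Data carries the DependencyKind flags. The Relationship type below is a simplified stand-in that uses strings rather than syft's Identifiable and RelationshipType, and the collapse function is hypothetical.

```go
package main

import "fmt"

// DependencyKind mirrors the struct proposed above.
type DependencyKind struct {
	Runtime     bool
	Development bool
	BuildTime   bool
}

// Relationship is a simplified stand-in for syft's artifact.Relationship.
type Relationship struct {
	From, To string
	Type     string
	Data     interface{}
}

type edgeKey struct{ from, to string }

// collapse merges all typed edges between the same pair into one
// DEPENDENCY_OF edge, recording the observed kinds in Data.
func collapse(observed []Relationship) []Relationship {
	kinds := map[edgeKey]*DependencyKind{}
	var order []edgeKey // preserves first-seen order
	for _, r := range observed {
		k := edgeKey{from: r.From, to: r.To}
		dk, seen := kinds[k]
		if !seen {
			dk = &DependencyKind{}
			kinds[k] = dk
			order = append(order, k)
		}
		switch r.Type {
		case "RUNTIME_DEPENDENCY_OF":
			dk.Runtime = true
		case "DEV_DEPENDENCY_OF":
			dk.Development = true
		case "BUILD_DEPENDENCY_OF":
			dk.BuildTime = true
		}
	}
	out := make([]Relationship, 0, len(order))
	for _, k := range order {
		out = append(out, Relationship{From: k.from, To: k.to, Type: "DEPENDENCY_OF", Data: *kinds[k]})
	}
	return out
}

func main() {
	merged := collapse([]Relationship{
		{From: "lodash", To: "app", Type: "RUNTIME_DEPENDENCY_OF"},
		{From: "lodash", To: "app", Type: "DEV_DEPENDENCY_OF"},
	})
	for _, r := range merged {
		fmt.Printf("%s %s %s %+v\n", r.From, r.Type, r.To, r.Data)
	}
}
```

A consumer that only cares about connectivity can ignore Data entirely, while one that needs nuance can type-assert it to DependencyKind.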

spiffcs · 2024-03-14T20:19:04Z

Linking the latest and greatest SPDX 3.0 relationship types as a dev note for those picking this up on a per ecosystem basis:
https://spdx.github.io/spdx-spec/v3.0/model/Core/Vocabularies/RelationshipType/#

wagoodman · 2024-03-14T21:08:07Z

Team consensus from our weekly gardening meeting is to not tackle #572 (comment), meaning we will only have DEPENDENCY_OF. Note: this means that if something is a dev, build, or runtime dependency then it will still be captured as DEPENDENCY_OF. In the future we might still try and tackle adding edge qualifications or more edges of various types... but not on the first pass.

wagoodman added the enhancement New feature or request label Oct 19, 2021

wagoodman mentioned this issue Oct 19, 2021

Adapt new and existing package metadata as SPDX relationships #476

Open

hectorj2f mentioned this issue Dec 19, 2021

cyclonedx: add artifact relationships as dependencies #706

Closed

bureado mentioned this issue Jan 8, 2022

Catalog discovered SBOMs #737

Open

wagoodman mentioned this issue Sep 20, 2022

Add support for dependency relationships for alpine (apk) #1063

Merged

wagoodman mentioned this issue Oct 12, 2022

fix: duplicate packages when identical except location #1249

Closed

wagoodman mentioned this issue Oct 21, 2022

Upgrade generic cataloger #1281

Merged

eliaslevy mentioned this issue Mar 19, 2023

Syft SBOMs support dependency hierarchies. #1674

Closed

This was referenced Apr 13, 2023

Support fetching packaging information from build tooling #1736

Closed

Invoke known tools to gather build-time dependency information #1562

Open

willmurphyscode mentioned this issue Jul 11, 2023

Report Go runtime vulnerabilities based on the runtime detected in a Go binary anchore/grype#1370

Closed

wagoodman mentioned this issue Aug 11, 2023

Is there any feature to download/list the following details. #2002

Closed

kzantow mentioned this issue Aug 18, 2023

Add support for dpkg dependency relationships #2040

Closed

wagoodman self-assigned this Oct 10, 2023

wagoodman mentioned this issue Oct 10, 2023

Add relationships for dpkg packages #2212

Merged

wagoodman mentioned this issue Nov 30, 2023

Syft not created "dependencies" in cyclonedx report #2353

Open

tgerla mentioned this issue Jan 11, 2024

Option to filter out vulnerabilities of dev dependencies anchore/grype#1643

Open

wagoodman added the planning high level epic that should be broken into smaller tasks label Feb 7, 2024

wagoodman removed their assignment Mar 12, 2024

This was referenced May 7, 2024

Add relationships for ALPM packages (arch linux) #2851

Merged

Add abstraction for adding relationships from package cataloger results #2853

Merged

wagoodman self-assigned this May 9, 2024

wagoodman mentioned this issue May 14, 2024

Add support for RPM DB package relationships #2872

Merged

This was referenced May 24, 2024

Add python wheel egg relationships #2903

Merged

Add relationships for python poetry packages #2906

Merged

Add relationships for go binary packages #2912

Merged

wagoodman mentioned this issue Aug 12, 2024

Is generating cyclonedx dependencies supported with the javascript-lock cataloger? #2305

Open

kzantow mentioned this issue Sep 3, 2024

Java dependency graph #3189

Open
