Add option to exclude packages introduced from a base layer #1809

Open
ericbl opened this issue May 10, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

ericbl commented May 10, 2023

What would you like to be added:

Further options for the --scope flag.

Why is this needed:

First, I am not sure I understood correctly: when we speak about layers, do we mean container layers, e.g. from Docker?
If yes, I wish I could scan only the components of the 'latest' / 'top' layer of my Docker images.

Indeed, in our team, I built a hierarchy of Docker images, where each Dockerfile starts with

FROM {someBaseImage}

so that I share a common 'debian-base' as 'lowest' layer on all my images.

Currently, the Debian packages are duplicated in the SBOMs of all images.
I wish I could avoid this duplication.

A possible workaround is a postprocessing script filtering out components by their layerId, but it is a bit tricky to find out which layerId I want to keep!
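
One way to see which layerId values appear in a CycloneDX SBOM, and how many components reference each, is to tally the layer properties. This is only a sketch: it assumes the layer IDs are stored as component properties named like "syft:location:0:layerID" (the naming that appears in the script later in this thread), and the sample data and the function name `count_components_per_layer` are made up for illustration.

```python
import re
from collections import Counter

# Assumed property naming, e.g. "syft:location:0:layerID", "syft:location:1:layerID".
LAYER_PROP = re.compile(r"^syft:location:\d+:layerID$")


def count_components_per_layer(sbom):
    """Map each layer ID to the number of components that reference it."""
    counts = Counter()
    for comp in sbom.get("components", []):
        # Use a set so a component is counted once per distinct layer.
        layer_ids = {
            prop["value"]
            for prop in comp.get("properties", [])
            if LAYER_PROP.match(prop.get("name", "")) and "value" in prop
        }
        counts.update(layer_ids)
    return counts


# Hypothetical sample data, not real Syft output.
sample = {
    "components": [
        {"name": "libc6", "properties": [
            {"name": "syft:location:0:layerID", "value": "sha256:aaa"},
            {"name": "syft:location:1:layerID", "value": "sha256:bbb"},
        ]},
        {"name": "my-app", "properties": [
            {"name": "syft:location:0:layerID", "value": "sha256:bbb"},
        ]},
    ]
}

for layer_id, n in count_components_per_layer(sample).most_common():
    print(layer_id, n)
```

Comparing the printed IDs with the layer list of the base image may help decide which layerId to keep.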

Completely skipping the Debian registry is not a clean workaround, since I might have a Debian package installed only on my 'top layer'.

Possible related issue: #435

@ericbl ericbl added the enhancement New feature or request label May 10, 2023
Contributor

kzantow commented May 10, 2023

@ericbl there are currently two different scope options: squashed (the default) and all-layers. It sounds like you just want the default behavior: squashed. It will scan all the files that are present in the image at the most recent (top?) layer.

If I'm misunderstanding and you wanted to scan only the base image, you could just scan that image directly.

Does this answer your question?

Author

ericbl commented May 10, 2023

That would have been my expectation too, and the output on the CLI looks promising, but the result in the CycloneDX output does not match it.

syft {my image} -o cyclonedx-json=syft-cyclone-dx_sbom_top.json
Cataloged packages [341 packages]

syft {my image} --scope all-layers -o cyclonedx-json=syft-cyclone-dx_sbom_all.json
Cataloged packages [543 packages]

But then, looking at the two JSON files, both list all the packages :(
They differ only in the metadata, with more layer-related metadata in the _all.json.

Or maybe it is an issue with the CycloneDX output?

Actually, both JSON files list 341 components, including all the Debian packages (~100 packages from debian-11-slim).

Author

ericbl commented May 10, 2023

As the name suggests, the 'squashed' option seems to list all packages visible on the top layer, but that also includes packages installed on lower layers.
I would like to see only those installed on the top layer.

Contributor

kzantow commented May 10, 2023

So you want a diff for the top layer (or a specific layer)?

The way many package managers work is they update a single database file. If you add a package, for example, the database file gets updated. This entire file is present on that layer. Without doing some sort of diff, Syft has no way of knowing which packages got added on that layer.
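
The "diff" idea can be approximated outside of Syft by scanning the base image separately and subtracting its components from the derived image's SBOM. The following is a rough sketch under stated assumptions, not something Syft provides: both inputs are CycloneDX JSON documents already loaded as dicts, components are matched by their "purl" (falling back to name and version), and `component_key` and `subtract_base` are hypothetical helper names.

```python
def component_key(comp):
    """Identify a component by its package URL, or by (name, version) if it has none."""
    return comp.get("purl") or (comp.get("name"), comp.get("version"))


def subtract_base(image_sbom, base_sbom):
    """Return a copy of image_sbom without the components also present in base_sbom."""
    base_keys = {component_key(c) for c in base_sbom.get("components", [])}
    result = dict(image_sbom)  # shallow copy; the input document is not modified
    result["components"] = [
        c for c in image_sbom.get("components", [])
        if component_key(c) not in base_keys
    ]
    return result


# Hypothetical sample data, not real Syft output.
base = {"components": [
    {"name": "libc6", "version": "2.31", "purl": "pkg:deb/debian/libc6@2.31"},
]}
image = {"components": [
    {"name": "libc6", "version": "2.31", "purl": "pkg:deb/debian/libc6@2.31"},
    {"name": "curl", "version": "7.74", "purl": "pkg:deb/debian/curl@7.74"},
]}

top_only = subtract_base(image, base)
print([c["name"] for c in top_only["components"]])
```

A package that was upgraded in the derived image has a different version and is therefore kept, while a package removed in the derived image simply does not show up in the result.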

Having said that, I think you are absolutely right that this is related to #435 and one of the solutions we have discussed could help, even if using the all-layers option. The idea is to include all layers for each package, in layer added order (today they are not sorted this way). If we make this change, a consumer could read a Syft SBOM and use it to determine which layer something was introduced on, and if you only care about the latest layer, then just filter everything out based on that being the first entry in the package location. Does this sound like something that you would be able to use to accomplish your result?
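
If locations were emitted in layer-added order as proposed, a consumer-side filter might look roughly like the sketch below. It rests on several assumptions: the ordering described above is a proposal and not current behavior, the field names "locations" and "layerID" are assumed to match Syft's JSON output, and `introduced_on_layer` as well as the sample data are hypothetical.

```python
def introduced_on_layer(artifacts, layer_id):
    """Keep the packages whose first location is on the given layer.

    Assumes each package's locations are sorted in layer-added order, so the
    first entry is the layer on which the package was introduced.
    """
    kept = []
    for art in artifacts:
        locations = art.get("locations", [])
        if locations and locations[0].get("layerID") == layer_id:
            kept.append(art)
    return kept


# Hypothetical sample data, not real Syft output.
artifacts = [
    {"name": "libc6", "locations": [
        {"path": "/var/lib/dpkg/status", "layerID": "sha256:base"},
        {"path": "/var/lib/dpkg/status", "layerID": "sha256:top"},
    ]},
    {"name": "curl", "locations": [
        {"path": "/var/lib/dpkg/status", "layerID": "sha256:top"},
    ]},
]

print([a["name"] for a in introduced_on_layer(artifacts, "sha256:top")])
```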

Author

ericbl commented May 10, 2023

sorting the components by layer instead of just alphabetically would indeed help.

Author

ericbl commented May 10, 2023

After posting the issue here, I wrote a Python script with another approach: remove any component which is found on multiple layers, i.e. which has multiple layerID properties. Simply remove a component if a layerID property with a location index > 0 is found.

import json

def transform_json(import_json, export_json):
    # read the input json and close the file afterwards
    with open(import_json, encoding="utf-8") as file:
        image_sbom = json.load(file)
    original_list = image_sbom['components']
    # iterate over a copy so that removing from the original list is safe
    for comp in original_list.copy():
        remove = False
        # mark any component from the layer 1 (or from any lower layer) to get removed
        # we keep only the component on layer 0, i.e. the top layer.
        for item in comp.get('properties', []):
            if item["name"] == "syft:location:1:layerID":
                remove = True
                break
        # remove also the os part
        if comp.get('type') == "operating-system":
            remove = True
        # remove component from the original list
        if remove:
            original_list.remove(comp)

    # write the output json
    with open(export_json, "w", encoding="utf-8") as file:
        json.dump(image_sbom, file, ensure_ascii=False)  # unicode output

This works for my container and removes all the Debian packages that come from our base image.

On the other hand, after discussing with a colleague, working on the syft output is more of a hack than a proper solution to the underlying issue. Our actual 'issue' is indeed NOT related to syft or the SBOM but to the tool to which we export everything (sw360). So we are evaluating 'cleaner' solutions.

@kzantow kzantow changed the title Add option on --scope to scan only 'top' layer. Add option to exclude packages introduced from a base layer Aug 10, 2023
@wagoodman
Contributor

Related issue: #15

Projects
Status: Backlog
Development

No branches or pull requests

3 participants