Ban integer overflow in the Nix language #11188

lf- · 2024-07-25T21:11:57Z

Motivation

Context

You can review this by looking through it commit-by-commit since that was its original form and it is intended to be read that way.

This is approximately this stack from Lix: https://gerrit.lix.systems/c/lix/+/1597 (with its parents lower in the list), with the following after-the-fact oopsie fixes added into the commits themselves:

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

doc/manual/rl-next/ban-integer-overflow.md

doc/manual/src/language/syntax.md

Mic92 · 2024-07-26T07:53:14Z

@lf- You can fix the format issues like this:

clang-format -style=file -i src/libutil/checked-arithmetic.hh tests/unit/libutil/checked-arithmetic.cc

roberth

Suggestions for the reference documentation.

doc/manual/src/language/syntax.md

doc/manual/src/language/operators.md

doc/manual/src/language/syntax.md

roberth · 2024-07-26T13:05:29Z

@lf- You can fix the format issues like this:

clang-format -style=file -i src/libutil/checked-arithmetic.hh tests/unit/libutil/checked-arithmetic.cc

Or make format in an up to date dev shell.

edolstra · 2024-07-26T14:02:53Z

src/libexpr-c/nix_api_value.cc

@@ -306,7 +306,7 @@ int64_t nix_get_int(nix_c_context * context, const nix_value * value)
    try {
        auto & v = check_value_in(value);
        assert(v.type() == nix::nInt);
-        return v.integer();
+        return v.integer().value;


Can't v.integer() return NixInt::Inner? It would make the diff a lot smaller.

Alternatively, there could be an implicit conversion from NixInt to NixInt::Inner.

This is deliberately explicit so that it's impossible to deal with values originating from user code without doing safe arithmetic.

Without this deliberate break, I would not have caught the broken JSON conversions from uint64_t and others.

I like that, but it isn't quite obvious at the usage sites.
Should it be .integer().toUnchecked()?

Ericson2314 · 2024-07-26T15:22:57Z

Thank you very much @lf- for porting this over to us! I really appreciate it.

The actual motive here is the avoidance of integer overflow if we were to make these use checked NixInts and retain the subtraction. However, the actual *intent* of this code is a three-way comparison, which can be done with operator<=>, so we should just do *that* instead. Change-Id: I7f9a7da1f3176424b528af6d1b4f1591e4ab26bf

This is in preparation for adding checked arithmetic to the evaluator. Change-Id: I6e115ce8f5411feda1706624977a4dcd5efd4d13

This also bans various sneaking of negative numbers from the language into unsuspecting builtins as was exposed while auditing the consequences of changing the Nix language integer type to a newtype. It's unlikely that this change comprehensively ensures correctness when passing integers out of the Nix language and we should probably add a checked-narrowing function or something similar, but that's out of scope for the immediate change. During the development of this I found a few fun facts about the language: - You could overflow integers by converting from unsigned JSON values. - You could overflow unsigned integers by converting negative numbers into them when going into Nix config, into fetchTree, and into flake inputs. The flake inputs and Nix config cannot actually be tested properly since they both ban thunks, however, we put in checks anyway because it's possible these could somehow be used to do such shenanigans some other way. Note that Lix has banned Nix language integer overflows since the very first public beta, but threw a SIGILL about them because we run with -fsanitize=signed-overflow -fsanitize-undefined-trap-on-error in production builds. Since the Nix language uses signed integers, overflow was simply undefined behaviour, and since we defined that to trap, it did. Trapping on it was a bad UX, but we didn't even entirely notice that we had done this at all until it was reported as a bug a couple of months later (which is, to be fair, that flag working as intended), and it's got enough production time that, aside from code that is IMHO buggy (and which is, in any case, not in nixpkgs) such as https://git.lix.systems/lix-project/lix/issues/445, we don't think anyone doing anything reasonable actually depends on wrapping overflow. Even for weird use cases such as doing funny bit crimes, it doesn't make sense IMO to have wrapping behaviour, since two's complement arithmetic overflow behaviour is so *aggressively* not what you want for *any* kind of mathematics/algorithms. The Nix language exists for package management, a domain where bit crimes are already only dubiously in scope to begin with, and it makes a lot more sense for that domain for the integers to never lose precision, either by throwing errors if they would, or by being arbitrary-precision. Fixes: NixOS#10968 Original-CL: https://gerrit.lix.systems/c/lix/+/1596 Change-Id: I51f253840c4af2ea5422b8a420aa5fafbf8fae75

Change-Id: Ie8a1b31035f2d27a220e5df2e9e178ec3b39ee68

Change-Id: Ib75ab5b8b4d879035d7ee7678f9cd0c491a39c0a

fricklerhandwerk · 2024-07-31T20:54:33Z

From the maintainer team meeting: We'll postpone merging to after the 2.24 release.

lf- · 2024-07-31T21:31:01Z

Here are the benchmarks of this change:

Benchmarks summary (from lix //bench/summarize.jq bench/bench-*.json)
result-with/bin/nix --extra-experimental-features 'nix-command flakes' eval -f bench/nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  mean:     0.443s ± 0.003s
            user: 0.397s | system: 0.044s
  median:   0.442s
  range:    0.44s ... 0.45s
  relative: 1

../nix2/result-without/bin/nix --extra-experimental-features 'nix-command flakes' eval -f bench/nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
  mean:     0.442s ± 0.001s
            user: 0.397s | system: 0.043s
  median:   0.442s
  range:    0.441s ... 0.445s
  relative: 0.998

---

result-with/bin/nix --extra-experimental-features 'nix-command flakes' eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  mean:     8.483s ± 0.069s
            user: 7.361s | system: 0.736s
  median:   8.512s
  range:    8.353s ... 8.555s
  relative: 1

../nix2/result-without/bin/nix --extra-experimental-features 'nix-command flakes' eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  mean:     8.464s ± 0.03s
            user: 7.36s | system: 0.718s
  median:   8.456s
  range:    8.433s ... 8.542s
  relative: 0.998

---

GC_INITIAL_HEAP_SIZE=10g result-with/bin/nix eval --extra-experimental-features 'nix-command flakes' --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  mean:     6.618s ± 0.041s
            user: 5.437s | system: 0.882s
  median:   6.632s
  range:    6.532s ... 6.671s
  relative: 1

GC_INITIAL_HEAP_SIZE=10g ../nix2/result-without/bin/nix eval --extra-experimental-features 'nix-command flakes' --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
  mean:     6.591s ± 0.058s
            user: 5.398s | system: 0.902s
  median:   6.585s
  range:    6.518s ... 6.712s
  relative: 0.996

---

result-with/bin/nix --extra-experimental-features 'nix-command flakes' search --no-eval-cache github:nixos/nixpkgs/e1fa12d4f6c6fe19ccb59cac54b5b3f25e160870 hello
  mean:     24.023s ± 0.353s
            user: 20.857s | system: 2.001s
  median:   23.945s
  range:    23.666s ... 24.928s
  relative: 1

../nix2/result-without/bin/nix --extra-experimental-features 'nix-command flakes' search --no-eval-cache github:nixos/nixpkgs/e1fa12d4f6c6fe19ccb59cac54b5b3f25e160870 hello
  mean:     23.843s ± 0.096s
            user: 20.709s | system: 1.979s
  median:   23.844s
  range:    23.723s ... 24.046s
  relative: 0.993

I fussed around with it a bunch in vtune, and as far as I can tell, the reason for the 0.2-0.7% negligible impact is probably just code size or cache fiddliness. The only actually remotely hot part of it is in ExprConcatStrings, which I somehow suspect that some minor rearrangement in a future change would potentially improve. The profile differences indicate that there was some change in inlining of some hash table stuff, possibly. One could do a Student's t-test to determine if this difference is even statistically significant, but that is way too much caring over something that will probably be disrupted far more by random code changes accidentally.

Also I have learned that zfs on Linux is extraordinarily bad for stable benchmark execution times; this is from a non-zfs machine and it just worked without any outliers.

nyabinary · 2024-08-02T02:00:37Z

From the maintainer team meeting: We'll postpone merging to after the 2.24 release.

2.24 released :P

Aleksanaa · 2024-08-02T07:43:48Z

I tested this PR against this expression, which is known as Tower of Hanoi:

hanoi/default.nix

let
  digits = 15;

  # Move the last digit from a to b
  move =
    { src, dest }:
    {
      src = src / digits;
      dest = if src == 0 then dest else dest * digits + src - (src / digits) * digits;
    };

  step =
    {
      i,
      a,
      b,
      c,
    }:
    if i == 1 then
      let
        moved = move {
          src = a;
          dest = b;
        };
      in
      {
        a = moved.src;
        b = moved.dest;
        inherit c;
      }
    else
      let
        step1 = step {
          i = i - 1;
          inherit a;
          b = c;
          c = b;
        };
        step2 = move {
          src = step1.a; # a
          dest = step1.b; # c
        };
        step3 = step {
          i = i - 1;
          a = step1.c; # b
          b = step2.src; # a
          c = step2.dest; # c
        };
      in
      step3;

  gen = i: result: if i == 0 then result else gen (i - 1) (digits * result + i);

  question =
    i:
    step {
      inherit i;
      a = gen i 0;
      b = 0;
      c = 0;
    };
in
question 14

14 is chosen because any numbers larger will cause a stack overflow; max-call-depth exceeded error. I've tested against 838b666 (with and without this PR merged) with --option eval-cache false, and haven't noticed any significant difference in execution time between the two. They are both around 0.130~0.140s, and the variation should be caused by system load. Further tests are appreciated.

Aleksanaa · 2024-08-02T08:12:49Z

I also did a 20,000 lines of random addition/subtraction of two-digits random numbers, which is also almost impossible to tell the difference between the two

lf- · 2024-08-10T02:58:13Z

I believe this can be merged now that 2.24 is out?

inclyc · 2024-08-10T06:01:56Z

Hi there, @lf- the author and Nix maintainers,

I changed my mind, the performance impact seems to be very minimal because indeed the Nix evaluator itself does not have much integer computing in comparison to native languages (as the profiling shows).

I apologize for my previous comment. My professional experience includes optimizing native language compilers, where we've long considered factors like sanitizers, wrapping, and undefined behavior. However, the intuitions I've developed for those languages don't apply to Nix as they actually do not share the hot path.

doc/manual/rl-next/ban-integer-overflow.md

lf- · 2024-08-11T00:51:56Z

Can you please apply the desired changes when merging it? I don't feel like fighting git today, and I am out of energy for porting this to CppNix (it has taken 10 times the round trip latency and several times the energy as my average Lix change).

lf- requested review from Ericson2314, fricklerhandwerk, roberth and edolstra as code owners July 25, 2024 21:11

lf- changed the title ~~Jade/kill int overflow~~ Ban integer overflow Jul 25, 2024

github-actions bot added documentation with-tests Issues related to testing. PRs with tests have some priority c api Nix as a C library with a stable interface labels Jul 25, 2024

lf- changed the title ~~Ban integer overflow~~ Ban integer overflow in the Nix language Jul 25, 2024

lf- force-pushed the jade/kill-int-overflow branch from ff3442d to f5bcbf5 Compare July 25, 2024 21:12

inclyc suggested changes Jul 26, 2024

View reviewed changes

doc/manual/rl-next/ban-integer-overflow.md Show resolved Hide resolved

doc/manual/src/language/syntax.md Outdated Show resolved Hide resolved

roberth reviewed Jul 26, 2024

View reviewed changes

doc/manual/src/language/syntax.md Outdated Show resolved Hide resolved

doc/manual/src/language/operators.md Outdated Show resolved Hide resolved

doc/manual/src/language/syntax.md Outdated Show resolved Hide resolved

lf- force-pushed the jade/kill-int-overflow branch 3 times, most recently from 50954d9 to 0b1d6f3 Compare July 26, 2024 13:59

edolstra reviewed Jul 26, 2024

View reviewed changes

This comment was marked as outdated.

Sign in to view

lf- added 5 commits July 30, 2024 18:13

libutil: add checked arithmetic tools

e28cb67

This is in preparation for adding checked arithmetic to the evaluator. Change-Id: I6e115ce8f5411feda1706624977a4dcd5efd4d13

docs: update to define integer overflow

bf050d9

Change-Id: Ie8a1b31035f2d27a220e5df2e9e178ec3b39ee68

doc: release notes for banning integer overflow

5878b14

Change-Id: Ib75ab5b8b4d879035d7ee7678f9cd0c491a39c0a

lf- force-pushed the jade/kill-int-overflow branch from 0b1d6f3 to 5878b14 Compare July 31, 2024 20:12

fricklerhandwerk approved these changes Aug 2, 2024

View reviewed changes

roberth approved these changes Aug 10, 2024

View reviewed changes

doc/manual/rl-next/ban-integer-overflow.md Outdated Show resolved Hide resolved

doc/manual/rl-next/ban-integer-overflow.md Outdated Show resolved Hide resolved

doc/manual/rl-next/ban-integer-overflow.md Outdated Show resolved Hide resolved

Edit docs

3cc2e2a

roberth enabled auto-merge August 11, 2024 01:31

roberth merged commit 18485d2 into NixOS:master Aug 11, 2024
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ban integer overflow in the Nix language #11188

Ban integer overflow in the Nix language #11188

lf- commented Jul 25, 2024

Mic92 commented Jul 26, 2024

roberth left a comment

roberth commented Jul 26, 2024 •

edited

Loading

edolstra Jul 26, 2024 •

edited

Loading

lf- Jul 26, 2024

roberth Jul 26, 2024

Ericson2314 commented Jul 26, 2024

This comment was marked as outdated.

fricklerhandwerk commented Jul 31, 2024

lf- commented Jul 31, 2024

nyabinary commented Aug 2, 2024

Aleksanaa commented Aug 2, 2024

Aleksanaa commented Aug 2, 2024

lf- commented Aug 10, 2024

inclyc commented Aug 10, 2024 •

edited

Loading

lf- commented Aug 11, 2024

Ban integer overflow in the Nix language #11188

Ban integer overflow in the Nix language #11188

Conversation

lf- commented Jul 25, 2024

Motivation

Context

Priorities and Process

Mic92 commented Jul 26, 2024

roberth left a comment

Choose a reason for hiding this comment

roberth commented Jul 26, 2024 • edited Loading

edolstra Jul 26, 2024 • edited Loading

Choose a reason for hiding this comment

lf- Jul 26, 2024

Choose a reason for hiding this comment

roberth Jul 26, 2024

Choose a reason for hiding this comment

Ericson2314 commented Jul 26, 2024

This comment was marked as outdated.

fricklerhandwerk commented Jul 31, 2024

lf- commented Jul 31, 2024

nyabinary commented Aug 2, 2024

Aleksanaa commented Aug 2, 2024

Aleksanaa commented Aug 2, 2024

lf- commented Aug 10, 2024

inclyc commented Aug 10, 2024 • edited Loading

lf- commented Aug 11, 2024

roberth commented Jul 26, 2024 •

edited

Loading

edolstra Jul 26, 2024 •

edited

Loading

inclyc commented Aug 10, 2024 •

edited

Loading