Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0109] Nixpkgs Generated Code Policy #109

Closed
wants to merge 20 commits into from

Conversation

Ericson2314
Copy link
Member

@Ericson2314 Ericson2314 commented Oct 12, 2021

Nixpkgs contains non-trivial amounts of generated, rather than hand-written code. We want to start systematizing to make it easier to maintain. There is plenty of future work building upon this we could do, but we stop here for now to avoid needing to change any tools (Nix, Hydra, etc.).

Rendered

@Ericson2314 Ericson2314 changed the title Allow IFD in Nixpkgs, simply-stupidly, and safely [RFC 109] Allow IFD in Nixpkgs, simply-stupidly, and safely Oct 12, 2021
rfcs/0000-nixpkgs-ifd.md Outdated Show resolved Hide resolved
@ryantm
Copy link
Member

ryantm commented Oct 12, 2021

Rendered

@maljub01
Copy link

It would be useful to explain somewhere in the RFC that IFD refers to "Import From Derivation" to help people not familiar with the acronym understand it better.

I think this is particularly important for any user visible configuration, so I think it would be a mistake to call it enableIFD when we can use enableImportFromDerivation.

rfcs/0000-nixpkgs-ifd.md Outdated Show resolved Hide resolved
# Summary
[summary]: #summary

Many of us want IFD to be allowed in Nixpkgs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I find it a nice pragmatic solution that solves headaches, to me it reduces the value of Nix as evaluation and building are now mixed again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core philosophical thing the shepherd team will need to discuss I think.

To me, computation in NIx code vs computation in derivations whose outputs we import isn't the core difference. Rather, it is should the "plan" be written with the upstream sources plugged in as black boxes (like today), or not (with IFD, but also with builtins.fetchurl and doing it all in Nix, no IFD needed). That is the essential difference in approach we must discuss.

@L-as
Copy link
Member

L-as commented Oct 13, 2021

I think that for this to go through we have to fix the interaction between --store and IFD.

@Ericson2314
Copy link
Member Author

It would be useful to explain somewhere in the RFC that IFD refers to "Import From Derivation" to help people not familiar with the acronym understand it better.

I think this is particularly important for any user visible configuration, so I think it would be a mistake to call it enableIFD when we can use enableImportFromDerivation.

Done

@Ericson2314 Ericson2314 changed the title [RFC 109] Allow IFD in Nixpkgs, simply-stupidly, and safely [RFC 109] Allow "import from derivation" in Nixpkgs, simply-stupidly, and safely Oct 13, 2021
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/status-of-lang2nix-approaches/14477/28

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/status-of-lang2nix-approaches/14477/29

rfcs/0000-nixpkgs-ifd.md Outdated Show resolved Hide resolved
@grahamc
Copy link
Member

grahamc commented Oct 18, 2021

How will this impact nix-env and other evaluations which expect evaluation to be "cheap" to provide user tools like search and package lists? What if we collaborated with language ecosystems to have them implement lockfiles we can parse and interpret, like the poetry2nix which does not require IFD?

@blaggacao
Copy link
Contributor

What if we collaborated with language ecosystems to have them implement lockfiles we can parse and interpret, like the poetry2nix which does not require IFD?

Do you suggest ~ builtins.fromGoLock, etc.?

When contrasting with the motivation & and the last sentence of this RFC, that might at least be something to put into the future or alternative sections.

@L-as
Copy link
Member

L-as commented Oct 18, 2021

How will this impact nix-env and other evaluations which expect evaluation to be "cheap" to provide user tools like search and package lists? What if we collaborated with language ecosystems to have them implement lockfiles we can parse and interpret, like the poetry2nix which does not require IFD?

I don't think it would be a problem.
What we would likely do is set enableImportFromDerivation = false; by default, and then when updating the channel (or flake input in the future), we would make sure to build allImportedDerivations. nix-env and such would then do no IFD.

Also, importantly, adding lock files will not remove the need for IFD in many cases.
For example, we want an attr set of all hackage packages in Nixpkgs, and a lock file wouldn't let us avoid that in any way.
We still need the list of packages to be available at evaluation time, not build time. Otherwise you wouldn't be able to do nix shell nixpkgs#haskellPackages.pandoc.

There are however some important questions that aren't answered by @Ericson2314:

  • How do we make sure that the built derivations aren't GC-ed? How would we root them?
  • Disregarding the problem of GC, how would we implement the above for flake inputs? For channels, you could use an ad-hoc approach for only Nixpkgs.

I'm personally thinking that for the future, perhaps explicitly building a new Nixpkgs is the best approach. E.g. you'd make a copy of Nixpkgs where you'd replace every instance of import drv and similar with import (builtins.storePath /nix/store/...). This isn't trivial, however. I think we should explore that idea more before settling on an approach.

@L-as
Copy link
Member

L-as commented Oct 18, 2021

Nominating myself for shepherdship.

@Ericson2314
Copy link
Member Author

I don't think it would be a problem. What we would likely do is set enableImportFromDerivation = false; by default, and then when updating the channel (or flake input in the future), we would make sure to build allImportedDerivations. nix-env and such would then do no IFD.

Yes my thoughts exactly. The IFD stuff would start as just a few extra things, and the user that ops in to using enableImportFromDerivation = true; would be be responsible for either not minding the IFD or building allImportedDerivations first.

@Mic92 Mic92 added the status: open for nominations Open for shepherding team nominations label Nov 3, 2021
@edolstra edolstra changed the title [RFC 109] Allow "import from derivation" in Nixpkgs, simply-stupidly, and safely [RFC 0109] Allow "import from derivation" in Nixpkgs, simply-stupidly, and safely Nov 3, 2021
@L-as
Copy link
Member

L-as commented Nov 16, 2021

Do we have any other nominations?

rfcs/0000-nixpkgs-ifd.md Outdated Show resolved Hide resolved
Right now, to deal with these packages, we either convert by hand, or commit lots of generated code into Nixpkgs.
But I don't think either of those options is healthy or sustainable.
The problem with the first is sheer effort; we'll never be able to keep up.
The problem with the second is bloating Nixpkgs but more importantly reproducability: If someone wants to update that generated code it is unclear how.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is more documentation needed we should rather write and improve documention IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SuperSandro2000 to me the documentation argument is like saying "what's the problem with saying what apt-get to run in the readme?" as is brought up in https://twitter.com/a_cowley/status/1463923361550159880.

The answer is that no amount of a documentation is a substitute for automation. I don't want to read about a number of steps, I want to run 1 command that works as purely as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that would also be possible without IFD if better automation would be written. Also we would still need to lock some input versions which makes it impossible to make fully reproducible but less which would probably make it easier to be reproducible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree with both of those statements. Vendoring generated code alone doesn't required IFD. IFD just allows the use of vendoring to be minimized.

It *must* have in its run-time closure any derivation output that Nixpkgs with `enableImportFromDerivation = true` imports.
\(CI will verify these conditions as described in the next subsection.\)

3. Any code vendored in Nixpkgs *must* correspond to code produced in an imported derivation, so the code can be mechanistically re-vendored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we still vendor all the files I don't see the benefit of IFD. Do we want to get the tooling there to better support IFD? That sounds like nixpkgs would be a playground which should be out of tree IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't vendor any more than today. And in fact we should vendor less: instead of vendoring all of hackage, we should just vendor enough to build cabal2nix, and bootstrap everything else wit IFD from that.

But I don't want to put in the RFC what we should and shouldn't vendor, just establish the principles by which vendoring is allowed today.

Even if we kept all Hackage vendored as today, it would be still better if you could just update a fetchgit of the relevant from data from Nixpkgs and then regenerate everything purely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in fact we should vendor less

but only when we can remove parts of the vendored files which would require the final nix user to have IFD enabled which is not going to happen anytime soon if I understood it correct. And that is my main point: this will only come in really useful if we don't need any vendor files anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SuperSandro2000 I'm a bit confused?

which is not going to happen anytime soon if I understood it correct

Yes, I do say I don't want to immediately crate a big dust-up making haskellPackages IFD only.

What I rather do is go in the other direction. There are many would-be things like haskellPackages that don't get merged because we don't have an appetite for more generated code. Your response to @sternenseemann's wanted to add a second stable package set, NixOS/nixpkgs#138407 (comment), is in fact a prefect example of this! :)! Let's go those things out of limbo by making them use IFD instead of vendoring, and them pioneer the new IFD approach.

I hope that well become a success, and we will then be open to using IFD for things that are vendored today, like the default haskellPackages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One prospect could in fact be to remove every haskell package that is marked as broken presently from hackage-packages.nix since those are never touched by CI anyways. We have been hesitant to do this so far, because some of those tend to fix themselves after a while or are really easy to unbreak, so removing them ruins the contributing experience.

Requiring local users to enable ifd to (try) to build these could actually be a fairly decent option.

@DavHau
Copy link
Member

DavHau commented Jul 20, 2023

I propose @DavHau to become a shepherd

Hello, I'm already part of the shepherd team.

Hey, I want this to be either completely rewritten starting with clearer goals and then actually aligned with them, or rejected!

A complete rewrite is basically like replacing the RFC with another RFC.
If we agree that this needs to be done, then let's please close the RFC as rejected, and create a new one, as this is the least painful way of doing a re-write.

I also think that before a new RFC is published, a working group should be created, that studies the topic properly and comes up with an actionable plan.

RFC being rejected will be disappointing to everyone

I don't think this is true. Lessons learned can be implemented in a new, higher quality RFC with clearer goals and an actionable plan, making it easier for everyone to engage.

This RFC has already been rewritten entirely once which was quite confusing even more so for people who joined the discussion later. Let's not do this again.

Let's start talking with @DavHau

I already engaged myself in the discussion a bit, more on matrix than here. It seemed like there were a number of points different shepherd members disagreed on.
The RFC not defining well what the goals are doesn't support having productive discussions.

I think, what we all agree on, is that studying the ecosystem is a required first step before further discussions make sense.
I think before this has happened, we can't really design a useful RFC.

To be completely honest, currently I'm quite unsure what to think of this RFC. The reasons are:

I'm not sure if the policy first approach will solve much.
Even if we have a policy, somebody has to invest time and effort to implement it and that is going to be the biggest bottleneck or even more like a blocker. Instead of starting out by making a policy, we could:

  1. study and understand the problems better
  2. decide on a solution
  3. implement a solution demo
  4. deliver the solution to nixpkgs in a small scale and prove that it actually works
  5. then extend it and show that it actually works well for more than a single ecosystem
  6. learn about some more unexpected shortcomings of the solution
  7. fix some more problems
  8. make the solution more generic and simpler to adopt by other maintainers
  9. if they still don't want to adopt it -> then make a policy that forces them to do so.

TL;DR: starting with the policy first goes the opposite way I envision that process.

Another major thing is that I believe, having reproducible code generation for all ecosystems is quite hard to achieve and not really important.

If it is in fact a goal of the RFC to make nixpkgs suitable for the needs of people coming from language specific package managers (as stated in the motivation section), then reproducible code generation is one of the least important aspects of it.

What's important for most users is that there is a reliable way of translating their project to nix at all.
I think most users wouldn't care less if their code or lock file is generated reproducibly or not. Because once it's generated, it will be reproducible anyways.

Maintaining a globally coherent package set with a single blessed version for each package, like nixpkgs, is a nice thing for building a distro, but when it comes to development environments, we need to be able to resolve packages dynamically and build dependency trees dynamically. Supporting this at all for a number of ecosystems and integrating this approach with nixpkgs would be so much more important than doing it reproducibly.

This boils down to having a good module system for packages, version independent build logic (aka. community overrides), good customization interfaces, good user interfaces etc., none of which seems to be a goal of this RFC.

Therefore I'm unsure if my priorities are aligned with the RFC.

I'm very happy to engage in meetings if something is organized and help developing ideas. But I'm not going to be the major force driving this thing. At least not until some basic confusions are clarified and I understand the goals better.

@ghost
Copy link

ghost commented Jul 20, 2023

The RFC not defining well what the goals are

I think the driving motivator is the desire to be able to use crate2nix in nixpkgs.

We made an exception for Haskell; they're allowed to check in generated code. I guess we could start an RFC to make a second narrowly-defined exception.

@KFearsoff
Copy link

I'm not sure if the policy first approach will solve much. Even if we have a policy, somebody has to invest time and effort to implement it and that is going to be the biggest bottleneck or even more like a blocker. Instead of starting out by making a policy, we could:

1. study and understand the problems better

2. decide on a solution

3. implement a solution demo

4. deliver the solution to nixpkgs in a small scale and prove that it actually works

5. then extend it and show that it actually works well for more than a single ecosystem

6. learn about some more unexpected shortcomings of the solution

7. fix some more problems

8. make the solution more generic and simpler to adopt by other maintainers

9. if they still don't want to adopt it -> then make a policy that forces them to do so.

TL;DR: starting with the policy first goes the opposite way I envision that process.

I agree. This seems like a much saner approach.

Another major thing is that I believe, having reproducible code generation for all ecosystems is quite hard to achieve and not really important.

If it is in fact a goal of the RFC to make nixpkgs suitable for the needs of people coming from language specific package managers (as stated in the motivation section), then reproducible code generation is one of the least important aspects of it.

What's important for most users is that there is a reliable way of translating their project to nix at all. I think most users wouldn't care less if their code or lock file is generated reproducibly or not. Because once it's generated, it will be reproducible anyways.

Yes! Thank you for voicing that, because I very much did not succeed in bringing that up. I'll decompose it into a problem statement (as I see it):

  1. Nix upholds strong guarantees of reproducibility: if something is pinned and it works on one machine, it will on the other
  2. Nix unifies pipeline: you use the same source of truth for development tools, pre-commit checks, lints, libraries, CI. You add dependencies with Nix, you build with Nix, you check with Nix and you build Docker containers with Nix (more optimized than you could by hand!)
  3. Those factors make Nix very desirable for developers and devops engineers, because it removes a lot of the commonplace complexity
  4. Adopting Nix is hard: you have to jump through numerous hoops to get all of that, and it gets harder the more your language benefits from Nix; packaging Rust or Go is fairly easy, but they also have a unified pipeline and strong reproducibility; Python has none, and it's hell to package.

This boils down to having a good module system for packages, version independent build logic (aka. community overrides), good customization interfaces, good user interfaces etc., none of which seems to be a goal of this RFC.

Yeah, good point. I suppose it won't help developers too much to have an easy way to migrate their project to Nix, because if something breaks, developers will have to learn about overlays, overrides, deep-replacing and other fun Nix internals. I would probably quit Nix if I were in that position, because the promise of "easy to use" had been broken once the first issue arose (even if a solution is trivial, like downgrading a dependency!)

@KFearsoff
Copy link

I'll write it in a separate comment for visibility, but I think we need to decide if we want this RFC for Nixpkgs maintainers or for regular developers that are interested in Nix.

  1. If we want this RFC for Nixpkgs maintainers, how much would it actually help? My guess is "not much", because the current solutions work despite being ugly, and developing a solution with feature parity is a hard problem.
  2. If we want this RFC for developers, should we close this RFC? We already have a solution for developers (*2nix tooling), and it's in nix-community. I think it's good enough at this point, given that we can't upstream it to Nixpkgs.

So basically, what I'm trying to say is, would us standardizing the autogen code be valuable and feasible at this point? I now have my doubts about both; perhaps a better way to go about it would be from bottom-up. First developing the tooling that autogens code in standardized way (dream2nix), and then having an RFC to adopt it in the Nixpkgs. What do you say?

@7c6f434c
Copy link
Member

developing a solution with feature parity is a hard problem.

And, separately, unifying quite different approaches of different ecosystems would probably lead to a leaky abstraction.

@uri-canva
Copy link

@amjoseph-nixpkgs

We made an exception for Haskell; they're allowed to check in generated code. I guess we could start an RFC to make a second narrowly-defined exception.

Apologies if I get it wrong since I don't know much about this, but is the exception related to the fact the Haskell community already has the concept of a closure of dependencies at specific versions that are all compatible with each other, and a non-nix implementation of it, https://www.stackage.org, that translating that into a package set in nixpkgs made sense, and made it relatively nix idiomatic, when compared to other language ecosystems?

That would be very different from consuming third party packages from other language ecosystems, where there isn't a single dependency closure across the ecosystem, but rather each package resolves its own dependency closure.

@ghost
Copy link

ghost commented Jul 24, 2023

is the exception related to ... https://www.stackage.org

No.

There are only 2,613 packages in stackage but 17,556 packages in the auto-generated ./pkgs/development/haskell-modules/hackage-packages.nix. If stackage were the source of the exception it would cover less than 15% of the auto-generated material. Obviously we're allowing the other 85% for some non-stackage reason.

More details on the role of Stackage in pkgs.haskell curation. Note in particular the bit about how we curate newer versions when Stackage is lagging behind Hackage.

@uri-canva
Copy link

Right, but I mean:

  1. a single dependency closure of package versions across an entire language ecosystem, with language packages mapped 1:1 to nixpkgs packages
  2. separate dependency closures for each package exposed as a nix package, with language package dependency closures mapped 1:1 to nixpkgs packages

are two completely different things, so the existence of pkgs.haskell, and its use of code generation, is an example of 1, and could be used to justify further instances of that pattern, but not necessarily to justify code generation to be used for 2.

By thinking of it that way, maybe we can make the new RFC more focused on the specific problem we want to solve.

Again sorry if I'm saying something that doesn't make sense, I'm missing a lot of context, but I do have interest in 1 specifically, so I'm trying to figure out if this (or the next) RFC is the right place to have the discussion on that idea.

@KFearsoff
Copy link

Again sorry if I'm saying something that doesn't make sense, I'm missing a lot of context, but I do have interest in 1 specifically, so I'm trying to figure out if this (or the next) RFC is the right place to have the discussion on that idea.

You are making a lot of sense. I'm specifically talking about 2. I think the difference between 1 and 2 is a major one, and it requires completely different ways to think and solutions, so I would be in favor of creating a new RFC for 1 specifically, as I think the current RFC text is very 2-favored but errs on the safe side of 1 (there could also be an argument to splitting the current RFC into separate RFC 1 and RFC 2).

@tomberek
Copy link
Contributor

A complete rewrite is basically like replacing the RFC with another RFC.
If we agree that this needs to be done, then let's please close the RFC as rejected, and create a new one, as this is the least painful way of doing a re-write.

It seems recent discussion is regarding the process and there is a proposal to close and re-write/re-start. The shepherd team should meet and decide if this is the course of action. If a meeting is too much, let's at least establish a clear proposal and decision asynchronously.

Comment on lines +4 to +7
author: John Ericson (@Ericson2314)
co-authors: (find a buddy later to help out with the RFC)
shepherd-team: @L-as @sternenseemann @tomberek @DavHau
shepherd-leader: @sternenseemann
Copy link
Member

@lheckemann lheckemann Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
author: John Ericson (@Ericson2314)
co-authors: (find a buddy later to help out with the RFC)
shepherd-team: @L-as @sternenseemann @tomberek @DavHau
shepherd-leader: @sternenseemann
author: @sternenseemann
co-authors: @Ericson2314
shepherd-team: @L-as @kfearsoff @tomberek @DavHau
shepherd-leader: @tomberek

Apologies for the previous suggestion, where I mistakenly missed out making sternenseemann the author.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I thought @Ericson2314 wrote this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KFearsoff
Copy link

Pinging everyone involved!

As discussed a little bit earlier, I propose to close this RFC as rejected. After that, two separate RFCs will be put out (sorry if the names are bad, just an example!):

  1. Refactoring codegen in Nixpkgs
  2. Bridging language ecosystems with Nixpkgs

The first RFC will take care of bettering the current codegen infrastructure in Nixpkgs: pkgs.haskell, Vim plugins, etc. This is a very hands-on task, and one I don't feel like I'm knowledgeable about to lead

The second RFC will basically be a draft for quite some time. @DavHau summarized very well the conditions that must be met in order for the bridging to be effective. And actually, the RFC can be much less ambitious, too! If we can get a good UI/UX going, then we don't even have to change Nixpkgs itself - we can focus on guiding users to use *2nix tooling for their own projects and not talk about Nixpkgs at all, or we could put out separate registries for different package ecosystems, or... a lot of things

I feel like the spirit of the initial RFC text was about the RFC number 2, and iterated rewrites lead to it becoming a confused mix of both. I think it would be for the best to split into two new RFCs, because while the discussion points in this PR are invaluable, it is becoming harder and harder to navigate the conversation due to the confusion surrounding the RFC's goals

@infinisil
Copy link
Member

All shepherds except @L-as confirmed that they're okay with closing it, does that also sound good to you @L-as?

Since these tools must be only invoked inside the derivations to generate code, the impure inputs must be gotten via fixed output derivations.
This might require changes to those tools to separate the pure work from the impuire gather steps.

Additionally, as @7c6f434c point out, some upstream tooling thinks it is being pure, but the "lock files" (or similar) pinning mechanism it provides isn't up to the task for Nix's purposes.
Copy link

@chayleaf chayleaf Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can give a specific example of Gradle lockfiles. It specifies the package and its version, but it doesn't specify its hash. There is a separate mechanism for storing dependencies' hashes, but due to the way Gradle works the same version may be fetched from a different repository in a different part of the build graph, which may have a different hash, but Gradle only stores one hash per dependency

@L-as
Copy link
Member

L-as commented Nov 1, 2023

All shepherds except @L-as confirmed that they're okay with closing it, does that also sound good to you @L-as?

Yes, sorry for not noticing this message.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/cargo-lock-considered-harmful/49047/3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet