Recursion in `builders` definitions deadlocks builds with `waiting for lock on '/nix/store/...'` #10740
Comments
Triaged in Nix team meeting:
#1914 is maybe relevant?
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-05-22-nix-team-meeting-minutes-147/45835/1
Wow, thanks for the prompt attention :)
I would like to be able to initiate builds from any one of the machines A, B, C, etc., and recruit the other machines as builders. I am not trying to make it cyclic; that's just an effect of the remote builder implementation that I didn't anticipate. @fricklerhandwerk -- it's almost certainly a niche issue, though as you say it would still be nice to document it. As the above-linked "Developing a system that replaces nix remote build" Discourse thread documents thoroughly, the current remote builder implementation gives you not just the banana (having remote machines build stuff) but the gorilla that's holding it, and the entire jungle besides (IIUC, something like the complete Nix configuration on the remote builders, including their
Haven't touched C++ since college; maybe time to see how badly those muscles have atrophied 😅 To confirm, contributions are welcome for any/all of the following?:
I don't think it is; see my comment about the ofborg nodes not having remote builders. That issue is soft-blocked because some others were aware of this problem.
1 and 2 are definitely welcome. I wouldn't recommend 3, because it is a lot more work to design, develop, and review, whereas I believe 2 is much simpler and covers the current use cases. Perhaps 3 is best reframed as part of a larger project to make remote building more self-configuring and/or dynamic. For instance, the scheduler doesn't discover remote metadata such as system features. It seems that those require similar solutions, or changes in the protocol etc. that benefit all scheduling-related info.
Describe the bug
I've been trying to set up remote builders following the NixOS wiki's "Distributed build" article and other resources. Because none of the machines in my home lab are particularly beefy, and I want to recruit as much compute as I can, I've got cycles in the `builders` graph; that is, machines B and C appear in the `builders` definition for machine A, machines A and C appear in the `builders` definition for machine B, and machines A and B appear in the `builders` definition for machine C (and so on).
This appears to lead to deadlocks in builds, one symptom of which is the appearance of warning messages following the pattern `waiting for lock on '/nix/store/...'`.
If I initiate a build on machine A, I can observe that it starts a `nix-daemon --stdio` process on machine B, and that machine B in turn starts a `nix-daemon --stdio` process on machine A.
Steps To Reproduce
On machine A, add machine B to the `builders` definition in `/etc/nix/nix.conf`. Similarly, on machine B, add machine A to the `builders` definition in `/etc/nix/nix.conf`. Then, execute a nontrivial build (e.g. rebuild a NixOS machine configuration).
Expected behavior
I would expect machines defined in `builders` to either:
- [...] `builders` machines, or
- [...] `builders`, if it appears there.
`nix-env --version` output
This is the same on the machine that initiates builds as well as on the machines in `builders`.
Additional context
The proximate cause of this issue may be the same as in #2029.
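To make the cyclic `builders` graph from the description concrete, here is a minimal Python sketch. It is not part of Nix and not something Nix does today. The hostnames `machine-a`, `machine-b`, and `machine-c`, the `x86_64-linux` system field, and the helper functions `parse_builders` and `find_cycle` are all assumptions made up for illustration. The parser only handles inline, semicolon-separated `builders` entries, not `@file` references.

```python
# Hypothetical nix.conf contents for three machines; hostnames are made up.
NIX_CONF = {
    "machine-a": "builders = ssh://machine-b x86_64-linux ; ssh://machine-c x86_64-linux\n",
    "machine-b": "builders = ssh://machine-a x86_64-linux ; ssh://machine-c x86_64-linux\n",
    "machine-c": "builders = ssh://machine-a x86_64-linux ; ssh://machine-b x86_64-linux\n",
}


def parse_builders(conf_text):
    """Return the host part of each inline `builders = ...` entry.

    Simplified on purpose: only semicolon-separated inline entries are
    handled, not `@/path/to/machines` file references.
    """
    hosts = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "builders":
            continue
        for entry in value.split(";"):
            fields = entry.split()
            if fields:
                # The first field is the store URI, e.g. ssh://machine-b
                hosts.append(fields[0].split("://", 1)[-1])
    return hosts


def find_cycle(graph, start):
    """Depth-first search for a path leading from `start` back to itself."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [start]
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None


# Build the "who delegates to whom" graph and look for a cycle through machine-a.
graph = {host: parse_builders(text) for host, text in NIX_CONF.items()}
cycle = find_cycle(graph, "machine-a")
print(cycle)
```

A non-`None` result means a build started on `machine-a` can be delegated along a path that ends back at `machine-a`, which matches the mutual `nix-daemon --stdio` processes observed above. A check like this could be run over one's own configurations before deploying them.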