[RFC 0155]: NixOS Migrations #155

Closed (wants to merge 1 commit)

@Fresheyeball commented Jul 5, 2023

Making this an RFC as requested by @fricklerhandwerk and after doing a review with @roberth. This project is intended for Sovereign Tech Fund grants.

@samueldr commented (this comment was marked as resolved).

@fricklerhandwerk (Contributor) commented Jul 6, 2023

This refers to the Sovereign Tech Fund grants that support sustainable open source development. @Fresheyeball proposed applying with this project. I also think that would be very valuable for the ecosystem, but due to its far-reaching nature it seems sensible to first make sure the design is agreeable before planning implementation.

@Fresheyeball changed the title from [RFC 0135]: NixOS Migrations to [RFC 0155]: NixOS Migrations

@RaitoBezarius (Member) commented Jul 8, 2023

I have discussed solutions to this problem at length, notably in the ensure-style options issue. While I appreciate the effort, I feel this proposal is not ambitious enough. I think it will cause more churn in the long run because it cannot address the core issue: restoring NixOS's ability to roll back, in some scenarios even data.

I don't know the timeline for the STF grant. If people are interested, though, I can spend some time explaining my full vision and the steps to get there. There is also some academic research on this subject, which I am doing.

@fricklerhandwerk (Contributor) commented:

It would be a great start if we used this RFC as a place to compile everything we know and then go from there.

@roberth (Member) commented Jul 8, 2023

> It would be a great start if we used this RFC as a place to compile everything we know and then go from there.

May I recommend using Discourse and a Nixpkgs tracking issue instead? Overly long discussions in GitHub comments are a pain to use and to refer to. RFC discussions are particularly annoying: their deep topics encourage long threads, yet the format discourages splitting into multiple threads, which would keep things manageable. A tracking issue allows branching off and summarizing back, and so do discussions on Discourse.

@infinisil (Member) commented:

I'd like to highlight #138, which proposes using repositories for RFC development and discussion instead. That would improve this kind of thing.

@infinisil (Member) left a review comment:

Going from the current state of manual migrations and a global stateVersion straight to automatic migrations sounds a bit dangerous. I'd rather have a plan to just move to a per-service stateVersion-like approach first, which would fix the problem of a global stateVersion, be fairly straightforward, and safe.
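A per-service pin along these lines could be sketched as a module option that falls back to the global `system.stateVersion`. This mirrors how some NixOS modules already choose a default package from the global value. It is a minimal sketch under assumptions: `services.my-service`, `pkgs.my-service_1`, `pkgs.my-service_2`, and the `"23.11"` cut-off are hypothetical placeholders, not existing Nixpkgs names.

```nix
{ config, lib, pkgs, ... }:
let
  cfg = config.services.my-service;
in
{
  options.services.my-service = {
    # Hypothetical per-service pin. It defaults to the global value, so existing
    # configurations keep behaving exactly as before.
    stateVersion = lib.mkOption {
      type = lib.types.str;
      default = config.system.stateVersion;
      defaultText = lib.literalExpression "config.system.stateVersion";
      description = "NixOS release whose state format this service was first deployed with.";
    };

    package = lib.mkOption {
      type = lib.types.package;
      # Select the package generation that matches the pinned state format.
      # The release cut-off and the package names are placeholders.
      default =
        if lib.versionAtLeast cfg.stateVersion "23.11"
        then pkgs.my-service_2
        else pkgs.my-service_1;
      defaultText = lib.literalExpression
        ''if versionAtLeast stateVersion "23.11" then pkgs.my-service_2 else pkgs.my-service_1'';
    };
  };
}
```

The pin is per service. A user could therefore move a single service to its new state format without touching `system.stateVersion` or any other service. They would migrate that service's data, then raise only its `stateVersion`.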

@edolstra added the label "status: open for nominations" (Open for shepherding team nominations) on Jul 12, 2023

@edolstra (Member) commented:

This RFC is now open for shepherd nominations!

@JamesofScout commented:

I think it might be a good idea to add an `init` field that describes the first time a service is created. Some software requires additional imperative steps on first setup, such as creating the database manually.
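A one-time `init` step can be approximated today without new module-system support. The sketch below uses a systemd oneshot unit that is skipped once a stamp file exists, via systemd's `ConditionPathExists=` with a `!` prefix. The `my-service` units, the stamp-file path, and the `--init` flag of the RFC's placeholder `pkgs.my-service-cli` are all assumptions.

```nix
{ pkgs, ... }:
{
  # Hypothetical one-time initialization unit for a hypothetical my-service.
  systemd.services.my-service-init = {
    description = "One-time initialization for my-service (sketch)";
    # Pulled in whenever my-service starts, and ordered before it.
    wantedBy = [ "my-service.service" ];
    before = [ "my-service.service" ];
    # Skip the unit entirely once the stamp file exists.
    unitConfig.ConditionPathExists = "!/var/lib/my-service/.initialized";
    serviceConfig = {
      Type = "oneshot";
      StateDirectory = "my-service"; # creates /var/lib/my-service
    };
    script = ''
      # --init is a hypothetical flag of the placeholder CLI.
      ${pkgs.my-service-cli}/bin/my-service-cli --init /var/lib/my-service
      touch /var/lib/my-service/.initialized
    '';
  };
}
```

NixOS runs `script` with `set -e`, so the stamp file is written only if initialization succeeds. Deleting the stamp re-runs initialization on the next start.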

Comment on lines +48 to +58
# Down Script

A place in the Nix Module system for describing the imperative steps required to migrate from the current version to the previous version. This will run as a systemd "oneshot" service, to take advantage of the standard architecture.

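```Nix
{ pkgs, ... }:
{
  # pkgs.my-service-cli and --run-migration are the RFC's placeholder names;
  # the binary path inside the package is assumed.
  down.script = pkgs.writeShellScript "my-service.down" ''
    ${pkgs.my-service-cli}/bin/my-service-cli --run-migration /etc/service-state
  '';
}
```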
A member commented:

What about software that has no way to downgrade? Not everyone provides scripts for migrating to an old version.
We could link a backup of the current state to the corresponding generation when rebuilding and restore it from there. Yes, this would lose new data. It would still be good enough to test whether the migration worked in $userEnv and to roll back if it didn't.

@KFearsoff replied:

> not everyone provides scripts for migrating to an old version

Even worse, not everyone provides instructions for it. Honestly, I'd be more surprised to learn that somebody does. There are upgrade guides for software, but almost never downgrade guides. Downgrading is therefore left as an exercise for the reader, which multiplies the workload tenfold.
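The suggestion above, linking a backup of the current state to the generation being built, could be approximated with a NixOS activation script. The script copies a service's state directory aside, keyed by the store-path name of the system being activated, which activation scripts see as `$systemConfig`. This is a minimal sketch that assumes a hypothetical `/var/lib/my-service` state directory and a hypothetical backup location. It does not stop the service first, so for databases the copy may not be consistent.

```nix
{ lib, ... }:
{
  # Hypothetical: keep one copy of my-service's state per activated system closure.
  system.activationScripts.my-service-backup = lib.stringAfter [ "var" ] ''
    if [ -d /var/lib/my-service ]; then
      dest=/var/lib/my-service-backups/$(basename "$systemConfig")
      # Activation also runs at boot; copy only once per generation.
      if [ ! -e "$dest" ]; then
        mkdir -p /var/lib/my-service-backups
        cp -a /var/lib/my-service "$dest"
      fi
    fi
  '';
}
```

Rolling back would then mean restoring, by hand, the copy stored under the name of the generation being abandoned. That accepts the data loss mentioned above.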

@tomberek (Contributor) commented Aug 9, 2023

This RFC has not acquired enough shepherds. This typically shows lack of interest from the community. In order to progress a full shepherd team is required. Consider trying to raise interest by posting in Discourse, talking in Matrix or reaching out to people that you know.

If not enough shepherds can be found in the next month we will close this RFC until we can find enough interested participants. The PR can be reopened at any time if more shepherd nominations are made.

See more info on the Nix RFC process here

@KFearsoff left a review:

Nice work on the RFC!

That said, I'm not sure whether actually implementing it is feasible. The whole area of migrations is usually quite hand-wavy and unspecified. If the software's own developers didn't provide a good UX, I doubt Nixpkgs maintainers can, and the missing migration script is evidence that they didn't.

> Nix can even be an obstacle to doing so, as often these steps are dissonant with the methodologies required for declarative and immutable structures

I think this is closer to the actual problem that can be solved. Perhaps we can better document how to upgrade services given the NixOS constraints.

> I'd rather have a plan to just move to a per-service stateVersion-like approach first, which would fix the problem of a global stateVersion, be fairly straightforward, and safe.

Yes please!

On a less related note: this RFC also explores the concept of NixOS managing state. I like seeing that direction explored. I wonder whether other people also want NixOS to do more state lifecycle management, such as Impermanence, segregating state into different buckets, or suggesting backups. I would certainly like to be on the receiving end of it (and not on the implementing end, lol).


```Nix
{
  up.warn = ''
    …
  '';
}
```

I don't think a warning is sufficient here. It would be nice if Nixpkgs had a mechanism that immediately stops, forces the user to read the warning, and has them confirm that they accept the risk, but there is none. So I think a better way would be to throw an error outright and provide a way to bypass it, with something like `forceMigration = true;` in the config.

This is a bit like the handling of unfree software: we REALLY don't want to do anything potentially problematic, so it's better left opt-in.
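The error-with-opt-out behaviour suggested here could use the module system's existing `assertions` mechanism. A failed assertion makes the system build fail with the given message. Paired with an explicit acknowledgement option, this works much like `nixpkgs.config.allowUnfree`. This is a minimal sketch: `services.my-service`, the `forceMigration` option name (taken from the comment), and the `"23.11"` condition are hypothetical.

```nix
{ config, lib, ... }:
let
  cfg = config.services.my-service;
in
{
  options.services.my-service = {
    enable = lib.mkEnableOption "my-service";
    forceMigration = lib.mkOption {
      type = lib.types.bool;
      default = false;
      description = "Acknowledge that this configuration runs a potentially irreversible state migration.";
    };
  };

  config = lib.mkIf cfg.enable {
    # Fail the build instead of merely warning, unless the user has opted in.
    assertions = [
      {
        # Placeholder condition: systems whose state predates 23.11 need the migration.
        assertion = cfg.forceMigration || !(lib.versionOlder config.system.stateVersion "23.11");
        message = ''
          services.my-service: this configuration migrates on-disk state and may not
          be reversible. Back up the service's state, then set
          services.my-service.forceMigration = true; to proceed.
        '';
      }
    ];
  };
}
```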

`$BACKUP` is a path to a temporary filesystem location which will be deleted upon completion of the migration. This provides a temporary location to backup state if needed. `up.backup` is a hook that will run before `up.script`. If `up.script` encounters an error, `up.restore` is run to ensure that the failed migration does not result in contamination of the system. These options are available in both `up` and `down` definitions. This roughly allows for transaction-like logic for the migration.

> `$BACKUP` is a path to a temporary filesystem location

I really hope you don't mean it's on tmpfs. I think a different wording should be used here.

Regardless, I don't think it's a good idea to clear backups as part of a script, even if the script ran successfully. There might be an error in the logic for all you know. Lastly, we REALLY need this to be as transactional as possible, though thankfully there's a lot of know-how in the Nix space around that.
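For illustration, the backup/script/restore sequence the RFC describes can be expressed with an ordinary systemd oneshot unit. This version also follows the reviewer's advice to keep the backup on persistent storage and not delete it after success. It is a sketch under assumptions. The unit names, the paths, and `pkgs.my-service-cli` with its `--run-migration` flag (the RFC's placeholders) are hypothetical. The unit runs before every start of the service, so the migration command is assumed to be idempotent.

```nix
{ pkgs, ... }:
{
  # Hypothetical unit emulating up.backup / up.script / up.restore with plain systemd.
  systemd.services.my-service-migrate = {
    description = "Migrate my-service state before start (sketch)";
    # The service does not start if this unit fails.
    requiredBy = [ "my-service.service" ];
    before = [ "my-service.service" ];
    serviceConfig.Type = "oneshot";
    script = ''
      set -euo pipefail
      state=/var/lib/my-service
      # Nothing to migrate on a fresh install.
      [ -d "$state" ] || exit 0

      # up.backup: copy state to persistent storage, one directory per run.
      backup=/var/lib/my-service-migrations/$(date +%Y%m%dT%H%M%S)
      mkdir -p "$backup"
      cp -a "$state/." "$backup/"

      # up.script: run the migration; on failure, up.restore puts the old state back.
      if ! ${pkgs.my-service-cli}/bin/my-service-cli --run-migration "$state"; then
        rm -rf "$state"
        cp -a "$backup" "$state"
        exit 1
      fi
      # The backup is deliberately kept after success; pruning is left to the operator.
    '';
  };
}
```

This is only transaction-like: a crash during the restore step would still need manual intervention.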


# Testing

Extend the NixOS VM Test framework to ergonomically test migrations in an automated fashion. Migrations should be accompanied by VM tests demonstrating that migrations succeed from a clean service state.


While that's eons better than nothing, the problem with migrations isn't really when your service state is clean; it's when it's not. And I'm afraid we can't really test for that.
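One partial answer to the clean-state concern is to seed non-trivial state before switching configurations inside the VM test. The sketch below uses `pkgs.testers.runNixOSTest` and a `specialisation` to move a running machine from the old package to the new one. The `services.my-service` module, both package names, and the `--seed-fixture`/`--check` flags are hypothetical. Hand-made fixtures still cannot cover every real-world state, so this narrows the gap rather than closing it.

```nix
{ pkgs, ... }:
pkgs.testers.runNixOSTest {
  name = "my-service-migration";

  nodes.machine = { lib, pkgs, ... }: {
    environment.systemPackages = [ pkgs.my-service-cli ];
    services.my-service.enable = true;
    services.my-service.package = pkgs.my-service_1;
    # The upgraded configuration, activated mid-test with switch-to-configuration.
    specialisation.upgraded.configuration = {
      services.my-service.package = lib.mkForce pkgs.my-service_2;
    };
  };

  testScript = { nodes, ... }: ''
    machine.wait_for_unit("my-service.service")
    # Seed non-trivial state first, so the migration does not start from a clean slate.
    machine.succeed("my-service-cli --seed-fixture /var/lib/my-service")
    machine.succeed(
        "${nodes.machine.system.build.toplevel}/specialisation/upgraded/bin/switch-to-configuration test"
    )
    machine.wait_for_unit("my-service.service")
    machine.succeed("my-service-cli --check /var/lib/my-service")
  '';
}
```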

@kevincox (Contributor) commented Sep 6, 2023

@KFearsoff, are you interested in being a shepherd for this RFC?

Also a reminder that this week is NixCon, a great time for everyone attending to talk to potential shepherds in person.


While the current NixOS Configuration system works incredibly well for immutable services, not all services are immutable. We currently lack facilities for handling the imperative steps needed to upgrade or downgrade complex stateful services. Nix can even be an obstacle to doing so, as often these steps are dissonant with the methodologies required for declarative and immutable structures.

Let's take the example of GitLab. GitLab is notoriously hard to upgrade, and the current NixOS module system produces major obstacles to upgrades of this nature, as GitLab expects the user to run many imperative steps to modify stateful parts of the application such as the database and configuration files. The need to migrate stateful portions of an application to new versions is nothing new; database migrations are standard practice and can provide structure and inform the concerns of module migrations.
@yu-re-ka commented Sep 10, 2023:

As a GitLab package/module maintainer, I can say that GitLab is a bad example. The things said in this paragraph are not generally true for GitLab.
PostgreSQL major upgrades would be a good example.
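To illustrate the imperative steps a PostgreSQL major upgrade involves, here is a sketch adapted from the `pg_upgrade`-based upgrade script described in the NixOS manual's PostgreSQL chapter. The target version `pkgs.postgresql_15` is an assumption. The script must be run by hand, and `services.postgresql.package` changed afterwards. That manual step is the part that currently lives outside the declarative configuration.

```nix
{ config, lib, pkgs, ... }:
{
  environment.systemPackages = [
    (let
      # Assumed target major version.
      newPostgres = pkgs.postgresql_15;
      cfg = config.services.postgresql;
    in pkgs.writeShellScriptBin "upgrade-pg-cluster" ''
      set -eux
      # Stop the old cluster; services depending on PostgreSQL may need stopping too.
      systemctl stop postgresql

      export NEWDATA="/var/lib/postgresql/${newPostgres.psqlSchema}"
      export NEWBIN="${newPostgres}/bin"
      export OLDDATA="${cfg.dataDir}"
      export OLDBIN="${cfg.package}/bin"

      install -d -m 0700 -o postgres -g postgres "$NEWDATA"
      cd "$NEWDATA"
      sudo -u postgres "$NEWBIN/initdb" -D "$NEWDATA" ${lib.escapeShellArgs cfg.initdbArgs}
      sudo -u postgres "$NEWBIN/pg_upgrade" \
        --old-datadir "$OLDDATA" --new-datadir "$NEWDATA" \
        --old-bindir "$OLDBIN" --new-bindir "$NEWBIN" \
        "$@"
    '')
  ];
}
```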


## Up Script

A place in the Nix Module system for describing the imperative steps required to migrate from the previous version to the new version. This will be run as a systemd "oneshot" service, to take advantage of the standard architecture.

I would prefer that migrations not be triggered fully automatically. For nixos-unstable users, for example, it could be important to make a backup and supervise the process.
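A migration that is never triggered automatically could be modelled as a oneshot unit that nothing pulls in. It then runs only when an operator starts it, for instance after taking a backup. This is a sketch with hypothetical unit names and the RFC's placeholder `pkgs.my-service-cli`. The `conflicts` setting makes systemd stop the service while the migration runs.

```nix
{ pkgs, ... }:
{
  # Hypothetical manually triggered migration: no wantedBy/requiredBy,
  # so systemd never starts it on its own.
  systemd.services.my-service-migrate-up = {
    description = "Upgrade my-service state (run manually)";
    # Starting this unit stops my-service.service first.
    conflicts = [ "my-service.service" ];
    serviceConfig.Type = "oneshot";
    script = ''
      ${pkgs.my-service-cli}/bin/my-service-cli --run-migration /var/lib/my-service
    '';
  };
}
```

An operator would run `systemctl start my-service-migrate-up`, then `systemctl start my-service` once satisfied with the result.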

@tomberek (Contributor) commented:

@fricklerhandwerk, is this on hold due to the grant status? This RFC does not yet have enough shepherds.

@fricklerhandwerk (Contributor) commented:

As the author, @Fresheyeball should determine whether he has time to drive the process and how to proceed.

@edolstra (Member) commented Nov 1, 2023

This RFC is being closed due to lack of interest. If enough shepherds are found, this issue can be reopened. If you don't have permission to reopen it, please open an issue for the NixOS RFC Steering Committee linking to this PR.

@nixos-discourse commented:

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/community-calendar/18589/94

@nixos-discourse commented:

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/what-about-state-management/37082/2
