Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0107] Nixpkgs version attribute usage normalization #107

Closed
wants to merge 19 commits into from

Conversation

AndersonTorres
Copy link
Member

@AndersonTorres AndersonTorres commented Sep 24, 2021

This RFC proposes to normalize the version attribute used in Nixpkgs' expresions.

Rendered!

@AndersonTorres AndersonTorres changed the title Nixpkgs version normalization [RFC 0107] Nixpkgs version normalization Sep 24, 2021
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/brainstorming-for-rfc-pname-and-version/12873/102

@AndersonTorres AndersonTorres changed the title [RFC 0107] Nixpkgs version normalization [RFC 0107] Nixpkgs version attribute usage normalization Sep 24, 2021
@figsoda
Copy link
Member

figsoda commented Sep 24, 2021

I don't really like 0_0, imo 0.0 or 0.0.0, or 0 looks better
as an alternative to the +unstable=<date> scheme, I like the idea of something like 0.0.0-unstable.2021-09-24+linux5.10.68, which follows semantic versioning
instead of the word unstable, dev, pre, and post could be good candidates since they are shorter

@AndersonTorres
Copy link
Member Author

AndersonTorres commented Sep 24, 2021

I don't really like 0_0, imo 0.0 or 0.0.0, or 0 looks better

Using 0.0 as a default can be potentially confusing in the case of a 0.0 official release. I chose 0_0 because it is way less common.

I like the idea of something like 0.0.0-unstable.2021-09-24+linux5.10.68, which follows semantic versioning

The idea is using a (+key=value)* format; it is more amenable to sed+awk-parsing.

instead of the word unstable, git (or any other vcs) and dev could be good candidates since they are shorter

I am striving to make this specification VCS-agnostic. If e.g. the program team changes the VCS, it will not affect version attr.

@figsoda
Copy link
Member

figsoda commented Sep 24, 2021

Using 0.0 as a default can be potentially confusing in the case of a 0.0 official release.

I don't think that is a problem

0.0 is a very unpopular version name, the only times i've seen a 0.0.0 (or anything alike) is for name squatting, which are essentially empty packages that shouldn't be packaged in repositories

in the case of 0.0 actually existing, something like 0.0+unstable=2021-09-24 isn't really ambiguous as it just sounds like a pre-release of 0.0. maybe we can use pre instead of unstable to be more clear about that

@AndersonTorres
Copy link
Member Author

AndersonTorres commented Sep 24, 2021

pre is discouraged. Look at this repology-updater thread:

It must be post-known-version format. E.g. if the latest known version is 0.4.7, the snapshot must be 0.4.7something, not 0.4.8something, because there should be no guessing on what the next version would be.


as it just sounds like a pre-release of 0.0. maybe we can use pre instead of unstable to be more clear about that

If we use 0.0 as a default and a project uses 0.0 as a release in the future, it becomes confusing if 0.0+unstable=2021-09-24 refers to a snapshot that was released before or after 0.0, defeating completely the monotonicity.

this order:
- the version of the latest labeled snapshot, as defined above;
- If the project never released a labeled snapshot, `0_0` should be used
as default.
Copy link

@hyperfekt hyperfekt Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the change from 0_0 to i.j impact lexicographical ordering?
I'm thinking most (all?) systems probably use ASCII values to sort, but it'd be good to both research this and explicitly consider it.

Copy link
Member Author

@AndersonTorres AndersonTorres Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version ordering is the same as used by Nix itself, being registered in the nix-env manpage.

I am treating it as previous knowledge here.

The idea of 0_0 is that, being an uncommon numbering scheme (underscores are not employed in Nixpkgs version attrs, as far as I remember), it will not clash anything.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about systems other than Nix as well, as other programs operating with nixpkgs data may be using the version attribute for their own purposes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see in the nix repl 0_0 sorts as greater ('newer') than for example 0.1.

Copy link
Contributor

@kevincox kevincox Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also against 0_0. It is not very intuitive for readers. If we need to put a number to satisfy parseDrvName I think we should just use 0 or something more obvious like 0unreleased

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"0_0" is [0, "_", 0] thus smaller than "0.0" but larger than "0"

Lexicographic comparisons applies to string components after splitting.

"0.pre" seems to be smaller than anything not starting with "0.pre.pre" (dot after zero is optional). I hope that "0.pre.pre" upstream version will not be a frequent issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the man page does not mention this but both . and - are special-cased separators that don't become part of any component! I had assumed the . was part of the evaluation.

Copy link

@milahu milahu Feb 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holy cow. just use sort -V aka sort --version-sort

printf "%s\n" 0.0.0 0.0.1 0.0 0_0 1.0 0.1 2.0 \
1.0-unstable-2023-02-04 1.0+unstable-2023-02-04 \
1.0-unstable.2023-02-04 1.0+unstable.2023-02-04 \
| sort -V

0.0
0.0.0
0.0.1
0.1
0_0
1.0
1.0+unstable-2023-02-04
1.0+unstable.2023-02-04
1.0-unstable-2023-02-04
1.0-unstable.2023-02-04
2.0

nix should produce the same sort order, otherwise i would call that a nix bug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix does not produce the same order, pre is special-cased to sort earlier than in sort -V, and this is not a bug but a reasonable decision, compatibility with which we want to keep to avoid globally-breaking changes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah, makes sense

builtins.sort (a: b: (builtins.compareVersions a b) < 0) [ "000" "00" "0_0" "0" "0.0" "1.0" "1.0pre" "1.0-pre" "1.0post" "1.0-post" "1.0-unstable-2023-02-04" "1.0-unstable.2023-02-04" ]

[ "000" "00" "0" "0_0" "0.0" "1.0pre" "1.0-pre" "1.0" "1.0post" "1.0-post" "1.0-unstable-2023-02-04" "1.0-unstable.2023-02-04" ]

@hyperfekt
Copy link

hyperfekt commented Sep 24, 2021

The usage of only unstable assumes that there is a single dimension along which versions evolve. Is it worth it to specify a rule that handles the version attribute if we were to include (potentially) multiple (possibly temporarily) diverging branches in nixpkgs?
Specifically, are we planning to change the name attribute (as suggested on the Discourse) for every switch between branches? Not every divergence may be as consistent or long-lived as to warrant that.

@figsoda
Copy link
Member

figsoda commented Sep 24, 2021

Using 0.0 as a default can be potentially confusing in the case of a 0.0 official release. I chose 0_0 because it is way less common.

Both are extremely unpopular, 0_0 is a less conventional format but it can still exist in the wild, and I still personally prefer using .s in between. Maybe just 0 could be good too?

The idea is using a (+key=value)* format; it is more amenable to sed+awk-parsing.

I like this idea. it looked a little weird to me initially but it is growing on me. regex can still be used if we end up with the semver variation

I am striving to make this specification VCS-agnostic. If e.g. the program team changes the VCS, it will not affect version attr.

I agree. I removed git from my original suggestions

pre is discouraged. Look at this repology-updater thread:

I don't think it's a problem for 0.0 versions, for other versions we can use post instead

@AndersonTorres
Copy link
Member Author

The usage of only unstable assumes that there is a single dimension along which versions evolve. Is it worth it to specify a rule that handles the version parameter if we were to include (potentially) multiple (possibly temporarily) diverging branches in nixpkgs?

It was an idea @7c6f434c pointed me out in the RFC brainstorm. I didn't find a clear wording to this at that time.

On the other hand, I think that it should not be encoded directly in version.

Suppose we have a program, funny, whose team employs two branches of development, say long-term-support and a bleeding-edge. (Indeed, it is an almost perfect description of the world before Python-2.x deprecation!)

I don' think this situation should be encoded in version at all. It looks more like a pname thing.

@hyperfekt
Copy link

hyperfekt commented Sep 24, 2021

My understanding is that the case you are describing is for packages that are present in nixpkgs in both variants - would it be confusing (for humans or programs) to change the pname for a single package, possibly even in quick succession? For example, we may want to switch to a newer (unreleased) commit of a stable branch because it includes some critical fix but not imply with that that we are now following the development branch.

Copy link
Contributor

@kevincox kevincox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up. It is important to have some documented conventions around this.

I suspect it will be interesting to bikeshed for a while but at some point it will be best to cut the discussion at what we have and if needed future RFCs can change the format if they have a strong reason to do so.

this order:
- the version of the latest labeled snapshot, as defined above;
- If the project never released a labeled snapshot, `0_0` should be used
as default.
Copy link
Contributor

@kevincox kevincox Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also against 0_0. It is not very intuitive for readers. If we need to put a number to satisfy parseDrvName I think we should just use 0 or something more obvious like 0unreleased

@hyperfekt
Copy link

hyperfekt commented Sep 24, 2021

something like 0.0+unstable=2021-09-24 isn't really ambiguous as it just sounds like a pre-release of 0.0

This brings me to another finer point about lexicographic ordering: What DO we do about pre-releases? Using the previous version number is misleading, and using a suffix will sort them as newer than the actual release. I think maintaining lexicographic ordering in line with chronology is worth it to avoid subtle bugs in version checks. If we do not do that, we should maintain very explicit rules about what is allowed and how to sort it.

@hyperfekt
Copy link

hyperfekt commented Sep 24, 2021

Sorry if I'm being very particular here, I just think we have a chance to get all the version discussions behind us at once and for all, and while things are still nicely fluid as well. I really appreciate this RFC rationalizing the whole affair and would like to maximize that effect (to a reasonable degree).

- Branch denotes a logical sequence of snapshots, as identified by the program's
team;
- Usually these branches are denoted by names such like `stable`, `master`,
`unstable`, `trunk`, `experimental`, `staging` etc.;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or by major version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A major version is more akin to a labeled snapshot, not a logical sequence of them. Indeed, major versions don't necessarily follow a continuous sequence. An example is Zig: between 0.8.0 and 0.8.1 there are many small, discrete snapshots.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These only differ in patchlevel anyway.

I mean that PostgreSQL has branches 13.x and 14.x and some more.

Linux kernel, annoyingly, has stable branches differing just in minor version (although major.minor is just formatting for Linux kernel versions). So maybe a branch can be called by a version prefix instead of a name?

@7c6f434c
Copy link
Member

Using the previous version number is misleading, and using a suffix will sort them as newer than the actual release.

I think at least Nix does sort pre before unsuffixed.

@hyperfekt
Copy link

hyperfekt commented Sep 25, 2021

I think at least Nix does sort pre before unsuffixed.

Good point. We could canonize this for minimal change (but it means breaking with lexicographical ordering in both the Nix language and outside consumers) as an example of the kind of explicit rules I mentioned; or think about an alternative that universally preserves correct ordering. The only ones I can think of require a mandatory suffix for all versions, so that isn't very attractive either.

@rhendric
Copy link
Member

One issue that I don't think has been discussed is that some ecosystems of packages (Python libraries, as a running example) may have their own definition of what constitutes a valid version string (PEP 440) and tools that expect versions to conform to that definition. I've seen comments to the effect that it's useful to end users if they see versions that match the versions used by upstream/in other package managers; I'm piping up to note that it's also useful to automated tools (update-python-libraries) if they have access to versions that are compatible with the version strings they expect for their ecosystem.

Taking a step back: this RFC has seen a lot of discussion and the most recent discussion seems to be in the direction of designing the one version string format to rule them all, which strikes me as a poor use of everyone's time—it's been tried by so many others, and no true consensus has emerged; with no disparagement meant to any of the intelligent folks commenting here, why should any of us believe that we're situated to do better?

Given the broad agreement that the status quo is bad, and the absence of a consensus on what is best, I propose restructuring this RFC as something like this:

  • The version number of a package, stable or unstable, should conform to the nomenclature used by upstream, with any initial non-digits removed.
  • A passthru.compareVersion attribute, if it exists, should be a function that can compare a provided version string to this package's version; this is for Nix tools.
    • We should lib some common schemes to use here, so maintainers can write passthru.compareVersion = lib.versions.compareWithSemver version; for standard cases.
  • Recommend a version scheme for use as a default if the upstream nomenclature is underspecified or somehow technically incompatible with Nix's versioning requirements—this is where we can bikeshed about unstable or post or patched or whatever, but it's just a recommendation so maintainers can use their judgment with edge cases, and we don't have to spend years talking about it.

This would surely be better than the status quo, would keep automated tools and our relationship with Repology working, would keep Nix store paths friendly to globbers, would keep versions as strings without making those strings have to look like data structures encoded as URL query strings, would give individual special packages ways to continue to be special without amending Nixpkgs-wide policy again, and would provide a clear path forward if the recommended version scheme needs to evolve.

@7c6f434c
Copy link
Member

7c6f434c commented May 17, 2023 via email

@rhendric
Copy link
Member

rhendric commented May 19, 2023

If that's the main motivation, then I missed it. The RFC as written cites three motivations:

  • Restricting version to a set of strings such that parseDrvName is the left inverse of "${name}-${version}"—this amounts to simply requiring that version always starts with a digit, as I understand it
  • Constructing version in such a way that it can be compared with the expected semantics—this is sidestepped if we give packages the ability to specify their own comparison logic
  • Providing a ‘consistent interface’ to Repology—what is more consistent with what other packagers do than using upstream versions as closely as possible?

It does not appear to be a motivation that Nixpkgs-specific extensions to the version need to be formatted in the same way across all packages, and if comparison logic can be customized then I don't see why this would be necessary (though I could be missing something). As an outside observer, it seems to me that this is what has caused this proposal to stagnate, as people come up with different edge cases that could pose difficulties for different formats and use cases. If it stops us making progress and we don't need it, let's stop trying to achieve it—choose a good-enough default for the common case, allow maintainers to solve the problem in other ways if the need arises, and start reaping the benefits of resolving the cited motivations instead of letting them linger on indefinitely.

@7c6f434c
Copy link
Member

I am no longer sure there is a way to specify that we use an untagged snapshot that is accepted by everyone as «well, that's the most readable we will actually get» and at the same time is clearly safe for rev-deps (where things go wrong without always being obviuous to the maintainer of the package where the version is set)

@rhendric
Copy link
Member

I am no longer sure there is a way to specify that we use an untagged snapshot that is accepted by everyone

Precisely my point. If it seemed to me that such a specification were imminent I would keep my thoughts to myself and trust the process. But if you and I agree that making everyone happy doesn't seem likely, why not go with a recommendation that makes some people happy, leaving enough flexibility for maintainers to define their own version comparison semantics when issues arise? I haven't seen anyone arguing for keeping the status quo; almost anything you do here that resolves the stated motivations would be some sort of improvement on net, even if there remain people who think that a slightly different proposal would be better still. The potential for harm here isn't worth spending years in indecision.

and at the same time is clearly safe for rev-deps

If this is an important point that I'm missing, my apologies, I would need it spelled out more than this to address it.

@7c6f434c
Copy link
Member

7c6f434c commented May 20, 2023 via email

@rhendric
Copy link
Member

I am not even sure we can build a shepherd team consensus around any set of wording markers.

Look, as the man said, if you choose not to decide, you still have made a choice. If the shepherd team can't find a way to move forward, that is equivalent to the shepherd team choosing the status quo: unstable-YYYY-MM-DD for snapshots. We know this is technically flawed because it doesn't start with a digit so parseDrvName doesn't like it, and it doesn't sort with stable versions the way we'd want. So, do the closest thing that resolves those issues: use X.Y.Z-unstable-YYYY-MM-DD for a snapshot if it succeeds a stable release of X.Y.Z, and 0-unstable-YYYY-MM-DD if there is no preceding stable release. You are sad because according to current version sorting rules 2.11-unstable-2018-04-01 is greater than 2.11.1? Then change the default version sorting rules to recognize -unstable- as a super-separator, so that versions are compared as if they were parsed into a pair (StableVersion, Maybe Text) with the natural lexicographic ordering, using the existing version comparison logic on the StableVersion part. (This is way smaller in scope than what the RFC as written proposes, intentionally; sorry if that means this counts as ‘hijacking’ but clearly the bigger thing isn't moving.)

What are the objections to that? That version scheme doesn't handle every corner case we can throw at it? It handles more than the status quo does. Don't like the word ‘unstable’? That's what the status quo uses. It fully resolves the first two of the three issues cited as motivations for this RFC, and maybe it isn't perfect, but after some amount of time for deliberation you have to stop letting the perfect oppose the good, and I think the appropriate amount of time in this case is probably less than two years, don't you? At this point, it is my humble opinion that the shepherd team should stop entertaining (or, I guess, raising?) objections to progress that apply equally to the status quo, because the status quo is the consensus they are choosing through their inaction.

Version comparison function is a very different proposal and doing it here without addressing the handling of untagged snapshots would be hijacking, in my opinion.

I saw some discussion upthread of making versions structured values instead of strings and implementing version comparison with typeclass-style polymorphism, and if that's part of the conversation then I really don't see how my suggestion is hijacking. But maybe you think that was hijacky too and just didn't object to it at the time, I don't know. I included it because the third motivation in this RFC seems difficult to properly address without either a versioning scheme that somehow includes all others as subsets (highly unlikely if not outright impossible) or per-package customization. But fine, if you think that's out of scope I think we still have a proposal that is an improvement over the status quo and hits two out of three.

@alyssais
Copy link
Member

alyssais commented May 22, 2023 via email

@7c6f434c
Copy link
Member

Look, as the man said, if you choose not to decide, you still have made a choice.

However, RFC choices are more authoritative so they should be held to at least some standard.

That's what the status quo uses.

Not fully uniformly, and improving uniformity of an unfortunate thing is a loss.

For most packages where it is used, it is not an absolute lie at least. The few where it is are free to just use a word which is actually a good idea, because it is just a prevailing practice, not well codified.

At this point, it is my humble opinion that the shepherd team should stop entertaining (or, I guess, raising?) objections

Getting a response from the author which of the proposal would be considered for incorporation (when there were multiple in parallel) doesn't always work here… and by process, this is a pretty blocking circumstance.

if that's part of the conversation then I really don't see how my suggestion is hijacking.

I was talking about hijacking in a context that completely omitted the part about snapshots not bearing an upstream version. The new proposal is not hijacking, sure.

This sounds like a much better change to me than the one the RFC proposes.

If you put it like this (and given the timing precedent in the previous iterations of the discussion), it sounds like as different RFC with an author focused on getting no matter what as soon as possible should be opened (presumably by @rhendric )

For the case I personally care about, the change proposed is useless: if patch gating needs the expensive evaluation of custom code, that we can have now, and what nix-env thinks… well, why care.

(Also, neither «unstable» nor «unreleased» are good mandatory choices because they are incompatible with the language some upstreams use, so the status quo where you can change the word and get someone to merge before nitpickers arrive is better than fully codifying the sometimes-bad words)

@rhendric
Copy link
Member

Getting a response from the author which of the proposal would be considered for incorporation (when there were multiple in parallel) doesn't always work here… and by process, this is a pretty blocking circumstance.

I suppose that's the crux, then; the rest of this stuff doesn't matter if the process prevents progress on the RFC without author participation.

I'll start another one and we'll see how far that goes.

@AndersonTorres
Copy link
Member Author

Given that the original intent of this RFC was smashed by corner cases, we are at the step zero. Any idea will look good.

Especially, a complex data structure for monotonic versioning instead of magic with strings.

@AndersonTorres
Copy link
Member Author

Given that many people are discussing the new RFC #147, I will give for this a grace period of 2 days and close it.

@AndersonTorres
Copy link
Member Author

I will close this in two weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet