Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repr(tag = ...) for type aliases #3659

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

clarfonthey
Copy link
Contributor

@clarfonthey clarfonthey commented Jun 15, 2024

Primitive representations on enums now accept type aliases, meaning that in addition to primitives like #[repr(u32)], #[repr(tag = core::ffi::c_int)] and #[repr(tag = my_type)] are now accepted.

Internals discussion: https://internals.rust-lang.org/t/pre-rfc-type-aliases-in-repr/20956
Last comment on RFC under first version (self::): #3659 (comment)
Last comment on RFC under second version (type = ...): #3659 (comment)
Last comment on RFC under third version (discriminant = …): #3659 (comment)

Rendered

@joshtriplett joshtriplett added T-lang Relevant to the language team, which will review and decide on the RFC. I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. labels Jun 15, 2024
@joshtriplett
Copy link
Member

joshtriplett commented Jun 15, 2024

This seems like a straightforward change that we should support to improve type definitions for interfaces that depend on type aliases, particularly those that vary by target.

@rfcbot merge

That said, I think we should go with the mentioned alternative of allowing type aliases to shadow reprs (with a lint) in a future edition, to avoid forcing people to write repr(self::c_int) or similar.

@rfcbot concern allow-type-aliases-to-shadow-reprs

(We may even consider allowing that in current editions on the basis of a crater run turning up no conflicts.)

@rfcbot
Copy link
Collaborator

rfcbot commented Jun 15, 2024

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Jun 15, 2024
@scottmcm
Copy link
Member

@rfcbot concern ambiguity

This makes things like

type transparent = u64;
#[repr(transparent)] enum Foo {}

legal.

That seems particularly bad for any other proc macros that want to look at the attribute as part of things like their own soundness checks, since they'd no longer know what's going on from what's in the attribute.


Also, I'm always a bit sad when we have "real" stuff in an attribute, in the sense that it's something that we have a proper grammar construction for rather than just needing to deal in tokens.

It makes me think of changing it to instead be

enum Foo : c_int {}

or something instead.

@scottmcm
Copy link
Member

scottmcm commented Jun 15, 2024

@rfcbot resolve ambiguity

I'm an idiot and somehow missed the "you need self:: part" 🤦

That does solve the hard blocker, but it still makes me wish for a better thing instead so that use actually works nicely.

@ogoffart
Copy link

Just an idea for a another syntax that is not ambiguous: #[repr(type = c_int)

And can I use complex types such as <Self as Foo>::Bar or should it be limited to plain path?

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Jun 16, 2024

Hmm, making the existing repr(type) aliases for repr(type = X) does actually feel right. It was one of the things I initially was thinking but kind of discarded it.

Although, I'm not sure if "type = " is the right name, since we're really talking about the enum discriminant here, but "discriminant" is unfortunately a very, very long name.

Perhaps repr(as = u32) is a good compromise? Then, stuff like repr(as = c_int) works as expected.

In terms of allowing shadowing, the main reason why I'm totally against it is because there's no way to override the shadowing. For example, you can shadow u32 but get it back by referencing the absolute path ::std::u32, whereas there'd be no such thing for repr(transparent) once you've shadowed it to a type alias. I don't like that idea since, as @scottmcm says, you'd have a way to easily destroy proc macros and the like and make them no longer able to properly reason about things.

@joshtriplett
Copy link
Member

#[repr(type = ...)] seems like a great solution here. I agree that that's better than allowing shadowing.

"type" or "base" or several other possible names could work there.

@juntyr
Copy link

juntyr commented Jun 16, 2024

I also really like #[repr(type = ...)]. It allows greater flexibility in the future without clashing with existing reprs that are assumed to have an unambiguous meaning. When I read #[repr(u8)], it would be clear that this type has a u8 primitive repr, while a ``#[repr(type = u8)]type has a repr that matches the in-scope type namedu8`, which may be shadowed to not refer to the primitive. So this explicit form provides an extra warning sign that shadowing is possible (while the existing form guarantees a fixed repr). It would still probably be good to lint against a shadowed repr where the discriminant type looks like one of the primitive reprs.

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Jun 16, 2024

Decided to just update the RFC to use the type = proposal. There may be a few rewriting errors, but hopefully I caught them all. Hopefully this resolves the concern brought up by @joshtriplett.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jun 17, 2024

This is not straightforward, we currently never have "real code" (types, expressions, patterns) in inert attributes.

For the purpose of this feature we'll need treat the repr attribute specially in the frontend, unlike any other attribute, as if it was a real non-attribute feature with a dedicated syntax (keep it in AST/HIR separately, resolve and expand it separately, insert into into the definition tree, etc). That is going to be a source of pain and hacks.
Previously features requiring real code in them ended up having real syntax as well.

(In other words, this is not what inert attributes are generally supposed to be used for.)

@ogoffart
Copy link

Although, I'm not sure if "type = " is the right name, since we're really talking about the enum discriminant here, but "discriminant" is unfortunately a very, very long name.

Discriminant might be long, but on the other hand it describes exactly what this is, and makes it very clear and obvious. It is also consistent with std::mem::discriminant
Since the repr attribute is always on its own line, it's not like we don't have room for it and its also not something we type very often.

So if it was me, I'd go with discriminant =

It doesn't look bad at all:

#[derive(Clone, Copy, Eq, PartialEq)]
#[repr(discriminant = u32)]
#[non_exhaustive]
enum Foo {
   //...
}

@kennytm
Copy link
Member

kennytm commented Jun 18, 2024

@petrochenkov How difficult is it to make #[repr] a non-inert attribute, or at minimum separate #[repr(u8, type = u8)] as non-inert from the inert #[repr(C, packed, align, simd, transparent)]? Would doing so introduce any breaking regressions?

@petrochenkov
Copy link
Contributor

petrochenkov commented Jun 18, 2024

@kennytm
It would be easy to add a new built-in attribute macro, like #[repr_type(MyType)], expanding directly to the new AST node, similarly to what offset_of and friends do.
Turning an inert built-in attribute into a macro is a breaking change. Upd: maybe not much more breaking than adding new things to prelude though.

@clarfonthey
Copy link
Contributor Author

I'm a bit confused about the concept of inert attributes versus attribute macros, and also why particularly this would be a problem to implement. I assume you're totally correct about the specific issues with implementing this, but could use a bit more info to fully understand what we're going for.

From my (relatively naïve) perspective, when the definition is expanded in the HIR, we just resolve the type alias when we put together the enum definition, since we should have the information at that point. Since we're literally just listing allowed types for the alias, it shouldn't need much extra logic. But this appears to be wrong, and I'm not sure why.

In addition to the primitive types themselves, you can also use the path to a type alias in the `repr` attribute instead, and it will resolve the primitive type of the type alias. However, to ensure compatibility as new potential representations are added, the path to the alias must contain a double-colon: you can access an alias `Alias` defined in the same module by using `self::Alias`.

For example, `#[repr(core::ffi::c_int)]` is valid because it contains a double-colon, but a `use core::ffi::c_int` followed by `#[repr(c_int)]` is not. If you wanted to `use core::ffi::c_int` first, then you could still do `#[repr(self::c_int)]` to reference the type.
To ensure compatibility, the `#[repr(type = ...)]` form is required if the type is not one of the known primitive types. Note that this form is not necessarily equivalent to using the primitive representations directly, since shadowing is possible; for example, if you did `type u32 = u8` and then `#[repr(type = u32)]`, this would be equivalent to `#[repr(u8)]`, not `#[repr(u32)]`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think type u32 = u8 seems needlessly obfuscating; I think it'd be a more readable example to write type C = u8 and then #[repr(type = C)], which is equivalent to #[repr(u8)] rather than $[repr(C)].

Copy link
Contributor Author

@clarfonthey clarfonthey Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that I was intentionally pointing out the obfuscation here because it feels more likely: #[repr(type = C)] is obviously going to mean whatever C type you have, but #[repr(type = u32)] meaning #[repr(u8)] is more likely to occur in something like proc macros if someone is doing something nefarious. So, genuinely, there is a preference to do #[repr(u32)] over #[repr(type = u32)] when you don't necessarily trust the parent scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've since updated this section a bit to elaborate a bit better, including what a type alias C might look like. Does this feel like it addresses your concerns?

@joshtriplett
Copy link
Member

@rfcbot resolve allow-type-aliases-to-shadow-reprs

Thank you to @ogoffart for the better type = syntax, and to @clarfonthey for incorporating it.

text/0000-repr-type-aliases.md Show resolved Hide resolved
text/0000-repr-type-aliases.md Show resolved Hide resolved
text/0000-repr-type-aliases.md Outdated Show resolved Hide resolved
text/0000-repr-type-aliases.md Show resolved Hide resolved
text/0000-repr-type-aliases.md Outdated Show resolved Hide resolved
text/0000-repr-type-aliases.md Outdated Show resolved Hide resolved
text/0000-repr-type-aliases.md Outdated Show resolved Hide resolved
text/0000-repr-type-aliases.md Outdated Show resolved Hide resolved
text/0000-repr-type-aliases.md Outdated Show resolved Hide resolved
@tmandry
Copy link
Member

tmandry commented Jun 20, 2024

@rfcbot reviewed

This seems like a great idea to me. As a future extension, I would like it if we could support types like the following:

#[repr(transparent)]
pub struct SecretInteger(u16);

Since that would allow "delegating representations" without allowing code to rely on the specific value of a type alias (this is especially useful when it changes on rarely-tested platforms).

If you agree that this is sensible, could you please add it as a future possibility?

…dowing, and alter the recommended list of lints
@clarfonthey
Copy link
Contributor Author

clarfonthey commented Jun 21, 2024

So, taking in additional suggestions:

  • type = ... is now discriminant = .... This is now both consistent with Add an expression for direct access to an enum's discriminant #3607 (even though that's not official yet) and feels more accurate. The typing is worth it.
  • Elaborated the suggested lints. Instead of recommending that people use #[repr(type)] directly when type is a primitive, recommend that they use #[repr(discriminant = type)] where type is the absolute path to a type. Make this an allow-only or clippy::pedantic lint that requires enabling by default.
  • Explained the guide-level explanation of shadowing a bit better, recommending that people use absolute paths if they're worried about shadowing rather than just recommending they use the old syntax.
  • Updated future possibilities to include repr(transparent) aliases and limited-value aliases like NonZeroU* and char.

Still need to go through more existing feedback though, so, there will probably be more changes. I don't expect to change the syntax again, though.

@clarfonthey clarfonthey changed the title repr(type) for type aliases repr(discriminant = ...) for type aliases Jun 21, 2024
@petrochenkov
Copy link
Contributor

I'm a bit confused about the concept of inert attributes versus attribute macros, and also why particularly this would be a problem to implement. I assume you're totally correct about the specific issues with implementing this, but could use a bit more info to fully understand what we're going for.

From my (relatively naïve) perspective, when the definition is expanded in the HIR, we just resolve the type alias when we put together the enum definition, since we should have the information at that point. Since we're literally just listing allowed types for the alias, it shouldn't need much extra logic. But this appears to be wrong, and I'm not sure why.

If #[repr(discriminant = TYPE)] accepts types in general, then some interesting cases are

  1. #[repr(discriminant = type_macro!())]
  2. #[repr(discriminant = U<16>)] // An anonymous constant in generic arguments
  3. #[repr(discriminant = new_crate::Type)] // Loading some new crate that previously wasn't in the dependency graph

All these cases require having the TYPE in AST during macro expansion already (or at least immediately after it for 3.), because macro expansion (for 1.) and definition tree building (for 2.) work on AST, and there is no AST inside attributes, only unstructured tokens.
The "when the definition is expanded in the HIR" point is too late for name resolution in the current compiler, because we are resolving all paths immediately after macro expansion and then freeze the dependency graph. (There are good reasons for that, e.g. we'd generally want to be able to produce depinfo for build systems without doing full compilation, so we should not lazily load new crates from type checking or codegen or something. Also that's how incremental compilation is currently organized.)

So what are the possible ways to put TYPE into AST early enough.

  1. Add a first class syntax for it, that's the most natural and the least hacky way.
enum E: TYPE { ... } // E.g. something like this (the specific syntax is taken from C++).
  1. Add a procedural macro that will expand directly to AST, that's equivalent having an unstable "secret" syntax.
    This is similar to features like offset_of!(...), the approach feels a bit like a temporary solution for features that may or may not get a first class syntax later.
// Before expansion
#[repr_discriminant(TYPE)]
enum E { ... }

// After expansion, you may see this with `-Zunpretty=expanded`.
enum E builtin#repr TYPE { ... }
  1. Turn repr itself into a procedural macro, then the discriminant part will expand into builtin#repr TYPE and the rest will expand into some inert attribute #[inert_repr] equivalent to the current #[repr]. This feels more hacky to me.
    There's a small chance of this causing breakage because macro expanded code is a subject to "time travel" restrictions, e.g. see one recent example (probably no breakage in practice though).
  2. Turn repr into an active built-in attribute.
    "Active" means it acts like a macro and transforms code, and "built-in" in relation to attributes means that it doesn't go through name resolution (doesn't support shadowing, and we immediatly know what it is when we encounter it during expansion).
    This is a relatively major hack, only two active built-in attributes exist now for historical reasons - cfg and cfg_attr.
    In the past we prevented doc from becoming an active attribute by replacing #[doc(include = "path/to/somewhere")] with #[doc = include_str!("path/to/somewhere")], and I'm pretty happy about that now.
    In this solution #[repr(discriminant = TYPE)] immediately expands into the secret syntax and inert components when it is encountered during expansion.
// Before expansion
#[repr(discriminant = TYPE, packed)]
enum E { ... }

// After expansion, you may see this with `-Zunpretty=expanded`.
#[repr(packed)]
enum E builtin#repr TYPE { ... }

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Jul 3, 2024

Now that this is in FCP, I decided to go through the last of the feedback and add more info to the RFC so folks that get notified via TWIR or otherwise can get a reasonable picture of what this feature is proposing.

Let me know if you have any additional feedback on the contents of the RFC itself.

(Note: these aren't any more design changes, just clarifications on alternatives, prior art, and drawbacks. In particular, I mentioned the desire for syntax and how that can still be added after this RFC is merged.)

Also, I still would like to express my preference for the attribute over the syntax regardless. To me, #[repr(discriminant = T)] enum U { ... } is much more descriptive than enum U : T { ... }, especially since equal weight is put to the name of the enum and its discriminant type in the latter case. Maybe that could be improved somehow, but this is at least how I feel about the proposed alternatives.

Comment on lines 70 to 72
## `type`

The second proposal for this RFC used `type = ...` instead of `discriminant = ...`. Initially, this decision was chosen because `discriminant` was long to type, but it was ultimately decided that `discriminant` is more clear and that the extra typing is worth it. Additionally, RFC [#3607] proposes using `discriminant` in its proposed syntax, so, this would be further in line with that as well.
Copy link
Member

@jswrenn jswrenn Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we intend for repr(discriminant = u8) to be equivalent to repr(u8), then I'd like to propose that we perhaps shouldn't use repr(discriminant = ...) since it might complicate a future possibility for enums.

That future possibility — alluded to in this RFC — is that we someday might want to expand the accepted set of discriminant types of enums (e.g., to include things implementing StructuralEq). And, if we had such functionality, it might also be nice to declare up front what the discriminant type is. A natural syntax for this could be #[repr(discriminant = ...)].

However, repr(discriminant = ...), as proposed by this RFC, does not only set the discriminant type of an enum, but also affects its memory layout. This is not desirable for all enums. If this RFC is adopted as-is, we'd need to find an alternative way to say "the discriminant type of this enum is X, but I don't care what the enum's in-memory representation is".

Given this, I'd like to suggest that this RFC consider repr(tag = ...). Doing so has three advantages:

  1. it makes it clearer that the annotation has an affect on the enum's in-memory representation, not just its logical representation
  2. it keeps repr(discriminant = ...) syntactically reserved for the aforementioned future possibility in which the user explicitly provides the type of the discriminant
  3. it's significantly shorter than "discriminant"

If not repr(tag = ...) than perhaps something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused how labeling this a tag makes it less confusing. It's a shorter word, but that's about it.

We've already established that we're changing the memory layout by using repr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that these are often called "tagged unions" in contexts where memory layout matters, I do think the name is clearer. It's also going to be consistent with other in-flight changes to Rust's documentation: rust-lang/reference#1454 (comment)

What are the advantages of discriminant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are often called this, but we also explicitly use the term discriminant in places where it has an API meaning. See: https://doc.rust-lang.org/nightly/std/mem/fn.discriminant.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the lang meeting today, we decided on tag. We felt that the existing mem::discriminant API was unfortunate, and unloved in enough other ways, that we didn't need to weigh that heavily as precedent. See also here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very fair. I think that it would be worth proposing to deprecate & rename that API if this is the route we're going, to ensure uniformity.

Copy link
Contributor Author

@clarfonthey clarfonthey Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, while the decision on the rename is resolved, I'm going to unresolve this until I update the RFC to rename the attribute, just so it's easier to track. Plus, it would be helpful to clarify all the finer details on how they should be written into the RFC before this gets marked as resolved.)

@joshtriplett
Copy link
Member

@rfcbot concern naming-and-syntax

Filing a concern both for the question of discriminant vs tag and because @scottmcm has an RFC in progress for the enum E: Type syntax.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jul 10, 2024
@clarfonthey
Copy link
Contributor Author

clarfonthey commented Jul 10, 2024

I didn't want to have to elaborate more on the context of syntax, but since the FCP is not happening any more, I might as well.

First, I'll start with the obvious. A colon is a terrible syntax. There's a reason why, when given the chance to design a new language from scratch instead of bolting on additional syntax like C++ does, Java and many other languages chose to use a keyword to describe the relationship between the name of the type and whatever occurs after the colon. Sure, Java also has distinctions like extends and implements that didn't exist in C++, but regardless, having a keyword is more clear.

With enum E: Type, the relationship between E and Type is incredibly unclear in general, and the precedent from C++ and other languages is that Type is the parent type of E, which is not true in this case. E isn't even a child type of Type, since explicit discriminants for enums with payloads has been allowed for a while, and this mucks up the semantics even more. Yes, the relationship in enum E: u8 is clearer, but this RFC is explicitly talking more about cases like enum E: MyType, where the actual type behind MyType may not be as clear.

Second, the ship has already sailed as far as discriminant and tag are concerned. We use the term discriminant in the standard library, throughout the compiler internally, and even new RFCs like #3607 are proposing to use the term "discriminant" to refer to the discriminant as we have been describing it. "Tagged union" was used as a term because it mimics what people have been saying in the C/C++ vernacular, but it is not part of the Rust vernacular, and I see no reason to treat it that way all of a sudden. Documentation and references, in my opinion, simply use the C/C++ vernacular because that is what people unfamiliar with Rust might know them as, to bridge common ground. I'm not counting libraries like serde using the term "tag" because in those cases, it is referring to a particular representation of the discriminant and not the discriminant itself.

Like, I personally have no issues with adding a dedicated syntax for this in the future. I just don't think that enum E: Type is a fleshed-out dedicated syntax and that #[repr(discriminant = ...)] is, and that personally, I think that most of the apprehension with making this an attribute largely boil down to implementation concerns in the compiler that people dislike and want to change anyway.

I think that an attribute fits this feature because, as mentioned before, any syntax will likely be less descriptive and more confusing. Elevating repr(discriminant = ...) to a dedicated syntax feels disproportionate to other features which are perhaps more ubiquitous, like repr(transparent) or repr(C). There are plenty of features which exist as attributes which maybe eventually will have syntax versions, like non_exhaustive, and I think it's perfectly fine to use attributes to be descriptive when syntax otherwise might be difficult to nail down.

@scottmcm
Copy link
Member

I went to mark this waiting-on-team, but it looks like that label doesn't exist here. Please treat this RFC as waiting on us to give better feedback.

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Jul 10, 2024

I should add: I was having a sour day before I saw this, so, that is definitely reflected in my response.

I think that it's absolutely reasonable to want to pause the FCP because you have additional concerns without fully fleshing out those concerns, since you are on a time limit. I just wish that were the reason stated for pausing the FCP, instead of explicitly providing not-fleshed-out concerns as an excuse instead. Because as far as I'm concerned, the reasons mentioned were addressed before the FCP started, and no additional reasons were cited.

@scottmcm
Copy link
Member

(We were actually talking in the lang design meeting about marking this waiting-on-team before I even saw your response. Under the usual "no new rationale", we always need to put detailed rationale and concerns in the thread.)

@juntyr
Copy link

juntyr commented Jul 10, 2024

I would like to second @clarfonthey’s point about clarity. What I really appreciate about Rust is being able to write more “self-documenting” code, where types and variable names (and ? short-circuiting) make many more comments superfluous. A big part of that is writing code that looks like it explains itself. Yes, attributes are always a bit more magic than I would like, but otherwise repr(discriminant = u8) is extremely descriptive and easy to understand (to the point where I would use it even update existing non-C enums in my code to use it).

In contrast, the : syntax is akt tougher to read. I like it for type (generics) bounds and variable types and that’s about it (I even forget every time which way the outlives relationship goes for lifetime bounds after using Rust professionally for >5 years).

If Rust gains dedicated syntax for this feature at some point (and perhaps a stable trait to reason about the discriminant type in the type system), that would be nice. But it should be as easy to read as the proposed attribute syntax. As this syntax is also not a new attribute but merely extends an existing one, I feel like not having it would at some point feel increasingly like a lack in feature with the repr attribute.

@traviscross
Copy link
Contributor

traviscross commented Jul 10, 2024

@rustbot labels +S-waiting-on-team

We decided in our design meeting today that we're going with tag rather than discriminant as the word for new language work. We were persuaded by its conciseness, and noted that much of the language of RFC 2195 had preferred tag also.

So if this RFC were to be accepted, we'd want it updated to #[repr(tag = ..)]. But we're not asking for that update to happen right now, because as mentioned, whether we can accept this RFC at all is blocked on some other further decisions we need to make.

@rustbot rustbot added the S-waiting-on-team Status: This is awaiting action from the team. label Jul 10, 2024
@traviscross traviscross changed the title repr(discriminant = ...) for type aliases repr(tag = ...) for type aliases Jul 10, 2024
@RalfJung
Copy link
Member

RalfJung commented Jul 11, 2024

"tag" might not be a good choice of terminology. As I already wrote over there:

The use of "tag" as apparently a synonym for "discriminant" is unfortunate insofar as "tag" exists as a term in the compiler and it is not equivalent to "discriminant" there. It refers to how the discriminant is encoded in memory. For instance, for an Option<&T>, the "tag" is just the entire value, and the "discriminant" is either 0 or 1 depending on whether the tag is null or not. Also see the compiler glossary. RFC #2195 was written before we established consistent terminology here; terminology around variant index / discriminant / tag used to be quite messy in the compiler until they got cleaned up (around 2019 if I recall correctly).

Granted, with this largely being internal compiler terminology, it can be changed. But it will certainly be confusing to people that have worked with enums in the compiler in the past. We have also occasionally used this terminology in opsem and other language discussions, to my knowledge.

And we would need a new word for what is currently called "tag".

@kennytm
Copy link
Member

kennytm commented Jul 12, 2024

"Tag" could make sense here since #[repr(uNN)] is primarily controlling the memory representation, not the order of variants.

@jswrenn
Copy link
Member

jswrenn commented Jul 12, 2024

@RalfJung Here, tag is appropriate. This RFC can be thought of as an extension of RFC2195: Really Tagged Unions, allowing the primitive repr type to be a type alias. Variants of enums with explicit primitive reprs are defined to always have tags.

By contrast, the feature proposed by #3607 only concerns discriminants. I agree entirely with your comment there that the use of "tag" in that case is inappropriate.

It's extremely useful to draw a distinction between discriminants (part of a variant's logical representation, and something all variants have) and tags (part of the physical representation, and only had by the variants of some enums). Using these terms synonymously will make talking about enum representation much more challenging.

@traviscross I hope there is still time and procedural flexibility for the lang team to reconsider using the same terminology for both of these RFCs.

@traviscross
Copy link
Contributor

We'll take the required time and consider all the feedback.

@RalfJung
Copy link
Member

@jswrenn that's fair, if this here is indeed about controlling the representation, and maybe even saying that tag == discriminant, then using tag makes sense.

With repr(u8) etc, do we just guarantee that the enum has the layout and ABI of a u8, or do we also guarantee that transmuting enum values to a u8 yields the discriminant (which would indeed guarantee tag == discriminant)?

@kennytm
Copy link
Member

kennytm commented Jul 12, 2024

I think unless we change https://doc.rust-lang.org/reference/type-layout.html#primitive-representation-of-enums-with-fields-less-enums, a #[repr(uNN)] enum is going to imply #[repr(C)] (inhibits layout optimization), and also forcing the tag to be transmutable with uNN.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jul 12, 2024 via email

@clarfonthey
Copy link
Contributor Author

Yeah, I'm also of the opinion that discriminant is a much better term in general.

The way we tell variants apart is using a discriminant. That discriminant may be a dedicated tag field, but it could be niche values.

However, I think that calling this tag is acceptable because, well, it's an explicit tag representation.

Take this example:

enum Even {
    Zero = 0,
    Two = 2,
}

enum Odd {
    One = 1,
    Three = 3,
}

enum Both {
    Odd(Odd),
    Even(Even),
}

There is a sense where all of these could have a u8 representation, and also be a single byte.

The "discriminant" here would fit into a u8. However, repr(u8) explicitly adds a tag and thus would make Both be two bytes long.

So, calling this representation flag tag makes sense, while calling discriminants in general tags does not.

@lolbinarycat
Copy link

if this gets accepted, this discriminant/tag definition should be added to the glossary of the rust reference

@clarfonthey
Copy link
Contributor Author

clarfonthey commented Jul 25, 2024

I agree that the decisions made, if final, should be solidified in the reference and similar documentation.

Personally, I'm fine with deciding that this particular representation can be called a "tag" in all cases and in the reference, and thus using that name in the attribute makes sense. However, whether enum discriminants in general should be called tags is something I particularly disagree with, and which I'm not sure is actually finalised given the current discussion.

That said, it is the teams responsible that have the final say in these discussions, and while I do hope they listen to the feedback presented to them, I ultimately can't force them to decide one way or another.

As far as I'm concerned, the discussion left to be had is largely external to this particular RFC, and I've also already made the name changes in the RFC text itself so that we can restart the FCP.

I'm not sure what the best place to continue the discussion is, but considering how I don't have anything particular to add to it, I'll wait for others to decide and share links and such.

@kennytm
Copy link
Member

kennytm commented Jul 25, 2024

I think the concern for naming-and-syntax is not just "tag" vs "discriminant" but because of implementation constraint of rustc 🤷 the compiler currently simply cannot support #[repr(tag = arbitrary_tts!{})]; it requires at minimum expanding to some "secret syntax" like enum E __internal#repr_tag(u8) #3659 (comment) or overhaul the built-in attribute system #3659 (comment). It was expected that the RFC text is updated to mention about this #3659 (comment).

@clarfonthey
Copy link
Contributor Author

I'm fine with commenting that down, although based upon @petrochenkov's comments (#3659 (comment) and #3659 (comment)) I was under the impression that this was less a design concern and more of an implementation detail.

Like, I agree that implementing this now would require a few invasive changes to the way repr works. It might require a few crater runs to ensure that it doesn't break anything. Depending on how ambitious a potential implementer wants to be, it could involve merely hacking in a way for this to work now or reworking the attribute system entirely. It may also mean that this RFC takes a while (potentially a year or two) to get implemented, which is fine for me.

To me, approving this RFC means that:

  1. The lang/compiler team agree that allowing type aliases for discriminant representations is useful, and that opening the potential for other types than primitives is okay.
  2. The attribute-based syntax for this, if implemented now, would be acceptable to support permanently in the language.

If, later down the road, someone proposes a better syntax for this feature and decides to make a new RFC amending this one to ditch the attribute-based syntax, I'm fine with that. I just think that inheriting C++'s messy enum Name : Repr { ... } syntax is a bad idea for all the reasons I mentioned, and that "well, it'd be easier to do than an attribute" is not a good reason to stabilise this syntax into the language permanently. If the attribute desugars to this syntax and the syntax itself is unstable, that's fine with me, but I don't think that we should make it the canonical, permanently stable way of doing this.

To me, attributes are just a predefined syntax we can use instead of sprinkling other sigils in the language, and are a way to easily stabilise features without giving them a permanently stable notation. The fact that the compiler is unable to keep up with this perception is something that would be better to change, IMHO, or maybe it's just repr that has this problem. I don't know enough of the details to tell.

I really shouldn't go off my own personal interpretation of the opinion of one person when that interpretation could be wrong, or that person's opinion doesn't reflect the entire team, but at least without any other input on this, that's what I'm doing for now. As stated, if this is actually a serious problem to consider, or if the team(s) would prefer to postpone this RFC instead of merging it, I can update the RFC text to reflect that.

@traviscross
Copy link
Contributor

@clarfonthey: We've just been a bit busy recently, with the edition coming up and whatnot. We'll circle back to these design questions when we can.

@clarfonthey
Copy link
Contributor Author

My entire point here was that I'm against rushing out a syntax before it's fully designed, so, it would be hypocritical to try and rush this out while folks have other, more important work to do. Thank you all for the 2024 efforts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. S-waiting-on-team Status: This is awaiting action from the team. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet