Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] str is dead, long live str([u8])! #19612

Closed
wants to merge 39 commits into from
Closed

Conversation

japaric
Copy link
Member

@japaric japaric commented Dec 7, 2014

This PR removes the built-in/primitive str type from the compiler, and replaces it with a library type: str([u8]). This new type is defined in the core::str module, and re-exported in the collections::str and std::str modules.

[breaking-change]s

For most of people:

  • str is now a struct imported as part of the std prelude, this means that importing the std::str module or using str as a variable name now is a name clash, and will result in a compile error.
  • The StrPrelude trait has been removed, and all the methods defined in it are now inherent methods of the new str type. This will break code that was explicitly using the StrPrelude trait.

For #![non_std] users:

  • If you are not using the core crate then you need to define the str type and mark it with the "str" lang item:
#[lang = "str"]
struct str([u8]);
  • The format_args! and panic! macros now use std::str::str instead of just str in their expansion, to use these macros you'll need to use the "std module trick":
#[doc(hidden)]
pub mod std {
    pub use core::cmp:  // for `#[deriving(PartialOrd)]`
    pub use core::str;  // for `panic!`, `format_args!`
}

Closes #19036


This passes make check as it is, but I broke the run-pass/unsized3 test (it triggers an LLVM assertion) and I'm currently investigating how to fix it.

@nikomatsakis re: compiler. I removed the sty::ty_str variant and replaced it with the following check: ty_struct(def_id, _) if ty::is_str(cx, def_id) in most match expressions. But, I'm sure that I can simplify the logic (i.e. remove that match arm) in several parts of the compiler, which I'd like to do before landing this.

@aturon re: library. One annoying thing after this change is that the str type/struct now collides with the std::str module (when imported). As a "fix", instead of directly importing the module, I now import it under the str_ name (use std::str as str_). From what I've seen that the main reason for importing that module is to use its free functions (e.g. str::from_utf8), so I'm wondering if we should convert all those free functions into inherent methods of str, that way str::from_utf8 would work without needing to import std::str, and no collision would occur. Thoughts?

@petrochenkov
Copy link
Contributor

Please, don't forget about RFC 339. It should be possible to add fixed strings to the language without breaking changes.

@aturon
Copy link
Member

aturon commented Dec 9, 2014

wow, you are on a serious roll!

The library situation is ... unfortunate. Of course, we usually avoid this by convention -- a convention the str type doesn't follow. (Personally, I'd love StrBuf and Str, but apparently there was a massive conflagration over the current choice already.)

I really wish that we had complete enough associated items (and HKT) that "single type" modules could essentially just be types, with all of their contents associated items.

That said, we don't have HKT, and things like iterators have to live somewhere.

I don't know; maybe this unfortunate situation is enough to revisit the String/str naming...?

Worse case, we could rename the module str_slice.

@barosl
Copy link
Contributor

barosl commented Dec 10, 2014

@aturon

I don't know; maybe this unfortunate situation is enough to revisit the String/str naming...?

I had thought String/str was better, but after seeing the proposal of PathBuf/Path, I actually got to love the consistency of XXXBuf/XXX. If we are to bikeshed again, I will gladly vote for StrBuf/Str.

@aturon
Copy link
Member

aturon commented Dec 11, 2014

@japaric

So. I've been talking a bunch with the core team about this PR. Unfortunately, it's a bit unclear whether we want to move in this direction -- whether the benefits outweigh the drawbacks.

Benefits:

  • Simplifies things in the compiler/type system by largely removing a special case.
  • Allows for inherent methods on str -- but note, it does not allow us to do away with extension traits because some of the operators must be defined in liballoc.

Drawbacks:

  • Raises problems with naming that don't have very good solutions (at least, not without a lot of churn).
  • Having the API split between inherent methods and extension traits is not necessarily helpful -- especially since we plan to "facade" the core extension traits in libstd so that there is one per built-in type.

At the moment, this change does not seem worth it, because it makes str more of a special case in the library wrt other "basic" types (primitive or not), and likely requires us to use a bad name (like str_slice) somewhere. Note that we can't do away with the module entirely since we still need a place for the extension trait to go.

@japaric, I'd like to hear your thoughts on the above, and whether you see additional benefits or drawbacks we should consider, or alternative routes.

cc @alexcrichton @nikomatsakis

@japaric
Copy link
Member Author

japaric commented Dec 13, 2014

(I was going to say that it's necessary to convert str into a library type in order to get string literals to store their size in their type, i.e. "foo" would have type &'static str([u8, ..3]) (as sketched in its RFC). But, I think that it's still possible to accomplish something similar without touching the str primitive, for example we could change the type of string literals to &'static FixedStr([u8, ..N]), and make &FixedStr coercible to &str. Either way would be backward incompatible change, but, more importantly, I haven't seen anyone pushing to get that RFC done by 1.0, so most likely it'll get postponed until the next major version (which makes sense because AFAIK that change is not useful without CTFE and integers as type parameters))

With that out of the way, of the two benefits you have mentioned, the first one will go mostly unnoticed by the community. And the second one is not really relevant anymore, because we are getting "facade" extension traits. That means that this PR will effectively break downstream code for no gain at all. And that's a good reason to not land this when we are so close to 1.0.

Feel free to close this, unless someone can think of a good reason to actually do this.

@ftxqxd
Copy link
Contributor

ftxqxd commented Dec 16, 2014

FWIW, if the str struct were changed from a tuple struct to a normal struct, variables named str could stay that way. Module imports of std::str would still have to avoid the name conflict, but in a way that’s a good thing IMO—intuitively, I expect the str in str::foo to represent the same thing as the str in &str.

My 2¢ regarding the whole situation: is it too late for an RFC renaming str to Str and moving it into the library? That way, all naming problems get solved. We already have both std::string and String, so it makes sense to me to also have std::str and Str. It also makes sense (to me) to have str in the library if we ever get string literals for arbitrary types implementing a trait, because (theoretically) str could stop being a lang item altogether. Then Str could be on equal footing with types like Str16 that could represent strings but with different encodings, and of course fixed-size strings if we ever get them.

(BTW, fixed-size string literals would certainly be useful without CTFE or integral generic parameters. For example, box *"foo" could be used to create a Box<str>, where there is no way to do so today in safe code. More generally speaking, str is currently lacking the fixed-size equivalent necessary to make DST coercions possible, rendering it quite hard to use in a few places.)

@aturon
Copy link
Member

aturon commented Dec 16, 2014

It's not necessarily too late for an RFC (though I would suggest String
become StrBuf and str become Str, to match with emerging DST conventions).
But last time this was raised it was a pretty massive bikeshed, and we're
talking about a lot of churn... Certainly I didn't have the stomach to
float such an RFC myself.

FWIW, I don't think keeping str as somewhat special will hold us back
from changing the way literals work.

On Tue, Dec 16, 2014 at 12:44 AM, P1start [email protected] wrote:

FWIW, if the str struct were changed from a tuple struct to a normal
struct, variables named str could stay that way. Module imports of
std::str would still have to avoid the name conflict, but in a way that’s
a good thing IMO—intuitively, I expect the str in str::foo to represent
the same thing as the str in &str.

My 2¢ regarding the whole situation: is it too late for an RFC renaming
str to Str and moving it into the library? That way, all naming problems
get solved. We already have both std::string and String, so it makes
sense to me to also have std::str and Str. It also makes sense (to me) to
have str in the library if we ever get string literals for arbitrary
types implementing a trait, because (theoretically) str could stop being
a lang item altogether. Then Str could be on equal footing with types
like Str16 that could represent strings but with different encodings, and
of course fixed-size strings if we ever get them.

(BTW, fixed-size string literals would certainly be useful without CTFE or
integral generic parameters. For example, box *"foo" could be used to
create a Box, where there is no way to do so today in safe code.
More generally speaking, str is currently lacking the fixed-size
equivalent necessary to make DST coercions possible, rendering it quite
hard to use in a few places.)


Reply to this email directly or view it on GitHub
#19612 (comment).

@tbu-
Copy link
Contributor

tbu- commented Dec 16, 2014

@aturon I believe the drawbacks you presented can easily be resolved by making str a real struct, not a tuple struct.

Later, making str generic over its content type would make fixed-size string slices possible. With a default type parameter [u8] nothing will change, but this is a move into that direction.

The magic about this type in the compiler would be reduced to the handling of string literals.

@huonw
Copy link
Member

huonw commented Dec 16, 2014

@tbu-, I believe only one of the drawbacks is solved by using a normal struct (as @P1start describes).

@tbu-
Copy link
Contributor

tbu- commented Dec 16, 2014

You're right, sorry.

@alexcrichton
Copy link
Member

Closing for now to help clear out the queue and because of #19612 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make str a library type (and lang item)
8 participants