Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 3349 precursors #120329

Merged
merged 7 commits into from
Jan 26, 2024
Merged

RFC 3349 precursors #120329

merged 7 commits into from
Jan 26, 2024

Conversation

nnethercote
Copy link
Contributor

Some cleanups I found while working on RFC 3349 that are worth landing separately.

r? @fee1-dead

The parser already does a check-only unescaping which catches all
errors. So the checking done in `from_token_lit` never hits.

But literals causing warnings can still occur in `from_token_lit`. So
the commit changes `str-escape.rs` to use byte string literals and C
string literals as well, to give better coverage and ensure the new
assertions in `from_token_lit` are correct.
The `CString` handling code is erroneously identical to the `ByteString`
handling code.
The `T` type in these functions took me some time to understand, and I
find the explicit `T` in the use of `from` makes the code easier to
read, as does the `u8` annotation in `scan_escape`.
- Rename it as `MixedUnit`, because it will soon be used in more than
  just C string literals.
- Change the `Byte` variant to `HighByte` and use it only for
  `\x80`..`\xff` cases. This fixes the old inexactness where ASCII chars
  could be encoded with either `Byte` or `Char`.
- Add useful comments.
- Remove `is_ascii`, in favour of `u8::is_ascii`.
I find it easier if they describe what's allowed, rather than what's
forbidden. Also, consistent naming makes them easier to understand.
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
becomes `unescape_mixed`. Because rfc3349 will mean that C string
literals will no longer be the only mixed utf8 literals.
They can't contain `\x` escapes, which means they can't contain high
bytes, which means we can used `unescape_unicode` instead of
`unescape_mixed` to unescape them. This avoids unnecessary used of
`MixedUnit`.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 25, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jan 25, 2024

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

Copy link
Member

@fee1-dead fee1-dead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@bors r+

@fee1-dead
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jan 26, 2024

📌 Commit 6be2e56 has been approved by fee1-dead

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 26, 2024
@nnethercote
Copy link
Contributor Author

Thanks for the fast review :)

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 26, 2024
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#117420 (Make `#![allow_internal_unstable(..)]` work with `stmt_expr_attributes`)
 - rust-lang#117678 (Stabilize `slice_group_by`)
 - rust-lang#119917 (Remove special-case handling of `vec.split_off(0)`)
 - rust-lang#120117 (Update `std::io::Error::downcast` return type)
 - rust-lang#120329 (RFC 3349 precursors)
 - rust-lang#120339 (privacy: Refactor top-level visiting in `NamePrivacyVisitor`)
 - rust-lang#120345 (Clippy subtree update)
 - rust-lang#120360 (Don't fire `OPAQUE_HIDDEN_INFERRED_BOUND` on sized return of AFIT)
 - rust-lang#120372 (Fix outdated comment on Box)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 5f1f617 into rust-lang:master Jan 26, 2024
11 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 26, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 26, 2024
Rollup merge of rust-lang#120329 - nnethercote:3349-precursors, r=fee1-dead

RFC 3349 precursors

Some cleanups I found while working on RFC 3349 that are worth landing separately.

r? `@fee1-dead`
@nnethercote nnethercote deleted the 3349-precursors branch January 28, 2024 20:32
nnethercote added a commit to nnethercote/rust that referenced this pull request Jan 29, 2024
Currently the parser will interpret any label in certain positions as a
mistyped char literal, on the assumption that the trailing single quote
was accidentally omitted. This is reasonable for a label like 'a (because
'a' would be valid) but not reasonable for a label like 'abc (because
'abc' is not valid).

This commit restricts this behaviour only to labels that would be valid
char literals, via the new `could_be_unclosed_char_literal` function.
The commit also augments the `label-is-actually-char.rs` test in a
couple of ways:
- Adds testing of labels with identifiers longer than one char, e.g.
  'abc.
- Adds a new match with simpler patterns, because the
  `recover_unclosed_char` call in `parse_pat_with_range_pat` was not
  being exercised (in this test or any other ui tests).

Fixes rust-lang#120397, an assertion failure, which was caused by this behaviour
in the parser interacting with some new stricter char literal checking
added in rust-lang#120329.
nnethercote added a commit to nnethercote/rust that referenced this pull request Jan 29, 2024
…r literal.

Currently the parser will interpret any label/lifetime in certain
positions as a mistyped char literal, on the assumption that the
trailing single quote was accidentally omitted. This is reasonable for a
something like 'a (because 'a' would be valid) but not reasonable for a
something like 'abc (because 'abc' is not valid).

This commit restricts this behaviour only to labels/lifetimes that would
be valid char literals, via the new `could_be_unclosed_char_literal`
function. The commit also augments the `label-is-actually-char.rs` test
in a couple of ways:
- Adds testing of labels/lifetimes with identifiers longer than one
  char, e.g. 'abc.
- Adds a new match with simpler patterns, because the
  `recover_unclosed_char` call in `parse_pat_with_range_pat` was not
  being exercised (in this test or any other ui tests).

Fixes rust-lang#120397, an assertion failure, which was caused by this behaviour
in the parser interacting with some new stricter char literal checking
added in rust-lang#120329.
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 30, 2024
…-errors

Be more careful about interpreting a label/lifetime as a mistyped char literal.

Currently the parser interprets any label/lifetime in certain positions as a mistyped char literal, on the assumption that the trailing single quote was accidentally omitted. In such cases it gives an error with a suggestion to add the trailing single quote, and then puts the appropriate char literal into the AST. This behaviour was introduced in rust-lang#101293.

This is reasonable for a case like this:
```
let c = 'a;
```
because `'a'` is a valid char literal. It's less reasonable for a case like this:
```
let c = 'abc;
```
because `'abc'` is not a valid char literal.

Prior to rust-lang#120329 this could result in some sub-optimal suggestions in error messages, but nothing else. But rust-lang#120329 changed `LitKind::from_token_lit` to assume that the char/byte/string literals it receives are valid, and to assert if not. This is reasonable because the lexer does not produce invalid char/byte/string literals in general. But in this "interpret label/lifetime as unclosed char literal" case the parser can produce an invalid char literal with contents such as `abc`, which triggers an assertion failure.

This PR changes the parser so it's more cautious about interpreting labels/lifetimes as unclosed char literals.

Fixes rust-lang#120397.

r? `@compiler-errors`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 30, 2024
Rollup merge of rust-lang#120460 - nnethercote:fix-120397, r=compiler-errors

Be more careful about interpreting a label/lifetime as a mistyped char literal.

Currently the parser interprets any label/lifetime in certain positions as a mistyped char literal, on the assumption that the trailing single quote was accidentally omitted. In such cases it gives an error with a suggestion to add the trailing single quote, and then puts the appropriate char literal into the AST. This behaviour was introduced in rust-lang#101293.

This is reasonable for a case like this:
```
let c = 'a;
```
because `'a'` is a valid char literal. It's less reasonable for a case like this:
```
let c = 'abc;
```
because `'abc'` is not a valid char literal.

Prior to rust-lang#120329 this could result in some sub-optimal suggestions in error messages, but nothing else. But rust-lang#120329 changed `LitKind::from_token_lit` to assume that the char/byte/string literals it receives are valid, and to assert if not. This is reasonable because the lexer does not produce invalid char/byte/string literals in general. But in this "interpret label/lifetime as unclosed char literal" case the parser can produce an invalid char literal with contents such as `abc`, which triggers an assertion failure.

This PR changes the parser so it's more cautious about interpreting labels/lifetimes as unclosed char literals.

Fixes rust-lang#120397.

r? `@compiler-errors`
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 8, 2024
…1-dead

RFC 3349 precursors

Some cleanups I found while working on RFC 3349 that are worth landing separately.

r? `@fee1-dead`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants