-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support u8
type suffix for UTF8 string literals.
#58991
Support u8
type suffix for UTF8 string literals.
#58991
Conversation
Slightly worried about two lang features going in at the same time that affect strings literals :-) In reply to: 1018191692 |
@cston, @RikkiGibson Please review. In reply to: 1018610271 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments and questions
|
||
if (!inDirective && TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8') | ||
{ | ||
info.Kind = SyntaxKind.UTF8StringLiteralToken; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the above assignment to info.Kind
be in an else
below this if
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the above assignment to
info.Kind
be in anelse
below thisif
?
That is an alternative way to get to the same result. If you prefer, I can refactor the code that way.
|
||
if (TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8') | ||
{ | ||
info.Kind = SyntaxKind.UTF8StringLiteralToken; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the above assignment to info.Kind
be in an else
below this if
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an alternative way to get to the same result. If you prefer, I can refactor the code that way.
static void Main() | ||
{ | ||
var y = (C2)""dog""u8; | ||
var z = (C3)""cat""u8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why an error is produced here. I thought "cat"u8
has natural type ReadOnlySpan<byte>
and then we can just use the user-defined conversion. Maybe we can sync offline? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why an error is produced here.
You are correct. This doesn't follow the proposed specification. I guess, I forgot to add a PROTOTYPE comment about that in the Binder. I decided to go with byte[]
as the natural type for now. That allows us to get away without special conversion rules for UTF8 string literals (do not confuse with string constant expressions, UTF8 string literals are not constants) because there are already implicit user-defined conversions from byte[]
to Span<byte>
and ReadOnlySpan<byte>
, and there are no existing conversions from ReadOnlySpan<byte>
to byte[]
or Span<byte>
. It feels like the language rules are simpler. Also, even when we are creating a ReadOnlySpan<byte>
, we are creating a byte[]
first. It feels like byte[]
is a better choice for a natural type.
[Fact] | ||
public void Errors_01() | ||
{ | ||
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_07. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "suffixes" #Resolved
[Fact] | ||
public void Errors_05() | ||
{ | ||
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_06. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "suffixes" #Resolved
[Fact] | ||
public void Errors_08() | ||
{ | ||
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_07. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "suffixes" #Resolved
[Fact] | ||
public void Errors_12() | ||
{ | ||
// The behavior is consistent with how type syffixes are handled on numeric literals, see Errors_06. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "suffixes" #Resolved
[Fact] | ||
public void Interpolation_01() | ||
{ | ||
UsingExpression(@"$""hello""u8", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to modify the proposal to specify what happens when @
or $
prefix is used, and either:
- the u8 suffix is used, or
- the new utf8 conversion is used
Intuitively I thought that a constant-valued interpolated string would be fine to use with the u8 suffix, but that doesn't appear to be the case. I don't have a strong position on that, but it seems like whatever is decided should be included in the proposal.
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure where confusion is coming from. I think the proposal is clear: "the language will provide the u8
suffix on string literals to force the type to be UTF8." The suffix is for literals, not for constant expressions. Interpolated strings are not literals, even when they have a constant value.
@@ -82,6 +82,7 @@ private static bool AreTokensEquivalent(GreenNode? before, GreenNode? after, Fun | |||
case SyntaxKind.NumericLiteralToken: | |||
case SyntaxKind.CharacterLiteralToken: | |||
case SyntaxKind.StringLiteralToken: | |||
case SyntaxKind.UTF8StringLiteralToken: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we testing this case?
Not explicitly. I'll make note in test plan
"; | ||
var comp = CreateCompilation(source, targetFramework: TargetFramework.NetCoreApp, options: TestOptions.DebugExe); | ||
CompileAndVerify(comp, expectedOutput: @"byte[]"). // The behavior has changed | ||
VerifyDiagnostics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding:
comp = CreateCompilation(source, targetFramework: TargetFramework.NetCoreApp, parseOptions: TestOptions.Regular10, options: TestOptions.DebugExe);
CompileAndVerify(comp, expectedOutput: "string");
``` #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an entry in "docs/compilers/CSharp/Compiler Breaking Changes - DotNet 7.md".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario will become an error. We will not change semantics based on language version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an entry in "docs/compilers/CSharp/Compiler Breaking Changes - DotNet 7.md".
I'll add a PROTOTYPE comment about that for now. We might still change the design around this
CompileAndVerify(comp, expectedOutput: @"array").VerifyDiagnostics(); | ||
} | ||
|
||
[ConditionalFact(typeof(CoreClrOnly))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HelpersSource refers to types that are not available on desktop. I am fairly comfortable to test this scenario only on CoreClr.
").VerifyDiagnostics(); | ||
} | ||
|
||
[ConditionalFact(typeof(CoreClrOnly))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HelpersSource refers to types that are not available on desktop. I am fairly comfortable to test this scenario only on CoreClr.
); | ||
} | ||
|
||
[ConditionalFact(typeof(CoreClrOnly))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HelpersSource refers to types that are not available on desktop. I am fairly comfortable to test this scenario only on CoreClr.
@@ -673,6 +673,9 @@ private BoundExpression BindExpressionInternal(ExpressionSyntax node, BindingDia | |||
case SyntaxKind.NullLiteralExpression: | |||
return BindLiteralConstant((LiteralExpressionSyntax)node, diagnostics); | |||
|
|||
case SyntaxKind.UTF8StringLiteralExpression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, in the raw-string-work we decided to not have a special expression type. it's just a string-literal that consumer can check the token kind on. up to you if you want to do the same. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, in the raw-string-work we decided to not have a special expression type. it's just a string-literal that consumer can check the token kind on. up to you if you want to do the same.
I will add a PROTOTYPE comment for now and will revisit this once raw-string-work gets into this branch.
@@ -77,7 +77,18 @@ private void ScanStringLiteral(ref TokenInfo info, bool inDirective) | |||
} | |||
else | |||
{ | |||
info.Kind = SyntaxKind.StringLiteralToken; | |||
if (!inDirective && TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this may follow the spec, but we probably want to take back to ldm if we support U8
as well as u8
all our other suffixes are case insensitive. so we probably want the same here. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may follow the spec, but we probably want to take back to ldm if we support
U8
as well asu8
all our other suffixes are case insensitive. so we probably want the same here.
Will add a PROTOTYPE comment and will bring this design question to LDM
@@ -77,7 +77,18 @@ private void ScanStringLiteral(ref TokenInfo info, bool inDirective) | |||
} | |||
else | |||
{ | |||
info.Kind = SyntaxKind.StringLiteralToken; | |||
if (!inDirective && TextWindow.PeekChar() == 'u' && TextWindow.PeekChar(1) == '8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: for raw strings, i allowed them to be parsed in directives, but then give a particular message that it isn't allowed. up to you if you want to do the same. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: for raw strings, i allowed them to be parsed in directives, but then give a particular message that it isn't allowed. up to you if you want to do the same.
I do not believe it is worth the effort
Proposal - https://github.com/dotnet/csharplang/blob/main/proposals/utf8-string-literals.md
Relates to test plan #58848