Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"..."u8 prefers byte[] when it should prefer ReadOnlySpan<byte> #60644

Closed
stephentoub opened this issue Apr 8, 2022 · 6 comments · Fixed by #61532
Closed

"..."u8 prefers byte[] when it should prefer ReadOnlySpan<byte> #60644

stephentoub opened this issue Apr 8, 2022 · 6 comments · Fixed by #61532
Assignees
Labels
Milestone

Comments

@stephentoub
Copy link
Member

Version Used:
e79be97

Steps to Reproduce:
I understand this is currently "by design", but we should revisit the design. That u8's natural type is byte[] leads to a variety of problems.

For example, this:

"abcd"u8.Length

allocates a new byte[4] just to get the length; if this were instead a span, that allocation would be avoided, and even if C# didn't want to recognize this particular pattern of constString.Length, the JIT can:
https://sharplab.io/#v2:EYLgtghglgdgNAFxBAzmAPgAQEwEYCwAUEZgMwAEO5AwkQN5HlOUWwLkCyAFAJTkC8APnIwApgHdyAJVEQAJgHkYAGwCeAZQAOEGAB5gqhKMFcxkg0YDaAXXJ1yXC6J4ByCC7gOnr4B6+HnFwBjP0cA1zkXcgBfHgA6ABlRGABzBAALAG4iaKA==
It's not uncommon to use const strings for their length like this rather than hardcoding a magic constant by manually counting the bytes/characters, e.g.

ReadOnlySpan<byte> data = ...;
if (data.Slice("prefix"u8.Length).SequenceEquals(...)) { ... }

Requiring a cast to ReadOnlySpan<byte> here to do the "right thing" is unexpected and unintuitive.

Same goes for other operations on span, like CopyTo. You want to be able to use UTF8 const strings to populate things, e.g.

byte[] data = ...;
"suffix"u8.CopyTo(data.Slice(123));

but you really don't want to be allocating a byte[] for "suffix" to achieve that.

It's also unfortunate that overload resolution picks byte[] over ReadOnlySpan<byte>:

using System;
public class C
{
    public void M() => N("abcd"u8);
    public void N(ReadOnlySpan<byte> bytes){}
    public void N(byte[] bytes){}
}

https://sharplab.io/#v2:EYLgtghglgdgPgAQEwEYCwAoBBmABM3AYUwG9NcL88EAWXAWQAoBKXAXgD5cA5RgIgjAAxgBM+AVwAczANzlKOfHV4AlAKYQRAeRgAbAJ4BlAA4QYAHmD6ALmq5XbAZ2YkAvvIqLaPRg7UBtAF1cP2c3TFcgA===
It's common to add span-based overloads to things that currently accept byte[], and even to add both overloads at the same time (in support of languages that don't support spans), and having overload resolution prefer the more expensive thing is problematic.

cc: @MadsTorgersen

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Apr 8, 2022
@tannergooding
Copy link
Member

Many similar points were raised in dotnet/csharplang#5983

@stephentoub
Copy link
Member Author

stephentoub commented Apr 8, 2022

Yes. Though unlike some of those comments, I do believe that allowing "..."u8 to target-type to byte[] or anything implicitly creatable from byte[] (e.g. ReadOnlyMemory<byte>) is valuable and something we should strive to keep. I just don't think it should be the natural type.

@tannergooding
Copy link
Member

Yep, just wanted to link the related bits even if some other bits have differing opinions.

-- Notably, I'm not a huge fan of the implicit assignment to byte[] due to the hidden allocations/etc, I'd rather just .ToArray() myself. That being said, its not a "strong" dislike and I could understand allowing it if string likewise got support for char[] and ReadOnlyMemory<char>, etc. Having something new here and not providing similar functionality for string seems odd/inconsistent to me. But that'd be a separate issue.

@jcouv jcouv added this to the 17.3 milestone Apr 8, 2022
@jcouv jcouv removed the untriaged Issues and PRs which have not yet been triaged by a lead label Apr 8, 2022
@stephentoub
Copy link
Member Author

I'd sent an email which was discussed in C# LDM today. Including it here for @MadsTorgersen to link to:


[...]
Here’s our recommendation for some tweaks to be made to the design:

  1. The u8 suffix should be required. Allowing “…” to implicitly convert to byte-based types leads to ambiguities in overload resolution and raises questions about the encoding being employed to generate bytes.

  2. ReadOnlySpan<byte> should be the natural type of “…”u8. Having the natural type be byte[] leads to very unexpected allocations, e.g. “prefix”u8.CopyTo(span) allocating, “data”u8.Length allocating, var prefix = “prefix”u8 allocating, etc. It also ends up allocating when choosing between two overloads, one based on byte[] and one based on ReadOnlySpan<byte>, and having both overloads for both types is becoming increasingly more common. And it causes problems for null termination…

  3. UTF8 literals should be null terminated. For consistency with string literals, as a defense-in-depth measure, and to help with some specific interop use cases, we should ensure there’s a null byte just after the u8 data but not included in the span’s length. This is primarily valuable for niche cases where code does fixed (byte* ptr = “somestring”u8) Call(ptr);, invoking a native function that expects the UTF8 data to be null terminated. Some feel that since string literals are null terminated and can be used in a similar manner, folks doing such interop will expect this to “just work”, and we’ll be risking vulnerabilities if we don’t. We should also investigate adding an analyzer to help mitigate expectations that such a terminator might apply to any ReadOnlySpan<byte> (or ReadOnlySpan<char> for that matter).

  4. Attributes should support using “…”u8 where byte[]s are expected. This is more of a nice-to-have than something that’s truly required. This is common in certain kinds of testing scenarios, for example. But we also might not want to do this if we don’t do (5) below…

We are in disagreement about one aspect:

  1. Should “…”u8 target type byte[] and things implicitly convertible from byte[]? One the one hand, it makes code that needs a byte[], a ReadOnlyMemory<byte>, an ImmutableArray<byte>, etc. more concise and simpler. On the other hand, it might be surprising to folks that each such use allocates (most such uses wouldn’t be “hidden”, but some could be if there were a method that only accepted a byte[] and you called it with “…”u8 as an argument), and there’s an inconsistency around “…” not implicitly converting to char[] (but that just stems from the broader, accepted inconsistency of there being a heap-based type for UTF16 but not currently for UTF8). This should not have a significant impact on APIs we expose, as we do expose and plan to expose ReadOnlySpan<byte>-based overloads anywhere we expect performance to be an issue. It’s more of a language usability vs predictability decision, so a good discussion to have in LDM. It of course could be a moot decision as well if it’s not feasible or desirable once the natural type is ReadOnlySpan<byte> (which is the more important piece). If we don’t enable it, instead of byte[] bytes = “data”u8;, you’d write byte[] bytes = “data”u8.ToArray();, taking advantage of the ToArray method ReadOnlySpan<byte> already has.

[...]

@BrennanConroy
Copy link
Member

BrennanConroy commented Apr 30, 2022

We just updated the SDK in ASP.NET Core and are hitting two classes of issues. I think both are variations that would be resolved by the first item mentioned above.

string foo = "...";
Method(foo); // now ambiguous, resulting in compile error
void Method(ReadOnlySpan<char>);
void Method(ReadOnlySpan<byte>);

With a Protect method that accepts a byte[] and an extension method that accepts a string with the invocation site of:
protector.Protect("Hello\ud800");
We now get the error:

error CS9026: The input string cannot be converted into the equivalent UTF8 byte representation. Unable to translate Unicode character \uD800 at index 5 to specified code page.

Using a similar API but with a valid utf8 convertible string we now call the byte[] overload which is breaking as we assume the string is base64 encoded and decode it first, so now we get a byte[] that's actually a base64 string and assume it's already base64 decoded.

@AlekseyTs AlekseyTs added the 4 - In Review A fix for the issue is submitted for review. label May 26, 2022
Cosifne added a commit that referenced this issue Jun 1, 2022
* Avoid caching RelativeIndentationData.effectiveBaseToken

* Relax assertion in SyntheticBoundNodeFactory.Convert (#61287)

* Add missing node state transition

It's valid for an input to be modified causing a downstream input to be removed (for example a syntax tree can change what is in it, leading to the downstream node not generating something). 

//cc @jkoritzinsky in case I'm missing something obvious.

* Simplify logic, more tests, rework tests

* Fix several LSP completion kind mappings (#61243)

* Fix several LSP completion kind mappings

* Extension method

* Fix issue where we were getting a raw-string in a skipped token, causing a crash

* Add test

* Add an UWP OptProf test for IDE

* Disable smart copy/paste when line-copy is involved.

* Update src/EditorFeatures/CSharp/StringCopyPaste/StringCopyPasteCommandHandler.cs

* Add additional sanity check

* Fix

* Add pointer for `AnalysisLevel` to warning waves doc (#61196)

* Remove parameter null-checking from the Language Feature Status list (#61302)

* Switch to GetRequiredService

* Simpler approach

* [EE] Implement IDkmClrFullNameProvider2 in Roslyn's ResultProvider Formatter. (#60522)

* Implement IDkmClrFullNameProvider2 in Roslyn's ResultProvider Formatter.

Issue:
- Debugger added IDkmClrFullNameProvider2 API with https://devdiv.visualstudio.com/DevDiv/_git/Concord/pullrequest/301518
- It is currently temporarily implement in Concord, needs to be implemented in Roslyn so implementation can be removed from Concord.

Changes:
1. Formatter:
- Implement IDkmClrFullNameProvider2. It has 2 methods, one to format local names and the other given field metadata.
- Currently only implemented for C#. I'm not that familiar with VB and the GeneratedNames stuff in VB needs some splitting and moving around to get working.

2. Unit tests:
- Add unit tests for the common cases of hoisted locals, synthesized locals, etc.

3. Versions.props:
Update MicrosoftVSSDKVSDConfigToolVersion to a newer version which recognizes IDkmClrFullNameProvider2.

* PR feedback - fix casing of MetadataImport

* PR feedback - move GetOriginalLocalVariableName, GetOriginalFieldName back to CSharpFormatter

Co-authored-by: Ramkumar Ramesh <[email protected]>

* PR feedback

* Fix and/or completion after parenthesized pattern

* [LSP] Disable GoToDef/GoToImpl integration tests  (#61190)

* Added syntax context flag

* Keywords c#

* Keep leadin trivia inside Main method if it is more likely to be a statement comment rather than a file header when converting to 'Program.Main' style program

* Symbols

* Snippets C#

* Remove set accessor of new SyntaxContext property

* Remove PROTOTYPE comments (#61322)

* Added assertions, comments, and refactored for clarity

* Change VB language version Roslyn.sln uses to "latest" (#61313)

To allow us to use the latest VB features, such as setting init-only properties.

* PR feedback

* Remove document options provider (#61228)

* Remove IDocumentOptionsProvider

* Fold DocumentSpecificOptionSet into DocumentOptionSet

* Use an explicit option to control frozen-partial semantics in inheritance margin

* Break into separate methods

* Fix null ref (#61342)

* Simplify internal types search

* Removed unintentional WorkItem's

* Make static

* PR feedback

* [LSP] Support LSP services associated with LSP server instances (with lifetimes that match). (#61266)

* Add support for exporting services that are created for each server
instance.

* Use the correct span to rename after invoking extract-method manually.

* Move Spellcheck capabilities to be activated in all scenarios (#61366)

* move spellcheck capability to always activated server

* Rename type

* rename fields

* [LSP] Add JSON semantic token classifications (#61231)

* Make async

* Test fallout

* Unify nint and IntPtr (#60913)

* restore file

* Minor simplification to rename code

* Remaining fallback options (#60888)

* Add missing fallbacks

* Fallback options from ILegacyGlobalOptionsWorkspaceService

* Pass options to CodeCleaner APIs.

* Fallback options from ILegacyGlobalOptionsWorkspaceService 2

* Fallback in tests

* CodeModel

* Remove CodeActionOptions.Default

* Remove dependency on IGlobalOptionService from inline hints service

* Remove obsolete VS UnitTesting APIs.

* Remote dependency on IGlobalOption service from RemoteProcessTelemetryService

* Remove ExportGlobalOptionProviderAttribute

* Remove PythiaOptions

* Remove DiagnosticOptions from solution snapshot

* Access options via AnalyzerOptionsProvider

* Split ISyntaxFormatting.cs

* Simplify initializers

* Fix

* Move AddImportPlacementOptions to a separate file in compiler extensions

* Move option providers to workspace extensions

* Move CodeCleanupOptions and IdeAnalyzerOptions to workspace extensions

* Layering

* Replace legacy GetOptions with AnalyzerOptionsProvider; add missing options

* Parameter rename, comment

* Add LineFormattingOptionsProviders

* CodeFixOptionsProvider, include CodeStyleOptions in CodeActionOptions, include LineFormattingOptions in ExtractMethodGenerationOptions

* DocumentFormattingOptions

* Move a couple of options from IdeCodeStyle to SyntaxFormatting to make them available to new document formatter

* Generalize using placement option in AddImportPlacementOptions

* Move PreferParameterNullChecking and AllowEmbeddedStatementsOnSameLine to CSharpSimplifierOptions

* Move CodeGen options to compiler extensions

* UseExpressionBody

* Eliminate more calls to Document.GetOptionsAsync

* Cleanup DocumentationCommentOptions

* Line formatting options

* DefaultConditionalExpressionWrappingLength

* insert_final_newline

* Add PreferThrowExpression to simplifier options

* Add AddNullChecksToConstructorsGeneratedFromMembers to CodeGenOptions

* GenerateEqualsAndGetHashCodeFromMembersOptions

* IImplementInterfaceService

* AddParameterCheckCodeRefactoringProvider

* ReplaceMethodWithPropertyService

* NamingStylePreferences

* Eliminate legacy option helpers

* Fix up ExtractMethod options

* Remove SyntaxFormattingOptions ctors

* Replace extra helpers with CodeFixOptionsProvider

* PreferUtf8StringLiterals

* RazorLineFormattingOptionsStorage

* Remove usage of Document.GetOptionsAsync - 1

* Remove internal usage of DocumentOptionSet and Document.GetOptionsAsync

* Simplify and unify option definition patterns

* Fixes and pattern unification

* Serialization and equality

* Simplify

* Rename

* Fixes

* CompletionOption fixes

* Feedback

* Single switch

* Add frozen delegate tests

* Add extract method test

* Add workitem

* Fix

* Add support for CompilerFeatureRequiredAttribute (#61113)

Adds support for decoding and reporting errors when `CompilerFeatureRequiredAttribute` is encountered on metadata type symbols. We also block applying the attribute by hand in both C# and VB.

* Delegate keyword tests

* Verifying interpolation escaping of curlies in content (#61387)

* Verify classification on var pattern (#61376)

* Add lambda parameters in scope in nameof using proper binder (#61382)

* Parse `unchecked` gracefully in operators (#61309)

* Update src/Features/Core/Portable/InheritanceMargin/AbstractInheritanceMarginService_Helpers.cs

* Update src/Features/Core/Portable/InheritanceMargin/AbstractInheritanceMarginService_Helpers.cs

* Use AspNetCoreKey to external access assembly

* Fix typo (#61380)

* Add new collapsing option for metadata files that contain source (#61205)

* More correctly respect background analysis scope (#61392)

* Fix function id (#61400)

* Wait for async operations to complete before proceeding

* Update SDK to .NET 7 Preview 4

* Avoid logging work when no logger is specified

* Emit CompilerFeatureRequired for ref structs when present.

* Support emitting CompilerFeatureRequiredAttribute for contructors of types with required members.

* [LSP] Small cleanup for pull diagnostics logging (#61417)

* Small logging cleanup on pull diagnostics code

* Update tests to account for lsp diagnostics throwing when mismatch in diagnostic mode

* Address feedback from numeric IntPtr feature review (#61418)

* Fix binding for checkbox text in rename dialogs (#61430)

Previously text was bound to properties on the control type using x:Name. This restores that

* Fix build

* Restrict IsGenericConstraintContext for C#

* More instrumentation for ReferenceCachingCS (#61402)

* Do not filter snippets

* Reverted delegate completion

* Allow source link, embedded or decompiled source in Peek Definition (#61427)

* Update status for DIM and numeric IntPtr (#61464)

* Lazily produce semantic models in source generators

* Remove unnecessary finalizer state handling from MethodToStateMachineRewriter (#61409)

* Update src/Tools/IdeCoreBenchmarks/IncrementalSourceGeneratorBenchmarks.cs

* Update src/Tools/IdeCoreBenchmarks/IncrementalSourceGeneratorBenchmarks.cs

* Update src/Tools/IdeCoreBenchmarks/IncrementalSourceGeneratorBenchmarks.cs

* lint

* Address prototype comments (#61436)

* Give a warning when obsolete is applied to a required member and the containing context is not obsolete, or all constructors are not obsolete/setsrequiredmembers.

* Restore nullable constructor warnings for constructors with `SetsRequiredMembersAttribute`.

* Remove prototype comments.

* Add tests and extra state

* Extract checking for generic constraint context to extension method & minor refactoring

* Update test comment

* Generate single OptProf config for compiler vsix

Currently OptProf can't support profiling for multiple flavors of vsix, and Optprof test only runs on X64. Multiple configs with same profiling binary are causing the tests to fail.

* Reduce release/64 limit for EndToEndTests.Constraints (#61480)

* Reduce release/64 limit for EndToEndTests.Constraints

* Lower bar more

* Lower bar more

* Remove parameter nullchecking feature (#61397)

* Fix null ref for JS files (#61472)

* Add file paths to interactive buffers and documents to support LSP requests (#61441)

* Add file paths to interactive buffers and documents to support LSP requests

* Switch to false returning predicate

* more feedback

* mroe feedback

* Reword comment

* Log additional information from CopyRefAssembly (#61384)

* update versioning to use languageserver.client.implementation

* Require VS 17.0 in signed build.

* Bump LSP protocol version (#61494)

* Bump protocol version

* React to breaking changes in foldingrangekind

* Disable inheritance margin for interactive documents (#61476)

* fix

* Improvements to the background compiler component

* add docs

* Simplify

* docs

* Check token

* Simplify

* Simplify

* Rename enum field

* Lifted relational operator implies operands non-null when true (#61403)

* Use ImmutableArray instead of IEnumerable parameters

* Use assignabiilty instead of subclass test in extension loader

* Fix test

* Fix test

* Simplify rename implementation

* Rename

* Unify all end operations that rename performs

* restore code

* Remove unnecessary code'

* Unify error handling in rename

* Add comment

* message severities

* Restore

* Simplify

* Bring main-vs-deps back (#61514)

* Add main-vs-deps back

* Update eng/config/PublishData.json

Co-authored-by: Joey Robichaud <[email protected]>

Co-authored-by: Joey Robichaud <[email protected]>

* Simplify LSP reference update

* Adjust conversion from nuint to float/double (#61345)

* Add embedded classification for field initializers

* Add support for properties

* Update src/EditorFeatures/Core/InlineRename/InlineRenameSession.cs

* Update src/EditorFeatures/Core/RenameTracking/RenameTrackingTaggerProvider.RenameTrackingCommitter.cs

* Update src/EditorFeatures/Core/RenameTracking/RenameTrackingTaggerProvider.RenameTrackingCommitter.cs

* Fix setup authoring bug (#61508)

* Expose VirtualChars to asp.net (through EA) to facilitate route classification

* add docs

* Add member

* NRT

* Make async

* Make async

* Pull token out

* Use feature attribute

* Remove Utf8StringLiteral conversion (#61481)

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-04-18.md#target-typing-a-regular-string-literal-to-utf8-types

* Fix AbstractLanguageService constructor (#61513)

* Fix AbstractLanguageService constructor

* Fix formatting

* Don't throw in logging when the document path contains curly braces (#61524)

* Update unit tests

* Improve normalization to match idiomatic patterns for nested usings and fixed statements. (#61533)

* Better syntax normalization for fixed/using statements

* Add tests

* Prepare VB iterators for EnC support (#61488)

* Remove unused parameters

* Separate iterator finalizer states from resumable states.

* Remove unused

* Skip timing test (#61222)

* Skip test

* Add some APIs on AspNetCoreVirtualCharSequence

* Add IsDefault

* Final prototype comments and top level statements local adjustments (#61551)

Clean up the last of the prototype comments and adjust the parsing of locals named required in top level statements.

* IDE Support for Required Members (#61440)

* Add required keyword recommender.

* Add SyntaxNormalizer test.

* Code generation support.

* Add SymbolDisplay

* F1 help service and test fix.

* Add order modifier tests and update.

* Change natural type of UTF-8 string literals to `ReadOnlySpan<byte>` and null terminate the underlying blob. (#61532)

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-04-18.md#natural-type-of-utf8-literals
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-04-18.md#should-utf8-literals-be-null-terminated

Related to #61517
Closes #60644

* fix assumption of length

* Bind native integers in cref (#61431)

* Use VS2022 for PR Validation builds

* PR feedback

* Add unit test project IVT to ExternalAccess.AspNetCore

* Fix generation location when generating across files

* Add test

* Relax check

* EnC: Allow adding/removing await expressions and yield statements (#61356)

* Implements support for adding and removing await/yield return in C# async, iterator and async iterator methods.

* Fix syntax node associated with BoundTryStatement created from using syntax

* Update required members status (#61602)

* Implements support for adding and removing await/yield return in C# in the IDE (#61521)

Co-authored-by: Sam Harwell <[email protected]>
Co-authored-by: Julien Couvreur <[email protected]>
Co-authored-by: Chris Sienkiewicz <[email protected]>
Co-authored-by: DoctorKrolic <[email protected]>
Co-authored-by: Gen Lu <[email protected]>
Co-authored-by: DoctorKrolic <[email protected]>
Co-authored-by: Cyrus Najmabadi <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: CyrusNajmabadi <[email protected]>
Co-authored-by: dotnet bot <[email protected]>
Co-authored-by: AlFas <[email protected]>
Co-authored-by: Ramkumar Ramesh <[email protected]>
Co-authored-by: Ramkumar Ramesh <[email protected]>
Co-authored-by: AlekseyTs <[email protected]>
Co-authored-by: Allison Chou <[email protected]>
Co-authored-by: Tomáš Matoušek <[email protected]>
Co-authored-by: David Wengier <[email protected]>
Co-authored-by: David Barbet <[email protected]>
Co-authored-by: Fred Silberberg <[email protected]>
Co-authored-by: James Newton-King <[email protected]>
Co-authored-by: Weihan Li <[email protected]>
Co-authored-by: Joey Robichaud <[email protected]>
Co-authored-by: Julien Couvreur <[email protected]>
Co-authored-by: Andrew Hall <[email protected]>
Co-authored-by: Manish Vasani <[email protected]>
Co-authored-by: Jared Parsons <[email protected]>
Co-authored-by: Joey Robichaud <[email protected]>
Co-authored-by: Rikki Gibson <[email protected]>
Co-authored-by: akhera99 <[email protected]>
Co-authored-by: Ankita Khera <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants