
Determine what to do with "async ValueTask" caching #13633

Closed
stephentoub opened this issue Oct 24, 2019 · 20 comments · Fixed by #50116

Comments

@stephentoub
Member

PR dotnet/coreclr#26310 added an experimental feature, guarded by an environment-variable feature flag, that uses cached objects behind the scenes to implement async ValueTask and async ValueTask<T> methods. For .NET 5, we need to decide how to proceed with this feature:

  1. Delete it
  2. Keep it as opt-in
  3. Keep it as opt-out
  4. Always on (delete the fallback)

We need more data to decide how to proceed, and in particular:

  • Are there workloads where it yields a significant performance benefit when it's enabled?
  • Are there workloads where it measurably harms performance when it's enabled?
  • What kind of impact does it have on code size in a representative app?
  • What is a good strategy to use for the employed cache?

If we decide to keep it, and especially if we decide to keep it as opt-out or always-on, we also need to validate diagnostics, in particular tracing to ensure we're not regressing anything impactful.

Functionally, there's also a behavior change associated with it. While ValueTask/ValueTask<T> are documented to only be used in very specific ways, the fact that instances produced by async methods before .NET 5 were backed by Task meant that you could get away with violating the contract. This change generally ends up requiring that the contract be enforced. We would want to ensure we had good analyzers in place to help catch any misuse: https://github.com/dotnet/corefx/issues/40739.
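To make the contract concrete, here is a hedged sketch of the kind of misuse that used to slip through when the ValueTask was backed by a Task (everything in it, including GetValueAsync, is illustrative rather than taken from the runtime):

using System.Threading.Tasks;

static async ValueTask<int> GetValueAsync()
{
    await Task.Yield();
    return 1;
}

static async Task ConsumeTwiceAsync()
{
    ValueTask<int> pending = GetValueAsync();
    int first = await pending;
    int second = await pending; // contract violation: a ValueTask must be consumed at most once;
                                // with pooled state machines this may observe a recycled object
}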

CALL TO ACTION:

Please try out the bits with your app and share your results: throughput, memory load, etc.

To enable the feature, set the DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS environment variable to either 1 or true.

This will only impact async ValueTask and async ValueTask<T> methods, so you may also want to look through your code to switch some internal and private async Task/async Task<T> methods to instead use ValueTask/ValueTask<T>... just make sure to only do so when the consumers abide by the rule that an instance must only be consumed once: if callers are doing anything other than directly awaiting the result of the method, be careful. A sketch of such a conversion follows below.
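As a minimal sketch of that kind of conversion (all names here are illustrative, not from any real codebase): a private async Task<int> helper switched to async ValueTask<int>, safe only because its sole caller awaits the result directly, exactly once.

using System;
using System.IO;
using System.Threading.Tasks;

internal sealed class FrameReader
{
    private readonly Stream _stream;
    public FrameReader(Stream stream) => _stream = stream;

    // Was: private async Task<int> ReadChunkAsync(Memory<byte> buffer)
    private async ValueTask<int> ReadChunkAsync(Memory<byte> buffer)
        => await _stream.ReadAsync(buffer);

    public async ValueTask<int> ReadAsync(Memory<byte> buffer)
        => await ReadChunkAsync(buffer); // awaited once, never stored or re-awaited
}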

@stephentoub
Member Author

cc: @benaadams, @mgravell, @mjsabby

@benaadams
Member

If the choice ends up as

4. Always on (delete the fallback)

Would there be any disadvantage to keeping the parameter DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKSLIMIT?

@stephentoub
Member Author

Would there be any disadvantage to keeping the parameter DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKSLIMIT ?

Mainly a little more code to execute to read the environment variable. And we'd want to ensure it was clearly "unsupported" to enable us to change the algorithms employed in the future, assuming it was still relevant anyway by the time we shipped.

@mgravell
Member

mgravell commented Oct 24, 2019

I'll have a go at getting the PR working locally.

Just to mention another option for making the on/off decision: potentially this could also be done in code via the builder. The current API for picking the builder is awkward, but (purely imaginary code):

[Pooled]
async ValueTask<...> Foo() {...}

This does, however, obviously hugely limit where it would apply. For better or for worse.

@mgravell
Member

I see the PR is merged; presumably that means I can use the nightly for this; sorry if this is a silly question, but is that the 5.0.x? the 3.1.x? neither? both?

@stephentoub
Member Author

stephentoub commented Oct 24, 2019

Just to mention another option re making the on/off decision; potentially this could also be done in code via the builder

The two options I've explored here are:

  1. Additional language/compiler support that would enable putting an attribute on methods to change the builder that's used at compile time by Roslyn: "Proposal: Allow [AsyncMethodBuilder(...)] on methods" (csharplang#1407).
  2. Use reflection in the single builder to see whether an attribute is applied and use it (and potentially arguments on it) to configure behavior. This adds reflection on the first use of any such async method, which is a non-trivial cost, especially on a startup path.

And as you say, it hugely limits applicability. If we decide that enabling this for all async ValueTask{<T>} methods is untenable, then I think we'd need (1) above to proceed with this (cc: @MadsTorgersen).
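For context, a sketch of what option (1) could look like in source. Method-level [AsyncMethodBuilder(...)] and PoolingAsyncValueTaskMethodBuilder<T> are what later shipped in C# 10 / .NET 6; at the time of this discussion they were only a proposal, so treat this as illustrative rather than the implementation under discussion:

using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public class Parser
{
    [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))]
    internal async ValueTask<int> ReadHeaderAsync()
    {
        await Task.Yield(); // the state machine box comes from a pool instead of a fresh allocation
        return 0;
    }
}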

Regardless, yeah, it's another option. Thanks.

I see the PR is merged; presumably that means I can use the nightly for this; sorry if this is a silly question, but is that the 5.0.x? the 3.1.x? neither? both?

5.0.x. I just checked the latest from https://dotnetcli.blob.core.windows.net/dotnet/Sdk/master/dotnet-sdk-latest-win-x64.exe and it's not there yet, but I expect it should be soon.

@stephentoub
Member Author

stephentoub commented Oct 28, 2019

@mgravell
Member

mgravell commented Nov 4, 2019

Finally had some time to try and look at this, but it looks like the installer is internally inconsistent at the moment: the SDK is installing 5.0.100-alpha1-015573, but the runtime is complaining that

The framework 'Microsoft.NETCore.App', version '5.0.0-alpha1.19521.2' was not found.

reporting that 5.0.0-alpha.1.19554.1 at [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] is available. I'll try again with tomorrow's daily. Sorry.

@stephentoub
Member Author

Thanks, @mgravell. Did you configure a nuget.config following the example from https://github.com/dotnet/core-sdk/blob/master/README.md#installers-and-binaries?

@benaadams
Member

Initial test (code) for HttpClient over HTTP; it's quite light on async depth, Tasks aren't the dominant form of allocation, and they can't be completely removed (public API); however, it does make a difference.

One of the things I like about this pooling is that the lifetime of async state machines is very indeterminate compared to a lot of other allocations. They aren't long-lived in terms of CPU, but they are in terms of elapsed time, which means they end up moving to higher GC generations perhaps more than the "sync" style of async programming would suggest.

master (Task)

Threads: 1, Request/s: 11,478.0, Time: 87,122 ms, Allocated/Request: 3,200
Threads: 2, Request/s: 24,753.7, Time: 40,398 ms, Allocated/Request: 3,200
Threads: 3, Request/s: 40,130.3, Time: 24,918 ms, Allocated/Request: 3,200
Threads: 4, Request/s: 50,418.6, Time: 19,833 ms, Allocated/Request: 3,199

PR #31623 (ValueTask) SET DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS=0

Threads: 1, Request/s: 11,308.8, Time: 88,426 ms, Allocated/Request: 3,200
Threads: 2, Request/s: 24,235.0, Time: 41,262 ms, Allocated/Request: 3,200
Threads: 3, Request/s: 39,305.8, Time: 25,441 ms, Allocated/Request: 3,200
Threads: 4, Request/s: 50,888.5, Time: 19,650 ms, Allocated/Request: 3,199

PR #31623 (ValueTask) SET DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS=1

Threads: 1, Request/s: 11,424.4, Time: 87,531 ms, Allocated/Request: 3,024
Threads: 2, Request/s: 25,417.1, Time: 39,343 ms, Allocated/Request: 3,024
Threads: 3, Request/s: 41,144.7, Time: 24,304 ms, Allocated/Request: 3,024
Threads: 4, Request/s: 52,013.2, Time: 19,225 ms, Allocated/Request: 3,025
"configProperties": {
  "System.GC.Server": true,
  "System.GC.HeapHardLimit": 20971520
}

@mgravell
Member

mgravell commented Feb 6, 2020

I wasn't able to get my previous bits updated to .NET 5 (too many moving parts). However, here's an update that I did using RESP with @davidfowl's "Bedrock" pieces (with some minor tweaks to make it run on the .NET 5 version of the Kestrel API, just the new KestrelServer(...) signature); the test is to send and receive 25k PING/PONG messages with a local redis server:

Baseline (environment variable 0):

Name: Total (Allocations)

  • System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1: 73808
  • System.Threading.QueueUserWorkItemCallbackDefaultContext`1: 24601
  • (and then small-hundreds-level things from general app startup)
c:\Code\Bedrock.Resp\tests\SimpleClient>dotnet run -c Release
Enabled: 0
ExecuteBedrock: time for 25000 ops (val-type): 2162ms

c:\Code\Bedrock.Resp\tests\SimpleClient>dotnet run -c Release
Enabled: 0
ExecuteBedrock: time for 25000 ops (val-type): 2171ms

Comparison (environment variable 1):

Name: Total (Allocations)

  • System.Threading.QueueUserWorkItemCallbackDefaultContext`1: 24909
  • (and then small-hundreds-level things from general app startup)
c:\Code\Bedrock.Resp\tests\SimpleClient>dotnet run -c Release
Enabled: 1
ExecuteBedrock: time for 25000 ops (val-type): 2166ms

c:\Code\Bedrock.Resp\tests\SimpleClient>dotnet run -c Release
Enabled: 1
ExecuteBedrock: time for 25000 ops (val-type): 2171ms

So: no noticeable negative impact on perf, but a big reduction in allocs. The QueueUserWorkItemCallbackDefaultContext is a known thing that is being handled separately. Basically, this makes things zero-alloc as expected. Nice work, if we can keep it!

(code is available from https://github.com/mgravell/Bedrock.Resp/tree/NET5)


One unrelated thing that is confusing me; I get significantly better performance if I Ctrl+F5 in the IDE (in release mode) vs dotnet run -c Release - down to 1750ish milliseconds; no idea why!

@stephentoub
Member Author

stephentoub commented Feb 6, 2020

Thanks, @mgravell!

One of the issues here is that allocations are for the most part a secondary indication of perf, in that the hope is that by reducing allocation, you improve throughput. But here to reduce allocation, we're pooling, which has its own costs. So a decrease in allocations that doesn't improve throughput at all isn't necessarily a win, especially if it brings with it other downsides.

It'd be great if you can try out real workloads when you get a chance to see if there's any meaningful gain.

@davidfowl
Member

Does the working set get better as a result? You should be able to run dotnet counters before and after to get a glimpse of what the metrics look like.

@stephentoub
Member Author

(Working set is good to look at, but working set improvements could also instead suggest tweaks to GC configuration, as such pooling should primarily help working set when the GC has determined it's not worth doing a collection yet. Even so, as David suggests, it's a good data point.)

@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020
@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Feb 28, 2020
@YairHalberstadt
Contributor

Thought I'd add a link to your blog post here, @stephentoub:

https://devblogs.microsoft.com/dotnet/async-valuetask-pooling-in-net-5/

@stephentoub
Member Author

Thanks, @YairHalberstadt.

@aL3891

aL3891 commented Mar 17, 2020

Coming from the blog post, this is really really cool :)

I haven't done any measurements, but I can see this being beneficial for some places in our code; I'd be worried to enable it globally, though. Perhaps it could be enabled/disabled with an attribute; that way you could choose the scope (method, class, assembly), and then down the line, if we want it to be the default, it could be a default MSBuild parameter that works the same as the other assembly-info attributes.

@stephentoub
Member Author

stephentoub commented Mar 17, 2020

Thanks, @aL3891. Technically it could be done in the manner you suggest, and I considered something like that. The main problem I hit is that it adds a non-trivial amount of expensive reflection on first use of every async method (whether it uses the attribute or not), and that can contribute non-trivially to things like start-up time costs. If we could figure out how to avoid that, it might be a very reasonable opt-in path to consider.
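To illustrate the cost being described with a hypothetical sketch (none of these names are real runtime APIs): a builder-side check that reflects over a method to see whether a made-up [PoolThisMethod] attribute is present, caching the answer per method. Even with the cache, every async method pays for reflection on its first call, which is the start-up cost in question:

using System;
using System.Collections.Concurrent;
using System.Reflection;

[AttributeUsage(AttributeTargets.Method)]
sealed class PoolThisMethodAttribute : Attribute { }

static class PoolingPolicy
{
    private static readonly ConcurrentDictionary<MethodBase, bool> s_cache = new();

    // Reflection (IsDefined) runs on the first use of each method; the result is then cached.
    public static bool ShouldPool(MethodBase asyncMethod) =>
        s_cache.GetOrAdd(asyncMethod,
            m => m.IsDefined(typeof(PoolThisMethodAttribute), inherit: false));
}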

@aL3891

aL3891 commented Mar 17, 2020

I was thinking the attribute could be checked/used at compile time, similar to CallerMemberNameAttribute, burning it into the generated code (at least that's how I think that attribute works; apologies if I'm incorrect).
Perhaps that is not possible/practical, though.
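For reference, the CallerMemberName analogy: the compiler substitutes the caller's member name into the optional parameter at each call site, so nothing is looked up at runtime. The idea above would be an analogous compile-time decision for the builder (Log.Write is just an illustrative helper):

using System.Runtime.CompilerServices;

static class Log
{
    public static void Write(string message, [CallerMemberName] string caller = "")
        => System.Console.WriteLine($"{caller}: {message}");
}

// A call such as Log.Write("hi") inside a method named ProcessAsync compiles as
// Log.Write("hi", "ProcessAsync"): the name is burned in at compile time.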

Still, a very cool feature for microservice apps with very chatty APIs.

Perhaps another option is adding a new PooledValueTask type that is essentially just a marker saying this task uses pooling; that would also sidestep back-compat issues, but it's also another type to keep track of :)

@stephentoub
Member Author

For .NET 5, the plan is to leave the code for this in but off by default.

@stephentoub stephentoub modified the milestones: 5.0.0, 6.0.0 Jun 28, 2020
jkotas added a commit to jkotas/runtimelab that referenced this issue Oct 26, 2020
This is an experimental feature that adds about 1.7 kB of binary footprint per method returning ValueTask.

Ifdef it out for native AOT until we figure out what to do with it. See dotnet/runtime#13633.
MichalStrehovsky pushed a commit to dotnet/runtimelab that referenced this issue Oct 27, 2020
This is an experimental feature that adds about 1.7 kB of binary footprint per method returning ValueTask.

Ifdef it out for native AOT until we figure out what to do with it. See dotnet/runtime#13633.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 23, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 31, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 30, 2021