Refactor ManagedWebSocket to avoid forcing Task allocations for ReceiveAsync #56282

Merged (2 commits), Aug 11, 2021


stephentoub (Member)

The ManagedWebSocket implementation today supports CloseAsync being issued concurrently with ReceiveAsync, even though CloseAsync itself needs to issue receives (this allowance was carried over from the .NET Framework implementation). It does so by storing the most recent ReceiveAsync task and, if there is one, awaiting it in CloseAsync. That means two parties (the original caller of ReceiveAsync and CloseAsync) may await the same task, which rules out simply using a ValueTask, since a ValueTask may be awaited only once. So today, every asynchronously completing ReceiveAsync uses AsTask to create a Task from the returned ValueTask. That isn't actually an additional Task allocation today: the async ValueTask builder already creates a Task for an asynchronously completing operation, and AsTask just returns it (and when the operation completes synchronously, extra code substitutes a singleton). But once we switch to the new pooling builder, that's no longer the case: the ValueTask will no longer be backed by a Task, so AsTask will have to allocate one.
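To make that constraint concrete, here is a minimal sketch of the pre-PR shape, assuming a simplified receiver whose receive yields an `int` byte count. `SharedReceiveTaskSketch`, `ReceiveCoreAsync`, and `DeliverIncoming` are hypothetical names, not ManagedWebSocket members. The receive is converted with `AsTask` and stored so that `CloseAsync` can await the same operation the original caller is awaiting; that second await is legal for a `Task` but not for a `ValueTask`.

```csharp
using System.Threading.Tasks;

// Hypothetical sketch of the pre-PR pattern: one receive operation shared between the
// caller of ReceiveAsync and CloseAsync. Not ManagedWebSocket's actual code.
public sealed class SharedReceiveTaskSketch
{
    // Stands in for the Monitor that guarded state shared by ReceiveAsync and CloseAsync.
    private readonly object _stateLock = new object();
    private Task<int> _lastReceiveAsync = Task.FromResult(0);

    // Simulated network input: the receive completes when DeliverIncoming is called.
    private readonly TaskCompletionSource<int> _incoming =
        new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

    public void DeliverIncoming(int byteCount) => _incoming.TrySetResult(byteCount);

    public Task<int> ReceiveAsync()
    {
        ValueTask<int> receive = ReceiveCoreAsync();

        // A ValueTask may be consumed only once, but CloseAsync may also need to wait for
        // this operation, so convert it to a Task that any number of awaiters can share.
        Task<int> shared = receive.AsTask();
        lock (_stateLock)
        {
            _lastReceiveAsync = shared;
        }
        return shared;
    }

    public async Task<int> CloseAsync()
    {
        Task<int> pending;
        lock (_stateLock)
        {
            pending = _lastReceiveAsync;
        }

        // Awaiting the same Task a second time is fine; doing this with a ValueTask is not.
        return await pending.ConfigureAwait(false);
    }

    // With the default async ValueTask builder, an asynchronous completion is backed by a Task,
    // so the AsTask call above returns that Task rather than allocating another one.
    private async ValueTask<int> ReceiveCoreAsync() => await _incoming.Task.ConfigureAwait(false);
}
```

Under a pooling builder, `ReceiveCoreAsync` would return a `ValueTask` backed by a pooled object instead of a `Task`, and the `AsTask` call would become a fresh allocation on every asynchronous receive.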

This PR adds an async lock to ReceiveAsync: the existing async method now awaits entering that lock. CloseAsync is then rewritten in terms of calling ReceiveAsync in a loop. This also lets us remove the existing Monitor used to synchronously coordinate state between these operations, since the async lock serves that purpose as well. Because we expect zero contention in the common case, rather than using a SemaphoreSlim we use a simple AsyncMutex optimized for the uncontended case: acquiring the lock takes a single interlocked operation, and so does releasing it.
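Here is a minimal sketch of an async mutex with that cost profile, assuming no cancellation support. `AsyncMutexSketch`, `EnterAsync`, and `Exit` are illustrative names, not necessarily what the PR's AsyncMutex exposes. An `int` gate starts at 1; an uncontended enter is one `Interlocked.Decrement` and an uncontended exit is one `Interlocked.Increment`. Only when the gate goes negative does either side take a lock to queue or wake a waiter.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a zero-contention-optimized async mutex; not the PR's AsyncMutex.
// Omits cancellation. Queued waiters are woken in FIFO order.
public sealed class AsyncMutexSketch
{
    // 1 = free; 0 = held with no waiters; -n = held with n contenders (queued or about to queue).
    private int _gate = 1;
    private readonly object _sync = new object();
    private readonly Queue<TaskCompletionSource> _waiters = new Queue<TaskCompletionSource>();
    // Releases that arrived before the matching contender managed to enqueue itself.
    private int _pendingReleases;

    public Task EnterAsync()
    {
        // Fast path: a single interlocked operation takes a free mutex.
        if (Interlocked.Decrement(ref _gate) >= 0)
        {
            return Task.CompletedTask;
        }

        // Slow path: the mutex is held; wait for the owner to hand it off.
        lock (_sync)
        {
            if (_pendingReleases > 0)
            {
                // The owner already exited while this contender was on its way here.
                _pendingReleases--;
                return Task.CompletedTask;
            }

            var waiter = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
            _waiters.Enqueue(waiter);
            return waiter.Task;
        }
    }

    public void Exit()
    {
        // Fast path: a single interlocked operation releases the mutex when nobody is waiting.
        if (Interlocked.Increment(ref _gate) >= 1)
        {
            return;
        }

        // Slow path: at least one contender decremented the gate; hand ownership to it directly.
        lock (_sync)
        {
            if (_waiters.TryDequeue(out var next))
            {
                next.SetResult();
            }
            else
            {
                _pendingReleases++;
            }
        }
    }
}
```

The `_pendingReleases` counter covers the race where a contender has already decremented the gate but has not yet queued itself when the owner exits: the owner records the release, and the contender consumes it instead of waiting. Callers would use it as `await m.EnterAsync(); try { ... } finally { m.Exit(); }`, and in the uncontended case neither call takes a lock or allocates.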

Closes #50921

| Method | Toolchain | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| PingPong | \main\corerun.exe | 148.7 ms | 2.92 ms | 4.00 ms | 1.00 | 29750.0000 | 3000.0000 | 250.0000 | 180,238 KB |
| PingPong | \pr\corerun.exe | 108.9 ms | 1.56 ms | 1.38 ms | 0.72 | - | - | - | 249 KB |
```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    // One client/server WebSocket pair over an in-memory connection, plus a buffer per side.
    private class Connection
    {
        public readonly WebSocket Client, Server;
        public readonly Memory<byte> ClientBuffer = new byte[256];
        public readonly Memory<byte> ServerBuffer = new byte[256];
        public readonly CancellationToken CancellationToken = default;

        public Connection()
        {
            // ConnectedStreams is a helper from dotnet/runtime's test sources (StreamConformanceTests),
            // not a public API; it returns two Streams connected to each other.
            (Stream Stream1, Stream Stream2) streams = ConnectedStreams.CreateBidirectional();
            Client = WebSocket.CreateFromStream(streams.Stream1, isServer: false, subProtocol: null, Timeout.InfiniteTimeSpan);
            Server = WebSocket.CreateFromStream(streams.Stream2, isServer: true, subProtocol: null, Timeout.InfiniteTimeSpan);
        }
    }

    private Connection[] _connections = Enumerable.Range(0, 256).Select(_ => new Connection()).ToArray();
    private const int Iters = 1_000;

    // For each of the 256 connections, Iters times: the client sends a message and waits for a
    // reply, while the server receives a message and sends one back.
    [Benchmark]
    public Task PingPong() =>
        Task.WhenAll(from c in _connections select Task.WhenAll(
                         Task.Run(async () =>
                         {
                             for (int i = 0; i < Iters; i++)
                             {
                                 await c.Server.ReceiveAsync(c.ServerBuffer, c.CancellationToken);
                                 await c.Server.SendAsync(c.ServerBuffer, WebSocketMessageType.Binary, endOfMessage: true, c.CancellationToken);
                             }
                         }),
                         Task.Run(async () =>
                         {
                             for (int i = 0; i < Iters; i++)
                             {
                                 await c.Client.SendAsync(c.ClientBuffer, WebSocketMessageType.Binary, endOfMessage: true, c.CancellationToken);
                                 await c.Client.ReceiveAsync(c.ClientBuffer, c.CancellationToken);
                             }
                         })));
}
```

stephentoub added the area-System.Net and tenet-performance (Performance related issue) labels on Jul 26, 2021
stephentoub added this to the 6.0.0 milestone on Jul 26, 2021
ghost commented Jul 26, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


stephentoub (Member, Author) commented:

@davidfowl, can you help validate this with ASP.NET functional tests and against relevant ASP.NET perf tests? That needs to be done before this can be merged.

karelz (Member) left a comment:

@CarnaViire if you get a chance to take a look as well, that would be good.

stephentoub (Member, Author) commented:

@davidfowl, @adityamandaleeka, any update on validating this change for ASP.NET?

CarnaViire (Member) left a comment:

LGTM

stephentoub (Member, Author) commented:

Once CI is green, I'll go ahead and merge this. In my own tests, this is neutral to positive for throughput, both in local microbenchmarks and on asp-perf-lin and asp-citrine-lin. We can revert it if it ends up having any negative impact once it makes it to dotnet/aspnetcore.

@davidfowl, @adityamandaleeka, I'd still appreciate extra validation here, but at this point, if I don't merge it, it's not going to make the release. The new websockets benchmark doesn't seem to really stress the system with or without this change.

davidfowl (Member) commented:

Merging and propagating a dependency flow PR is the easiest way to get validation.

stephentoub merged commit 19b86fb into dotnet:main on Aug 11, 2021
stephentoub deleted the websocketreceivealloc branch on August 11, 2021 02:14
ghost locked this as resolved and limited conversation to collaborators on Sep 10, 2021
Labels: area-System.Net, tenet-performance (Performance related issue)
Linked issue: Use the new async state machine pooling feature on websockets
4 participants