This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

Kestrel: Deadlocked in SocketOutput? #1278

Closed
physhi opened this issue Dec 31, 2016 · 15 comments

@physhi

physhi commented Dec 31, 2016

There seems to be a deadlock in SocketOutput (and Kestrel stops accepting new connections). My server locks up randomly. See below for the two sets of threads that appear to be involved; I looked at the code, and those two stacks do look like they could deadlock each other.

I've taken two other dumps and saw the same pair of stacks stuck.

Since I can't repro the deadlock, I can't really tell what's causing these threads to deadlock. This issue could be related to #1267, but the big difference is that in my case CPU usage drops to zero.

[Screenshot: the two thread call stacks described above]

@davidfowl
Member

davidfowl commented Dec 31, 2016

@physhi do you pass a cancellation token to any WriteAsync calls? What does your application look like? Also, what version of Kestrel are you using?

Edit: actually, that stack looks like the request-aborted token is being passed into WriteAsync.
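For reference, a minimal sketch (not taken from the issue; the file name is illustrative) of the call pattern being described, with the request-aborted token handed straight to WriteAsync:

```csharp
using System.IO;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        new WebHostBuilder()
            .UseKestrel()
            .Configure(app => app.Run(async context =>
            {
                var buffer = File.ReadAllBytes("large-file.bin"); // illustrative file name
                // Passing context.RequestAborted here is the pattern suspected above:
                // the request-aborted token flows into Kestrel's write path.
                await context.Response.Body.WriteAsync(
                    buffer, 0, buffer.Length, context.RequestAborted);
            }))
            .Build()
            .Run();
    }
}
```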

@davidfowl
Member

Yep there's a deadlock...

@physhi
Author

physhi commented Dec 31, 2016

I'm using version 1.1.0 of .NET Core, running on Windows. My application is essentially a file store serving a lot of big files, with bandwidth throttling, content ranges, and a bunch of content transformations implemented in code. I try to pass the cancellation token as far down as possible so that I can terminate the request if the user has disconnected from the server.

Currently, because of this deadlock, I've had to implement a watchdog process with very aggressive ping times that recycles Kestrel as soon as the server stops processing requests.
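For illustration, a rough sketch of that kind of watchdog, assuming a /ping endpoint, a five-second timeout, and that the server runs as a dotnet process; all of those details are hypothetical, not from this issue:

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class Watchdog
{
    static void Main() => RunAsync().GetAwaiter().GetResult();

    static async Task RunAsync()
    {
        // Aggressive ping timeout: a hung write path never completes the response.
        var client = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };

        while (true)
        {
            try
            {
                // Hypothetical health endpoint on the Kestrel server.
                var response = await client.GetAsync("http://localhost:5000/ping");
                response.EnsureSuccessStatusCode();
            }
            catch (Exception)
            {
                // Ping failed or timed out: kill the server so a supervisor restarts it.
                // "dotnet" is an assumed process name; a real watchdog would track the PID.
                foreach (var process in Process.GetProcessesByName("dotnet"))
                {
                    process.Kill();
                }
            }

            await Task.Delay(TimeSpan.FromSeconds(10));
        }
    }
}
```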

@davidfowl
Member

The only way to avoid it would be to stop passing a cancellation token. It's something we need to fix for 1.1.1.

The problem is that we're disposing the cancellation registration in the write callback under the context lock

waitingTask.CancellationRegistration?.Dispose();
on the IO thread, and Dispose waits for any pending cancellation callbacks to finish executing. It turns out we have a cancellation callback waiting on that same lock.
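A standalone sketch (not Kestrel code) of that mechanism: Dispose is called under a lock while the registered callback is blocked waiting for the same lock, and Dispose in turn waits for the callback to finish.

```csharp
using System;
using System.Threading;

class RegistrationDisposeDeadlock
{
    // Stand-in for the per-connection "context lock" mentioned above.
    static readonly object ContextLock = new object();

    static void Main()
    {
        var cts = new CancellationTokenSource();

        // Stand-in for the cancellation callback: it needs the context lock.
        var registration = cts.Token.Register(() =>
        {
            lock (ContextLock)
            {
                // e.g. mark pending writes as cancelled
            }
        });

        lock (ContextLock)
        {
            // Cancel on another thread so the callback starts running there
            // and blocks on ContextLock, which this thread already holds.
            new Thread(() => cts.Cancel()).Start();
            Thread.Sleep(100); // give the callback time to start

            // Stand-in for the write callback disposing the registration under the lock:
            // Dispose waits for the running callback, which waits for ContextLock. Deadlock.
            registration.Dispose();
        }

        Console.WriteLine("never reached");
    }
}
```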

/cc @muratg @halter73 @CesarBS

@davidfowl
Member

@physhi Are you manually calling context.Abort() anywhere?

@physhi
Author

physhi commented Dec 31, 2016

Yep, I have a code path that calls Abort, but looking at my logs, I don't see it being called.

@muratg muratg added this to the 1.1.1 milestone Jan 3, 2017
@muratg
Contributor

muratg commented Jan 3, 2017

Putting this in 1.1.1 so that it's triaged with the rest of the 1.1.1 items.

@davidfowl
Member

@physhi have you tried not passing in the token?
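For reference, "not passing in the token" in the kind of chunked copy loop described earlier would look something like the handler below, which slots into the same Configure/app.Run setup as the earlier sketch; the handler can still poll RequestAborted itself between chunks (buffer size and file name are illustrative):

```csharp
app.Run(async context =>
{
    var buffer = new byte[64 * 1024];
    using (var file = File.OpenRead("large-file.bin"))
    {
        int read;
        while ((read = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // Check for client disconnect manually instead of handing the token to WriteAsync.
            if (context.RequestAborted.IsCancellationRequested)
            {
                return;
            }

            // No cancellation token here, so the registration-dispose path is never hit.
            await context.Response.Body.WriteAsync(buffer, 0, read);
        }
    }
});
```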

@cesarblum cesarblum self-assigned this Jan 4, 2017
cesarblum pushed a commit that referenced this issue Jan 5, 2017
@physhi
Author

physhi commented Jan 5, 2017

@davidfowl I tried not passing the token, but the problem still persists.

@davidfowl
Member

@physhi can you take a snapshot of the threads when it hangs without the token?

@physhi
Author

physhi commented Jan 5, 2017

It's difficult to take a snapshot, as I only see the hangs in the production service, and it gets recycled as soon as a hang is detected. I just know that the service was restarted, and that's it.

@davidfowl
Member

The problem is that we're fixing this issue, but if no cancellation token is passed into WriteAsync, this deadlock can't happen. It's possible there's another hang, and what you're experiencing won't be fixed by #1281.

@physhi
Author

physhi commented Jan 6, 2017

@davidfowl, let me deploy the proposed fix and see if it resolves the issue. I'll know whether there are any more hangs within the next 24 hours.

cesarblum pushed a commit that referenced this issue Jan 7, 2017
@muratg muratg modified the milestones: 1.2.0, 1.1.1 Jan 10, 2017
@muratg muratg modified the milestones: 1.2.0, 2.0.0 Jan 12, 2017
@muratg muratg removed this from the 1.2.0 milestone Jan 12, 2017
@runxc1

runxc1 commented Jan 13, 2017

I'm trying to figure out if this is related to something I've seen a couple of times over the last couple of weeks. I have a fairly simple app running as an Azure App Service on a single Large instance. It has sometimes gone more than a week without any issues, serving about 1.5 million pages a day, and then it stops processing requests. If I restart the app, it starts without any issues and usually hums along after that. Looking in the log, I see the following error just before it stops processing anything:

2017-01-12 21:47:28.036 +00:00 [Warning] Unable to bind to http://localhost:17739 on the IPv6 loopback interface.
System.AggregateException: One or more errors occurred. (Error -4089 EAFNOSUPPORT address family not supported) ---> Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvException: Error -4089 EAFNOSUPPORT address family not supported
at Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv.tcp_bind(UvTcpHandle handle, SockAddr& addr, Int32 flags)
at Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvTcpHandle.Bind(ServerAddress address)
at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.TcpListenerPrimary.CreateListenSocket()
at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Listener.b__8_0(Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.ListenerPrimary.d__12.MoveNext()
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.Wait()
at Microsoft.AspNetCore.Server.Kestrel.Internal.KestrelEngine.CreateServer(ServerAddress address)
at Microsoft.AspNetCore.Server.Kestrel.KestrelServer.Start[TContext](IHttpApplication`1 application)
---> (Inner Exception #0) Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvException: Error -4089 EAFNOSUPPORT address family not supported
at Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv.tcp_bind(UvTcpHandle handle, SockAddr& addr, Int32 flags)
at Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvTcpHandle.Bind(ServerAddress address)
at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.TcpListenerPrimary.CreateListenSocket()
at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.Listener.b__8_0(Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.ListenerPrimary.d__12.MoveNext()<---

@davidfowl
Member

That error is harmless. It's hard to know whether it's the same issue without more information; a process dump taken while the application is hung would confirm it.
