This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

Add an option to Kestrel to disable threadpool dispatching #1390

Closed
halter73 opened this issue Feb 23, 2017 · 14 comments

Comments

@halter73
Member

No description provided.

@tmds
Contributor

tmds commented Apr 6, 2017

@halter73 @davidfowl This is now an InternalKestrelServerOptions, is the plan to make it a KestrelServerOptions?

@halter73
Member Author

halter73 commented Apr 6, 2017

I don't think there are currently plans to put it on KestrelServerOptions proper. We are using the internal options in some of our benchmarks, and the improvement seen by disabling threadpool dispatching is negligible in netcoreapp2.0.

@tmds
Contributor

tmds commented Apr 6, 2017

I did a benchmark using two 8-core Azure machines (D4, I think), à la TechEmpower plaintext. It was using the 1.1 runtime; I don't know if that makes a big difference.
With threadpool dispatching, giving Kestrel 2 threads had the best performance.
Without threadpool dispatching, giving Kestrel 8 threads had the best performance.
The increase was about 7%. Did you see a similar increase? Perhaps my benchmarking was not OK.
I think enabling this depends on the application, and for some transports the benefit may be greater than for others.

@tmds
Contributor

tmds commented Apr 6, 2017

From the looks of aspnet/benchmarks, it seems the no-dispatching scenario is run with kestrelThreadCount equal to 1 or 2.
For plaintext, this compares "dispatching using all cores" to "no dispatching using 2 cores".
Shouldn't kestrelThreadCount for the no-dispatching scenario be set to the number of cores on the machine to make it a fair comparison?

@benaadams
Contributor

benaadams commented Apr 6, 2017

The threadpool has been improved in 2.0. No-dispatching means the application code blocks I/O. The plaintext response is just a memory copy of a cached set of 13 bytes, so there is no real application work (just server work). With the threadpool, keeping the total I/O thread count below the physical core count makes sense, since you still need CPU for the threadpool to do the application work.

Not dispatching introduces head-of-line blocking (which was one of the "failures" of HTTP/1.1 pipelining), where a slow request on one connection stops the processing of all other connections on the same thread, regardless of CPU being free.

That's not to mention user application code making a sync (blocking) Read/Write, a Task.Wait() or Task.Result, or a sync SQL Connect/Execute call, which knocks out an entire Kestrel thread of I/O processing.

Perhaps dispatching just the application code to the thread pool, while keeping the Kestrel server code on the same thread, might be a good compromise between the two?

However, picking a site at random... For the new .NET docs API browser site, my browser sends 7 kB of request headers, mostly cookies, which have to be parsed by the server; that is CPU-bound work. Parsing that 7 kB to create the header dictionary, request object, etc. will block all I/O on that thread, regardless of whether CPU is free, even though all the I/O has already been done for that request.

So on balance, for the general case, it likely works out better to dedicate the Kestrel I/O threads to doing the I/O, then dispatch the processing of the data to spare CPU while the I/O thread goes back to getting the next bit of data.

Otherwise you'll have very unbalanced and poor utilization of CPU.

Just my 2c...
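[Editor's note: the dispatch-vs-inline tradeoff described above can be sketched in plain Java with a `java.util.concurrent` pool. This is illustrative only, not Kestrel code; the "handlers" are hypothetical stand-ins for request processing.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: when handlers are dispatched to a pool, a blocked
// "slow" handler does not stop a "fast" handler running on another thread.
public class DispatchSketch {
    static List<String> run() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2); // dispatch pool
        CountDownLatch fastDone = new CountDownLatch(1);
        CountDownLatch slowUnblocked = new CountDownLatch(1);
        List<String> completed = Collections.synchronizedList(new ArrayList<>());

        // Slow handler: simulates a blocking sync call in application code.
        pool.submit(() -> {
            try { slowUnblocked.await(); } catch (InterruptedException ignored) {}
            completed.add("slow");
        });
        // Fast handler: submitted second, but completes while "slow" is blocked.
        pool.submit(() -> {
            completed.add("fast");
            fastDone.countDown();
        });

        fastDone.await(5, TimeUnit.SECONDS); // fast finishes despite slow blocking
        slowUnblocked.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints [fast, slow]
    }
}
```

With a single-threaded, no-dispatch executor the same workload stalls: the slow handler holds the only thread, so "fast" cannot run until it finishes, which is exactly the head-of-line blocking described above.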

@tmds
Contributor

tmds commented Apr 6, 2017

I mean 1 or 2.

    <Scenarios Include="-n Plaintext -o KestrelHttpServer@dev --kestrelThreadCount 1" />
    <Scenarios Include="-n Plaintext -o KestrelHttpServer@dev --kestrelThreadCount 1 --kestrelThreadPoolDispatch false" />
    <Scenarios Include="-n Plaintext -o KestrelHttpServer@feature/dev-si --kestrelThreadCount 2" />
    <Scenarios Include="-n Plaintext -o KestrelHttpServer@feature/dev-si --kestrelThreadCount 2 --kestrelThreadPoolDispatch false" />

I think to make it a fair comparison, it should be 6 when kestrelThreadPoolDispatch is false.

@tmds
Contributor

tmds commented Apr 6, 2017

6 being the number of cores on an Intel® Xeon® Processor E5-1650.

@Drawaes
Contributor

Drawaes commented Apr 6, 2017

I think tweaking this on and off at this point is a bit academic. There are enough moving parts and variables already; the goal should be a solid baseline for this release, including the new transport types. When the dust has settled, I think the whole threading subject could be revisited, including NUMA dispatching, where NICs are located relative to sockets, where the pools are, etc.

@tmds
Contributor

tmds commented Apr 6, 2017

I agree it makes sense to defer request handling to the threadpool to avoid the issues caused by bad application code.

What are your thoughts on the threadcount used for the benchmark?

@benaadams
Contributor

What are your thoughts on the threadcount used for the benchmark?

I don't know the rationale; at a guess, it's examining the overhead associated with dispatching to the threadpool vs. not?

@davidfowl
Member

@tmds I think you're right, we need to change the benchmark. If we're using libuv threads, make threads = number of cores, or number of cores * 2.

@davidfowl davidfowl reopened this Apr 9, 2017
@tmds
Contributor

tmds commented Apr 11, 2017

Related topic: Kestrel ThreadCount.

So Netty defaults to double the number of CPUs ('logical processors').
And Kestrel defaults to half.

When dispatching every request to the threadpool, it makes sense to have a lower number of I/O threads: there is more load in parsing the request and generating a response than in getting data from/to the kernel.
Perhaps Netty doesn't dispatch? Otherwise, why would they have double the amount...
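[Editor's note: the two defaults being compared can be put side by side in a small sketch. The exact clamping in Kestrel's source may differ; the simple halving below is an assumption based on this thread.]

```java
// Assumed defaults from the discussion above (not the actual Kestrel/Netty
// source): Netty worker threads default to 2 * cores, while Kestrel's libuv
// thread count defaults to roughly half the logical processors, minimum 1.
public class ThreadDefaults {
    static int nettyWorkers(int cores)   { return cores * 2; }
    static int kestrelThreads(int cores) { return Math.max(1, cores / 2); }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("netty=" + nettyWorkers(cores)
                + " kestrel=" + kestrelThreads(cores));
    }
}
```

On the 6-core E5-1650 mentioned above, that would be 12 Netty workers versus 3 Kestrel threads, a 4x difference in I/O thread count between the two models.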

@tmds
Contributor

tmds commented Apr 11, 2017

From http://stackoverflow.com/questions/5474372/how-netty-uses-thread-pools

Per end point there is a boss thread and a worker thread pool. The worker thread pool defaults to 2 times the number of cores.

  • The boss thread is similar to the ListenerPrimary in that it accepts and distributes connections. It does not handle requests.
  • The worker threads do the reads and writes. The handlers are executed in the worker threads. So no dispatching to a threadpool.
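[Editor's note: the boss/worker model described above can be sketched in plain Java as an illustrative echo server. This is not Netty's actual implementation; it only mirrors the structure: one boss thread accepts and hands off connections, and 2 * cores worker threads read and write inline with no further dispatch.]

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

// Illustrative boss/worker echo server (assumed structure, not Netty code):
// the boss thread only accepts and distributes; handlers run entirely on
// worker threads, doing reads and writes inline.
public class BossWorkerSketch {
    private final ServerSocket boss;
    private final ExecutorService workers =
        Executors.newFixedThreadPool(2 * Runtime.getRuntime().availableProcessors());

    public BossWorkerSketch() throws IOException {
        boss = new ServerSocket(0); // ephemeral port for the sketch
        Thread bossThread = new Thread(() -> {
            try {
                while (true) {
                    Socket conn = boss.accept();        // boss only accepts...
                    workers.submit(() -> handle(conn)); // ...and hands off
                }
            } catch (IOException closed) { /* server socket closed */ }
        });
        bossThread.setDaemon(true);
        bossThread.start();
    }

    // Runs on a worker thread: read and write inline, no threadpool dispatch.
    private static void handle(Socket conn) {
        try (conn;
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream()));
             PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) out.println(line); // echo
        } catch (IOException ignored) {}
    }

    public int port() { return boss.getLocalPort(); }

    public static void main(String[] args) throws Exception {
        BossWorkerSketch server = new BossWorkerSketch();
        try (Socket client = new Socket("127.0.0.1", server.port());
             PrintWriter out = new PrintWriter(client.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(client.getInputStream()))) {
            out.println("ping");
            System.out.println(in.readLine()); // echoed back: ping
        }
    }
}
```

A blocking handler here ties up one of the 2 * cores workers, which is why the larger worker count matters in this model: it reduces the chance of all workers being blocked at once.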

@tmds
Contributor

tmds commented Apr 12, 2017

This approach matches well with what @benaadams explained before.
By not doing any work, the boss thread avoids getting blocked.
And there are more worker threads, which reduces the chance of all of them getting blocked.
