Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carrier Thread pinning on stack CommittingOutputStream.flushBuffer() #5623

Closed
vasanth-bhat opened this issue Apr 24, 2024 · 14 comments · Fixed by #5629 or #5632
Closed

Carrier Thread pinning on stack CommittingOutputStream.flushBuffer() #5623

vasanth-bhat opened this issue Apr 24, 2024 · 14 comments · Fixed by #5629 or #5632

Comments

@vasanth-bhat
Copy link

vasanth-bhat commented Apr 24, 2024

We are running tests for sample application with Helidon 4.x based on Loom virtual threads with JDK 21 in the Kubernetes containers. When running load test with concurrent users we see that application hangs and JVM becomes unresponsive
we do not see any resource issues for CPU, Memory, Heap etc.. The usage are much below the configured capacity.

In JFR, we do seeing large number of carrier thread pinning on the below stack. The pinning is happening when the. org.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer() is invoking ByteArrayOutputStream.writeTo().

stack snippet below from tracePinnedThreads output:

io.helidon.webserver.observe.tracing.TracingObserver$TracingStreamOutputDelegate.write(TracingObserver.java:525)
java.base/java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:163) <== monitors:1
org.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer(CommittingOutputStream.java:278) org.glassfish.jersey.message.internal.CommittingOutputStream.commit(CommittingOutputStream.java:232) org.glassfish.jersey.message.internal.CommittingOutputStream.close(CommittingOutputStream.java:247) org.glassfish.jersey.message.internal.OutboundMessageContext.close(OutboundMessageContext.java:865) org.glassfish.jersey.server.ContainerResponse.close(ContainerResponse.java:403)

The recommendation from teh JDK team was to modify the implementation of "org.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer()" to avoid calling the synchronised method. ByteArrayOutputStream.writeTo()

Based on the suggestion, the below code change was tried out, in the implementation of CommittingOutputStream.flushBuffer()

private void flushBuffer(boolean endOfStream) throws IOException {
.........
............
if (buffer != null && adaptedOutput != null) {
//buffer.writeTo(adaptedOutput);
adaptedOutput.write(buffer.toByteArray());
}
}
}

With this change, we no longer see the carrier thread pinning on this stack.
The ask is to see if this code change can be implemented in Jersey.

@jansupol
Copy link
Contributor

At first glance, adaptedOutput.write(buffer.toByteArray()); seems worse performance-wise as it is synchronized too and it creates a copy of the data buffer for each call. We should measure the performance and have the change for JDK 21+ only, if worse on JDK [8,20)

@vasanth-bhat
Copy link
Author

vasanth-bhat commented Apr 25, 2024

At first glance, adaptedOutput.write(buffer.toByteArray()); seems worse performance-wise as it is synchronized too

Yes, BufferedOutputStream.toByteArray() is synchronised , but it does not lead to the specific issue of carrier thread pinning, as there is no IO happening from with this sync block.

https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/java.base/share/classes/java/io/ByteArrayOutputStream.java

public synchronized byte[] toByteArray() {
return Arrays.copyOf(buf, count);
}

yes, It does make Array copy .

Looking further at the. stack snippet. in this scenario, the concrete type of "adaptedOutput" is of. type. "TracingObserver$TracingStreamOutputDelegate" .

https://github.com/helidon-io/helidon/blob/e93ee0c0c9b8762212a7f07cb473fec6bc52d827/webserver/observe/tracing/src/main/java/io/helidon/webserver/observe/tracing/TracingObserver.java#L490

Further, The BufferedOutputStream.write() is synchronized only when it is subclassed and not used directly.
https://github.com/openjdk/jdk/blob/890adb6410dab4606a4f26a942aed02fb2f55387/src/java.base/share/classes/java/io/BufferedOutputStream.java#L87

Stack Snippet

io.helidon.webserver.http1.Http1ServerResponse$BlockingOutputStream.write(Http1ServerResponse.java:585) io.helidon.webserver.http1.Http1ServerResponse$BlockingOutputStream.write(Http1ServerResponse.java:470) java.base/[java.io](http://java.io/).BufferedOutputStream.implWrite(BufferedOutputStream.java:217) java.base/[java.io](http://java.io/).BufferedOutputStream.write(BufferedOutputStream.java:200) io.helidon.webserver.http1.Http1ServerResponse$ClosingBufferedOutputStream.write(Http1ServerResponse.java:749) io.helidon.webserver.observe.tracing.TracingObserver$TracingStreamOutputDelegate.write(TracingObserver.java:525) java.base/[java.io](http://java.io/).ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:163) <== monitors:1 org.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer(CommittingOutputStream.java:278) org.glassfish.jersey.message.internal.CommittingOutputStream.commit(CommittingOutputStream.java:232)

@vasanth-bhat vasanth-bhat changed the title Carrier Thread pinning on stack Carrier Thread pinning on stack CommittingOutputStream.flushBuffer() Apr 25, 2024
@jansupol
Copy link
Contributor

jansupol commented Apr 25, 2024

Interesting test results, for buffer=ByteArrayOutputStream, adaptedOutput=FileOutputStream, 32MB written:

|    JDK     |     Actual  Method    |      Proposed Method      |
| -------------------------------------------------------------- |  
|     8      |        90ms           |              98ms         |
|     11     |        35ms           |              40ms         | 
|     17     |        45ms           |              50ms         |
|     21     |        43ms           |              48ms         | 

It looks like the proposed method is about 10% slowlier, no virtual threads. So we should have this just for virtual threads.

@jansupol
Copy link
Contributor

Perhaps a better solution is to have a ByteArrayOutputStream without synchronized blocks, with reentrant locks instead to avoid data copying.

@jansupol
Copy link
Contributor

I have copied the ByteArrayOutputStream into a new file and replaced the synchronized with the ReentrantLock. The synchronized blocks are 10x faster.

@jansupol
Copy link
Contributor

#5629 Is for 2.x and 3.0, we should check for Thread#isVirtual in 3.1 and forward.

@jansupol
Copy link
Contributor

Keep open for 3.1

@jansupol jansupol reopened this Apr 30, 2024
@jansupol jansupol linked a pull request May 1, 2024 that will close this issue
@vasanth-bhat
Copy link
Author

#5632 fixes the issue in 2.4.3.

Helidon 4.x. uses jersey 3.1.5
https://github.com/helidon-io/helidon/blob/main/dependencies/pom.xml
<version.lib.jersey>3.1.5</version.lib.jersey>

Which version of 3.1 would have this fix for Helidon to uptake?

@jansupol
Copy link
Contributor

jansupol commented May 8, 2024

The next one, 3.1.7

@vasanth-bhat
Copy link
Author

Has this change has been delivered into in 3.1.7?

@jansupol
Copy link
Contributor

jansupol commented Jul 4, 2024

Yes

@vasanth-bhat
Copy link
Author

Thank you @jansupol

@vasanth-bhat
Copy link
Author

We have verified this fix, we no longer see pinning on this stack , post fix

@jansupol
Copy link
Contributor

@vasanth-bhat
Great news, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants