-
I believe this behavior is in line with the TCP protocol:
So the server will give up about 11 minutes (75*9/60) after the client stops responding.
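The arithmetic above can be written out as a small sketch. The function name and its parameters are illustrative only; the 75-second interval and 9 probes are the figures quoted in this thread (they match the usual Linux defaults for `tcp_keepalive_intvl` and `tcp_keepalive_probes`), and the 10-second idle time is the value mentioned elsewhere in this discussion. The result is an approximation of when the TCP stack gives up, not an exact prediction for any particular kernel.

```python
def keepalive_give_up_seconds(idle_s: float, interval_s: float, probes: int) -> float:
    """Approximate time from the last packet received until the stack gives up.

    Assumption: the first probe is sent after `idle_s` seconds of silence,
    and the connection is dropped once `probes` probes, spaced `interval_s`
    seconds apart, have gone unanswered.
    """
    return idle_s + interval_s * probes


# Probe phase only, as in the comment above: 75 s * 9 probes.
probe_phase_min = 75 * 9 / 60
print(probe_phase_min)  # 11.25

# Including a 10 s idle period before the first probe.
total_s = keepalive_give_up_seconds(idle_s=10, interval_s=75, probes=9)
print(total_s)  # 685
```

So "about 11 minutes" holds whether or not the short idle period is counted.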
-
@bzlom After looking over this with the maintainers, we have questions. 🙂

Linkerd doesn't actually send keepalives in the proxy code. What Linkerd does is to request that the underlying TCP stack send keepalives after 10 seconds of idle time -- so Linkerd forces

Your wireshark shows pretty much exactly what we'd expect for this: at line 16, we see the first keepalive being sent, after 10s of idle time. Lines 18-26 show 9 keepalives sent at roughly 75-second intervals, then at line 27 we see the TCP reset that closes the connection. (Line 17, where something ACKs the first keepalive on behalf of the client, is strange to me, and makes me wonder exactly what's between the client and this wireshark.)

But, overall, what we see here is the TCP stack doing exactly what we'd expect Linkerd to ask it to do. As I said, this leaves us with questions:
Of these, the most important is definitely the first -- is this actually causing you a problem, or is it simply that you're seeing metrics with values that are unexpected? Thanks!
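For readers unfamiliar with what "requesting that the underlying TCP stack send keepalives" means in practice, here is a minimal Python sketch. It is not Linkerd's code; it only illustrates the general socket-option mechanism. The helper names `enable_keepalive` and `keepalive_settings` are hypothetical, the 10-second idle value is the one mentioned in this comment, and the sketch assumes a platform (such as Linux) that exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT`. The probe interval and count are deliberately left at the kernel defaults.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 10) -> None:
    """Ask the kernel to probe the peer after `idle_s` seconds of silence."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is not available on every platform, so guard the call.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)


def keepalive_settings(sock: socket.socket) -> dict:
    """Read back the keepalive options that this platform exposes."""
    settings = {
        "enabled": bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    }
    for name in ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT"):
        opt = getattr(socket, name, None)
        if opt is not None:
            settings[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return settings


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s, idle_s=10)
        # Interval and probe count reflect whatever the kernel defaults are.
        print(keepalive_settings(s))
```

Once these options are set, the application sends nothing itself: the kernel emits the probes and resets the connection when they go unanswered, which is consistent with the capture described above.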
-
Hello,

EKS Server Version: v1.29.3-eks-adc7111
linkerd version: stable-2.14.10

We're encountering an issue on AWS cloud that we narrowed down to linkerd (Server version: stable-2.14.10).

We have an EKS environment where all pods are injected with linkerd (default settings). We have a Client (on the internet) that initiates a TCP connection with a Server behind linkerd (AWS EKS). The Client then keeps sending TCP Keep-Alive packets to the Server, and the Server sends Keep-Alive frames back towards the Client.

If the Client closes the connection gracefully, the Server stops sending Keep-Alive frames with no issues within about 1 minute. However, if the Client loses its Internet connection and stops sending Keep-Alive frames, the Server will keep on sending Keep-Alive frames for another 10 minutes.

This issue completely disappears if we remove linkerd from the EKS pod on which the Server resides.

Does anyone know of any reason why this is happening?

In the wireshark output below, the red line before 9:45 indicates when the connection from the client (10.5.14.150) to the server (10.5.90.0) was interrupted due to loss of Internet connection. You can also see the connection is kept alive even though there's no reply from the client in the 9:45-9:57 interval.

Here's the linkerd check -o short output below: