Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: set TCP_USER_TIMEOUT socket option for linux #2307

Merged
merged 10 commits into from
Nov 5, 2018

Conversation

mikeraimondi
Copy link
Contributor

@mikeraimondi mikeraimondi commented Sep 14, 2018

Implements proposal A18. Not yet manually tested.

gRPC Core issue for reference: grpc/grpc#15889

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. This approach does seem pretty reasonable to me. One comment in-line.

@@ -252,6 +253,9 @@ func newHTTP2Client(connectCtx, ctx context.Context, addr TargetInfo, opts Conne
t.channelzID = channelz.RegisterNormalSocket(t, opts.ChannelzParentID, "")
}
if t.kp.Time != infinity {
if err = syscall.SetTCPUserTimeout(t.conn, kp.Timeout); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be done before handshaking ~line 170, otherwise you won't have a TCPConn anymore.

Also, you need to only do this for TCP connections. The dialer could return a net.Conn backed by a UDS instead.

SetTCPUserTimeout is predicated upon keepaliveEnabled. SetTCPUserTimeout requires access to the underlying TCPConn and must be called before handshaking.
@dfawley dfawley self-assigned this Sep 27, 2018
@dfawley
Copy link
Member

dfawley commented Oct 12, 2018

@mikeraimondi

Sorry for the delay here. This implementation LGTM; could you please rebase and resolve any conflicts, and then see if there's a reasonable way to test this? Maybe something in internal/transport/transport_test.go that creates a transport and then reads the socket options back to make sure it has been set?

net.TCPConn.File() was causing tests to time out.
The docs indicate File() returns a different file descriptor than the connection's.
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some lint errors showing in in CI:

internal/syscall/syscall_linux.go:36:5: error var GetTCPUserTimeoutNoopError should have name of the form ErrFoo
internal/syscall/syscall_linux.go:36:5: exported var GetTCPUserTimeoutNoopError should have comment or be unexported

And this fails in 1.8, which we will be supporting for another 2 weeks:

internal/syscall/syscall_linux.go:82: tcpconn.SyscallConn undefined (type *net.TCPConn has no field or method SyscallConn)
internal/syscall/syscall_linux.go:103: tcpconn.SyscallConn undefined (type *net.TCPConn has no field or method SyscallConn)

If you want to just wait 2 weeks for this, then we will delete 1.8 from CI and this won't need to be fixed. Otherwise, we'll have to deal with more version-specific files.

},
}
for _, tt := range tests {
lis, err := net.Listen("tcp", "localhost:0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should only need to make the listener once, right (i.e. outside the loop)?

You also never call lis.Accept(), which seems like it should be necessary. You would also need to read from the conn on the server side, because the client writes its 24-byte preface before returning (we think this is the cause of a separate test flake we're having).

A negative socket option int should never occur in actual operation.
Returning a nil error is simpler and more consistent with existing functions in
syscall.
@mikeraimondi
Copy link
Contributor Author

No rush on this, so let's wait the 2 weeks for 1.8 support to be dropped. I addressed the linter issues and (I hope) the lack of Accept() and reading on the server side.

@mikeraimondi
Copy link
Contributor Author

mikeraimondi commented Oct 18, 2018

Unsure if the recent CI failure in 1.11 is due to my changes or a pre-existing flake.

@dfawley
Copy link
Member

dfawley commented Oct 18, 2018

@mikeraimondi the 1.11 failure is a known flake (#2380 should fix it).

This recent set of changes LGTM. Hopefully there won't be many conflicts to resolve in 2 weeks when this is good to submit. Thanks again for your contribution!

@dfawley
Copy link
Member

dfawley commented Nov 1, 2018

@mikeraimondi - Go 1.6 and 1.8 have been removed from our CI. Please rebase your branch to master, and I think everything should be good to go. Thanks!

@mikeraimondi
Copy link
Contributor Author

@dfawley all set!

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!!

@dfawley dfawley merged commit d27440d into grpc:master Nov 5, 2018
@dfawley dfawley changed the title Set TCP_USER_TIMEOUT socket option for linux client: set TCP_USER_TIMEOUT socket option for linux Nov 5, 2018
@dfawley dfawley added the Type: Feature New features or improvements in behavior label Nov 9, 2018
@dfawley dfawley added this to the 1.17 Release milestone Nov 9, 2018
@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants