Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing leaking go routines #850

Closed
wants to merge 7 commits into from
Closed

Conversation

guillaumemichel
Copy link
Contributor

addresses #849

depends on libp2p/go-libp2p-kbucket#120

linked: ipfs/kubo#9814

@guillaumemichel guillaumemichel marked this pull request as ready for review June 14, 2023 08:23
@guillaumemichel guillaumemichel requested a review from a team as a code owner June 14, 2023 08:23
@guillaumemichel
Copy link
Contributor Author

Some tests are failing randomly, and it is a hell to debug

go.mod Outdated Show resolved Hide resolved
Copy link
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test would be nice but this code looks good.
I would base and merge on top of v0.24.0 directly into a v0.24.1 and then merge the release branch back into master.

dht.go Outdated Show resolved Hide resolved
dht.go Outdated Show resolved Hide resolved
@Jorropo
Copy link
Contributor

Jorropo commented Jun 14, 2023

Note the tests timeout.

Copy link
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you havn't updated d373974#r117933266

I'm not saying it's broken just every-time someone connects to you and do a query you very quickly start 2 ping attempts, one from the subscribee one from the handler.
This is wastefull

@guillaumemichel
Copy link
Contributor Author

Some tests seem to enter randomly an infinite loop, and some seem to always enter an infinite loop, resulting in the tests timing out.

This isn't the case without the new go routine in peerFound introduced in this PR. I don't understand the root cause, and it is hard to debug because of the nature of the tests.

go-libp2p-kad-dht/dht.go

Lines 669 to 713 in 43e8891

func (dht *IpfsDHT) peerFound(ctx context.Context, p peer.ID) {
// if the peer is already in the routing table or the appropriate bucket is
// already full, don't try to add the new peer.ID
if !dht.routingTable.UsefulNewPeer(p) {
return
}
// verify whether the remote peer advertises the right dht protocol
b, err := dht.validRTPeer(p)
if err != nil {
logger.Errorw("failed to validate if peer is a DHT peer", "peer", p, "error", err)
} else if b {
// check if the maximal number of concurrent lookup checks is reached
dht.lookupChecksLk.Lock()
if dht.concurrentLookupChecks >= dht.lookupCheckConcurrency {
dht.lookupChecksLk.Unlock()
// drop the new peer.ID if the maximal number of concurrent lookup
// checks is reached
return
}
dht.concurrentLookupChecks++
dht.lookupChecksLk.Unlock()
go func() {
livelinessCtx, cancel := context.WithTimeout(ctx, dht.lookupCheckTimeout)
defer cancel()
// performing a FIND_NODE query
err := dht.lookupCheck(livelinessCtx, p)
dht.lookupChecksLk.Lock()
dht.concurrentLookupChecks--
dht.lookupChecksLk.Unlock()
if err != nil {
logger.Debugw("connected peer not answering DHT request as expected", "peer", p, "error", err)
return
}
// if the FIND_NODE succeeded, the peer is considered as valid
dht.validPeerFound(ctx, p)
}()
}
}

Copy link
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider testing for that behaviour else that fine LGTM.

@BigLep
Copy link

BigLep commented Jun 14, 2023

I'm not up on the specifics here, but want to understand the testing situation. Do we have regression tests in place for the fix? Is this change causing other tests to fail? I know we want to get this out, but I'd like to understand what it takes to do it right.

@guillaumemichel
Copy link
Contributor Author

@Jorropo and I debugged the tests, but were not able to get them to work yet. It seems unrelated to the new code, but it wouldn't be wise to push this change. We also uncovered new bugs. More debugging is required to fix the broken tests.

If Kubo has to be released shortly, I would recommend reverting this change and using go-libp2p-kad-dht v0.23.0, and the fix will (hopefully) make it to the next release.

@BigLep
Copy link

BigLep commented Jun 14, 2023

Thanks for the update and work @guillaumemichel and @Jorropo .

Per 2023-06-14 verbal with @Jorropo , we agreed that we won't block the Kubo release on this. It can come in a followup Kubo release.

@BigLep
Copy link

BigLep commented Jun 14, 2023

@Jorropo : to be 100% clear for anyone watching, I assume we'll revert the change and do a patch release (rather than recommend consumers use the previous version)? (The functionality would be restored and include a fix as a followup patch release.)

@Jorropo
Copy link
Contributor

Jorropo commented Jun 14, 2023

yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants