Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server,discovery: fetch as many Os as possible during discovery before a shorter timeout #1874

Closed
wants to merge 2 commits into from

Conversation

kyriediculous
Copy link
Contributor

@kyriediculous kyriediculous commented May 5, 2021

What does this pull request do? Explain your changes. (required)
This PR changes discovery to take into account all possible orchestrators as long as they respond to a discovery request within 1 second. Previously we would stop gathering responses once we received a small subset of responses.

Specific updates (required)

  • change numOrchs to be equal to the Orchestrator pool size
  • Decreased the timeout to 1 second
  • Changed the suspension time to poolSize/numSessions (not sure if this is better but at least this avoids poolSize/numOrchs which would always be 1)
  • Changed the cut-off point at which we do a refresh to be at less than 20% of the original selected set.

How did you test each of these updates (required)
https://github.com/livepeer/internal-project-tracking/issues/124

Does this pull request close any open issues?
Fixes #1853

Checklist:

@kyriediculous kyriediculous changed the title Fetch as many Os as possible during discovery before a shorter timeout server:discovery: fetch as many Os as possible during discovery before a shorter timeout May 5, 2021
@kyriediculous kyriediculous force-pushed the nv/alt-discovery branch 2 times, most recently from 9b9bb7e to 09160bc Compare May 5, 2021 13:37
Copy link
Member

@yondonfu yondonfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but largely looks good and just missing a changelog entry.

As the next step, let's analyze the following things (the ability to fetch work distribution data should be useful) mentioned in #1853 in tests on the network:

We should experiment with this and observe the following data points:

  • Distribution of work before and after these changes
  • Frequency of session list refreshes
  • Impact of a fixed 1 second delay of discovery on the first segment submission of a stream

server/broadcast.go Outdated Show resolved Hide resolved
server/broadcast.go Outdated Show resolved Hide resolved
@kyriediculous kyriediculous changed the title server:discovery: fetch as many Os as possible during discovery before a shorter timeout server,discovery: fetch as many Os as possible during discovery before a shorter timeout May 12, 2021
@kyriediculous kyriediculous force-pushed the nv/alt-discovery branch 2 times, most recently from bf85e75 to 0404d88 Compare June 26, 2021 10:30
discovery/discovery.go Outdated Show resolved Hide resolved
server/broadcast.go Outdated Show resolved Hide resolved
@@ -17,7 +17,9 @@ import (
"github.com/golang/glog"
)

var getOrchestratorsTimeoutLoop = 3 * time.Second
const MinWorkingSetSize = 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're introducing a min working set size here to determine when to pop Os off the suspension queue. I think an alternative that would avoid explicitly introducing a min working set size variable/function is the following:


Make the following update to the discovery loop:

// This loop runs until either all orchestrators have responded or a timeout
for i := 0; i < numAvailableOrchs && !timeout; i++ {
    // ...
}

Then, only the caller of GetOrchestrators() needs to be aware of the min working set size (which is the case right now outside of this PR) instead of making the GetOrchestrators() function aware of a min working set size variable/function.

The end result should be the same as what is currently implemented in this PR. The caller specifies the min working set size via numOrchestrators and GetOrchestrators() will fetch as many responses as possible from all Os, but if a bunch of Os are suspended such that the number of non-suspended Os is less than numOrchestrators, we'll pop Os off the suspension queue. The caller would be BroadcastSessionsManager which sets its numOrchs field to the min working set size which is passes as numOrchestrators in the GetOrchestrators() call.

WDYT?

Copy link
Contributor Author

@kyriediculous kyriediculous Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you mention makes sense, the main reason I went with this (and I'm open to alternatives) is because if we select as many O's as possible e.g. 50 , and 5 don't respond. We'll still have a working set of 45 which seems sufficient enough to not have to pop on suspended O's until there's 50 O's again. It might make more sense to do that only when the working set would otherwise get too small.

Copy link
Member

@yondonfu yondonfu Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So if numOrchestrators reflected a min working set size that is smaller than 50 (i.e. 8), which is the # of available Os in this scenario, defined in BroadcastSessionsManager then we wouldn't pop any Os from the suspension queue in the scenario that you described.

@yondonfu
Copy link
Member

A variant of this was merged to master previously so closing.

@yondonfu yondonfu closed this Dec 30, 2021
@hjpotter92 hjpotter92 deleted the nv/alt-discovery branch September 26, 2022 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fetch as many Os as possible during discovery before a shorter timeout
2 participants