server,discovery: fetch as many Os as possible during discovery before a shorter timeout #1874
A few minor comments, but largely looks good and just missing a changelog entry.
As a next step, let's analyze the following points from #1853 in tests on the network (the ability to fetch work distribution data should be useful here):
We should experiment with this and observe the following data points:
- Distribution of work before and after these changes
- Frequency of session list refreshes
- Impact of a fixed 1 second delay of discovery on the first segment submission of a stream
@@ -17,7 +17,9 @@ import (
"github.com/golang/glog"
)

var getOrchestratorsTimeoutLoop = 3 * time.Second
const MinWorkingSetSize = 8
Looks like we're introducing a min working set size here to determine when to pop Os off the suspension queue. I think an alternative that would avoid explicitly introducing a min working set size variable/function is the following:
Make the following update to the discovery loop:
// This loop runs until either all orchestrators have responded or a timeout
for i := 0; i < numAvailableOrchs && !timeout; i++ {
// ...
}
Then only the caller of GetOrchestrators() needs to be aware of the min working set size (as is already the case without this PR), instead of making GetOrchestrators() itself aware of a min working set size variable/function.
The end result should be the same as what this PR currently implements. The caller specifies the min working set size via numOrchestrators, and GetOrchestrators() fetches as many responses as possible from all Os; but if enough Os are suspended that the number of non-suspended Os is less than numOrchestrators, we pop Os off the suspension queue. The caller would be BroadcastSessionsManager, which sets its numOrchs field to the min working set size and passes that value as numOrchestrators in the GetOrchestrators() call.
WDYT?
What you mention makes sense. The main reason I went with this (and I'm open to alternatives) is that if we select as many Os as possible, e.g. 50, and 5 don't respond, we'll still have a working set of 45. That seems large enough that we shouldn't have to pop suspended Os until there are 50 Os again. It might make more sense to pop them only when the working set would otherwise get too small.
Right. So if numOrchestrators reflected a min working set size defined in BroadcastSessionsManager (i.e. 8) that is smaller than 50, the number of available Os in this scenario, then we wouldn't pop any Os from the suspension queue in the scenario you described.
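To make the scenario concrete, here is a hypothetical sketch (selectWorkingSet is not a real go-livepeer function) in which selection pops Os off the suspension queue only while the working set is smaller than the caller's numOrchestrators. It assumes the 5 Os that failed to respond in the 50-O example are the ones sitting in the suspension queue.

```go
package main

import "fmt"

// selectWorkingSet is a hypothetical helper. responsive holds Os that
// answered discovery and are not suspended; suspended is the suspension
// queue, front first. Os are popped off the queue only while the working
// set is smaller than numOrchestrators.
func selectWorkingSet(responsive, suspended []string, numOrchestrators int) (working, stillSuspended []string) {
	working = append([]string(nil), responsive...)
	stillSuspended = suspended
	for len(working) < numOrchestrators && len(stillSuspended) > 0 {
		working = append(working, stillSuspended[0])
		stillSuspended = stillSuspended[1:]
	}
	return working, stillSuspended
}

func main() {
	// Scenario from the thread: 50 Os available, 5 of them (assumed to be
	// suspended here) are not in the responsive set.
	responsive := make([]string, 45)
	for i := range responsive {
		responsive[i] = fmt.Sprintf("O%d", i+1)
	}
	suspended := []string{"O46", "O47", "O48", "O49", "O50"}

	// numOrchestrators = pool size: all 5 suspended Os are popped.
	w, s := selectWorkingSet(responsive, suspended, 50)
	fmt.Println(len(w), len(s)) // 50 0

	// numOrchestrators = min working set size (8): nothing is popped.
	w, s = selectWorkingSet(responsive, suspended, 8)
	fmt.Println(len(w), len(s)) // 45 5
}
```

Passing the pool size reproduces the eager popping described in the earlier reply, while passing a min working set size of 8 leaves the 45 responsive Os as they are.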
A variant of this was previously merged to master, so closing.
What does this pull request do? Explain your changes. (required)
This PR changes discovery to take into account all possible orchestrators as long as they respond to a discovery request within 1 second. Previously we would stop gathering responses once we received a small subset of responses.
Specific updates (required)
- numOrchs to be equal to the Orchestrator pool size

How did you test each of these updates (required)
https://github.com/livepeer/internal-project-tracking/issues/124
Does this pull request close any open issues?
Fixes #1853
Checklist:
- make runs successfully
- ./test.sh passes