server,discovery: fetch as many Os as possible during discovery before a shorter timeout #1874

kyriediculous · 2021-05-05T12:26:24Z

What does this pull request do? Explain your changes. (required)
This PR changes discovery to take into account all possible orchestrators as long as they respond to a discovery request within 1 second. Previously we would stop gathering responses once we received a small subset of responses.

Specific updates (required)

change numOrchs to be equal to the Orchestrator pool size
Decreased the timeout to 1 second
Changed the suspension time to poolSize/numSessions (not sure if this is better but at least this avoids poolSize/numOrchs which would always be 1)
Changed the cut-off point at which we do a refresh to be at less than 20% of the original selected set.

How did you test each of these updates (required)
https://github.com/livepeer/internal-project-tracking/issues/124

Does this pull request close any open issues?
Fixes #1853

Checklist:

Read the contribution guide
make runs successfully
All tests in ./test.sh pass
README and other documentation updated
Pending changelog updated

yondonfu

A few minor comments, but largely looks good and just missing a changelog entry.

As the next step, let's analyze the following things (the ability to fetch work distribution data should be useful) mentioned in #1853 in tests on the network:

We should experiment with this and observe the following data points:

Distribution of work before and after these changes

Frequency of session list refreshes

Impact of a fixed 1 second delay of discovery on the first segment submission of a stream

server/broadcast.go

discovery/discovery.go

server/broadcast.go

yondonfu · 2021-07-15T14:39:06Z

discovery/discovery.go

@@ -17,7 +17,9 @@ import (
 	"github.com/golang/glog"
 )

-var getOrchestratorsTimeoutLoop = 3 * time.Second
+const MinWorkingSetSize = 8


Looks like we're introducing a min working set size here to determine when to pop Os off the suspension queue. I think an alternative that would avoid explicitly introducing a min working set size variable/function is the following:

Make the following update to the discovery loop:

// This loop runs until either all orchestrators have responded or a timeout for i := 0; i < numAvailableOrchs && !timeout; i++ { // ... }

Then, only the caller of GetOrchestrators() needs to be aware of the min working set size (which is the case right now outside of this PR) instead of making the GetOrchestrators() function aware of a min working set size variable/function.

The end result should be the same as what is currently implemented in this PR. The caller specifies the min working set size via numOrchestrators and GetOrchestrators() will fetch as many responses as possible from all Os, but if a bunch of Os are suspended such that the number of non-suspended Os is less than numOrchestrators, we'll pop Os off the suspension queue. The caller would be BroadcastSessionsManager which sets its numOrchs field to the min working set size which is passes as numOrchestrators in the GetOrchestrators() call.

WDYT?

What you mention makes sense, the main reason I went with this (and I'm open to alternatives) is because if we select as many O's as possible e.g. 50 , and 5 don't respond. We'll still have a working set of 45 which seems sufficient enough to not have to pop on suspended O's until there's 50 O's again. It might make more sense to do that only when the working set would otherwise get too small.

Right. So if numOrchestrators reflected a min working set size that is smaller than 50 (i.e. 8), which is the # of available Os in this scenario, defined in BroadcastSessionsManager then we wouldn't pop any Os from the suspension queue in the scenario that you described.

…covery

yondonfu · 2021-12-30T15:51:51Z

A variant of this was merged to master previously so closing.

kyriediculous changed the title ~~Fetch as many Os as possible during discovery before a shorter timeout~~ server:discovery: fetch as many Os as possible during discovery before a shorter timeout May 5, 2021

kyriediculous force-pushed the nv/alt-discovery branch 2 times, most recently from 9b9bb7e to 09160bc Compare May 5, 2021 13:37

yondonfu requested changes May 5, 2021

View reviewed changes

server/broadcast.go Outdated Show resolved Hide resolved

server/broadcast.go Outdated Show resolved Hide resolved

kyriediculous force-pushed the nv/alt-discovery branch from 09160bc to af40851 Compare May 6, 2021 12:11

kyriediculous changed the title ~~server:discovery: fetch as many Os as possible during discovery before a shorter timeout~~ server,discovery: fetch as many Os as possible during discovery before a shorter timeout May 12, 2021

yondonfu force-pushed the nv/alt-discovery branch from af40851 to 4166103 Compare May 18, 2021 19:37

kyriediculous force-pushed the nv/alt-discovery branch from 4166103 to b8c39af Compare June 11, 2021 13:25

kyriediculous force-pushed the nv/alt-discovery branch 2 times, most recently from bf85e75 to 0404d88 Compare June 26, 2021 10:30

kyriediculous force-pushed the nv/alt-discovery branch from 0404d88 to bd73d95 Compare July 13, 2021 23:32

kyriediculous requested a review from yondonfu July 14, 2021 17:09

yondonfu requested changes Jul 15, 2021

View reviewed changes

kyriediculous force-pushed the nv/alt-discovery branch from 2aeb24a to 04978d3 Compare August 9, 2021 17:10

yondonfu force-pushed the nv/alt-discovery branch 2 times, most recently from 3f4f06e to cbb762c Compare September 10, 2021 14:31

kyriediculous added 2 commits September 28, 2021 16:46

server, discovery: fetch as many orchestrators as possible during dis…

bcb8165

…covery

discovery: adjust cutoff time for discovery requests to 250ms

1021774

yondonfu force-pushed the nv/alt-discovery branch from cbb762c to 1021774 Compare September 28, 2021 20:47

yondonfu closed this Dec 30, 2021

hjpotter92 deleted the nv/alt-discovery branch September 26, 2022 16:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server,discovery: fetch as many Os as possible during discovery before a shorter timeout #1874

server,discovery: fetch as many Os as possible during discovery before a shorter timeout #1874

kyriediculous commented May 5, 2021 •

edited

Loading

yondonfu left a comment

yondonfu Jul 15, 2021

kyriediculous Jul 15, 2021 •

edited

Loading

yondonfu Jul 15, 2021 •

edited

Loading

yondonfu commented Dec 30, 2021

server,discovery: fetch as many Os as possible during discovery before a shorter timeout #1874

server,discovery: fetch as many Os as possible during discovery before a shorter timeout #1874

Conversation

kyriediculous commented May 5, 2021 • edited Loading

yondonfu left a comment

Choose a reason for hiding this comment

yondonfu Jul 15, 2021

Choose a reason for hiding this comment

kyriediculous Jul 15, 2021 • edited Loading

Choose a reason for hiding this comment

yondonfu Jul 15, 2021 • edited Loading

Choose a reason for hiding this comment

yondonfu commented Dec 30, 2021

kyriediculous commented May 5, 2021 •

edited

Loading

kyriediculous Jul 15, 2021 •

edited

Loading

yondonfu Jul 15, 2021 •

edited

Loading