Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: Fix timeout for stream recording background jobs #2573

Merged
merged 3 commits into from
Aug 31, 2022

Conversation

victorges
Copy link
Member

@victorges victorges commented Aug 31, 2022

What does this pull request do? Explain your changes. (required)
We noticed a couple errors that happen when saving transcoded segments for livestream recordings.
These occur especially in staging for some reason, but the underlying cause can easily happen in
production as well (and it often does).

In short terms, the issue is caused because saving the recorded segments is performed in a background
routine, while at the same time this routine uses the same context.Context to handle cancellation of the
task behind the scenes. This means that one the transcode request is finished (which is where the current
ctx comes from, the http.Request), the background saving operations will also be cancelled.

This fixes it by enforcing a separate timeout on the background saves, independent of the request.

Specific updates (required)

  • Use a separate timeout context for the segment recording operations
  • Increase the number of saving attempts to 3 instead of 2

How did you test each of these updates (required)
Automated tests. Pretty simple change as well, will check effect in staging on catalyst tests.

Does this pull request close any open issues?
livepeer/catalyst#126

Checklist:

The retries last arg to SaveRetried is actually
the number of attempts, not the number of retries.
So to try saving 3 times, increased it to 3.
@victorges victorges requested review from yondonfu, leszko, emranemran and iameli and removed request for yondonfu August 31, 2022 15:34
@codecov
Copy link

codecov bot commented Aug 31, 2022

Codecov Report

Merging #2573 (7284abf) into master (a0ec61f) will increase coverage by 0.01618%.
The diff coverage is 66.66667%.

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #2573         +/-   ##
===================================================
+ Coverage   56.41379%   56.42997%   +0.01617%     
===================================================
  Files             88          88                 
  Lines          18850       18857          +7     
===================================================
+ Hits           10634       10641          +7     
  Misses          7637        7637                 
  Partials         579         579                 
Impacted Files Coverage Δ
clog/clog.go 56.00000% <0.00000%> (-1.37705%) ⬇️
server/broadcast.go 77.95414% <100.00000%> (+0.07802%) ⬆️
discovery/db_discovery.go 70.75812% <0.00000%> (+1.08303%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0ec61f...7284abf. Read the comment docs.

Impacted Files Coverage Δ
clog/clog.go 56.00000% <0.00000%> (-1.37705%) ⬇️
server/broadcast.go 77.95414% <100.00000%> (+0.07802%) ⬆️
discovery/db_discovery.go 70.75812% <0.00000%> (+1.08303%) ⬆️

@victorges victorges merged commit 6bf2087 into master Aug 31, 2022
@victorges victorges deleted the vg/fix/recording-timeouts branch August 31, 2022 18:54
oscar-davids pushed a commit that referenced this pull request Sep 14, 2022
* server: Use separate timeout for recording segments

* server: Increase segment save attempts to 3

The retries last arg to SaveRetried is actually
the number of attempts, not the number of retries.
So to try saving 3 times, increased it to 3.

* CHANGELOG
@cyberj0g cyberj0g mentioned this pull request Oct 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants