Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Return from transcode loop if seg chan closed #2591

Merged
merged 4 commits into from
Sep 18, 2022

Conversation

yondonfu
Copy link
Member

@yondonfu yondonfu commented Sep 16, 2022

What does this pull request do? Explain your changes. (required)

This PR updates the transcode loop to exit the loop if the segment channel is closed.

a1fb761 introduced the possibility of the segment channel being closed via a EndTranscodingSession RPC call. When that happens, the bug described in this comment can occur.

Specific updates (required)

See commit history.

45bf1e1 contains a small update to avoid logging a nil error if there was no error when B calls EndTranscodingSession.

How did you test each of these updates (required)

  • Added a unit test that checks of the goroutine for the transcode loop is cleaned up after the transcode session is ended
  • Ran manual tests with an offchain B & O mediaserver.httpTimeout set to 30 seconds so B always sends an EndTranscodingSession RPC call before O's transcode loop times out
    • Test 1: Build B & O off master
    • Test 2: Build B & O off this branch
    • After both tests, I ran kill -ABRT <PID> where <PID> was the process ID for O to dump a stack trace of goroutines
    • The stack trace for test 1 showed 1 core.transcodeSegmentLoop() goroutine which indicates that the goroutine was not cleaned up properly after the transcoding session was ended
    • The stack trace for test 2 showed 0 core.transcodeSegmentLoop() goroutines which indiciates that the goroutine was cleaned up properly after the transcoding session was ended

Does this pull request close any open issues?

Fixes #2589

Checklist:

@yondonfu yondonfu force-pushed the yf/fix-transcode-loop-end-session branch from 407fa45 to 101e96a Compare September 18, 2022 01:07
@codecov
Copy link

codecov bot commented Sep 18, 2022

Codecov Report

Merging #2591 (101e96a) into master (5d4e145) will increase coverage by 0.00461%.
The diff coverage is 100.00000%.

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #2591         +/-   ##
===================================================
+ Coverage   56.44854%   56.45315%   +0.00460%     
===================================================
  Files             88          88                 
  Lines          18888       18890          +2     
===================================================
+ Hits           10662       10664          +2     
  Misses          7648        7648                 
  Partials         578         578                 
Impacted Files Coverage Δ
core/orchestrator.go 77.55961% <100.00000%> (+0.06313%) ⬆️
server/broadcast.go 77.95414% <100.00000%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d4e145...101e96a. Read the comment docs.

Impacted Files Coverage Δ
core/orchestrator.go 77.55961% <100.00000%> (+0.06313%) ⬆️
server/broadcast.go 77.95414% <100.00000%> (ø)

@yondonfu yondonfu marked this pull request as ready for review September 18, 2022 01:17
Copy link
Contributor

@cyberj0g cyberj0g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱 how did I miss that. Wonder if some static analysis tool would catch such errors. We definitely need to review PRs with concurrency more thoroughly as well.

Edit: found an existing issue, related to linting #1801

@yondonfu yondonfu merged commit 03f5ab7 into master Sep 18, 2022
@yondonfu yondonfu deleted the yf/fix-transcode-loop-end-session branch September 18, 2022 12:31
@yondonfu
Copy link
Member Author

I think looking into starting to use some static analysis tooling to catch bugs such as missing context cancellations is a good idea. Adding test cases for resource cleanup for changes that could affect goroutine management should help as well (i.e. the one added in this PR would've caught the bug).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

O transcode loop never returns if the transcode session is ended via B
2 participants