NACK queue mishandling #2401

Closed
matiaspl opened this issue Oct 22, 2020 · 8 comments

@matiaspl

Our use case: an RTP source with H.264 video and Opus audio, with the bitrate not exceeding 3 Mbps for video and 100 kbps for audio, fed to the Streaming plugin on vanilla Janus from snapcraft.

Code included in #1867 (which introduced dynamic queue handling in February) prevents the NACK queue from being used when it's needed the most: on lossy links with high jitter. When Janus registers a keyframe in the incoming stream it flushes the queue, so it's often unable to serve the packets that clients ask for through RTCP.

There's slightly more detail in my comment on PR #1867 (comment). The change was merged into master around the time 0.8.2 came out.

Here's a comparison using the same RTP source with the Streaming plugin on two different versions of Janus:

  • 0.7.6 - visible peaks, but the lost-packets counter decreases when the client receives NACKed packets from the buffer:

[Graph: packets_lost counter on 0.7.6]

  • 0.8.2 (tested against master with the same results) - no retransmitted (NACKed) packets arrive, and the lost-packets counter increases monotonically:

[Graph: packets_lost counter on 0.8.2]

@lminiero
Member

When Janus registers a keyframe in the incoming stream it flushes the queue, so it's often unable to serve the packets that clients ask for through RTCP.

As I explained on the group, that's not a bug: it's done on purpose, and many WebRTC implementations do the same. It only becomes an issue if your keyframes are too frequent, so that attempts to recover keyframe N-1 fail because in the meantime you've already sent keyframe N. Any attempt to send keyframes more often than once every 5s or so in WebRTC is doomed to fail, or at the very least be incredibly fragile.

@lminiero
Member

Update: as discussed on the group, some considerations changed my mind on some of the things the code currently does. At this point, I think it makes sense to keep this feature enabled (or disabled) by default, but allow developers to change that either globally (a configuration property in janus.jcfg that impacts all PeerConnections) or per-connection (maybe a Janus API property that lets each PC behave differently, depending on what it's needed for). I still think it's a needed optimization in many cases, but I agree it might backfire in others (like yours). Notice that you should still reconsider the 2-3s keyframe interval you're using right now: even with NACKs working differently, media delivery will still not be robust at all.
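
As a rough sketch of what the global knob could look like in janus.jcfg, this is one way it might read; the section, property name and default below are assumptions for illustration, not necessarily what the final change will use:

```
media: {
	# Assumed boolean controlling whether the retransmission (NACK) queue
	# is cleared whenever a keyframe is relayed; setting it to false would
	# keep all recent packets available for NACK-triggered resends.
	nack_optimizations = false
}
```

The per-connection override could then be a matching property on the Janus API request that sets up the PeerConnection.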

@lminiero
Member

Please test and provide feedback.

@matiaspl
Author

matiaspl commented Oct 22, 2020

Thanks for your super fast response, we're going to test it right away.

As for your comments - I would agree with you if the NACK queue were only used for storing keyframes, but I don't think it is, nor should it be. Any frame can get lost, and in modern codecs (VP9, higher-level H.264, AV1) you often have other types of reference frames that are needed to decode other frames. Not to mention lost audio frames. Recovering just the keyframe doesn't do you any good.

The NACK queue - at least to me - should be content-agnostic. A client loses an RTP packet (be it video, audio or data), we know about it almost instantly, we have it in the NACK queue (because all packets are kept there for a certain amount of time), and we send it again. Done, everyone's happy. And this works even if there's no closed feedback loop for forcing keyframe insertion by the sender. I don't see a reason for it to work any other way, but that's just me :)
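
As a minimal sketch of that content-agnostic flow (the names, structure and sizes below are hypothetical illustrations, not Janus's actual internals):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size retransmission buffer, indexed by RTP sequence
 * number modulo the buffer size. Every relayed packet (video, audio or data)
 * is stored here for a while, regardless of what it carries. */
#define RTX_SLOTS   512
#define RTX_MAX_LEN 1500

typedef struct rtx_slot {
	uint16_t seq;               /* RTP sequence number of the stored packet */
	uint16_t len;               /* stored length, 0 means the slot is empty */
	uint8_t  data[RTX_MAX_LEN]; /* the packet exactly as it was sent */
} rtx_slot;

static rtx_slot rtx_buffer[RTX_SLOTS];

/* Stand-in for the actual transport call that puts a packet on the wire */
static void send_rtp_packet(const uint8_t *packet, uint16_t len) {
	printf("resending %u bytes\n", (unsigned)len);
	(void)packet;
}

/* Store an outgoing packet so it can be resent if a NACK arrives later */
void rtx_store(uint16_t seq, const uint8_t *packet, uint16_t len) {
	rtx_slot *slot = &rtx_buffer[seq % RTX_SLOTS];
	slot->seq = seq;
	slot->len = len > RTX_MAX_LEN ? RTX_MAX_LEN : len;
	memcpy(slot->data, packet, slot->len);
}

/* Handle a NACK for one sequence number: resend the packet if we still have it */
int rtx_handle_nack(uint16_t seq) {
	rtx_slot *slot = &rtx_buffer[seq % RTX_SLOTS];
	if(slot->len == 0 || slot->seq != seq)
		return -1; /* evicted (or flushed) in the meantime: this NACK cannot be served */
	send_rtp_packet(slot->data, slot->len);
	return 0;
}
```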

If you say that there are other implementations of WebRTC doing the same thing... well they are doing it wrong :)
Let's have it the right way in Janus!

As for keyframe distance - imagine a use case where someone is saving horsepower and just relays the elementary streams they already have, without transcoding. In that case there is no way to change the keyframe distance. Not a usual use case, but there's no reason why it shouldn't work.

I know low keyframe intervals are suboptimal in terms of bandwidth allocation, but in terms of data delivery the transport backend should work as reliably as possible.

@lminiero
Member

Recovering just the keyframe doesn't do you any good.

That's not what we do, and it's not what I said we do. We store all packets for retransmission. What we currently do (and what the PR changes) is clear that list of stored packets after we send a keyframe: the idea was that if you asked for a retransmission of an intermediate packet because you couldn't decode the video, the keyframe you're about to receive will give you a new working image to start from, so retransmitting the earlier intermediate packet can be seen as a waste of bandwidth.
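
A minimal sketch of that clear-on-keyframe behavior, with the toggle that the discussion above would make configurable (again hypothetical names, not the actual Janus code; payloads are omitted for brevity):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified retransmission list: only the sequence numbers of
 * the packets that could still be resent (payloads omitted to keep it short). */
#define RTX_SLOTS 512
static uint16_t rtx_seqs[RTX_SLOTS];
static int rtx_count = 0;

/* Whether to clear the list whenever a keyframe goes out: this is the
 * behavior that, per the discussion above, would become configurable. */
static bool nack_optimizations = true;

static void rtx_remember(uint16_t seq) {
	if(rtx_count < RTX_SLOTS)
		rtx_seqs[rtx_count++] = seq;
}

static void rtx_flush(void) {
	/* Drop everything stored so far: NACKs for these packets can no
	 * longer be served, which is the effect reported in this issue. */
	rtx_count = 0;
}

/* Called for every outgoing packet, with a flag telling whether the packet
 * starts a keyframe (how that is detected is codec-specific). */
void on_outgoing_packet(uint16_t seq, bool is_keyframe) {
	rtx_remember(seq);
	if(nack_optimizations && is_keyframe)
		rtx_flush();
}
```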

I know low keyframe intervals are suboptimal in terms of bandwidth allocation, but in terms of data delivery the transport backend should work as reliably as possible.

I think you misunderstand the main aim of WebRTC, which isn't reliability but real-timeness.

@matiaspl
Author

matiaspl commented Oct 27, 2020

In our case it's not a waste of bandwidth - the NACKed packets never leave the server. RTP sequence discontinuities are simply forwarded from the server to the clients and no NACK retransmissions take place (we quadruple-checked it with Wireshark). Plus, if enough packets of an intra frame were lost, even the keyframe itself could not be recovered (which meant full GOPs were lost). I talked about broken NACK queue logic in the first place because it didn't seem to work at all with our use case - see the graphs in my first post.

@lminiero
Member

@matiaspl did you have a chance to test the PR? We're wondering if it can be merged.

@lminiero
Member

I'll take that as a yes: the PR has been merged, so I'm closing this issue.
