Add sctp rate control params to RTCPeerConnection constructor #71

Closed
daonb opened this issue Mar 9, 2021 · 27 comments

@daonb

daonb commented Mar 9, 2021

Suggested extension

Add SCTP rate control parameters RTOMax, RTOMin and RTOInitial to RTCPeerConnection.

Use Case

I am developing a terminal app that uses data channels to connect to a remote shell. My hosted server is nearby, with an RTT of ~15 ms, but my home network is slow (30 Mbps download, 3 Mbps upload) and congested: I was sharing it with my two daughters, who were in lockdown and both using Zoom.

Still, the terminal performed very poorly. I opened an issue with pion (pion/sctp#181), and the discussion there taught me that the SCTP retransmission timeout backs off exponentially, with a cap of 60 seconds. For my app this is unacceptable. When I hit a key, I expect to see it on the terminal in real time. If delivery fails, I want a quick retransmit, within 2-3 seconds at most.

SCTP's RFC provides a way to limit the retransmission timeout (RTO):

c7) A maximum value may be placed on RTO provided it is at least RTO.max seconds

In WebRTC RTOMax is a constant with a recommendation to set it at 60 seconds.
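
For illustration, the doubling-with-a-cap behaviour described above can be sketched in a few lines of Python. This is not taken from any SCTP stack: `backoff_schedule` is a hypothetical helper, and the 1-second starting value is an assumption (it matches the RTO.Min value suggested in RFC 4960).

```python
from __future__ import annotations


def backoff_schedule(rto_start: float, rto_max: float, timeouts: int) -> list[float]:
    """Return the RTO in effect before each of `timeouts` consecutive expirations."""
    schedule = []
    rto = min(rto_start, rto_max)
    for _ in range(timeouts):
        schedule.append(rto)
        # On every expiry the timer doubles, but never exceeds the cap.
        rto = min(rto * 2, rto_max)
    return schedule


default_cap = backoff_schedule(1.0, 60.0, 8)
low_cap = backoff_schedule(1.0, 3.0, 8)
print(default_cap)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(low_cap)      # [1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
print(sum(default_cap), sum(low_cap))  # 183.0 21.0
```

Under these assumptions, eight consecutive timeouts add up to 183 seconds of waiting with a 60-second cap, versus 21 seconds with a 3-second cap.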

@lgrahl

lgrahl commented Mar 10, 2021

Seems legitimate. Since 1.0 is pretty much cast in stone, I suggest opening an issue with the same description in https://github.com/w3c/webrtc-extensions/issues

@lgrahl

lgrahl commented Mar 10, 2021

Then again, I don't think it's possible to set RTOs per data channel without adding additional application-level (in this case, the WebRTC implementation) logic in order to achieve this. Min and max RTO can only be set per SCTP association while partial reliability (whether to retransmit, how often and TTL) is possible per message (WebRTC ties it to the data channel though).

So, while this could be added to the spec at the peer connection level, it is unlikely to be a solution for a use case that requires different RTOs for different data channels on the same peer connection.

What you can already do today, though, is set maxRetransmits to 0 and apply retransmission logic yourself. In your use case, that might even bring the best user experience, since the level of control is much higher.
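
A minimal sketch of what such application-level retransmission might look like on top of a channel opened with maxRetransmits set to 0. It is written in Python for brevity and is not WebRTC code: the `send` callback, the acknowledgement messages from the peer, and the 250 ms and 3 s values are all assumptions standing in for whatever the application defines.

```python
from __future__ import annotations

from typing import Callable


class AppRetransmitter:
    """Resends unacknowledged messages at a fixed interval instead of backing off."""

    def __init__(self, send: Callable[[int, bytes], None],
                 resend_after: float = 0.25, give_up_after: float = 3.0) -> None:
        self.send = send                    # hypothetical: writes (seq, payload) to the channel
        self.resend_after = resend_after    # assumed fixed resend interval, in seconds
        self.give_up_after = give_up_after  # assumed staleness limit, in seconds
        self._next_seq = 0
        self._pending: dict[int, tuple[bytes, float, float]] = {}

    def submit(self, payload: bytes, now: float) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = (payload, now, now)  # payload, first sent, last sent
        self.send(seq, payload)
        return seq

    def on_ack(self, seq: int) -> None:
        # The peer confirmed delivery; stop tracking the message.
        self._pending.pop(seq, None)

    def tick(self, now: float) -> None:
        # Call periodically: resend what is overdue, drop what is too old to matter.
        for seq, (payload, first, last) in list(self._pending.items()):
            if now - first >= self.give_up_after:
                del self._pending[seq]
            elif now - last >= self.resend_after:
                self._pending[seq] = (payload, first, now)
                self.send(seq, payload)

    def pending_count(self) -> int:
        return len(self._pending)


sent = []
rtx = AppRetransmitter(send=lambda seq, data: sent.append((seq, data)))
rtx.submit(b"l", now=0.0)
rtx.submit(b"s", now=0.1)
rtx.on_ack(1)       # only the second keystroke was acknowledged
rtx.tick(now=0.3)   # the first one is overdue and is sent again
print(sent)         # [(0, b'l'), (1, b's'), (0, b'l')]
```

The fixed interval is a deliberate simplification. An application sending more than a trickle of data would still need some form of rate limiting of its own.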

@dontcallmedom dontcallmedom transferred this issue from w3c/webrtc-pc Mar 10, 2021
@dontcallmedom
Member

I've transferred the issue to webrtc-extensions, no need to reopen it there

(and to be fair, I think we as a Working Group need to come to greater clarity on where issues on WebRTC need to be filed, -pc vs -extensions vs -nv-use-cases)

@daonb
Author

daonb commented Mar 11, 2021

@lgrahl I didn't realize all data channels share the same SCTP association. That makes perfect sense, and IMO we shouldn't complicate things by supporting multiple SCTP associations for the same peer connection. Given that, I prefer to change this issue to be about adding RTOMax to the peer connection.

Once we have it, I can set RTOMax for real-time performance (2-3 seconds) and let SCTP manage the rate. For file transfers I can use the data channel's bufferedAmountLowThreshold to ensure I'm not overflowing the buffer.
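
The buffer-based pacing mentioned for file transfers can be sketched as a small simulation. Nothing here is browser code: `FakeChannel` is a hypothetical stand-in that only models the send buffer, `drain` plays the role of the network emptying it (in a browser the sender would wait for the `bufferedamountlow` event instead), and all sizes are made-up values.

```python
class FakeChannel:
    """Hypothetical stand-in for a data channel; models only the send buffer."""

    def __init__(self, low_threshold: int) -> None:
        self.low_threshold = low_threshold
        self.buffered_amount = 0
        self.peak = 0
        self.total_sent = 0

    def send(self, chunk: bytes) -> None:
        self.buffered_amount += len(chunk)
        self.total_sent += len(chunk)
        self.peak = max(self.peak, self.buffered_amount)

    def drain(self, nbytes: int) -> None:
        # Pretend the network delivered nbytes from the buffer.
        self.buffered_amount = max(0, self.buffered_amount - nbytes)


def send_file(channel: FakeChannel, data: bytes, chunk_size: int,
              high_water: int, drain_step: int) -> None:
    if chunk_size <= 0 or drain_step <= 0:
        raise ValueError("chunk_size and drain_step must be positive")
    if channel.low_threshold + chunk_size > high_water:
        raise ValueError("high_water must leave room for one chunk above the low threshold")
    offset = 0
    while offset < len(data):
        # Queue chunks until the next one would push the buffer past high_water.
        while offset < len(data) and channel.buffered_amount + chunk_size <= high_water:
            channel.send(data[offset:offset + chunk_size])
            offset += chunk_size
        # Simulated wait: let the buffer fall to the low threshold before refilling.
        while channel.buffered_amount > channel.low_threshold:
            channel.drain(drain_step)


channel = FakeChannel(low_threshold=100)
send_file(channel, bytes(1000), chunk_size=100, high_water=400, drain_step=150)
print(channel.peak, channel.total_sent)  # 400 1000
```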

While adding RTOMax to the peer connection will solve my problem, I think we can do better. The best solution would be to change the SCTP RFC, which does a poor job of defining the retransmission timeout:

The computation and management of RTO in SCTP follow closely how TCP manages its retransmission timer.

When TCP was developed in the early '80s, edge router congestion was the nut to crack. The few who had access to the internet were transferring files, and successfully completing a transfer was all that mattered. You would normally run a large transfer overnight and pray for success. Performance was slow, very slow, but we were all too impressed with the magic of transferring a file over the net to care.

The exponential backoff retransmission timer has no place in real-time communications. I'm not sure where to suggest such a change, so if anyone can point me in the right direction, I'll take it up with the IETF.

The next best thing would be to change WebRTC's recommended default RTOMax from today's 60 seconds to 3 seconds. I imagine many more users on congested networks are suffering from data channel freezes. When congestion happens, the users of this millennium won't wait 60 seconds for the app to recover.

@lgrahl
Copy link

lgrahl commented Mar 11, 2021

SCTP folks should be in the Transport Area Working Group, so I'd suggest starting with their mailing list. All browsers I know of use usrsctp. @tuexen is active in both that working group and the usrsctp project, if you're looking for a contact person.

@tuexen

tuexen commented Mar 11, 2021

Only controlling RTO.Max isn't very useful. If you want to control the retransmission timer value range, you need to make RTO.Min, RTO.Max, RTO.Initial user controllable. The constraint is RTO.Min <= RTO.Initial <= RTO.Max. Most SCTP implementations including usrsctp allow these parameters to be set per SCTP association. This would correspond to a peer connection. Some care is needed. The configuration of these parameters on the one side needs to be in tune with the SACK.Delay parameter configured at the peer.
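
The ordering constraint above lends itself to a small validation sketch. `RtoConfig` is a hypothetical type, not part of any WebRTC or usrsctp API. The defaults are the values suggested in RFC 4960, and the SACK delay check (RTO.Min should be larger than the peer's SACK delay, assumed here to be 200 ms) is one possible reading of "in tune", not something stated in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RtoConfig:
    """Hypothetical per-association RTO settings, in seconds."""
    rto_min: float = 1.0
    rto_initial: float = 3.0
    rto_max: float = 60.0

    def problems(self, peer_sack_delay: float = 0.2) -> list[str]:
        found = []
        if self.rto_min <= 0:
            found.append("RTO.Min must be positive")
        if not (self.rto_min <= self.rto_initial <= self.rto_max):
            found.append("need RTO.Min <= RTO.Initial <= RTO.Max")
        # Assumption: a timer shorter than the peer's SACK delay would fire
        # before a delayed acknowledgement has had a chance to arrive.
        if self.rto_min <= peer_sack_delay:
            found.append("RTO.Min should exceed the peer's SACK delay")
        return found


print(RtoConfig().problems())               # []
print(RtoConfig(0.1, 0.5, 3.0).problems())  # one problem: SACK delay
print(RtoConfig(1.0, 5.0, 3.0).problems())  # one problem: ordering
```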

@daonb
Author

daonb commented Mar 13, 2021

Hi @tuexen!

The more I think about this issue, the more I realize that setting RTOMax is a hack. I came up with it after reading the pion/webrtc code and realizing it would be easy to add to pion/webrtc's settings.
But here we're talking about the standard and its client implementation, so a hack is not a good idea. Exposing RTO.Min, RTO.Max & RTO.Initial is a much better idea, as it gives the user more control. It will help me, but it doesn't deal with the root cause of the problem and won't help the many other users who share my pain.
I'm sure there are users of other WebRTC apps who feel my pain. If you're one of the unlucky ones on a thin, congested pipe, SCTP serves you poorly because retransmissions are very slow. The retransmission computation and management SCTP uses were designed in an age when there was no real-time communication and all traffic was file transfer. The WAN was a mesh of analog modems whose rates were measured in baud. The design goal of the RTO was to minimize the risk of modem overflow, as modems crashed often, so exponential backoff was a great idea.
IMO, we should refactor SCTP retransmission to focus on fast recovery and stop worrying about the modem's buffer. I guess I'll now head over to the Transport Area Working Group and see where the task force stands.
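
For reference, the RTO computation under discussion can be sketched as follows, based on the rules in RFC 4960 section 6.3.1 (smoothed RTT plus four times its variance, clamped to the configured range). `RtoEstimator` is a hypothetical class written for this illustration, with the RFC 4960 suggested values as defaults, and the RTT samples are made up to resemble the ~15 ms path described in this issue.

```python
from __future__ import annotations


class RtoEstimator:
    """Sketch of RTO estimation in the style of RFC 4960 section 6.3.1."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance

    def __init__(self, rto_min: float = 1.0, rto_initial: float = 3.0,
                 rto_max: float = 60.0) -> None:
        self.rto_min = rto_min
        self.rto_max = rto_max
        self.rto = rto_initial
        self.srtt: float | None = None
        self.rttvar: float | None = None
        self.unclamped: float | None = None

    def on_rtt_sample(self, rtt: float) -> float:
        if self.srtt is None:
            # First measurement seeds both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variance is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.unclamped = self.srtt + 4 * self.rttvar
        self.rto = min(max(self.unclamped, self.rto_min), self.rto_max)
        return self.rto


est = RtoEstimator()
for sample in (0.015, 0.018, 0.014):
    rto = est.on_rtt_sample(sample)
print(rto)  # 1.0
```

With these defaults the estimate for a 15 ms path sits far below one second, so the timer in effect is simply RTO.Min, and every expiry then doubles it from there.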

@tuexen

tuexen commented Mar 14, 2021

So are you looking for improvements in the loss recovery, like using RACK, or are you looking at more aggressive congestion controls?

@daonb
Author

daonb commented Mar 23, 2021

@tuexen The way I see it, SCTP is a protocol for control messages, which are very, very small. In WebRTC, data channels' typical bandwidth share is less than 1%, so congestion shouldn't be a concern, and SCTP should retransmit often without worrying about making a dent in the overall throughput.

In the extreme cases where apps use data channels not as intended and transfer large amounts of data, like files, they should be careful to use a small buffer size to avoid congestion.

@tuexen

tuexen commented Mar 23, 2021

Providing a congestion control for the data transferred via data channels is a requirement, see Req. 3. If you have a suggestion for an improved loss recovery mechanism, I suggest that you write up a proposal and submit it as an internet draft.

@alvestrand
Collaborator

tagging @Orphis since this is about datachannels and SCTP

@aboba aboba self-assigned this Jun 17, 2021
@daonb daonb changed the title Add RTOMax to RTCDataChannelInit Add sctp rate control to RTCPeerConnection constructor Jun 19, 2021
@daonb daonb changed the title Add sctp rate control to RTCPeerConnection constructor Add sctp rate control params to RTCPeerConnection constructor Jun 19, 2021
@daonb
Author

daonb commented Jun 19, 2021

hi @aboba, thanks for taking this on. I've updated the title and description based on the discussion here.

@aboba
Contributor

aboba commented Jan 6, 2022

There does seem to be a legitimate need here, but it is not obvious to me how to address this within the WebRTC 1.0 API.

It is well documented that the current SCTP congestion control algorithm (New Reno) is problematic in latency sensitive applications. A similar issue has been opened against WebTransport API (BBRv1 or v2 has the same problem): Issue 365

To get around this problem, cloud-based applications routinely customize SCTP congestion control algorithms to enable scenarios such as cloud gaming (e.g. replacing New Reno with an algorithm such as SCREAM, RFC 8298). However, this only works in scenarios where the data flows from the cloud peer (e.g. the server) to a browser, not in the other direction.

Since congestion control operates on an SCTP association, not on an individual data channel, it would not make sense to add parameters to RTCDataChannelInit. In ORTC, parameters can be passed to the RTCSctpTransport constructor to enable customization of SCTP congestion control behavior. However, the WebRTC 1.0 API "vends" RTCSctpTransport objects, rather than allowing them to be constructed. Changing the congestion control algorithm on a live SCTP association seems unnecessarily complex, so an RTCSctpTransport.setParameters() method doesn't make much sense.

@daonb
Author

daonb commented Jan 8, 2022

It seems we all agree SCTP congestion control needs to improve. It also looks like researchers are still checking alternatives and more users are feeling the pain.

When I started this issue I wanted control per data channel. Today I understand that it's very hard to implement, and that PeerConnection params are good enough for me and most apps. I like the idea of adding a method such as RTCSctpTransport.setRTO(), letting me override the defaults for all rate control parameters defined in the RFC.

@aboba
Contributor

aboba commented Jan 9, 2022

Do we really need to reset the RTO (or change the congestion control algorithm) while traffic is flowing? The congestion control parameters seem like they should be put in place prior to establishing the SCTP association, without the requirement that they be changeable while the association is active.

@tuexen

tuexen commented Jan 9, 2022

Do we really need to reset the RTO (or change the congestion control algorithm) while traffic is flowing? The congestion control parameters seem like they should be put in place prior to establishing the SCTP association, without the requirement that they be changeable while the association is active.

SCTP stacks allow setting these parameters during the lifetime of an association. Whether it is useful to do so depends on WebRTC. I just wanted to let you know that it is possible.

@tuexen

tuexen commented Jan 9, 2022

It seems we all agree SCTP congestion control needs to improve. It also looks like researchers are still checking alternatives and more users are feeling the pain.

Do you have any congestion control in mind, which you think is appropriate? I think the IETF has NewReno and CUBIC standardised, BBRv2 might be done in the future.

I think loss detection could be improved by using RACK for SCTP.

@daonb
Author

daonb commented Jan 10, 2022

Do you have any congestion control in mind, which you think is appropriate?

Can SCTP traffic cause congestion? I think not as control traffic is erratic and the bandwidth is negligible, especially when it's sharing the bandwidth with video & audio streams. If there is congestion on the line, it's not the data channels that are causing it, but the video ones and it's up to SRTP to fix it (and if someone is using data channels to transfer files they should add rate control).

That's why when data is lost, I'd like SCTP to retransmit and be aggressive about it. With the current congestion control, that means setting RTOMax to one second. If I could, I would probably like to retransmit even more often when network conditions are really bad.

@alvestrand
Collaborator

SCTP is a protocol exposed to applications. It can be used for anything the application wants to use it for. We've had people who have filed bugs because our Chrome implementation fails to transfer at more than 50 Mbits/second.

That's not negligible.

When network conditions are really bad, being more aggressive in retransmission is a contribution to congestion collapse. Don't try to go there.

@aboba
Contributor

aboba commented Jan 10, 2022

To elaborate on Harald's comment, data channels have been used for transport of media in applications such as game streaming and even conferencing. Congestion is commonplace in those applications.

@tuexen

tuexen commented Jan 10, 2022

SCTP is a protocol exposed to applications. It can be used for anything the application wants to use it for. We've had people who have filed bugs because our Chrome implementation fails to transfer at more than 50 Mbits/second.

That is interesting. Originally it was important that SCTP traffic does not affect other traffic and it should not use a high bandwidth. Based on that, the default values for parameters like buffer sizes are dimensioned in the usrsctp stack...

That's not negligible.

When network conditions are really bad, being more aggressive in retransmission is a contribution to congestion collapse. Don't try to go there.

@tuexen

tuexen commented Jan 10, 2022

To elaborate on Harald's comment, data channels have been used for transport of media in applications such as game streaming and even conferencing. Congestion is commonplace in those applications.

Is there some need for implementing alternate CC modules (if that is possible with respect to IPRs)? If yes, which ones?

@alvestrand
Collaborator

A desire has been stated for integrating congestion control across SCTP and media, using a realtime congestion control protocol like Google Transport-wide-CC, SCREAM or NADA. There's some wishful thinking in -rtcweb-transport about how such a combo should work, but I think there are probably dragons once we try implementing it.

@tuexen

tuexen commented Jan 11, 2022

Years ago there was the desire to cap the throughput of SCTP. I implemented a way to limit the cwnd used by SCTP. That was a simple way for the media system to coordinate with SCTP. Not sure if anyone is using it, never got any feedback after implementing it.

A possible way of coordinating would be RFC 8699. Were you thinking along these lines?

@aboba
Contributor

aboba commented Jul 7, 2022

My suggestion is to open a separate issue for integrated congestion control across SCTP and media (or alternative CC algorithms).

It seems to me that this use case (e.g. terminal or "PC in the Cloud") is implementable with the existing API using unreliable/unordered transport with the application taking care of re-transmission and/or forward error correction.
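
As one possible illustration of the forward error correction option, the sketch below adds a single XOR parity packet to a group of packets, so that any one lost packet in the group can be rebuilt without a retransmission. `xor_parity` and `recover_missing` are hypothetical helpers, the packets are assumed to be padded to equal length by the caller, and real FEC schemes are considerably more elaborate.

```python
from __future__ import annotations


def xor_parity(packets: list[bytes]) -> bytes:
    """XOR equal-length packets together into one parity packet."""
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("packets must have equal length")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(group: list[bytes | None], parity: bytes) -> list[bytes | None]:
    """Rebuild at most one lost packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(group) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    repaired = list(group)
    if missing:
        present = [p for p in group if p is not None]
        # XOR of the parity with every surviving packet yields the lost one.
        repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


original = [b"ls -", b"la\n ", b"pwd\n"]
parity = xor_parity(original)
received = [original[0], None, original[2]]  # the second packet was lost
print(recover_missing(received, parity) == original)  # True
```

The cost is one extra packet per group, which is negligible for traffic as small as keystrokes.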

Given this, I'd recommend we resolve this issue as "Won't fix".

@daonb
Author

daonb commented Jul 10, 2022

Thanks for taking the time to review this.

I believe the terminal use case is very similar to that of chat applications and online games that use data channels to sync data. In all these cases the information is time critical and the bandwidth is negligible. Under congestion, all these use cases become unusable due to SCTP's exponential backoff. While there's a place to discuss a better congestion control algorithm, I prefer a simpler solution.

The underlying SCTP protocol has support for capping the retransmission timer, in the form of RTO.max. Adding it to PeerConnection will give us a way to ensure the retransmission timer doesn't reach values that make the payload irrelevant, and that our apps don't get "punished" for congestion caused by other transmitters.

@aboba
Contributor

aboba commented Jul 19, 2022

Feedback from June 19, 2022 WEBRTC WG Virtual Interim: the WG does not support allowing applications to modify congestion control parameters such as RTO.initial, RTO.max, etc. There is interest in better enabling RTCDataChannel to be used for applications requiring low latency, but that should be discussed in a separate issue such as #111

@aboba aboba closed this as completed Jul 19, 2022