Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent generating huge opus files when received timestamp is 0 #2328

Closed
wants to merge 4 commits into from
Closed

Prevent generating huge opus files when received timestamp is 0 #2328

wants to merge 4 commits into from

Conversation

ihusejnovic
Copy link
Contributor

Hi Meetecho,

Recently we noticed that janus-pp-rec tool generates huge opus files(20GB, 30GB...) for some SIP calls.
We use the latest Janus version, so bugfix introduced in PR #2250 didn't resolve our issue.

We converted the problematic mjr files to pcap and found some packets with very large timestamps(almost max uint32).
But packets before and after the problematic packet were fine. Here is an example:

image

Then we captured RTP packets on the network and found out that those problematic packets are received with a timestamp set to 0 and Janus converts them to large integer due to negative subtraction of timestamp and base_ts values.

Here is an example from the network trace:

image

I couldn't find any RFC how this packet with timestamp=0 and marker=1 should be handled so I decided to calculate timestamp based on the time when the previous packet is received (the same solution was used for an SSRC change). If you have another idea or any suggestion let me know.

We have tested this fix last few days and it's working fine for us.

@neilkinnish can you also please take a look at this? Maybe this might resolve your issue, too.

@atoppi
Copy link
Member

atoppi commented Aug 26, 2020

@alexamirante found out a rationale behind that packet (cfr. https://tools.ietf.org/html/rfc3389#section-5.1)

Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)

The receiver can detect silence suppression on the first
packet received after the silence by observing that the RTP timestamp
is not contiguous with the end of the interval covered by the
previous packet even though the RTP sequence number has incremented
only by one. The RTP marker bit is also normally set on such a
packet.

AFAIK we never encountered this before, but given its existence and the possibility of SIP phones to implement the RFC, we should take it into account maybe by filtering/adjusting those packets in the sip plugin and not in the core methods (like this PR is proposing).

@@ -689,7 +694,12 @@ void janus_rtp_header_update(janus_rtp_header *header, janus_rtp_switching_conte
}
/* Compute a coherent timestamp and sequence number */
context->a_prev_ts = context->a_last_ts;
context->a_last_ts = (timestamp-context->a_base_ts) + context->a_base_ts_prev;
if(timestamp >= context->a_base_ts) {
Copy link
Member

@atoppi atoppi Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct.
Any packet received with a TS lesser than the base TS will have its TS replaced with a value estimated on the real time (janus_get_audio_time_diff).
E.g. a SSRC change imposes a new base ts with a high value (say 4000000000). Once the TS wraps around, any new packet until 4000000000 will have its TS replaced with a new estimated one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When SSRC is changed, context->a_base_ts is set to the current timestamp. So, it will not be replaced with a new estimated timestamp. But, this is irrelevant now because we will move this code to the SIP plugin. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RTP TS will eventually wrap around (keeping the same SSRC). So after a certain amount of time you'll receive a TS lesser than the last known base TS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, thanks.

@ihusejnovic
Copy link
Contributor Author

Thank you for your comments @atoppi.
I agree, let's move this fix to SIP plugin. I will filter these packets based on the provided RFC, but do you have any idea how should we adjust it? Should we replace timestamp like I already did in this PR or something else?

@lminiero
Copy link
Member

Not sure what the proper fix is here, but it looks like a severely broken SIP client, since it doesn't even change payload type or SSRC: it just inserts a completely broken timestamp in there. Not saving in the mjr packets that have timestamp 0 would definitely not be a proper fix, since there are legitimate packets starting like that. My guess is that you'll have to add a check even before the janus_rtp_header_update call, to check timestamp 0 packets on the same SSRC as before and check if the previous timestamp does not justify a reset.

@lminiero
Copy link
Member

@alexamirante found out a rationale behind that packet (cfr. https://tools.ietf.org/html/rfc3389#section-5.1)

Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)

Apologies, missed this post, I didn't know about this either.

@lminiero
Copy link
Member

we should take it into account maybe by filtering/adjusting those packets in the sip plugin and not in the core methods

Giving it some thought, I actually think it might make sense to fix those packets in janus_rtp_header_update instead, so in the RTP core management. We would have to do the same fix in NoSIP anyway (so replicated code), and we don't know if browsers will decide to start supporting this ugly thing in the future. It's just some additional logic that needs to be enforced when the SSRC doesn't change.

What I don't get is how to handle LEGITIMATE cases where the timestamp is very different and the sequence number increases by one: examples are media not being sent for a long time and then starting again, where the sequence number would increase by one (so that the peer knows they didn't lose any packet), but the timestamp reflects the current time (and so would not be the expected increase). This would probably force the code to be aware of the interarrival of packets somehow, which means that the timestamp fix for CN should only be applied if the value changed a lot but the last packet arrived a little time ago.

@ihusejnovic
Copy link
Contributor Author

What I don't get is how to handle LEGITIMATE cases where the timestamp is very different and the sequence number increases by one: examples are media not being sent for a long time and then starting again, where the sequence number would increase by one (so that the peer knows they didn't lose any packet), but the timestamp reflects the current time (and so would not be the expected increase). This would probably force the code to be aware of the interarrival of packets somehow, which means that the timestamp fix for CN should only be applied if the value changed a lot but the last packet arrived a little time ago.

This is stated in RFC:

The RTP marker bit is also normally set on such a packet.

Maybe we can use a marker bit to differentiate this case from legitimate one?

@lminiero
Copy link
Member

Maybe we can use a marker bit to differentiate this case from legitimate one?

I don't think so. The marker bit is also set when you start sending audio after a long time that you didn't, so exactly the legitimate use case we want to allow.

@atoppi
Copy link
Member

atoppi commented Aug 27, 2020

We might compare the received RTP TS with a computed TS based on the elapsed time, and if

those values diverge too much
AND
seq num is greater than the previous one
AND
marker bit is set

we are pretty sure that this is a CN packet.

Anyway this will add a computational effort for any received packet, not sure if this is worth given the uncommon scenario.

@lminiero
Copy link
Member

lminiero commented Aug 31, 2020

@ihusejnovic can you send us a pcap of a call (RTP only) with this issue, a capture of the SIP-to-Janus side? This way we can feed it to a SIPp script and test whatever fix/solution we come up with.

@ihusejnovic
Copy link
Contributor Author

@ihusejnovic can you send us a pcap of a call (RTP only) with this issue, a capture of the SIP-to-Janus side? This way we can feed it to a SIPp script and test whatever fix/solution we come up with.

Sure, shared on email.

@atoppi
Copy link
Member

atoppi commented Sep 1, 2020

@ihusejnovic we are considering to introduce a specific fix for this issue in janus-pp-rec once #2345 will be merged.

@atoppi
Copy link
Member

atoppi commented Sep 1, 2020

@ihusejnovic meanwhile, would you mind sharing the mjr capture that contained the very large timestamps(almost max uint32) ?

@lminiero
Copy link
Member

lminiero commented Sep 1, 2020

As a side note, the SDP offer sent by the SIP device (or was it an answer?) and how Janus answered would help too, to see if they were offering CN in the first place and how browser/Janus reacted. If this happens because the SIP endpoint offers CN and we reject it, apparently as @atoppi found out there's a silenceSupp property we can set to off in the SDP: knowing what to react to would allow us to change the code accordingly.

@ihusejnovic
Copy link
Contributor Author

@ihusejnovic meanwhile, would you mind sharing the mjr capture that contained the very large timestamps(almost max uint32) ?

Sent on email.

As a side note, the SDP offer sent by the SIP device (or was it an answer?) and how Janus answered would help too, to see if they were offering CN in the first place and how browser/Janus reacted. If this happens because the SIP endpoint offers CN and we reject it, apparently as @atoppi found out there's a silenceSupp property we can set to off in the SDP: knowing what to react to would allow us to change the code accordingly.

Here is SDP offer and answer from Janus: https://pastebin.com/4pyEU5f6

Thank you.

@atoppi
Copy link
Member

atoppi commented Oct 5, 2020

hi @ihusejnovic,
getting back to this, have you tried disabling silence suppression in freeswitch through this setting?

@ihusejnovic
Copy link
Contributor Author

hi @ihusejnovic,
getting back to this, have you tried disabling silence suppression in freeswitch through this setting?

Hi,
I don't have access to this Freeswitch so I couldn't change any settings. I have tried to remove line
a=rtpmap:13 CN/8000 from SDP answer on Janus but it didn't help.

@atoppi
Copy link
Member

atoppi commented Oct 13, 2020

Hi,
I don't have access to this Freeswitch so I couldn't change any settings. I have tried to remove line
a=rtpmap:13 CN/8000 from SDP answer on Janus but it didn't help.

Removing that attribute will not help.
Your SIP endpoint is generating RTP comfort noise with no specific payload type. It is using the same PT of media, keeping seq numbers in order, but setting that 0 timestamp.

I am pasting here a untested quick hack that will add silenceSupp off to the SDP that Janus sends to the sip endpoint.
Please give it a try and let us know if it helps.

diff --git a/plugins/janus_sip.c b/plugins/janus_sip.c
index 1d363e9e..6a249a1d 100644
--- a/plugins/janus_sip.c
+++ b/plugins/janus_sip.c
@@ -5896,6 +5896,11 @@ char *janus_sip_sdp_manipulate(janus_sip_session *session, janus_sdp *sdp, gbool
 				ma = ma->next;
 			}
 		}
+		/* Reject RTP silence suppression */
+		if(m->type == JANUS_SDP_AUDIO) {
+			janus_sdp_attribute *a = janus_sdp_attribute_create("silenceSupp","off","----");
+			janus_sdp_attribute_add_to_mline(m, a);
+		}
 		temp = temp->next;
 	}
 	/* If we need to remove some payload types from the SDP, do it now */

Save it to patch.diff and apply it with git apply patch.diff on top of current master.

@ihusejnovic
Copy link
Contributor Author

Hi @atoppi,
I have tested it for a few days but it didn't resolve the issue.
Here are SDPs from FS and Janus: https://pastebin.com/VAtJqiPA
I can share the MJR file by email but here is just a sample from PCAP(converted from MJR):

image

@atoppi
Copy link
Member

atoppi commented Oct 20, 2020

@ihusejnovic I guess FreeSwitch simply ignores that attribute then.
Well, it seems there is no explicit way of disabling this behavior in the SDP, so this leaves the reaction in either the core or pprec as the only chance.

I'd be more hesitant to do this fix in the core, as I think common rtp sources (e.g. browser) do not support this kind of CN.
Maybe we could add a specific flag to pp-rec to enable CN detection, leaving to the user the responsibility to check if the final file is OK.

Anyway I'm putting on hold the decision for this fix until #2345 will be merged.

@atoppi
Copy link
Member

atoppi commented Oct 21, 2020

@ihusejnovic meanwhile would you mind sending another mjr recording of a call experiencing the issue?

@ihusejnovic
Copy link
Contributor Author

ihusejnovic commented Oct 21, 2020

@ihusejnovic meanwhile would you mind sending another mjr recording of a call experiencing the issue?

Sure, sent on email.

@atoppi
Copy link
Member

atoppi commented Oct 30, 2020

Closed the PR for now, let's continue the discussion on the issue #2414.
We'll come up with a proper solution in a new PR as soon as possible.

lminiero added a commit that referenced this pull request Oct 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants