[1.x] Sip session get stuck #3332

Closed
Jonbeckas opened this issue Feb 23, 2024 · 12 comments
Labels
multistream Related to Janus 1.x

Comments

@Jonbeckas

Jonbeckas commented Feb 23, 2024

What version of Janus is this happening on?
1.2.2; b98e3bb

Have you tested a more recent version of Janus too?
Yes, on the master branch

Was this working before?
Not sure; the behaviour of the telephone system we use (3CX) changed.

Additional context
When a session is hung up, the JANUS_ICE_HANDLE_WEBRTC_ALERT flag is set in janus_ice_webrtc_hangup and cleared on the next call in janus_ice_setup_local, which is called by janus_plugin_handle_sdp.
For a declined incoming SIP call with no SDP body, the flag is never cleared, so on the following hangup janus_ice_webrtc_hangup aborts before the plugin is notified and before the establishing attribute is reset to 0. The session then gets stuck and denies all incoming calls.

@Jonbeckas Jonbeckas added the multistream Related to Janus 1.x label Feb 23, 2024
@lminiero
Member

lminiero commented Feb 27, 2024

If you can provide replication steps (including maybe a sipp script), I'll try to have a look at what the issue might be. As I wrote in reply to the PR, I don't think the patch is a proper fix, since it would introduce different problems, and so I'd like to investigate a different solution.

@Jonbeckas
Author

The replication steps are:

  1. Get INVITE without sdp offer
  2. Decline it with Janus
  3. Get another INVITE without an offer
  4. Decline it with Janus
  5. The Janus SIP plugin keeps the session in the establishing state and denies further calls

I am not really familiar with SIPp, but I will try to add a SIPp script later.

@lminiero
Member

lminiero commented Mar 5, 2024

If these invites are offerless, then I don't think the core or alert states have anything to do with it: there wouldn't be any SDP to trigger a new PeerConnection establishment. It's much more likely an inconsistent state within the SIP plugin itself. I'll try to replicate and let you know.

@lminiero
Member

lminiero commented Mar 5, 2024

I think I have a better understanding now, and why you were trying to tinker with the alert flag. It's true, as I said, that there's no PeerConnection establishment involved, but that's apparently the very root of the issue, rather than the reason why it shouldn't happen.

Basically, an offerless INVITE means no SDP and so, again, no PC: at the same time, though, when you decline the call, we invoke the close_pc() function in the core from the plugin, to clean up any WebRTC resource that may have been allocated; this results in the alert flag being set to true, and the hangup_media() callback being called on the plugin, which resets the plugin flags (establishing, established). So the first time it happens, it works fine: the problem, though, is that there's no actual WebRTC cleanup happening (we never initialized a PC) and so alert stays true. At the second offerless INVITE, the same thing happens, but this time the call to close_pc() finds alert already true, which means hangup_media() is not called again on the plugin (we do that to avoid duplicates from the same event). As a result, the plugin establishing flag remains set, and further calls are automatically rejected, due to a broken state in the plugin itself.
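The sequence described above can be sketched as a small self-contained C model. This is only an illustration under simplifying assumptions, not Janus code: the `model_*` types and functions are hypothetical stand-ins for the core handle, the SIP plugin session, close_pc() and hangup_media(), and the model assumes the plugin marks a session as establishing as soon as an INVITE arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the core handle: only the alert flag is modelled */
typedef struct {
	bool alert;               /* models JANUS_ICE_HANDLE_WEBRTC_ALERT */
} model_handle;

/* Hypothetical stand-in for the SIP plugin session */
typedef struct {
	bool establishing;        /* models the plugin "establishing" flag */
	int hangup_media_calls;   /* how often the plugin was notified */
} model_session;

/* Models the plugin hangup_media() callback: resets the plugin call state */
static void model_hangup_media(model_session *s) {
	s->establishing = false;
	s->hangup_media_calls++;
}

/* Models close_pc(): the plugin is notified only if alert is not already set */
static void model_close_pc(model_handle *h, model_session *s) {
	assert(h != NULL && s != NULL);
	if(h->alert)
		return;               /* treated as a duplicate: no notification */
	h->alert = true;
	model_hangup_media(s);
	/* No PeerConnection exists, so nothing ever clears alert again */
}

/* An offerless INVITE arrives and is declined.
 * Returns false if the plugin auto-rejected it as busy. */
static bool model_offerless_invite_declined(model_handle *h, model_session *s) {
	if(s->establishing)
		return false;         /* plugin believes a call is still being set up */
	s->establishing = true;
	model_close_pc(h, s);
	return true;
}
```

Calling `model_offerless_invite_declined()` three times on the same handle and session reproduces the report in this model: the first decline resets the plugin state, the second leaves establishing set because alert was already true, and the third is rejected outright.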

I'm wondering now what the right approach would be to address this. The "easy" fix would be to handle this directly in the SIP plugin, but in practice other plugins could in some cases end up in the same situation (even though it also depends on how they handle signaling, and it would take the same two consecutive close_pc calls for two consecutive "no PC" situations, so it's much less likely). I'm still not convinced your PR addresses it properly, since it could break some core states. I'll think about it some more and let you know when I come up with a potential fix.

@lminiero
Member

lminiero commented Mar 5, 2024

@Jonbeckas can you try this diff?

diff --git a/src/ice.c b/src/ice.c
index da8ffd10..dc5ef226 100644
--- a/src/ice.c
+++ b/src/ice.c
@@ -1685,6 +1685,7 @@ static void janus_ice_webrtc_free(janus_ice_handle *handle) {
 		return;
 	janus_mutex_lock(&handle->mutex);
 	if(!handle->agent_created) {
+		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_ALERT);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_NEW_DATACHAN_SDP);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_READY);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_CLEANING);

This immediately resets the alert back to false if, by the time we get to the point of freeing the resources (independently of how we got there), there's actually nothing to clean up. In my local SIPp tests I can't replicate the issue anymore, but it would be better to test this in other SIP scenarios too. As soon as you can confirm it doesn't break anything for you, I'll push the fix upstream.

@Jonbeckas
Author

The patch seems to change the bug a bit for me.
The scenario I described above works now, but after I accept a call and later hang up, the next offerless call that is declined leaves the session in an establishing=1 state again.

@lminiero
Member

lminiero commented Mar 5, 2024

Mh, thinking about it, that was to be expected, and would have happened even before this patch. When a regular call is closed, the same will happen (close_pc, then hangup_media) but in this case alert will remain true: normally it's unset only when a new call starts, in fact. This means that after that successful call, a new offerless INVITE being declined will find alert set to true and not trigger the hangup_media call, thus causing the same problem as before.

In theory, the most obvious fix would be to ensure we reset the alert flag when we've cleaned up resources, but I'm wondering if that may cause issues in some cases. As I mentioned, we use that flag to also prevent multiple hangup_media occurrences (e.g., different things cause a PC to close), and having it reset right away instead of right before the next call may cause that to break. It may even cause a loop, if the plugin is wrongly wired (e.g., close_pc and hangup_media triggering each other). I'll think about this some more.
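The duplicate-suppression concern can be illustrated with a minimal C sketch. The `guard_*` names are hypothetical and not part of Janus, and the model makes a strong simplifying assumption: the second close trigger arrives after the flag has already been reset, which in the real core depends on timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one alert flag guarding plugin notifications */
typedef struct {
	bool alert;               /* models the alert flag */
	int hangup_media_calls;   /* notifications delivered to the plugin */
} guard_model;

/* One "close this PeerConnection" trigger. When clear_right_away is true,
 * alert is reset as soon as cleanup is done instead of at the next call. */
static void guard_close_pc(guard_model *m, bool clear_right_away) {
	assert(m != NULL);
	if(m->alert)
		return;               /* duplicate trigger suppressed */
	m->alert = true;
	m->hangup_media_calls++;  /* plugin is notified once */
	if(clear_right_away)
		m->alert = false;     /* the guard is gone for any later trigger */
}

/* Two independent triggers close the same PC; returns the number of
 * hangup_media notifications the plugin would receive in this model. */
static int guard_two_triggers(bool clear_right_away) {
	guard_model m = { false, 0 };
	guard_close_pc(&m, clear_right_away);
	guard_close_pc(&m, clear_right_away);
	return m.hangup_media_calls;
}
```

In this model `guard_two_triggers(false)` yields one notification and `guard_two_triggers(true)` yields two, which is the kind of duplicate the flag currently prevents.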

@lminiero
Member

lminiero commented Mar 5, 2024

While I think of the implications, you can give the following patch a try, which always resets the alert flag when cleaning WebRTC resources:

diff --git a/src/ice.c b/src/ice.c
index da8ffd10..96b149d1 100644
--- a/src/ice.c
+++ b/src/ice.c
@@ -1685,6 +1685,7 @@ static void janus_ice_webrtc_free(janus_ice_handle *handle) {
 		return;
 	janus_mutex_lock(&handle->mutex);
 	if(!handle->agent_created) {
+		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_ALERT);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_NEW_DATACHAN_SDP);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_READY);
 		janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_CLEANING);
@@ -1755,6 +1756,7 @@ static void janus_ice_webrtc_free(janus_ice_handle *handle) {
 		janus_ice_notify_hangup(handle, handle->hangup_reason);
 	}
 	handle->hangup_reason = NULL;
+	janus_flags_clear(&handle->webrtc_flags, JANUS_ICE_HANDLE_WEBRTC_ALERT);
 	janus_mutex_unlock(&handle->mutex);
 	JANUS_LOG(LOG_INFO, "[%"SCNu64"] WebRTC resources freed; %p %p\n", handle->handle_id, handle, handle->session);
 }

Please let me know if you notice any regression.
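To compare the two diffs in this thread, here is a rough C model of when alert gets cleared. Everything in it is a hypothetical simplification (the `cm_*` names, the `clear_policy` enum, and the assumption that freeing resources runs synchronously right after the hangup notification); it only mirrors the behaviour described in the comments above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* When the model clears alert while freeing WebRTC resources */
typedef enum {
	CLEAR_NEVER,        /* behaviour before either diff */
	CLEAR_IF_NO_AGENT,  /* first diff: only when there was nothing to free */
	CLEAR_ALWAYS        /* second diff: also after a real cleanup */
} clear_policy;

/* Hypothetical combined model of core handle and plugin session */
typedef struct {
	bool alert;          /* models JANUS_ICE_HANDLE_WEBRTC_ALERT */
	bool agent_created;  /* models handle->agent_created */
	bool establishing;   /* models the plugin "establishing" flag */
} call_model;

/* Models janus_ice_webrtc_free() under the chosen policy */
static void cm_free(call_model *m, clear_policy p) {
	if(!m->agent_created) {
		if(p != CLEAR_NEVER)
			m->alert = false;
		return;
	}
	m->agent_created = false;
	if(p == CLEAR_ALWAYS)
		m->alert = false;
}

/* Models close_pc(): skipped entirely when alert is already set */
static void cm_close_pc(call_model *m, clear_policy p) {
	assert(m != NULL);
	if(m->alert)
		return;
	m->alert = true;
	m->establishing = false;  /* effect of hangup_media() on the plugin */
	cm_free(m, p);
}

/* A regular call: a PC is set up (which clears alert), then hung up */
static void cm_accepted_call(call_model *m, clear_policy p) {
	m->establishing = true;
	m->alert = false;
	m->agent_created = true;
	cm_close_pc(m, p);
}

/* An offerless INVITE that is declined: no PC is ever created */
static void cm_declined_offerless(call_model *m, clear_policy p) {
	m->establishing = true;
	cm_close_pc(m, p);
}

/* Runs one call (accepted, or offerless and declined) followed by a
 * declined offerless INVITE; returns true if the session ends up stuck. */
static bool cm_stuck_after(clear_policy p, bool accepted_call_first) {
	call_model m = { false, false, false };
	if(accepted_call_first)
		cm_accepted_call(&m, p);
	else
		cm_declined_offerless(&m, p);
	cm_declined_offerless(&m, p);
	return m.establishing;
}
```

In this model, `CLEAR_IF_NO_AGENT` fixes the two-declines scenario but still gets stuck when an accepted call comes first, while `CLEAR_ALWAYS` passes both, matching the test results reported in the thread.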

@Jonbeckas
Author

The patch works like a charm for me.

@lminiero
Member

lminiero commented Mar 5, 2024

FYI, after careful consideration I've decided this will not be the patch I'll commit, due to the considerations I've made before. I'll instead ensure that alert is set to true as a default, since the anomaly was that a hangup_media was following a close_pc the very first time you sent an offerless INVITE, and that's wrong. This means I'll work on a fix in the SIP plugin itself.

I'll let you know when a patch is ready. I'll probably prepare a PR, so that more people can test the effect on other plugins as well.

@lminiero
Member

lminiero commented Mar 5, 2024

@Jonbeckas please test the PR above, which attempts the fix in a different way. It should address both scenarios you had problems with. You may want to test more, though, just to ensure nothing else breaks. Notice I also fixed the error code we send back by default when declining: for some reason it was 486 instead of 603.

@Jonbeckas
Author

The PR works for me.
