streaming plugin: deadlock when handling API messages #2691

Closed
lionelnicolas opened this issue Jun 10, 2021 · 2 comments · Fixed by #2700

Comments

@lionelnicolas
Contributor

lionelnicolas commented Jun 10, 2021

I think one of our Janus servers experienced a deadlock in the streaming plugin. The websocket API became unresponsive and I was no longer able to create or destroy mountpoints (no response from the streaming plugin API).

I think the deadlock happens on the mountpoints_mutex mutex.

To determine that, I used the admin API to communicate with the streaming plugin.

If I send:

{
   "request": "info"
}

I receive a reply from the streaming plugin:

{
    "janus": "success",
    "transaction": "A0WGq3Tkgnaz",
    "response": {
        "streaming": "event",
        "error_code": 453,
        "error": "Missing mandatory element (id)"
    }
}

That error is produced by the following validation, which runs before the mutex is locked:

	if(!string_ids) {
		JANUS_VALIDATE_JSON_OBJECT(root, id_parameters,
			error_code, error_cause, TRUE,
			JANUS_STREAMING_ERROR_MISSING_ELEMENT, JANUS_STREAMING_ERROR_INVALID_ELEMENT);
	} else {
		JANUS_VALIDATE_JSON_OBJECT(root, idstr_parameters,
			error_code, error_cause, TRUE,
			JANUS_STREAMING_ERROR_MISSING_ELEMENT, JANUS_STREAMING_ERROR_INVALID_ELEMENT);
	}
	if(error_code != 0)
		goto prepare_response;

But if I request info on a non-existent mountpoint:

{
   "request": "info",
   "id": "123"
}

the request never completes and no log messages are shown (I was expecting "No such mountpoint/stream"), so I guess the request is stuck here:

janus_mutex_lock(&mountpoints_mutex);
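
The probing approach above can be modelled with a minimal sketch. All names here (`demo_handle_info`, `demo_mountpoints_mutex`, the `DEMO_*` results) are hypothetical stand-ins, not the plugin's real symbols, and `pthread_mutex_trylock` is used only so the sketch reports "would block" instead of hanging like the real `janus_mutex_lock` call would. It assumes a default (non-recursive) pthread mutex.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the plugin's mountpoints_mutex. */
static pthread_mutex_t demo_mountpoints_mutex = PTHREAD_MUTEX_INITIALIZER;

enum demo_result {
    DEMO_MISSING_ID,  /* rejected during validation, mutex never touched */
    DEMO_NO_SUCH_MP,  /* mutex acquired, lookup failed, mutex released */
    DEMO_WOULD_BLOCK  /* mutex held elsewhere: a blocking lock would hang here */
};

/* Simplified model of an "info" request: validate first, lock second. */
static enum demo_result demo_handle_info(const char *id) {
    /* Step 1: validation runs before any locking, so it can still answer
     * even when the mutex is permanently held by another thread. */
    if (id == NULL)
        return DEMO_MISSING_ID;

    /* Step 2: the lookup needs the mutex. A non-zero return (typically
     * EBUSY) means the mutex is currently held by someone else. */
    if (pthread_mutex_trylock(&demo_mountpoints_mutex) != 0)
        return DEMO_WOULD_BLOCK;

    /* Lookup omitted: this sketch has no mountpoints, so nothing matches. */
    pthread_mutex_unlock(&demo_mountpoints_mutex);
    return DEMO_NO_SUCH_MP;
}
```

With the mutex free, `demo_handle_info("123")` returns `DEMO_NO_SUCH_MP`. With it held, the same call returns `DEMO_WOULD_BLOCK` while `demo_handle_info(NULL)` still returns `DEMO_MISSING_ID`, which matches the behaviour observed through the admin API.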

I wasn't able to reproduce the issue yet, but I'll update the ticket when I have some news.

Notes:

  • this was on 9eeeb38 (3 commits from master as of today)
  • not sure if that's a coincidence, but it looks like this happened a few seconds after 6-7 mountpoint destroys were done in a short period of time
  • existing streaming plugin RTSP mountpoints are still running (I can see them looping on RTSP retries because the source is gone, but I can't destroy those mountpoints)
  • the videoroom plugin is also running on the same server, and continues to work even though the streaming plugin API seems dead
@lminiero
Member

If you want to debug locks, you can use our lock debugging feature, which will print any attempt to lock/unlock and the file/line where it happened. Ideally you'll want it to start from the beginning (enabled in janus.jcfg), but it can be enabled/disabled dynamically via Admin API as well. Of course, you should ensure it's enabled before the issue starts, or you may be missing info. Note that it can also greatly increase the log size.
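
For readers unfamiliar with this kind of feature, here is a rough sketch of how lock tracing with file/line information can work in general. This is not Janus's actual implementation: `DEMO_LOCK`, `DEMO_UNLOCK`, `demo_lock_debug` and `demo_lock_events` are hypothetical names invented for illustration, and plain pthread mutexes are assumed.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical runtime switch, analogous to toggling lock debugging on the fly. */
static int demo_lock_debug = 0;
/* Hypothetical counter of traced events (not thread-safe, illustration only). */
static int demo_lock_events = 0;

/* Macros capture the call site so each log line names the file and line. */
#define DEMO_LOCK(m)   demo_lock_at((m), __FILE__, __LINE__)
#define DEMO_UNLOCK(m) demo_unlock_at((m), __FILE__, __LINE__)

static int demo_lock_at(pthread_mutex_t *m, const char *file, int line) {
    if (demo_lock_debug) {
        /* Log before blocking, so a stuck attempt is still visible. */
        fprintf(stderr, "[%s:%d] LOCK %p\n", file, line, (void *)m);
        demo_lock_events++;
    }
    return pthread_mutex_lock(m);
}

static int demo_unlock_at(pthread_mutex_t *m, const char *file, int line) {
    if (demo_lock_debug) {
        fprintf(stderr, "[%s:%d] UNLOCK %p\n", file, line, (void *)m);
        demo_lock_events++;
    }
    return pthread_mutex_unlock(m);
}
```

In a trace like this, a deadlock typically shows up as a LOCK line for a given mutex address with no later UNLOCK line for it. Because every lock and unlock emits a line, the log volume grows quickly, which is the size concern mentioned above.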

@lionelnicolas
Contributor Author

Ok thanks, I'm planning to use that feature as soon as I find a reproducer.
