[0.11.8] Janus Server Crash #2964
Comments
That's a known thing, and is not caused by Janus but by not configuring limits properly as explained here: https://janus.conf.meetecho.com/docs/FAQ.html#ulimit In fact we checked the logs and I saw the |
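For readers following along: the limit in question is the per-process cap on open file descriptors (what `ulimit -n` reports). A minimal sketch for inspecting it from C with the POSIX `getrlimit` call is shown here; the helper name `open_file_limits` is hypothetical and is not part of Janus.

```c
#include <sys/resource.h>

/* Hypothetical helper (not Janus code): reads the soft and hard limits on
 * open file descriptors for the calling process.
 * Returns 0 on success, -1 if getrlimit fails (errno is set by getrlimit). */
int open_file_limits(rlim_t *soft, rlim_t *hard) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur; /* limit currently enforced on the process */
    *hard = rl.rlim_max; /* ceiling the soft limit may be raised to */
    return 0;
}
```

Calling `open_file_limits(&soft, &hard)` in the server process, or in a small test program launched the same way as the server, shows whether a raised limit is actually in effect for that process.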
I took a closer look, increased the Linux limit, and ran some tests. Yes, the file descriptor/open files resource is quickly and completely exhausted by the command run in the browser: it works until the limit is reached, with 100000+ file descriptors easily used up and locked by one browser/user/session, and the server crash is a bonus. I did not find anything that fixes this issue after reading the ulimit settings on Mongo and doing some more searching. I may be wrong, and I really hope that I am. Looking forward to testing your demo site configuration once you make the changes. If Janus does not crash, it runs out of resources:
I suspect this is due to the thread pool we use internally to handle messages addressed to plugins. Both the HTTP and WebSockets transports use a single thread for their server functionality, and both then pass incoming requests to the core for processing; the core also has a single thread for processing most of them, with the exception of From what I can see, the problem may be that at startup we create a thread pool for that task with no limitation: Line 5229 in b63e422
meaning the core is free to spawn new threads when there's many incoming requests to process. As explained in the g_thread_pool_new documentation, in fact, Can you try changing that I'd rather not add any shaping functionality for incoming traffic in Janus itself, instead. Janus is often used by companies with a server side component that controls an instance, where a single address may create a single session but many handles to orchestrate the users the service is managing, and adding a shaper there could severely impact the performance of Janus in controlled environments. Besides, my guess is that shapers to HTTP/WS traffic could be better implemented, and more easily, in a proxy component like nginx instead, that is before it reaches Janus in the first place. I'm pretty sure there are also ways to integrate with frameworks like |
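To illustrate the difference a cap makes, here is a minimal sketch of a worker pool with a hard upper bound on threads. It is not Janus code and does not use the GLib pool discussed above (where, per the `g_thread_pool_new` documentation, a `max_threads` of -1 means no limit); it relies only on pthreads so it can be compiled on its own. The names `capped_pool`, `run_capped`, and `MAX_WORKERS` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_WORKERS 64 /* arbitrary upper bound chosen for this sketch */

/* Hypothetical stand-in for a task pool with a fixed number of workers. */
typedef struct {
    int total_tasks;      /* number of tasks queued up front */
    atomic_int next_task; /* index of the next task to claim */
    atomic_int running;   /* tasks executing right now */
    atomic_int peak;      /* highest value 'running' ever reached */
    atomic_int done;      /* tasks completed so far */
} capped_pool;

static void *worker(void *arg) {
    capped_pool *p = arg;

    /* Each worker keeps claiming tasks until the queue is drained. */
    while (atomic_fetch_add(&p->next_task, 1) < p->total_tasks) {
        int now = atomic_fetch_add(&p->running, 1) + 1;
        int seen = atomic_load(&p->peak);

        /* Record a new concurrency peak, retrying if another thread raced us. */
        while (now > seen &&
               !atomic_compare_exchange_weak(&p->peak, &seen, now)) {
        }

        /* The real work for one incoming request would go here. */

        atomic_fetch_sub(&p->running, 1);
        atomic_fetch_add(&p->done, 1);
    }
    return NULL;
}

/* Runs total_tasks tasks on at most max_threads worker threads.
 * Stores the number of completed tasks in *completed and returns the peak
 * number of tasks in flight, or -1 on invalid arguments or if no thread
 * could be started. */
int run_capped(int total_tasks, int max_threads, int *completed) {
    pthread_t threads[MAX_WORKERS];
    capped_pool p;
    int started = 0;

    if (total_tasks < 0 || max_threads < 1 || max_threads > MAX_WORKERS)
        return -1;

    p.total_tasks = total_tasks;
    atomic_init(&p.next_task, 0);
    atomic_init(&p.running, 0);
    atomic_init(&p.peak, 0);
    atomic_init(&p.done, 0);

    /* The cap: never create more than max_threads workers, however many
     * tasks are waiting. */
    for (int i = 0; i < max_threads; i++) {
        if (pthread_create(&threads[started], NULL, worker, &p) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    if (started == 0)
        return -1;

    *completed = atomic_load(&p.done);
    return atomic_load(&p.peak);
}
```

With a cap in place, a burst of requests queues up behind the fixed set of workers instead of translating into one new thread per request, so the number of threads stays bounded regardless of how many requests a single client sends.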
I currently use nginx port forwarding to Janus and use it to limit connections to the Janus WebSocket, which should be safer. However, there is no way to do anything once the WebSocket connection is established, as in this case, unless the limit is applied directly in the Janus framework (as long as you use the WebSockets provided by Janus). I have done this for another framework, but Janus is currently out of my league. Anyway, I changed janus-gateway/janus.c and restarted the server. Still no luck, the error is the same. Hundreds of these system logs:
I see that a handle is attached for every request but not detached fully, or at all; otherwise the resource would not build up. In the frontend, I attach the handle, and if I do not call detach the resource is not freed until I disconnect. Probably the solution would be to limit the number of handles one session user can have, since one handle uses a fair amount of server resources (open files especially), and why would you need hundreds of them? But that is not the only issue, since I can also abuse the Janus .send() system the same way: once I connect to a handle, I can overwhelm the Janus server and crash it (since there are no limits). Is this the case?
Did you recompile with a
No, that's not going to happen, because as I said in my previous post it's not uncommon at all to have server-side controllers create a single session and multiple handles on behalf of users that talk to the server via a custom API, and I'm not going to cripple potentially major use cases. Janus was conceived to have a "raw" API that can be used to take advantage of the full potential right away, so that comes with the territory. One more thing you can try (besides recompiling after the change I suggested) is set the
Janus is written in C so yes, proficiency in the language would need to be learnt to contribute.
This So now I only need to work on a solution to disconnect the user if it floods my CPU with bursts of requests. Solution I came up with: This is not a bug, but a Janus misconfiguration. Seriously Lorenzo, thanks!
Thanks for the feedback and for testing! Since you needed both to keep your server in shape, I'll add a configuration property for the number of threads in the task pool: I'll leave the default to |
I've just added the property to both |
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [meetecho/janus-gateway](https://github.com/meetecho/janus-gateway) | patch | `v1.0.1` -> `v1.0.2` |

---

### Release Notes

<details>
<summary>meetecho/janus-gateway</summary>

### [`v1.0.2`](https://github.com/meetecho/janus-gateway/blob/HEAD/CHANGELOG.md#v102---2022-05-23)

[Compare Source](meetecho/janus-gateway@v1.0.1...v1.0.2)

- Abort DTLS handshake if DTLSv1\_handle_timeout returns an error
- Fixed rtx not being offered on Janus originated PeerConnections
- Added configurable property to put a cap to task threads \[[Issue-2964](meetecho/janus-gateway#2964)]
- Fixed build issue with libressl >= 3.5.0 (thanks [@ffontaine](https://github.com/ffontaine)!) \[[PR-2980](meetecho/janus-gateway#2980)]
- Link to -lresolv explicitly when building websockets transport
- Fixed RED parsing not returning blocks when only primary data is available
- Fixed typo in stereo support in EchoTest plugin
- Added support for dummy publishers in VideoRoom \[[PR-2958](meetecho/janus-gateway#2958)]
- Added new VideoRoom request to combine subscribe and unsubscribe operations \[[PR-2962](meetecho/janus-gateway#2962)]
- Fixed incorrect removal of owner/subscriptions mapping in VideoRoom plugin \[[Issue-2965](meetecho/janus-gateway#2965)]
- Explicitly return list of IDs VideoRoom users are subscribed to for data \[[Issue-2967](meetecho/janus-gateway#2967)]
- Fixed data port not being returned when creating Streaming mountpoints with the legacy API
- Fix address size in Streaming plugin RTCP sendto call (thanks [@sjkummer](https://github.com/sjkummer)!) \[[PR-2976](meetecho/janus-gateway#2976)]
- Added custom headers for SIP SUBSCRIBE requests (thanks [@oriol-c](https://github.com/oriol-c)!) \[[PR-2971](meetecho/janus-gateway#2971)]
- Make SIP timer T1X64 configurable (thanks [@oriol-c](https://github.com/oriol-c)!) \[[PR-2972](meetecho/janus-gateway#2972)]
- Disable IPv6 in WebSockets transport if binding to IPv4 address explicitly \[[Issue-2969](meetecho/janus-gateway#2969)]
- Other smaller fixes and improvements (thanks to all who contributed pull requests and reported issues!)

</details>

---

### Configuration

📅 **Schedule**: At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, click this checkbox.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.walbeck.it/walbeck-it/docker-janus-gateway/pulls/79
Co-authored-by: renovate-bot <[email protected]>
Co-committed-by: renovate-bot <[email protected]>
What version of Janus is this happening on?
All of them: [0.11.8] and 1.x (on your demo site).
Have you tested a more recent version of Janus too?
Yes, 1.x (on your demo site, today).
Was this working before?
no info.
Is there a gdb or libasan trace of the issue?
no idea
Additional context
Just go to the Janus demo site (videoroom.plugin), press START, open the Google Chrome console, and run the following at least twice to crash the Janus server. It may not work on the first try, so just refresh and try again. Sometimes a much smaller loop is enough, since memory is not cleared or something like that (I have not investigated that much):
On the latest v0.11.8 I get system errors: