Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crashed Janus with error heap-use-after-free in videoroom plugin #2509

Closed
Sensemakers-Nick opened this issue Dec 31, 2020 · 23 comments
Closed

Comments

@Sensemakers-Nick
Copy link

Yesterday one of our servers crashed with the error: heap-use-after-free using the videorooms plugin.
Couldn't find updates to this particular part of code in GitHub, nor issues so decided to open a ticket.

gdb pastebin: https://pastebin.com/6GZFShE0
libasan pastebin: https://pastebin.com/YRX2j2j8

The load of the server ramped up pretty badly before crashing.
Image of the load

Roughly 40 videorooms connected.

@atoppi
Copy link
Member

atoppi commented Jan 4, 2021

Hi @Sensemakers-Nick, thanks for the detailed report.
It seems that a room has been destroyed while the plugin was handling a negotiation.

More specifically the videoroom->h264_profile string attribute is being read by g_ascii_strdown in janus_sdp_get_codec_pt_full in order to craft a SDP answer, but since the room has been freed that string is not valid anymore, hence the libasan trap.

Maybe we need and extra reference on the videoroom while handling the negotiation.

@Sensemakers-Nick
Copy link
Author

Hi @atoppi,

Thanks for the reply, you're welcome.

@atoppi
Copy link
Member

atoppi commented Jan 7, 2021

@Sensemakers-Nick I'm attaching a patch that tries to address the issue, could you please apply it to your sources and test again?

git apply fix-increase-vroom-ref.txt

then rebuild janus

@Sensemakers-Nick
Copy link
Author

Thanks @atoppi
Will apply the patch next week during a maintenance window and keep you posted!

@tmatth
Copy link
Contributor

tmatth commented Jan 7, 2021

@atoppi we haven't seen this crash but will test the patch and report any issues.

@atoppi
Copy link
Member

atoppi commented Jan 7, 2021

Thank you @tmatth !

@Sensemakers-Nick
Copy link
Author

We've succesfully applied the patch, waiting for results, will let you know @atoppi

@Sensemakers-Nick
Copy link
Author

Sensemakers-Nick commented Jan 19, 2021

@atoppi The same error heap-use-after-free in videoroom plugin happened today, but this time during the alteration of allowed participants in a destroyed room. Our API tried to edit the list of allowed participants in the room, but during this time, the room got destroyed.

Should I open a new ticket?

Pastebin: https://pastebin.com/yQDtQVFm

@atoppi
Copy link
Member

atoppi commented Jan 19, 2021

Yes please open a new issue and try to add some more info.

  1. enable verbose logging (5)
  2. the freed/allocated call stacks in libasan are too short, check your CFLAGS (I suggest -Og -g3 -ggdb3 -fno-omit-frame-pointer -fsanitize=address)
  3. try to capture a refcount debug dump (uncomment #define REFCOUNT_DEBUG in refcount.h and recompile)

@Sensemakers-Nick
Copy link
Author

Sensemakers-Nick commented Jan 19, 2021

@atoppi will do, was compiled with

CFLAGS="-O1 -g3 -ggdb3 -fno-omit-frame-pointer -fsanitize=address -fno-sanitize-recover=all -fsanitize-address-use-after-scope" LDFLAGS="-fsanitize=address"

  • Changing O1 to Og
  • Enabling define REFCOUNT_DEBUG in refcount.h

@Sensemakers-Nick
Copy link
Author

Unfortunately there is no crash log available at this moment, hopefully next time I can also provide you a backtrace.

@atoppi
Copy link
Member

atoppi commented Jan 20, 2021

@Sensemakers-Nick do you think the original issue has been solved with the patch ?

@Sensemakers-Nick
Copy link
Author

u think the original issue has been solved with th

I cannot say for sure, will need to wait it out a bit, if all goes well the next two weeks it has likely worked.
Am getting a lot more feedback now from Janus.

Information like this:
Jan 20 14:31:28 senprtc01.senp.local janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_destroy_session:2649:increase] 0x607000743ba0 (2) Jan 20 14:31:28 senprtc01.senp.local janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_session_destroy:1732:decrease] 0x607000743ba0 (1)

@Sensemakers-Nick
Copy link
Author

Hi @atoppi,

We've trapped the same error yesterday, this time with the log information you've requested:
https://pastebin.com/55RYJPM8

@atoppi
Copy link
Member

atoppi commented Jan 22, 2021

Same error, different issue.
This time a participant session has been freed before iterating over room participants list to notify them of a room destruction.
We need the whole log that includes the refcount lines to understand how that session has been freed.
I suspect the massive usage you're doing of the admin api to forward requests to the plugin is triggering many race conditions we were unaware of.

@Sensemakers-Nick
Copy link
Author

I'll drop the entire log, it's going to be a big one ;-)

@teveelnicks
Copy link

@atoppi couldn't paste it in the pastebin, so I've shared them here.

portion of the log
entire log

Hope this helps

@atoppi
Copy link
Member

atoppi commented Jan 22, 2021

thanks it helps, this is the freed reference of the last crash (to match lines one need to apply the patch that I attached in this issue).

Jan 21 19:04:48 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_create_session:2521:init] 0x607000b12810 (1)
Jan 21 19:05:00 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_handle_message:4676:increase] 0x607000b12810 (2)
Jan 21 19:05:00 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_hangup_media:5448:increase] 0x607000b12810 (3)
Jan 21 19:05:00 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_hangup_media:5451:decrease] 0x607000b12810 (2)
Jan 21 19:05:00 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_destroy_session:2649:increase] 0x607000b12810 (3)
Jan 21 19:05:00 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_session_destroy:1732:decrease] 0x607000b12810 (2)
Jan 21 19:05:00 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_destroy_session:2677:decrease] 0x607000b12810 (1)
Jan 21 19:05:00 senprtc01 janus[29922]: [plugins/janus_videoroom.c:janus_videoroom_message_free:1775:decrease] 0x607000b12810 (0)

@atoppi
Copy link
Member

atoppi commented Jan 22, 2021

This seems a race condition happening when a user session gets destroyed while the plugin is still handling the setup.
The last unref that makes the count go to 0 is due to janus_videoroom_message_free , called when the plugin has handled the message.

@teveelnicks
Copy link

teveelnicks commented Jan 22, 2021

You're welcome!
I did apply the patch tough.

https://pastebin.com/hFSuk329

@atoppi
Copy link
Member

atoppi commented Jan 22, 2021

@teveelnicks I have updated the patch with another fix for the last crash. Please apply on master and test again.
fixes_vroom.diff.txt

We'll wait for more info to debug the allowed list crash.

@Sensemakers-Nick
Copy link
Author

Thanks @atoppi!
Will apply the patch soon.

@lminiero lminiero added pr-exists and removed WIP labels Feb 1, 2021
lminiero added a commit that referenced this issue Feb 4, 2021
* Fixed missing room references that could cause crashes during race conditions
* Fixed rare race condition on publisher join
@Sensemakers-Nick
Copy link
Author

We haven't seen this issue since the last patch.
Thank you @atoppi !

BogdanovKirill pushed a commit to 3dEYE/janus-gateway that referenced this issue Mar 10, 2021
commit caaba91
Author: Tijmen de Mes <[email protected]>
Date:   Tue Feb 23 14:57:17 2021 +0100

    Added Content type to SIP message (meetecho#2567)

    * Added 'content_type' to received SIP MESSAGE
    * Added optional content type in sending SIP MESSAGE

commit c9baba9
Author: Alessandro Toppi <[email protected]>
Date:   Tue Feb 23 11:46:50 2021 +0100

    clang/ubsan fixes (meetecho#2556)

    * Fix some clang warnings.
    * Fix UBSanitizer error when sending RTCP SR.

commit beb28be
Author: Tvildo <[email protected]>
Date:   Mon Feb 22 09:46:25 2021 -0800

    add call_id in received sip message (meetecho#2563)

    Add call_id in received SIP MESSAGE and INFO

commit 8246452
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 22 11:51:36 2021 +0100

    Fixed missing mutexes around VideoRoom ACL management

commit 4f8943a
Author: Tristan Matthews <[email protected]>
Date:   Wed Feb 17 09:32:57 2021 -0500

    ice: fix conncheck typo (meetecho#2560)

    No functional change since the typo was used consistently.

commit 27dc51a
Author: nicolasduteil <[email protected]>
Date:   Wed Feb 17 15:27:45 2021 +0100

    feat: add "call_id" to "calling", "declining", "updatingcall" & "incomingcall" events (meetecho#2557)

commit 2c81d02
Author: Hritik Utekar <[email protected]>
Date:   Wed Feb 17 19:54:46 2021 +0530

    Video moderation always returns unmuted (meetecho#2559)

commit 6503f42
Author: Lorenzo Miniero <[email protected]>
Date:   Wed Feb 17 13:53:53 2021 +0100

    Fixed typo in keepalive-conncheck usage

commit 1f45e02
Author: Alessandro Toppi <[email protected]>
Date:   Mon Feb 15 16:38:22 2021 +0100

    Set specific versions for Python 3 and meson in janus-ci yml.

commit d7c9ef0
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 15 15:28:48 2021 +0100

    Added audiocodec/videocodec supporto to 'joinandconfigure' in VideoRoom API

commit ad54495
Author: Lorenzo Miniero <[email protected]>
Date:   Fri Feb 12 15:28:21 2021 +0100

    Add new option to configure ICE nomination mode, if libnice is recent enough (meetecho#2541)

    * Add new option to configure ICE nomination mode, if libnice is recent enough
    * Added support for libnice keepalive-conncheck property

commit af8cc6e
Author: Nadin Zajimovic <[email protected]>
Date:   Fri Feb 12 09:40:36 2021 +0100

    if inviting on destroy, send BYE instead of 480 response (meetecho#2554)

commit ad8bf79
Author: Alessandro Amirante <[email protected]>
Date:   Thu Feb 11 17:49:25 2021 +0100

    Fix typo in videoroom docs.

commit 26f5958
Author: Lorenzo Miniero <[email protected]>
Date:   Wed Feb 10 16:51:19 2021 +0100

    Fixed small leak in VideoRoom

commit 8ab7a00
Author: Alessandro Toppi <[email protected]>
Date:   Tue Feb 9 16:51:37 2021 +0100

    Initialize packet.is_rtp to false.

commit 66cf343
Author: Lorenzo Miniero <[email protected]>
Date:   Tue Feb 9 16:05:37 2021 +0100

    Add resolution and bitrate to Record&Play playback

commit 119d220
Author: Aleksander Guryanov <[email protected]>
Date:   Tue Feb 9 20:33:23 2021 +0700

    Update janus.d.ts (meetecho#2553)

    Function getBitrate() actually returns a string

commit 41399db
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 8 16:27:14 2021 +0100

    Allow up to 5 (rather than 3) audio/video codecs in the same VideoRoom

commit b81dd6d
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 8 16:26:19 2021 +0100

    Allow forcing audio/video codec for VideoRoom publishers via query string

commit 576abf5
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 8 15:17:56 2021 +0100

    Initialize VideoRoom participant recording state when room recording is active (fixes meetecho#2550)

commit 0ba74fb
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 8 11:53:42 2021 +0100

    Fixed broken AV1 post-processing

commit 09daec4
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 8 10:46:06 2021 +0100

    Renamed extern janus_callbacks variables in Lua and Duktape plugins (meetecho#2540)

commit 664022b
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 8 10:41:38 2021 +0100

    Bumped to version 0.11.1

commit 7732127
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Feb 8 10:37:41 2021 +0100

    Updated Changelog (0.10.10)

commit 24a0eec
Author: Lorenzo Miniero <[email protected]>
Date:   Thu Feb 4 11:53:36 2021 +0100

    Videoroom race condition fixes (see meetecho#2509) (meetecho#2539)

    * Fixed missing room references that could cause crashes during race conditions
    * Fixed rare race condition on publisher join

commit 62440c5
Author: Lorenzo Miniero <[email protected]>
Date:   Thu Feb 4 11:52:44 2021 +0100

    Fix parsing of SDP to find payload type matching profiles (fixes meetecho#2544) (meetecho#2549)

commit 794e89a
Author: Bender <[email protected]>
Date:   Wed Feb 3 20:12:02 2021 +0300

    janus.js (meetecho#2548)

    customizeSdp callback added to handleRemoteJsep to be able to mangle remote SDP if needed

commit 213b6c7
Author: Alessandro Toppi <[email protected]>
Date:   Fri Jan 29 12:13:44 2021 +0100

    Make compiler fail if implicit-function-declaration is encountered.

commit dfa8016
Author: Lorenzo Miniero <[email protected]>
Date:   Fri Jan 29 10:21:50 2021 +0100

    Fixed non-portable call to strlcpy, and comment styles, in RabbitMQ code (see meetecho#2430)

commit b7b1e9e
Merge: 19ecf48 c0f0e1e
Author: Alessandro Toppi <[email protected]>
Date:   Fri Jan 29 08:02:18 2021 +0100

    Merge pull request meetecho#2430 from vgrid/master

    Updates RabbitMQ logic

commit 19ecf48
Author: Lorenzo Miniero <[email protected]>
Date:   Thu Jan 28 11:54:55 2021 +0100

    Fixed VideoRoom docs on ICE Restarts for subscribers (fixes meetecho#2537)

commit 2454802
Author: Lorenzo Miniero <[email protected]>
Date:   Wed Jan 27 11:22:13 2021 +0100

    Allow marking of RTP extensions in MJR recordings (meetecho#2527)

commit 0bb49bc
Author: Lorenzo Miniero <[email protected]>
Date:   Wed Jan 27 11:21:09 2021 +0100

    Moderator based muting/unmuting of VideoRoom streams (meetecho#2513)

commit 5e685e3
Author: Lorenzo Miniero <[email protected]>
Date:   Wed Jan 27 11:19:35 2021 +0100

    Reject a=extmap-allow-mixed in SDP, when offered

commit c0f0e1e
Author: Chris Wiggins <[email protected]>
Date:   Wed Jan 27 09:59:02 2021 +1300

    Fix code style comments, also enable routing for direct exchanges

commit 257eb80
Author: Lorenzo Miniero <[email protected]>
Date:   Tue Jan 26 15:00:39 2021 +0100

    Configurable media direction when putting calls on-hold (SIP plugin) (meetecho#2525)

commit 7fb08c2
Author: Lorenzo Miniero <[email protected]>
Date:   Tue Jan 26 12:40:26 2021 +0100

    Added starting DTLS MTU to info returned by Janus API

commit 4d97028
Author: Sami Kuhmonen <[email protected]>
Date:   Tue Jan 26 12:53:14 2021 +0200

    Report fail if binding to a socket fails in websockets (meetecho#2534)

commit 674367a
Author: Evgeniy Baranov <[email protected]>
Date:   Mon Jan 25 12:00:37 2021 +0300

    fix race condition in audiobridge plugin changeroom request (meetecho#2535)

commit 3edb780
Author: Alberto Gonzalez Trastoy <[email protected]>
Date:   Sat Jan 23 14:01:27 2021 -0500

    Janus npm types upgrade (meetecho#2528)

commit 78434aa
Author: August Black <[email protected]>
Date:   Sat Jan 23 12:00:52 2021 -0700

    set webrtc-adapter verstion to 7.4.0 (meetecho#2531)

commit 46a6c71
Author: Lorenzo Miniero <[email protected]>
Date:   Thu Jan 21 13:02:06 2021 +0100

    Reduced verbosity of a few LOG_WARN messages at startup

commit 34f6f89
Author: Andrew Lavrentev <[email protected]>
Date:   Thu Jan 21 14:48:20 2021 +0300

    Feature/enhance typings (meetecho#2518)

commit 16173af
Author: Rémi Vansteelandt <[email protected]>
Date:   Thu Jan 21 05:39:17 2021 -0500

    Fixed secret authentication on GET requests (meetecho#2524)

commit 62d75ab
Author: Nadin Zajimovic <[email protected]>
Date:   Wed Jan 20 11:36:43 2021 +0100

    Dont send bye on early dialog (meetecho#2521)

commit 2141d9b
Author: Yurii Cherniavskyi <[email protected]>
Date:   Wed Jan 20 12:30:30 2021 +0200

    Update Webpack instruction after webrtc-adapter dependency update (meetecho#2519)

commit f994f7c
Author: fbellet <[email protected]>
Date:   Wed Jan 20 11:28:17 2021 +0100

    Close nice agent resources asynchronously (meetecho#2492)

commit 79038e0
Author: Sergey Radionov <[email protected]>
Date:   Tue Jan 19 18:16:03 2021 +0700

    mqttevh: tls support implementation finished (meetecho#2517)

    * mqttevh: tls support implementation finished
    * mqttevh: MQTTASYNC_OPERATION_INCOMPLETE is not error
    * mqttevh: allow send messages while connecting is still in progress

commit 97cd054
Author: Lorenzo Miniero <[email protected]>
Date:   Mon Jan 18 11:24:48 2021 +0100

    Fixed broken webrtc-adapter links (see meetecho#2515)

commit c0570a9
Author: Tristan Matthews <[email protected]>
Date:   Thu Jan 14 13:24:48 2021 -0500

    html: update webrtc-adapter to 7.7.0 (meetecho#2515)

commit f57215a
Author: Lorenzo Miniero <[email protected]>
Date:   Thu Jan 14 10:11:57 2021 +0100

    Updated year in demos and docs

commit bbdd3e4
Author: Chris Wiggins <[email protected]>
Date:   Tue Nov 17 11:49:42 2020 +1300

    Adds back in default outgoing queue behaviour. Adds support for auto-generated queue_names

commit ed1b5c6
Author: Chris Wiggins <[email protected]>
Date:   Thu Nov 12 13:21:08 2020 +1300

    Adds RabbitMQ options for queues, durable, exclusive and autodelete

commit 24594f7
Author: Chris Wiggins <[email protected]>
Date:   Wed Nov 11 18:05:24 2020 +1300

    Check RabbitMQ admin topic in a better way

commit 319c6fc
Author: Chris Wiggins <[email protected]>
Date:   Wed Nov 11 16:09:26 2020 +1300

    Increase RabbitMQ logging on publish

commit 505eeef
Author: Chris Wiggins <[email protected]>
Date:   Tue Nov 10 18:29:59 2020 +1300

    Fix queue_name_admin in rabbitmq transport

commit b3f7ad9
Author: Chris Wiggins <[email protected]>
Date:   Tue Nov 10 18:19:07 2020 +1300

    Update rabbitmq logging information

commit f604aeb
Author: Chris Wiggins <[email protected]>
Date:   Tue Nov 10 17:23:11 2020 +1300

    Updates RabbitMQ logic

    - Publishing to a topic does not require an outgoing queue, just the topic, so the outgoing queues are no longer declared
    - When the janus_exchange_type is topic, we want to be able to name the queue, and then bind an incoming topic from the exchange to that queue, so that functionality has been added
    - This is all backwards compatible with original logic, and won't break existing logic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants