
Janus Crash (SIGSEV) on latest master branch #688

Closed
BellesoftConsulting opened this issue Nov 23, 2016 · 16 comments

Comments

@BellesoftConsulting

I am using the latest Janus (commit df5b546) on Debian Jessie 64-bit.

The crash happens while using the videoroom plugin, with about 200 watchers and a few presenters.

Core was generated by `/opt/janus/bin/janus -o'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 janus_videoroom_handler (data=0x0) at plugins/janus_videoroom.c:2393
2393 if(session->destroyed) {
(gdb) bt
#0 janus_videoroom_handler (data=0x0) at plugins/janus_videoroom.c:2393
#1 0x00007f88d7cc5845 in g_thread_proxy (data=0x2051140) at /build/glib2.0-y6934K/glib2.0-2.42.1/./glib/gthread.c:764
#2 0x00007f88d6a4f0a4 in start_thread (arg=0x7f88d8899700) at pthread_create.c:309
#3 0x00007f88d678462d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) frame
#0 janus_videoroom_handler (data=0x0) at plugins/janus_videoroom.c:2393
2393 if(session->destroyed) {
(gdb) info locals
request = 0x0
msg_sdp_type = 0x0
msg_sdp = 0x7f86b32666e0 "P?%?\206\177"
session = 0xa1679af2d1c2a787
FUNCTION = "janus_videoroom_handler"
msg = 0x7f86cd0c1870
error_code = -674938240
error_cause = "No such feed (6059871534061037)", '\000' <repeats 480 times>

Here is the log right before the crash:
http://pastebin.com/QRX0y2c1

Before that, I see hundreds of these:
[ERR] [plugins/janus_videoroom.c:janus_videoroom_handler:2618] No such feed (6059871534061037)
[ERR] [plugins/janus_videoroom.c:janus_videoroom_handler:2618] No such feed (6059871534061037)
[ERR] [plugins/janus_videoroom.c
....

@BellesoftConsulting
Author

BellesoftConsulting commented Nov 25, 2016

I am building Janus like this:
#!/bin/sh
set -eu

JANUS_REV=master
CONTAINER=false # if true, delete more for smaller image

JANUS_REPO=meetecho/janus-gateway
PREFIX=/opt/janus
SRCDIR=$HOME/src

SU_CMD=sudo
if $CONTAINER; then
  SU_CMD=""
fi

OPENSSL_REV="1.0.2j"
LIBSRTP_REV="1.5.4"
LIBNICE_REV="0.1.13"
SOFIASIP_REV="1.12.11"
LIBWEBSOCKETS_REV="1.5-chrome47-firefox41"

JANUS_FEATURE_OPTS="
--disable-rabbitmq --disable-data-channels
--disable-plugin-voicemail --disable-plugin-sip
--disable-plugin-videocall --disable-websockets
--disable-plugin-recordplay --disable-turn-rest-api --disable-mqtt
"

APT="$SU_CMD apt-get --quiet --assume-yes"
export DEBIAN_FRONTEND=noninteractive

if $CONTAINER; then
  $APT update
fi

$APT install --no-install-recommends \
  build-essential \
  python \
  autoconf \
  automake \
  libtool \
  pkg-config \
  gengetopt \
  cmake \
  ca-certificates \
  wget \
  git \
  zlib1g-dev \
  libmicrohttpd-dev \
  libmicrohttpd-dbg \
  libjansson-dev \
  libglib2.0-dev \
  libglib2.0-0-dbg

mkdir -p $SRCDIR
cd $SRCDIR
$SU_CMD mkdir -p $PREFIX
export PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig"
export CFLAGS="-I$PREFIX/include"
export LDFLAGS="-L$PREFIX/lib"

OPENSSL_TARBALL="openssl-${OPENSSL_REV}.tar.gz"
wget --no-verbose https://www.openssl.org/source/$OPENSSL_TARBALL
tar -xf $OPENSSL_TARBALL
cd openssl-$OPENSSL_REV

./config --prefix=$PREFIX --openssldir=$PREFIX/openssl -DPURIFY shared
make
$SU_CMD make install_sw
cd $SRCDIR
rm -r $OPENSSL_TARBALL openssl-$OPENSSL_REV

LIBSRTP_TARBALL="libsrtp-${LIBSRTP_REV}.tar.gz"
wget --no-verbose -O $LIBSRTP_TARBALL \
  https://github.com/cisco/libsrtp/archive/v${LIBSRTP_REV}.tar.gz
tar -xf $LIBSRTP_TARBALL
cd libsrtp-$LIBSRTP_REV
./configure --prefix=$PREFIX --enable-openssl
make shared_library
$SU_CMD make install
cd $SRCDIR
rm -r $LIBSRTP_TARBALL libsrtp-$LIBSRTP_REV

LIBNICE_TARBALL="libnice-${LIBNICE_REV}.tar.gz"
wget --no-verbose http://nice.freedesktop.org/releases/${LIBNICE_TARBALL}
tar -xf $LIBNICE_TARBALL
cd libnice-$LIBNICE_REV
./configure --prefix=$PREFIX --disable-gupnp
make

$SU_CMD make -C nice install-exec
$SU_CMD make -C nice install-data
$SU_CMD make -C stun install-data
$SU_CMD make -C agent install-data
cd $SRCDIR
if $CONTAINER; then
  rm -r $LIBNICE_TARBALL libnice-$LIBNICE_REV
fi

rm -Rf janus-gateway
git clone https://github.com/${JANUS_REPO}.git
cd janus-gateway
git checkout $JANUS_REV
./autogen.sh
./configure --prefix=$PREFIX --disable-docs $JANUS_FEATURE_OPTS \
  LDFLAGS="-L$PREFIX/lib -Wl,-rpath=$PREFIX/lib" \
  CFLAGS="-I$PREFIX/include -O2 -g"
# With PREFIX=/opt/janus the two variables above expand to:
# LDFLAGS="-L/opt/janus/lib -Wl,-rpath=/opt/janus/lib" CFLAGS="-I/opt/janus/include -O2 -g"
make
$SU_CMD make install

@BellesoftConsulting
Author

BellesoftConsulting commented Nov 25, 2016

Not sure about the implications, but could we move
if(!session) {
	JANUS_LOG(LOG_ERR, "No session associated with this handle...\n");
	janus_videoroom_message_free(msg);
	continue;
}
if(session->destroyed) {
	janus_videoroom_message_free(msg);
	continue;
}

at line 2393 in janus_videoroom.c

to be inside the protected janus_mutex_lock(&sessions_mutex); block above?

@lminiero
Member

That wouldn't solve anything: if race conditions are causing this, they may simply happen slightly later and affect other uses of the object. You may also want to give #403 a try, as it tries to address issues like these, although it might not be suitable for production environments yet.

@BellesoftConsulting
Author

BellesoftConsulting commented Nov 25, 2016

OK, thanks for the reply.
So if I understand it properly, even though we get the session object from the table inside the mutex block, the address it points to is already corrupt?
Or is the object valid at the time we get it from the table, and deallocated afterwards?

The reason I ask is that, for my application and use case, two separate servers crashed at different times on the same line (2393): if (session->destroyed)

So I might want to move the code inside the mutex no matter what, since if the session is "destroyed" the loop continues and the session object is NOT used...

Thoughts?

@lminiero
Member

If memory gets corrupted, it's not relevant if it happens before or after you get the object from the table: it means the pointer points to garbage for some reason, which should not happen. The fact they crash on the same line simply means the same object may be getting corrupted, possibly because some property of the object is invalidated in the same part of the code.

Compiling with libasan support may give you better info on what's happening, e.g., if a double free is happening or if other things are causing weird behaviours.

@BellesoftConsulting
Author

Got it to crash with AddressSanitizer, using the libnice patch that removes the global lock:

http://pastebin.com/HdYXC112

@sailerinteractive

sailerinteractive commented Feb 4, 2017

I'm experiencing the same issue: regular crashes in the videoroom plugin at the mentioned code location. Here is my gdb output: http://pastebin.com/rrnz6aCN
I'm using the 0.2.1 release from GitHub on an Ubuntu 16.04 system.

The likelihood of a crash seems to be related to the load. Janus runs fine for hours at lower loads while it crashes about once per hour at higher loads.

Do I understand the issue correctly: is the session freed by the watchdog code while a message for that session is still being processed in janus_videoroom_handler?

@mirkobrankovic
Contributor

Have you tried to upgrade libsrtp to 2.0.0+ or maybe use websockets?

@sailerinteractive

I'm already using websockets, but I also tried switching to long polling. It makes no difference... Do you think libsrtp 2.0.0+ would make a difference? I'm currently using 1.5.4 compiled from source.

@mirkobrankovic
Contributor

I would give it a try, yes.

@sailerinteractive

Just a follow-up... I experimented a bit with the settings of the HTTP transport. Setting the thread number to 8 instead of the default "unlimited" seems to improve the stability of Janus with regard to the mentioned SIGSEGV. It's just a workaround, I guess, but maybe it helps to spot the real cause.

@bhakimi

bhakimi commented Feb 11, 2017

@sailerinteractive I don't see any option in the HTTP transport to set the number of threads. Where did you find this setting?

@lminiero
Member

@sailerinteractive a thread number of 8 for HTTP will limit the number of concurrent user sessions to about 6-7, as most of the time you'll have one long poll per user keeping one connection (and so one thread) busy for a long time, preventing others from injecting other requests. A limit is reasonable, but you have to adapt it to how many users you want to be able to serve at the same time.

@sailerinteractive

@lminiero thanks for the input and your work on Janus! I'm using websockets for all user sessions. The HTTP transport is only used for some server-side admin requests (such as creating video rooms). I just wanted to share my experience that this seems to reduce the number of crashes at this code location.

@lminiero
Member

Have you guys checked the reference counter branch too, to see if it's fixed there?

@lminiero
Member

Closing as too much has changed in the code. If still an issue, feel free to open a new one with up-to-date details.

5 participants