Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICE crashes #481

Closed
YtelDrew opened this issue Mar 11, 2016 · 15 comments
Closed

ICE crashes #481

YtelDrew opened this issue Mar 11, 2016 · 15 comments

Comments

@YtelDrew
Copy link

I've updated from a few revisions ago, and now i'm experiencing this crash

[2556358.156303] ice send thread[21831]: segfault at 110 ip 0000000000000110 sp 00007fa76fffcbb8 error 14 in janus[400000+69000]

Thread 1 (Thread 0x7fa76ffff700 (LWP 21831)):
#0 0x0000000000000110 in ?? ()
#1 0x00007fa8947482ad in CRYPTO_ctr128_encrypt_ctr32 () from /usr/lib64/libcrypto.so.1.0.0
#2 0x00007fa894798390 in ?? () from /usr/lib64/libcrypto.so.1.0.0
#3 0x00007fa89479407b in EVP_EncryptUpdate () from /usr/lib64/libcrypto.so.1.0.0
#4 0x00007fa8939f58f2 in aes_icm_openssl_encrypt () from /usr/lib64/libsrtp.so.1
#5 0x00007fa8939f3378 in srtp_protect () from /usr/lib64/libsrtp.so.1
#6 0x00000000004226b6 in janus_ice_send_thread ()
#7 0x00007fa894f6cf85 in ?? () from /usr/lib64/libglib-2.0.so.0
#8 0x00007fa8937d93a4 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fa89351c18d in clone () from /lib64/libc.so.6

i'm going to wipe it out and try again tomorrow with debugging enabled to see if i can find the source of this crash.

thanks

@lminiero
Copy link
Member

Looks like a problem encrypting an SRTP packet. Not sure if this is happening because the context is not valid anymore (handle removed or PeerConnection closed) and yet a plugin is still pushing data, or if some data somewhere got corrupted. Compiling with AddressSanitizer support may provide some more info: https://janus.conf.meetecho.com/docs/debug

@jing3018
Copy link
Contributor

@lminiero , this is a MT issue, I also encountered the same problem。 In janus, it may be start two ice_send_thread 。fix with following code.
in ice.c::janus_ice_cb_component_state_changed

    if(state == NICE_COMPONENT_STATE_CONNECTED || state == NICE_COMPONENT_STATE_READY) {
        /* NOTICE: can't use handle->mutex in this function, janus_process_incoming_request --> janus_ice_trickle_parse locked it */
        if(!g_atomic_int_compare_and_exchange(&handle->send_thread_creating, 0, 1)) {
            return;
        }
        if(handle->send_thread == NULL) {
            /* Start the outgoing data thread */
            GError *error = NULL;
            handle->send_thread = g_thread_try_new("ice send thread", &janus_ice_send_thread, handle, &error);
            if(error != NULL) {
                /* FIXME We should clear some resources... */
                JANUS_LOG(LOG_ERR, "[%"SCNu64"] Got error %d (%s) trying to launch the ICE send thread...\n", handle->handle_id, error->code, error->message ? error->message : "??");
                janus_mutex_unlock(&handle->mutex);
            }
        }
        g_atomic_int_set(&handle->send_thread_creating, 0);
        return;
    }

@lminiero
Copy link
Member

Not sure what you want me to fix? g_atomic_int_compare_and_exchange ensures you're never going to start the thread twice: it's an atomic operation, and if the value is set it doesn't go on with creating the thread again.

@lminiero
Copy link
Member

Sorry, just noticed YOU added g_atomic_int_compare_and_exchange 😄
You're right, since it's missing it should be added (although I don't ever remember the callback being fired twice).

@jing3018
Copy link
Contributor

first time I use handle->mutex to lock it, but that will cause a dead lock. this callback will be called in ice event loop and nice_agent_set_remote_candidates .

@lminiero
Copy link
Member

But have you actually noticed the thread was being created twice, or was it a guess? Just wondering what may lead to this behaviour: it shouldn't be possible for the thread to start twice, as we have a check in place already (handle->send_thread == NULL), and that callback is invoked by the libnice stack which should be single threaded (there's a GLib loop feeding the events).

@jing3018
Copy link
Contributor

I find the answer from the log, 2077386936 is handle id 。

2016-03-14 11:07:51 [DEBUG] [2077386936] Component state changed for component 1 in stream 1: 3 (connected)
2016-03-14 11:07:51 [DEBUG] [2077386936] Component state changed for component 1 in stream 1: 3 (connected)
2016-03-14 11:07:51 [DEBUG] [2077386936] Component state changed for component 1 in stream 1: 4 (ready)
2016-03-14 11:07:51 [DEBUG] [2077386936] ICE send thread started...
2016-03-14 11:07:51 [DEBUG] [2077386936] ICE send thread started...

@RanceYtel
Copy link

I think we might be running into the same problem again. Getting another Segfault in Ice.

#0 0x00007f6950bcea0a in EVP_DigestFinal_ex () from /usr/lib64/libcrypto.so.1.0.0
#1 0x00007f6950bcebce in EVP_DigestFinal () from /usr/lib64/libcrypto.so.1.0.0
#2 0x00007f694fe31495 in hmac_compute () from /usr/lib64/libsrtp.so.1
#3 0x00007f694fe2e3db in srtp_protect () from /usr/lib64/libsrtp.so.1
#4 0x00000000004226b6 in janus_ice_send_thread ()
#5 0x00007f69513a7f85 in ?? () from /usr/lib64/libglib-2.0.so.0
#6 0x00007f694fc143a4 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f694f95718d in clone () from /lib64/libc.so.6

@lminiero
Copy link
Member

I'll integrate the suggestion @jing3018 made tomorrow, and we'll see if it fixes it for you.

@jing3018
Copy link
Contributor

@lminiero using libnice-0.1.13 janus_process_incoming_request will also call into this function (that mean this callback will invoked more than one thread)。 so I not use handle->mutex

#3  0x0000000000439820 in janus_ice_cb_component_state_changed (agent=0x607204954340 [NiceAgent], stream_id=1, component_id=1, state=3, ice=0x602a0020d180) at ice.c:1411
#4  0x00007f540de59dac in ffi_call_unix64 () at ../src/x86/unix64.S:76
#5  0x00007f540de596d5 in ffi_call (cif=cif@entry=0x7f53a45f59b0, fn=<optimized out>, rvalue=0x7f53a45f5920, avalue=avalue@entry=0x7f53a45f58a0) at ../src/x86/ffi64.c:522
#6  0x00007f5410690818 in g_cclosure_marshal_generic (closure=0x600e003548a0, return_gvalue=0x0, n_param_values=<optimized out>, param_values=<optimized out>, invocation_hint=<optimized out>, marshal_data=0x0) at gclosure.c:1445
#7  0x00007f541068ffd8 in g_closure_invoke (closure=0x600e003548a0, return_value=return_value@entry=0x0, n_param_values=4, param_values=param_values@entry=0x605201a0f800, invocation_hint=invocation_hint@entry=0x7f53a45f5b50) at gclosure.c:768
#8  0x00007f54106a20ad in signal_emit_unlocked_R (node=node@entry=0x60100000bda0, detail=detail@entry=0, instance=instance@entry=0x607204954340, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x605201a0f800) at gsignal.c:3551
#9  0x00007f54106a8fa1 in g_signal_emitv (instance_and_params=0x605201a0f800, signal_id=<optimized out>, detail=detail@entry=0, return_value=return_value@entry=0x0) at gsignal.c:3045
#10 0x00007f541105a9c7 in agent_unlock_and_emit (agent=agent@entry=0x607204954340 [NiceAgent]) at agent.c:225
#11 0x00007f541105c66e in nice_agent_set_remote_candidates (agent=0x607204954340 [NiceAgent], stream_id=<optimized out>, component_id=1, candidates=0x603200078d30 = {...}) at agent.c:3294
#12 0x00000000004b495e in janus_sdp_parse_candidate (stream=0x602000024740, candidate=0x601608a9c0aa "4237562298 1 udp 1686052607 42.120.74.98 36342 typ srflx raddr 30.10.117.90 rport 61061 generation 0", trickle=1) at sdp.c:519
#13 0x000000000042c562 in janus_ice_trickle_parse (handle=0x602a0020d180, candidate=0x600e0033cb70, error=0x7f53a45f6890) at ice.c:571
#14 0x000000000046e460 in janus_process_incoming_request (request=0x60080009b250) at janus.c:1309
#15 0x000000000047823c in janus_transport_task (data=0x60080009b250, user_data=0x0) at janus.c:2416
#16 0x00007f54103bd03c in g_thread_pool_thread_proxy (data=<optimized out>) at gthreadpool.c:307
#17 0x00007f54103bc6b5 in g_thread_proxy (data=0x6052016fb680) at gthread.c:764
#18 0x00007f54112ada98 in ?? () from /lib64/libasan.so.0
#19 0x00007f540e941df5 in start_thread (arg=0x7f53a45f7700) at pthread_create.c:308
#20 0x00007f540e66f1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

@lminiero
Copy link
Member

The janus_process_incoming_request you see is related to a trickle candidate being received, parsed, and then passed to the stack. What I thought was that when you passed new candidates to libnice, this would then be passed to the loop thread of libnice itself: apparently, instead, the signal can be triggered by the calling thread, which surprises me as from an application point of view this might result in locks, exactly as you pointed out (what if you were locking when setting the candidates, and then attempting to lock again in the callback?).

Knowing that this can happen, I'll work on adding the fix you suggested, so that we're really sure the thread is only created once.

@lminiero
Copy link
Member

@YtelDrew @RanceYtel @jing3018 just added the fix, let me know if this solves it for you and I'll close the issue.

@YtelDrew
Copy link
Author

i pulled and updated the latest, its currenting running now. i'll keep you posted on how it goes.

@YtelDrew
Copy link
Author

no crash on the server that was crashy pre-fix so it appears to solve the problem.

@lminiero
Copy link
Member

Cool, thanks for the feedback! Kudos to @jing3018 for finding a fix so quickly.
Closing as it seems to be fixed, feel free to reopen in case this appears again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants