janus crashed in janus_ice_nacked_packet_cleanup #2381

Closed
zhiyong0804 opened this issue Sep 29, 2020 · 21 comments

@zhiyong0804

Janus crashed because the ICE handle had already been freed when the NACK cleanup timer fired 5 seconds later and tried to clean up the NACKed packets through a wild pointer to that handle.

ice.c line 2718: this is where the NACK cleanup timer is created. It allocates a janus_ice_nacked_packet that references the ICE handle. On a weak network, if the ICE handle is freed before the 5-second timer fires, janus_ice_nacked_packet_cleanup still uses the handle to free the NACKed packet, and by then the pointer is wild.

@zhiyong0804
Author

We set up the Janus server in China, and a customer connected to it from Europe for testing.

@atoppi
Member

atoppi commented Sep 29, 2020

IIRC we fixed a similar (maybe the same) issue months ago.
Are you using a recent version of Janus?
If that's the case please provide a call stack from libasan or gdb.

@lminiero
Member

lminiero commented Oct 5, 2020

Any update?

@zhiyong0804
Author

zhiyong0804 commented Oct 9, 2020

> IIRC we fixed a similar (maybe the same) issue months ago.
> Are you using a recent version of Janus?
> If that's the case please provide a call stack from libasan or gdb.

@atoppi, I used v0.9.3, but I also checked the code of the recent version. The problem is easy to find if we review the code at ice.c line 2718, as I described above.
Here is the call stack:

```
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by /usr/local/bin/janus -b --debug-level=7 --log-file=/var/log/janus.log --stun-se'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000561277a28f07 in janus_ice_nacked_packet_cleanup (user_data=0x7f59cc0e1400) at ice.c:327
327                     JANUS_LOG(LOG_HUGE, "[%"SCNu64"] Cleaning up NACKed packet %"SCNu16" (SSRC %"SCNu32", vindex %d)...\n",
[Current thread is 1 (Thread 0x7f59e4ff9700 (LWP 30399))]
(gdb) where
#0  0x0000561277a28f07 in janus_ice_nacked_packet_cleanup (user_data=0x7f59cc0e1400) at ice.c:327
#1  0x00007f59f5839d03 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f59f5839285 in g_main_context_dispatch () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f59f5839650 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007f59f5839962 in g_main_loop_run () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x0000561277a2752e in janus_ice_static_event_loop_thread (data=0x5612795c47f0) at ice.c:128
#6  0x00007f59f5861195 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x00007f59f3d326db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007f59f3a5b88f in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) l
322     } janus_ice_nacked_packet;
323     static gboolean janus_ice_nacked_packet_cleanup(gpointer user_data) {
324             janus_ice_nacked_packet *pkt = (janus_ice_nacked_packet *)user_data;
325
326             if(pkt->handle->stream){
327                     JANUS_LOG(LOG_HUGE, "[%"SCNu64"] Cleaning up NACKed packet %"SCNu16" (SSRC %"SCNu32", vindex %d)...\n",
328                             pkt->handle->handle_id, pkt->seq_number, pkt->handle->stream->video_ssrc_peer[pkt->vindex], pkt->vindex);
329                     g_hash_table_remove(pkt->handle->stream->rtx_nacked[pkt->vindex], GUINT_TO_POINTER(pkt->seq_number));
330             g_hash_table_remove(pkt->handle->stream->pending_nacked_cleanup, GUINT_TO_POINTER(pkt->source_id));
331             }
(gdb) p pkt->handle->stream
$1 = (janus_ice_stream *) 0xf21a7c08c6ef1ae3
(gdb) p *pkt->handle->stream
Cannot access memory at address 0xf21a7c08c6ef1ae3
(gdb) p *pkt->handle->stream->video_ssrc_peer
video_ssrc_peer           video_ssrc_peer_new       video_ssrc_peer_orig      video_ssrc_peer_rtx       video_ssrc_peer_rtx_new   video_ssrc_peer_rtx_orig  video_ssrc_peer_temp
(gdb) p *pkt->handle->stream->video_ssrc_peer[pkt->vindex]
Cannot access memory at address 0xf21a7c08c6ef1b0b
```

@atoppi
Member

atoppi commented Oct 9, 2020

I understand what can go wrong in the described scenario, but we still need a trace from master or a freshly tagged version.
As explained in the guidelines, we won't take into consideration crashes that happen on old versions.

@atoppi
Member

atoppi commented Oct 9, 2020

Also don't forget to provide a libasan trace (that will catch any use after free before a SIGSEGV).

@zhiyong0804
Author

What is a libasan trace? Do you mean a core dump file from the master version that reproduces the crash? :)

@lminiero
Member

@zhiyong0804
Author

Got it. I solved this issue on my own branch. I will consider providing a libasan trace, following the guide, when I have spare time. BTW, we could review the master version of this project to check for the problem and skip much of the work of reproducing it.

@lminiero
Member

If you have a fix, please consider submitting a pull request.

@lminiero
Member

@zhiyong0804 any update on this?

@zhiyong0804
Author

Sorry for the late update.

@atoppi
Member

atoppi commented Oct 16, 2020

@zhiyong0804 are you using static event_loops ?
I can not understand how the loop of the handle is still active after the handle has been freed.

@zhiyong0804
Author

> @zhiyong0804 are you using static event_loops ?
> I can not understand how the loop of the handle is still active after the handle has been freed.

Yes, I configured event_loops=8. The janus_ice_nacked_packet_cleanup function references the ICE handle but does not increase the handle's reference count. So when the ICE handle is freed and the NACK timer later times out and calls janus_ice_nacked_packet_cleanup, it dumps core because of the wild pointer.
:)
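
To make the missing reference concrete, here is a minimal sketch of the pattern described above: whoever schedules the cleanup takes a reference on the handle, and the cleanup callback drops it. This is a toy model in plain C, not Janus code. Every `toy_` name is hypothetical, the counter is not atomic, and the timer is simulated by calling the callback directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the real Janus types; every name below is hypothetical. */
typedef struct toy_handle {
	int refcount;          /* owners still using the handle (not atomic: single-threaded toy) */
	unsigned long id;
} toy_handle;

static int toy_handles_freed = 0;  /* lets the demo observe when the handle is released */

static toy_handle *toy_handle_new(unsigned long id) {
	toy_handle *h = calloc(1, sizeof(*h));
	if(h == NULL)
		abort();
	h->refcount = 1;       /* the creator owns the first reference */
	h->id = id;
	return h;
}

static void toy_handle_ref(toy_handle *h) {
	h->refcount++;
}

static void toy_handle_unref(toy_handle *h) {
	if(--h->refcount == 0) {
		toy_handles_freed++;
		free(h);
	}
}

typedef struct toy_nacked_packet {
	toy_handle *handle;
	unsigned short seq_number;
} toy_nacked_packet;

/* Scheduling side: take a reference on behalf of the pending timer. */
static toy_nacked_packet *toy_nacked_packet_new(toy_handle *h, unsigned short seq) {
	toy_nacked_packet *pkt = calloc(1, sizeof(*pkt));
	if(pkt == NULL)
		abort();
	toy_handle_ref(h);
	pkt->handle = h;
	pkt->seq_number = seq;
	return pkt;
}

/* Timer side: use the handle, then drop the timer's reference. */
static unsigned long toy_nacked_packet_cleanup(toy_nacked_packet *pkt) {
	unsigned long id = pkt->handle->id;  /* valid: the timer still holds a reference */
	toy_handle_unref(pkt->handle);
	free(pkt);
	return id;
}

/* Replays the reported ordering: the session drops the handle before the timer fires. */
static int toy_handle_demo(void) {
	toy_handle *h = toy_handle_new(42);
	toy_nacked_packet *pkt = toy_nacked_packet_new(h, 1000);
	toy_handle_unref(h);                 /* session is gone */
	assert(toy_handles_freed == 0);      /* handle kept alive by the pending timer */
	unsigned long id = toy_nacked_packet_cleanup(pkt);
	assert(id == 42);
	return toy_handles_freed;            /* the handle is released only after the timer ran */
}
```

In this model the handle outlives the session by at most one timer interval. The cost is that memory is held a little longer, which is usually an acceptable trade for not touching freed memory.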

@atoppi
Member

atoppi commented Oct 19, 2020

> yes, i config event_loops=8

Ok, this explains why a loop is still active after a handle detach, now it's starting to make sense: when using static event loops, the loops are never actually stopped.
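
Since a static loop keeps dispatching after a handle detaches, another way to look at the problem is that timers scheduled by a handle must not survive it. The sketch below models that idea with a shared loop whose pending timers are cancelled when their owner goes away. It is a plain C toy under stated assumptions: all `toy_` names are hypothetical, and in a GLib-based program the cancel step would presumably correspond to destroying the pending source (for example with `g_source_destroy`).

```c
#include <assert.h>

#define TOY_MAX_TIMERS 8

/* A pending timer on a shared loop; all names here are hypothetical. */
typedef struct toy_timer {
	int active;     /* 1 while the timer is still scheduled */
	int owner_id;   /* id of the handle that scheduled it */
} toy_timer;

typedef struct toy_loop {
	toy_timer timers[TOY_MAX_TIMERS];
} toy_loop;

/* Schedule a timer for a handle; returns the slot index, or -1 if the loop is full. */
static int toy_loop_add(toy_loop *loop, int owner_id) {
	for(int i = 0; i < TOY_MAX_TIMERS; i++) {
		if(!loop->timers[i].active) {
			loop->timers[i].active = 1;
			loop->timers[i].owner_id = owner_id;
			return i;
		}
	}
	return -1;
}

/* Handle teardown: cancel every timer the handle still has pending. */
static int toy_loop_cancel_owner(toy_loop *loop, int owner_id) {
	int cancelled = 0;
	for(int i = 0; i < TOY_MAX_TIMERS; i++) {
		if(loop->timers[i].active && loop->timers[i].owner_id == owner_id) {
			loop->timers[i].active = 0;
			cancelled++;
		}
	}
	return cancelled;
}

/* The shared loop keeps running and fires whatever is still scheduled. */
static int toy_loop_dispatch(toy_loop *loop) {
	int fired = 0;
	for(int i = 0; i < TOY_MAX_TIMERS; i++) {
		if(loop->timers[i].active) {
			loop->timers[i].active = 0;
			fired++;
		}
	}
	return fired;
}

static int toy_loop_demo(void) {
	toy_loop loop = { 0 };
	toy_loop_add(&loop, 1);   /* handle 1 schedules two cleanups */
	toy_loop_add(&loop, 1);
	toy_loop_add(&loop, 2);   /* handle 2 shares the same loop */
	int cancelled = toy_loop_cancel_owner(&loop, 1);  /* handle 1 detaches */
	assert(cancelled == 2);
	return toy_loop_dispatch(&loop);  /* only handle 2's timer is left to fire */
}
```

With a per-handle loop, stopping the loop implicitly drops its timers. With a shared loop, that step has to be explicit, which is what `toy_loop_cancel_owner` stands in for here.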

@atoppi
Member

atoppi commented Oct 19, 2020

@zhiyong0804 do you have some instructions on how to replicate the problem?
I tried many times with 2/4/8 event loops and severe packet loss (>10%), with both echotest and videoroom plugin but it never crashed.

Meanwhile, would you please also provide a libasan call stack from janus master ?

@zhiyong0804
Author

@atoppi I understand what you said. We set up the Janus server in China, and some developers connected to Janus from Europe and attached to the videoroom plugin to share their desktop with my team members. The probability of reproducing it is very low. To reproduce it, I advise you to disconnect from Janus while packets are being lost.

Our developers in Europe have left my team, so it is hard for me to provide a libasan call stack. Maybe you can send me a test tool, and I will try to connect to your Janus server to reproduce it with that tool.

@atoppi
Member

atoppi commented Oct 20, 2020

I don't think it will be of much help.
I have already tried many times with ridiculous packet loss and abrupt client disconnection, still no luck.

@atoppi changed the title from "janus crashed" to "janus crashed in janus_ice_nacked_packet_cleanup" on Oct 21, 2020

@atoppi
Member

atoppi commented Oct 21, 2020

@zhiyong0804 could you please enable refcount debugging by uncommenting #define REFCOUNT_DEBUG in refcount.h and then recompiling Janus?
Once you have reproduced the issue, please share the whole log and also the address of the involved handle (read through gdb like you did in the first message).
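
For readers unfamiliar with this kind of switch, the sketch below shows the general idea of compile-time refcount debugging: every increase and decrease logs the call site, the object address, and the resulting count, so the log can be matched against the handle address read in gdb. It is a hypothetical illustration in plain C. The `TOY_` macros and the log format are assumptions, not what refcount.h actually contains.

```c
#include <assert.h>
#include <stdio.h>

/* Set to 0 to compile the logging out; hypothetical switch, not the Janus one. */
#define TOY_REFCOUNT_DEBUG 1

typedef struct toy_refcount {
	int count;
} toy_refcount;

#if TOY_REFCOUNT_DEBUG
/* Log where the change happened, which object it touched, and the new count. */
#define TOY_REF_LOG(op, r) \
	fprintf(stderr, "[%s:%d:%s] %p (%d)\n", __FILE__, __LINE__, (op), (void *)(r), (r)->count)
#else
#define TOY_REF_LOG(op, r) ((void)0)
#endif

/* Macros rather than functions, so __FILE__ and __LINE__ point at the caller. */
#define toy_ref_increase(r) do { (r)->count++; TOY_REF_LOG("increase", (r)); } while(0)
#define toy_ref_decrease(r) do { (r)->count--; TOY_REF_LOG("decrease", (r)); } while(0)

static int toy_refcount_demo(void) {
	toy_refcount ref = { 1 };   /* creator's reference */
	toy_ref_increase(&ref);     /* e.g. a pending timer takes a reference */
	toy_ref_decrease(&ref);     /* the timer fires and releases it */
	toy_ref_decrease(&ref);     /* the creator releases the last reference */
	return ref.count;
}
```

A log produced this way makes an unbalanced pair easy to spot: an address whose count reaches zero while something still points at it shows up as a decrease with no matching increase from that user.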

@atoppi
Member

atoppi commented Oct 28, 2020

@zhiyong0804 any update?

@atoppi
Member

atoppi commented Nov 18, 2020

closing for lack of feedback, feel free to re-open if you have further details.

@atoppi closed this as completed on Nov 18, 2020