SSDP crash after receiving IGMP message from Access Point #1826

Closed
ghost opened this issue Mar 28, 2016 · 21 comments

@ghost

ghost commented Mar 28, 2016

Basic Infos

Hardware

Hardware: ESP-12E
Core Version: 2.1.0

Description

I recently added a new access point, and my SSDP-enabled ESP units have started to crash.
The sample SSDP sketch crashes on the same event: receiving an IGMP membership query from this new access point.

Settings in IDE

Module: NodeMCU 1.0
Flash Size: 4MB
CPU Frequency: 80MHz
Flash Mode: dio
Flash Frequency: 40MHz
Upload Using: OTA
Reset Method: nodemcu

Sketch

Provided SSDP sample sketch

Debug Messages

Exception (0):
epc1=0x40106cf6 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000
ctx: sys
sp: 3ffffd80 end: 3fffffb0 offset: 01a0

>>>stack>>>
3fffff20:  4021aca3 005e0001 402107c4 3ffed250
3fffff30:  3fff21a4 00000001 4021ace2 402107dd
3fffff40:  4021aaf1 3fff21cc 3ffed250 3ffebff0
3fffff50:  3ffe0000 3fff21a4 3ffeed00 4021b4a4
3fffff60:  3fff21cc 3fff1b8c 3ffec018 3ffed250
3fffff70:  3fff1b8c 00000014 4021a7ee 3fff21cc
3fffff80:  3fff1b8c 3fffdc80 3fff1bf4 3fff0610
3fffff90:  402261a7 3fff21cc 00000000 402062ff
3fffffa0:  40000f49 3fffdab0 3fffdab0 40000f49
<<<stack<<<


Decoding 12 results
0x40106cf6: __umodsi3 at d:\ivan\projects\arduinoesp\toolchain\dl\gcc-xtensa\build-2\xtensa-lx106-elf\libgcc/../../../libgcc/config/xtensa/lib1funcs.S line 696
0x4021aca3: igmp_tmr at ?? line ?
0x402107c4: ieee80211_deliver_data at ?? line ?
0x4021ace2: igmp_tmr at ?? line ?
0x402107dd: ieee80211_deliver_data at ?? line ?
0x4021aaf1: igmp_input at ?? line ?
0x4021b4a4: ip_input at ?? line ?
0x4021a7ee: ethernet_input at ?? line ?
0x402261a7: ets_snprintf at ?? line ?
0x402062ff: loop_task at C:\Tools\arduino-1.6.8P210\portable\packages\esp8266\hardware\esp8266\2.1.0\cores\esp8266/core_esp8266_main.cpp line 43

@igrr
Member

igrr commented Mar 28, 2016

That's super weird, because the __umodsi3 function doesn't dereference any pointers; it only operates on values. How it would fail with a null pointer access, I have no idea.
Did I understand correctly that you need a particular WiFi router to reproduce this? What is the router brand/model, if I may ask?

@ghost
Author

ghost commented Mar 28, 2016

Well, it's one of those el cheapo WiFi repeaters sold under many brand names. Just for reference:
http://www.satechi.net/index.php/wireless-multifunction-mini-router-repeater-access-point-client-bridge

I've captured the IGMP traffic with Wireshark, and it seems that this device is the only one sending periodic IGMPv2 Membership Queries.

@mangelajo
Contributor

__umodsi3: couldn't that be because of a division by 0?

@liquidfalcon

@mangelajo It is absolutely a division by zero: the line it blows up on is a deliberate ill (illegal) instruction that is executed when __umodsi3 is called with a zero divisor. This issue has been around for a while, and is the same as #1505, #1050, and #1262. I don't believe it's down to any particular router, as I've always been able to reproduce it on multiple ones, given the same traffic as above. It seems to happen with just the mDNS library enabled; keeping SSDP / DNS-SD off doesn't affect it.

@igrr
Member

igrr commented Apr 15, 2016

The latest git version has a new entry in the boards menu, "Core development module", which allows one to use lwIP built from source with debugging symbols enabled. Perhaps you could try that one to get a better stack trace and narrow down the issue?
I checked mDNS and SSDP on a few routers I had available, but wasn't able to reproduce this, unfortunately.

@liquidfalcon

So, after building lwIP from source, the errors have changed quite a bit: it's dying in pbuf_free now, which is curious. The time-to-die is roughly the same, between 30 seconds and a minute after enabling mDNS. I'm not entirely sure why it changed.

Decoding 18 results
0x40240104: pbuf_free at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/pbuf.c:723
0x40000f68: ?? ??:0
0x40000f58: ?? ??:0
0x4023ff60: sys_check_timeouts at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/timers.c:384
0x4023ea18: tcp_pcb_purge at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/tcp.c:1423
0x4023ecf0: tcp_slowtmr at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/tcp.c:992
0x4023ee90: tcp_tmr at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/tcp.c:127
0x4023fe30: tcpip_tcp_timer at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/timers.c:87
0x4010598a: wdt_feed at ??:?
0x4023ffd5: sys_check_timeouts at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/timers.c:420
0x4023fe28: tcpip_tcp_timer at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/timers.c:81
0x40232b64: ets_timer_handler_isr at ??:?
0x40232b71: ets_timer_handler_isr at ??:?
0x40232bb6: ets_timer_handler_isr at ??:?
0x402192eb: loop_task at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp:43
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0

@igrr
Member

igrr commented Apr 15, 2016

What does the exception line look like now?

@liquidfalcon

liquidfalcon commented Apr 15, 2016

Ah, that'd probably be helpful, whoops.

Exception (28):
epc1=0x40240104 epc2=0x00000000 epc3=0x00000000 excvaddr=0x0c477c5a depc=0x00000000

ctx: sys
sp: 3ffffd10 end: 3fffffb0 offset: 01a0

@igrr
Member

igrr commented Apr 15, 2016

That likely indicates heap corruption (use after free, or double free). Maybe the workaround for tcp_abort I made for xcc-built LWIP breaks something when using gcc-built one.
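
For readers comparing the two dumps in this thread, the number in parentheses after "Exception" is the Xtensa EXCCAUSE value. The lookup below is only a small sketch covering the causes relevant here; the function name exccause_name is made up for illustration, and the cause names follow the usual Xtensa naming (0 = illegal instruction, 28 = prohibited load, 29 = prohibited store).

```c
/* Hypothetical helper for reading ESP8266 crash dumps; not part of any SDK.
 * Only the causes discussed in this thread (plus the store counterpart of 28)
 * are listed; everything else is reported as "other". */
static const char *exccause_name(unsigned cause)
{
    switch (cause) {
    case 0:  return "IllegalInstruction"; /* e.g. the deliberate ill in __umodsi3 */
    case 28: return "LoadProhibited";     /* load from an invalid address */
    case 29: return "StoreProhibited";    /* store to an invalid address */
    default: return "other";
    }
}
```

Read this way, Exception (0) matches the division-by-zero trap, while Exception (28) with excvaddr=0x0c477c5a is a load from a bad address, which fits the heap corruption guess.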

@igrr
Member

igrr commented Apr 15, 2016

@liquidfalcon do you have any TCP connections happening in your sketch, or just mDNS?

@liquidfalcon

liquidfalcon commented Apr 15, 2016

I do, yeah. Every second I attempt to send a couple of kilobytes of data over SSL, with mDNS running in the background and DNS-SD requests every 60 seconds. Disabling the SSL routines makes it run for longer, but when it does crash, it looks like it gives us the real error message from before:

Exception (0):
epc1=0x40106c96 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys 
sp: 3ffffd20 end: 3fffffb0 offset: 01a0


Decoding 13 results
0x40106c96: __modsi3 at /home/igrokhotkov/xtensa/crosstool-NG/.build/src/gcc-4.8.2/libgcc/config/xtensa/lib1funcs.S:759
0x40242fd5: igmp_start_timer at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/ipv4/igmp.c:707
 (inlined by) igmp_delaying_member at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/ipv4/igmp.c:726
0x402431ea: igmp_input at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/ipv4/igmp.c:458
0x401004d8: malloc at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.c:1662
0x40107508: pvPortMalloc at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/cores/esp8266/heap.c:13
0x40230000: ets_timer_handler_isr at ??:?
0x40243704: ip_input at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/core/ipv4/ip.c:570
0x4024297d: ethernet_input at /home/danielm/arduino-esp8266/hardware/esp8266com/esp8266/tools/sdk/lwip/src/netif/etharp.c:1379
0x4021bc13: pp_tx_idle_timeout at ??:?
0x4021bb86: pp_tx_idle_timeout at ??:?
0x4022fc27: ets_snprintf at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0

Now, this would suggest that something is calling igmp_start_timer with a max_time of 0 or 1, which would explain the crash with modulus, as this line is the last call in that function:

 group->timer = (LWIP_RAND() % (max_time - 1)) + 1;

Would it be worth adding a simple conditional to that statement to not subtract one if it's equal to one? Or check if max_time is actually valid, first?

EDIT: Yes, I think that would work. Adding that makes it keep working well past where it used to crash, when max_time is set to 1.
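
The kind of guard described above could look roughly like the following. This is a standalone sketch, not the change that went into lwIP or the ESP8266 core: safe_igmp_delay is a hypothetical name, and the random value is passed in as a parameter (standing in for LWIP_RAND()) so the function can be exercised on a desktop compiler.

```c
#include <stdint.h>

/* Hypothetical stand-in for the timer computation in igmp_start_timer.
 * random_value plays the role of LWIP_RAND(). The original expression,
 * (LWIP_RAND() % (max_time - 1)) + 1, has a zero divisor when max_time is 1. */
static uint16_t safe_igmp_delay(uint8_t max_time, uint32_t random_value)
{
    /* With max_time of 0 or 1 there is no usable range, so use the
     * shortest delay instead of taking a modulus. */
    if (max_time <= 1) {
        return 1;
    }
    /* max_time >= 2 here, so the divisor is at least 1. */
    return (uint16_t)(random_value % (uint32_t)(max_time - 1)) + 1;
}
```

For example, safe_igmp_delay(1, r) returns 1 for any r instead of trapping, and safe_igmp_delay(100, r) stays in the range 1 to 99.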

@igrr
Member

igrr commented Apr 15, 2016

@liquidfalcon

Aha, someone beat us to it. Cheers @igrr. Also, would you happen to have the commit hash you mentioned earlier for tcp_abort? If I can figure out the other crash, I'd be a happy camper with working mDNS again.

@igrr
Member

igrr commented Apr 17, 2016

@liquidfalcon efc8dda

@igrr igrr modified the milestones: 2.2.0, 2.3.0 Apr 17, 2016
igrr added a commit that referenced this issue May 6, 2016
@igrr
Member

igrr commented May 7, 2016

Should be fixed in git version now.

@CombiesGit

CombiesGit commented May 7, 2016

#1505

No Connection with Repeater.

@bsz0206

bsz0206 commented Dec 21, 2017

Not sure if this is still an issue. I created an Ethernet frame that reproduces the issue.

You can try to send this frame using http://ostinato.org. One frame is enough for a crash. No need to update any MAC/IP of the frame.

0000 01 00 5e 00 00 01 4c 8b 30 c4 b6 18 08 00 45 00
0010 00 1c 00 00 40 00 01 02 d6 45 c0 a8 02 f1 e0 00
0020 00 01 11 01 ee fe 00 00 00 00
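
For anyone wondering what is in that frame: it is an IPv4 packet to 224.0.0.1 carrying an IGMP membership query (type 0x11) with a max response time of 1 and group 0.0.0.0. A max response time of 1 would be consistent with the zero divisor in the (max_time - 1) expression quoted earlier in this thread. The decoder below is only a sketch to check those fields; the struct and function names (igmp_summary, parse_igmp) are invented for illustration and are not lwIP APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* The 42-byte frame from the hex dump above, copied verbatim. */
static const uint8_t query_frame[42] = {
    0x01, 0x00, 0x5e, 0x00, 0x00, 0x01, 0x4c, 0x8b,
    0x30, 0xc4, 0xb6, 0x18, 0x08, 0x00, 0x45, 0x00,
    0x00, 0x1c, 0x00, 0x00, 0x40, 0x00, 0x01, 0x02,
    0xd6, 0x45, 0xc0, 0xa8, 0x02, 0xf1, 0xe0, 0x00,
    0x00, 0x01, 0x11, 0x01, 0xee, 0xfe, 0x00, 0x00,
    0x00, 0x00
};

/* Hypothetical summary of the IGMP header fields of interest. */
struct igmp_summary {
    uint8_t  type;      /* 0x11 = membership query */
    uint8_t  max_resp;  /* max response time, in units of 1/10 s */
    uint32_t group;     /* group address; 0 means a general query */
};

/* Returns 0 and fills *out if the frame is Ethernet/IPv4/IGMP, else -1. */
static int parse_igmp(const uint8_t *f, size_t len, struct igmp_summary *out)
{
    const size_t eth_len = 14;
    if (len < eth_len + 20) return -1;
    if (f[12] != 0x08 || f[13] != 0x00) return -1;   /* EtherType: IPv4 */
    if ((f[eth_len] >> 4) != 4) return -1;           /* IP version */
    size_t ihl = (size_t)(f[eth_len] & 0x0f) * 4;    /* IP header length */
    if (ihl < 20) return -1;
    if (f[eth_len + 9] != 2) return -1;              /* IP protocol 2 = IGMP */
    size_t off = eth_len + ihl;
    if (len < off + 8) return -1;                    /* 8-byte IGMP header */
    out->type = f[off];
    out->max_resp = f[off + 1];
    out->group = ((uint32_t)f[off + 4] << 24) | ((uint32_t)f[off + 5] << 16) |
                 ((uint32_t)f[off + 6] << 8)  |  (uint32_t)f[off + 7];
    return 0;
}
```

Calling parse_igmp(query_frame, sizeof query_frame, &s) fills s with type 0x11, max_resp 1 and group 0.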

@devyte
Collaborator

devyte commented Dec 21, 2017

@bsz0206 does the crash happen with latest git and lwip2?

@bsz0206

bsz0206 commented Dec 21, 2017

Sorry, I don't have a free ESP8266 to play and test.

@devyte
Collaborator

devyte commented Dec 21, 2017

@bsz0206 ok, I'll rephrase the question: your app that crashes with that ethernet frame, which core version was it built with?

bsz0206 added a commit to bsz0206/OpenGarage-Firmware that referenced this issue Dec 24, 2017
Just recompiled using:
   - Arduino 1.6.13
   - esp8266-2.3.0.zip
   - Blynk-0.5.0
No source changes other than version increase to 1.0.7 at defines.h
The issue is described at esp8266/Arduino#1826
@bsz0206

bsz0206 commented Jan 5, 2018

@devyte Sorry for the delay in answering. We had to wait for the original developers to answer and rebuild. The buggy firmware was based on the 2.2 core. They rebuilt it using 2.3, and now everything looks good. Thanks.
