Websocket connections not working anymore for >0.14.1 #3855

Closed
1 task done
smhex opened this issue Mar 25, 2024 · 21 comments
Labels
bug fixed in source This issue is unsolved in the latest release but fixed in master

Comments

@smhex

smhex commented Mar 25, 2024

What happened?

I am using a third-party library (wled-client) to access my WLED device over WebSockets. After upgrading WLED to any version newer than 0.14.1, the library stopped working. I replaced it with a native WebSocket implementation, and that did not work either: I always get ECONNRESET when trying to open the socket. After downgrading to 0.14.1, everything works again. The web interface is accessible without any issues, and the HTTP API also works as expected.

To Reproduce Bug

Use any WebSocket client and connect to ws:///ws. WLED should immediately return its state upon connection. In my case it returns nothing.

Expected Behavior

Upon successful connection the state object should be returned.

Install Method

Binary from WLED.me

What version of WLED?

0.14.2

Which microcontroller/board are you seeing the problem on?

ESP32

Relevant log/trace output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@smhex smhex added the bug label Mar 25, 2024
@softhack007 softhack007 added the needs investigation The bug has not yet been reproduced by me. Analysis or more details are needed. label Mar 26, 2024
@Jedden19

Same with ESP8266

@blazoncek
Collaborator

Please post your WS connection details; @willmmiles may have a solution.

@willmmiles
Contributor

willmmiles commented Mar 29, 2024

Replicated on 0.14.2, works on 0.15. Investigating. I'm using websocat.

@willmmiles
Contributor

Seems to work on 0.14.3 as well. I suspect this may have been fixed by the websocket memory management fixes in the newer version of AsyncWebServer.

@blazoncek blazoncek added the fixed in source This issue is unsolved in the latest release but fixed in master label Mar 30, 2024
@smhex
Author

smhex commented Mar 30, 2024

In the meantime I was able to capture a serial dump; however, my WLED build does not include debug symbols, so there is nothing to see here :-(. I tested with 0.15.0-b1 and it fails again. The connection request triggers a reboot of my device. If someone can provide a WLED_0.15.0-b1_ESP32.bin, or any other affected version, with debug symbols enabled, I am happy to test.

How do I get 0.14.3?

@blazoncek
Collaborator

Both versions available on Discord.

@smhex
Author

smhex commented Mar 30, 2024

Thx, will download and continue testing...

@smhex
Author

smhex commented Mar 30, 2024

Okay, here is the serial dump from a 0.14.3 debug build. As soon as my client connects, a reboot is triggered. I did this several times (see attached log).

putty.log

Is there anything more I can test/provide?

@willmmiles
Contributor

Can you please send a backup of your cfg and presets? Then erase the flash storage (pio run -t erase -e <your_envname>), and reload your firmware and cfg from scratch.

I had a case yesterday where the filesystem seemed to be corrupted, and reading presets.json was causing crashes and other weird behaviour. Rebuilding the filesystem seems to fix it. I'm not sure what caused the corruption yet - I go back and forth between versions a lot.

@smhex
Author

smhex commented Mar 30, 2024

Here are the requested files. I changed the server/user credentials; I hope that this is not the deciding factor. At the moment I do not have PlatformIO installed. Will a factory reset using the web interface also rebuild the file system?

Wled_setup.zip

Thanks for your help!

@blazoncek
Collaborator

blazoncek commented Mar 30, 2024

> Is there anything more I can test/provide?

Please add this to your environment and monitor from within PIO.
monitor_filters = esp32_exception_decoder
It will tell us where it crashes.
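
For reference, a minimal sketch of where that line goes, assuming the build environment is named esp32dev (the name that appears in the .pio\libdeps paths of the backtrace posted further down); substitute whichever environment you actually build. The line is added to the existing section in platformio.ini rather than to a new one:

```ini
[env:esp32dev]
; decode ESP32 exception addresses into function names and source lines
monitor_filters = esp32_exception_decoder
```

With this in place, running the serial monitor from PlatformIO for the same environment (for example `pio device monitor -e esp32dev`) should print a decoded backtrace when the device crashes.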

It looks to me as if it does not crash within WLED procedures, though.

@smhex
Author

smhex commented Mar 30, 2024

Okay, please give me some time to set everything up. It has been a long time since I last worked with PlatformIO 😅.

@theapache64

Happening to me as well. Downgrading to 0.13.1 fixes the issue.

@smhex
Author

smhex commented Mar 31, 2024

I got the toolchain working :-). Here is the output when connecting via websocket to a freshly uploaded esp32 image.

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1084
load:0x40078000,len:11220
load:0x40080400,len:5360
entry 0x4008067c
Ada
CORRUPT HEAP: Bad head at 0x3ffdd26c. Expected 0xabba1234 got 0x3ffdd558
abort() was called at PC 0x4008eb39 on core 1

ELF file SHA256: 0000000000000000

Backtrace: 0x40089af8:0x3ffd8c40 0x40089e55:0x3ffd8c60 0x4008eb39:0x3ffd8c80 0x4008543a:0x3ffd8ca0 0x40085805:0x3ffd8cc0 0x4000bec7:0x3ffd8ce0 0x401ac671:0x3ffd8d00 0x40128839:0x3ffd8d20 0x401288e0:0x3ffd8d40 0x4012899a:0x3ffd8d90 0x40128aa9:0x3ffd8de0 0x40128d0d:0x3ffd8e20 0x4011eb2d:0x3ffd8e40 0x4011ebc1:0x3ffd8e80 0x4011f1f2:0x3ffd8ea0 0x4008b89e:0x3ffd8ed0
  #0  0x40089af8:0x3ffd8c40 in invoke_abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
  #1  0x40089e55:0x3ffd8c60 in abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
  #2  0x4008eb39:0x3ffd8c80 in multi_heap_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:321
  #3  0x4008543a:0x3ffd8ca0 in heap_caps_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:232
  #4  0x40085805:0x3ffd8cc0 in _free_r at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/newlib/syscalls.c:42
  #5  0x4000bec7:0x3ffd8ce0 in ?? ??:0
  #6  0x401ac671:0x3ffd8d00 in operator delete(void*) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/del_op.cc:46
  #7  0x40128839:0x3ffd8d20 in LinkedList<AsyncWebHeader, LinkedListNode>::_remove(LinkedListNode<AsyncWebHeader>*, LinkedListNode<AsyncWebHeader>*) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015   
  #8  0x401288e0:0x3ffd8d40 in LinkedList<AsyncWebHeader, LinkedListNode>::remove(LinkedList<AsyncWebHeader, LinkedListNode>::Iterator const&, LinkedList<AsyncWebHeader, LinkedListNode>::Iterator const&) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
      (inlined by) AsyncWebServerRequest::_removeNotInterestingHeaders() at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:197
  #9  0x4012899a:0x3ffd8d90 in AsyncWebServerRequest::_parseLine() at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
  #10 0x40128aa9:0x3ffd8de0 in AsyncWebServerRequest::_onData(void*, unsigned int) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
  #11 0x40128d0d:0x3ffd8e20 in std::_Function_handler<void (void*, AsyncClient*, void*, unsigned int), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*, void*, unsigned int)#5}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, std::_Any_data const&, unsigned int&&) at .pio\libdeps\esp32dev\ESPAsyncWebServerWLED\src/WebRequest.cpp:1015
      (inlined by) _M_invoke at c:\users\thomas\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0/functional:1871
  #12 0x4011eb2d:0x3ffd8e40 in std::function<void (void*, AsyncClient*, void*, unsigned int)>::operator()(void*, AsyncClient*, void*, unsigned int) const at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:1153
      (inlined by) AsyncClient::_recv(tcp_pcb*, pbuf*, signed char) at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:968
  #13 0x4011ebc1:0x3ffd8e80 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, signed char) at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:1153
  #14 0x4011f1f2:0x3ffd8ea0 in _async_service_task(void*) at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:1153
      (inlined by) _async_service_task at .pio\libdeps\esp32dev\AsyncTCP@src-39de97abf7348c44d4dda815b8aab0ae\src/AsyncTCP.cpp:201
  #15 0x4008b89e:0x3ffd8ed0 in vPortTaskWrapper at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

Rebooting...
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1084
load:0x40078000,len:11220
load:0x40080400,len:5360
entry 0x4008067c
Ada

My WLED has no configuration except an IP address for my Wi-Fi. No presets or any other changes were made. If you need more information, just tell me. The reboot is triggered by a simple node app trying to connect...

@blazoncek
Collaborator

As I suspected, the error is not in WLED code but, unfortunately, in the AsyncWebServer library.
Hopefully @willmmiles will find what's wrong.

Can you capture a packet that causes the crash? Using Wireshark or similar.

@willmmiles
Contributor

Or alternately, post a link to your node app. There's something different about the headers from the usual browser connections.

@smhex
Author

smhex commented Apr 1, 2024

Here are both: the very basic node app and the Wireshark recordings...

Wireshark.zip
wled-ws-test.zip

Please change your WLED's IP in index.ts and run

npm install
npx tsc
node index

If everything works, WLED's current state should be logged to the console.

Happy Easter

@willmmiles
Contributor

Thanks for bearing with me! I was able to replicate it using your code and config, and tracked this to a use-after-free in AsyncWebServer. It was indeed triggered by the headers sent by node, and wouldn't replicate unless some other code caught it with its pants down, so to speak.

I've pushed AsyncWebServer v2.2.1 which has the fix, and opened PR #3873 to adopt it.
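
To illustrate the class of bug: the backtrace above passes through `_removeNotInterestingHeaders()` and a `LinkedList` removal before `operator delete` trips the heap check. One common way a use-after-free arises in such code is deleting list nodes while a loop is still walking them. The following is a minimal sketch of the safe pattern only; it is not the AsyncWebServer code or the actual v2.2.1 fix. It uses `std::list` instead of the library's own `LinkedList`, and `Header`, `equalsIgnoreCase`, and `removeUninteresting` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed HTTP header (not the library's AsyncWebHeader).
struct Header {
    std::string name;
    std::string value;
};

// Case-insensitive comparison, since HTTP header names are case-insensitive.
static bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Removes every header whose name is not listed in `interesting`.
// Unsafe pattern (for contrast): erasing the current element inside a
// range-based for loop, which then advances through the node just freed.
// Safe pattern used here: erase() returns the next valid iterator, so the
// loop never dereferences a deleted node.
void removeUninteresting(std::list<Header>& headers,
                         const std::vector<std::string>& interesting) {
    for (auto it = headers.begin(); it != headers.end();) {
        const bool keep = std::any_of(
            interesting.begin(), interesting.end(),
            [&](const std::string& n) { return equalsIgnoreCase(n, it->name); });
        if (keep) {
            ++it;                    // keep this header, move on
        } else {
            it = headers.erase(it);  // erase and continue from the next node
        }
    }
}
```

A bug of this kind can stay hidden until the freed memory is reused or checked by something else, which fits the observation that the crash only showed up with a particular set of request headers.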

@smhex
Author

smhex commented Apr 1, 2024

Perfect, I will re-run my tests after the PR merge.

@blazoncek
Collaborator

> Perfect, I will re-run my tests after the PR merge.

You can test it immediately by temporarily changing your copy of platformio.ini.

@smhex
Author

smhex commented Apr 1, 2024

You're right. I quickly rebuilt the image with version 2.2.1 of AsyncWebServer, restored my presets and other settings, and... everything works again 👍

I enabled my Homebridge plugin again, and as far as I could see, communication worked as it should and as it last did in WLED 0.14.1. I will keep an eye on it for a few days...

Thank you very much @willmmiles , @blazoncek !

@blazoncek blazoncek removed the needs investigation The bug has not yet been reproduced by me. Analysis or more details are needed. label Apr 2, 2024
@smhex smhex closed this as completed May 12, 2024