Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High load websockets causing crashes + SOLUTION #1334

Open
hoeken opened this issue Aug 8, 2023 · 14 comments
Open

High load websockets causing crashes + SOLUTION #1334

hoeken opened this issue Aug 8, 2023 · 14 comments

Comments

@hoeken
Copy link

hoeken commented Aug 8, 2023

I'm writing a server that handles websocket connections using ESPAsyncWebserver and ArduinoJSON. As part of that, I was testing to see how many messages it could handle and I was just generally having things crash a lot. Often, I would get a message like this:

++++ Error when running speed test and sending ping messages.  Looks like async_tcp just got too busy.

E (19871) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (19871) task_wdt:  - async_tcp (CPU 0/1)
E (19871) task_wdt: Tasks currently running:
E (19871) task_wdt: CPU 0: IDLE
E (19871) task_wdt: CPU 1: IDLE
E (19871) task_wdt: Aborting.

abort() was called at PC 0x40106b70 on core 0

After digging around online, I found this good suggestion on Stackoverflow.

Based on that, I implemented something in my code sort of like below. After that, the crashes are pretty much gone unless I try to push it past 150 messages per second. Its more than happy to chug along at 100 messages per second.

When you do a lot of work in the callback, you run the risk of a watchdog timer reset. If you change the callback to basically just take the message and mark it for processing, then that callback becomes very fast. You then process the message and respond in your main loop which can take as long as you like.

I haven't seen anything in the documentation, so I thought I'd share this with you all as it might save you some headache.

As a side note, running the ESPAsyncWebserver + websockets + ArduinoJSON + serving static html+js+css through SPIFFS is such a killer combo. You can build really nice, responsive websites. I absolutely cannot believe that this thing is running on a little microcontroller. So good. If you want to see how I'm doing it, check out Yarrboard.

CircularBuffer<WebsocketRequest*, YB_RECEIVE_BUFFER_COUNT> wsRequests;

void loop()
{
  //sometimes websocket clients die badly.
  ws.cleanupClients();

  //process our websockets outside the callback.
  if (!wsRequests.isEmpty())
    handleWebsocketMessageLoop(wsRequests.shift());
}

void handleWebSocketMessage(void *arg, uint8_t *data, size_t len, AsyncWebSocketClient *client)
{
  AwsFrameInfo *info = (AwsFrameInfo *)arg;
  if (info->final && info->index == 0 && info->len == len && info->opcode == WS_TEXT)
  {
    if (!wsRequests.isFull())
    {
      WebsocketRequest* wr = new WebsocketRequest;
      wr->client_id = client->id();
      strlcpy(wr->buffer, (char*)data, YB_RECEIVE_BUFFER_LENGTH); 
      wsRequests.push(wr);
    }
  }
}

void handleWebsocketMessageLoop(WebsocketRequest* request)
{
  char jsonBuffer[MAX_JSON_LENGTH];
  StaticJsonDocument<2048> output;
  StaticJsonDocument<1024> input;

  DeserializationError err = deserializeJson(input, request->buffer);

  String mycmd = input["cmd"] | "???";

  if (err)
    generateErrorJSON(output, error);
  else
    handleReceivedJSON(input, output, YBP_MODE_WEBSOCKET, request->client_id);

  if (output.size())
  {
    serializeJson(output, jsonBuffer);

    if (ws.availableForWrite(request->client_id))
      ws.text(request->client_id, jsonBuffer);
    else
      Serial.println("[socket] client full");
  }

  delete request;
}
@Belleson
Copy link

Belleson commented Aug 9, 2023

That's an interesting solution, I like it, thanks for posting! I agree with your comment about how a really nice app is possible with the combination of libraries you mention.

My web socket app does not have a high message rate, just some knobs and buttons on a browser for human interaction. At first I had similar watchdog timer crashes and learned (after days of debug) that doing almost anything in the message handler causes the watchdog to bite.

My solution is maybe not so fast as yours but maybe more flexible. The message handler triggers a timer Ticker (with 1millisec delay) that calls a function and passes a JSON argument. The function asynchronously processes the arg, then calls a client notifier to socket.textAll to update all clients. It requires some care to prevent successive timer calls from short-circuiting previous calls but it's not possible to hang the main loop.

It has not rebooted itself in nearly a year of adding code. Here is a sample message handler, where the incoming message has an action and optional value

Ticker socketActionTicker;
void handleSocketMsg(void *arg, uint8_t *data, size_t len)
{
	AwsFrameInfo *info = (AwsFrameInfo *)arg;
	if (info->final && info->index == 0 && info->len == len && info->opcode == WS_TEXT)
	{
		const uint8_t size = JSON_OBJECT_SIZE(8);
		StaticJsonDocument<size> json;
		DeserializationError err = deserializeJson(json, data);
		if (err)
		{
#if DEBUG_OUTPUT
			Serial.printf("\ndeserializeJson() failed with code %s", err.c_str());
			void * dataStr[256];
			memset(dataStr, 0, 256);
			memcpy(dataStr, data, 255);
			Serial.printf("\nSocket message from client:\n%s\n", dataStr);
			serializeJsonPretty(json, Serial);
#endif
			return;
		}

#if DEBUG_OUTPUT
		Serial.println("\nSocket message from client:\n");
		serializeJsonPretty(json, Serial);
#endif

		const char *action = json["action"];
		if (strcmp(action, "setInput") == 0)
		{
			uint_fast8_t newIn = json["value"];
			if (newIn != input)
				socketActionTicker.once_ms(1, setInput, newIn);
		}
		else if (strcmp(action, "setPwrDn") == 0)
		{
			socketActionTicker.once_ms(1, powerDn);
		}
		else if (strcmp(action, "setPwrUp") == 0)
		{
			socketActionTicker.once_ms(1, powerUp);
		}
		else if (strcmp(action, "getInput") == 0)
		{
			socketActionTicker.once_ms(1, sendInput);
		}
		else if (strcmp(action, "getPwrState") == 0)
		{
			uint_fast8_t powerState = digitalRead(POWR); // state is read from hardware
			socketActionTicker.once_ms(1, sendPwrSts);

		}
	}
}

//  and a client notification
void notifyClients(JsonDocument& jsonDoc)
{
#if MONITOR_STATUS
	Serial.printf("\nENTERING notifyClients(jsonDoc)");
#endif

#if DEBUG_OUTPUT
	Serial.printf("\n");
	serializeJsonPretty(jsonDoc, Serial);
#endif
	char buffer[512];
	size_t len = serializeJson(jsonDoc, buffer);
	socket.textAll(buffer, len);
}

@hoeken
Copy link
Author

hoeken commented Oct 25, 2023

So, quick update here.

At high message rates, it was still occasionally crashing (like every 12h or so, at 100 messages/sec)

I changed to this fork (https://github.com/nthd/ESPAsyncWebServer) that has some stability fixes applied and that helped quite a bit. It still crashes occasionally under higher load, but not nearly as much. Under light load I haven't seen any crashes in a while.

@ninja-
Copy link

ninja- commented Nov 20, 2023

see #1363 to see the code that causes mentioned crash

@ninja-
Copy link

ninja- commented Nov 20, 2023

looks like the fork also suffers from this bug

@ninja-
Copy link

ninja- commented Nov 20, 2023

after closer look, the fork will crash but after much higer load as you said...it might be related to active number of connections. maybe in between calls ws.cleanupClients which I do only once per second. unfortunately, platformio esp exception decoder does not seem to be able to decode the crash even with debug build

@ninja-
Copy link

ninja- commented Nov 20, 2023

I believe this will be fixed after applying this set of patches Aircoookie/WLED#3382 (comment), which I plan to do soon on top of nthd fork

@ninja-
Copy link

ninja- commented Nov 22, 2023

I've done some more digging, which might be helpful to someone.
After getting the exception decoder working, abort during stress tests was always coming down to lack of memory so allocation crash.

There doesn't seem to be any leak when connection is open and websockets data is streamed, but there is a per-connection leak after connection is closed.
I am not able to trace it, and I haven't found any information on it in in any of the forks.

So a simple while true; do curl esp32/anyurl; sleep 0.2; done will always result in an out-of-memory crash

I will be looking for more luck with Espressif native websockets server....

@hoeken
Copy link
Author

hoeken commented Nov 22, 2023

@ninja- funny you say that.... I've been working on a new project that is a wrapper in the style of ESPAsyncWebServer but uses the native ESP-IDF server calls under the hood. literally just wrote most of it in the last week but its functional and almost feature complete.

Here's the link: https://github.com/hoeken/PsychicHTTP/

@ninja-
Copy link

ninja- commented Nov 23, 2023

@hoeken that's some really nice code - looks like we've been duplicating effort of porting the websockets demo code to c++, a quite slow and painful process :)

@hoeken
Copy link
Author

hoeken commented Nov 24, 2023

@ninja- thanks man. yeah the esp-idf library is great, but its not exactly user friendly. it does make a fantastic base for a proper HTTP server though.

I don't know where you are in your process, but maybe we can collaborate. I just have a few things I'd like to sort out (uploads) before I make a v1.0.0 release of my library. My personal project Yarrboard has now been running on it for a couple days flawlessly, and I also did a load test on the example code which went well . There are definitely some performance gains to be made, but its definitely acceptable for now, and more importantly rock solid.

@ninja-
Copy link

ninja- commented Nov 24, 2023

I've got caught up in a couple issues today:

  • Arduino library shipped suffers from bug where c_str() can return NULLPTR under memory pressure. Took me a while to debug weird crashes and at least add a clear assert. That was supposedly fixed in one of the forks, where it at least falls back to empty string or ignores the concat operation.
  • Fighting general WiFi/BT instability and crashes after ~1-2h of light load. Not sure if there's a memleak somewhere or just general instability.
  • Trying to update espressif from 5.1.1 to 5.1.2, which is only possible with fork https://github.com/tasmota/platform-espressif32/tree/Arduino/IDF5. Took a ton of trying to get it working, and fighting with sdkconfig generator.
  • Turns out enabling both espressif and arduino frameworks silently downgrades espressif to some crappy old version that's not even 5.1.1, but something ancient. I could have taken the route of removing all Arduino code entirely, but rewrite of BLE code is not worth the time right now. That should be worked around by using this fork
  • huge_app.csv should be replaced with partitions_singleapp_large.csv for new espressif
  • fork author removed support for BLE for time being, seems like copy-pasting the latest Arduino BLE library works fine though
  • Espressif upgrade results in newer toolchain which detected a couple of issues like improper logger format strings in Psychic :) passing String instead of str.c_str() => crash

Sooo only now I got it to compile and upload with espressif 1.5.2, so I am looking forward to testing the stability with WiFI, BLE and WebSockets enabled...

@hoeken
Copy link
Author

hoeken commented Nov 24, 2023

@ninja- have you seen the new v3.0 beta of the official esp-idf/arduino framework? looks like its using latest v5.1.x branch of esp-idf. https://blog.espressif.com/announcing-the-arduino-esp32-core-version-3-0-0-3bf9f24e20d4

What platform are you using? Currently I'm only targeting Arduino, but it would be cool to have pure esp-idf support as well. This is my setup in platformio:

platform = espressif32
board = esp32dev
framework = arduino

If you want, lets move this thread to the psychichttp github issues tracker

@ninja-
Copy link

ninja- commented Nov 24, 2023

I do yes, but it's not compatible with platformio without using a fork.

platform = https://github.com/tasmota/platform-espressif32.git#Arduino/IDF5
framework =
	arduino
	espidf

@hoeken
Copy link
Author

hoeken commented Nov 30, 2023

If anyone is following this thread, PsychicHttp v1.0 has been released. Feel free to give it a try and let me know your feedback. https://github.com/hoeken/PsychicHTTP/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants