client doesn't disconnect when ping task exits #127

christophevg · 2019-07-11T20:05:17Z

Hi,

I'm chasing some kind of deadlock that causes a client (a socketio client to be precise) not to return from its waiting state. So far I've only seen it a few times in my production environment when the wifi network was unstable.

On my development machine I've been trying to replicate the problem, but haven't succeeded so far. While trying though, I ran into the following "similar" problematic behaviour, which might point to the same root cause ... maybe :-)

So, a plain socketio client connects to a flask-socketio server and after connecting, I put the server to sleep (ctrl+z). The logs of the client then look like this:

[2019-07-11 21:35:54 +0200] [engineio.client] [5898] [INFO] Received packet PONG data None
[2019-07-11 21:35:54 +0200] [engineio.client] [5898] [INFO] Sending polling GET request to https://localhost:8000/socket.io/?transport=polling&EIO=3&sid=47cff736ceb549b9974fd0926f12ed2f

Here I put the server to sleep...

[2019-07-11 21:36:19 +0200] [engineio.client] [5898] [INFO] Sending packet PING data None
[2019-07-11 21:36:44 +0200] [engineio.client] [5898] [INFO] PONG response has not been received, aborting
[2019-07-11 21:36:44 +0200] [engineio.client] [5898] [INFO] Exiting ping task

And the client stays in this state endlessly. It seems to "think" it's still connected, but the ping task has ended, while I would suspect that it comes to the conclusion it has lost its connection to the server and better considers itself disconnected, returning from the wait and return control to the application, that now can respond by retrying the connection or whatever ;-)

Also, this little testing app sends out messages from a background task. Again, I would expect that trying to send such a message unsuccessfully would also make the client realise it is no longer actually connected. So after a long time, after the previous log excerpts I even have these:

[2019-07-11 21:50:54 +0200] [engineio.client] [5898] [INFO] Sending packet MESSAGE data 2["report","hello from demo-client"]

This is almost 15 minutes after the ping task exited, and even failing to actually send this message doesn't make the client terminate its wait.

Now, somehow this will be technically correct ;-) but in a way this also seems "wrong" ?!

miguelgrinberg · 2019-07-11T20:25:16Z

Okay. I don't think I've ever tried putting the server machine to sleep. Just to confirm, the system works correctly if you kill the server, correct? So the problem is only when the server goes to sleep.

christophevg · 2019-07-11T21:05:43Z

Yeah, I know, it's an odd situation, but at least it resembles the problem I started out investigating :-)

Yes, simply killing/terminating the server works as expected - been doing that more than enough lately ;-)

Isn't the client supposed to disconnect when the ping times out?

miguelgrinberg · 2019-07-11T21:41:55Z

Yes, it should. It's a bug.

christophevg · 2019-07-12T16:36:47Z

I've been further looking into this. The "problem" seems to be related to the HTTP requests that simply wait forever. Now there are two calls into requests, a POST and a long-polling GET.

The post seems easy, adding timeout=10 to the request call ensures that this doesn't wait forever on the ill responsive server. For the long polling GET this can't be added, since it might every well be that it takes pretty long before some feedback is received.

Maybe if we take the ping interval into consideration, we know that a polling GET only lives as long as the ping interval, so we could take that interval as the timeout for the polling, as it should have already returned the pong anyway.

I was already using an overridden EngineIOClient to allow for unverified SSL connections (see #99 ) So, the following "patch" to this "patch" seems to do the trick for me in this specific case where I put the server process to sleep. I'll have to see if it also solves the similar problem on my production machines.

class MyEngineIoClient(eio.Client):
  def _send_request(self, method, url, headers=None, body=None):
    if self.http is None:
      self.http = requests.Session()
    try:
      wait_for = 10 if method == "POST" else 45
      return self.http.request(method, url, timeout=wait_for, headers=headers, data=body, verify=False)
    except requests.exceptions.Timeout:
      self.logger.info("{0} request to {1} timed out after {2}".format(method, url, wait_for))
    except requests.exceptions.ConnectionError:
      pass

Now, both the write loop (first when the sending of the ping fails) and (max) 30 seconds later the read loop also exits because no pong is received, and the client actually disconnects, terminates its wait and returns control to the application (yeah ;-)):

[2019-07-12 17:47:00 +0200] [engineio.client] [9485] [WARNING] POST request to https://localhost:8000/socket.io/?transport=polling&EIO=3&sid=8fbdefc9c0804b3d9cb0b8d67635effc timed out after 10
[2019-07-12 17:47:00 +0200] [engineio.client] [9485] [WARNING] Connection refused by the server, aborting
...
[2019-07-12 17:47:10 +0200] [engineio.client] [9485] [WARNING] GET request to https://localhost:8000/socket.io/?transport=polling&EIO=3&sid=8fbdefc9c0804b3d9cb0b8d67635effc&t=1562924785.948868 timed out after 45
[2019-07-12 17:47:10 +0200] [engineio.client] [9485] [WARNING] Connection refused by the server, aborting

This is a rough-cut to experiment with. I'll look further into it over the weekend, as I need a (temporary) solution quickly to avoid my hosts become non-responsive ;-)

Until now, I hadn't looked into the WebSocket support. After installing it and testing this sleep-case, it detects a problem:

[2019-07-12 18:06:13 +0200] [engineio.client] [9834] [ERROR] packet queue is empty, aborting

... but then also keeps waiting... So here also some additional work is needed.

Adding a timeout=60 to the websocket.create_connection constructor seems a promising path. Also something I'll continue exploring over the weekend.

miguelgrinberg · 2019-08-04T11:43:43Z

@christophevg would you like to give the master branch a try? All requests are now issued with timeouts, so I believe this problem is fully addressed, but let me know if missed anything.

christophevg · 2019-08-18T11:25:52Z

(Sorry for the late reply, laptop was with service center to fix an issue under extended warranty.)

I've reviewed your commit and you applied the same fixes I applied to my custom class to introduce the timeouts. With that fix in place, I had +200 clients running for over two weeks, so it sure did the trick. In September, I continue to work on this project, so I can do more testing then. If any issue (re)occurs, I'll follow up. But for now, I would surely consider it closed :-)

Dwyte · 2020-03-21T02:15:15Z

I'm experiencing this issue on AsyncClient, disconnect event not getting fired.

...
@sio.event
async def disconnect():
    logging.info("disconnected from server")

when await sio.disconnect() gets called:

INFO:engineio.client:Sending packet MESSAGE data 1
INFO:engineio.client:Sending packet CLOSE data None
INFO:socketio.client:Engine.IO connection dropped
INFO:engineio.client:Unexpected error "WebSocket read returned None", aborting
INFO:engineio.client:Waiting for write loop task to end
WARNING:aiohttp.websocket:websocket connection is closing.
WARNING:aiohttp.websocket:websocket connection is closing.
INFO:engineio.client:Exiting write loop task
INFO:engineio.client:Waiting for ping loop task to end
INFO:engineio.client:Exiting ping task
INFO:engineio.client:Exiting read loop task

or when I stop the server:

INFO:engineio.client:Received packet PONG data None
INFO:engineio.client:Sending packet PING data None
WARNING:aiohttp.websocket:websocket connection is closing.
INFO:engineio.client:PONG response has not been received, aborting
INFO:engineio.client:Exiting ping task
INFO:engineio.client:Exiting write loop task

it's not even trying to reconnect, I think maybe it thinks it's still connected? I tried this on the synchronous client it works okay, automatically reconnects.

Dwyte · 2020-03-21T05:46:24Z

I tried downgrading to 3.9.2, seem to solve the problem.

EDIT: 3.10 seems to work too. not sure what changes 3.11 onward made. I'm not sure if this is a bug or I have to do things differently 3.11+? Ill use 3.10 for now

miguelgrinberg · 2020-04-09T22:16:41Z

@Dwyte can I ask you to install the master branch of python-engineio and retest? If you still get the error, I'm going to need you to provide an example that I can use to reproduce your problem. Thanks.

Dwyte · 2020-05-05T09:17:57Z

@miguelgrinberg, sorry for the late reply. I just tried testing with the master branch, seems the problem is fixed now. Disconnect event gets triggered and reconnects now, cool.

miguelgrinberg added the question label Jul 11, 2019

miguelgrinberg added bug and removed question labels Jul 11, 2019

miguelgrinberg self-assigned this Jul 11, 2019

miguelgrinberg closed this as completed in 6666d6a Aug 4, 2019

miguelgrinberg mentioned this issue Aug 4, 2019

No disconnect event triggered after receiving no pong miguelgrinberg/python-socketio#326

Closed

Repository owner deleted a comment Jun 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

client doesn't disconnect when ping task exits #127

client doesn't disconnect when ping task exits #127

christophevg commented Jul 11, 2019

miguelgrinberg commented Jul 11, 2019

christophevg commented Jul 11, 2019

miguelgrinberg commented Jul 11, 2019

christophevg commented Jul 12, 2019

miguelgrinberg commented Aug 4, 2019 •

edited

Loading

christophevg commented Aug 18, 2019

Dwyte commented Mar 21, 2020 •

edited

Loading

Dwyte commented Mar 21, 2020 •

edited

Loading

miguelgrinberg commented Apr 9, 2020

Dwyte commented May 5, 2020

client doesn't disconnect when ping task exits #127

client doesn't disconnect when ping task exits #127

Comments

christophevg commented Jul 11, 2019

miguelgrinberg commented Jul 11, 2019

christophevg commented Jul 11, 2019

miguelgrinberg commented Jul 11, 2019

christophevg commented Jul 12, 2019

miguelgrinberg commented Aug 4, 2019 • edited Loading

christophevg commented Aug 18, 2019

Dwyte commented Mar 21, 2020 • edited Loading

Dwyte commented Mar 21, 2020 • edited Loading

miguelgrinberg commented Apr 9, 2020

Dwyte commented May 5, 2020

miguelgrinberg commented Aug 4, 2019 •

edited

Loading

Dwyte commented Mar 21, 2020 •

edited

Loading

Dwyte commented Mar 21, 2020 •

edited

Loading