Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client doesn't disconnect when ping task exits #127

Closed
christophevg opened this issue Jul 11, 2019 · 10 comments
Closed

client doesn't disconnect when ping task exits #127

christophevg opened this issue Jul 11, 2019 · 10 comments
Assignees
Labels

Comments

@christophevg
Copy link

Hi,

I'm chasing some kind of deadlock that causes a client (a socketio client to be precise) not to return from its waiting state. So far I've only seen it a few times in my production environment when the wifi network was unstable.

On my development machine I've been trying to replicate the problem, but haven't succeeded so far. While trying though, I ran into the following "similar" problematic behaviour, which might point to the same root cause ... maybe :-)

So, a plain socketio client connects to a flask-socketio server and after connecting, I put the server to sleep (ctrl+z). The logs of the client then look like this:

[2019-07-11 21:35:54 +0200] [engineio.client] [5898] [INFO] Received packet PONG data None
[2019-07-11 21:35:54 +0200] [engineio.client] [5898] [INFO] Sending polling GET request to https://localhost:8000/socket.io/?transport=polling&EIO=3&sid=47cff736ceb549b9974fd0926f12ed2f

Here I put the server to sleep...

[2019-07-11 21:36:19 +0200] [engineio.client] [5898] [INFO] Sending packet PING data None
[2019-07-11 21:36:44 +0200] [engineio.client] [5898] [INFO] PONG response has not been received, aborting
[2019-07-11 21:36:44 +0200] [engineio.client] [5898] [INFO] Exiting ping task

And the client stays in this state endlessly. It seems to "think" it's still connected, but the ping task has ended, while I would suspect that it comes to the conclusion it has lost its connection to the server and better considers itself disconnected, returning from the wait and return control to the application, that now can respond by retrying the connection or whatever ;-)

Also, this little testing app sends out messages from a background task. Again, I would expect that trying to send such a message unsuccessfully would also make the client realise it is no longer actually connected. So after a long time, after the previous log excerpts I even have these:

[2019-07-11 21:50:54 +0200] [engineio.client] [5898] [INFO] Sending packet MESSAGE data 2["report","hello from demo-client"]

This is almost 15 minutes after the ping task exited, and even failing to actually send this message doesn't make the client terminate its wait.

Now, somehow this will be technically correct ;-) but in a way this also seems "wrong" ?!

@miguelgrinberg
Copy link
Owner

Okay. I don't think I've ever tried putting the server machine to sleep. Just to confirm, the system works correctly if you kill the server, correct? So the problem is only when the server goes to sleep.

@christophevg
Copy link
Author

Yeah, I know, it's an odd situation, but at least it resembles the problem I started out investigating :-)

Yes, simply killing/terminating the server works as expected - been doing that more than enough lately ;-)

Isn't the client supposed to disconnect when the ping times out?

@miguelgrinberg
Copy link
Owner

Yes, it should. It's a bug.

@miguelgrinberg miguelgrinberg self-assigned this Jul 11, 2019
@christophevg
Copy link
Author

I've been further looking into this. The "problem" seems to be related to the HTTP requests that simply wait forever. Now there are two calls into requests, a POST and a long-polling GET.

The post seems easy, adding timeout=10 to the request call ensures that this doesn't wait forever on the ill responsive server. For the long polling GET this can't be added, since it might every well be that it takes pretty long before some feedback is received.

Maybe if we take the ping interval into consideration, we know that a polling GET only lives as long as the ping interval, so we could take that interval as the timeout for the polling, as it should have already returned the pong anyway.

I was already using an overridden EngineIOClient to allow for unverified SSL connections (see #99 ) So, the following "patch" to this "patch" seems to do the trick for me in this specific case where I put the server process to sleep. I'll have to see if it also solves the similar problem on my production machines.

class MyEngineIoClient(eio.Client):
  def _send_request(self, method, url, headers=None, body=None):
    if self.http is None:
      self.http = requests.Session()
    try:
      wait_for = 10 if method == "POST" else 45
      return self.http.request(method, url, timeout=wait_for, headers=headers, data=body, verify=False)
    except requests.exceptions.Timeout:
      self.logger.info("{0} request to {1} timed out after {2}".format(method, url, wait_for))
    except requests.exceptions.ConnectionError:
      pass

Now, both the write loop (first when the sending of the ping fails) and (max) 30 seconds later the read loop also exits because no pong is received, and the client actually disconnects, terminates its wait and returns control to the application (yeah ;-)):

[2019-07-12 17:47:00 +0200] [engineio.client] [9485] [WARNING] POST request to https://localhost:8000/socket.io/?transport=polling&EIO=3&sid=8fbdefc9c0804b3d9cb0b8d67635effc timed out after 10
[2019-07-12 17:47:00 +0200] [engineio.client] [9485] [WARNING] Connection refused by the server, aborting
...
[2019-07-12 17:47:10 +0200] [engineio.client] [9485] [WARNING] GET request to https://localhost:8000/socket.io/?transport=polling&EIO=3&sid=8fbdefc9c0804b3d9cb0b8d67635effc&t=1562924785.948868 timed out after 45
[2019-07-12 17:47:10 +0200] [engineio.client] [9485] [WARNING] Connection refused by the server, aborting

This is a rough-cut to experiment with. I'll look further into it over the weekend, as I need a (temporary) solution quickly to avoid my hosts become non-responsive ;-)

Until now, I hadn't looked into the WebSocket support. After installing it and testing this sleep-case, it detects a problem:

[2019-07-12 18:06:13 +0200] [engineio.client] [9834] [ERROR] packet queue is empty, aborting

... but then also keeps waiting... So here also some additional work is needed.

Adding a timeout=60 to the websocket.create_connection constructor seems a promising path. Also something I'll continue exploring over the weekend.

@miguelgrinberg
Copy link
Owner

miguelgrinberg commented Aug 4, 2019

@christophevg would you like to give the master branch a try? All requests are now issued with timeouts, so I believe this problem is fully addressed, but let me know if missed anything.

@christophevg
Copy link
Author

(Sorry for the late reply, laptop was with service center to fix an issue under extended warranty.)

I've reviewed your commit and you applied the same fixes I applied to my custom class to introduce the timeouts. With that fix in place, I had +200 clients running for over two weeks, so it sure did the trick. In September, I continue to work on this project, so I can do more testing then. If any issue (re)occurs, I'll follow up. But for now, I would surely consider it closed :-)

@Dwyte
Copy link

Dwyte commented Mar 21, 2020

I'm experiencing this issue on AsyncClient, disconnect event not getting fired.

...
@sio.event
async def disconnect():
    logging.info("disconnected from server")

when await sio.disconnect() gets called:

INFO:engineio.client:Sending packet MESSAGE data 1
INFO:engineio.client:Sending packet CLOSE data None
INFO:socketio.client:Engine.IO connection dropped
INFO:engineio.client:Unexpected error "WebSocket read returned None", aborting
INFO:engineio.client:Waiting for write loop task to end
WARNING:aiohttp.websocket:websocket connection is closing.
WARNING:aiohttp.websocket:websocket connection is closing.
INFO:engineio.client:Exiting write loop task
INFO:engineio.client:Waiting for ping loop task to end
INFO:engineio.client:Exiting ping task
INFO:engineio.client:Exiting read loop task

or when I stop the server:

INFO:engineio.client:Received packet PONG data None
INFO:engineio.client:Sending packet PING data None
WARNING:aiohttp.websocket:websocket connection is closing.
INFO:engineio.client:PONG response has not been received, aborting
INFO:engineio.client:Exiting ping task
INFO:engineio.client:Exiting write loop task

it's not even trying to reconnect, I think maybe it thinks it's still connected? I tried this on the synchronous client it works okay, automatically reconnects.

@Dwyte
Copy link

Dwyte commented Mar 21, 2020

I tried downgrading to 3.9.2, seem to solve the problem.

EDIT: 3.10 seems to work too. not sure what changes 3.11 onward made. I'm not sure if this is a bug or I have to do things differently 3.11+? Ill use 3.10 for now

@miguelgrinberg
Copy link
Owner

@Dwyte can I ask you to install the master branch of python-engineio and retest? If you still get the error, I'm going to need you to provide an example that I can use to reproduce your problem. Thanks.

@Dwyte
Copy link

Dwyte commented May 5, 2020

@miguelgrinberg, sorry for the late reply. I just tried testing with the master branch, seems the problem is fixed now. Disconnect event gets triggered and reconnects now, cool.

Repository owner deleted a comment Jun 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants