Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-JSON packets are sometimes parsed incorrectly. #75

Closed
Shaptic opened this issue Aug 21, 2018 · 9 comments
Closed

Non-JSON packets are sometimes parsed incorrectly. #75

Shaptic opened this issue Aug 21, 2018 · 9 comments
Labels

Comments

@Shaptic
Copy link

Shaptic commented Aug 21, 2018

Hi,

To preface this issue, my stack is Flask-SocketIO server-side, and SocketIO-Client client-side.

The Problem

When Engine.IO receives an empty Socket.IO ACK packet (i.e. 431, "message, ACK, ID 1"), the payload it is parsed as a single integer (in this case, 31) which results in the lower layer (Socket.IO) interpreting this as the packet type, throwing an error (since 31 is not a valid packet). This is because of json.loads(), which returns an integer given an integer string (i.e. json.loads("31") == 31). Specifically, this line in the codebase.

Caveats

This could be a bug with the client library; it's unclear as to whether or not empty ACK packets are allowed by the Socket.IO specification. If they aren't, or the client is incorrectly not appending an "empty" JSON payload, that wouldn't trigger this problem.

In either case, though, if Engine.IO is meant to be a generic low-level protocol, it doesn't matter what the expectations of the higher-level are -- I believe this is still technically a parsing bug.

@miguelgrinberg
Copy link
Owner

I believe the correct format for an empty ACK package like in your example would be 43[]1, right? Without the data field, even as an empty list, there is no way to separate the 3 from the 1 unless I add an assumption that all packet types are one byte long.

Note that ACK and EVENT packets are fairly similar. How does this client encode an EVENT packet that is empty?

@Shaptic
Copy link
Author

Shaptic commented Aug 22, 2018

That sounds plausible to me! I'm no expert the protocol. The client library encodes packets like this:

def format_socketIO_packet_data(path=None, ack_id=None, args=None):
    socketIO_packet_data = json.dumps(args, ensure_ascii=False) if args else ''
    if ack_id is not None:
        socketIO_packet_data = str(ack_id) + socketIO_packet_data
    if path:
        socketIO_packet_data = path + ',' + socketIO_packet_data
    return socketIO_packet_data

I'm having trouble generating an empty event through the API; it requires an event type and there's a whole slew of code w.r.t. handling that. If I pass an empty string, the encoded packet has [""], still a bit different than the [] that we want...

But, theoretically, according to this code, an empty event would just be the packet type, and if the event requires an ACK (thus appending an ID), the same problem would be encountered. It's probably an untested aspect of this library because of the aforementioned requirement.

The solution, though, seems like else '[]' makes sense over else '', right?

Side Note: To clarify, it seems like you're saying the data should come before the ACK ID, correct? That doesn't seem to be the case for this client (for which ACKs work as expected if they have data):

(Pdb) ws._ack("", 1, "some ack data")
Outgoing: 1["some ack data"]

And it seems like flask-socketio's parsing code assumes this as well (edited for brevity):

ep = encoded_packet[1:]    # after extracting packet type at [0]
if ep and ep[0].isdigit():
    self.id = 0
    while ep[0].isdigit():
        self.id = self.id * 10 + int(ep[0])
        ep = ep[1:]

@miguelgrinberg
Copy link
Owner

Sorry, I was mistaken early. The correct format for an empty ACK would be 431[]. Need to look at this in more detail, maybe I do have a way to determine if that 1 is an id or not.

@Shaptic
Copy link
Author

Shaptic commented Aug 22, 2018

Well it's still a client error for leaving off the empty payload, since that would cause the server-side decoder to not interpret 31[] as int(31)... :)

That brings up the purely hypothetical (as I imagine there's either [a] a good reason for it or [b] it's not worth actually taking the time to try and see the side-effects of, since it doesn't affect anyone but those with buggy client encoders 🙃) question as to why Engine.IO makes any interpretations to the payload data anyway. Couldn't the Socket.IO level perform the json.loads (or lack thereof) at its leisure?

In either case, feel free to close the issue since it's a matter of an improper Socket.IO payload!

@miguelgrinberg
Copy link
Owner

After looking at this earlier, I believe a 431[] packet would also fail to be parsed correctly. The problem is not the 3 next to the 1, the problem is that the 1 is interpreted as an attachment count, which is something that is used with binary packets. It is also not specific to ACK, the problem exists with EVENT packets as well.

@Shaptic
Copy link
Author

Shaptic commented Aug 23, 2018

Are you sure? That happens below the Engine.IO level, right? I ran the attachment_count parsing snippet of the Socket.IO-level packet parsing code:

>>> ep = "31[]"   # 4 is removed at the Engine.IO level as its type
>>> # Extracted from socketio/packet.py :: Packet.encode()
>>> ep = ep[1:]
>>> dash = (ep + '-').find('-')
>>> attachment_count = 0
>>> if ep[0:dash].isdigit():
...     attachment_count = int(ep[0:dash])
...     ep = ep[dash + 1:]
... 
>>> attachment_count
0
>>> ep
'1[]'

The problem I described wouldn't occur because 31[] isn't loaded by the Engine.IO level as an int, and thus Socket.IO's parsing code would load 3 as the type and 1 as the ID (the 4 being removed as the type at the Engine.IO level). Whereas if it was interpreted as an int, the indexing on this line fails and means self.packet_type = 31, exactly what this bug outlines. :)

@miguelgrinberg
Copy link
Owner

I think we are both correct, as we are looking at two separate problems. At the engine.io level there is a problem with the parsing of packets that have only digits, as these are misinterpreted as numeric payloads. If this problem is addressed, such packets will also fail to be parsed at the socket.io layer, because the second digit will be incorrectly interpreted as an attachment count, while it should be the id attribute.

I have a fix almost ready to go for the socket.io bug. With the fix, a 31 packet will be interpreted the same as 31[] (well, the data attribute will be None in the first case and an empty list in the second). I still need to look at the engine.io problem.

@Shaptic
Copy link
Author

Shaptic commented Aug 23, 2018

Ahh yes, I see what you're saying. You initially said that 31[] -> attachment count which wouldn't be the case as I showed in my snippet (since "31[]".isdigit() == False), but it will be the case for "31".isdigit(), but that would only happen if the Engine.IO bug is solved!

You could argue that 31 is an invalid Socket.IO packet if the data field is required, so there'd be no need for the workaround.

@miguelgrinberg
Copy link
Owner

miguelgrinberg commented Aug 25, 2018

@Shaptic see the commit above, I think it addresses this problem by not allowing integer payloads anymore.

Also put in a fix for the socketio parsing problem: miguelgrinberg/python-socketio@f2d28ad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants