
response.iter_lines() did not print the last line of stream log. #3577

Closed
haocs opened this issue Sep 16, 2016 · 17 comments


haocs commented Sep 16, 2016

python: 3.4
requests: 2.9.1

I implemented the following function to continuously fetch streaming logs from a server.

import requests

def stream_trace(streaming_url, user_name, password):
    r = requests.get(streaming_url, auth=(user_name, password), stream=True)
    for line in r.iter_lines(decode_unicode=True):
        if line:
            print(line)

The above code fetches and prints the log successfully, but its behavior differs from what I expected.
Basically, it holds back the last line of the current content/chunk and prints it together with the next chunk of logs.
For example, say there are two chunks of logs from the server; the expected output is:

# first chunk
a
#  second chunk
b
c

What the stream_trace function actually printed ('a' was printed with the 2nd chunk and 'c' was missing):

# first chunk
# second chunk
a
b

Could you help me figure out what may have gone wrong? Thanks.


Lukasa commented Sep 16, 2016

So iter_lines has a somewhat unexpected implementation. Naïvely, we would expect that iter_lines would receive data as it arrives and look for newlines. In practice, this is not what it does. Instead it waits to read an entire chunk_size, and only then searches for newlines. This is a consequence of the underlying httplib implementation, which only allows for file-like reading semantics, rather than the early-return semantics usually associated with a socket.

You can get the effect you want by setting the chunk size to 1. However, this will drastically reduce performance. If you can tolerate late log delivery, then it is probably enough to leave the implementation as it is: when the connection is eventually closed, all of the lines should safely be delivered and no data will be lost. The only caveat here is that if the connection is closed uncleanly, then we will probably throw an exception rather than return the buffered data.
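The buffering behavior described above can be sketched with a small, standalone model of iter_lines' pending-line logic. This is an illustrative simplification, not the actual requests source; the `chunks` argument stands in for whatever `iter_content` happens to yield:

```python
def iter_lines_sketch(chunks):
    """Simplified model of line iteration over arbitrary reads:
    a trailing partial line is buffered until the next read (or EOF)."""
    pending = None
    for chunk in chunks:  # stands in for iter_content(chunk_size=...)
        if pending is not None:
            chunk = pending + chunk
        lines = chunk.splitlines()
        # If the chunk does not end with a newline, hold the last
        # piece back; it may be continued by the next chunk.
        if lines and lines[-1] and chunk and lines[-1][-1] == chunk[-1]:
            pending = lines.pop()
        else:
            pending = None
        yield from lines
    if pending is not None:
        yield pending

# Reads that split mid-line delay delivery: with ["a", "\r\nb\r\nc"],
# "a" is only emitted once the second read arrives, and the final
# unterminated "c" only appears at end of stream.
print(list(iter_lines_sketch(["a", "\r\nb\r\nc"])))  # ['a', 'b', 'c']
print(list(iter_lines_sketch(["a\r\n", "b\r\nc\r\n"])))  # ['a', 'b', 'c']
```

This shows why no data is lost overall, yet the moment a given line is printed depends entirely on where the reads happen to fall.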


haocs commented Sep 20, 2016

Thanks @Lukasa
However, setting chunk_size to 1 or None did not change the results in my case. It seems my issue is related to https://github.com/kennethreitz/requests/issues/2020 . Requests somehow handles chunked encoding differently than curl does.
The following example shows different results GET from my log-server using curl and requests.

# curl --raw -v 'https://some-url.com'
< Transfer-Encoding: chunked
< Content-Type: text/plain; charset=utf-8

4f
2016-09-20T09:56:36  Welcome, you are now connected to log-streaming service.

37
PORT is:\\\\.\\pipe\\xxxxxx-xxxx-xxxx-xxxx-xxxxxxxx

11
Something wrong

12
Something normal
# r.iter_content(chunk_size=None, decode_unicode=False)

b'2016-09-20T10:12:09  Welcome, you are now connected to log-streaming service.'
b'\r\nPORT is:\\\\.\\pipe\\xxxxxx-xxxx-xxxx-xxxx-xxxxxxxx\r'
b'\nSomething normal'
b'\r\nSomething wrong\r'
b'\n'
b'PORT is:\\\\.\\pipe\\xxxxxx-xxxx-xxxx-xxxx-xxxxxxxx\r\n'
b'Something wrong\r\nSomething '
b'normal\r\n'
b'PORT is:\\\\.\\pipe\\xxxxxx-xxxx-xxxx-xxxx-xxxxxxxx\r\nSomething '
b'wrong\r\nSomething '

We can see that iter_content gets the correct data, including the CRLFs, but groups it into chunks differently. Since iter_lines internally calls iter_content, the lines are split differently as a result.


Lukasa commented Sep 20, 2016

I didn't realise you were getting chunked content. In that case, can you try the latest Requests with iter_content(None)? That should actually give you chunks.


haocs commented Sep 20, 2016

I tried with v2.11 but saw the same issue. Does iter_content chunk the data based on the chunk_size provided by the server?


Lukasa commented Sep 20, 2016

@haocs Only when using chunk_size=None.


haocs commented Sep 20, 2016

Even with chunk_size=None, the length of the content generated by iter_content differs from the chunk size sent by the server.

# r.iter_content(chunk_size=None)
[length of chunk] 0x4d
b'2016-09-20T21:41:52  Welcome, you are now connected to log-streaming service.'
[length of chunk] 0x38
b'\r\nPORT is:\\\\.\\pipe\\xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\r'

######
#curl --raw
4f
2016-09-20T22:00:30  Welcome, you are now connected to log-streaming service.

37
PORT is:\\.\pipe\xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

The above snippet shows the two chunks fetched by requests and by curl from the server. It seems that requests did not handle the trailing CRLF (which is part of the chunk) properly. For example, the chunk size of the first chunk indicates 0x4F, but iter_content only received 0x4D bytes and added the \r\n to the beginning of the next chunk.


Lukasa commented Sep 21, 2016

I don't observe this problem when using https://mkcert.org/generate/, where Requests generates exactly the same chunk boundaries as curl. Can you please confirm that your server really is generating the exact same chunk boundaries in each case?

Can you also confirm for me that you ran your test on v2.11?


haocs commented Sep 21, 2016

Yes, I tested against v2.11.1. Requests works fine with https://mkcert.org/generate/.
One difference I noticed is that chunks from my testing server contain an explicit \r\n at the end of each line (and the length of that \r\n is included in the chunk length), whereas the mkcert.org ones don't. Will this cause any trouble for Requests when processing chunks?
BTW, my testing is running against an Azure Kudu server. If necessary, I can provide a testing account as well as repro steps.


Lukasa commented Sep 22, 2016

mkcert.org provides a \r\n at the end of each chunk too, because it's required to by RFC 7230 Section 4.1. That section says that a chunked body looks like this:

<chunk size in hex>\r\n
<chunk data, whose total length is == chunk size>\r\n

Note that the \r\n at the end is excluded from the chunk size. Does your output end each chunk with two \r\n sequences, one counted in the body and one that isn't? Because it's supposed to. The raw body above seems to be overcounting its chunk sizes by including the terminating CRLF in the chunk size, when it should not.
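The framing from RFC 7230 can be sanity-checked in a few lines of Python. Notably, the welcome line from the capture above is 79 bytes (0x4f) once its own trailing CRLF is counted as data, which matches the server's advertised size (a sketch, assuming the data itself ends in \r\n):

```python
def encode_chunk(data: bytes) -> bytes:
    # <size-in-hex>\r\n <data> \r\n -- the terminating CRLF is NOT
    # counted in the size; any CRLF embedded in the data IS.
    return b"%x\r\n" % len(data) + data + b"\r\n"

# The welcome line, with its own CRLF as part of the chunk data.
line = b"2016-09-20T09:56:36  Welcome, you are now connected to log-streaming service.\r\n"
print(hex(len(line)))                # 0x4f, matching the curl capture
print(encode_chunk(line)[:6])        # b'4f\r\n20' -- size line, then data
```

Under this reading, a chunk whose data ends in \r\n is legitimately followed by a second \r\n (the chunk terminator), which is the "two \r\n" pattern asked about above.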


haocs commented Sep 23, 2016

I understand that the \r\n terminating each chunk should not be counted in the chunk size. But the other \r\n, the one inside the data, should be, right? However, per my testing, requests ignored both \r\n sequences, if I understand correctly.
I implemented another request function using urllib3 and it performed the same as curl did, which makes me believe that requests skips the \r\n when iterating over content.
Please see the following results from urllib3 and requests.

# python req_urllib3.py
[len of chunk] 0x4f
b'2016-09-23T19:27:27  Welcome, you are now connected to log-streaming service.\r\n'
[len of chunk] 0x39
b'2016-09-23T19:28:27  No new trace in the past 1 min(s).\r\n'


# python req_requests.py
[len of chunk] 0x4d
b'2016-09-23T19:25:09  Welcome, you are now connected to log-streaming service.'
[len of chunk] 0x39
b'\r\n2016-09-23T19:26:09  No new trace in the past 1 min(s).'


Lukasa commented Sep 23, 2016

Requests uses urllib3 directly and performs no additional post processing in this case. iter_content(None) is identical to stream(None).


haocs commented Sep 24, 2016

What's the urllib3 version shipped with requests v2.11?


haocs commented Sep 24, 2016

OK, I could repro this "issue" with urllib3. If I use urllib3 and set accept_encoding=True, it gives me exactly the same output as Requests. It seems Requests sets the Accept-Encoding header by default when called via requests.get().


Lukasa commented Sep 24, 2016

So do you see the problem go away if you set headers={'Accept-Encoding': 'identity'}?


haocs commented Sep 26, 2016

Yes. After I set headers={'Accept-Encoding': 'identity'}, iter_content(chunk_size=None, decode_unicode=False) worked as expected.


Lukasa commented Sep 26, 2016

OK. This strongly suggests that the problem is in the way the server handles gzipping a chunked body. It would be very interesting to see the raw data stream, if possible.
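Why gzip changes the observed boundaries can be illustrated with the standard library alone: a streaming decompressor emits bytes at boundaries unrelated to how the compressed stream was cut up on the wire, so after transparent gzip decoding the pieces a client sees no longer line up with the server's chunk boundaries. A minimal simulation (the body text is illustrative):

```python
import gzip
import zlib

# The full body as it would appear after de-chunking with identity encoding.
body = (b"2016-09-23T19:27:27  Welcome, you are now connected to log-streaming service.\r\n"
        b"2016-09-23T19:28:27  No new trace in the past 1 min(s).\r\n")
compressed = gzip.compress(body)

# Feed the compressed stream to a streaming decompressor in two arbitrary
# pieces, as a client would while reading chunks off the wire.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
mid = len(compressed) // 2
out1 = d.decompress(compressed[:mid])
out2 = d.decompress(compressed[mid:]) + d.flush()

# No bytes are lost, but the split between out1 and out2 bears no
# relation to the original line or chunk boundaries.
assert out1 + out2 == body
```

This is consistent with the observation above that requesting `Accept-Encoding: identity` restores the expected one-chunk-per-read behavior.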


haocs commented Sep 30, 2016

Since I can observe the same problem using curl or urllib3 with gzip enabled, this is evidently not an issue with requests. Thank you very much for the help; closing the issue.

@haocs haocs closed this as completed Sep 30, 2016
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021