
M1 Mac "EOF" during http access to docker-hosted webserver "Connection reset by peer" #5407

Closed
3 tasks done
rfay opened this issue Mar 1, 2021 · 28 comments

rfay commented Mar 1, 2021

  • I have tried with the latest version of Docker Desktop
  • I have tried disabling enabled experimental features
  • I have uploaded Diagnostics
  • Diagnostics ID: 165EF212-DB99-4FE1-90A4-E516A309B23F/20210609215723

Expected behavior

Predictable behavior from Docker hosting an nginx webserver.

Actual behavior

Intermittent EOF when hitting a webserver.

I now have Mac M1 CI going for ddev, currently using build 3.1.0 (60984)

Every day I see a few tests where an HTTP request gets an EOF. This is different from many Docker failures I've experienced before, and seems new to Docker on M1.

I know it's not much help, but here's a sample failure:
testcommon.go:396:
  | Error Trace:	testcommon.go:396
  | config_test.go:789
  | Error:      	Received unexpected error:
  | Get "http://127.0.0.1/phpinfo.php": EOF
  | Test:       	TestPHPOverrides
  | Messages:   	GetLocalHTTPResponse returned err on rawurl http://testpkgdrupal8.ddev.site/phpinfo.php: Get "http://127.0.0.1/phpinfo.php": EOF
 

The thing is that the EOF on M1 has happened intermittently on different tests, and it's the most likely failure on Mac M1. It usually succeeds on a retry (a run is an hour of testing, with hundreds or thousands of HTTP requests). There is only one machine in the pool at this point, so it's not about different computers with different state.

I just thought you should know about this one because it's reasonably common.

Information

macOS Big Sur 11.2.1
Docker Desktop for Mac 3.1.0 (60984)

Steps to reproduce the behavior

  1. ...
  2. ...
@stephen-turner
Contributor

Thanks for the report, @rfay. If you have a spare Big Sur Intel machine to test it on, I'd be interested to know whether it fails on that using the new virtualization framework.

@stephen-turner stephen-turner added the area/m1 M1 preview builds label Mar 2, 2021

rfay commented Mar 2, 2021

I went ahead and switched a Big Sur amd64 test runner to new virtualization, we'll see what happens. Note that my initial experience wasn't good - it just went off into spinning-circle land forever. However, when I closed the window after many minutes and came back it seemed to be set. Not satisfied, I reset to factory defaults and then changed it to new virt again. It didn't do the infinite spinning the second time.


rfay commented Mar 2, 2021

@stephen-turner Networking to container (host.docker.internal) is nonfunctional with new virtualization enabled, so that's a no-go for NFS or xdebug. Opened #5410


rfay commented Mar 30, 2021

@stephen-turner @djs55 The EOF issue may have gone away in RC2, I'm not sure.

However, and I assume related, I still get "Connection reset by peer" quite a lot. "read tcp 127.0.0.1:62795->127.0.0.1:80: read: connection reset by peer". It's rare that I don't have to restart an M1 test run due to this one. Doesn't happen on the same test each time.


rfay commented Apr 18, 2021

I still see this related thing quite a lot: "Get "http://127.0.0.1//README.txt": read tcp 127.0.0.1:60801->127.0.0.1:80: read: connection reset by peer"

Sometimes I have to restart tests many times to get a clean run. Of course, while I'm doing that, every other platform (Docker for Mac amd64, Docker for Windows, WSL2, Linux) completes without any of these errors on the same code.


rfay commented Apr 22, 2021

I do still see the EOF, it hasn't gone away:

testcommon_test.go:194:
  | Error Trace:	testcommon_test.go:194
  | Error:      	Received unexpected error:
  | Get "http://127.0.0.1/readme.html": EOF
  | Test:       	TestGetLocalHTTPResponse


LeZuse commented Apr 28, 2021

We just started having the same issue on Apple Silicon macs with Docker 3.3.1. Reverting to Preview 7 solves the issue. Happy to provide more details.


rfay commented May 5, 2021

This problem is not solved in Docker Desktop 3.3.2. I had hopes that it might have been related to this in the 3.3.2 release notes:

Fixed a bug with an Apple chip where the last byte in a network transfer was occasionally lost


rfay commented May 5, 2021

Not sure how you're going to chase this @djs55, but it's a really significant problem. Casual users probably just click through it and try again, but I haven't had a successful M1 test run of the ddev test suite in more than a month, and I'm starting to ignore the failures. And it's always the EOF.


djs55 commented May 6, 2021

@rfay thanks for trying with 3.3.2.

The code path which handles docker run -p forwarded ports should be the same on both Intel and Apple Silicon so I suspect the bug might actually be present on both platforms, even if it is only visible on Apple Silicon. I'll take a look in more detail to see if I can spot something.

A bit of a long shot but: Is the EOF from the first request to a container or does it happen after successful requests? I ask because running docker -p 80:80 -d nginx could possibly return before the nginx process has called listen, leading to a transient EOF on the first request. This would probably be obvious if it was happening ... unless the container is silently crashing and auto-restarting? It's probably worth double-checking that the container isn't accidentally running through qemu emulation. We're still chasing down and building multiarch images in a few places ourselves. Qemu works just well enough to make simple tests pass but then fails during more stressful tests.


rfay commented May 6, 2021

Thanks for the thinking and attention on this.

  • This seems to happen randomly in no particular test, so I doubt that it has to do with the container crashing or that sort of thing.
  • These happen after the container is fully up and registered healthy; the healthcheck itself does an HTTP request, and the failures don't occur there. It's happening on traffic after the container has come up.
  • Images are absolutely arm64. We use only native images on all platforms. I know how crazy those qemu-wrong-arch situations can be, so qemu should not be in play here. Also note that we run the same tests on linux/arm64 with the same arm64 images, with no problems there.

Again, thanks. And I know this is a hard one. I'll try to re-investigate a few of these and keep some notes to see if there's any kind of pattern. But it seems like... ddev start, wait until it's up, curl something, EOF failure. That's a common pattern in all the tests, but there doesn't seem to be any particular pattern in what tests fail.

LeZuse commented May 7, 2021

We had a problem with HTTP clients complaining about mismatched content length (the response got cut off), and now it seems to work in 3.3.2, so either this is not the same issue or there are more variables at play.



rfay commented May 7, 2021

@LeZuse I'm betting that your fix was recorded in the 3.3.2 release notes:

Fixed a bug with an Apple chip where the last byte in a network transfer was occasionally lost

Sadly, this one seems to be different.


rfay commented Jun 4, 2021

This problem remains. I'll work on a recreation scenario. It's only one time in 10 that the ddev test suite completes without this problem on M1 (problem is ONLY on M1, same tests everywhere).


rfay commented Jun 9, 2021

I think I have a recreation scenario, and here's a diagnostic: 165EF212-DB99-4FE1-90A4-E516A309B23F/20210609215723 (Edit: Got another one: 165EF212-DB99-4FE1-90A4-E516A309B23F/20210609221754 )

Most of the time this is probably "Connection reset by peer"

It appears that in this situation both ddev-router and ddev-webserver are completely ready (and have already served something, but internally in the container)


rfay commented Jun 24, 2021

Upgraded all test runners to 3.5.0, but this remains a failure somewhere in almost every test run.

hmaesta commented Jul 7, 2021

Oh, man... After 24 hours of extreme frustration, here I am.

We have some piece of code that Just Don't Work™: no errors, no exceptions, no log. Nothing. The request just has a sudden stop coming from nowhere, like being hit by lightning on a beautiful summer day.

And then I realized that I was the only one suffering from this rare phenomenon. My colleagues, followers of Linus Torvalds, didn't even notice that something was odd. Just me.

"We use Docker so everyone can have the same environment. It's our code. It's not possible that it's just me."

Well, it was. After accepting that the M1 could be the cause, I started looking for someone as unlucky as me and found this issue.

I uninstalled every piece of Docker on my computer and downgraded from 3.5.1 to 3.3.1, the first version to support Apple Silicon, and everything is back to normal. Except me: even now I haven't accepted that I spent 8+ hours of working time looking for a bug in my own code.


rfay commented Jul 28, 2021

This remains an issue on 3.5.2.


rfay commented Aug 12, 2021

Same on 3.6.0

Although I can usually get a full ddev test suite to pass on Mac amd64 and Windows amd64, it's very rare that I can get through a full suite on Mac M1. Sometimes I retry several times. It's always some random test that hits an EOF or connection reset by peer.

@rfay rfay changed the title M1 Mac "EOF" during http access to docker-hosted webserver M1 Mac "EOF" during http access to docker-hosted webserver "Connection reset by peer" Aug 18, 2021
@docker-robott
Collaborator

Issues go stale after 90 days of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale


rfay commented Nov 16, 2021

/remove-lifecycle stale
/lifecycle frozen

Just because issues don't get any attention doesn't mean they're stale. This is a consistent problem.


dossy commented Dec 16, 2021

Possibly related? #3448


rfay commented May 5, 2022

This remains a consistent problem. As speculated here, it may be related to the use of localhost.

@icemanmelting

I have the same issue with a test. I have a Mac mini and a Mac Studio running two different Java apps that need to communicate with each other through TCP sockets, and I get intermittent socket disconnections quite often. This is a stress test, and I have never been able to run it across the two computers for more than 1 or 1.5 hours. If I run everything on the Mac Studio using a Docker internal network, then everything runs OK, and I am able to keep the communication going for days on end.

@ColeoCofer

The only solution to this problem that I found was to reduce the allocated resources in Docker Desktop down to 1 CPU (inspired by this post). My containers are now consistently passing, whereas with 4 CPUs I would get the EOF error about 4 out of 5 times.

Docker-Desktop Version: 4.18.0 (104112)
Engine: 20.10.24


rfay commented Sep 23, 2023

This still happens regularly, but it's not going to be addressed here. Closing.

@rfay rfay closed this as completed Sep 23, 2023

tisba commented Sep 24, 2023

Hey @rfay, could you point us to the issue where this is going to be addressed so we can keep track of the issue? Thanks! 🙏


rfay commented Sep 24, 2023

Unfortunately, if it hasn't gotten any attention in 2 1/2 years, I don't think it will get any.

However, it's a poor issue and doesn't have a repro case, as it's intermittent. If you have a good repro case, something that the Docker team can easily reproduce, perhaps they'll pay attention if you open a new issue with uploaded diagnostics and a repro case. Consider making a GitHub test repo that demonstrates it.

10 participants