Chrome crashing on startup on TaskCluster #28209

Closed
jgraham opened this issue Mar 24, 2021 · 15 comments · Fixed by #29095

@jgraham
Contributor

jgraham commented Mar 24, 2021

This is blocking all PRs.

A log excerpt from a recent master run:

 1:17.94 INFO Starting runner
 1:18.40 pid:2682 [2694:2714:0324/082219.693986:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
 1:18.46 pid:2682 
 1:18.46 pid:2682 DevTools listening on ws://127.0.0.1:40825/devtools/browser/25e20313-7194-4801-b0a8-f8533f4327cd
 1:18.47 pid:2682 [2694:2726:0324/082219.759661:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
 1:18.47 pid:2682 [2694:2726:0324/082219.759696:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
 1:18.48 pid:2682 [2694:2726:0324/082219.773367:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
 1:18.48 pid:2682 [2694:2726:0324/082219.773401:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
 1:18.50 pid:2682 [2694:2710:0324/082219.794201:ERROR:gpu_process_host.cc(990)] GPU process exited unexpectedly: exit_code=132
 1:18.50 pid:2682 [2694:2710:0324/082219.794227:WARNING:gpu_process_host.cc(1298)] The GPU process has crashed 1 time(s)
 1:18.53 pid:2682 [1616574139.828][WARNING]: You are using an unsupported command-line switch: --disable-build-check. Please don't report bugs that cannot be reproduced with this switch removed.
 1:18.54 pid:2682 [2694:2764:0324/082219.840326:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
 1:18.55 pid:2682 [2694:2764:0324/082219.840379:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
 1:18.55 pid:2682 [2694:2764:0324/082219.840432:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
 1:18.55 pid:2682 [2694:2764:0324/082219.840445:WARNING:property.cc(144)] DaemonVersion: GetAndBlock: failed.
 1:18.55 pid:2682 [2694:2764:0324/082219.840482:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
 1:18.55 pid:2682 [2694:2764:0324/082219.840521:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
 1:18.59 pid:2682 [2694:2710:0324/082219.888187:ERROR:gpu_process_host.cc(990)] GPU process exited unexpectedly: exit_code=132
 1:18.59 pid:2682 [2694:2710:0324/082219.888216:WARNING:gpu_process_host.cc(1298)] The GPU process has crashed 2 time(s)
 1:18.68 pid:2682 [2694:2710:0324/082219.978010:ERROR:gpu_process_host.cc(990)] GPU process exited unexpectedly: exit_code=132
 1:18.68 pid:2682 [2694:2710:0324/082219.978036:WARNING:gpu_process_host.cc(1298)] The GPU process has crashed 3 time(s)
 1:18.75 pid:2682 [2694:2710:0324/082220.048966:ERROR:gpu_process_host.cc(990)] GPU process exited unexpectedly: exit_code=132
 1:18.75 pid:2682 [2694:2710:0324/082220.048994:WARNING:gpu_process_host.cc(1298)] The GPU process has crashed 4 time(s)
 1:18.81 pid:2682 [2694:2710:0324/082220.112150:ERROR:gpu_process_host.cc(990)] GPU process exited unexpectedly: exit_code=132
 1:18.82 pid:2682 [2694:2710:0324/082220.112197:WARNING:gpu_process_host.cc(1298)] The GPU process has crashed 5 time(s)
 1:18.90 pid:2682 [2694:2710:0324/082220.192910:ERROR:gpu_process_host.cc(990)] GPU process exited unexpectedly: exit_code=132
 1:18.90 pid:2682 [2694:2710:0324/082220.192944:WARNING:gpu_process_host.cc(1298)] The GPU process has crashed 6 time(s)
 1:18.91 pid:2682 [2779:2779:0324/082220.201331:WARNING:vaapi_wrapper.cc(588)] VAAPI video acceleration not available for disabled
 1:18.91 pid:2682 [2779:2779:0324/082220.201523:ERROR:gpu_init.cc(430)] Passthrough is not supported, GL is disabled
 1:18.91 pid:2682 [2694:2710:0324/082220.205091:WARNING:gpu_process_host.cc(1018)] Reinitialized the GPU process after a crash. The reported initialization time was 0 ms
mem avail: 13565 of 14812 MiB (91 %), swap free:    0 of    0 MiB ( 0 %)
 1:47.50 WARNING Failed to start protocol connection
 1:47.50 WARNING Traceback (most recent call last):
  File "/home/test/web-platform-tests/tools/wptrunner/wptrunner/executors/protocol.py", line 47, in setup
    self.connect()
  File "/home/test/web-platform-tests/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 329, in connect
    self.webdriver.start()
  File "/home/test/web-platform-tests/tools/webdriver/webdriver/client.py", line 529, in start
    value = self.send_command("POST", "session", body=body)
  File "/home/test/web-platform-tests/tools/webdriver/webdriver/client.py", line 570, in send_command
    response = self.transport.send(
  File "/home/test/web-platform-tests/tools/webdriver/webdriver/transport.py", line 235, in send
    response = self._request(method, uri, payload, headers, timeout=None)
  File "/home/test/web-platform-tests/tools/webdriver/webdriver/transport.py", line 260, in _request
    response = self.connection.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1332, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 303, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 272, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
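
A side note on the repeated `exit_code=132` in the log: under the common shell convention an exit status above 128 encodes "killed by signal (status - 128)". Whether Chrome's reported `exit_code` follows that convention is an assumption here, not something confirmed in this thread. A minimal sketch for decoding such a value (`describe_exit_code` is a hypothetical helper name):

```python
import signal


def describe_exit_code(exit_code):
    """Return the signal name encoded by a shell-style exit code, or None.

    Assumes the 128 + signal-number convention; this is an assumption
    about how the GPU process exit code is reported.
    """
    if exit_code > 128:
        try:
            return signal.Signals(exit_code - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return None
    return None


print(describe_exit_code(132))
```

On Linux, signal 4 is SIGILL (illegal instruction), so if the convention applies, the GPU process would be dying on an instruction the worker's CPU or emulation layer cannot execute.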

@stephenmcgruer
Contributor

Last successful run on master: https://community-tc.services.mozilla.com/tasks/groups/LrP_5K24Sj2GbBqettnASQ

First failing run on master: https://community-tc.services.mozilla.com/tasks/groups/Hr2Ik-iXS6ivgZJfBWsn-Q

There's definitely a chrome update here, from:

Unpacking google-chrome-unstable (91.0.4449.6-1)
to
Unpacking google-chrome-unstable (91.0.4455.2-1)

https://chromium.googlesource.com/chromium/src/+log/91.0.4449.0..91.0.4455.0?pretty=fuller&n=10000 should be the changelog I think, but it's pretty long.
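
The changelog link above follows a simple `OLD..NEW` pattern, so it can be rebuilt for any pair of versions. A small sketch, assuming only the URL shape visible in that link (`changelog_url` is a hypothetical helper, not part of any Chromium tooling):

```python
def changelog_url(old, new, limit=10000, fuller=False):
    """Build a chromium.googlesource.com +log URL for the range old..new."""
    base = "https://chromium.googlesource.com/chromium/src/+log"
    query = f"n={limit}"
    if fuller:
        # Same query parameter as in the link above.
        query = "pretty=fuller&" + query
    return f"{base}/{old}..{new}?{query}"


print(changelog_url("91.0.4449.0", "91.0.4455.0", fuller=True))
```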

@stephenmcgruer
Contributor

If needed, we should be able to pin Chrome Dev to 91.0.4449.6-1 using a method similar to #19360.

@stephenmcgruer
Contributor

Note that wpt-chrome-dev-print-reftest-1 passed in https://community-tc.services.mozilla.com/tasks/groups/Hr2Ik-iXS6ivgZJfBWsn-Q, which implies that --headless is unaffected (not surprising, given GPU relation)

@stephenmcgruer
Contributor

This is strange: I started trying to bisect in #28211, but the most recent build (865011) does not seem to exhibit the crash. (Technically 91.0.4455.2 is 865012 but there's no chrome-linux.zip for that currently). I'm not immediately sure what to conclude from that >_<

@stephenmcgruer
Contributor

cc @foolip @jpchase - I'm not going to have time to work on this before vacation. Feel free to ping me tomorrow with any questions, but I think one of y'all will have to take the next steps here.

foolip added a commit that referenced this issue Mar 25, 2021
This is a temporary measure while the cause of the crash is being
investigated. This follows a similar approach taken before:
#19360

Mitigation (not fix) for #28209.
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Mar 27, 2021
…9.6, a=testonly

Automatic update from web-platform-tests
[Taskcluster] pin Chrome Dev to 91.0.4449.6 (#28238)

This is a temporary measure while the cause of the crash is being
investigated. This follows a similar approach taken before:
web-platform-tests/wpt#19360

Mitigation (not fix) for web-platform-tests/wpt#28209.
--

wpt-commits: c89882cb8571e454e0643c3584206a48f9b37049
wpt-pr: 28238
@foolip
Member

foolip commented Apr 14, 2021

I've sent a revert in #28464 now and it looks like the problem is gone. So we didn't need to investigate :)

@foolip
Member

foolip commented Apr 15, 2021

Fixed by #28464.

foolip closed this as completed Apr 15, 2021
@foolip
Member

foolip commented Apr 23, 2021

foolip reopened this Apr 23, 2021
@foolip
Member

foolip commented Apr 23, 2021

It began happening again in 161110d but worked in the previous run for aaf4be7.

When it worked it was Chrome 91.0.4472.19, then it failed with 92.0.4484.7. There are probably a lot of changes between those points, so I'll begin by pinning to 91.0.4472.19.

foolip added a commit that referenced this issue Apr 23, 2021
It has begun crashing on startup in 92.0.4484.7 and 91.0.4472.19 is the
last version where it worked:
#28209 (comment)

Mitigation (not fix) for #28209.
@foolip
Member

foolip commented Apr 23, 2021

https://chromium.googlesource.com/chromium/src/+log/91.0.4472.19..92.0.4484.7?n=10000 is huge so there's little hope of pinpointing this by just looking at the logs.

@stephenmcgruer did you ever manage to bisect an issue like this by building a binary from the chromium tree and putting it in Taskcluster?

jgraham pushed a commit that referenced this issue Apr 23, 2021
It has begun crashing on startup in 92.0.4484.7 and 91.0.4472.19 is the
last version where it worked:
#28209 (comment)

Mitigation (not fix) for #28209.
@stephenmcgruer
Contributor

@foolip Yes. You can start without building Chromium yourself:

$ smcgruer@stiglet2:~/chromium/src$ git diff
diff --git a/tools/bisect-builds.py b/tools/bisect-builds.py
index 1f4fad5fe513..4ea0fffcc2f6 100755
--- a/tools/bisect-builds.py
+++ b/tools/bisect-builds.py
@@ -583,6 +583,7 @@ def FetchRevision(context, rev, filename, quit_event=None, progress_event=None):
       sys.stdout.write('\r' + progress)
       sys.stdout.flush()
   download_url = context.GetDownloadURL(rev)
+  print("Fetching %s" % download_url)
   try:
     urllib.urlretrieve(download_url, filename, ReportHook)
     if progress_event and progress_event.isSet():
$ python tools/bisect-builds.py --verify-range -a linux64 -g GOOD_REV -b BAD_REV --use-local-cache -- --no-first-run --user-data-dir=/tmp/

This will print URLs like http://commondatastorage.googleapis.com/chromium-browser-snapshots/Linux_x64/863584/chrome-linux.zip for each rev. You can then edit the Taskcluster code to download that zip and unzip it to get the Chrome binary (if you track down the last time I dealt with this issue there's a PR I used for bisecting that shows how).

Bisecting builds like this usually gets you to a few dozen changes. If things are still unclear at that point you can switch to local Chromium builds - basically you need to do a build, zip it (you don't need every output file, but you do need more than just the chrome executable), and put it on GCS or somewhere else reachable by Taskcluster.
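
The snapshot-based approach described above can be sketched roughly as follows. This is an illustration only: the URL layout is copied from the example URL above, while `snapshot_url`, `fetch_snapshot`, `first_bad_revision`, and the `crashes` callback are hypothetical names. In practice `crashes` would mean "run this build on Taskcluster and see whether it crashes on startup", which is not automated here.

```python
import urllib.request
import zipfile

# Base URL and path layout taken from the example snapshot URL above.
SNAPSHOT_BASE = "http://commondatastorage.googleapis.com/chromium-browser-snapshots"


def snapshot_url(revision, platform="Linux_x64", archive="chrome-linux.zip"):
    """Return the snapshot archive URL for one Chromium revision."""
    return f"{SNAPSHOT_BASE}/{platform}/{revision}/{archive}"


def fetch_snapshot(revision, dest_dir):
    """Download and unpack one snapshot into dest_dir.

    Not every revision has an archive; a missing one surfaces as an
    HTTP error from urlretrieve. Note that zipfile.extractall does not
    restore Unix permission bits, so the chrome binary may still need
    to be made executable afterwards.
    """
    zip_path, _ = urllib.request.urlretrieve(snapshot_url(revision))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)


def first_bad_revision(good, bad, crashes):
    """Binary-search (good, bad] for the first revision where crashes(rev) is True.

    Assumes `good` works, `bad` crashes, and the crash persists in every
    revision after it first appears.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if crashes(mid):
            bad = mid
        else:
            good = mid
    return bad
```

Because some revisions have no archive, a real bisect would also need to step to a neighbouring revision when a download fails; `bisect-builds.py` already handles that, which is one reason to start with it rather than with a hand-rolled loop.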

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Apr 25, 2021
…2.19, a=testonly

Automatic update from web-platform-tests
[Taskcluster] pin Chrome Dev to 91.0.4472.19

It has begun crashing on startup in 92.0.4484.7 and 91.0.4472.19 is the
last version where it worked:
web-platform-tests/wpt#28209 (comment)

Mitigation (not fix) for web-platform-tests/wpt#28209.

--

wpt-commits: 8bc9fdfd1dfdd7303ed2753029f0f5f4db10e70e
wpt-pr: 28661
@foolip
Member

foolip commented May 7, 2021

Thanks @stephenmcgruer! That makes a lot of sense, but also sounded like work, so I procrastinated until now and have just sent #28902 to see if the issue has maybe magically resolved itself. If it hasn't, I'll have to do some bisecting :/

@foolip
Member

foolip commented May 7, 2021

Nope, the problem has not been fixed, so I'm in for some debugging...

@foolip
Member

foolip commented May 13, 2021

I've filed https://bugs.chromium.org/p/chromium/issues/detail?id=1208904 to hopefully get some help from GPU experts in Chromium.

Not really knowing what else to try, I've also sent #28992 to see how much the results change if we just use headless Chrome. That is a well-maintained way of running Chrome without a real GPU, after all.

foolip added a commit that referenced this issue May 24, 2021
This should allow unpinning Chrome to the latest version, but it's first
done while keeping the version pinned to vet the differences.

Part of #28209.
foolip added a commit that referenced this issue May 25, 2021
This should allow unpinning Chrome to the latest version, but it's first
done while keeping the version pinned to vet the differences.

Part of #28209.
foolip added a commit that referenced this issue Jun 2, 2021
This should allow unpinning Chrome to the latest version, but it's first
done while keeping the version pinned to vet the differences.

Part of #28209.
foolip added a commit that referenced this issue Jun 3, 2021
This should allow unpinning Chrome to the latest version, but it's first
done while keeping the version pinned to vet the differences.

Part of #28209.
@foolip
Member

foolip commented Jun 3, 2021

It worked! Chrome Dev is now running the latest version again. Diff:
https://wpt.fyi/results/?diff&filter=ADC&run_id=5733201030938624&run_id=5132966231539712

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jun 8, 2021
… version used on Taskcluster, a=testonly

Automatic update from web-platform-tests
Introduce an --enable-swiftshader argument for Chrome

This should allow unpinning Chrome to the latest version, but it's first
done while keeping the version pinned to vet the differences.

Part of web-platform-tests/wpt#28209.

--
Revert "[Taskcluster] pin Chrome Dev to 91.0.4472.19"

This reverts web-platform-tests/wpt#28661.

This requires updating infrastructure/server/context.any.js
expectations.

--

wpt-commits: b0e65e62d150a0cd1556dd7266aad765c49fe0d6, 8ebb14dc93f18c6558faa1608378df64ca4ee72f
wpt-pr: 29095
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Jun 9, 2021
… version used on Taskcluster, a=testonly

Automatic update from web-platform-tests
Introduce an --enable-swiftshader argument for Chrome

This should allow unpinning Chrome to the latest version, but it's first
done while keeping the version pinned to vet the differences.

Part of web-platform-tests/wpt#28209.

--
Revert "[Taskcluster] pin Chrome Dev to 91.0.4472.19"

This reverts web-platform-tests/wpt#28661.

This requires updating infrastructure/server/context.any.js
expectations.

--

wpt-commits: b0e65e62d150a0cd1556dd7266aad765c49fe0d6, 8ebb14dc93f18c6558faa1608378df64ca4ee72f
wpt-pr: 29095