faster/more efficient testing in CI #3763

wileyj · 2023-06-22T14:54:11Z

Goal: reduce the time it takes to run all required tests when CI runs.

currently, there are 2 several jobs that are either repeating build steps, or loading a large amount of data that is affecting how quickly a test can run.
further, there seems to be some improvements that can be made by using nextest (https://github.com/nextest-rs/nextest) vs cargo test.

I'm looking into a few approaches here, firstly using default github action workflows with modified testing jobs:

cache binaries and other data as much as possible per run vs building extremely large docker images that need to be copied to a new job (currently a workflow is copying an 8GB image > 60 times).
archive (and cache) binaries that can be used with nextests's --archive flag
keep the structure of the job matrix as-is, but run the tests using the cached data and cached nextest archive.

in a fork, for the most part, the above changes cut the test time nearly in half - but i'm finding it to be a bit inconsistent due to the way github actions provisions runners (on occasion, i've had a job wait for 30 minutes for some cpu time on a simple task such as retrieving cache data, which on average takes about 10s).

a second approach for the above i'm looking at is removing the current test matrix, and creating a single job with a bash list/for loop to process tests. this will get around the issue of waiting for cpu time, but the drawbacks may outweigh the benefit - namely, if there are flaky tests, or any issue out of our control, the entire workflow will fail vs maybe a single test or two.
nevertheless, i'm trying it to see if it's a possible solution.

finally, there are ways to get more reliable CPU for the actions, i.e. using a service like buildjet (this will also be checked as an option).

I am also currently having an issue starting bitcoind in a few of the tests that require it, but overall most of the tests are completing rather quickly. i don't consider it a blocker; i'll figure it out.

@obycode @pavitthrap @kantai i would appreciate you all looking at the tests we're currently running and advising if we still need them for CI (looking at the naming alone, i'm curious about the tests for older epochs, like tests::epoch_205::transition_empty_blocks

The text was updated successfully, but these errors were encountered:

kantai · 2023-06-22T18:30:36Z

in a fork, for the most part, the above changes cut the test time nearly in half - but i'm finding it to be a bit inconsistent due to the way github actions provisions runners (on occasion, i've had a job wait for 30 minutes for some cpu time on a simple task such as retrieving cache data, which on average takes about 10s).

In my opinion, this seems like a pretty big win already.

a second approach for the above i'm looking at is removing the current test matrix, and creating a single job with a bash list/for loop to process tests. this will get around the issue of waiting for cpu time, but the drawbacks may outweigh the benefit - namely, if there are flaky tests, or any issue out of our control, the entire workflow will fail vs maybe a single test or two.

I worry that this would reduce the potential concurrency. Failed tests could also step on other tests by failing to close bitcoind, reap zombie processes, etc.

@obycode @pavitthrap @kantai i would appreciate you all looking at the tests we're currently running and advising if we still need them for CI (looking at the naming alone, i'm curious about the tests for older epochs, like tests::epoch_205::transition_empty_blocks

My preference is to leave any non-flaky tests in place. These kinds of tests are important to detect potential regressions. It may be that the run frequency for these tests could be reduced, i.e., regression tests only run before merging, not on every PR open + update, but that may require a bunch of work to figure out how to implement something like that.

wileyj · 2023-06-23T16:28:32Z

@kantai i did attempt the second approach merely as an idea - i had no confidence it would be useful, and my test simply proved that. it took far longer looping through the list of tests in a run step (i.e. a for loop) so i can safely say this won't be an approach i'll be pursuing.

there are a few other issues i'm dealing with regarding the tests, namely bitcoind isn't starting properly on the github runner (not sure why, needs more investigation) so a single test is reliably failing still. Otherwise, i see good test times for the other 63 tests - but one thing I don't think can be solved with the current platform is this:
on occasion, a test that usually completes quickly can sometimes take > 30 mins due to factors out of our control. because github actions is a bit opaque, i'm guessing that we simply don't have access to any CPU to run the job.
for example, i've seem a step (download a cache) take > 30 minutes. re-running the test completes in ~2 minutes.

so, another aspect i'm looking into here is how to make these workflows consistent and reliable (alongside how to run them faster). buildjet has been mentioned, which i'll be looking into (once the bitcoind issue is resolved above).
i'll also be looking around to see what other options are available as well.

you also raise an interesting point about re-running tests on PR updates etc. as part of this work, i think can mock up something to simulate how it could work, and we can work out the details later on.

wileyj added chore Necessary but less impactful tasks such as cleanup or reorg CI labels Jun 22, 2023

wileyj self-assigned this Jun 22, 2023

wileyj linked a pull request Aug 15, 2023 that will close this issue

CI changes to improve build/testing times #3847

Closed

wileyj closed this as completed Dec 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

faster/more efficient testing in CI #3763

faster/more efficient testing in CI #3763

wileyj commented Jun 22, 2023

kantai commented Jun 22, 2023

wileyj commented Jun 23, 2023 •

edited

Loading

faster/more efficient testing in CI #3763

faster/more efficient testing in CI #3763

Comments

wileyj commented Jun 22, 2023

kantai commented Jun 22, 2023

wileyj commented Jun 23, 2023 • edited Loading

wileyj commented Jun 23, 2023 •

edited

Loading