
Enable test duration writes on all runs #14

Merged
merged 13 commits on Jun 8, 2021

Conversation

sondrelg
Contributor

@sondrelg sondrelg commented May 30, 2021

This PR:

  • Adds the possibility of outputting durations using --store-durations even when splitting. This seems useful for maintaining a "warm cache" of durations: if test fixtures change, or test environments are upgraded, test durations can now continuously update to reflect those changes.
  • Adds the possibility of running pytest-split with no durations file present. In that case we now just split tests evenly. This is a small detail, but one that I think will make implementing continuous updates in a CI environment a little easier.
  • Changes the saved data structure from a list of lists to a dict (see the sketch below). This makes continuous updates easier, since we can just update or add the affected keys.
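For illustration only (the test IDs and values here are made up, not taken from the PR), the stored data changes roughly like this:

# Old format: a list of [test_id, duration] pairs
old_durations = [
    ["tests/test_plugin.py::test_a", 0.12],
    ["tests/test_plugin.py::test_b", 1.05],
]

# New format: a dict keyed by test_id, so individual entries can be
# updated or added without rewriting the whole structure
new_durations = {
    "tests/test_plugin.py::test_a": 0.12,
    "tests/test_plugin.py::test_b": 1.05,
}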

There's also a slight restructuring with respect to the pytest plugin setup, which is not a big deal to revert if you prefer - just let me know. I just thought it might make more sense to register the cache/split functionality based on certain criteria, rather than run it every time and return early when not relevant. At least that was what I was trying to do 🙂

@sondrelg sondrelg changed the title Write to cache Enable test duration writes on all runs May 30, 2021
@jerry-git jerry-git self-requested a review May 31, 2021 07:18
@jerry-git
Owner

Some thoughts based on the PR description (haven't looked at the code yet 🙂 ).

Implements writes to the durations file on every run, even when splitting tests.

This is OK if it's still behind --store-durations. I.e. if that flag is not used, I would not store the durations.

Moves the default test-duration storage file to the pytest cache at .pytest_cache/v/cache/pytest-split

This is problematic as basically all the projects have .pytest_cache in their gitignore.

@sondrelg
Contributor Author

Implements writes to the durations file on every run, even when splitting tests.

This is OK if it's still behind --store-durations. I.e. if that flag is not used, I would not store the durations.

Ok that sounds good 👍 I'll make that the default behavior

Moves the default test-duration storage file to the pytest cache at .pytest_cache/v/cache/pytest-split

This is problematic as basically all the projects have .pytest_cache in their gitignore.

Yeah I should maybe have waited to let you respond in #11 before starting, but had some time yesterday so thought I'd jump into it 🙂

Personally I would prefer that the cached durations be left out of version control, since the generated files can become quite big. At work we have a medium-sized project with ~4000 tests which produces a 550 kB test-durations output file - quite big, and it will become part of our PR reviews every time it changes.

At the same time, there's no need to make this the default behavior if you don't want it to be - we could skip the pytest-cache stuff altogether, and I could just put the output file (.test-durations) in my .gitignore at work. Do you maybe want to reply to the posts in the issue first, and then we can progress from there?

@sondrelg
Contributor Author

I think I have a good idea of how to proceed now @jerry-git; thanks for responding in the issue.

One final thing before finalizing this - would you mind quickly taking a look at the _split_tests method I added? I had a little bit of trouble understanding the original split function (looking at it now it seems to make sense; I think I was just tired), so I changed it slightly.

Some of the changes are new and have to be there (I think), like:

  • We need to set avg_duration_per_test = 1 when no durations are found to support running tests when no durations file has been added yet. I guess this would be considered a new backwards-compatible feature.

Then there's a change that's a consequence of slightly changing how we select (and deselect) tests:

# Keep only this group's tests in the collected items
items[:] = selected_tests
# Report the rest as deselected so pytest's summary reflects the split
config.hook.pytest_deselected(items=deselected_tests)

The deselection results in pytest outputting something like Found 45, selected 12, deselected 33 which I think is a slight improvement 🙂

The final change is that instead of finding the index at which the runtime threshold is reached, I've now sorted the tests by slowest first and then added faster tests to get as close to the threshold as possible (see the sketch below). It might be overkill and could be reverted if you don't see any value in it.
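For illustration only (this is a rough sketch of the idea described above, not the PR's actual _split_tests code; the function name and threshold handling are made up):

def pick_group(durations, threshold):
    # Sort slowest-first, then keep adding faster tests while staying under the threshold
    candidates = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    selected, total = [], 0.0
    for test_id, duration in candidates:
        if total + duration <= threshold:
            selected.append(test_id)
            total += duration
    return selected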

If you could review that function specifically I think I have everything I need to finish 👍

@jerry-git
Owner

We need to set avg_duration_per_test = 1 when no durations ...

Sounds legit 👍

The deselection results in pytest outputting something like Found 45, selected 12, deselected 33 which I think is a slight improvement

👍

I've now sorted the tests by slowest first and then add faster tests to get as close to the threshold as possible

One of the key features of pytest-split is that it maintains the order of tests, so just make sure we don't lose that 🙂 One future feature idea would be to randomise based on some flag (i.e. something similar to what pytest-randomly is doing), but let's maintain the alphabetical order by default.

@sondrelg
Contributor Author

Sounds good 👍 🙂

@sondrelg
Contributor Author

I'm more or less done I think 🙂 Just have one quick question @jerry-git: what do you think about making the pytest-split output stand out a little?

These are all the output messages I've left in currently (the top one will only be shown when no durations file is found):

[screenshot: default plugin output]

It might not be a big deal, but right now it might be hard to differentiate between regular pytest output and plugin output.

How do you feel about changing the color of the output slightly? Like this (bold white):

[screenshot: plugin output in bold white]

Or this (yellow)? 😄

[screenshot: plugin output in yellow]

The last one might be a little much - a color change might not be necessary - perhaps adding a >> to the start of the output strings or an equivalent marker would have the same effect.

Might also not be necessary at all, just thought I'd ask 🙂

@sondrelg sondrelg force-pushed the write-to-cache branch 5 times, most recently from e7858aa to 5f4078e on May 31, 2021 20:17
@sondrelg
Contributor Author

A lot of back and forth here, but I think the PR is now ready for a second look when you have time @jerry-git 🙂

@mbkroese

mbkroese commented Jun 1, 2021

@sondrelg this PR will have a lot of conflicts with this one: #12

In that PR I'm basically adding the extra printouts that you're also adding in this PR. I also change the logic somewhat to use a different pytest hook. I agree the idea for the colours is nice, but I think it would be better to keep these changes in separate PRs. What do you think about commenting on my PR, so we can merge that, and afterwards apply your PR on top of it?

@sondrelg
Contributor Author

sondrelg commented Jun 1, 2021

@mbkroese that sounds good to me 👍

Sorry I didn't see it before - it wasn't my intention to create conflicts 🙂 If you and Jerry can agree on how to resolve #12, I'll pull master and resolve conflicts afterwards.

@sondrelg
Contributor Author

sondrelg commented Jun 7, 2021

Looked up the durations and yeah it looks like the start-time is recorded:

These are the tests over 100s in my durations file, with the recorded time and the time it took to re-run them locally:

::test 243.08475018500008 - 2.63s
::test 233.48524261500006 - 2.14s
::test[variation1] 202.96748632899997 -
::test[variation2] 202.61207506399998 - 8.51s
::test 237.58321302000002 - 4.41s
::test 236.68534560499998 - 4.53s
::test 190.07624340700005 - 5.36s
::test 190.2456836280001 - 5.34s
::test 194.08574962100002 - 4.46s
::test 194.78907412200005 - 4.31s

For some reason I have 10 (9) tests with long runtimes, when I split into 8 groups 🤔 But it looks like a good explanation of the unevenness!

@jerry-git
Owner

What happens in a situation in which I'd have the .test_durations file in the old format and I'd run with a version of pytest-split which has this changeset present?

The implementation would fail 🙂
Adding handling for this would be easy though - should we just re-format the data on-read if we see the old format and write with the new format?

I see two options:

  1. We accept the non-backwards-compatibility and communicate it clearly (e.g. with some release notes).

  2. Reading would support both the old and the new format. Writing would always use the new format. However, I would only write when --store-durations is used.

Option 1 would be easiest and option 2 would be most user-friendly. If 2 were used, there could be some EOL for the old-format support so that the code could be simplified at some point.

Any thoughts?

@jerry-git
Owner

Great insights with the durations! I assume @sondrelg's findings are related to running db migrations in some fixture once per sub suite?

@jerry-git
Owner

I won't do anything new with this PR though, I'll just wait until #12 is merged, fix conflicts, and fix the backwards compatibility issue + any new feedback you might have 🙂

Great! I'll merge #12 so we can move forward 🙂 Awesome work and knowledge sharing from both @sondrelg & @mbkroese! 💎

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

I see two options:

  1. We accept the non-backwards-compatibility and communicate it clearly (e.g. with some release notes).
  2. Reading would support both the old and the new format. Writing would always use the new format. However, I would only write when --store-durations is used.
  Option 1 would be easiest and option 2 would be most user-friendly. If 2 were used, there could be some EOL for the old-format support so that the code could be simplified at some point.

Any thoughts?

I don't mind adding support for reading both. I can add a comment to remember to remove support for the old format when you release v1 for example 🙂
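For illustration only (this is not the PR's actual code; the function name is made up), the read path could normalise both formats to the new dict form, assuming the file stores either a list of [test_id, duration] pairs or a {test_id: duration} dict:

import json

def load_durations(path):
    # Accept both the old list-of-lists format and the new dict format,
    # always returning the dict form
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, list):
        return {test_id: duration for test_id, duration in data}
    return data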

Great insights with the durations! I assume @sondrelg's findings are related to running db migrations in some fixture once per sub suite?

Yep! This fixture is what takes so long, and it looks like, with the way pytest is set up, the setup times of all fixtures (including those scoped for a whole session) are attributed to the first test in the test suite. I opened this issue in the pytest repo; hopefully it's possible to fix this there in time.

In the meantime though, I would suggest pytest-split should do something like this where we just remove the setup time of the first test in the test suite.

I'm testing that branch on my project at work now and it seems to have gotten rid of a lot of the variability 🎉

If you're keen to work on the split logic and variability fixes after this PR is merged, feel free to tag me in the PR and I'd be happy to test/review it for you when it's ready 🙂

@jerry-git
Owner

In the meantime though, I would suggest pytest-split should do something like this where we just remove the setup time of the first test in the test suite.

This sounds like a potential solution but how about if the first test doesn't use the heavy fixture? 🙂

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

In the meantime though, I would suggest pytest-split should do something like this where we just remove the setup time of the first test in the test suite.

This sounds like a potential solution but how about if the first test doesn't use the heavy fixture? 🙂

I don't think a perfect solution exists. We either risk reducing the setup time of the first test too much, or too little (the current behaviour isn't correct either, right?). And I guess an important point is that pytest-split cannot know which of the two it is.

Since pytest-split cannot know, but we know that all session scoped fixtures will always be attributed to the first test, I think it's pretty safe to assume that the first test will almost always be inflated. And the larger a project gets, the more pronounced the problem becomes, while under-estimation for a single test would probably matter less and less. There is a chance that some project has a ridiculous setup time for the first test in its test suite, and changing the behaviour as suggested above would cause it problems, but I think that's about 1000x less likely than the other way around 😛

Another option, though, is to let the user decide - the user actually can know what plugins they use and what their fixture setups look like, so they are in a much better position to make the decision. A possible solution might be to discard the first setup time if --discard-initial-setup is passed; or perhaps that should be the default, and users could opt out instead?

I guess the question boils down to whether adding the choice is worth it, or whether discarding the initial setup time of the first test is so unlikely to cause significant problems for users that it should just be the default behaviour. I don't see any harm in adding an opt-out 🙂
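For illustration only (--discard-initial-setup is just the proposal above, and record_duration is a made-up helper; the report attributes are standard pytest TestReport fields), discarding the first test's setup time when recording durations might look roughly like this:

def record_duration(report, durations, is_first_test):
    # Sketch of the --discard-initial-setup idea: ignore the setup phase of the
    # very first test, since session-fixture setup tends to be attributed to it
    if is_first_test and report.when == "setup":
        return
    durations[report.nodeid] = durations.get(report.nodeid, 0.0) + report.duration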

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

These are the outputs after the last commit:

Running without a test durations file:

❯ pytest . --splits 2 --group 1 --store-durations

[pytest-split] No test durations found. Pytest-split will split tests evenly when no durations are found. 
[pytest-split] You can expect better results in consequent runs, when test timings have been documented.

========================================================================================================================== test session starts ==========================================================================================================================
platform darwin -- Python 3.9.4, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/sondrelg/Documents/pytest-split
plugins: cov-2.12.0, mock-3.6.1, django-4.4.0, split-0.1.6.dev28+gc506046.d20210608
collecting ... 

[pytest-split] Running group 1/2

collected 16 items / 7 deselected / 9 selected                                                                                                                                                                                                                          

tests/test_plugin.py .........                                                                                                                                                                                                                                    [100%]

[pytest-split] Stored test durations in /Users/sondrelg/Documents/pytest-split/.test_durations

==================================================================================================================== 9 passed, 7 deselected in 1.33s ====================================================================================================================

Since we now deselect tests, we get this nice collected 16 items / 7 deselected / 9 selected output, so just outputting the group in [pytest-split] Running group 1/2 seemed sufficient 🙂

@sondrelg sondrelg force-pushed the write-to-cache branch 2 times, most recently from 73015f4 to cb87064 on June 8, 2021 11:34
@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

PR should be ready to review when you're ready @jerry-git 👏

@jerry-git
Owner

but we know that all session scoped fixtures will always be attributed to the first test

This is not true; consider e.g.:

import time
import pytest

@pytest.fixture(scope="session")
def expensive_session_fixture():
    time.sleep(10)

def test_1():
    # Runs first, but does not request the fixture, so no fixture setup is attributed to it
    ...

def test_2(expensive_session_fixture):
    # The 10 s session-fixture setup runs here, so it is attributed to this test's setup
    ...

Yup, we could have some flag for enabling / disabling the "skip measuring the first setup" behaviour. Considering your use case (and the whole thing in general), I think a more valuable flag could be something like --discard-long-durations 20, which would basically mean that it doesn't store durations for tests which are measured to last more than 20 seconds.
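As an illustration only (this flag does not exist in the PR; the name and behaviour follow the proposal above, and store_durations is a made-up helper), filtering at store time could look roughly like this:

import json

def store_durations(durations, path, max_duration=None):
    # Drop measurements above the threshold (e.g. 20 s) before writing,
    # so one-off expensive setups don't skew future splits
    if max_duration is not None:
        durations = {t: d for t, d in durations.items() if d <= max_duration}
    with open(path, "w") as f:
        json.dump(durations, f, indent=2, sort_keys=True)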

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

Would you only apply the long-duration threshold for the setup time on the first test still?

@jerry-git
Owner

Would you only apply the long-duration threshold for the setup time on the first test still?

I would treat every test identically 🙂

A configurable threshold would also provide a workaround for miscellaneous bugs in other libraries; see e.g. #1

Comment on lines +112 to +114
# This code provides backwards compatibility after we switched
# from saving durations in a list-of-lists to a dict format
# Remove this when bumping to v1
Owner

💪

Is there a test case for this?

Contributor Author

I'll add one 👍

Owner

Should probably add a 100% coverage requirement as well at some point 🤔

Contributor Author

Do it tomorrow 😂

Added the old format to a test in 794fedf

Contributor Author

@sondrelg sondrelg Jun 8, 2021

I could help you write some tests if you want, and if you'd like, I could also help introduce Poetry for package management and dependency stuff 🙂

Looks like their website is down - probably for the same reason GitHub was down earlier.

Owner

Poetry would be welcome indeed 👍

Owner

@jerry-git jerry-git left a comment

💪

@jerry-git jerry-git merged commit beed24d into jerry-git:master Jun 8, 2021
@jerry-git
Owner

FYI, available in 0.2.0!

@mbkroese

mbkroese commented Jun 8, 2021

collected 16 items / 7 deselected / 9 selected

This looks really good 👍
