
Enable test duration writes on all runs #14

Merged
merged 13 commits on Jun 8, 2021

Conversation

sondrelg
Contributor

@sondrelg sondrelg commented May 30, 2021

This PR:

  • Adds the possibility of outputting durations using --store-durations even when splitting. This seems useful for maintaining a "warm cache" of durations: if test fixtures change, or test environments are upgraded, test durations can now continuously update to reflect those changes.
  • Adds the possibility of running pytest-split with no durations file present. In that case we now just split tests evenly. This is a small detail, but one that I think will make implementing continuous updates in a CI environment a little easier.
  • Changes the saved data structure from a list of lists to a dict (see the sketch below). This makes continuous updates easier, since we can just update or add the affected keys.
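For illustration only (the test IDs and values here are made up, not taken from the PR), the stored data changes roughly like this:

# Old format: a list of [test_id, duration] pairs
old_durations = [
    ["tests/test_plugin.py::test_a", 0.12],
    ["tests/test_plugin.py::test_b", 1.05],
]

# New format: a dict keyed by test_id, so individual entries can be
# updated or added without rewriting the whole structure
new_durations = {
    "tests/test_plugin.py::test_a": 0.12,
    "tests/test_plugin.py::test_b": 1.05,
}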

There's also a slight restructuring with respect to the pytest plugin setup, which is not a big deal to revert if you prefer - just let me know. I just thought it might make more sense to register the cache/split functionality based on certain criteria, rather than run it every time and return early when not relevant. At least that was what I was trying to do 🙂

@sondrelg sondrelg changed the title Write to cache Enable test duration writes on all runs May 30, 2021
@jerry-git jerry-git self-requested a review May 31, 2021 07:18
@jerry-git
Owner

Some thoughts based on the PR description (haven't looked at the code yet 🙂 ).

Implements writes to the durations file on every run, even when splitting tests.

This is OK if it's still behind --store-durations. I.e. if that flag is not used, I would not store the durations.

Moves the default test-duration storage file to the pytest cache at .pytest_cache/v/cache/pytest-split

This is problematic as basically all the projects have .pytest_cache in their gitignore.

@sondrelg
Contributor Author

Implements writes to the durations file on every run, even when splitting tests.

This is OK if it's still behind --store-durations. I.e. if that flag is not used, I would not store the durations.

Ok that sounds good 👍 I'll make that the default behavior

Moves the default test-duration storage file to the pytest cache at .pytest_cache/v/cache/pytest-split

This is problematic as basically all the projects have .pytest_cache in their gitignore.

Yeah I should maybe have waited to let you respond in #11 before starting, but had some time yesterday so thought I'd jump into it 🙂

Personally I would prefer that the cached durations be left out of version control, since the generated files can become quite big. At work we have a medium-sized project with ~4000 tests which produces a 550 kB test-durations output file - quite big, and it will become part of our PR reviews every time it changes.

At the same time, there's no need to make this the default behavior if you don't want it to be - we could skip the pytest-cache stuff altogether, and I could just put the output file (.test-durations) in my .gitignore at work. Do you maybe want to reply to the posts in the issue first, and then we can progress from there?

@sondrelg
Contributor Author

I think I have a good idea of how to proceed now @jerry-git; thanks for responding in the issue.

One final thing before finalizing this - would you mind quickly taking a look at the _split_tests method I added? I had a little bit of trouble understanding the original split function (looking at it now it seems to make sense; I think I was just tired), so I changed it slightly.

Some of the changes are new and have to be there (I think), like:

  • We need to set avg_duration_per_test = 1 when no durations are found to support running tests when no durations file has been added yet. I guess this would be considered a new backwards-compatible feature.

Then there's a change that's a consequence of slightly changing how we select (and deselect) tests:

# Keep only this group's tests in the collected items
items[:] = selected_tests
# Report the rest as deselected so pytest's summary reflects the split
config.hook.pytest_deselected(items=deselected_tests)

The deselection results in pytest outputting something like Found 45, selected 12, deselected 33 which I think is a slight improvement 🙂

The final change is that instead of finding the index at which the runtime threshold is reached, I've now sorted the tests by slowest first and then added faster tests to get as close to the threshold as possible (see the sketch below). It might be overkill and could be reverted if you don't see any value in it.
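For illustration only (this is a rough sketch of the idea described above, not the PR's actual _split_tests code; the function name and threshold handling are made up):

def pick_group(durations, threshold):
    # Sort slowest-first, then keep adding faster tests while staying under the threshold
    candidates = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    selected, total = [], 0.0
    for test_id, duration in candidates:
        if total + duration <= threshold:
            selected.append(test_id)
            total += duration
    return selected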

If you could review that function specifically I think I have everything I need to finish 👍

@jerry-git
Owner

We need to set avg_duration_per_test = 1 when no durations ...

Sounds legit 👍

The deselection results in pytest outputting something like Found 45, selected 12, deselected 33 which I think is a slight improvement

👍

I've now sorted the tests by slowest first and then add faster tests to get as close to the threshold as possible

One of the key features of pytest-split is that it maintains the order of tests, so just make sure we don't lose that 🙂 One future feature idea would be to randomise based on some flag (i.e. something similar to what pytest-randomly is doing), but let's maintain the alphabetical order by default.

@sondrelg
Contributor Author

Sounds good 👍 🙂

@sondrelg
Contributor Author

I'm more or less done I think 🙂 Just have one quick question @jerry-git: what do you think about making the pytest-split output stand out a little?

These are all the output messages I've left in currently (the top one will only be shown when no durations file is found):

[screenshot: default plugin output]

It might not be a big deal, but right now it might be hard to differentiate between regular pytest output and plugin output.

How do you feel about changing the color of the output slightly? Like this (bold white):

[screenshot: plugin output in bold white]

Or this (yellow)? 😄

[screenshot: plugin output in yellow]

The last one might be a little much - a color change might not be necessary - perhaps adding a >> to the start of the output strings or an equivalent marker would have the same effect.

Might also not be necessary at all, just thought I'd ask 🙂

@sondrelg sondrelg force-pushed the write-to-cache branch 5 times, most recently from e7858aa to 5f4078e on May 31, 2021 20:17
@sondrelg
Contributor Author

A lot of back and forth here, but I think the PR is now ready for a second look when you have time @jerry-git 🙂

@mbkroese

mbkroese commented Jun 1, 2021

@sondrelg this PR will have a lot of conflicts with this one: #12

In that PR I'm basically adding the extra printouts that you're also adding in this PR. I also change the logic somewhat to use a different pytest hook. I agree the idea for the colours is nice, but I think it would be better to keep these changes in separate PRs. What do you think about commenting on my PR, so we can merge that, and afterwards apply your PR on top of it?

@sondrelg
Contributor Author

sondrelg commented Jun 1, 2021

@mbkroese that sounds good to me 👍

Sorry I didn't see it before - it wasn't my intention to create conflicts 🙂 If you and Jerry can agree on how to resolve #12, I'll pull master and resolve conflicts afterwards.

@sondrelg
Contributor Author

sondrelg commented Jun 7, 2021

Looked up the durations and yeah it looks like the start-time is recorded:

These are the tests over 100s in my durations file, with the recorded time and the time it took to re-run them locally:

::test 243.08475018500008 - 2.63s
::test 233.48524261500006 - 2.14s
::test[variation1] 202.96748632899997 -
::test[variation2] 202.61207506399998 - 8.51s
::test 237.58321302000002 - 4.41s
::test 236.68534560499998 - 4.53s
::test 190.07624340700005 - 5.36s
::test 190.2456836280001 - 5.34s
::test 194.08574962100002 - 4.46s
::test 194.78907412200005 - 4.31s

For some reason I have 10 (9) tests with long runtimes, when I split into 8 groups 🤔 But it looks like a good explanation of the unevenness!

@jerry-git
Owner

What happens in a situation in which I'd have the .test_durations file in the old format and I'd run with a version of pytest-split which has this changeset present?

The implementation would fail 🙂
Adding handling for this would be easy though - should we just re-format the data on-read if we see the old format and write with the new format?

I see two options:

  1. We accept the non-backwards-compatibility and communicate it clearly (e.g. with some release notes).

  2. Reading would support both the old and the new format. Writing would always use the new format. However, I would only write when --store-durations is used.

Option 1 would be easiest and option 2 would be most user-friendly. If 2 were used, there could be some EOL for the old-format support so that the code could be simplified at some point.

Any thoughts?

@jerry-git
Owner

Great insights with the durations! I assume @sondrelg's findings are related to running db migrations in some fixture once per sub suite?

@jerry-git
Owner

I won't do anything new with this PR though, I'll just wait until #12 is merged, fix conflicts, and fix the backwards compatibility issue + any new feedback you might have 🙂

Great! I'll merge #12 so we can move forward 🙂 Awesome work and knowledge sharing from both @sondrelg & @mbkroese! 💎

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

I see two options:

  1. We accept the non-backwards-compatibility and communicate it clearly (e.g. with some release notes).
  2. Reading would support both the old and the new format. Writing would always use the new format. However, I would only write when --store-durations is used.
  Option 1 would be easiest and option 2 would be most user-friendly. If 2 were used, there could be some EOL for the old-format support so that the code could be simplified at some point.

Any thoughts?

I don't mind adding support for reading both. I can add a comment to remember to remove support for the old format when you release v1 for example 🙂
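For illustration only (this is not the PR's actual code; the function name is made up), the read path could normalise both formats to the new dict form, assuming the file stores either a list of [test_id, duration] pairs or a {test_id: duration} dict:

import json

def load_durations(path):
    # Accept both the old list-of-lists format and the new dict format,
    # always returning the dict form
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, list):
        return {test_id: duration for test_id, duration in data}
    return data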

Great insights with the durations! I assume @sondrelg's findings are related to running db migrations in some fixture once per sub suite?

Yep! This fixture is what takes so long, and it looks like, with the way pytest is set up, the setup times of all fixtures (including those scoped for a whole session) are attributed to the first test in the test suite. I opened this issue in the pytest repo; hopefully it's possible to fix this there in time.

In the meantime though, I would suggest pytest-split should do something like this where we just remove the setup time of the first test in the test suite.

I'm testing that branch on my project at work now and it seems to have gotten rid of a lot of the variability 🎉

If you're keen to work on the split logic and variability fixes after this PR is merged, feel free to tag me in the PR and I'd be happy to test/review it for you when it's ready 🙂

@jerry-git
Owner

In the meantime though, I would suggest pytest-split should do something like this where we just remove the setup time of the first test in the test suite.

This sounds like a potential solution but how about if the first test doesn't use the heavy fixture? 🙂

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

In the meantime though, I would suggest pytest-split should do something like this where we just remove the setup time of the first test in the test suite.

This sounds like a potential solution but how about if the first test doesn't use the heavy fixture? 🙂

I don't think a perfect solution exists. We either risk reducing the setup time of the first test too much, or too little (the current behaviour isn't correct either, right?). And I guess an important point is that pytest-split cannot know which of the two it is.

Since pytest-split cannot know, but we know that all session scoped fixtures will always be attributed to the first test, I think it's pretty safe to assume that the first test will almost always be inflated. And the larger a project gets, the more pronounced the problem becomes, while under-estimation for a single test would probably matter less and less. There is a chance that some project has a ridiculous setup time for the first test in its test suite, and changing the behaviour as suggested above would cause it problems, but I think that's about 1000x less likely than the other way around 😛

Another option, though, is to let the user decide - the user actually can know what plugins they use and what their fixture setups look like, so they are in a much better position to make the decision. A possible solution might be to discard the first setup time if --discard-initial-setup is passed; or perhaps that should be the default, and users could opt out instead?

I guess the question boils down to whether adding the choice is worth it, or whether discarding the initial setup time of the first test is so unlikely to cause significant problems for users that it should just be the default behaviour. I don't see any harm in adding an opt-out 🙂
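For illustration only (--discard-initial-setup is just the proposal above, and record_duration is a made-up helper; the report attributes are standard pytest TestReport fields), discarding the first test's setup time when recording durations might look roughly like this:

def record_duration(report, durations, is_first_test):
    # Sketch of the --discard-initial-setup idea: ignore the setup phase of the
    # very first test, since session-fixture setup tends to be attributed to it
    if is_first_test and report.when == "setup":
        return
    durations[report.nodeid] = durations.get(report.nodeid, 0.0) + report.duration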

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

These are the outputs after the last commit:

Running without a test durations file:

❯ pytest . --splits 2 --group 1 --store-durations

[pytest-split] No test durations found. Pytest-split will split tests evenly when no durations are found. 
[pytest-split] You can expect better results in consequent runs, when test timings have been documented.

========================================================================================================================== test session starts ==========================================================================================================================
platform darwin -- Python 3.9.4, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/sondrelg/Documents/pytest-split
plugins: cov-2.12.0, mock-3.6.1, django-4.4.0, split-0.1.6.dev28+gc506046.d20210608
collecting ... 

[pytest-split] Running group 1/2

collected 16 items / 7 deselected / 9 selected                                                                                                                                                                                                                          

tests/test_plugin.py .........                                                                                                                                                                                                                                    [100%]

[pytest-split] Stored test durations in /Users/sondrelg/Documents/pytest-split/.test_durations

==================================================================================================================== 9 passed, 7 deselected in 1.33s ====================================================================================================================

Since we now deselect tests, we get this nice collected 16 items / 7 deselected / 9 selected output, so just outputting the group in [pytest-split] Running group 1/2 seemed sufficient 🙂

@sondrelg sondrelg force-pushed the write-to-cache branch 2 times, most recently from 73015f4 to cb87064 on June 8, 2021 11:34
@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

PR should be ready to review when you're ready @jerry-git 👏

@jerry-git
Owner

but we know that all session scoped fixtures will always be attributed to the first test

This is not true; consider e.g.:

import time
import pytest

@pytest.fixture(scope="session")
def expensive_session_fixture():
    time.sleep(10)

def test_1():
    # Runs first, but does not request the fixture, so no fixture setup is attributed to it
    ...

def test_2(expensive_session_fixture):
    # The 10 s session-fixture setup runs here, so it is attributed to this test's setup
    ...

Yup, we could have some flag for enabling / disabling the "skip measuring the first setup" behaviour. Considering your use case (and the whole thing in general), I think a more valuable flag could be something like --discard-long-durations 20, which would basically mean that it doesn't store durations for tests which are measured to last more than 20 seconds.
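As an illustration only (this flag does not exist in the PR; the name and behaviour follow the proposal above, and store_durations is a made-up helper), filtering at store time could look roughly like this:

import json

def store_durations(durations, path, max_duration=None):
    # Drop measurements above the threshold (e.g. 20 s) before writing,
    # so one-off expensive setups don't skew future splits
    if max_duration is not None:
        durations = {t: d for t, d in durations.items() if d <= max_duration}
    with open(path, "w") as f:
        json.dump(durations, f, indent=2, sort_keys=True)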

@sondrelg
Contributor Author

sondrelg commented Jun 8, 2021

Would you only apply the long-duration threshold for the setup time on the first test still?

@jerry-git
Owner

Would you only apply the long-duration threshold for the setup time on the first test still?

I would treat every test identically 🙂

A configurable threshold would also provide a workaround for miscellaneous bugs in other libraries; see e.g. #1

Comment on lines +112 to +114
# This code provides backwards compatibility after we switched
# from saving durations in a list-of-lists to a dict format
# Remove this when bumping to v1
Owner

💪

Is there a test case for this?

Contributor Author

I'll add one 👍

Owner

Should probably add a 100% coverage requirement as well at some point 🤔

Contributor Author

Do it tomorrow 😂

Added the old format to a test in 794fedf

Contributor Author

@sondrelg sondrelg Jun 8, 2021

I could help you write some tests if you want, and if you'd like, I could also help introduce Poetry for package management and dependency stuff 🙂

Looks like their website is down - probably for the same reason GitHub was down earlier.

Owner

Poetry would be welcome indeed 👍

Owner

@jerry-git jerry-git left a comment

💪

@jerry-git jerry-git merged commit beed24d into jerry-git:master Jun 8, 2021
@jerry-git
Owner

FYI, available in 0.2.0!

@mbkroese

mbkroese commented Jun 8, 2021

collected 16 items / 7 deselected / 9 selected

This looks really good 👍
