Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement "lazy sequence" to avoid Internet access during candidate search #8932

Merged
merged 9 commits into from
Oct 13, 2020

Conversation

uranusjr
Copy link
Member

@uranusjr uranusjr commented Sep 28, 2020

find_matches() is modified to return a special type that implements the sequence protocol (instead of a plain list). This special sequence type tries to use the installed candidate as the first element if possible, and only access indexes when the installed candidate is considered unsatisfactory.

Outdated numbers

Using the reprod from #8664 (comment):

# Old resolver
real	0m2.209s
user	0m1.930s
sys	0m0.173s

# New resolver before patch
real	0m54.665s
user	0m36.713s
sys	0m1.416s

# New resolver with patch
real	0m4.261s
user	0m3.574s
sys	0m0.285s

I kind of feel the lazy sequence implementation should be pushed into resolvelib instead. The find_matches() would be modified to return a Callable[[], Optional[Iterator[Candidate]]] instead. This callable would be called multiple times by the resolver (to replace the need of iterating through the sequence multiple times in the current implementation).

Fix #8023.

@uranusjr uranusjr marked this pull request as draft September 28, 2020 16:48
Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't mention you were implementing this! :o (to be fair, neither did I)

A few minor comments inline.

src/pip/_internal/resolution/resolvelib/provider.py Outdated Show resolved Hide resolved
@pradyunsg
Copy link
Member

I kind of feel the lazy sequence implementation should be pushed into resolvelib instead. The find_matches() would be modified to return a Callable[[], Optional[Iterator[Candidate]]] instead. This callable would be called multiple times by the resolver (to replace the need of iterating through the sequence multiple times in the current implementation).

I quite like the sound of this. OTOH, it should be possible to do it make this change here now and have resolvelib add this support in a backwards-compatible way after-the-fact, based on feedback on this variant -- we could then flip the switch on this side after that. :)

@uranusjr
Copy link
Member Author

You didn't mention you were implementing this! :o (to be fair, neither did I)

I was trying to do this in resolvelib initially, but found that the implementation needs to change quite a bit for it to work. The boolean check (i.e. whether the list is empty) is currently used to drive backtracking (if it’s empty, we’re hitting a dead end and should backtrack), and switching to an iterator means the resolver needs to change how it backtracks.

I think it’s probably a better plan to include this in pip for a while, to avoid resolvelib blocking pip releases.

@uranusjr
Copy link
Member Author

uranusjr commented Sep 29, 2020

Not sure how I should do the news fragment. #8905 is already taken by the cache PR.

--

Edit: I can probably use #8023 instead.

@uranusjr
Copy link
Member Author

uranusjr commented Sep 29, 2020

This plus the caching PR gets us to

real  0m3.660s
user  0m3.341s
sys   0m0.177s

# Old resolver from above.
real  0m2.209s
user  0m1.930s
sys   0m0.173s

The rest is probably due to the extra checks and not avoidable. For context, the new resolver emits 371 “Requirement already satisfied” messages; the older resolver only 163. These should correspond quite directly to how much computation each resolver does.

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Sep 30, 2020
@uranusjr
Copy link
Member Author

Uh, why is Travis reporting legistimate test errors while Azure claims everything is fine 😞

@pradyunsg
Copy link
Member

Don't we only run the new resolver tests on Travis CI?

@pradyunsg
Copy link
Member

pradyunsg commented Sep 30, 2020

I'mma add this to 20.3 as a release blocker, since I strongly believe that we should include this change in the main rollout release. Please holler if you disagree. :)

@pradyunsg pradyunsg added the !release blocker Hold a release until this is resolved label Sep 30, 2020
@pradyunsg pradyunsg added this to the 20.3 milestone Sep 30, 2020
@uranusjr
Copy link
Member Author

Oh, I misred and thought the failures are from test_new_resolver_*. I’ll try to fix things later today.

@uranusjr
Copy link
Member Author

That should do it.

@uranusjr uranusjr marked this pull request as ready for review September 30, 2020 13:37
@uranusjr uranusjr force-pushed the new-resolver-lazy-sequence branch 4 times, most recently from b32ff85 to 9b21352 Compare October 1, 2020 16:12
@uranusjr
Copy link
Member Author

uranusjr commented Oct 1, 2020

I was trying to address the upgrade strategy issue mentioned above, and ended up re-implementing all the iterator stuff. I actually feel this is much better than the previous class-based approach though, both shorter and easier to understand.

@brainwane
Copy link
Contributor

@pradyunsg and I discussed this pull request in today's meeting. To make enough progress on #8664, we either need to merge this into pip, or we need to merge sarugaku/resolvelib#57 into resolvelib, then cut a resolvelib release, update the vendored version in pip, and update pip's implementation. Pradyun will talk with @uranusjr to figure out the best path forward.

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@graingert
Copy link
Contributor

@uranusjr @pradyunsg there's conflicts

find_matches() is modified to return a special type that implements
the sequence protocol (instead of a plain list). This special sequence
type tries to use the installed candidate as the first element if
possible, and only access indexes when the installed candidate is
considered unsatisfactory.
@pradyunsg pradyunsg mentioned this pull request Oct 13, 2020
@pradyunsg pradyunsg merged commit 3fe826c into pypa:master Oct 13, 2020
@uranusjr uranusjr deleted the new-resolver-lazy-sequence branch October 13, 2020 17:54
pradyunsg added a commit to pradyunsg/pip that referenced this pull request Oct 16, 2020
pradyunsg added a commit to pradyunsg/pip that referenced this pull request Oct 16, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
!release blocker Hold a release until this is resolved type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Resolver: Avoiding network access if the installed candidate is good enough
6 participants