Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get hash from link #1670

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

michael-k
Copy link
Contributor

@michael-k michael-k commented Aug 19, 2022

If the JSON API isn't available (eg. for devpi, Artifactory, AWS CodeArtifact), get the hash of a wheel/archive from the URL if available instead of hash.

This resolves #1135

Contributor checklist
  • Provided the tests for the changes.
  • Assure PR title is short, clear, and good to be included in the user-oriented changelog
Maintainer checklist
  • Assure one of these labels is present: backwards incompatible, feature, enhancement, deprecation, bug, dependency, docs or skip-changelog as they determine changelog listing.
  • Assign the PR to an existing or new milestone for the target version (following Semantic Versioning).

@michael-k michael-k added the enhancement Improvements to functionality label Aug 19, 2022
@michael-k
Copy link
Contributor Author

We've used that now for a while in production with AWS CodeArtifact and it did not cause any problems.

@hajapy
Copy link

hajapy commented Nov 1, 2022

I locally patched pip-tools with this change and confirmed it eliminated a lot of downloading and hashing work when using pip-compile with AWS CodeArtifact.

I still noticed there is a bit of extra time spent attempting to use the json api before falling back to the links. I wonder if we could opt out of the json api to skip making all those unnecessary api calls, or perhaps switch the order around to first use the hash in the link? Regardless, this additional small optimization would be tiny compared to the gains achieved by the change as-is. Would definitely like to see this in an upcoming pip-tools release!

@duoi
Copy link
Member

duoi commented Nov 15, 2022

LGTM. Can it be rebased with the latest changes?

@atugushev any thoughts on this?

@michael-k michael-k force-pushed the hashes-from-links branch 2 times, most recently from b78cafd to c36c51e Compare November 17, 2022 22:31
@michael-k
Copy link
Contributor Author

michael-k commented Nov 17, 2022

Can it be rebased with the latest changes?

Done; but wasn't really necessary 🤷

@atugushev
Copy link
Member

atugushev commented Nov 19, 2022

or perhaps switch the order around to first use the hash in the link?

@hajapy Good idea. We can improve this after #1723 being merged.

@atugushev atugushev added the hashes Related to hashes generated via --generate-hashes label Dec 12, 2022
@@ -389,6 +389,9 @@ def _get_matching_candidates(
return candidates_by_version[matching_versions[0]]

def _get_file_hash(self, link: Link) -> str:
if link.hash_name == FAVORITE_HASH:
Copy link
Member

@atugushev atugushev Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this with pip-22.2 and it works fine:

# requirements.in
--index-url https://m.devpi.net/root/pypi/+simple/

numpy
root@py311:/# pip --version
pip 22.2 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)

root@py311:/# pip-compile --generate-hashes -v
Using indexes:
  https://m.devpi.net/root/pypi/+simple/

                          ROUND 1
Current constraints:
  numpy (from -r requirements.in (line 3))

Finding the best candidates:
  found candidate numpy==1.23.5 (constraint was <any>)

Finding secondary dependencies:
  numpy==1.23.5             requires -
------------------------------------------------------------
Result of round 1: stable, done

Generating hashes:
  numpy
    Getting hash from link numpy-1.23.5-cp311-cp311-macosx_11_0_arm64.whl
    Getting hash from link numpy-1.23.5.tar.gz
    Getting hash from link numpy-1.23.5-cp39-cp39-win_amd64.whl
    Getting hash from link numpy-1.23.5-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
...

But with pip-22.3 it doesn't work:

root@py311:/# pip --version
pip 22.3.1 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)

root@py311:/# pip-compile --generate-hashes -v
Using indexes:
  https://m.devpi.net/root/pypi/+simple/

                          ROUND 1
Current constraints:
  numpy (from -r requirements.in (line 3))

Finding the best candidates:
  found candidate numpy==1.23.5 (constraint was <any>)

Finding secondary dependencies:
  numpy==1.23.5             requires -
------------------------------------------------------------
Result of round 1: stable, done

Generating hashes:
  numpy
    Hashing numpy-1.23.5-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
      |██                              |   6%^C

Using git bisect I've tracked down the problematic commit pypa/pip@bad03ef from pypa/pip#11111

Copy link
Member

@atugushev atugushev Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc'ing @cosmicexplorer for the insights. What could go wrong with Link.hash in 22.3?

Copy link

@hajapy hajapy Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a curious onlooker (not someone intimately familiar with pip internals), I think the above behaviour is coming from this part of the change: pypa/pip@bad03ef#diff-3919ca53335487395177edeffff8b60bd360f1ad18c95593287f6814b9ecefb8R403-L209.

In pip 22.3 and pip 22.2, I think the internal Link._hashes dict contains the "sha256" key but as of 22.3, that inner dict is not checked for the hash in the Link.hash_name property.

I think in the simple api, in the json form the hashes will come from the hashes dict, whereas in the http form it will be in the url fragment, and I think your above test (with index https://m.devpi.net/root/pypi/+simple/) is returning the json form of the simple api. Only for the latter will HashLink (as of pip 22.3) be populated with a hash/hash_name, so I think there is a regression in functionality for the simple api + json response introduced by pypa/pip#11111.

I think the check here could be if FAVORITE_HASH in link._hashes and that would work for both pip 22.2 and pip 22.3. Accessing the value would be link._hashes[FAVORITE_HASH]. Having to use that internal dict seems suboptimal :/

FWIW, the current implementation is still working on pip 22.3 with aws codeartifact as the index, which seems to be going down the path of responding not as json but as http and using Link.from_element rather than Link.from_json.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it and update the PR to support both pip < 22.3 and >= 22.3.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pypa/pip@0233bf2 once again changes the internal logic for retrieving the hash_name which should restore the functionality as before.

Could we keep the change as-is and note that there is a still an inefficiency for affected pip versions and artifact stores that use the json form of the simple api? IMO, this is no worse than what we have today in this one specific exception and better in all other scenarios.

Alternatively we could use a workaround for the affected pip versions to have parity across all pip versions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this comment seems to confirm that hashes were missing from json pypa/pip#11692 (comment) and it was an issue in more ways for pip than just this proposed change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update on this / interest in reviving it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, haven't had time to work on this. Our automation basically uses this branch therefore the pressure to get it upstream is low :/

I'll try to find some time, but I'm also happy if someone else wants to take over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements to functionality hashes Related to hashes generated via --generate-hashes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get sha256 hash from /simple (PEP503) endpoint
4 participants