Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid generating metadata in pip download --no-deps ... #1884

Open
tlynn opened this issue Jun 18, 2014 · 57 comments
Open

Avoid generating metadata in pip download --no-deps ... #1884

tlynn opened this issue Jun 18, 2014 · 57 comments
Labels
C: build logic Stuff related to metadata generation / wheel generation resolution: deferred till PR Further discussion will happen when a PR is made type: bug A confirmed bug or unintended behavior

Comments

@tlynn
Copy link

tlynn commented Jun 18, 2014

"pip install xxx --download yyy --no-deps" runs "python setup.py egg_info" after downloading (to generate the requires.txt dependency list?). This seems unnecessary.

This is a problem in the case of "cryptography", since its setup.py uses setup_requires, which automatically downloads its dependencies at the slightest nudge, even just --help (cryptography issue #716).

This impedes using pip install --download to build a collection of trusted (checksummed) modules for later installation.

@xolox
Copy link

xolox commented Jul 13, 2014

The python setup.py egg_info command is run for metadata discovery. It's an unfortunate fact of the Python packaging ecosystem that anything related to packaging always involves arbitrary code execution (referring to setup.py). The setuptools setup_requires feature is a particularly nasty example of this (don't get me wrong, I understand it has its uses, but it's still nasty). So I believe this should be fixed on the side of the cryptography package, because we can't expect pip to reasonably deal with setup_requires:

  • Setuptools downloads and unpacks setup requirements on demand every time a setup.py script is evaluated in any way, even if you run something very simple which seems like it should be a read only operation, e.g. python setup.py --help or python setup.py --version.

    As a bonus setuptools uses easy_install for setup requirements which can cause several problems: Using pip install --no-index will not stop easy_install from downloading remote package archives and using pip install --index-url=... will not instruct easy_install to use the custom index URL, instead the URL has to be set in ~/.pydistutils.cfg (you can't even configure it using an environment variable).

  • I assume setuptools always installs setup requirements as the very first thing because setup requirements can for example "install" custom setup commands that should appear in the output of python setup.py --help-commands. Setuptools can't decide what setup requirements will do (the possibilities are unbounded :-) so setuptools errs on the side of caution and always installs setup requirements before doing anything else with setup.py.

  • The only people in a position to judge which setup commands actually need setup requirements are the developers of the package that defines setup requirements. For this reason I created pull request Work around for side effects in setup.py script pyca/cryptography#1257 to get this fixed on the side of the cryptography package (it seems like it's possible to fix this with a not-too-big change to the setup.py script).

It is a shame that the setup_requires feature of setuptools is so rudimentary, it would have been nice if there was a canonical and supported way to tell setuptools which setup commands really need setup requirements. That might have avoided this mess.

@sholsapp
Copy link

@xolox thanks for the detailed description, that helped me understand what was going on here.

Another unfortunate side effect of the behavior reported by @tlynn is that we can't run parallel downloads for packages that share transitive dependencies. What one sees is that when using something like
pip install --exists-action=i --download-dir ./download-cache <lots of dependencies that share transitive dependencies> are lots of OSErrors about paths not existing, presumably because invocations of setup.py egg_info on packages that are in the process of being downloaded result in files not being found.

This is probably a convoluted example, but I figured I'd share my use case here.

@dstufft dstufft modified the milestone: 6.0 Dec 18, 2014
@dstufft
Copy link
Member

dstufft commented Mar 24, 2017

When we implement this, we should take care to prevent path traversal bugs like mentioned in #731.

@pradyunsg
Copy link
Member

@dstufft Can this be closed since the --download flag has been removed?

@dstufft
Copy link
Member

dstufft commented Jun 29, 2017

I think the issue is still valid for pip download --no-deps?

@pradyunsg pradyunsg added the type: bug A confirmed bug or unintended behavior label Jun 29, 2017
@pradyunsg
Copy link
Member

I guess so.

@pdecat
Copy link

pdecat commented Jun 6, 2018

What about not running setup.py egg_info on pip install --no-deps ? (Even without --no-download)

That would allow working around a current pylxd issue canonical/pylxd#308 (comment)

@RonnyPfannschmidt
Copy link
Contributor

how about additionally not running egg-info if the tarball ships an egg_info file to begin with
i believe these days the setuptools-generated packages can be trusted

however this should be introduced over a grace period where pip warns about differences between shipped and freshly generated

@RonnyPfannschmidt
Copy link
Contributor

i just ran into this when trying to use pip as sane caching downloader of other stuff
(its just really neat for travis use since the download uses the pip cache which is cached + all the bells and whistles)

@RonnyPfannschmidt
Copy link
Contributor

i just tried to disable this for downloading and the issue spans resolver, preparer and a few other utilities - i will need to retry with more time

@RonnyPfannschmidt
Copy link
Contributor

for an sdist the internals will aways fetch a class IsSDist(DistAbstraction) from pip._interal.operations.prepare

its prep_for_dist will always run egg_info and assert that assert_source_matches_version

with my limited understanding of the internals this is no longer a small saturday morning hack but rather a more nuanced re-factoring and i have to pull out

@mhsmith
Copy link

mhsmith commented Dec 19, 2018

As well as setup_requires, this also affects pyproject.toml. pip download --no-deps will download and install anything listed in there. You can stop this by passing --no-build-isolation, but that shouldn't be necessary because the command didn't ask for a build in the first place.

@chrahunt
Copy link
Member

chrahunt commented Nov 8, 2019

@RonnyPfannschmidt, I don't think we can trust PKG-INFO or similar without some indicator that it's OK. I made a thread over here to discuss one possible approach.

@chrahunt chrahunt changed the title Don't run setup.py egg_info on install --no-deps --download Don't run setup.py egg_info on pip download --no-deps Nov 10, 2019
@chrahunt
Copy link
Member

Updated title to reflect current CLI.

@wimglenn
Copy link
Contributor

wimglenn commented Apr 30, 2020

I guess this bug will be old enough to drink and vote some day ;)
If anyone else needs, there's a workaround demonstrated here (warning: very hacky monkeypatch)
https://stackoverflow.com/a/60325681/674039
the hack no longer works :(

@pradyunsg pradyunsg added the C: dependency resolution About choosing which dependencies to install label Apr 30, 2020
@smaudet
Copy link

smaudet commented Aug 2, 2020

@wimglenn seems to be worse:

did not work with e.g. pip download --no-deps pygobject

Seriously, stuff works fine for 18 versions then basic stuff breaks...

@smaudet
Copy link

smaudet commented Aug 2, 2020

Yeah, confirmed as of about pip 20.1 the patch no longer suffices.

I modified download.py inside the pip distribution to actually check no_deps and pass it down to sdist by adding the parameter to the RequirementPreparer ...

And it works!!

OFC this was manually on my user-installed version of pip, not on top of a git repo or anything. So I don't have a patch right now, persay...

@pradyunsg
Copy link
Member

pradyunsg commented Aug 2, 2020

If anyone wants to file a PR to fix this instead of complaining how this hasn't been fixed / doesn't work, that'd be appreciated. Otherwise, please understand that you're not contributing positively to this issue and making it less likely that this gets fixed (read to understand why).

@takluyver
Copy link
Member

Would this be mitigated if pip was able to read metadata from a PEP 621 [project] table in pyproject.toml files in sdists? I think not many projects are doing this yet, but one of the goals of PEP 621 is to make metadata available statically, so you don't have to run some build step to e.g. get a list of dependencies.

@pfmoore
Copy link
Member

pfmoore commented Oct 12, 2021

It would certainly reduce the number of times we have to do a build just for metadata, so yes.

No-one has yet looked at PEP 621 support in pip - apart from the obvious "no-one has been interested enough" reason, I think it's a matter of waiting until enough projects publish PEP 621 compatible metadata (and specifically dependency metadata) to make it worthwhile. Personally, I think that when setuptools starts writing PEP 621 data is probably the point at which I'd see it as worthwhile for pip (there will be a lot of projects that won't be able to generate static dependencies, even if setuptools supports them, but at least having the possibility seems like a reasonable starting point). Of course, you might feel differently about whether we should wait on a particular backend 🙂

@takluyver
Copy link
Member

No problem, I'm aware that PEP 621 usage is going to be pretty small until setuptools supports it. 🙂

@pradyunsg
Copy link
Member

PEP 621 usage is going to be pretty small until setuptools supports it. 🙂

Yea, and pypa/setuptools#1688 is sitting, waiting for someone to pick up and implement OR to provide funding for doing the necessary work on it to the PSF's Packaging-WG. :)

@pradyunsg
Copy link
Member

FWIW, a big part of the problem here is the implementation details of working within pip's codebase.

There's a concept of "preparing" a requirements, which involves downloading them and generating metadata for them. There's no separation between the two, and detangling these would likely be a non-trivial amount of work. @nchepanov (with some help from me) spent a bunch of time investigating this in Dev 2020, and the notes from that work are available above, in comments on this issue.

@pfmoore
Copy link
Member

pfmoore commented Oct 12, 2021

Yea, and pypa/setuptools#1688 is sitting, waiting for someone to pick up

Apologies, I got confused between PEP 621 (Storing project metadata in pyproject.toml) and PEP 643 (Metadata for Package Source Distributions). PEP 621 is irrelevant here, as tools should not read metadata from pyproject.toml. Reading metadata from a sdist via PEP 643 would be useful, and is valid, though.

While I guess it's tempting to assume that pip can read pyproject.toml and if it finds PEP 621 data, then use it, it would be wrong because there's no guarantee that the backend supports PEP 621, so there's no reason to believe that the metadata in the generated wheel/sdist would bear any relationship to the PEP 621 data. Obviously a project that specifies metadata in pyproject.toml, but different metadata in (say) setup.cfg is weird and broken, but nevertheless it's a good demonstration of why pip needs to leave metadata generation to the backend.

@tvalentyn
Copy link

As well as setup_requires, this also affects pyproject.toml. pip download --no-deps will download and install anything listed in there. You can stop this by passing --no-build-isolation, but that shouldn't be necessary because the command didn't ask for a build in the first place.

Just a note that adding --no-build-isolation does not necessarily disable metadata verification after package is downloaded. As a result, both:

python -m pip download --dest /tmp pyarrow==5.0.0 --no-deps --no-binary :all: --no-build-isolation 
python -m pip download --dest /tmp pyarrow==5.0.0 --no-deps --no-binary :all: --no-build-isolation --no-use-pep517

fail for me in a clean environment where Cython (a build dependency) is not installed.

@pfmoore
Copy link
Member

pfmoore commented Oct 12, 2021

That's probably because even with --no-deps we still need to (check for and) satisfy Python-Requires metadata.

@wimglenn
Copy link
Contributor

@pfmoore It still does it even if you add --ignore-requires-python.

@pradyunsg
Copy link
Member

fail for me in a clean environment where Cython (a build dependency) is not installed.

Please see https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/#build-isolation and https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/#disabling-build-isolation.

Passing --no-build-isolation means that you'll manage the build-time dependencies yourself. It doesn't change that pip still performs metadata generation -- you will still need those build-time dependencies to be installed.

@th0ma7
Copy link

th0ma7 commented Dec 18, 2021

Making some mileage on #1884 (comment) from @pfmoore I ended with the following which seems to work all right

while IFS= read -r requirement
do
   query="curl -s https://pypi.org/pypi/${requirement%%=*}/json"
   query+=" | jq -r '.releases[][]"
   query+=" | select(.packagetype==\"sdist\")"
   query+=" | select((.filename|test(\"-${requirement##*=}.tar.gz\")) or (.filename|test(\"-${requirement##*=}.zip\"))) | .url'"

   # Get local filename from query
   localFile=$(basename $(eval ${query} 2>/dev/null) 2>/dev/null)

   # Download file
   if [ "${localFile}" != "" ]; then
      wget --secure-protocol=TLSv1_2 -nv -O ${localFile}.part -nc $(eval $query)
      mv ${localFile}.part ${localFile}
   else
      echo "ERROR: Invalid package name [${requirement%%=*}]"
   fi
done < <(grep -v -e "^$" -e "^#" requirements.txt)

EDIT: Added ability to

  1. remove commented + blank lines
  2. capturing both .tar.gz and .zip source packages
  3. echo error when filename isn't ok

@pfmoore
Copy link
Member

pfmoore commented Aug 25, 2022

Locking as there's nothing immediately actionable here, and the ecosystem-level changes needed to improve the situation are already being worked on elsewhere.

@pypa pypa locked and limited conversation to collaborators Aug 25, 2022
@pradyunsg
Copy link
Member

Well... it's possible to do this purely on pip's end (see #1884 (comment) and discussion immediately after that).

The bit that's missing is developer resources to actually do that work, since that's closely related with how our build logic only has a single "prepare" operation that maps directly to "fetch file + generate metadata". That's something we've been chipping away at since mid 2019 (#6607, #7049).

None the less, I think it's reasonable to leave this locked for a bit. :)

@pfmoore
Copy link
Member

pfmoore commented Aug 25, 2022

As noted in an earlier comment of yours an easy way to restart discussion is if someone creates a PR with a suggested solution 🙂

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C: build logic Stuff related to metadata generation / wheel generation resolution: deferred till PR Further discussion will happen when a PR is made type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

No branches or pull requests