pip fails to install multiple packages in single invocation if one depends on another #1386

Closed
ghost opened this issue Dec 19, 2013 · 12 comments

Comments

@ghost

ghost commented Dec 19, 2013

Related to #988, without dictating a solution.

If pkg-B depends on pkg-A, then:
pip install pkg-A pkg-B fails.

Same goes for installing from a properly ordered requirements.txt.

I've been working around this over and over again for a couple of
years now.

Specifically, in deployment scripts I've ended up iterating over requirements.txt files
line by line, running pip a dozen times, or cluttering up the deployment scripts with multiple-chunk requirements files. The horror.
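
The line-by-line workaround described above can be sketched roughly as follows. This is only an illustration: `iter_requirements` and `per_line_commands` are hypothetical helper names, and the parsing is deliberately naive (it skips blank lines and comments but does not handle option lines such as `-r` or `-e`, or line continuations).

```python
import sys

def iter_requirements(text):
    """Yield requirement specifiers from requirements.txt-style text.

    Naive on purpose: drops blank lines, full-line comments, and
    trailing " #" comments; does not understand option lines like -r or -e.
    """
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()
        if line and not line.startswith("#"):
            yield line

def per_line_commands(text, python=sys.executable):
    """Build one `pip install` command per requirement, in file order."""
    return [[python, "-m", "pip", "install", req] for req in iter_requirements(text)]

example = """\
# build-time dependencies first
numpy
cython  # needed to build some extensions

bottleneck
"""
commands = per_line_commands(example, python="python")
for cmd in commands:
    print(" ".join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)` so that the script stops at the first failing install.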

@qwcode
Contributor

qwcode commented Dec 19, 2013

If pkg-B depends on pkg-A, then: pip install pkg-A pkg-B fails

this is not a true general statement. can you give a specific example of 2 pypi packages?

as for your iterating deployment scripts, I'm skeptical you need to be doing that. can you explain?

@qwcode qwcode closed this as completed Dec 19, 2013
@ghost
Author

ghost commented Dec 19, 2013

Sorry, I'll clarify my description. The problem occurs when the install/build of pkg-B depends
on pkg-A being installed.
As a comment in #988 mentions, pip tries to build all packages before they are installed.

I see this all the time with packages that depend on numpy and/or cython, both of
which are meat-and-potatoes deps for pydata packages.

In a fresh venv, this fails:

pip install numpy bottleneck

While this succeeds:

pip install numpy 
pip install bottleneck

because bottleneck's setup.py requires numpy to be installed.
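
One way to spot this failure mode ahead of time is to check whether the modules a `setup.py` imports at the top level are importable in the current environment. A minimal sketch, assuming you already know which top-level modules the build script imports; `missing_build_modules` is a hypothetical helper, not part of pip:

```python
import importlib.util

def missing_build_modules(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# In a fresh venv, a check like this would report numpy as missing
# before a setup.py that does `import numpy` gets a chance to fail.
print(missing_build_modules(["numpy"]))
```

What the call prints depends on the environment it runs in: an empty list if numpy is installed, `['numpy']` otherwise.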

For a similar but distinct failure case (starting with a fresh venv), do:

pip install numpy cython

and then this fails

pip install numexpr tables

but this succeeds:

pip install numexpr 
# Note: the build requires libhdf5-devel to be installed
pip install tables

In this failed case, the build process (rather than setup.py) fails because numexpr
isn't installed at tables build-time.

Finally, any package foo that depends on cython for building itself will fail if you do:

pip install cython foo

So that's 3 distinct examples of how pip's behavior fails installations unnecessarily,
and the last case covers at least a few dozen packages out there.

That should make it clear why the steps I described were necessary.
Unluckily for me, I work with these packages all the time.

I understand why pip would want to enforce all-or-nothing semantics on multiple package installation,
but hopefully these examples convince you that the way it's done breaks for some common, real-world
cases.

AFAIK, pip doesn't offer a flag to turn this off. Is there one?

@pfmoore
Member

pfmoore commented Dec 19, 2013

Just looking at bottleneck, it doesn't state that it install_requires numpy, just that it requires it. And yet it imports numpy in its setup.py. So isn't that the issue, that bottleneck is not specifying its install-time dependencies properly?

@ghost
Author

ghost commented Dec 19, 2013

Yes and no. There are valid reasons people avoid putting stuff in install_requires. Specifically pip -I will
upgrade an install_requires dep if one is available, even though the user may only want to upgrade the
package he/she named.

Casually upgrading a package like numpy is a big no-no because:

  1. It may cause subtle breakage to many other installed packages.
  2. It involves a monster compile.

IIRC, pip can be told not to do that, but you have to spelunk the docs quite a bit to find out how,
and most users just can't be expected to go through all that. So devs have opted for the lesser of
two evils and omitted it from install_requires. I can dig up a long issue elsewhere on this if it matters enough.

Also, the user has implicitly given pip a dependency graph for build-time on the command line,
and my take on it is that the (missing) one-by-one behavior is the mental model most users have
by default.

pip could achieve all-or-nothing installs via rollback, or by doing a moist-run into a venv before
touching the system. Or, and this would be just fine, give me a flag and I'll just tell pip I'm willing
to risk it.
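
The one-by-one mental model can be approximated outside pip by declaring build-time dependencies explicitly and installing in topological order. A rough sketch using the standard-library `graphlib` module (Python 3.9+); the `BUILD_DEPS` mapping is an assumption loosely read off the examples in this thread, not something pip or the packages publish:

```python
from graphlib import TopologicalSorter

# package -> packages assumed to be needed at its build time (hand-written)
BUILD_DEPS = {
    "bottleneck": {"numpy"},
    "numexpr": {"numpy"},
    "tables": {"numpy", "cython", "numexpr"},
}

def install_order(build_deps):
    """Return packages ordered so each one comes after its build-time deps."""
    return list(TopologicalSorter(build_deps).static_order())

order = install_order(BUILD_DEPS)
for name in order:
    print("pip install", name)
```

The exact order among unrelated packages is not guaranteed, only that every package appears after the packages it is declared to need.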

@dstufft
Member

dstufft commented Dec 19, 2013

The problem isn't the lack of install_requires. Pip attempts to discover all of the dependencies before it installs anything. It has to do this because it needs to know what to install. For instance you gave the example

pip install numpy bottleneck

Ok, so pip first goes out and discovers numpy, it finds it has versions A, B, and C. C is the latest so it selects that for install, then pip goes out and discovers bottleneck, however bottleneck (hypothetically) depends on numpy<C. Now pip goes and deselects C and instead selects B. (Note this isn't exactly how it works at the moment, but it's how it will in the future).

The only way to make it work how you're proposing is to have it, instead of "select" C, "install" C. Then when it discovers that no, it needs B, not C, have it uninstall C, and then install B. This will be horrifically slow, especially for something like numpy. I don't think this is appropriate when the actual underlying issue is that the way someone uses numpy.distutils is fundamentally broken for automatic dependency resolution.
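
The "select, then install once" idea can be illustrated with a toy model. This is not pip's actual algorithm (as noted above, pip did not work exactly this way at the time); versions are plain integers and constraints are simple exclusive upper bounds, both assumptions made for brevity:

```python
def select_version(available, upper_bounds):
    """Pick the newest version below every exclusive upper bound, or None."""
    candidates = [v for v in available if all(v < bound for bound in upper_bounds)]
    return max(candidates, default=None)

numpy_versions = [1, 2, 3]  # stand-ins for versions A, B, C

# Step 1: only numpy has been discovered, so the latest (C) is selected.
first_pick = select_version(numpy_versions, upper_bounds=[])
# Step 2: bottleneck is discovered and (hypothetically) requires numpy < C.
final_pick = select_version(numpy_versions, upper_bounds=[3])

print(first_pick, final_pick)
```

Nothing is installed until the final pick is known, so in this model numpy would be built only once.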

@ghost
Author

ghost commented Dec 19, 2013

First, was I at least persuasive enough to merit a reopen? :)

I'm not dictating a solution; you know pip and I only use it. But a --damn-the-torpedos flag that
might fail (no rollback and other complexities) would be a completely fine solution.

re your points:

  1. broken or not, numpy.distutils only applies to one of the three examples I gave. Even
    if you're right, the other two are still valid.
  2. The "bruteforce search" example you provide has the following case analysis:
  • if the user requests a version that later needs to be replaced, the install will succeed but it may be slow.
  • if no conflict is found the install succeeds and quickly. (the common case for my, arguably typical, usage)

The case analysis for the current behavior is:

  • The install fails.

For the multiple concrete examples I've seen that doesn't look like an absurd compromise.
I acknowledge you can construct pathological cases that would make this behave terribly,
but that shouldn't be a blocker if you never/rarely see them in the wild and the upside is supporting
a large enough class of new cases.
And it seems possible to detect madness when it arises and just have pip scream and die.

Any reasonable way to address these concrete examples would be great. Since I'm not volunteering
to sling the code, I can hardly buck for a pip rewrite. This is however a common pip gripe I've encountered
in pydata land.

@ghost
Author

ghost commented Dec 19, 2013

Also, re the slow case, what's the difference between:

pip install numpy==1.6
pip install bottleneck
# replaces numpy by version bottleneck wants

and

pip install numpy==1.6 bottleneck
# which would install numpy twice

in terms of the duration of the install process?

Sorry about the verbosity, I'll try to be more concise for the rest of the thread...

@qwcode
Contributor

qwcode commented Dec 19, 2013

although pip doesn't have a true resolver (#988), the basic idea of resolving dependencies first, and installing once, is fundamental, and not something I can imagine changing to work around the cases you mention.

the examples you cite all seem to be build-time dependency issues, i.e. pkgB needs pkgA even to build. the pip/setuptools solution right now is setup_requires. But for various reasons (being hard to compile; projects requiring non-python dependencies; pip's recursive upgrade logic; people not knowing setuptools), some projects choose not to use those keywords.
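
For reference, the two setuptools keywords being contrasted look roughly like this. A sketch only: `pkg_b` is a hypothetical package, and the dict would normally be passed to `setuptools.setup(**kwargs)` inside a `setup.py`:

```python
def build_setup_kwargs():
    """Keyword arguments for setuptools.setup() for a hypothetical pkg_b."""
    return {
        "name": "pkg_b",
        "version": "0.1",
        # needed before the package itself can be built
        "setup_requires": ["numpy"],
        # needed at runtime, installed alongside the package
        "install_requires": ["numpy"],
    }

kwargs = build_setup_kwargs()
print(kwargs["setup_requires"], kwargs["install_requires"])
```

Note that `setup_requires` is processed inside the `setup()` call, so it cannot help a `setup.py` that imports numpy at module top level before `setup()` runs.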

to be clear, conda hasn't solved this quagmire due to having better resolution logic (e.g. by having a SAT solver like you mention in #988), it solves it by being a "full-stack" management tool that manages all of the non-python dependencies required for any of its packages, and it installs pre-built binaries.

how might it be easier in the future for pip to install some of the projects you mention? Honestly, that's still TBD, and there's a fair amount of discussion on that now on distutils-sig. wheels and PEP426 will certainly be part of the answer.

the reason I'm leaving this closed, is that the core of the issue isn't a problem with pip's install logic, but rather it's a much broader issue of managing external dependencies, which goes beyond pip itself.

It might be worthwhile to leave a pip issue open that points to some of these ongoing discussions, so that new issues can be closed against that.

@dstufft
Member

dstufft commented Dec 19, 2013

To answer your later question, there wouldn't be any difference in length of time. However a proper dependency resolver (which pip doesn't have yet) would determine which version of numpy needed to be installed without installing it twice, which is where the "will take more time" comes from.

To somewhat muddle this issue, there is the --no-deps flag to pip which should disable the dependency resolution altogether and requires you to specify all the dependencies either on the command line or in a requirements.txt file. However this won't currently solve the numpy/bottleneck problems and such because, as you noted, pip doesn't really grok build time dependencies (leaving that up to the build tool itself, generally setuptools which uses setup_requires).

Personally I'd need to think about this some more. This isn't well supported by any tool right now, and I'm not sure if there's a good way of kludging a method in that won't make things worse in the long term while we wait for the real solution. It's possible that with a little bit of tweaking the existing --no-deps flag can be made to work in a way that would support this. My fear there is that I believe isolated builds and a proper build time dependency specification that comes with PEP426/Metadata 2.0 is a better solution and that if we un-isolate the builds now we'll end up regretting it once PEP426 is in place.

@ghost
Author

ghost commented Dec 19, 2013

Reasonable. I hope these issues are tackled down the road as part of the shake-up in python packaging.
An open issue seems in order since these issues remain unresolved while being something users
expect their favorite package manager to handle.

Now, back to the iterating deployment scripts...

@paulmelnikow

I'm not sure if this is informative or helpful to anyone, but I came across this thread while trying to resolve a build-time dependency – a package which requires another package in order to be set up. I tried using setup_requires, and it fails, because I need forked versions of both packages. I specify the git paths. Unfortunately, when building the dependent package, it seems to encounter the setup_requires, but resolves it using the broken one from pypi instead of my fork.

I can't seem to find a way around it, so I'm going to just script installing the dependency first.

@JamesMcGuigan

JamesMcGuigan commented Mar 5, 2020

Here is my requirements.sh utility script, that uses pip-compile + pip-sync + virtualenv

Xargs can be used to install requirements.txt on a line by line basis if required

pip install --upgrade pip pip-tools
timeout 5 pip-compile || pip-compile -v  # --generate-hashes
pip install -r ./requirements.txt || cat ./requirements.txt | perl -p -e 's/\s*#.*$//g' | sed '/^\s*$/d' | xargs -d'\n' -L1 -t pip install
pip-sync
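
The same bulk-then-line-by-line fallback can be sketched in Python. This is an assumption-laden translation, not the script above: `install_with_fallback` is a hypothetical helper, requirements are passed as a list rather than via `-r`, and the `run` parameter exists only so the logic can be exercised without actually invoking pip:

```python
import subprocess
import sys

def install_with_fallback(requirements, run=None):
    """Try one bulk `pip install`; if it fails, install one requirement at a time.

    `run` takes a command list and returns an exit code.
    Returns the list of commands that were attempted.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd).returncode
    pip = [sys.executable, "-m", "pip", "install"]
    attempted = [pip + list(requirements)]
    if run(attempted[0]) == 0:
        return attempted
    for req in requirements:
        cmd = pip + [req]
        attempted.append(cmd)
        if run(cmd) != 0:
            raise RuntimeError(f"failed to install {req}")
    return attempted

# Demonstration with a fake runner: the bulk install fails, single installs succeed.
fake_run = lambda cmd: 1 if len(cmd) > 5 else 0
attempts = install_with_fallback(["numpy", "bottleneck"], run=fake_run)
print(len(attempts))
```

Unlike the `xargs` pipeline, this sketch stops at the first requirement that fails to install instead of continuing with the remaining lines.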


5 participants