make pip source installs a bit easier #1048

Merged: 9 commits into rapidsai:branch-0.39 on Jul 3, 2024

Conversation

@jameslamb (Member) commented Jun 13, 2024

Follow-up to #1044
Contributes to rapidsai/build-planning#31
Related to #1047

  • updates libucx build requirements so they can be satisfied by only packages from pypi.org
  • removes --extra-index-url https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ in pip install calls, now that rapids-build-backend is on pypi.org (https://pypi.org/project/rapids-build-backend/)
  • sets rapids-build-backend config disable-cuda=true in pyproject.toml
    • this means that now pip install . will not require nvcc or produce a wheel with a suffix like -cu12
  • modified the CI script used to build wheels to override this, so the published wheels will still have -cu${ver} suffixes (see the sketch after this list)
  • updates docs on source installation to reflect these changes
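
For reference, a minimal sketch of the two invocations (the override flag is the one used in ci/build_wheel.sh per the discussion below; exact CI wiring may differ):

# default source install: no nvcc required, and the built wheel gets no -cu${ver} suffix
pip install -v .

# wheel-publishing override: re-enable CUDA detection so the built wheel
# is renamed with a -cu${ver} suffix (e.g. ucx-py-cu12)
pip wheel -v --config-settings rapidsai.disable-cuda=false .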

Notes for Reviewers

These changes came out of an offline conversation with @pentschev and @vyasr.

dependencies.yaml (outdated review thread, resolved)
@@ -170,6 +170,6 @@ Building and installing UCX-Py can be done via `pip install`. For example:
conda activate ucx
git clone https://github.com/rapidsai/ucx-py.git
cd ucx-py
pip install -v .
pip install -v -C rapidsai.matrix-entry="cuda=12.2" .
Contributor:

Rather than always including this, should the default development instructions not enable CUDA, and therefore not specify a matrix entry? This assumes we make the other change I suggest above.

jameslamb (Member Author):

I think this is a question for @pentschev?

These docs are under a section that says it's assumed you're building on a system with CUDA.

Source
------
The following instructions assume you'll be using UCX-Py on a CUDA enabled system and is in a `Conda environment <https://docs.conda.io/projects/conda/en/latest/>`_.

I'm happy to change this PR either way, but I don't feel qualified to make the call about whether or not the CUDA-enabled build should be the default.

I DO think that we should continue specifying this flag (and therefore depending on the libucx package) in docs builds here, since we've seen that those docs builds end up also carrying some test coverage that catches works-without-CUDA types of issues (rapidsai/ucxx#231).

So @pentschev , the specific question for you is, which of these would you prefer?

  • pip install . works with 0 other flags, but requires a system installation of UCX
  • pip install . requires passing -C rapidsai.matrix-entry="cuda12.2" or similar, but in exchange the locally-built package depends on libucx wheels (so no system version required)

jameslamb (Member Author):

I just pushed 7b675a4 implementing this, by the way, so you can look at the diff and see what it'd look like.

pentschev (Member):

The docs assume a CUDA system because of the --with-cuda=$CUDA_HOME build flag, which is our primary use case. If that flag is omitted, it should work on non-CUDA systems as well, but we don't document that at the moment.
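
(For context, that flag belongs to UCX's own autotools build, not to pip; a minimal sketch, assuming a UCX release tarball with $CUDA_HOME and $PREFIX set:)

# build UCX itself with CUDA support; omit --with-cuda on non-CUDA systems
./configure --with-cuda=$CUDA_HOME --prefix=$PREFIX
make -j install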

the specific question for you is, which of these would you prefer?

  • pip install . works with 0 other flags, but requires a system installation of UCX
  • pip install . requires passing -C rapidsai.matrix-entry="cuda12.2" or similar, but in exchange the locally-built package depends on libucx wheels (so no system version required)

Can I have both? 🙂

Ideally, I would like both to work, but given your phrasing it seems that's not possible. So let's assume you have UCX installed on the system and libucx installed: what would happen then, both with and without -C rapidsai.matrix-entry="cuda12.2"?

jameslamb (Member Author):

assume you have libucx installed

This question is about whether or not pip install . pulls that dependency in for you, not about what happens if it happens to already be installed.

If you've already done something like the following:

apt-get install openucx
pip install libucx-cu12

Then import ucp will call libucx.load_library(), which should find that system installation of libucs, libucm, etc., but will fall back to the libucx-bundled ones if finding them fails for some reason.
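
A quick way to check which copies the loader actually found (the same commands appear in the test transcript later in this thread):

# importing ucp triggers libucx.load_library()
python -c "import ucp"

# list the UCX libraries in the system loader cache; per the behavior above,
# these are preferred when present, with the wheel-bundled copies as the fallback
ldconfig -p | grep libuc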

Ok, so what's the point of -C rapidsai.matrix-entry? libucx isn't distributed under the name libucx... you have to pick libucx-cu11 (CUDA 11) or libucx-cu12 (CUDA 12). That selection of major CUDA version can happen automatically in rapids-build-backend, but that detection is being turned off here via disable-cuda = true, to support source installation on non-CUDA systems by default.

Another option we could pursue, if you want, is to choose a specific CUDA version as the default one for source installations, e.g. by recording libucx-cu12 as a dependency in pyproject.toml. Then pip install . would "just work" with no additional flags, on a system with or without CUDA available and with or without a system installation of UCX. But anyone using CUDA 11 would have to pass -C rapidsai.matrix-entry="cuda11.8" or similar, or they'd get libucx-cu12 (built against CUDA 12), which might lead to runtime issues.
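
Concretely, that hypothetical default would look something like this (the matrix-entry value is illustrative):

# hypothetical default: pyproject.toml records libucx-cu12, so this "just works"
pip install -v .

# CUDA 11 users would then have to override the default explicitly
pip install -v -C rapidsai.matrix-entry="cuda11.8" .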

pentschev (Member):

Done in a39811c. Would you mind having a look to see if everything seems correct to you, @jameslamb?

Also, I've accidentally committed a local change here that I was going to push in another PR. It's harmless and corrects the environment, so if you don't mind, let's leave it.

jameslamb (Member Author):

It's harmless and corrects the environment now, so if you don't mind let's leave it.

Sure, no problem, it's ok to leave it.

Would you mind having a look

Looking right now! Thanks for doing that.

@jameslamb (Member Author) commented Jul 2, 2024:

I'd written up a long comment about how the docs should indicate that CPU-only is totally supported by pip install ucx-py-cu{11,12} .... then re-read what you wrote and saw that you already said exactly that 🤦🏻

Anyway, in case it's interesting, here's how I tested that to convince myself:


Ran the following on my macOS laptop (so no possibility of accidentally finding CUDA):

docker run \
  --rm \
  -it python:3.11 \
  bash

# system-install UCX so we don't link to the wheel one
apt-get update
apt-get install -y --no-install-recommends \
  libucx-dev

# install ucx-py-cu12 (which also pulls in libucx-cu12)
pip install --extra-index-url https://pypi.nvidia.com 'ucx-py-cu12>=0.38'

# import ucx-py and check where the loader found libraries
python -c "import ucp"
ldconfig -p | grep libuc
#	libuct.so.0 (libc6,AArch64) => /lib/aarch64-linux-gnu/libuct.so.0
#	libuct.so (libc6,AArch64) => /lib/aarch64-linux-gnu/libuct.so
#	libucs_signal.so.0 (libc6,AArch64) => /lib/aarch64-linux-gnu/libucs_signal.so.0
#	libucs_signal.so (libc6,AArch64) => /lib/aarch64-linux-gnu/libucs_signal.so
# ... etc., etc. ... it found them all

# write the demo code from https://github.com/rapidsai/ucx-py/blob/branch-0.39/docs/source/quickstart.rst
cat > server.py <<EOF
import asyncio
import time
import ucp
import numpy as np

n_bytes = 2**30
host = ucp.get_address(ifname='eth0')  # ethernet device name
port = 13337

async def send(ep):
    # recv buffer
    arr = np.empty(n_bytes, dtype='u1')
    await ep.recv(arr)
    assert np.count_nonzero(arr) == np.array(0, dtype=np.int64)
    print("Received NumPy array")

    # increment array and send back
    arr += 1
    print("Sending incremented NumPy array")
    await ep.send(arr)

    await ep.close()
    lf.close()

async def main():
    global lf
    lf = ucp.create_listener(send, port)

    while not lf.closed():
        await asyncio.sleep(0.1)

if __name__ == '__main__':
    asyncio.run(main())
EOF

cat > ./client.py <<EOF
import asyncio
import ucp
import numpy as np

port = 13337
n_bytes = 2**30

async def main():
    host = ucp.get_address(ifname='eth0')  # ethernet device name
    ep = await ucp.create_endpoint(host, port)
    msg = np.zeros(n_bytes, dtype='u1') # create some data to send

    # send message
    print("Send Original NumPy array")
    await ep.send(msg)  # send the real message

    # recv response
    print("Receive Incremented NumPy arrays")
    resp = np.empty_like(msg)
    await ep.recv(resp)  # receive the echo
    await ep.close()
    np.testing.assert_array_equal(msg + 1, resp)
    print("successfully received")

if __name__ == '__main__':
    asyncio.run(main())
EOF

# start the client and server
python ./server.py &
python ./client.py &

Saw output like this:

Send Original NumPy array
Receive Incremented NumPy arrays
Received NumPy array
Sending incremented NumPy array
successfully received

Maybe that'd be a helpful smoke test to add for rapidsai/ucxx#231.

Anyway... everything you wrote looks correct and clear to me! I think this is good to merge. Thanks as always for your patience working through it with me 😊

pentschev (Member):

Smoke tests are always useful; my concern is how that would work in CI. I'm not sure it's worth the trouble if we have to add another job at this time, and from the existing jobs it's probably relatively complicated to do in a "safe" manner (i.e., ensuring linkage isn't resolved via the PyPI/conda UCX package).

I'm also +1 on merging this; I've approved the PR. Thanks so much James for the work here! 😄

jameslamb (Member Author):

Not sure if it's worth the trouble

For sure, this wasn't a proposal to add a new job or anything right now.

Thanks! I'll merge this once CI passes.

Comment on lines +118 to +120
# by default, do not rename the package 'ucx-py-cu${ver}'
# (this is overridden in wheel publishing)
disable-cuda=true
@jakirkham (Member) commented Jun 27, 2024:

Does ucx-py actually do anything different between CUDA versions? My understanding was no

Only libucx would potentially do different things for different CUDA versions

So am curious why we want to suffix here. Is this just to provide a way to influence libucx indirectly? Or is there not really a need?

Edit: Recognize we may just be agreeing, but want to double check that there isn't still something else going on with the suffix

@jameslamb (Member Author) commented Jun 28, 2024:

I'll let @pentschev or @vyasr elaborate, but I can share a bit.

why we want to suffix here

I can say confidently it's not ONLY to influence the libucx dependency, because ucx-py has been published with CUDA suffixes as far back as v0.32 (https://pypi.nvidia.com/ucx-py-cu12/), and libucx was only introduced in v0.38 (#1041).

But now that we do have libucx... the suffixing does serve that purpose of influencing which version you get.

e.g. cugraph-cu12 depends on ucx-py-cu12 which depends on libucx-cu12

Recognize we may just be agreeing

I don't think so.

The disable-cuda=true in this part of the diff removes that suffixing for source builds like pip install ., based on @pentschev's feedback that he wanted that workflow to be as simple as possible, especially since in some cases ucx-py is used without any CUDA at all.

If this PR were accepted in its current state, we'd still be publishing ucx-py-cu{version} wheels, via pip wheel --config-settings rapidsai.disable-cuda=false (see the changes to ci/build_wheel.sh in this PR).

pentschev (Member):

Does ucx-py actually do anything different between CUDA versions? My understanding was no

Only libucx would potentially do different things for different CUDA versions

John is right, UCX-Py doesn't have a direct dependency on CUDA.

So am curious why we want to suffix here. Is this just to provide a way to influence libucx indirectly? Or is there not really a need?

I have previously asked the same question myself, which James answered above.

jakirkham (Member):

In that case, couldn't a user just specify the libucx-cu* install if they want that? What does it provide to have ucx-py also include the same information?

I would add that on the Conda side we don't provide this distinction.

pentschev (Member):

In that case, couldn't a user just specify the libucx-cu* install if they want that? What does it provide to have ucx-py also including the same information?

I think the issue is that ucx-py needs libucx: if we don't encode the CUDA version in ucx-py, then the user is responsible for installing the appropriate libucx-cu*, which can be error-prone. I.e., installing ucx-py without a libucx-cu* dependency will leave ucx-py in an unusable state, whereas giving ucx-py a hard libucx-cu* dependency could make the correct CUDA version uninstallable (would ucx-py depend on libucx-cu11 or libucx-cu12?). Please correct me if I'm wrong @jameslamb.

jameslamb (Member Author):

...on the Conda side we don't provide this distinction

Right, because there's ucx from conda-forge which builds variants for different CUDA versions.

https://github.com/conda-forge/ucx-split-feedstock/blob/b60fb426784cbb6d5cbd384a76c86920737838f0/recipe/meta.yaml#L51-L53

And it can constrain everything without needing to use names to disambiguate, by relying on the cuda-version metapackage.

With wheels and pip those same mechanisms don't exist (part of what @msarahan and others have been talking about in https://discuss.python.org/t/implementation-variants-rehashing-and-refocusing/54884).

If you want to use ucx-py with CUDA-enabled UCX, something somewhere has to get that CUDA-enabled UCX installed on the system... and one that was built against the correct version of CUDA. My understanding of rapidsai/build-planning#57 was that this idea of distributing libucx wheels was a response to difficulties users were facing when asked to install UCX themselves outside of ucx-py.

pentschev (Member):

this idea of distributing libucx wheels was a response to difficulties users were facing when asked to install UCX themselves outside of ucx-py.

I wouldn't even go as far as saying "users": even for ourselves it was quite challenging to bundle UCX with the libraries that need it, so much so that we had UCX broken for many months. On top of that, it also bloated package sizes.

Contributor:

Prior to the libucx wheels existing, ucx-py statically linked/bundled components of the CTK in order to support running with CUDA. In that world, yes it did have a CUDA dependency because the CUDA transport layer built into the wheel was specific to a given CUDA version. That CUDA dependency has now migrated to the libucx package, so the purpose of the suffix in ucx-py is now to select the appropriate libucx dependency.

What does it provide to have ucx-py also including the same information?

This is generally what we've done throughout RAPIDS, because if a pure Python package A depends on a CUDA-dependent package B, there is no way to express which CUDA version it should select for B without having separate versions of A. For instance, we do this in dask-cudf-cuXY, which has a hard dependency on cudf-cuXY. ucx-py is a bit of a special case because it works without CUDA, so the CUDA versions are truly optional, unlike with something like dask-cudf (which requires cudf and therefore must pull a specific CUDA-versioned one). In the past we've discussed having some more automated way of making a default choice of CUDA version, but that still requires a default behavior, so you can't get around the different packages in any way that I can tell.

For ucx-py, CUDA support is optional, and since rapidsai/ucx-wheels#5 we allow installing libucx on a system without a GPU. I don't see an option for the dependency specification that provides a good UX in all cases, though. You either have to force users to manually install dependencies, or you make the package not work by default out of the box, AFAICT.

jameslamb (Member Author):

Thanks for that! The used-to-statically-link-bits-of-the-CTK history was the part I was missing.

I think influencing the libucx dependency via a CUDA suffix should continue to be the approach here.

@jameslamb jameslamb requested a review from vyasr July 1, 2024 17:41
@pentschev (Member) left a comment:

LGTM. Thanks @jameslamb !

@jameslamb (Member Author):

/merge

@pentschev (Member):

/merge

@pentschev (Member):

@vyasr maybe you have to approve too because you've been explicitly asked to review? Not sure to be honest.

@jameslamb (Member Author):

@pentschev This was my mistake, we still need a rapidsai/packaging-codeowners review.


I've asked @vyasr offline to help us with one.

@pentschev (Member):

@pentschev This was my mistake, we still need a rapidsai/packaging-codeowners review.

Oh, now I see that. I always miss the grey text. 😅

@@ -7,7 +7,7 @@ dependencies:
- python=3.9
- cudatoolkit=11.5
- setuptools
- cython>=0.29.14,<3.0.0a0
- cython>=3.0.0
Contributor:

How did this change here?

jameslamb (Member Author):

@pentschev accidentally committed it (it was intentional, just not intended for this PR) and proposed just keeping it: #1048 (comment)

docs/source/install.rst (review thread, resolved)
@rapids-bot rapids-bot bot merged commit bfb1d99 into rapidsai:branch-0.39 Jul 3, 2024
39 checks passed
@jameslamb jameslamb deleted the docs/pip-install branch July 3, 2024 16:38