GPU tests failing #291

Open
odow opened this issue Apr 18, 2024 · 6 comments

Comments

@odow
Member

odow commented Apr 18, 2024

cc @kalmarek

Some examples return NaN:

[image]

@kalmarek
Collaborator

This is the last build that doesn't produce NaN:
https://buildkite.com/julialang/scs-dot-jl/builds/281#018ccf8a-baeb-409b-9fe5-fbde2f42e4bc

and this is the first one that does:
https://buildkite.com/julialang/scs-dot-jl/builds/283#018ea03d-8b93-40f3-8e1b-6e6a37dec3c8

But this CI run came just after a change to the README (and before enabling OpenMP). Smells like something in the CUDA toolchain?!

@odow
Member Author

odow commented Apr 18, 2024

There are quite a few version changes, so I'm not sure what the culprit is.

@kalmarek
Collaborator

kalmarek commented Jul 6, 2024

The successful one uses

   Installed CUDA_Driver_jll ── v0.7.0+1
   Installed CUDA_Runtime_jll ─ v0.11.1+0
   Installed SCS_GPU_jll ────── v3.2.4+0

The failing one uses

   Installed SCS_GPU_jll ────── v3.2.4+0
   Installed CUDA_Driver_jll ── v0.8.0+0
   Installed CUDA_Runtime_jll ─ v0.12.0+1

So this seems to be a problem with CUDA 12? @maleadt (sorry if you get too many pings)
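
For anyone who wants to repeat this comparison on the full CI logs, here is a minimal sketch that extracts the `Installed` lines and reports which packages changed version. The helper names `parse_installed` and `changed_packages` are hypothetical, and the two log excerpts are just the lines quoted above; the regular expression assumes the `Installed <name> ─ v<version>` layout shown there.

```python
import re

# Install lines copied from the two CI logs quoted above.
PASSING_LOG = """\
   Installed CUDA_Driver_jll ── v0.7.0+1
   Installed CUDA_Runtime_jll ─ v0.11.1+0
   Installed SCS_GPU_jll ────── v3.2.4+0
"""
FAILING_LOG = """\
   Installed SCS_GPU_jll ────── v3.2.4+0
   Installed CUDA_Driver_jll ── v0.8.0+0
   Installed CUDA_Runtime_jll ─ v0.12.0+1
"""

# Assumed line layout: "Installed <name> <one or more ─> v<version>".
INSTALLED = re.compile(r"^\s*Installed\s+(\S+)\s+─+\s+v(\S+)\s*$")


def parse_installed(log):
    """Map package name to version for every 'Installed' line in the log."""
    versions = {}
    for line in log.splitlines():
        match = INSTALLED.match(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def changed_packages(old, new):
    """Return {name: (old_version, new_version)} for packages in both logs whose version differs."""
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


changes = changed_packages(parse_installed(PASSING_LOG), parse_installed(FAILING_LOG))
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

On the excerpts above, this lists only `CUDA_Driver_jll` and `CUDA_Runtime_jll`; `SCS_GPU_jll` is identical in both builds. Running it on the complete logs would show whether anything else moved between the two runs.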

@maleadt
Collaborator

maleadt commented Jul 8, 2024

Upgrading CUDA_Runtime_jll only updates the underlying CUDA toolkit. Maybe your package is incompatible with the CUDA toolkit v12.4 as introduced by Runtime_jll 0.12, or needs a rebuild.

@kalmarek
Collaborator

@maleadt It seems that the newest scs was already built against CUDA toolkit 12.4/5:
https://buildkite.com/julialang/yggdrasil/builds/11739#01908495-78c0-45ae-8bf6-28205badd6b6

@bodono did you test SCS with CUDA 12? Some examples here run just fine (so I think we're interacting with the library correctly), but some end with a bunch of NaNs.

@bodono
Contributor

bodono commented Jul 15, 2024

Unfortunately, if CUDA 12 is newish then it's likely that I have never tested with it, since I no longer have access to a GPU machine. The GitHub Action I have for GPUs only compiles it.
