Global Singleton (XlaDebugInfoManager) leaks out of the control of C API and gets two copies in two shared libraries #22148

Open
yliu120 opened this issue Jun 27, 2024 · 1 comment
Labels
bug Something isn't working

Comments

Contributor

yliu120 commented Jun 27, 2024

Description

Hi JAX team,

We have identified a bug in the JAX CUDA plugin. Here is the write-up:

https://docs.google.com/document/d/1ldlD8XQ6XYX4zcSRCUIVQyAUBJQZX6v9PdE2qX2_FGw/edit?usp=sharing

To summarize,

We accidentally found that XlaDebugInfoManager, which is supposed to be a global singleton, ends up with two copies in JAX. The reason is that the singleton is linked into both xla_extension.so and cuda_plugin.so, so different parts of the Python code reference different copies.

The direct consequence is that some entries are missing from the profiler metadata, so jax.profiler does not function correctly.
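To illustrate the failure mode, here is a minimal single-file sketch. It is not the XLA code: `DebugInfoRegistry`, the two namespaces, and the module name `jit_matmul` are hypothetical stand-ins, and the namespaces only imitate two shared libraries that each statically linked their own copy of the singleton accessor. Which library writes and which one reads is also an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a process-wide registry such as XlaDebugInfoManager.
class DebugInfoRegistry {
 public:
  void Register(const std::string& module) { modules_.insert(module); }
  bool Contains(const std::string& module) const {
    return modules_.count(module) > 0;
  }

 private:
  std::unordered_set<std::string> modules_;
};

// Each namespace models one shared library that carries its own copy of the
// accessor, and therefore its own function-local static instance.
namespace xla_extension_so {
inline DebugInfoRegistry& Get() {
  static DebugInfoRegistry instance;
  return instance;
}
}  // namespace xla_extension_so

namespace cuda_plugin_so {
inline DebugInfoRegistry& Get() {
  static DebugInfoRegistry instance;
  return instance;
}
}  // namespace cuda_plugin_so

// Registers a module through one copy and looks it up through the other.
// Returns true when the lookup misses, i.e. the metadata appears to be lost.
bool MetadataIsMissing() {
  // The two "singletons" are distinct objects.
  assert(&xla_extension_so::Get() != &cuda_plugin_so::Get());
  cuda_plugin_so::Get().Register("jit_matmul");
  return !xla_extension_so::Get().Contains("jit_matmul");
}
```

Calling `MetadataIsMissing()` returns true: the write lands in one instance and the read goes to the other, which matches the symptom of metadata silently missing on the profiler side.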

This is a bug report but also a feature request: we want to make sure that anything intended to be global does not leak out of the control of the C API (a safety mechanism for the future).
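One possible shape for such a mechanism, sketched under assumptions: a single library owns the global state and hands out a table of function pointers, so other libraries never link the state themselves. The names `DebugInfoApi`, `owner_lib`, and `client_lib` are hypothetical and this is not the actual PJRT C API; the namespaces again stand in for separate shared libraries.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical C-compatible function table exposed by the owning library.
struct DebugInfoApi {
  void (*register_module)(const char* name);
  int (*has_module)(const char* name);
};

// Models the one library that owns the global state.
namespace owner_lib {
std::unordered_set<std::string>& Registry() {
  static std::unordered_set<std::string> modules;
  return modules;
}

void RegisterModule(const char* name) { Registry().insert(name); }

int HasModule(const char* name) { return Registry().count(name) > 0 ? 1 : 0; }

// The only entry point other libraries are expected to use.
const DebugInfoApi* GetApi() {
  static const DebugInfoApi api = {&RegisterModule, &HasModule};
  return &api;
}
}  // namespace owner_lib

// Models a second library that holds no copy of the registry and goes
// through the function table for every access.
namespace client_lib {
bool RegisterAndCheck(const DebugInfoApi* api, const char* name) {
  api->register_module(name);
  return api->has_module(name) != 0;
}
}  // namespace client_lib
```

With this layout, `client_lib::RegisterAndCheck(owner_lib::GetApi(), "jit_matmul")` writes and reads the same instance, because the client never instantiates the registry on its own side of the boundary.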

System info (python version, jaxlib version, accelerator, etc.)

This is a general issue with JAX plugins. I tested on the latest JAX release and at HEAD.

@yliu120 yliu120 added the bug Something isn't working label Jun 27, 2024
Contributor Author

yliu120 commented Jun 27, 2024

@hawkinsp I chatted with Peter offline, and I understand he has some ideas for improving the C API to address this problem.
Could you please share some thoughts here? Thanks so much.
