Bug: Segfault when CUPTI is not correctly initialized. #13853

Open
yliu120 opened this issue Jun 17, 2024 · 1 comment

@yliu120
Contributor

yliu120 commented Jun 17, 2024

Hi,

The XLA:GPU profiler segfaults when CUPTI initialization fails:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fff0401cc7e in nsync::nsync_mu_lock(nsync::nsync_mu_s_*) () from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
(gdb) bt
#0  0x00007fff0401cc7e in nsync::nsync_mu_lock(nsync::nsync_mu_s_*) () from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#1  0x00007ffeff73a904 in xla::profiler::CuptiActivityBufferManager::AddCachedActivityEventsTo(xla::profiler::CuptiEventCollectorDelegate&, unsigned long, unsigned long&)
    () from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#2  0x00007ffeff73355e in xla::profiler::CuptiTraceCollector::OnTracerCachedActivityBuffers(std::unique_ptr<xla::profiler::CuptiActivityBufferManager, std::default_delete<xla::profiler::CuptiActivityBufferManager> >) () from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#3  0x00007ffeff7340cd in xla::profiler::CuptiTraceCollectorImpl::Export(tensorflow::profiler::XSpace*, unsigned long) ()
   from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#4  0x00007ffeff72a5c8 in xla::profiler::GpuTracer::CollectData(tensorflow::profiler::XSpace*) ()
   from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#5  0x00007ffeff74d51b in tsl::profiler::ProfilerController::CollectData(tensorflow::profiler::XSpace*) ()
   from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#6  0x00007ffeff74c427 in tsl::profiler::ProfilerCollection::CollectData(tensorflow::profiler::XSpace*) ()
   from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#7  0x00007ffeff74bef8 in xla::profiler::PLUGIN_Profiler_CollectData(PLUGIN_Profiler_CollectData_Args*) ()
   from /usr/local/lib/python3.10/dist-packages/jax_plugins/xla_cuda12/xla_cuda_plugin.so
#8  0x00007fff49e9f2e1 in xla::profiler::PluginTracer::CollectData(tensorflow::profiler::XSpace*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#9  0x00007fff4a837efb in tsl::profiler::ProfilerController::CollectData(tensorflow::profiler::XSpace*) ()
   from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#10 0x00007fff4a836e37 in tsl::profiler::ProfilerCollection::CollectData(tensorflow::profiler::XSpace*) ()

The segfault is caused by an uninitialized activity_buffers_ here: https://cs.opensource.google/tensorflow/tensorflow/+/master:third_party/xla/xla/backends/profiler/gpu/cupti_collector.cc;drc=17cedabb755224148be9854551d4efd172af10e5;l=630

activity_buffers_ is only initialized when https://cs.opensource.google/tensorflow/tensorflow/+/master:third_party/xla/xla/backends/profiler/gpu/cupti_tracer.cc;drc=17cedabb755224148be9854551d4efd172af10e5;l=1314 is called.

When CUPTI fails to initialize, that function is never called, so the collector later calls into an object that was never constructed, which leads to the segfault.
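A minimal, self-contained C++ sketch of the failure pattern and one possible guard. All class and function names here are hypothetical simplifications, not the actual XLA classes (CuptiActivityBufferManager, CuptiTraceCollector::OnTracerCachedActivityBuffers); the real fix may look different, for example skipping collection entirely when CUPTI fails to initialize.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for CuptiActivityBufferManager: it owns a mutex that
// is locked before cached events are read (cf. nsync_mu_lock in the backtrace).
class ActivityBufferManager {
 public:
  void Add(int event) {
    std::lock_guard<std::mutex> lock(mu_);
    events_.push_back(event);
  }
  // Undefined behavior (in practice a segfault) if called through a null
  // pointer, because locking mu_ touches memory that does not exist.
  std::size_t AddCachedActivityEventsTo(std::vector<int>& out) {
    std::lock_guard<std::mutex> lock(mu_);
    out.insert(out.end(), events_.begin(), events_.end());
    return events_.size();
  }

 private:
  std::mutex mu_;
  std::vector<int> events_;
};

// Hypothetical tracer: the buffer manager is only created on the success path.
class Tracer {
 public:
  bool Enable(bool cupti_ok) {
    if (!cupti_ok) return false;  // early return: activity_buffers_ stays null
    activity_buffers_ = std::make_unique<ActivityBufferManager>();
    return true;
  }
  void Record(int event) {
    if (activity_buffers_) activity_buffers_->Add(event);
  }
  // Hands ownership to the collector; may hand over a null pointer.
  std::unique_ptr<ActivityBufferManager> ReleaseBuffers() {
    return std::move(activity_buffers_);
  }

 private:
  std::unique_ptr<ActivityBufferManager> activity_buffers_;
};

// Hypothetical collector hook: checks for null instead of dereferencing it.
std::size_t OnTracerCachedActivityBuffers(
    std::unique_ptr<ActivityBufferManager> buffers, std::vector<int>& out) {
  if (buffers == nullptr) return 0;  // CUPTI never initialized: nothing to export
  return buffers->AddCachedActivityEventsTo(out);
}
```

With the null check, a tracer whose Enable(false) call failed exports zero events instead of crashing, while a successfully enabled tracer exports its cached events as before.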

yliu120 referenced this issue Jun 17, 2024
… better overhead.

PiperOrigin-RevId: 641945583
@cheshire
Member

Are there repro steps?
