Native module for gathering kernel execution performance metrics via code instrumentation. It is compiled into a dynamic library, which then needs to be preloaded into the process one wishes to monitor.
Example usage:

    ONEAGENT_NVBIT_EXTENSION_CONF_FILE=<path-to>/nvbit-module.conf LD_PRELOAD=<path-to>/libnvbit-module.so <the application being instrumented>

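When the monitored application is started from a script rather than an interactive shell, the same two environment variables can be set programmatically. The following is a minimal sketch, not part of the module: the helper names `build_env` and `run_instrumented` are hypothetical, and prepending to an existing `LD_PRELOAD` value is an assumption about what a launcher would want.

```python
import os
import subprocess


def build_env(conf_file, module_path, base_env=None):
    """Return a copy of the environment with the two variables from the usage line set.

    build_env is a hypothetical helper, not something the module provides.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ONEAGENT_NVBIT_EXTENSION_CONF_FILE"] = conf_file
    # Keep any library that is already preloaded; the dynamic loader accepts
    # a colon-separated list in LD_PRELOAD.
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{module_path}:{existing}" if existing else module_path
    return env


def run_instrumented(argv, conf_file, module_path):
    """Start the application given by argv with the module preloaded; return its exit code."""
    completed = subprocess.run(argv, env=build_env(conf_file, module_path), check=False)
    return completed.returncode
```

`run_instrumented` takes the command line of the application as a list and the two paths that the usage line above leaves as placeholders.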
Dependency | Tested version |
---|---|
spdlog | 1.3.1 |
CUDA Toolkit | 11.0 |
NVBit | 1.5.3 |
Boost | 1.71.0 |
Google Test | 1.10.0 |
CMake | 3.18.4 |
vcpkg | N/A |
Compiler with C++20 support (C++17 for CUDA) | gcc 10.2.0 |

    git clone https://github.com/Microsoft/vcpkg.git
    cd vcpkg
    ./bootstrap-vcpkg.sh
    ./vcpkg integrate install
    ./vcpkg install spdlog:x64-linux boost-program-options:x64-linux gtest:x64-linux

NVBit does not require separate compilation; see the "Getting Started with NVBit" section of the README located in the root directory of the NVBit release package.

    mkdir build
    cd build
    cmake -G "Unix Makefiles" -DNVBIT_PATH="<path_to_nvbit_release>" -DCMAKE_TOOLCHAIN_FILE="<vcpkg_directory>/scripts/buildsystems/vcpkg.cmake" ..

The module is configured in two ways:
- startup configuration is read once during start from the file specified via the `ONEAGENT_NVBIT_EXTENSION_CONF_FILE` environment variable,
- runtime configuration is read every `runtime_config_polling_interval` seconds from the file specified via `runtime_config_path`.
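
The periodic re-read of the runtime configuration can be pictured with the sketch below. It only illustrates the polling behaviour described above and is not the module's implementation: the class name `RuntimeConfigPoller` is hypothetical, and keeping the previous content when the file cannot be read is an assumption.

```python
import time


class RuntimeConfigPoller:
    """Hypothetical illustration of re-reading a file every `interval_seconds`."""

    def __init__(self, path, interval_seconds, clock=time.monotonic):
        self.path = path
        self.interval = interval_seconds
        self.clock = clock
        self.next_read = clock()  # the first poll reads immediately
        self.content = None

    def poll(self):
        """Re-read the file if the interval has elapsed; return the latest content."""
        now = self.clock()
        if now >= self.next_read:
            try:
                with open(self.path, encoding="utf-8") as handle:
                    self.content = handle.read()
            except OSError:
                # Assumption: keep the previous content if the file is missing or unreadable.
                pass
            self.next_read = now + self.interval
        return self.content
```

Passing the clock in as a parameter keeps the sketch testable without waiting for real time to pass.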
Startup configuration needs to be provided upfront via the `ONEAGENT_NVBIT_EXTENSION_CONF_FILE` environment variable.
Lines starting with `#` are treated as comments and ignored.
The available settings are listed in the table below.
Key | Value type | Default value | Description |
---|---|---|---|
`logfile` | Valid filesystem path | unset | Path to log file |
`runtime_config_path` | Valid filesystem path | unset | Path to runtime configuration file |
`runtime_config_polling_interval` | Positive integer | 10 | Runtime configuration polling interval in seconds |
`measurements_output_dir` | Valid filesystem path | unset | Directory where measurements will be written to |
`verbose` | Boolean | false | Enable verbose (debug) logging |
`console_log_enabled` | Boolean | false | Enable logging to stdout |
`count_warp_level` | Boolean | true | Count warp-level (true) or thread-level (false) instructions |
`exclude_pred_off` | Boolean | false | Exclude predicated-off instructions from the count |
`mangled_names` | Boolean | true | Print kernel names mangled (true) or demangled (false) |
See nvbit-module.conf for an example.
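
To make the defaults and comment handling concrete, here is a minimal Python sketch of reading such a file. It is an illustration only: the `key=value` line syntax and the `true`/`false` spelling of booleans are assumptions (check `nvbit-module.conf` for the real syntax), and `parse_startup_config` is a hypothetical name, not a function of the module.

```python
# Defaults as documented for the startup settings; None stands for "unset".
DEFAULTS = {
    "logfile": None,
    "runtime_config_path": None,
    "runtime_config_polling_interval": 10,
    "measurements_output_dir": None,
    "verbose": False,
    "console_log_enabled": False,
    "count_warp_level": True,
    "exclude_pred_off": False,
    "mangled_names": True,
}


def parse_startup_config(text):
    """Parse assumed `key=value` lines, skipping blank lines and `#` comments."""
    settings = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in DEFAULTS:
            raise ValueError(f"unrecognized line: {raw!r}")
        default = DEFAULTS[key]
        if isinstance(default, bool):  # checked before int, since bool is an int subclass
            if value.lower() not in ("true", "false"):
                raise ValueError(f"{key}: expected true or false, got {value!r}")
            settings[key] = value.lower() == "true"
        elif isinstance(default, int):
            number = int(value)
            if number <= 0:
                raise ValueError(f"{key}: expected a positive integer")
            settings[key] = number
        else:
            settings[key] = value  # filesystem paths are kept as plain strings
    return settings
```

Any key that is absent from the file keeps its documented default, so an empty file yields exactly the defaults.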
Runtime configuration is created on the fly by the Python extension and contains a list of PIDs that should be instrumented, along with the instrumentation functions to apply to each of them.
See nvbit-module-runtime.conf for an example. For detailed documentation of the communication protocol, see here.
Multiple GPU code injection routines cannot be enabled at once, e.g.
- `gmem_access_coalescence` and `count_instr` combined won't work
- `gmem_access_coalescence` and `occupancy` will work

This limitation is subject to removal in future increments.
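
A tool that generates the runtime configuration could guard against this limitation before writing the file. The sketch below is hypothetical: `conflicting_routines` is not part of the module, and the set of code injection routines is inferred from the two examples above (`occupancy` is assumed not to be one, since it combines with `gmem_access_coalescence`).

```python
# Assumption: these are the instrumentation functions that inject GPU code.
INJECTION_ROUTINES = frozenset({"gmem_access_coalescence", "count_instr"})


def conflicting_routines(requested):
    """Return the sorted injection routines in `requested` if more than one is present, else []."""
    injected = sorted(set(requested) & INJECTION_ROUTINES)
    return injected if len(injected) > 1 else []
```

An empty result means the requested combination respects the one-injection-routine limit.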