Releases: NVlabs/NVBit

NVBit-1.7.1

29 Aug 15:15
ff94852
  1. Improved CUDA program compatibility.
  2. Fixed related function discovery on SM80 (closes #129).
  3. Updated license headers.

NVBit-1.7

11 Jul 17:42
dd91b9f

NVBit 1.7 contains many changes (to both the NVBit core and the NVBit tools) to support CUDA 12. Please read the change log carefully and follow the migration guide to port your pre-CUDA 12 NVBit tools to this release; otherwise your tools are very likely not to work in a CUDA 12 environment.

Changes and migration guide:

  1. Added support for Orin (SM_87), Ada Lovelace (SM_89), and Hopper (SM_90).
  2. Due to a potential deadlock during application initialization, NVBit disables module lazy loading by default: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#possible-issues-when-adopting-lazy-loading. If you want module lazy loading, you can try setting NO_EAGER_LOAD=1 to enable it.
  3. NVBit tools can no longer use syscalls in instrument functions; therefore printf() and assert() are no longer allowed in the injected functions. Any use of printf() or assert() will prevent your tool from loading and cause an application error. As a result, the mem_printf example has been removed. Instead, tool writers need to format and transfer their messages on their own. A skeleton example is provided as mem_printf2, which is built on top of mem_trace and requires tool writers to add a string formatter.
  4. Revised nvbit_at_ctx_init()/nvbit_at_ctx_term() callback rules:
    a. CUDA API calls are no longer allowed in the nvbit_at_ctx_init() callback function; please make them in the new nvbit_tool_init() callback function instead. With CUDA 12+, the CUDA driver already holds a lock at context creation time when nvbit_at_ctx_init() is invoked, and CUDA API calls take that same lock, whereas nvbit_tool_init() is invoked before the first CUDA kernel launch without the lock held. Failure to make this change will result in your tool deadlocking. NVBit will warn you about this change; set ACK_CTX_INIT_LIMITATION=1 to acknowledge and disable the warning.
    b. Launching a kernel and allocating device or managed memory are no longer allowed in the nvbit_at_ctx_term() callback function, due to a similar locking issue. Failure to make this change will result in your tool deadlocking.
  5. Rewrote the mem_trace example to adapt to the CUDA 12 changes by following the new nvbit_at_ctx_init()/nvbit_at_ctx_term() callback rules above. Please read the changes to mem_trace carefully and adapt your tool accordingly if it uses ChannelDev and ChannelHost from utils/channel.hpp.
  6. Added support for cudaLaunchKernelEx (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb9c891eb6bb8f4089758e64c9c976db9) API for all tools. Please update your tools to catch all kernel launches during instrumentation.
  7. NVBit tools are now compiled with arch=all by default so that they can run on all GPU architectures. To reduce tool compilation time and binary size, run make ARCH=sm_XX if you plan to run your tool only on the sm_XX GPU architecture.
  8. ppc64le support is dropped.
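
Item 3 above asks tool writers to supply their own string formatter on the host side. The following is a minimal sketch of what such a formatter might look like, not the code shipped in mem_printf2. The record layout `MemAccessRecord`, its field names, and the function `format_record` are hypothetical and chosen only for illustration; the sketch assumes the injected function pushes fixed-size records to the host (for example through the channel utilities) and that the host formats each record after receiving it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical fixed-size record that an injected device function could fill
// and transfer to the host. The fields are illustrative, not NVBit's layout.
struct MemAccessRecord {
    int32_t cta_id_x;
    int32_t warp_id;
    int32_t opcode_id;
    uint64_t addr;
};

// Host-side string formatter: builds the text that printf() would previously
// have produced inside the injected function.
std::string format_record(const MemAccessRecord& r) {
    char buf[128];
    int n = std::snprintf(buf, sizeof(buf),
                          "CTA %d - warp %d - opcode %d - addr 0x%016llx",
                          static_cast<int>(r.cta_id_x),
                          static_cast<int>(r.warp_id),
                          static_cast<int>(r.opcode_id),
                          static_cast<unsigned long long>(r.addr));
    // The longest possible message is well under the buffer size.
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Because all formatting happens on the host, the injected function only needs to copy plain values into the record, which keeps it free of printf() and assert().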

NVBit-1.5.5

03 Feb 16:52
f6aff9d

Fixed

  1. Fixed instrumentation of relative control flow instructions in Maxwell/Pascal.

Note: the ppc64le version is compiled but not tested on real machines.

NVBit-1.5.4

22 Nov 19:59
f6aff9d

Fixed

  • Fixed an instruction size mismatch, i.e., a possibly wrong value returned by Instr::getSize().
  • Fixed nvbit_{read,write}_{ureg,pred_reg,upred_reg} functions.

Changed

  • Updated CUDA header files to CUDA 11.5.
  • Improved error messages for temporary file creation.

Note: the ppc64le version is compiled but not tested on real machines.

NVBit-1.5.3

08 Feb 16:55

Fixed

  • Added missing surface and texture MemorySpaceStr.
  • Fixed LDGSTS address generation issues.

Added

  • Added SM_86 support.

Changed

  • Changed mem_trace to work with multi-context workloads.

NVBit-1.5.2

23 Nov 19:45

Fixed

  • Fixed a bug in Turing+ architectures causing program state corruption due to using printf in instrumentation functions.
  • Fixed a bug in public NVBit decoding functions during Texture and Surface instruction decoding.

Added

  • Added an example of instrumenting programs that use CUDA graphs.

NVBit-1.5.1

27 Oct 19:45

Fixed

  • Fixed instruction decoding bugs in Turing and Ampere.
  • Fixed a cubin parsing bug.

NVBit-1.5

14 Oct 18:45

Changed

  • Changed *_pred functions/variables to *_guard_pred to avoid confusion, since some SASS instructions also use predicate registers as operands, and these are different from the guard predicate.
  • Moved instruction types to InstrType namespace from Instr class.
  • Renamed class/enum type names: memOpType -> MemorySpace, memOpTypeStr -> MemorySpaceStr, operandType -> OperandType, operandTypeStr -> OperandTypeStr, regModiferType -> RegModiferType, regModiferTypeStr -> RegModifierTypeStr.
  • Added a new str variable, storing the parsed operand, to operand_t.

Added

  • Added support for native compilation of tools targeting SM arch >= 70 (up to the currently supported arch). Previously, compilation required targeting PTX < SM70 even when running on Volta+.

Fixed

  • Removed unused mref_t variable.
  • Fixed some bugs on Turing and Ampere.
  • Fixed a bug in the instrumentation function stack calculation that resulted in a segmentation fault on pbrt (#28) due to possible nested device function calls.

Removed

  • Removed obsolete custom implementations of shuffle and ballot from utils.h, which were implemented to support old nvcc versions.

NVBit-1.4

08 Jun 15:43

Added

  • Added complete Turing support, specifically SM_73 and SM_75.
  • Added Ampere support, specifically SM_80.
    • Added new GLOBAL_TO_SHARED memory space for LDGSTS instruction from Ampere.
  • Added nvbit_read_ureg, nvbit_write_ureg to read/write uniform registers.
  • Added nvbit_read_pred_reg, nvbit_write_pred_reg to read/write predicate register.
  • Added nvbit_read_upred_reg, nvbit_write_upred_reg to read/write uniform predicate register.
  • Added NVBIT_VERSION to nvbit.h, so that tool writers can identify the NVBit version in their instrumentation tools.
  • Added variadic instrument function support (with record_reg_vals as an example), so one can write instrument functions like dev_func(int num_args...).
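
The variadic form dev_func(int num_args...) mentioned above follows the standard C/C++ variadic-argument mechanics. The sketch below shows only that argument handling, on the host and without NVBit: `collect_reg_vals` is a hypothetical stand-in, and it assumes each variadic argument is a 32-bit unsigned register value. For how a real tool declares and injects such a function, see the record_reg_vals example.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdint>
#include <vector>

// Host-side stand-in for a variadic instrument function of the form
// dev_func(int num_args...). A real instrument function would be a device
// function injected by NVBit; this only demonstrates reading the arguments.
std::vector<uint32_t> collect_reg_vals(int num_args...) {
    assert(num_args >= 0);
    std::vector<uint32_t> vals;
    va_list vl;
    va_start(vl, num_args);
    for (int i = 0; i < num_args; i++) {
        // Assumption: every variadic argument is passed as unsigned int.
        vals.push_back(static_cast<uint32_t>(va_arg(vl, unsigned int)));
    }
    va_end(vl);
    return vals;
}
```

A call such as collect_reg_vals(3, 10u, 20u, 30u) returns the three values in order. The leading count is what tells the function how many arguments to read, since variadic functions cannot discover that on their own.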

Fixed

  • Fixed a bug that prevented callee functions from being instrumented if their caller function had no instructions to be instrumented.

Changed

  • IARG_PRED_VAL_T and IARG_PRED_REG_T now give the uniform predicate register value if the instrumented instruction uses a uniform predicate register.
  • Changed move_replace tool to support uniform registers.
  • Changed mem_trace and mem_print tools to support instructions with more than one memory reference address (e.g., LDGSTS).
  • Changed nvbit_enable_instrumented function to allow users to only enable/disable instrumentation on the specified function without affecting its related functions (the original and default behavior is to enable/disable instrumentation on the specified function and all its related functions).

NVBit-1.3.1

23 Apr 21:29

Added

  • Added ARM64 support in NVBit.

Fixed

  • Removed an unnecessary register limit on instrument functions for GPUs older than Volta.