Releases · NVlabs/NVBit

29 Aug 15:15

x-y-z

1.7.1

ff94852

NVBit-1.7.1 Latest

Improved CUDA program compatibility.
Fixed related function discovery on SM80 (closes #129).
Updated license headers.

11 Jul 17:42

x-y-z

1.7

dd91b9f

NVBit-1.7

NVBit 1.7 contains many changes to both the NVBit core and the NVBit tools to support CUDA 12. Please read the change log carefully and follow the migration guide to port your pre-CUDA 12 NVBit tools to this release; otherwise your tools are very likely not to work in a CUDA 12 environment.

Changes and migration guide:

Added support for Orin (SM_87), Ada Lovelace (SM_89), and Hopper (SM_90).
Due to a potential deadlock during application initialization, NVBit disables lazy module loading by default: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#possible-issues-when-adopting-lazy-loading. If desired, set NO_EAGER_LOAD=1 to enable lazy module loading.
NVBit tools can no longer use syscalls in instrument functions, so printf() and assert() are no longer allowed in injected functions. Any use of printf() or assert() will prevent your tool from loading and cause an application error. As a result, the mem_printf example has been removed. Instead, tool writers need to format and transfer their messages themselves. A skeleton example is provided as mem_printf2, which is built on top of mem_trace and requires tool writers to add a string formatter.
Revised nvbit_at_ctx_init()/nvbit_at_ctx_term() callback rules:
a. CUDA API calls are no longer allowed in the nvbit_at_ctx_init() callback function; please use them in the new nvbit_tool_init() callback function instead. The reason is that CUDA API calls take the same lock that the CUDA driver (CUDA 12+) already holds at context creation time, when nvbit_at_ctx_init() is invoked, whereas nvbit_tool_init() is invoked before the first CUDA kernel launch without the lock being held. Failure to make this change will result in your tool deadlocking. NVBit will warn you about this change; set ACK_CTX_INIT_LIMITATION=1 to acknowledge and disable the warning.
b. Launching a kernel and allocating device or managed memory are no longer allowed in the nvbit_at_ctx_term() callback function, due to a similar locking issue. Failure to make this change will result in your tool deadlocking.
Rewrote the mem_trace example to adapt to the CUDA 12 changes by following the new nvbit_at_ctx_init()/nvbit_at_ctx_term() callback rules above. Please read the changes to mem_trace carefully and adapt your tool accordingly if it uses ChannelDev and ChannelHost from utils/channel.hpp.
Added support for the cudaLaunchKernelEx API (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb9c891eb6bb8f4089758e64c9c976db9) for all tools. Please update your tools to catch all kernel launches during instrumentation.
NVBit tools are now compiled with arch=all by default so that they can run on all GPU architectures. To reduce tool compilation time and binary size, run make ARCH=sm_XX if you only plan to run your tool on the sm_XX GPU architecture.
ppc64le support is dropped.
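
Since printf() is no longer available inside injected functions, one possible approach is to send fixed-size records to the host and turn them into text there. The following is a minimal host-side sketch of such a string formatter, not the actual mem_printf2 code. The msg_record_t layout and the format_record helper are hypothetical names chosen for illustration, and the device-to-host transport (for example ChannelDev and ChannelHost from utils/channel.hpp) is deliberately left out.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical fixed-size record (an assumption, not an NVBit type) that an
// injected device function could fill in and push to the host instead of
// calling printf().
struct msg_record_t {
    int cta_id_x;    // block index of the thread that produced the record
    int warp_id;     // warp index within the SM
    int opcode_id;   // tool-defined identifier of the instrumented opcode
    uint64_t addr;   // memory address touched by the instruction
};

// Host-side formatter: converts one received record into a printable line.
std::string format_record(const msg_record_t& r) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "CTA %d warp %d opcode %d addr 0x%016llx",
                  r.cta_id_x, r.warp_id, r.opcode_id,
                  static_cast<unsigned long long>(r.addr));
    return std::string(buf);
}
```

In a real tool, the host thread that drains the channel would call a formatter like this once per received record and print or log the result.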

03 Feb 16:52

x-y-z

1.5.5

f6aff9d

NVBit-1.5.5

Fixed

Fixed instrumentation of relative control flow instructions in Maxwell/Pascal.

Note: the ppc64le version is compiled but not tested on real machines.

22 Nov 19:59

x-y-z

1.5.4

f6aff9d

NVBit-1.5.4

Fixed

Fixed instruction size mismatch, i.e., possible wrong value from Instr::getSize().
Fixed nvbit_{read,write}_{ureg,pred_reg,upred_reg} functions.

Changed

Updated CUDA header files to CUDA 11.5.
Improved error messages for temporary file creation.

Note: the ppc64le version is compiled but not tested on real machines.

08 Feb 16:55

x-y-z

1.5.3

f02de14

NVBit-1.5.3

Fixed

Added missing surface and texture MemorySpaceStr.
Fixed LDGSTS address generation issues.

Added

Added SM_86 support.

Changed

Changed mem_trace to work with multi-context workloads.

23 Nov 19:45

x-y-z

1.5.2

f02de14

NVBit-1.5.2

Fixed

Fixed a bug in Turing+ architectures causing program state corruption due to using printf in instrumentation functions.
Fixed a bug in public NVBit decoding functions during Texture and Surface instruction decoding.

Added

Added an example of instrumenting programs that use CUDA graphs.

27 Oct 19:45

x-y-z

1.5.1

f02de14

NVBit-1.5.1

Fixed

Fixed instruction decoding bugs in Turing and Ampere.
Fixed a cubin parsing bug.

14 Oct 18:45

x-y-z

1.5

f02de14

NVBit-1.5

Changed

Changed *_pred functions/variables to *_guard_pred to avoid confusion, since some SASS instructions also use predicate registers as operands, which are distinct from the guard predicate.
Moved instruction types to InstrType namespace from Instr class.
Renamed class/enum type names: memOpType -> MemorySpace, memOpTypeStr -> MemorySpaceStr, operandType -> OperandType, operandTypeStr -> OperandTypeStr, regModiferType -> RegModiferType, regModiferTypeStr -> RegModifierTypeStr.
Added a new str variable, storing the parsed operand, to operand_t.

Added

Added support for native compilation of tools targeting SM arch >= 70 (up to the currently supported arch). Previously, compilation required targeting PTX < SM70, even when running on Volta+.

Fixed

Removed unused mref_t variable.
Fixed some bugs on Turing and Ampere.
Fixed a bug in the instrumentation function stack calculation that resulted in a segmentation fault on pbrt (#28) due to possible nested device function calls.

Removed

Removed obsolete custom implementations of shuffle and ballot from utils.h, which were implemented to support old nvcc versions.

08 Jun 15:43

x-y-z

1.4

f02de14

NVBit-1.4

Added

Added complete Turing support, specifically SM_73 and SM_75.
Added Ampere support, specifically SM_80.
- Added new GLOBAL_TO_SHARED memory space for LDGSTS instruction from Ampere.
Added nvbit_read_ureg, nvbit_write_ureg to read/write uniform registers.
Added nvbit_read_pred_reg, nvbit_write_pred_reg to read/write predicate register.
Added nvbit_read_upred_reg, nvbit_write_upred_reg to read/write uniform predicate register.
Added NVBIT_VERSION to nvbit.h, so tool writers can identify the NVBit version in their instrumentation tools.
Added variadic instrument function support (with record_reg_vals as an example), so one can write instrument functions like dev_func(int num_args...).
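
For readers unfamiliar with the dev_func(int num_args...) form, the sketch below illustrates the underlying C++ variadic mechanism in plain host code. It is only an analogy under stated assumptions: real instrument functions are device functions (see the record_reg_vals example shipped with NVBit), and collect_vals is a hypothetical name that does not exist in NVBit.

```cpp
#include <cstdarg>
#include <vector>

// Hypothetical host-side analogue of a variadic instrument function.
// The first parameter says how many int values follow, mirroring the
// dev_func(int num_args...) shape mentioned in the changelog.
std::vector<int> collect_vals(int num_args, ...) {
    std::vector<int> vals;
    va_list args;
    va_start(args, num_args);
    for (int i = 0; i < num_args; ++i) {
        // Each extra argument is read back in order as an int.
        vals.push_back(va_arg(args, int));
    }
    va_end(args);
    return vals;
}
```

The explicit count is what makes this pattern safe: the callee has no other way to know how many trailing values were passed.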

Fixed

Fixed a bug that prevented callee functions from being instrumented if their caller function had no instructions to instrument.

Changed

IARG_PRED_VAL_T and IARG_PRED_REG_T give the uniform predicate register value if the instrumented instruction uses a uniform predicate register.
Changed move_replace tool to support uniform registers.
Changed mem_trace and mem_print tools to support instructions with more than one memory reference address (e.g., LDGSTS).
Changed nvbit_enable_instrumented function to allow users to only enable/disable instrumentation on the specified function without affecting its related functions (the original and default behavior is to enable/disable instrumentation on the specified function and all its related functions).

23 Apr 21:29

x-y-z

1.3.1

a75e2cd

NVBit-1.3.1

Added

Added ARM64 support to NVBit.

Fixed

Removed an unnecessary register limit on instrument functions for GPUs older than Volta.
