Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). #8333
When, for some reason, an allocation group for the fused storage of multiple `Func`s that were originally intended to go in `GPUShared` gets lifted out of the GPU-block loops and sits in `Heap` memory instead, the profiling injection logic assumed that this buffer came from a function with the same name. The buffer was then incorrectly determined to be on the stack, because the logic ignored the `custom_new` and `custom_free` attributes of the `Allocate` node.

Consider this example (also included as a new test):
This produces the following Stmt right before the Profiling pass:
Notice how the `allocgroup__f1$0.0__f2$0.1.buffer` is outside of the outermost GPU-block loop. When this buffer didn't get lifted out of the kernel, Profiling wasn't an issue, as the profiler doesn't traverse the IR into GPU loops.

The offending line was:
Halide/src/Profiling.cpp
Line 274 in 461c128
This line runs when instrumenting the `Allocate` node, and the node is incorrectly determined to be `on_stack=true`.

This PR checks whether there is a `custom_new` and, if so, overrides `on_stack` to false.
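
The override described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual code in `Profiling.cpp`: `AllocSketch`, its fields, and `profiled_on_stack` are hypothetical names, and the "small constant size" criterion is only an assumed stand-in for whatever the pass uses to pick stack storage.

```cpp
// Hypothetical, simplified stand-in for the properties of an Allocate node
// that matter for this decision. These names are illustrative only and are
// not Halide's API.
struct AllocSketch {
    bool size_is_small_constant;  // assumed criterion for "would live on the stack"
    bool has_custom_new;          // the node carries a custom allocation expression
};

// Decide whether the profiler should account for the allocation as stack memory.
bool profiled_on_stack(const AllocSketch &a) {
    // Initial guess, based only on the (assumed) size criterion.
    bool on_stack = a.size_is_small_constant;

    // The fix: a custom allocator means the memory is obtained elsewhere,
    // so the buffer must never be treated as a stack allocation.
    if (a.has_custom_new) {
        on_stack = false;
    }
    return on_stack;
}
```

With this sketch, an allocation that meets the size criterion but has a custom allocator, such as `AllocSketch{true, true}`, is reported as not on the stack, which is the behavior the PR aims for.
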
@abadams I wonder if we can't simply rely on `Allocate::MemoryType` to determine `on_stack`, or is that still `Auto` at that moment?