This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Investigate lighterweight alternatives to Boost.Pool #728

Open
olupton opened this issue Dec 22, 2021 · 0 comments
Labels: gpu, improvement (Improvement over existing implementation)

olupton commented Dec 22, 2021

Describe the issue
As part of #713 we introduced some memory pools for Random123 streams, implemented using Boost.Pool:

```cpp
#ifdef CORENEURON_USE_BOOST_POOL
/** Tag type for use with boost::fast_pool_allocator that forwards to
 *  coreneuron::[de]allocate_unified(). Using a Random123-specific type here
 *  makes sure that allocations do not come from the same global pool as other
 *  usage of boost pools for objects with sizeof == sizeof(nrnran123_State).
 *
 *  The messy m_block_sizes map is just because `deallocate_unified` uses sized
 *  deallocations, but the Boost pool allocators don't. Because this is hidden
 *  behind the pool mechanism, these methods are not called very often and the
 *  overhead is minimal.
 */
struct random123_allocate_unified {
    using size_type = std::size_t;
    using difference_type = std::size_t;
    static char* malloc(const size_type bytes) {
        std::lock_guard<std::mutex> const lock{m_mutex};
        static_cast<void>(lock);
        auto* buffer = coreneuron::allocate_unified(bytes);
        m_block_sizes[buffer] = bytes;
        return reinterpret_cast<char*>(buffer);
    }
    static void free(char* const block) {
        std::lock_guard<std::mutex> const lock{m_mutex};
        static_cast<void>(lock);
        auto const iter = m_block_sizes.find(block);
        assert(iter != m_block_sizes.end());
        auto const size = iter->second;
        m_block_sizes.erase(iter);
        return coreneuron::deallocate_unified(block, size);
    }
    static std::mutex m_mutex;
    static std::unordered_map<void*, std::size_t> m_block_sizes;
};
std::mutex random123_allocate_unified::m_mutex{};
std::unordered_map<void*, std::size_t> random123_allocate_unified::m_block_sizes{};
using random123_allocator =
    boost::fast_pool_allocator<coreneuron::nrnran123_State, random123_allocate_unified>;
#else
```

This brought significant benefits in GPU builds, both in initialisation time and compatibility with (NVIDIA) profiling tools, at the cost of a large (and polarising) dependency, Boost.

In the current implementation, Boost is an optional dependency, so builds without it simply lose the performance benefit of the memory pools.

As we are only using a small, header-only component of Boost, we may be able to find an alternative high-quality implementation that we could use unconditionally.

Possible alternatives to try:

olupton added the gpu and improvement labels on Dec 22, 2021