This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Investigate lighterweight alternatives to Boost.Pool #728

Open
olupton opened this issue Dec 22, 2021 · 0 comments
Labels: gpu, improvement (Improvement over existing implementation)

olupton commented Dec 22, 2021

Describe the issue
As part of #713 we introduced some memory pools for Random123 streams, implemented using Boost.Pool:

```cpp
#ifdef CORENEURON_USE_BOOST_POOL
/** Tag type for use with boost::fast_pool_allocator that forwards to
 *  coreneuron::[de]allocate_unified(). Using a Random123-specific type here
 *  makes sure that allocations do not come from the same global pool as other
 *  usage of boost pools for objects with sizeof == sizeof(nrnran123_State).
 *
 *  The messy m_block_sizes map is just because `deallocate_unified` uses sized
 *  deallocations, but the Boost pool allocators don't. Because this is hidden
 *  behind the pool mechanism, these methods are not called very often and the
 *  overhead is minimal.
 */
struct random123_allocate_unified {
    using size_type = std::size_t;
    using difference_type = std::size_t;
    static char* malloc(const size_type bytes) {
        std::lock_guard<std::mutex> const lock{m_mutex};
        static_cast<void>(lock);
        auto* buffer = coreneuron::allocate_unified(bytes);
        m_block_sizes[buffer] = bytes;
        return reinterpret_cast<char*>(buffer);
    }
    static void free(char* const block) {
        std::lock_guard<std::mutex> const lock{m_mutex};
        static_cast<void>(lock);
        auto const iter = m_block_sizes.find(block);
        assert(iter != m_block_sizes.end());
        auto const size = iter->second;
        m_block_sizes.erase(iter);
        return coreneuron::deallocate_unified(block, size);
    }
    static std::mutex m_mutex;
    static std::unordered_map<void*, std::size_t> m_block_sizes;
};
std::mutex random123_allocate_unified::m_mutex{};
std::unordered_map<void*, std::size_t> random123_allocate_unified::m_block_sizes{};
using random123_allocator =
    boost::fast_pool_allocator<coreneuron::nrnran123_State, random123_allocate_unified>;
#else
```

This brought significant benefits in GPU builds, both in initialisation time and compatibility with (NVIDIA) profiling tools, at the cost of a large (and polarising) dependency, Boost.

In the current implementation, Boost is an optional dependency, so builds without it simply lose the performance benefit of the memory pools.

As we are only using a small, header-only component of Boost, we may be able to find an alternative high-quality implementation that we could use unconditionally.

Possible alternatives to try:

olupton added the gpu and improvement labels on Dec 22, 2021