From be2fb93722eb973a495c1d5b3661d8768d58d51b Mon Sep 17 00:00:00 2001 From: Olga Malysheva Date: Mon, 24 Jul 2023 18:32:00 +0200 Subject: [PATCH] Update documentation for oneTBB 2021.10.0 (#1147) * Update documentation for oneTBB 2021.10.0 Signed-off-by: Olga Malysheva --- RELEASE_NOTES.md | 32 ++-- doc/GSG/next_steps.rst | 31 ++++ doc/main/_templates/layout.html | 9 +- doc/main/tbb_userguide/std_invoke.rst | 217 ++++++++++++++++++++++++++ doc/main/tbb_userguide/title.rst | 1 + 5 files changed, 266 insertions(+), 24 deletions(-) create mode 100644 doc/main/tbb_userguide/std_invoke.rst diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md index b3e979a8e9..57258416fe 100644 --- a/RELEASE_NOTES.md +++ b/RELEASE_NOTES.md @@ -21,29 +21,23 @@ This document contains changes of oneTBB compared to the last release. - [New Features](#new-features) - [Known Limitations](#known-limitations) - [Fixed Issues](#fixed-issues) -- [Open-source Contributions Integrated](#open-source-contributions-integrated) ## :tada: New Features -- Hybrid CPU support is now a fully supported feature. +- With C++17 or later, parallel algorithms and Flow Graph nodes accept pointers to member functions and member objects as user-provided callables. +- Added missing member functions, such as assignment operators and the swap function, to the ``concurrent_queue`` and ``concurrent_bounded_queue`` containers. ## :rotating_light: Known Limitations -- A static assert will cause compilation failures in oneTBB headers when compiling with clang 12.0.0 or newer if using the LLVM standard library with -ffreestanding and C++11/14 compiler options. -- An application using Parallel STL algorithms in libstdc++ versions 9 and 10 may fail to compile due to incompatible interface changes between earlier versions of Threading Building Blocks (TBB) and oneAPI Threading Building Blocks (oneTBB). 
Disable support for Parallel STL algorithms by defining PSTL_USE_PARALLEL_POLICIES (in libstdc++ 9) or _GLIBCXX_USE_TBB_PAR_BACKEND (in libstdc++ 10) macro to zero before inclusion of the first standard header file in each translation unit. -- On Linux* OS, if oneAPI Threading Building Blocks (oneTBB) or Threading Building Blocks (TBB) are installed in a system folder like /usr/lib64, the application may fail to link due to the order in which the linker searches for libraries. Use the -L linker option to specify the correct location of oneTBB library. This issue does not affect the program execution. -- The oneapi::tbb::info namespace interfaces might unexpectedly change the process affinity mask on Windows* OS systems (see https://github.com/open-mpi/hwloc/issues/366 for details) when using hwloc version lower than 2.5. -- Using a hwloc version other than 1.11, 2.0, or 2.5 may cause an undefined behavior on Windows OS. See https://github.com/open-mpi/hwloc/issues/477 for details. -- The NUMA topology may be detected incorrectly on Windows OS machines where the number of NUMA node threads exceeds the size of 1 processor group. -- On Windows OS on ARM64*, when compiling an application using oneTBB with the Microsoft* Compiler, the compiler issues a warning C4324 that a structure was padded due to the alignment specifier. Consider suppressing the warning by specifying /wd4324 to the compiler command line. -- oneTBB does not support fork(), to work-around the issue, consider using task_scheduler_handle to join oneTBB worker threads before using fork(). +- A static assert will cause compilation failures in oneTBB headers when compiling with clang 12.0.0 or newer if using the LLVM standard library with ``-ffreestanding`` and C++11/14 compiler options. 
+- An application using Parallel STL algorithms in libstdc++ versions 9 and 10 may fail to compile due to incompatible interface changes between earlier versions of Threading Building Blocks (TBB) and oneAPI Threading Building Blocks (oneTBB). Disable support for Parallel STL algorithms by defining the ``PSTL_USE_PARALLEL_POLICIES`` (in libstdc++ 9) or ``_GLIBCXX_USE_TBB_PAR_BACKEND`` (in libstdc++ 10) macro as zero before inclusion of the first standard header file in each translation unit. +- On Linux* OS, if oneAPI Threading Building Blocks (oneTBB) or Threading Building Blocks (TBB) are installed in a system folder like ``/usr/lib64``, the application may fail to link due to the order in which the linker searches for libraries. Use the ``-L`` linker option to specify the correct location of the oneTBB library. This issue does not affect the program execution. +- The ``oneapi::tbb::info`` namespace interfaces might unexpectedly change the process affinity mask on Windows* OS systems (see https://github.com/open-mpi/hwloc/issues/366 for details) when using a hwloc* version lower than 2.5. +- Using a hwloc* version other than 1.11, 2.0, or 2.5 may cause undefined behavior on Windows* OS. See https://github.com/open-mpi/hwloc/issues/477 for details. +- The NUMA* topology may be detected incorrectly on Windows* OS machines where the number of NUMA* node threads exceeds the size of one processor group. +- On Windows* OS on ARM64*, when compiling an application using oneTBB with the Microsoft* Compiler, the compiler issues a warning C4324 that a structure was padded due to the alignment specifier. Consider suppressing the warning by specifying ``/wd4324`` on the compiler command line. +- oneTBB does not support ``fork()``. To work around the issue, consider using ``task_scheduler_handle`` to join oneTBB worker threads before using ``fork()``. 
- C++ exception handling mechanism on Windows* OS on ARM64* might corrupt memory if an exception is thrown from any oneTBB parallel algorithm (see Windows* OS on ARM64* compiler issue: https://developercommunity.visualstudio.com/t/ARM64-incorrect-stack-unwinding-for-alig/1544293). ## :hammer: Fixed Issues -- Improved robustness of thread creation algorithm on Linux* OS. -- Enabled full support of Thread Sanitizer on macOS* -- Fixed the issue with destructor calls for uninitialized objects in oneapi::tbb::parallel_for_each algorithm (GitHub* #691) -- Fixed the issue with tbb::concurrent_lru_cache when items history capacity is zero (GitHub* #265) -- Fixed compilation issues on modern GCC* versions - -## :octocat: Open-source Contributions Integrated -- Fixed the issue reported by the Address Sanitizer. Contributed by Rui Ueyama (https://github.com/oneapi-src/oneTBB/pull/959). -- Fixed the input_type alias exposed by flow_graph::join_node. Contributed by Deepan (https://github.com/oneapi-src/oneTBB/pull/868). +- Fixed the hang in the reserve method of concurrent unordered containers ([GitHub* #1056](http://github.com/oneapi-src/oneTBB/issues/1056)). +- Fixed the C++20 three-way comparison feature detection ([GitHub* #1093](http://github.com/oneapi-src/oneTBB/issues/1093)). +- Fixed oneTBB integration with CMake* in the Conda* environment. diff --git a/doc/GSG/next_steps.rst b/doc/GSG/next_steps.rst index 42e7d5c861..4974265d21 100644 --- a/doc/GSG/next_steps.rst +++ b/doc/GSG/next_steps.rst @@ -118,3 +118,34 @@ Build and Run a Sample #. If oneTBB is configured correctly, the output displays ``Sum: 5050``. +Hybrid CPU and NUMA Support +**************************** + +If you need NUMA/Hybrid CPU support in oneTBB, you need to make sure that HWLOC* is installed on your system. + +HWLOC* (Hardware Locality) is a library that provides a portable abstraction of the hierarchical topology of modern architectures (NUMA, hybrid CPU systems, etc). 
oneTBB relies on HWLOC* to identify the underlying topology of the system to optimize thread scheduling and memory allocation. + +Without HWLOC*, oneTBB may not take advantage of NUMA/Hybrid CPU support. Therefore, it's important to make sure that HWLOC* is installed before using oneTBB on such systems. + +Check HWLOC* on the System +^^^^^^^^^^^^^^^^^^^^^^^^^^^ +To check if HWLOC* is already installed on your system, run ``hwloc-ls``: + +* For Linux* OS, in the command line. +* For Windows* OS, in the command prompt. + +If HWLOC* is installed, the command displays information about the hardware topology of your system. If it is not installed, you receive an error message saying that the command ``hwloc-ls`` could not be found. + +.. note:: For Hybrid CPU support, make sure that HWLOC* is version 2.5 or higher. For NUMA support, install HWLOC* version 1.11 or higher. + +Install HWLOC* +^^^^^^^^^^^^^^ + +To install HWLOC*, visit the official Portable Hardware Locality website (https://www-lb.open-mpi.org/projects/hwloc/). + +* For Windows* OS, binaries are available for download. +* For Linux* OS, only the source code is provided, so you need to build the binaries yourself. + +On Linux* OS, HWLOC* can also be installed with package managers, such as APT*, YUM*, etc. For example, with APT*, run ``sudo apt install hwloc``. + +.. note:: For Hybrid CPU support, make sure that HWLOC* is version 2.5 or higher. For NUMA support, install HWLOC* version 1.11 or higher. 
diff --git a/doc/main/_templates/layout.html b/doc/main/_templates/layout.html index f044be1700..eb4d31dd81 100644 --- a/doc/main/_templates/layout.html +++ b/doc/main/_templates/layout.html @@ -6,11 +6,10 @@ var wapLocalCode = 'us-en'; // Dynamically set per localized site, see mapping table for values var wapSection = "oneapi-tbb"; // WAP team will give you a unique section for your site // Load TMS - (function () { - var url = 'https://www.intel.com/content/dam/www/global/wap/tms-loader.js'; // WAP file URL - var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = url; - var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); + (function () { + var url = 'https://www.intel.com/content/dam/www/global/wap/tms-loader.js'; // WAP file URL + var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = url; + var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })(); - {% endblock %} diff --git a/doc/main/tbb_userguide/std_invoke.rst b/doc/main/tbb_userguide/std_invoke.rst new file mode 100644 index 0000000000..17ee7add99 --- /dev/null +++ b/doc/main/tbb_userguide/std_invoke.rst @@ -0,0 +1,217 @@ +.. _std_invoke: + +Invoke a Callable Object +========================== + +Starting from C++17, the requirements for callable objects passed to algorithms or Flow Graph nodes are relaxed, which allows additional types of bodies. +Previously, the body of the algorithm or Flow Graph node needed to be a Function Object (see `C++ Standard Function Object `_) and provide an +``operator()`` that accepts input parameters. 
+ +Now the body needs to meet the more relaxed requirements of being Callable (see `C++ Standard Callable `_), which covers three types of objects: + +* **Function Objects that provide operator()(arg1, arg2, ...)**, which accepts the input parameters +* **Pointers to member functions** that you can use as the body of the algorithm or the Flow Graph node +* **Pointers to member objects** that you can use as the body of the algorithm or parallel construct + +You can use such callables not only in a Flow Graph but also in algorithms. See the example below: + +.. code:: + + // The class models oneTBB Range + class StrideRange { + public: + StrideRange(int* s, std::size_t sz, std::size_t str) + : start(s), size(sz), stride(str) {} + + // A copy constructor + StrideRange(const StrideRange&) = default; + + // A splitting constructor: takes the first half of the range + // and keeps the stride of the original range + StrideRange(StrideRange& other, oneapi::tbb::split) + : start(other.start), size(other.size / 2), stride(other.stride) + { + other.size -= size; + other.start += size; + } + + ~StrideRange() = default; + + // Indicate if the range is empty + bool empty() const { + return size == 0; + } + + // Indicate if the range can be divided + bool is_divisible() const { + return size >= stride; + } + + void iterate() const { + for (std::size_t i = 0; i < size; i += stride) { + // Perform an action for each element of the range, + // implement the code based on your requirements + } + } + + private: + int* start; + std::size_t size; + std::size_t stride; + }; + +Where: + +* The ``StrideRange`` class models a oneTBB range that is iterated with the stride specified at its initial construction. +* The ``stride`` value is stored in a private field within the range. Therefore, the class provides the member function ``iterate() const`` that implements a loop with the specified stride. + +``range.iterate()`` +******************* + +Before C++17, to utilize a range in a parallel algorithm, such as ``parallel_for``, it was required to provide a ``Function Object`` as the algorithm's body. 
This Function Object defined the operations to be executed on each iteration of the range: + +.. code:: + + int main() { + std::size_t array_size = 1000; + + int* array_to_iterate = new int[array_size]; + + StrideRange range(array_to_iterate, array_size, /* stride = */ 2); + + // Define a lambda function as the body of the parallel_for loop + auto pfor_body = [] (const StrideRange& range) { + range.iterate(); + }; + + // Perform parallel iteration + oneapi::tbb::parallel_for(range, pfor_body); + + delete[] array_to_iterate; + } + +This approach required an additional lambda function, ``pfor_body``, whose only job was to call ``range.iterate()``. + +Now with C++17, you can directly use a pointer to the member function ``StrideRange::iterate`` as the body of the algorithm: + +.. code:: + + int main() { + std::size_t array_size = 1000; + + int* array_to_iterate = new int[array_size]; + + // Create a range over the array elements with the specified stride + StrideRange range(array_to_iterate, array_size, /* stride = */ 2); + + // Parallelize the iteration over the range object + oneapi::tbb::parallel_for(range, &StrideRange::iterate); + + delete[] array_to_iterate; + } + +``std::invoke`` +**************** + +``std::invoke`` is a function template that provides a uniform syntax for invoking different types of callable objects with a set of arguments. + +The oneTBB implementation uses the C++ standard function ``std::invoke`` to execute the body. The call ``std::invoke(&StrideRange::iterate, range)`` is equivalent to ``range.iterate()``. +Therefore, any callable object, such as a function object or a pointer to a member, can be invoked with the provided arguments. + +.. tip:: Refer to `C++ Standard `_ to learn more about ``std::invoke``. + +Example +^^^^^^^^ + +Consider a specific scenario with ``function_node`` within a Flow Graph. + +In the example below, a ``function_node`` takes an object as an input, reads a member object of that input, and passes it to the next node in the graph: + +.. 
code:: + + struct Object { + int number; + }; + + int main() { + using namespace oneapi::tbb::flow; + + // Lambda function to read the member object of the input Object + auto number_reader = [] (const Object& obj) { + return obj.number; + }; + + // Lambda function to process the received integer + auto number_processor = [] (int i) { /* processing integer */ }; + + graph g; + + // Function node that takes an Object as input and produces an integer + function_node func1(g, unlimited, number_reader); + + // Function node that takes an integer as input and processes it + function_node func2(g, unlimited, number_processor); + + // Connect the function nodes + make_edge(func1, func2); + + // Provide input to the graph + func1.try_put(Object{1}); + + // Wait for the graph to complete + g.wait_for_all(); + } + + +Before C++17, the ``function_node`` in the Flow Graph required the body to be a Function Object. A lambda function was required to extract the number from the Object. + +With C++17, you can pass a pointer to the member ``number`` directly as the body, and oneTBB invokes it through ``std::invoke``. + +You can update the previous example as follows: + +.. 
code:: + + struct Object { + int number; + }; + + int main() { + using namespace oneapi::tbb::flow; + + // The processing logic for the received integer + auto number_processor = [] (int i) { /* processing integer */ }; + + // Create a graph object g to hold the flow graph + graph g; + + // Use a pointer to the member object Object::number as the body + function_node func1(g, unlimited, &Object::number); + + // Use the number_processor lambda function as the body + function_node func2(g, unlimited, number_processor); + + // Connect the function nodes + make_edge(func1, func2); + + // Provide input to the graph + func1.try_put(Object{1}); + + // Wait for the graph to complete + g.wait_for_all(); + } + +Find More +********* + +The following APIs support Callable objects as bodies: + +* `parallel_for `_ +* `parallel_reduce `_ +* `parallel_deterministic_reduce `_ +* `parallel_for_each `_ +* `parallel_scan `_ +* `parallel_pipeline `_ +* `function_node `_ +* `multifunction_node `_ +* `async_node `_ +* `sequencer_node `_ +* `join_node with key_matching policy `_ diff --git a/doc/main/tbb_userguide/title.rst b/doc/main/tbb_userguide/title.rst index b51c3294b8..c073acfc8c 100644 --- a/doc/main/tbb_userguide/title.rst +++ b/doc/main/tbb_userguide/title.rst @@ -23,6 +23,7 @@ ../tbb_userguide/design_patterns/Design_Patterns ../tbb_userguide/Migration_Guide ../tbb_userguide/Constraints + ../tbb_userguide/std_invoke ../tbb_userguide/appendix_A ../tbb_userguide/appendix_B ../tbb_userguide/References
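
The ``std_invoke.rst`` page added by this patch states that a body is executed through ``std::invoke``, so a function object, a pointer to a member function, and a pointer to a member object can all be called the same way. A minimal sketch of that equivalence in standard C++17 alone, without oneTBB, might look as follows. The names ``Counter``, ``call_body``, and the ``via_*`` helpers are hypothetical and exist only for this illustration; they are not part of oneTBB or of the patch.

```cpp
#include <cassert>      // assert, used by self_check()
#include <functional>   // std::invoke
#include <utility>      // std::forward

// Hypothetical type used only for this illustration; not part of oneTBB.
struct Counter {
    int value;
    int doubled() const { return value * 2; }
};

// Hypothetical helper: calls any Callable with one argument by
// forwarding both to std::invoke (assumption: a C++17 compiler).
template <typename Body, typename Arg>
decltype(auto) call_body(Body&& body, Arg&& arg) {
    return std::invoke(std::forward<Body>(body), std::forward<Arg>(arg));
}

// Function object (a lambda) as the body.
inline int via_function_object(const Counter& c) {
    auto body = [](const Counter& x) { return x.value + 1; };
    return call_body(body, c);
}

// Pointer to a member function as the body: same as c.doubled().
inline int via_member_function(const Counter& c) {
    return call_body(&Counter::doubled, c);
}

// Pointer to a member object as the body: same as c.value.
inline int via_member_object(const Counter& c) {
    return call_body(&Counter::value, c);
}

// Quick self-check of the two equivalences noted in the comments above.
inline void self_check() {
    Counter c{21};
    assert(via_member_function(c) == c.doubled());
    assert(via_member_object(c) == c.value);
}
```

Calling ``self_check()`` from any ``main`` exercises all three paths. The point of the sketch is only that a single ``std::invoke`` call site covers every Callable kind, which is why no wrapper lambda is needed around ``&StrideRange::iterate`` or ``&Object::number`` in the examples above.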