-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better rvalues support for parallel_reduce algorithm #1307
Conversation
Signed-off-by: Konstantin Boyarinov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see someone looking at this again! A few suggestions.
include/oneapi/tbb/parallel_reduce.h
Outdated
body(range); | ||
body.join(rhs); | ||
body.join(std::move(rhs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is by forwarding reference and so should be a forward not a move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this part of the reduce API was not changed at all, it was a mistake to change this concept. This line was removed.
parallel_reduce_reduction
and parallel_reduce_combine
concepts were changed instead.
include/oneapi/tbb/parallel_reduce.h
Outdated
Value&& result() noexcept { | ||
return std::move(my_value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be desirable to &&
qualify this member function? As in
[[nodiscard]] Value&& result() noexcept && {
return std::move(my_value);
}
?
That would force you to say return std::move(body).result();
elsewhere, which may be clearer. (In general body.result()
isn't obvious that it is extracting the result from body
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought on this question was removing the &&
qualifier because as an internal structure, the body would be only move-from the result. But now I guess having this qualifier would be more clear from the move semantics perspective. Thanks for catching that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. An alternative is to give it a different name so it’s clearer that it moves out without the std::move
, but just qualifying it like this feels reasonable.
include/oneapi/tbb/parallel_reduce.h
Outdated
@@ -514,7 +515,7 @@ Value parallel_reduce( const Range& range, const Value& identity, const RealBody | |||
lambda_reduce_body<Range,Value,RealBody,Reduction> body(identity, real_body, reduction); | |||
start_reduce<Range,lambda_reduce_body<Range,Value,RealBody,Reduction>,const __TBB_DEFAULT_PARTITIONER> | |||
::run(range, body, __TBB_DEFAULT_PARTITIONER() ); | |||
return body.result(); | |||
return std::move(body.result()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this std::move
does nothing since body.result()
is an rvalue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to std::move(body).result()
, as discussed in the previous thread.
@@ -56,6 +57,37 @@ struct ReduceBody { | |||
} | |||
}; | |||
|
|||
template <class T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//!
explaining what this struct is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
auto join_body = [](vector_wrapper<int>&& x, vector_wrapper<int>&& y) { | ||
vector_wrapper<int> new_vector = std::move(x); | ||
new_vector.reserve(new_vector.size() + y.size()); | ||
new_vector.insert(new_vector.end(), std::make_move_iterator(y.begin()), std::make_move_iterator(y.end())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A move iterator over a vector of int
s doesn't buy much. Consider for this example making
struct MoveOnlyInt {
int x;
MoveOnlyInt() = default;
explicit MoveOnlyInt(int x) : x{x} {}
MoveOnlyInt(const MoveOnlyInt&) = delete;
MoveOnlyInt(MoveOnlyInt&& other) : x{std::exchange(other.x, 0)} {}
MoveOnlyInt& operator=(const MoveOnlyInt&) = delete;
MoveOnlyInt& operator=(MoveOnlyInt&& other) {
this->x = std::exchange(other.x, 0);
return *this;
}
};
and use vectors of MoveOnlyInt
so the move iterators are required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also consider a test case using std::list
or std::set
that moves the nodes across, like
auto join_body = [](std::list<int>&& x, std::list<int>&& y) {
x.splice(x.end(), std::move(y));
return std::move(x);
}
or
auto join_body = [](std::set<int> x, std::set<int>&& y) {
x.merge(std::move(y));
return std::move(x);
}
(These could use MoveOnlyInt
too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to do a test with a type that isn’t default constructible and/or isn’t movable or copyable in a node-based container — basically all the weird corner cases of irregular types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we still cannot use move-only types for parallel_reduce since we need to copy the identity into each reduction "leaf". And using std::vector<MoveOnlyInt>
or list would cause compilation errors even if the identity element is empty.
I have changed the value type into the integer wrapper that is copyable, but the actual copy constructor should never be called.
Regarding adding new test cases, I think it makes no sense to test both "vector of vectors" and "vector of lists" use-cases. Splicing the lists while doing parallel_reduce is more interesting IMHO, so I left only the vector of lists test.
Signed-off-by: Konstantin Boyarinov <[email protected]>
Signed-off-by: Konstantin Boyarinov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I'll leave it to tbb maintainers to approve.
NeverCopyWrapper(const NeverCopyWrapper&) { | ||
REQUIRE_MESSAGE(false, "Copy constructor of NeverCopyWrapper should never be called"); | ||
} | ||
|
||
NeverCopyWrapper& operator=(const NeverCopyWrapper&) { | ||
REQUIRE_MESSAGE(false, "Copy assignment of NeverCopyWrapper should never be called"); | ||
return *this; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to =delete;
these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot delete them because if we provide std::vector<NeverCopyWrapper>
as an identity to parallel_reduce
, it should be copied into each reduce leaf. And even if the empty identity would be copied, it requires value_type to be copy constructible and causes compilation issues otherwise.
Because of this, I decided to make it copiable to fix the compilation but deny to actually call the ctor and assignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have investigated one more time, and since we don't use std::vector
directly, only using the wrapper, we can control that actual copy constructor would not be called by the container. So I removed NeverCopyWrapper
and changed it to be MoveOnlyWrapper
auto operator()(Args&&... args) const -> decltype(oneapi::tbb::parallel_deterministic_reduce(std::forward<Args>(args)...)) { | ||
return oneapi::tbb::parallel_deterministic_reduce(std::forward<Args>(args)...); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the trailing return type helpful versus just leave it to auto
(or decltype(auto)
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are forced to use trailing return type, because both auto
with automatic deduction and decltype(auto)
are C++14 features and we still need to support C++11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. That's annoying.
LGTM! |
Description
Adding the optimization for reduction of rvalues using
parallel_reduce
andparallel_deterministic_reduce
.Based on #171 , #169 and #1299.
This patch is intended to optimize operating with Value class in parallel reduce algorithm (lambda version). Example from #1299
The spec forces the user to use inefficient copying instead of "moving" even if the input range preservation is not important for the user. "Body" version of parallel reduce allows to drop unnecessary copying:
The main goal of this patch is to allow rvalues optimization for lambda version of reduce.
This patch is NOT intended to allow reduction of move-only objects (such as
std::unique_ptr
s) since we still need to copy the identity value into each parallel_reduce leaf.Specification change is also needed for this PR:
ParallelReduceFunc
named requirement to allowValue operator()(const Range& range, Value&& value) const
ParallelReduceReduction
named requirement to allowValue operator()(Value&& x, Value&& y) const
.This PR also contains simple test that I guess should be extended to better cover all overloads and algorithms.
Fixes # - issue number(s) if exists
Type of change
Choose one or multiple, leave empty if none of the other choices apply
Add a respective label(s) to PR if you have permissions
Tests
Documentation
Breaks backward compatibility
Notify the following users
@BenFrantzDale
@rfschtkt
Feel free to participate in review and discussions for this patch:)
Other information