
Tentative rvalue reference support for parallel_reduce #171

Conversation

@rfschtkt rfschtkt commented Aug 7, 2019

My own take on rvalue reference support for parallel_reduce (see comments in pull request #169), extended with support for semigroups, i.e., without using an identity element.

For a while, shameless history rewriting was going on here, then on 2019-09-09 I squashed everything, and now I intend to only do straight commits... or that was the intention before I rebased into thematic commits for easier review.

@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch from 6b3f665 to 0dc4b53 Compare August 12, 2019 01:59
@rfschtkt rfschtkt mentioned this pull request Aug 17, 2019
@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch 5 times, most recently from fd6be4e to c501689 Compare August 30, 2019 04:21
@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch 6 times, most recently from 204a8e6 to 302528b Compare September 7, 2019 05:23
@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch 4 times, most recently from 398dfa7 to 2f283a1 Compare September 9, 2019 08:21
@rfschtkt rfschtkt marked this pull request as ready for review September 9, 2019 08:31
@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch 4 times, most recently from 70798a5 to bad586d Compare September 17, 2019 19:44

@hjmjohnson hjmjohnson left a comment

Adding more expressive comments in the commit message describing the rationale for incorporating the changes would facilitate acceptance of the code changes.

The benefits of the proposed changes may be obvious to you, but they may require the reviewer to do extensive code hunting to understand the benefit of the changes.

Clearly stating the benefits of accepting the proposed changes makes reviewing tremendously easier.

@rfschtkt
Author

rfschtkt commented Oct 3, 2019

Hmm, I would hope that the code and the comments there speak for themselves, without depending on any rationale hidden in a history that may not even remain accessible... Does it help if I characterise the current 9 commits as, respectively: trivial, frivolous (x4), preparation, material (x2), cleanup? Otherwise, please give an example of what you would like to see instead.

(Added) I just force-pushed the same commits, but reworded, and with "Frivolous: alignment" first split off from "Material: rvalue support", then relocated next to the other frivolous commits, and finally edited with a variable renamed and a comment removed (I verified that these were also the only changes to the final result).

(Added) Somebody stop me! Now with "Frivolous: whitespace".

(Added) And function->operation.

(Added) Applied final cleanup to previous commits instead.

(Added) Corrected "body" -> "function object", renamed Accumulator*.

(Added) Sorry, I did not maintain this, there are more changes after this. Maybe I should have tried to keep the history in git (if that is easily done and not too confusing)?

@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch 5 times, most recently from cd69591 to 7c58e99 Compare October 8, 2019 07:11
@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch from 7c58e99 to f58c370 Compare October 13, 2019 07:43
@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch from c6f4417 to 3f37419 Compare April 15, 2020 04:57
@rfschtkt
Author

rfschtkt commented Apr 15, 2020

Some (trivial) tinkering with "Ancillary: Foo->Minimal" (only): MinimalBody member variables now always initialised, instead of assigned either inside or outside the constructor.

Still wondering where std::optional, backported or not, causes additional move-construction to happen... but not enough to try to find out.

@BenFrantzDale

What's needed to push this over the line?

@rfschtkt
Author

rfschtkt commented Jun 5, 2020

I am resigned to the distinct possibility that that might never happen, yet still available to assist in any serious attempt to integrate some of the ideas I had fun exploring.

@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch from f662b2f to 75808c3 Compare July 2, 2020 07:49
@rfschtkt
Author

rfschtkt commented Jul 2, 2020

Oops! I did it again... Moved a comment from the "Experimental: std::optional" commit to the "Material: semigroup support" commit where it belongs, then squashed the previous "Experimental: std::optional" commits, with an improved comment and some asserts added.

BTW, I just noticed that my history rewriting seems to have discarded @hjmjohnson's review comments (sorry!)... maybe try again?

@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch from 75808c3 to f7c1ff6 Compare July 3, 2020 23:21
@rfschtkt
Author

rfschtkt commented Jul 3, 2020

Moved some inline comments from "Experimental: std::optional" to "Material: rvalue support" where they belong, and minimally changed some others. But the outcome was negative anyway, so this commit can be disregarded.

@BenFrantzDale

It seems like this adds a lot, going beyond the scope of adding rvalue support. Is there a reason not to do the simple change of just having `lambda_reduce_body`'s result be `Value& result()` and having its join be `my_value = my_reduction(tbb::internal::move(my_value), tbb::internal::move(rhs.my_value));`?

I appreciate the desire for a more elegant solution that avoids reconstructing default values, but that feels like something callers could do themselves if they want (using `std::unique_ptr<T>(nullptr)` or `std::optional<T>(std::nullopt)` as the identity, for example).
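For illustration, the "simple change" suggested above can be sketched outside of TBB. This is a minimal stand-in, not TBB's actual internals; the name `MoveJoinBody` is hypothetical, standing in for `lambda_reduce_body`:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for TBB's internal lambda_reduce_body, showing the
// suggested move-based join: result() returns Value& so the caller can move
// the final value out, and join() moves both operands into the reduction.
template <typename Value, typename Reduction>
struct MoveJoinBody {
    Reduction my_reduction;
    Value my_value;

    Value& result() { return my_value; }

    void join(MoveJoinBody& rhs) {
        // Both bodies are dead after the join, so moving from them is safe.
        my_value = my_reduction(std::move(my_value), std::move(rhs.my_value));
    }
};

// Tiny demonstration: joining two string-valued bodies concatenates without
// copying either buffer.
std::string demo_join() {
    auto concat = [](std::string a, std::string b) { a += b; return a; };
    MoveJoinBody<std::string, decltype(concat)> left{concat, "foo"};
    MoveJoinBody<std::string, decltype(concat)> right{concat, "bar"};
    left.join(right);
    return std::move(left.result());  // move the final value out
}
```

With a move-only or expensive-to-copy `Value`, the same pattern avoids the copies the current copy-based join would make.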

@BenFrantzDale

PS: There's no Issue to go with this. Should I create one?

@alexey-katranov
Contributor

The main issue is that we need to rebase the patch onto onetbb_2021, because we are not planning to update TBB 2020. Unfortunately, the code base around parallel_reduce has changed, and I expect a lot of merge conflicts. The good news is that we can rely on C++11 and do not need to provide compatibility with C++03.

Sorry to request additional work, but can you reconsider (or at least rebase) the approach on the new code base?

@rfschtkt
Author

You certainly don't need all the ideas I had fun exploring here (although you also shouldn't forget to move() out of result(), as my changes to the tests would have demonstrated). The semigroup variant (as I've called it here) is clearly differentiated in some other languages, e.g., foldl1 vs. foldl in Haskell, or reduceLeft() vs. foldLeft() in Scala, and it exists for a reason: it is always an imposition on the user to have to emulate one in terms of the other, in either direction. That said, it now seems better to me not to use overloading for it here, because of the potential confusion with the use of default values in C++17 reduce(). BTW, git has no idea how to even begin to rebase this...

@rfschtkt
Author

rfschtkt commented Sep 14, 2020

I'm not entirely convinced yet that git best practices were top of mind when branch onetbb_2021 was created... There are also lots of differences between the relevant corresponding files, so perhaps it would be best to first make firm decisions on what will be used, and to then reapply those changes manually to the new branch.

BTW, I'm curious how std::reduce() fares against tbb::parallel_reduce(), considering that the former is entirely dependent on compiler smarts to optimise range operations. On the other hand, it might be nice to also have a default range operation in TBB...

@alexey-katranov
Contributor

The onetbb_2021 branch is not a real branch. It is a completely new code base: parts were copied from TBB 2020, but most of the code was reconsidered.

With ranges, do you mean C++20 ranges support?

@rfschtkt
Author

rfschtkt commented Sep 14, 2020

What I meant was that TBB uses two separate, redundant operations: one for reducing over a range of elements, and another for joining intermediate results. The former could most easily be implemented in terms of the latter, computationally speaking, but taking it as a parameter also offers an opportunity for explicit vectorisation or other optimisation, so I was just wondering how often this is significant and to what extent.

(2020-09-15 Added) To avoid misunderstanding, I think I'd rather be forced to always provide a range operation than not being able to customise one, but it might be even nicer to have it be an optional parameter. I think I gave up on the idea earlier because it probably requires C++11 (or even later?) and because there were already so many overloads, but if C++11 can now be assumed and the semigroup stuff could be handled separately by tbb::parallel_reduce1()...
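To illustrate the point above (this is a sketch, not TBB's actual API, and `default_range_op` is a hypothetical name): a range operation synthesised purely from the join operation is computationally equivalent, but gives the compiler no opportunity beyond the plain loop, whereas a user-supplied one could use std::accumulate or explicit SIMD over the same range.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical default "range operation" that a parallel reduce could
// synthesise from the "join operation" alone: fold each element of the
// subrange into the running value, element by element.
template <typename It, typename T, typename Join>
T default_range_op(It first, It last, T init, Join join) {
    for (; first != last; ++first)
        init = join(std::move(init), *first);
    return init;
}
```

Making something like this the default, while still accepting a custom range operation, is one way the range-operation parameter could become optional.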

@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch from f7c1ff6 to 3edb858 Compare September 16, 2020 10:01
@rfschtkt
Author

rfschtkt commented Sep 16, 2020

Semigroup support now with 1 suffix, and is_not_partitioner_t no longer needed for SFINAE-based disambiguation. Maybe next I'll try to make the range operation parameter optional... :-)

(2020-09-18 Added) `std::accumulate(std::next(r.begin()), r.end(), *r.begin())`

@rfschtkt rfschtkt force-pushed the parallel_reduce_with_rvalue_reference branch from 3edb858 to e5b9d50 Compare September 17, 2020 23:30
@rfschtkt
Author

rfschtkt commented Sep 18, 2020

I don't think I'll be playing around with making range operations optional after all. The later operation-based functions entirely replace the original Body-based ones, but to provide the same functionality without range operations (in those cases where only performance-related things like explicit vectorisation can be omitted), there would also have to be tbb::parallel_transform_reduce() functions. That's a lot of dead code to write just for fun amid other options.

I'll look into what's required to rebase this, but if it takes significant effort I would prefer the editorialising to occur prior to the effort with a firm commitment to integrate the result...

(2020-09-20 Added, then corrected because somehow an intermediate version was saved) It seems that it would take significant effort, so I'm not going to rebase "at risk" (see COVID-19 vaccine development efforts)... Did anybody else notice the problem with how the aggregation type is (trivially) deduced from the identity argument (unless it is explicitly specified as a template argument, which is only possible by also specifying the type of the earlier Range parameter, which is quite cumbersome and therefore unlikely)? That seems quite the "design bug", apparently inherited from the accidental circumstance that std::accumulate() was introduced before C++11's std::result_of (later std::invoke_result). In turn, for the semigroup versions I inferred the aggregation type from the result of the range operation. Couldn't we do better now (by using the type of the result of the binary/join operation), or do we have to live with it forever because fixing it would change behaviour that imprudently relies on something that could not possibly be regarded as appropriate? API design is tricky...

(Added, then corrected a mistake) Alas, std::reduce() and std::transform_reduce() haven't fixed the problem, either. Why do these unforced errors keep happening? If new programs have a beta-testing phase as a matter of course, why not language/library design? My favourite example is if you have sets A and B, and construct AB by initialising as A and adding all elements from B and BA by initialising as B and adding all elements from A, and then you ask whether AB == BA...
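The deduction problem described above is easy to reproduce with plain std::accumulate, where the accumulator type is taken from the identity argument rather than from the element type or the operation's result:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// std::accumulate's accumulator has the type of the identity argument.
// With an int identity over doubles, every partial sum is truncated
// back to int, so the fractional parts are silently discarded.
double sum_with_int_identity(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0);    // accumulator is int
}

double sum_with_double_identity(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);  // accumulator is double
}
```

Summing four values of 0.5, the int-identity version yields 0 while the double-identity version yields 2 — exactly the kind of silent behaviour change that makes "fixing" the deduction rule later so risky.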

@kboyarinov
Contributor

Since #1307 was merged to master, this PR can be closed. Thanks to everyone for the productive discussion.
Feel free to reopen the PR or open a separate PR/issue in case of any questions on the current implementation.

@kboyarinov kboyarinov closed this Feb 14, 2024