Use system functions to allocate aligned memory #671

Lastique · 2021-11-25T17:16:20Z

Description

This uses the standard allocator for allocating and freeing memory with higher alignment requirements instead of the hand-rolled implementation.
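For readers unfamiliar with the system functions involved, here is a minimal sketch of what delegating aligned allocation to the platform can look like. It is not the code from this patch: the helper names `sketch_aligned_allocate` and `sketch_aligned_deallocate` are hypothetical, the `_WIN32` check stands in for whatever configuration macros the library actually uses, and only `posix_memalign` and `_aligned_malloc` are shown.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdlib.h>   // posix_memalign on POSIX systems
#if defined(_WIN32)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#endif

// Hypothetical helpers for illustration; not taken from src/tbb/allocator.cpp.
// `alignment` is assumed to be a power of two.
inline void* sketch_aligned_allocate(std::size_t bytes, std::size_t alignment) {
#if defined(_WIN32)
    return _aligned_malloc(bytes, alignment);
#else
    // posix_memalign also requires the alignment to be a multiple of sizeof(void*).
    if (alignment < sizeof(void*)) alignment = sizeof(void*);
    void* p = nullptr;
    return posix_memalign(&p, alignment, bytes) == 0 ? p : nullptr;
#endif
}

inline void sketch_aligned_deallocate(void* p) {
#if defined(_WIN32)
    _aligned_free(p);   // memory from _aligned_malloc must not go to free()
#else
    std::free(p);       // memory from posix_memalign is released with free()
#endif
}
```

With this shape, a call such as `sketch_aligned_allocate(100, 64)` returns a 64-byte-aligned block (or a null pointer on failure) without any manual pointer adjustment or bookkeeping on the caller's side.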

- git commit message contains appropriate signed-off-by string (see CONTRIBUTING.md for details)

Type of change

bug fix - change which fixes an issue
new feature - change which adds functionality
tests - change in tests
infrastructure - change in infrastructure and CI
documentation - documentation update
optimization

Tests

added - required for new features and for some bug fixes
not needed

Documentation

updated in # - add PR number
needs to be updated
not needed

Breaks backward compatibility

Yes
No
Unknown

Lastique · 2021-11-25T17:17:08Z

This patch originated from #326.

src/tbb/allocator.cpp

alexey-katranov

Overall, I am OK with your approach. However, could you consider a set of proposed changes that reduces the code slicing and keeps the macro-dependent code isolated inside the functions? What do you think? (I understand that it adds one more call to the chain, but I hope modern compilers can perform tail call optimization.)

src/tbb/allocator.cpp

Lastique · 2021-11-26T12:23:56Z

> Overall, I am OK with your approach. However, could you consider a set of proposed changes that reduces the code slicing and keeps the macro-dependent code isolated inside the functions? What do you think? (I understand that it adds one more call to the chain, but I hope modern compilers can perform tail call optimization.)

I'm quite certain the compiler won't optimize away the forwarding function, even if it amounts to a single jmp instruction, because it has to have a unique address. That's unnecessary overhead that I'd prefer to avoid. The resulting code doesn't improve much in terms of simplicity, IMHO, and I value runtime performance more than code simplicity anyway. However, if you insist, I can change the code as you suggest. Please confirm.
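The two shapes being weighed in this exchange can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: `deallocate_fn`, `direct_handler`, `forwarding_deallocate` and `forwarding_handler` are hypothetical names, and `_WIN32` stands in for the library's own configuration macro.

```cpp
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h>   // _aligned_free
#endif

// All names below are hypothetical and only illustrate the trade-off.
using deallocate_fn = void (*)(void*);

// Variant A: the platform check sits where the handler is initialized,
// so the pointer targets the system function directly (no extra hop).
#if defined(_WIN32)
static deallocate_fn direct_handler = &_aligned_free;
#else
static deallocate_fn direct_handler = &std::free;
#endif

// Variant B: the platform check is isolated inside one wrapper function.
// Because the wrapper's address is stored in a pointer, the wrapper must
// exist as a distinct function, even if its body compiles to one jump.
static void forwarding_deallocate(void* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    std::free(p);
#endif
}
static deallocate_fn forwarding_handler = &forwarding_deallocate;
```

Variant A avoids the extra hop but repeats the preprocessor condition at every place a handler is set; Variant B keeps the condition in one place at the cost of one additional call or jump per deallocation.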

alexey-katranov · 2021-11-26T13:06:42Z

> The resulting code doesn't improve much in terms of simplicity

By a rough estimate, I removed 6 conditional blocks and added only 2.

> I value runtime performance more than code simplicity anyway.

It seems we have slightly different views here. This code is not on the critical path in terms of performance, but any correctness issue must be avoided. Code complexity costs engineering time in debugging and maintenance, while this code is not critical for the performance of the whole library, and we could spend that time improving something else. As for the optimization, it seems possible only for __TBB_USE_MSVC_ALIGNED_MALLOC, while other platforms will have an extra jmp. So why not use a generic approach for all platforms until we face a performance issue caused by this inefficiency?

> However, if you insist, I can change the code as you suggest. Please confirm.

I do not like to insist and prefer to find consensus. In any case, a good balance consists of multiple opinions. Given that you contribute relatively often and your contributions are reasonable, your opinion is valuable.

Lastique · 2021-11-26T13:32:11Z

> The resulting code doesn't improve much in terms of simplicity

> By a rough estimate, I removed 6 conditional blocks and added only 2.

Yes, it is better, I agree, but I just don't find the change that much better. I mean, the original code is not hard to decipher, and all conditions are uniform and easy to locate if they need to be updated. In my personal code I would probably be fine with the extra preprocessor checks if it makes the code more optimal.

> I value runtime performance more than code simplicity anyway.

> It seems we have slightly different views here. This code is not on the critical path in terms of performance, but any correctness issue must be avoided. Code complexity costs engineering time in debugging and maintenance, while this code is not critical for the performance of the whole library, and we could spend that time improving something else. As for the optimization, it seems possible only for __TBB_USE_MSVC_ALIGNED_MALLOC, while other platforms will have an extra jmp. So why not use a generic approach for all platforms until we face a performance issue caused by this inefficiency?

Because MSVC is lucky enough for this optimization to be possible, and it seems wasteful to not take this opportunity.

> However, if you insist, I can change the code as you suggest. Please confirm.

> I do not like to insist and prefer to find consensus. In any case, a good balance consists of multiple opinions. Given that you contribute relatively often and your contributions are reasonable, your opinion is valuable.

The thing is, this project is not my personal code, and you, the maintainers, may have different priorities than I do, and this is totally fine. I'm just stating my position, and if you still think you prefer it the other way, I can change accordingly, and that is also fine. If I'm adamantly against something, you can be sure I will say so and refuse to change. :)

In fact, in this case I don't care much about MSVC, as my main target platform is Linux. I only added support for it because it was straightforward and easy to do. My main goal here is to get this optimization merged, primarily for Linux, and I don't want this PR to hang in limbo indefinitely or get bogged down in discussions about minor things like this. So I will change the code as you suggest, no problem.

Lastique · 2021-11-26T14:27:33Z

> Because MSVC is lucky enough for this optimization to be possible, and it seems wasteful to not take this opportunity.

Actually, it's not only MSVC - free is used directly on other platforms.

Lastique · 2021-11-26T14:37:35Z

> So I will change the code as you suggest, no problem.

Done.

alexey-katranov

Thank you for the contribution.

Lastique mentioned this pull request Nov 25, 2021

Allow to disable tbbmalloc (oneTBB) #326

Open

alexey-katranov previously approved these changes Nov 25, 2021

Lastique dismissed alexey-katranov’s stale review via ad56fad November 25, 2021 18:14

Lastique force-pushed the use_memalign branch from 1146885 to ad56fad Compare November 25, 2021 18:14

Lastique requested a review from alexey-katranov November 25, 2021 18:16

alexey-katranov reviewed Nov 26, 2021

Commit 0365020: Use memalign, posix_memalign and _aligned_malloc to allocate aligned memory.

This uses the standard allocator for allocating and freeing memory with higher alignment requirements instead of the hand-rolled implementation. Signed-off-by: Andrey Semashev <[email protected]>

Lastique force-pushed the use_memalign branch from ad56fad to 0365020 Compare November 26, 2021 14:36

Lastique requested a review from alexey-katranov November 26, 2021 14:37

alexey-katranov approved these changes Dec 2, 2021

alexey-katranov merged commit b7a062e into oneapi-src:master Dec 3, 2021

Lastique deleted the use_memalign branch December 3, 2021 14:26
