Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use system functions to allocate aligned memory #671

Merged
merged 1 commit into from
Dec 3, 2021

Conversation

Lastique
Copy link
Contributor

Description

This uses the standard allocator for allocating and freeing memory with higher alignment requirements instead of the hand-rolled implementation.

  • - git commit message contains appropriate signed-off-by string (see CONTRIBUTING.md for details)

Type of change

  • bug fix - change which fixes an issue
  • new feature - change which adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update
  • optimization

Tests

  • added - required for new features and for some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

@Lastique
Copy link
Contributor Author

This patch originated from #326.

src/tbb/allocator.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@alexey-katranov alexey-katranov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I am Ok with your approach. However, can you consider a set of proposed changes to reduce the code slicing and make macro dependent code more isolated inside the functions? What do you think? (I understand that it will cause one more call in the chain but I hope modern compilers can perform tail call optimization)

src/tbb/allocator.cpp Outdated Show resolved Hide resolved
src/tbb/allocator.cpp Outdated Show resolved Hide resolved
src/tbb/allocator.cpp Show resolved Hide resolved
src/tbb/allocator.cpp Outdated Show resolved Hide resolved
src/tbb/allocator.cpp Outdated Show resolved Hide resolved
src/tbb/allocator.cpp Outdated Show resolved Hide resolved
src/tbb/allocator.cpp Outdated Show resolved Hide resolved
@Lastique
Copy link
Contributor Author

Overall, I am Ok with your approach. However, can you consider a set of proposed changes to reduce the code slicing and make macro dependent code more isolated inside the functions? What do you think? (I understand that it will cause one more call in the chain but I hope modern compilers can perform tail call optimization)

I'm quite certain the compiler won't optimize away the forwarding function, even if it amounts to a jmp instruction, because it has to have a unique address. That's unnecessary overhead that I'd prefer to avoid. The resulting code doesn't improve much in terms if simplicity, IMHO, and I value runtime performance more than code simplicity anyway. However, if you insist I can change the code as you suggest. Please, confirm.

@alexey-katranov
Copy link
Contributor

The resulting code doesn't improve much in terms if simplicity

At rough estimation, I removed 6 condition blocks but added only 2.

I value runtime performance more than code simplicity anyway.

It seems we have slightly different views here. The code is not on critical path in term of performance but any correctness issue should be avoided. However, code complexity wastes engineering time for debug and maintaining while it is not critical for performance of the whole library and we can spend this time to improve something else. As for optimization, it seems possible only for __TBB_USE_MSVC_ALIGNED_MALLOC, while other platforms will have excessive jmp. So, why not have a generic approach for all platforms until we face performance issue due to this inefficiency?

However, if you insist I can change the code as you suggest. Please, confirm.

I do not like to insist and prefer finding consensus. In any case, good balance is consist of multiple opinions. Taking into consideration that you contribute relatively often and reasonably, your opinion is valuable.

@Lastique
Copy link
Contributor Author

The resulting code doesn't improve much in terms if simplicity

At rough estimation, I removed 6 condition blocks but added only 2.

Yes, it is better, I agree, but I just don't find the change that much better. I mean, the original code is not hard to decipher, and all conditions are uniform and easy to locate if they need to be updated. In my personal code I would probably be fine with the extra preprocessor checks if it makes the code more optimal.

I value runtime performance more than code simplicity anyway.

It seems we have slightly different views here. The code is not on critical path in term of performance but any correctness issue should be avoided. However, code complexity wastes engineering time for debug and maintaining while it is not critical for performance of the whole library and we can spend this time to improve something else. As for optimization, it seems possible only for __TBB_USE_MSVC_ALIGNED_MALLOC, while other platforms will have excessive jmp. So, why not have a generic approach for all platforms until we face performance issue due to this inefficiency?

Because MSVC is lucky enough for this optimization to be possible, and it seems wasteful to not take this opportunity.

However, if you insist I can change the code as you suggest. Please, confirm.

I do not like to insist and prefer finding consensus. In any case, good balance is consist of multiple opinions. Taking into consideration that you contribute relatively often and reasonably, your opinion is valuable.

The thing is, this project is not my personal code, and you, the maintainers, may have different priorities than I do, and this is totally fine. I'm just stating my position, and if you still think you prefer it the other way, I can change accordingly, and that is also fine. If I'm adamantly against something, you can be sure I will say so and refuse to change. :)

In fact in this case I don't care about MSVC as my main target platform is Linux. I just added support for it because it was the straightforward and easy thing to do. My main goal here is to get this optimization merged, primarily for Linux, and I don't want this PR to hang indefinitely in limbo or get bogged down in discussions about minor things like this. So I will change the code as you suggest, no problem.

@Lastique
Copy link
Contributor Author

Because MSVC is lucky enough for this optimization to be possible, and it seems wasteful to not take this opportunity.

Actually, it's not only MSVC - free is used directly on other platforms.

…memory.

This uses the standard allocator for allocating and freeing memory with
higher alignment requirements instead of the hand-rolled implementation.

Signed-off-by: Andrey Semashev <[email protected]>
@Lastique
Copy link
Contributor Author

So I will change the code as you suggest, no problem.

Done.

Copy link
Contributor

@alexey-katranov alexey-katranov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution.

@alexey-katranov alexey-katranov merged commit b7a062e into oneapi-src:master Dec 3, 2021
@Lastique Lastique deleted the use_memalign branch December 3, 2021 14:26
kboyarinov pushed a commit that referenced this pull request Dec 27, 2021
…memory. (#671)

This uses the standard allocator for allocating and freeing memory with
higher alignment requirements instead of the hand-rolled implementation.

Signed-off-by: Andrey Semashev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants