Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug in OpenMP distributed tridiagonal solver #114

Merged
merged 1 commit into from
Jul 31, 2024

Conversation

semi-h
Copy link
Member

@semi-h semi-h commented Jul 19, 2024

Last week when implementing Thomas on CUDA I realised we use this trick often, but its not really suitable for the OpenMP backend. Thought all was removed when quickly checked last week, haven't noticed this one!

@semi-h semi-h requested a review from Nanoseb July 19, 2024 02:18
@Nanoseb
Copy link
Collaborator

Nanoseb commented Jul 19, 2024

lgtm, but why is it not suitable? I am not sure I understand what was going wrong.
Also, if the results produced were wrong it may be a sign we need better tests.

@pbartholomew08
Copy link
Member

pbartholomew08 commented Jul 19, 2024

temp_du needs to be an array for SIMD access. In the past, I've found compiling with gfortran debug flags enabled catches this.

@Nanoseb
Copy link
Collaborator

Nanoseb commented Jul 19, 2024

ah yes of course

@pbartholomew08
Copy link
Member

I agree though, it indicates the tests aren't catching something

@Nanoseb
Copy link
Collaborator

Nanoseb commented Jul 19, 2024

This is maybe just a performance issue. The compiler may detect it can't vectorise the loop so it doesn't?

@pbartholomew08
Copy link
Member

Confirming that would require checking assembly/vectorisation reports. I think that the pragma means "vectorise this, ignoring reasons not to", but with strict debug flags enabled it highlights as a warning and or raises an error

@semi-h semi-h merged commit 9ab64bf into xcompact3d:main Jul 31, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants