ROOT 6.28.00 fails on aarch64: cling JIT session error: Failed to materialize symbols #12294

Closed
ellert opened this issue Feb 12, 2023 · 3 comments · Fixed by #12353

@ellert
Contributor

ellert commented Feb 12, 2023

Describe the bug

About 1/3 of the tests fail on aarch64 with errors like:

cling JIT session error: Failed to materialize symbols: { (main, { __aarch64_ldadd8_acq_rel }) }
cling JIT session error: Failed to materialize symbols: { (main, { _ZN9RooArgSetC1IJEEERK9RooAbsArgDpOT_ }) }
cling JIT session error: Failed to materialize symbols: { (main, { _ZN9RooArgSetC1IJEEERK9RooAbsArgDpOT_ }) }
cling JIT session error: Failed to materialize symbols: { (main, { _ZN9RooArgSetC1IJEEERK9RooAbsArgDpOT_ }) }

Expected behavior

No failing tests.

To Reproduce

Build ROOT 6.28.00 for aarch64, run tests.

  1. ROOT version: 6.28.00
  2. Operating system: GNU/Linux; RHEL+EPEL 9, Fedora 36, Fedora 37, Fedora 38, Fedora 39 (same result on all)
  3. Package built from source.

Additional context

epel9: 67% tests passed, 434 tests failed out of 1317
f36: 67% tests passed, 435 tests failed out of 1318
f37: 67% tests passed, 435 tests failed out of 1318
f38: 67% tests passed, 436 tests failed out of 1318

Some of the symbols that can't be found are in libgcc:
$ nm /usr/lib/gcc/aarch64-redhat-linux/12/libgcc.a | grep __aarch64_ldadd4_acq_rel
0000000000000000 T __aarch64_ldadd4_acq_rel

@hahnjo
Member

hahnjo commented Feb 14, 2023

@ellert thanks for the report, I'll need to see if I can get access to an AArch64 system to test. In the meantime, could you see if the symbols appear in the executables or one of the shared libraries? Then they would be in the process and Cling should automatically find them...
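
A rough way to test from inside a running program whether a name is visible "in the process" is to ask the dynamic loader for it. The sketch below assumes a POSIX system that provides dlsym and RTLD_DEFAULT; the helper name visibleInProcess is hypothetical, and this is not a copy of Cling's own lookup code. A symbol that was linked in only from a static archive and is not exported in a dynamic symbol table would typically not be found this way.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical helper: returns true if `name` resolves through the dynamic
// symbol tables of the running process (the main executable plus all shared
// libraries loaded so far). Symbols that exist only in a static archive and
// were never exported are not expected to show up here.
bool visibleInProcess(const char* name) {
    assert(name != nullptr);
    dlerror();  // clear any stale error state from earlier dl* calls
    return dlsym(RTLD_DEFAULT, name) != nullptr;
}
```

On an affected machine, calling visibleInProcess("__aarch64_ldadd8_acq_rel") from a program linked like the failing test would be one way to check the question above. On older glibc versions the program may need to be linked with -ldl.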

@ellert
Contributor Author

ellert commented Feb 19, 2023

This problem happens with

  • RHEL+EPEL 9 (gcc 11.3.1)
  • Fedora 36/37 (gcc 12.2.1)
  • Fedora 38/39 (gcc 13.0.1)

But not with:

  • RHEL+EPEL 8 (gcc 8.5.0)

It seems to be related to an issue that also appeared with the previous LLVM 9 based version of ROOT when gcc 10 was introduced in Fedora 33:
https://bugzilla.redhat.com/show_bug.cgi?id=1830472

From a comment in the above bugzilla report:
On aarch64 -moutline-atomics has been turned on by default, and those symbols are solely in libgcc.a, not in libgcc_s.so.*.
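
For illustration, this is the kind of source code that can end up referencing such a helper. It is a minimal sketch with hypothetical function names (addAcqRel, selfCheck). On aarch64 with outline atomics enabled, a compiler may lower the 8-byte acquire-release fetch_add below to a call to __aarch64_ldadd8_acq_rel instead of inline instructions; whether it actually does depends on the compiler version and flags such as -march.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically adds `delta` to `counter` with acquire-release ordering and
// returns the value the counter held before the addition.
std::int64_t addAcqRel(std::atomic<std::int64_t>& counter, std::int64_t delta) {
    return counter.fetch_add(delta, std::memory_order_acq_rel);
}

// Small usage example: two additions starting from zero.
void selfCheck() {
    std::atomic<std::int64_t> c{0};
    assert(addAcqRel(c, 1) == 0);  // previous value was 0
    assert(addAcqRel(c, 2) == 1);  // previous value was 1
    assert(c.load() == 3);
}
```

If code like this is JIT-compiled and the helper is not visible in the process, the lookup of the helper symbol is what would fail.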

The problem with the old ROOT version was fixed when the libgcc_s.so symlink in gcc was replaced by a linker script. This linker script is still there:

$ cat /usr/lib/gcc/aarch64-redhat-linux/12/libgcc_s.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-littleaarch64)
GROUP ( /lib64/libgcc_s.so.1 libgcc.a )

With this linker script, ROOT worked fine on aarch64 with gcc >= 10. But with the LLVM 13 update it broke again, despite the linker script still being there. It still works on RHEL+EPEL 8 with gcc 8, which suggests that it is related to -moutline-atomics, which became the default on aarch64 in gcc 10.

I hope this will give some ideas about how to fix it.

@hahnjo
Member

hahnjo commented Feb 20, 2023

Thanks @ellert, this helped a lot to get me started in the right direction!

As far as I can tell, the problem is slightly different from https://bugzilla.redhat.com/show_bug.cgi?id=1830472: that one failed during the build of ROOT, while we now have a problem during JIT compilation, after ROOT has already been built successfully. But we are very likely on the right track here with -moutline-atomics, because Clang now defaults to enabling it if it detects a libgcc newer than 9.3.1. This explains why it still works with GCC 8. Before the upgrade to LLVM 13, it was working fine everywhere because LLVM 9 didn't know about the __aarch64_ldadd* functions; I believe it used a different lowering strategy for atomics...

hahnjo added a commit to hahnjo/root that referenced this issue Feb 20, 2023
The routines __aarch64_* are defined in the static library libgcc.a
and not necessarily included in libCling or otherwise present in the
process, so the interpreter has a hard time finding them.

Fixes root-project#12294
hahnjo added a commit that referenced this issue Feb 20, 2023
The routines __aarch64_* are defined in the static library libgcc.a
and not necessarily included in libCling or otherwise present in the
process, so the interpreter has a hard time finding them.

Fixes #12294
hahnjo added a commit to hahnjo/root that referenced this issue Feb 20, 2023
The routines __aarch64_* are defined in the static library libgcc.a
and not necessarily included in libCling or otherwise present in the
process, so the interpreter has a hard time finding them.

Fixes root-project#12294

(cherry picked from commit ddf9a8c)
hahnjo added a commit that referenced this issue Mar 2, 2023
The routines __aarch64_* are defined in the static library libgcc.a
and not necessarily included in libCling or otherwise present in the
process, so the interpreter has a hard time finding them.

Fixes #12294

(cherry picked from commit ddf9a8c)
omazapa pushed a commit to omazapa/root that referenced this issue Apr 13, 2023
The routines __aarch64_* are defined in the static library libgcc.a
and not necessarily included in libCling or otherwise present in the
process, so the interpreter has a hard time finding them.

Fixes root-project#12294