0.17.0: Segmentation fault after modifying RPATH #446

Open
bastimeyer opened this issue Nov 26, 2022 · 6 comments · Fixed by koordinates/kart#764

@bastimeyer

Describe the bug

Just encountered an issue with patchelf 0.17.0...

I build AppImages for my Python application, and in order to do that I'm using Python's official manylinux docker images where I copy one of the pre-built python environments and modify the RPATHs of all binaries and their dependencies using patchelf. The RPATHs get set to $ORIGIN (and other relative paths), so that when the AppImage's squashfs gets mounted on the user's system upon execution, Python can properly be run on unknown/arbitrary mount points. So far, this has all been working flawlessly.
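For readers unfamiliar with the approach, here is a minimal sketch of how an `$ORIGIN`-relative RPATH entry could be derived for a single binary. The helper name `origin_rpath` and the `/opt/app` paths are hypothetical, and it assumes GNU `realpath` (for `-m` and `--relative-to`). It is not the actual AppImage build script.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print an $ORIGIN-relative RPATH entry that points from
# the directory containing a binary to a library directory.
# Assumes GNU realpath; -m allows paths that do not exist yet.
origin_rpath() {
  local binary=$1 libdir=$2 rel
  rel=$(realpath -m --relative-to="$(dirname "$binary")" "$libdir")
  if [ "$rel" = "." ]; then
    # Libraries sit next to the binary. The single quotes keep $ORIGIN literal.
    printf '%s\n' '$ORIGIN'
  else
    printf '%s\n' "\$ORIGIN/$rel"
  fi
}

origin_rpath /opt/app/bin/python /opt/app/lib
```

The printed value could then be passed to `patchelf --set-rpath`, the same option the reproduction script uses with a literal `$ORIGIN`.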

The upgrade to patchelf 0.17.0, however, makes executables segfault after their RPATH has been modified. patchelf 0.16.1 worked fine.

Looking at the recent git commit history, 2cb863f made a big change to the ELF header file, with lots of changed constants. I have zero knowledge of the internals here or of how ELF files and dynamic linking work, but that's what stood out to me.

Steps To Reproduce

Here's a short BASH script for reproducing the issue in two manylinux docker containers. One with patchelf 0.16.1 and the next one right after patchelf was upgraded to 0.17.0. There are other changes included between those two image versions, but those are unrelated and the issue can also be reproduced by simply building patchelf 0.17.0 on the older image.

#!/usr/bin/env bash

IMAGES=(
  # 2022-11-14 - patchelf 0.16.1
  quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
  # 2022-11-19 - patchelf 0.17.0
  quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
)

SCRIPT=$(cat <<'EOF'
PYTHON=/opt/python/cp310-cp310/bin/python

patchelf --version
$PYTHON --version

rpath=$(patchelf --print-rpath $PYTHON)
echo "RPATH: $rpath"

# Modify the python executable:
# set a different value, so the file actually gets written
patchelf --debug --set-rpath "\$ORIGIN" $PYTHON
echo "TEMP RPATH: $(patchelf --print-rpath $PYTHON)"

# and revert it again
patchelf --debug --set-rpath "$rpath" $PYTHON
echo "RPATH: $(patchelf --print-rpath $PYTHON)"

$PYTHON --version
EOF
)

for image in "${IMAGES[@]}"; do
  echo "Running ${image}"
  docker run -i --rm "${image}" <<< "${SCRIPT}"
  echo $'\n\n\n'
done

Log output

$ ./patchelf-bug.sh
Running quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
patchelf 0.16.1
Python 3.10.8
RPATH: 
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is '$ORIGIN'
rpath is too long, resizing...
DT_NULL index is 30
replacing section '.dynamic' with size 592
replacing section '.dynstr' with size 36674
this is an executable
using replaced section '.dynstr'
using replaced section '.dynamic'
last replaced is 20
looking at section '.interp'
replacing section '.interp' which is in the way
looking at section '.note.gnu.build-id'
replacing section '.note.gnu.build-id' which is in the way
looking at section '.note.ABI-tag'
replacing section '.note.ABI-tag' which is in the way
looking at section '.gnu.hash'
replacing section '.gnu.hash' which is in the way
looking at section '.dynsym'
replacing section '.dynsym' which is in the way
looking at section '.dynstr'
looking at section '.gnu.version'
first reserved offset/addr is 0x17fea/0x417fea
first page is 0x400000
needed space is 98952
needed space is 99008
needed pages is 1
clearing first 101586 bytes
rewriting section '.dynamic' from offset 0x2ee288 (size 576) to offset 0x318 (size 592)
rewriting section '.dynstr' from offset 0xf0b0 (size 36666) to offset 0x568 (size 36674)
rewriting section '.dynsym' from offset 0x3500 (size 48048) to offset 0x94b0 (size 48048)
rewriting section '.gnu.hash' from offset 0x308 (size 12792) to offset 0x15060 (size 12792)
rewriting section '.interp' from offset 0x2a8 (size 28) to offset 0x18258 (size 28)
rewriting section '.note.ABI-tag' from offset 0x2e8 (size 32) to offset 0x18278 (size 32)
rewriting section '.note.gnu.build-id' from offset 0x2c4 (size 36) to offset 0x18298 (size 36)
rewriting symbol table section 3
writing /opt/python/cp310-cp310/bin/python
TEMP RPATH: $ORIGIN
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is ''
writing /opt/python/cp310-cp310/bin/python
RPATH: 
Python 3.10.8




Running quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
patchelf 0.17.0
Python 3.10.8
RPATH: 
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is '$ORIGIN'
rpath is too long, resizing...
DT_NULL index is 30
replacing section '.dynamic' with size 592
replacing section '.dynstr' with size 36674
this is an executable
using replaced section '.dynstr'
using replaced section '.dynamic'
last replaced is 20
looking at section '.interp'
replacing section '.interp' which is in the way
looking at section '.note.gnu.build-id'
replacing section '.note.gnu.build-id' which is in the way
looking at section '.note.ABI-tag'
replacing section '.note.ABI-tag' which is in the way
looking at section '.gnu.hash'
replacing section '.gnu.hash' which is in the way
looking at section '.dynsym'
replacing section '.dynsym' which is in the way
looking at section '.dynstr'
looking at section '.gnu.version'
first reserved offset/addr is 0x17fea/0x417fea
first page is 0x400000
needed space is 98952
needed space is 99008
needed pages is 1
clearing first 101586 bytes
rewriting section '.interp' from offset 0x2a8 (size 28) to offset 0x318 (size 28)
rewriting section '.note.gnu.build-id' from offset 0x2c4 (size 36) to offset 0x338 (size 36)
rewriting section '.note.ABI-tag' from offset 0x2e8 (size 32) to offset 0x360 (size 32)
rewriting section '.gnu.hash' from offset 0x308 (size 12792) to offset 0x380 (size 12792)
rewriting section '.dynsym' from offset 0x3500 (size 48048) to offset 0x3578 (size 48048)
rewriting section '.dynstr' from offset 0xf0b0 (size 36666) to offset 0xf128 (size 36674)
rewriting section '.dynamic' from offset 0x2ee288 (size 576) to offset 0x18070 (size 592)
rewriting symbol table section 5
writing /opt/python/cp310-cp310/bin/python
TEMP RPATH: $ORIGIN
patching ELF file '/opt/python/cp310-cp310/bin/python'
new rpath is ''
writing /opt/python/cp310-cp310/bin/python
RPATH: 
/bin/bash: line 18:    14 Segmentation fault      (core dumped) $PYTHON --version
@otherjason

I ran into what I think is the same problem, also with a Python 3.10 executable. PR #447 seems to fix it for me.

@pramodk

pramodk commented Jan 20, 2023

Hello All,

We are having the same issue with >=v0.17.0 (i.e. including latest release 0.17.2).

> I build AppImages for my Python application, and in order to do that I'm using Python's official manylinux docker images where I copy one of the pre-built python environments and modify the RPATHs of all binaries and their dependencies using patchelf. The RPATHs get set to $ORIGIN (and other relative paths), so that when the AppImage's squashfs gets mounted on the user's system upon execution, Python can properly be run on unknown/arbitrary mount points. So far, this has all been working flawlessly.

We have a similar use case: as part of the NEURON project (a simulator used in the computational neuroscience community), we distribute Python wheels. These wheels contain standalone binary files that are updated by patchelf (via auditwheel) for reasons similar to those mentioned above.

We are trying to update our wheel-building pipeline to the latest quay.io/pypa/manylinux2014_x86_64 (which contains patchelf >= 0.17.0). With the newer patchelf, one of the binaries (modlunit in this case) segfaults after its RPATHs are updated. ldd crashes on it as well:

[root@0998f73e5778]# ./modlunit
Segmentation fault (core dumped)

[root@0998f73e5778]# ldd ./modlunit
/usr/bin/ldd: line 116: 15156 Segmentation fault      (core dumped) LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION=$verify_out LD_VERBOSE= "$@"

This issue doesn't appear if we are using older releases like 0.16.1. I also looked at the latest release 0.17.2 (containing #447) but that doesn't help in our case.

I built patchelf locally and used git bisect to find the first "bad" commit. It points to 42394e8 (#430). The previous commit, 7c18779, works fine! Also, if I just revert #430 on the latest master, the issue disappears. I have no knowledge of ELF/patchelf and am just trying to provide additional information.
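A bisect like this can be automated with `git bisect run`. The sketch below is only an illustration of that idea and not the procedure actually used here. `classify_run`, `build-and-test.sh`, `BAD_REF` and `GOOD_REF` are hypothetical names. It maps a segfault (exit status 139 in Bash, i.e. 128 + SIGSEGV) to "bad" and any other failure to "skip".

```bash
#!/usr/bin/env bash

# Hypothetical bisect helper: classify one run of a freshly patched binary.
# Return codes follow the `git bisect run` convention:
#   0 = good, 1..124 = bad, 125 = cannot test this commit (skip).
classify_run() {
  "$@" >/dev/null 2>&1
  local status=$?
  case $status in
    0)   return 0 ;;    # ran fine: commit is good
    139) return 1 ;;    # 128 + SIGSEGV (11): commit is bad
    *)   return 125 ;;  # some other failure: undecided, skip
  esac
}

# Possible use from a patchelf checkout. build-and-test.sh would rebuild
# patchelf, re-patch a fresh copy of the binary and end with, for example:
#   classify_run /opt/python/cp310-cp310/bin/python --version
#
#   git bisect start BAD_REF GOOD_REF
#   git bisect run ./build-and-test.sh
```

Returning 125 for unrelated failures keeps a broken build from being blamed for the crash.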

As a reproducer, you can run following script (thank you, @bastimeyer!):

#!/usr/bin/env bash

IMAGES=(
  # 2022-11-14 - patchelf 0.16.1 : using quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013
  neuronsimulator/reprod_patchelf_0160
  # 2022-11-19 - patchelf 0.17.0 using quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
  neuronsimulator/reprod_patchelf_0170
)

SCRIPT=$(cat <<'EOF'
patchelf --version

patchelf --print-rpath /tmp/modlunit
patchelf --remove-rpath /tmp/modlunit
# just add some additional rpaths
patchelf --force-rpath --set-rpath \$ORIGIN/123456:\$ORIGIN/11_22_33_44_555 /tmp/modlunit
patchelf --print-rpath /tmp/modlunit
ldd /tmp/modlunit
EOF
)

for image in "${IMAGES[@]}"; do
  echo "Running ${image}"
  docker run -i --rm "${image}" <<< "${SCRIPT}"
  echo $'\n\n\n'
done

and it produces output as:

./run.sh
Running neuronsimulator/reprod_patchelf_0160
patchelf 0.16.1
$ORIGIN/../lib:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib:/opt/rh/devtoolset-11/root/usr/lib/gcc/x86_64-redhat-linux/11/../../../../lib64
$ORIGIN/123456:$ORIGIN/11_22_33_44_555
	linux-vdso.so.1 =>  (0x00007ffc0cad3000)
	libnvhpcatm.so => not found
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fa75aa0a000)
	libnvomp.so => not found
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fa75a806000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa75a5ea000)
	libnvcpumath-avx2.so => not found
	libnvc.so => not found
	libc.so.6 => /lib64/libc.so.6 (0x00007fa75a21c000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa75a006000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fa759d04000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa75ad12000)


Running neuronsimulator/reprod_patchelf_0170
patchelf 0.17.0
$ORIGIN/../lib:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib:/opt/rh/devtoolset-11/root/usr/lib/gcc/x86_64-redhat-linux/11/../../../../lib64
$ORIGIN/123456:$ORIGIN/11_22_33_44_555
/usr/bin/ldd: line 116:    16 Segmentation fault      (core dumped) LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION=$verify_out LD_VERBOSE= "$@"

The docker images are created from a simple Dockerfile such as:

# 0.16.1
#FROM quay.io/pypa/manylinux2014_x86_64@sha256:005826a6fa94c97bd31fccf637a0f10621304da447ca2ab3963c13991dffa013

# 0.17.0
FROM quay.io/pypa/manylinux2014_x86_64@sha256:383c6016156c94d7dbd10696c15f2444288b99a25927239b7b024e1cc6ca6a81
COPY modlunit /tmp/

(i.e. on top of standard manylinux pypa image, I have included the modlunit binary that segfaults in our case).

I hope this will help to find the root cause. If anything else is needed to debug the issue, I will be more than happy to help!

Thank you!

bilke added a commit to ufz/ogs that referenced this issue Jan 26, 2023
@brenoguim
Collaborator

Looking at the program header table of the program that crashes, we have the following two loads:

	2	0x3ff000 + 0x1118 RW align:0x1000
	7	0x400e22 + 0xf6e  R  align:0x1000

This is a bit weird because the start/end addresses do not respect alignment.
And if you round down the start address and round up the end address, we get:

0x3ff000 -> 0x401000 RW
0x400000 -> 0x402000 R

Well, there is an overlap there with mixed access rights.
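The rounding above can be reproduced with a few lines of Bash arithmetic. This is a minimal sketch: the function names are made up for illustration, and the page size is taken from the `align:0x1000` column.

```bash
#!/usr/bin/env bash

# Page size, taken from the "align:0x1000" column of the two loads.
PAGE=$((0x1000))

# Round an address down to the start of its page.
page_down() { printf '0x%x\n' $(( $1 & ~(PAGE - 1) )); }

# Round an address up to the next page boundary.
page_up() { printf '0x%x\n' $(( ($1 + PAGE - 1) & ~(PAGE - 1) )); }

# Print the page-aligned range "lo hi" covered by a segment (start, size).
page_range() {
  local start=$1 size=$2
  echo "$(page_down "$start") $(page_up $(( start + size )))"
}

# Succeed if the ranges [a_lo, a_hi) and [b_lo, b_hi) intersect.
ranges_overlap() {
  (( $1 < $4 && $3 < $2 ))
}

rw=$(page_range 0x3ff000 0x1118)   # the RW load
ro=$(page_range 0x400e22 0xf6e)    # the R load
echo "RW: $rw"
echo "R:  $ro"
# Word splitting of $rw and $ro into lo/hi pairs is intentional here.
if ranges_overlap $rw $ro; then
  echo "page-aligned ranges overlap"
fi
```

The two ranges share the page starting at 0x400000, which is the address the loader fails to write to.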
If you run the program under GDB, it tells you two things:
1 - The crash happens when the loader attempts to write to address 0x400000
2 - Printing /proc/pid/maps gives:

003ff000-00400000 rw-p
00400000-00402000 r--p

This shows what the kernel decided to do: it chose the safest access right for the clashing addresses.

This also explains why commit 42394e8 introduced the issue. After that patch, the .dynamic section is placed last in the segment, in the portion that became read-only.
Before that commit, the .dynamic section was placed at the beginning, with a lot of RW space to use.

The original working binary already had an unaligned segment entry:

0x420cc0 + 0x5218 -> 0x425ed8 RW

but when rounded up and down, it didn't clash with any other segment.

So apparently patchelf is doing its thing by reordering/inserting/moving segments, but in doing so it creates a segment clash.

@brenoguim
Collaborator

With the PR mentioned above, ldd on the binary works correctly. But it needs discussion.

@adonis0147

adonis0147 commented Apr 28, 2023

I hit a similar issue. I tried the latest version (0.18.0), but it didn't help. After I downgraded patchelf to 0.16.1, everything worked well.

Platform: Centos 6/Ubuntu 22.04
Architecture: x86_64
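Until this is resolved, a build script could refuse to run with a version reported as affected. The sketch below reflects only the reports in this thread (0.16.1 working; 0.17.0, 0.17.2 and 0.18.0 segfaulting). The function names are hypothetical, and treating every version from 0.17.0 onward as affected is an assumption, not a statement about later releases.

```bash
#!/usr/bin/env bash

# version_ge A B: succeed if dotted version A >= B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical policy based only on the reports in this thread:
# 0.16.1 is reported working; 0.17.0, 0.17.2 and 0.18.0 are reported affected.
patchelf_reported_affected() {
  version_ge "$1" 0.17.0
}

# Only query patchelf when it is actually installed.
# `patchelf --version` prints e.g. "patchelf 0.16.1".
if command -v patchelf >/dev/null 2>&1; then
  installed=$(patchelf --version | awk '{print $2}')
  if patchelf_reported_affected "$installed"; then
    echo "patchelf $installed: segfaults reported, 0.16.1 is reported working" >&2
  fi
fi
```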

@fda77

fda77 commented Apr 28, 2023

shr-project added a commit to shr-project/meta-webosose that referenced this issue Jun 30, 2023
:Release Notes:
mke2fs.real, mkfs.ext2.real, mkfs.ext3.real, mkfs.ext4.real are identical
binaries with multiple hardlinks, and we end up calling patchelf-uninative 4
times even when the interpreter is already set correctly from the build

:Detailed Notes:
To avoid corrupted binaries created on 18.04 ubuntu avoid calling
patchelf-uninative multiple times and in this case don't call it at all.

It might be related to:
NixOS/patchelf#492
or
NixOS/patchelf#446
but the latter was already included in patchelf-0.17.2 used in uninative-3.9

This was submitted to upstream in:
https://lists.openembedded.org/g/openembedded-core/message/183314
but wasn't merged yet (so it cannot be in meta-webos-backports-* layer)
and it might take a while until it's backported to kirkstone.

:Testing Performed:
Only build tested.

:QA Notes:
No change to image.

:Issues Addressed:
[WRP-19053] CCC: Various build fixes
[WRP-17893] mkfs.ext4 segfaults with uninative 3.10 and newer
[WRP-6209] Update jenkins slaves to use Ubuntu 20.04 or 22.04

Change-Id: Ied1e0965423c660bca375c1e8deac7500014cc03
rdb added a commit to panda3d/buildbot-panda3d that referenced this issue Aug 3, 2023
Due to bugs causing executables to be corrupted, see pypa/manylinux#1421 and NixOS/patchelf#446

Fixes panda3d/panda3d#1504
stellaraccident added a commit to iree-org/iree that referenced this issue Aug 12, 2023
There is a bug that is being triggered by recent updates, causing `iree-tracy-capture` to be corrupted during auditwheel. Pinning to the old version of patchelf, which it depends on, clears the issue.

Here is the patchelf issue where others have gotten stung: NixOS/patchelf#446
nhasabni pushed a commit to plaidml/iree that referenced this issue Aug 24, 2023
ywbyun0815 pushed a commit to webosose/meta-webosose that referenced this issue Sep 7, 2023
Cherry-picked-from-commit: d3e0606
Cherry-picked-from-branch:
bartoldeman added a commit to ComputeCanada/gentoo-overlay that referenced this issue Mar 21, 2024
0.18 has issues with some patched binaries segfaulting
NixOS/patchelf#446