Permission denied when renaming pgdata to pgdata.failed #460

Closed
gtato opened this issue Apr 29, 2024 · 3 comments · Fixed by #580

gtato commented Apr 29, 2024

Steps to reproduce

This happened in production; I haven't reproduced it in a local environment.

At some point replicas go out of sync and try to restore pgdata from the primary, but fail with this error:

2024-04-29 09:16:53 UTC [15]: ERROR: Could not rename data directory /var/lib/postgresql/data/pgdata 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 1314, in remove_data_directory
    shutil.rmtree(self._data_dir)
  File "/usr/lib/python3.10/shutil.py", line 731, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib/python3.10/shutil.py", line 729, in rmtree
    os.rmdir(path)
PermissionError: [Errno 13] Permission denied: '/var/lib/postgresql/data/pgdata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 1287, in move_data_directory
    os.rename(self._data_dir, new_name)
PermissionError: [Errno 13] Permission denied: '/var/lib/postgresql/data/pgdata' -> '/var/lib/postgresql/data/pgdata.failed'
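Both failing calls in the traceback, os.rmdir and os.rename, change an entry in the parent directory (/var/lib/postgresql/data), so on POSIX systems they need write and search permission on that parent, not only on pgdata itself. The following is a minimal diagnostic sketch, assuming it is run inside the container as the same user that Patroni runs as. The function name is hypothetical and is not part of Patroni or the charm.

```python
import os
import stat
from pathlib import Path


def describe_rename_permissions(data_dir: str) -> dict:
    """Report whether the current user may rename or remove ``data_dir``.

    Hypothetical helper for diagnosis only. os.rename() and os.rmdir()
    modify the parent directory's entries, so the check is made against
    the parent of ``data_dir``.
    """
    parent = Path(data_dir).parent
    parent_stat = parent.stat()
    return {
        "parent": str(parent),
        "parent_owner_uid": parent_stat.st_uid,
        "parent_owner_gid": parent_stat.st_gid,
        # Human-readable mode string, e.g. "drwxr-xr-x".
        "parent_mode": stat.filemode(parent_stat.st_mode),
        "current_uid": os.geteuid(),
        # os.access() checks the real uid/gid by default. A sticky bit on
        # the parent can still block the rename and is not covered here.
        "can_modify_parent": os.access(parent, os.W_OK | os.X_OK),
    }
```

Calling describe_rename_permissions("/var/lib/postgresql/data/pgdata") and finding can_modify_parent set to False, or a parent owner that differs from the current uid, would be consistent with the PermissionError above.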

Expected behavior

Replicas correctly retrieve the WAL entries from the primary and restore their state.

Actual behavior

Replicas fail to restore pgdata from the primary and stop. This in turn causes the WAL on the primary to keep growing until the primary can't function properly.

Versions

Operating system:

Juju CLI: 2.9.49

Juju agent: 3.1.8

Charm revision: 14/edge 198

microk8s: v1.26.15

Log output

Juju debug log:

Additional context

To resolve this issue I used these steps: https://matrix.to/#/!BukWfnyOTgQSKAxdtT:ubuntu.com/$C-iLZEXS39xBD8vFV40EVBWFefNjlvUQmxFDxNcS2p0?via=ubuntu.com&via=matrix.org

but I am not sure how to prevent this issue in the future.

@gtato gtato added the bug Something isn't working label Apr 29, 2024

@marceloneppel marceloneppel self-assigned this Jun 13, 2024
@marceloneppel
Member

Steps to reproduce on GKE:

juju ssh --container postgresql postgresql-k8s/leader bash # leader
apt update && apt install nano curl -y
nano /var/lib/postgresql/data/patroni.yml

# Remove the other units from both pg_hba sections.

curl -X POST localhost:8008/reload # leader

# Wait 30 seconds.

juju ssh --container postgresql postgresql-k8s/0 bash # replica
apt update && apt install curl -y
curl -X POST localhost:8008/reinitialize

juju ssh --container postgresql postgresql-k8s/1 bash # replica
apt update && apt install curl -y
curl -X POST localhost:8008/reinitialize

The issue is related to the permissions in the volume mounted in the units, like in https://warthogs.atlassian.net/browse/DPE-707. I'll create a PR to fix that.
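The reload and reinitialize calls in the steps above can also be sent without installing curl, since the traceback in this issue shows python3 is already present in the container. This is a rough sketch using only the standard library. It assumes the Patroni REST API is reachable at localhost:8008, as in the curl commands. The helper names are hypothetical, and the sketch has not been run against a live cluster.

```python
import urllib.request

# Assumption: same address the curl commands in the steps above use.
PATRONI_API = "http://localhost:8008"


def build_patroni_post(endpoint: str, base_url: str = PATRONI_API) -> urllib.request.Request:
    """Build a body-less POST, mirroring `curl -X POST <base_url>/<endpoint>`."""
    url = f"{base_url}/{endpoint.lstrip('/')}"
    return urllib.request.Request(url, method="POST")


def post_to_patroni(endpoint: str, timeout: float = 10.0) -> str:
    """Send the POST and return the response body as text.

    urlopen() raises urllib.error.HTTPError for non-2xx responses, so a
    rejected request surfaces as an exception rather than a return value.
    """
    request = build_patroni_post(endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

On the leader this would be post_to_patroni("reload"), and on each replica post_to_patroni("reinitialize"), in the same order as the steps above.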

@marceloneppel
Member

Hi, @gtato!

Revisions 332 and 333 from the 14/edge channel contain the fix for this issue.
