Charm stuck on JAAS #309

Closed
marceloneppel opened this issue Nov 1, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@marceloneppel
Member

Steps to reproduce

Deploy PostgreSQL K8s from the stable channel on jimm.operatorinc.org.

Expected behavior

Charm starts correctly.

Actual behavior

Charm is stuck with the "awaiting for cluster to start" message.

Versions

Juju CLI: 3.1.6

Juju agent: 3.1.5

Charm revision: 158

Pebble logs:

root@pge-0:/# pebble logs
2023-11-01T09:31:09.897Z [postgresql] The files belonging to this database system will be owned by user "postgres".
2023-11-01T09:31:09.897Z [postgresql] This user must also own the server process.
2023-11-01T09:31:09.897Z [postgresql] 
2023-11-01T09:31:09.897Z [postgresql] The database cluster will be initialized with locales
2023-11-01T09:31:09.897Z [postgresql]   COLLATE:  C
2023-11-01T09:31:09.897Z [postgresql]   CTYPE:    C.UTF-8
2023-11-01T09:31:09.897Z [postgresql]   MESSAGES: C
2023-11-01T09:31:09.897Z [postgresql]   MONETARY: C
2023-11-01T09:31:09.897Z [postgresql]   NUMERIC:  C
2023-11-01T09:31:09.897Z [postgresql]   TIME:     C
2023-11-01T09:31:09.897Z [postgresql] The default text search configuration will be set to "english".
2023-11-01T09:31:09.897Z [postgresql] 
2023-11-01T09:31:09.897Z [postgresql] Data page checksums are enabled.
2023-11-01T09:31:09.897Z [postgresql] 
2023-11-01T09:31:09.897Z [postgresql] creating directory /var/lib/postgresql/data/pgdata ... ok
2023-11-01T09:31:09.898Z [postgresql] creating subdirectories ... ok
2023-11-01T09:31:09.899Z [postgresql] selecting dynamic shared memory implementation ... posix
2023-11-01T09:31:09.899Z [postgresql] selecting default max_connections ... 20
2023-11-01T09:31:10.739Z [postgresql] selecting default shared_buffers ... 400kB
2023-11-01T09:31:14.325Z [postgresql] selecting default time zone ... Etc/UTC
2023-11-01T09:31:14.344Z [postgresql] creating configuration files ... ok
2023-11-01T09:31:14.345Z [postgresql] running bootstrap script ... Bus error (core dumped)
2023-11-01T09:31:14.509Z [postgresql] child process exited with exit code 135
2023-11-01T09:31:14.509Z [postgresql] initdb: removing data directory "/var/lib/postgresql/data/pgdata"
2023-11-01T09:31:14.511Z [postgresql] pg_ctl: database system initialization failed
2023-11-01T09:31:15.084Z [postgresql] Traceback (most recent call last):
2023-11-01T09:31:15.084Z [postgresql]   File "/usr/bin/patroni", line 33, in <module>
2023-11-01T09:31:15.084Z [postgresql]     sys.exit(load_entry_point('patroni==3.0.2', 'console_scripts', 'patroni')())
2023-11-01T09:31:15.084Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 144, in main
2023-11-01T09:31:15.084Z [postgresql]     return patroni_main()
2023-11-01T09:31:15.085Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 136, in patroni_main
2023-11-01T09:31:15.085Z [postgresql]     abstract_main(Patroni, schema)
2023-11-01T09:31:15.085Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 181, in abstract_main
2023-11-01T09:31:15.085Z [postgresql]     controller.run()
2023-11-01T09:31:15.085Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 106, in run
2023-11-01T09:31:15.085Z [postgresql]     super(Patroni, self).run()
2023-11-01T09:31:15.086Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 126, in run
2023-11-01T09:31:15.086Z [postgresql]     self._run_cycle()
2023-11-01T09:31:15.086Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/__main__.py", line 109, in _run_cycle
2023-11-01T09:31:15.086Z [postgresql]     logger.info(self.ha.run_cycle())
2023-11-01T09:31:15.086Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1770, in run_cycle
2023-11-01T09:31:15.086Z [postgresql]     info = self._run_cycle()
2023-11-01T09:31:15.087Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1592, in _run_cycle
2023-11-01T09:31:15.087Z [postgresql]     return self.post_bootstrap()
2023-11-01T09:31:15.087Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1483, in post_bootstrap
2023-11-01T09:31:15.087Z [postgresql]     self.cancel_initialization()
2023-11-01T09:31:15.087Z [postgresql]   File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1476, in cancel_initialization
2023-11-01T09:31:15.088Z [postgresql]     raise PatroniFatalException('Failed to bootstrap cluster')
2023-11-01T09:31:15.088Z [postgresql] patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
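
A note on reading the log above: the "Bus error (core dumped)" line and "exit code 135" appear to describe the same event. Shells report a child killed by signal N as exit code 128 + N, and 135 - 128 = 7, which is SIGBUS on Linux x86-64. The following is a minimal Python sketch of that decoding, not part of the charm or Patroni; the helper name `signal_from_exit_code` is hypothetical.

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name implied by a shell-style exit code, if any.

    Shells report a child killed by signal N as exit code 128 + N.
    Signal numbers are platform-specific (SIGBUS is 7 on Linux x86-64).
    """
    if exit_code <= 128:
        return None  # ordinary exit status, not a signal death
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


if __name__ == "__main__":
    # 135 is the exit code reported by initdb in the Pebble log.
    print(signal_from_exit_code(135))
```

Run on a Linux x86-64 host, this should report SIGBUS for 135, which suggests the bootstrap process was killed by the kernel rather than exiting on its own.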

Additional context

Huge pages settings in the unit:

root@pge-0:/# sysctl -a | grep hugepage
sysctl: reading key "kernel.apparmor_display_secid_mode"
sysctl: reading key "kernel.unprivileged_userns_apparmor_policy"
vm.nr_hugepages = 1024
vm.nr_hugepages_mempolicy = 1024
vm.nr_overcommit_hugepages = 0
root@pge-0:/#
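
The values above are listed as context for the crash; this issue does not confirm a causal link between the huge pages settings and the Bus error. For anyone collecting the same data from several units, here is a minimal Python sketch that extracts the huge pages keys from `sysctl -a` output and skips the "reading key" diagnostics. The function name `parse_hugepage_sysctls` is hypothetical, and the sample text is copied from the output above.

```python
from typing import Dict


def parse_hugepage_sysctls(output: str) -> Dict[str, int]:
    """Extract integer huge pages settings from `sysctl -a` output."""
    settings: Dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Skip diagnostics such as 'sysctl: reading key "..."',
        # unrelated keys, and non-numeric values.
        if not sep or "hugepage" not in key or not value.isdigit():
            continue
        settings[key] = int(value)
    return settings


SAMPLE = """\
sysctl: reading key "kernel.apparmor_display_secid_mode"
sysctl: reading key "kernel.unprivileged_userns_apparmor_policy"
vm.nr_hugepages = 1024
vm.nr_hugepages_mempolicy = 1024
vm.nr_overcommit_hugepages = 0
"""

if __name__ == "__main__":
    for name, value in parse_hugepage_sysctls(SAMPLE).items():
        print(f"{name} = {value}")
```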

MM discussion

@marceloneppel added the bug label on Nov 1, 2023
Contributor

github-actions bot commented Nov 1, 2023

@taurus-forever
Contributor

From https://chat.charmhub.io/charmhub/pl/yg6asxao6f8hpn93xpq83efrte Marcelo wrote:

I created a new revision for the PostgreSQL charm on the 14/edge/test channel. It should have the fix to start PostgreSQL (and avoid it being stuck with the awaiting for cluster to start message). I couldn't reproduce the Juju secrets issue that you showed me in the logs, but now the PostgreSQL charm should be able to start.
....
If that revision works, I should create a pull request to later publish the revision correctly to the 14/edge channel.

We are waiting for confirmation of the fix before merging it to edge...

@taurus-forever
Contributor

@marceloneppel what is our plan here?

@marceloneppel
Member Author

We'll need some help from Alex Kilroy to bootstrap the environment again. He has been busy, so we couldn't make progress on this task.

@taurus-forever
Contributor

taurus-forever commented Jun 7, 2024

Hi @ale8k, is it still reproducible on JAAS (with 14/stable, or with 14/candidate, which we are now preparing for stable release)?
If so, can you help us with an environment to reproduce and troubleshoot it? Tnx!

@taurus-forever
Contributor

Dear @ale8k, is there any place where we can reproduce this issue (see my comment above)? Tnx!

@ale8k

ale8k commented Aug 14, 2024

Hi @taurus-forever, I'm unsure how to reproduce this... Perhaps @kian99 knows?

@ale8k

ale8k commented Aug 14, 2024

Ahh! This can be ignored, operator day has passed.

@taurus-forever
Contributor

Resolving the ticket as no longer relevant.

The Data Team is still interested in JAAS deployment/testing and is looking for an environment to test and document it (separate story).

Tnx!


3 participants