
[DPE-2728] Handle scaling to zero units #331

Merged
merged 7 commits into from
Nov 30, 2023


@marceloneppel (Member) commented Nov 23, 2023

Issue

When the cluster is scaled to 0 units and later scaled back up, it goes into an error state. This happens because of conflicting unit data and because the leader key is missing from the Patroni K8S Endpoint: the leader unit tries to get the cluster info but is unable to do so.
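A minimal sketch of the failure mode, modeling the Patroni K8S Endpoint as a plain dict of annotations. It assumes Patroni records the current leader's member name under a `leader` annotation on the cluster's Endpoint object; `get_cluster_leader`, `get_cluster_info` and `ClusterInfoUnavailableError` are hypothetical names, not the charm's API.

```python
from typing import Dict, Optional


class ClusterInfoUnavailableError(Exception):
    """Hypothetical error: the leader unit cannot determine the cluster state."""


def get_cluster_leader(annotations: Dict[str, str]) -> Optional[str]:
    """Return the Patroni leader recorded on the endpoint, if any.

    Assumption: Patroni stores the leader member name under the "leader"
    annotation of the cluster's Kubernetes Endpoint object.
    """
    return annotations.get("leader") or None


def get_cluster_info(annotations: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical stand-in for the leader unit fetching the cluster info."""
    leader = get_cluster_leader(annotations)
    if leader is None:
        # After scaling to zero and back up, the leader key can be gone,
        # so there is no member to ask for the cluster state.
        raise ClusterInfoUnavailableError("leader key missing from Patroni endpoint")
    return {"leader": leader}
```

With a populated endpoint the lookup succeeds; with the annotations left empty after the scale-up, the same call raises, which mirrors the error state described above.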

Solution

Remove the unit data when scaling to zero, and add the leader key back if it's missing when scaling back up. Also, don't set Unknown as the unit status if that was the unit's original status (otherwise it would trigger an error).
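The three parts of the solution can be sketched with plain dicts standing in for the unit databag and the endpoint annotations. This is an illustration under assumptions, not the charm's code: `on_scale_change` and `status_to_restore` are hypothetical, the planned unit count is assumed to come from something like ops' `Application.planned_units()`, and the leader key is again assumed to be the `leader` annotation.

```python
from typing import Dict, Optional

# Juju's "unknown" workload status, represented here as a plain string.
UNKNOWN = "unknown"


def on_scale_change(
    planned_units: int,
    unit_data: Dict[str, str],
    annotations: Dict[str, str],
    leader_member: str,
) -> None:
    """Hypothetical handler reacting to the planned unit count."""
    if planned_units == 0:
        # Scaling to zero: drop the unit data so it cannot conflict with
        # the data written by the units that come back later.
        unit_data.clear()
        return
    if not annotations.get("leader"):
        # Scaling back up: restore the missing leader key so the leader
        # unit can fetch the cluster info again.
        annotations["leader"] = leader_member


def status_to_restore(original_status: str) -> Optional[str]:
    """Return the status to set back, or None to leave the unit status alone.

    The PR notes that explicitly setting "unknown" would trigger an error,
    so an original "unknown" status is skipped instead of restored.
    """
    if original_status == UNKNOWN:
        return None
    return original_status
```

An existing leader key is left untouched, so the restore only kicks in for the scale-from-zero case.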

One more detail: the logic from the VM charm was copied to src/relations/db.py and src/relations/postgresql_provider.py to avoid deleting the relation user when the PostgreSQL charm is scaled down.
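The description does not show the copied logic, so the following is only one plausible shape for such a guard. It assumes the unit sets a hypothetical `departing` flag in its own databag when relation-departed reports this unit as the departing one, and that the relation-broken handler skips deleting the relation user while that flag is set. All names are illustrative.

```python
from typing import Callable, Dict


def on_relation_departed(departing_unit: str, this_unit: str, unit_data: Dict[str, str]) -> None:
    """Record that this unit itself is being removed (hypothetical flag)."""
    if departing_unit == this_unit:
        unit_data["departing"] = "True"


def on_relation_broken(
    is_leader: bool,
    unit_data: Dict[str, str],
    relation_user: str,
    delete_user: Callable[[str], None],
) -> bool:
    """Delete the relation user unless this unit is just being scaled away.

    Returns True if the user was deleted.
    """
    if not is_leader or "departing" in unit_data:
        # The relation looks broken only because this unit is going away
        # (for example during a scale-down), not because the client
        # application was removed, so the user must be kept.
        return False
    delete_user(relation_user)
    return True
```

Keeping the decision in the unit's own data means the broken handler does not need to query the database to tell a scale-down apart from a real relation removal.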

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
…ng-to-zero-units

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
…ng-to-zero-units

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
@marceloneppel marked this pull request as ready for review November 27, 2023 14:21
dragomirp previously approved these changes Nov 27, 2023
@@ -387,3 +388,43 @@ async def test_network_cut(
), "Connection is not possible after network restore"

await is_cluster_updated(ops_test, primary_name)


async def test_scaling_to_zero(ops_test: OpsTest, continuous_writes) -> None:
Contributor

We should probably move this test to another suite: the self-healing tests are required to be able to run against an existing cluster, and this test is potentially destructive.

Member Author

Good point. I created https://warthogs.atlassian.net/browse/DPE-3094 to handle that.

taurus-forever previously approved these changes Nov 30, 2023
Contributor

@taurus-forever left a comment

Thank you for the fix. I see more problematic corner cases here, but let's document the proper way to restore from zero units and test this implementation for a while before making further improvements. Thanks!
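The reviewer's "proper way" is not spelled out here, so this covers only the basic scale-down and scale-up cycle the PR exercises. It is a hypothetical helper that builds (but does not run) the Juju CLI commands, assuming a Kubernetes model where `juju scale-application <app> <count>` is available; the application name, model name and unit count are placeholders.

```python
import shlex
from typing import List, Optional


def scale_command(app: str, units: int, model: Optional[str] = None) -> List[str]:
    """Build a `juju scale-application` invocation for a Kubernetes model."""
    if units < 0:
        raise ValueError("unit count must be non-negative")
    cmd = ["juju", "scale-application"]
    if model:
        cmd += ["-m", model]
    cmd += [app, str(units)]
    return cmd


def restore_from_zero_plan(app: str, units: int, model: Optional[str] = None) -> List[str]:
    """Scale down to zero, then back up to the requested number of units."""
    return [shlex.join(scale_command(app, n, model)) for n in (0, units)]


for line in restore_from_zero_plan("postgresql-k8s", 3):
    print(line)
```

Returning argv lists keeps the commands easy to pass to `subprocess.run` later, while `shlex.join` (Python 3.8+) gives a copy-pasteable form for documentation.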

…ng-to-zero-units

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
@marceloneppel force-pushed the dpe-2728-handle-scaling-to-zero-units branch from cecd9ae to c064e25 November 30, 2023 12:05
@dragomirp self-requested a review November 30, 2023 13:03
@marceloneppel merged commit 31ca568 into main Nov 30, 2023
35 checks passed
@marceloneppel deleted the dpe-2728-handle-scaling-to-zero-units branch November 30, 2023 18:26
BON4 pushed a commit to BON4/postgresql-k8s-operator that referenced this pull request May 20, 2024
* Handle scaling to zero units

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Update units tests

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Remove unused constants

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

* Don't set unknown status

Signed-off-by: Marcelo Henrique Neppel <[email protected]>

---------

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
3 participants