[DPE-2728] Handle scaling to zero units #331
Conversation
```diff
@@ -387,3 +388,43 @@ async def test_network_cut(
     ), "Connection is not possible after network restore"

     await is_cluster_updated(ops_test, primary_name)


+async def test_scaling_to_zero(ops_test: OpsTest, continuous_writes) -> None:
```
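For reference, a minimal sketch of what a scale-to-zero integration test can look like with pytest-operator. This is not the exact test added in this PR: the application name, unit counts, and timeouts are illustrative assumptions.

```python
import pytest
from pytest_operator.plugin import OpsTest

APP_NAME = "postgresql-k8s"  # assumed application name


@pytest.mark.abort_on_fail
async def test_scaling_to_zero(ops_test: OpsTest, continuous_writes) -> None:
    """Scale the application to zero units and back up, and check it recovers."""
    # Scale down to zero units and wait until none are left.
    await ops_test.model.applications[APP_NAME].scale(0)
    await ops_test.model.block_until(
        lambda: len(ops_test.model.applications[APP_NAME].units) == 0, timeout=1000
    )

    # Scale back up and wait for the cluster to settle into an active state.
    await ops_test.model.applications[APP_NAME].scale(3)
    await ops_test.model.wait_for_idle(apps=[APP_NAME], status="active", timeout=1000)
```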
We should probably move this test to another suite: the self-healing tests need to be able to run against an existing cluster, and this is a potentially destructive test.
Good point. I created https://warthogs.atlassian.net/browse/DPE-3094 to handle that.
Thank you for the fix. I still see some problematic corner cases here, but let's document the proper way of restoring from zero units and test this implementation for some time before making further improvements. Tnx!
Force-pushed from cecd9ae to c064e25.
* Handle scaling to zero units
* Update units tests
* Remove unused constants
* Don't set unknown status

Signed-off-by: Marcelo Henrique Neppel <[email protected]>
Issue
When the cluster is scaled to 0 units and later scaled back up, it enters an error state. This happens because of conflicting unit data and a missing leader key in the Patroni K8s Endpoint: the leader unit tries to get the cluster info but is unable to do so.
Solution
Remove the unit data when scaling to zero, and add the leader key back if it is missing when scaling back up again. Also, don't write Unknown to the unit status when that is the unit's original status (setting it explicitly would trigger an error).
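Roughly, the idea looks like the sketch below. This is not the charm's actual code: the `database-peers` relation name, the Patroni Endpoints name, the `leader` annotation key, and the helper names are illustrative assumptions.

```python
from lightkube import Client
from lightkube.resources.core_v1 import Endpoints
from lightkube.types import PatchType
from ops.charm import CharmBase
from ops.model import StatusBase, UnknownStatus


class ScaleToZeroSketch(CharmBase):
    """Illustrative only; mirrors the approach described above, not the real charm code."""

    def _clear_unit_data_on_scale_to_zero(self) -> None:
        # Drop this unit's peer databag entries when the app is being scaled to zero,
        # so stale data does not conflict once the cluster is scaled back up.
        if self.app.planned_units() == 0:
            peers = self.model.get_relation("database-peers")  # assumed peer relation name
            if peers is not None:
                for key in list(peers.data[self.unit]):
                    del peers.data[self.unit][key]

    def _restore_leader_key_if_missing(self) -> None:
        # Patroni keeps the current leader as an annotation on a K8s Endpoints
        # resource; re-create it if it was lost while the cluster had zero units.
        client = Client()
        name = f"patroni-{self.app.name}"  # hypothetical Endpoints name
        endpoints = client.get(Endpoints, name=name, namespace=self.model.name)
        if "leader" not in (endpoints.metadata.annotations or {}):
            client.patch(
                Endpoints,
                name=name,
                namespace=self.model.name,
                obj={"metadata": {"annotations": {"leader": self.unit.name.replace("/", "-")}}},
                patch_type=PatchType.MERGE,
            )

    def _set_unit_status(self, status: StatusBase) -> None:
        # Charms are not allowed to set the "unknown" status explicitly, so keep
        # whatever status the unit already has instead of writing it back.
        if isinstance(status, UnknownStatus):
            return
        self.unit.status = status
```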
One more detail: the logic from the VM charm was copied to `src/relations/db.py` and `src/relations/postgresql_provider.py` to avoid deleting the relation user when the PostgreSQL charm is scaled down.
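A minimal sketch of that idea, assuming the ops framework; the `database-peers` relation name, the `departing` flag, and `_delete_relation_user` are hypothetical placeholders for the logic that lives in the files above:

```python
from ops.charm import CharmBase, RelationBrokenEvent, RelationDepartedEvent


class ProviderScaleDownSketch(CharmBase):
    """Illustrative only: skip user cleanup when the departure is caused by a scale-down."""

    def _on_relation_departed(self, event: RelationDepartedEvent) -> None:
        # If this unit itself is the departing one, the application is scaling down,
        # so record a flag for the relation-broken handler.
        if event.departing_unit == self.unit:
            peers = self.model.get_relation("database-peers")  # assumed peer relation name
            if peers is not None:
                peers.data[self.unit]["departing"] = "True"

    def _on_relation_broken(self, event: RelationBrokenEvent) -> None:
        peers = self.model.get_relation("database-peers")
        if peers is not None and peers.data[self.unit].get("departing"):
            # Scale-down in progress: keep the relation user for the remaining units.
            return
        self._delete_relation_user(event.relation)  # hypothetical helper

    def _delete_relation_user(self, relation) -> None:
        # Placeholder: in the real charm this drops the PostgreSQL user created
        # for the relation when the relation is actually removed.
        ...
```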