haumea/zrepl: reduce snapshot count #447

Draft
vcunat wants to merge 5 commits into master

Conversation

vcunat commented Jun 23, 2024

Reduce snapshot count. We repeatedly run out of space on Haumea.

@@ -33,23 +33,27 @@
  };
  pruning = {
    keep_sender = [
      { type = "not_replicated"; }

vcunat commented

Oops, I expect we need to have the regex = part here as well.

vcunat commented

Or maybe not. Their example set doesn't have it:
https://zrepl.github.io/configuration/prune.html#pruning-policies

vcunat commented Jun 27, 2024

Maybe we should drop this line anyway. If the remote end is down, we probably want to keep pruning the sender to reduce the risk of running out of space. It might also reduce the time to sync up once the receiver becomes reachable again.
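
For reference, a hedged sketch of what both variants could look like in the module's Nix-style settings, mirroring the zrepl docs example linked above (the last_n/grid rules and their values here are illustrative, not this repo's actual config):

    pruning = {
      keep_sender = [
        # keeps anything not yet replicated; the docs example gives it no regex
        { type = "not_replicated"; }
        # rules like last_n / grid do take a regex to match only zrepl's snapshots
        { type = "last_n"; count = 10; regex = "^zrepl_.*"; }
      ];
      keep_receiver = [
        { type = "grid"; grid = "1x1h(keep=all) | 24x1h | 14x1d"; regex = "^zrepl_.*"; }
      ];
    };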

vcunat commented Jun 24, 2024

Let me dump a bit about why.

The problematic situations that we see involve lots of data unique to (some) in-between snapshots, i.e. (manually) dropping some of those snapshots could release lots of space. Consequently:

  • making/transferring snapshots less often should decrease the total transfer amount and reduce this pressure on the remote
  • larger spacing between snapshots kept on Haumea should decrease the total space needed there, perhaps even if we didn't significantly decrease the total time span covered by snapshots

EDIT:

  • an interesting note: those problematic snapshots also seem to have a larger total size (i.e. even if we didn't have any snapshotting, the disk usage would be larger at those points)

vcunat commented Jun 25, 2024

🤔 As for the backup location(s), it feels wasteful to keep every week uniformly for a year. Can you see any reason for it? I'd intuitively again go for some exponentially increasing spacing. I assume we can afford more space there than on Haumea itself, so e.g. this slower grid?

              "2x1h"
              "2x2h"
              "2x4h"
              "4x8h"
              # At this point the grid spans 2 days (-2h) by 10 snapshots.
              # (See note above about 8h -> 24h.)
              "2x1d"
              "2x2d"
              "2x4d"
              "2x8d"
              "2x16d"
              "2x32d"
              "2x64d"
              "2x128d"
              # At this point we keep 26 snapshots spanning 384--512 days (depends on moment),
              # with exponentially increasing spacing (almost).

Perhaps worth noting (per the docs) that the specified intervals do not overlap. All the intervals are stacked in the specified order and multiplicity, forming a fixed grid.
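
A quick sanity check of those span numbers, as a hedged sketch in plain Nix (not part of the config; nothing here comes from the actual file):

    # evaluate with e.g.: nix-instantiate --eval --strict span.nix  (file name arbitrary)
    let
      hoursPart = 2 * 1 + 2 * 2 + 2 * 4 + 4 * 8;             # first 10 buckets: 46 h (2 days minus 2 h)
      daysPart = 2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128);   # remaining 16 buckets: 510 days
    in {
      buckets = 10 + 16;                                      # at most 26 snapshots kept
      maxSpanDays = daysPart + hoursPart / 24;                # ~512 days when the oldest bucket is occupied
      minSpanDays = daysPart + hoursPart / 24 - 128;          # ~384 days: the oldest kept snapshot may sit
                                                              #   anywhere within the final 128-day bucket
    }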

"1x2h"
"1x4h"
# "grid" acts weird if an interval isn't a whole-number multiple
# of the previous one, so we jump from 8h to 24h

vcunat commented


Not sure if it's worth trying to explain the weirdness. I base it not on actual experience but on the definition in their docs – and on how such a model then behaves when running continuously.
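
Reading the docs that way, a hedged illustration of the suspected constraint (not verified against zrepl itself):

    # fine: each interval is a whole-number multiple of the previous one
    #   ... "2x4h" "4x8h" "2x1d" ...    # 8h = 2 * 4h, 1d = 24h = 3 * 8h
    # suspect: bucket boundaries would not line up on one fixed grid
    #   ... "2x4h" "4x8h" "2x20h" ...   # 20h is not a multiple of 8h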

Reduce snapshot count. We repeatedly run out of space on Haumea.
1/100 of defaults seemed excessive and was suspected to cause issues; changed to 1/10 of defaults.
This should be fine, as we have a faster connection to the receiver, and the churn doesn't seem so significant anymore anyway.