Buckets-based hashing for gentler resource utilization #60

Open
jeromegn opened this issue Sep 20, 2023 · 1 comment
Comments

@jeromegn
Member

Corrosion currently performs full table scans, hashing the contents of every table, to compare consistency between actors.

This is an expensive operation, and it becomes more expensive as the data grows.

The new plan would be to create buckets based on primary-key hashes:

  • When Corrosion has written changes, hash each unique primary key touched by those changes down to a u64
  • Compute <hash> - (<hash> % <bucket size>) to determine the bucket "id"
  • Store the ID in a table linking table name, primary key and bucket ID
  • Queue a background job to update the bucket's hash
  • Periodically hash all bucket hashes together (or maybe update in place with XOR?) to determine the full consistency hash
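The steps above can be sketched in Rust as a small in-memory model. Everything here is hypothetical: `BucketIndex`, `record_write`, `flush_dirty` and `consistency_hash` are illustrative names, plain maps stand in for the SQLite tables, and FNV-1a is an assumed choice of hash (the issue doesn't name one). Whatever hash is used must be deterministic across nodes and builds, which rules out `std`'s `DefaultHasher`, whose algorithm is unspecified.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// FNV-1a (64-bit): a simple, deterministic hash, so every node derives the
/// same u64 for the same bytes. (Assumed choice; the issue names no hash.)
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Bucket ID as described in the issue: <hash> - (<hash> % <bucket size>),
/// i.e. the hash rounded down to a multiple of `bucket_size`.
pub fn bucket_id(pk_hash: u64, bucket_size: u64) -> u64 {
    pk_hash - (pk_hash % bucket_size)
}

/// Append a length-prefixed field so concatenated fields can't be confused.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
    buf.extend_from_slice(field);
}

type RowKey = (String, Vec<u8>); // (table name, encoded primary key)

/// Hypothetical in-memory stand-in for the bucket tables.
pub struct BucketIndex {
    bucket_size: u64,
    pk_to_bucket: HashMap<RowKey, u64>, // the linking table
    bucket_rows: BTreeMap<u64, BTreeMap<RowKey, u64>>, // bucket -> row -> row hash
    dirty: BTreeSet<u64>, // queued background work
    bucket_hashes: BTreeMap<u64, u64>, // bucket -> bucket hash
}

impl BucketIndex {
    pub fn new(bucket_size: u64) -> Self {
        assert!(bucket_size > 0, "bucket size must be non-zero");
        BucketIndex {
            bucket_size,
            pk_to_bucket: HashMap::new(),
            bucket_rows: BTreeMap::new(),
            dirty: BTreeSet::new(),
            bucket_hashes: BTreeMap::new(),
        }
    }

    /// After a write: hash the PK, assign its bucket, record it, queue the bucket.
    pub fn record_write(&mut self, table: &str, pk: &[u8], row_contents: &[u8]) {
        let id = bucket_id(fnv1a64(pk), self.bucket_size);
        let key: RowKey = (table.to_string(), pk.to_vec());
        self.pk_to_bucket.insert(key.clone(), id);
        self.bucket_rows.entry(id).or_default().insert(key, fnv1a64(row_contents));
        self.dirty.insert(id);
    }

    /// Look up a row's bucket in the linking table.
    pub fn bucket_of(&self, table: &str, pk: &[u8]) -> Option<u64> {
        self.pk_to_bucket.get(&(table.to_string(), pk.to_vec())).copied()
    }

    /// The "background job": re-hash only the buckets queued since the last run.
    /// Rows are visited in sorted order, so the result doesn't depend on write order.
    pub fn flush_dirty(&mut self) -> usize {
        let dirty = std::mem::take(&mut self.dirty);
        for &id in &dirty {
            let mut buf = Vec::new();
            for ((table, pk), row_hash) in self.bucket_rows.get(&id).into_iter().flatten() {
                push_field(&mut buf, table.as_bytes());
                push_field(&mut buf, pk);
                buf.extend_from_slice(&row_hash.to_le_bytes());
            }
            self.bucket_hashes.insert(id, fnv1a64(&buf));
        }
        dirty.len()
    }

    /// Full consistency hash: one hash over all (bucket ID, bucket hash) pairs.
    pub fn consistency_hash(&self) -> u64 {
        let mut buf = Vec::new();
        for (id, h) in &self.bucket_hashes {
            buf.extend_from_slice(&id.to_le_bytes());
            buf.extend_from_slice(&h.to_le_bytes());
        }
        fnv1a64(&buf)
    }
}
```

Two nodes that apply the same writes in a different order end up with the same `consistency_hash()`, and a later write only causes the one bucket it touched to be re-hashed. Deletes are not modeled in this sketch.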

This has several benefits:

  • No need to scan full tables
  • Should still provide a valid, consistent hash across nodes
  • Can be done continuously
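The XOR option from the plan is what would make a continuously updated top-level hash cheap: XOR is commutative and self-inverse, so when one bucket's hash changes, the digest can be patched in O(1) by XOR-ing out the old contribution and XOR-ing in the new one, with no rescan and no dependence on update order. A hypothetical sketch follows (`XorDigest` and `mix` are illustrative names). Each (bucket ID, bucket hash) pair is passed through a mixing function (SplitMix64's mixing step, an assumed choice) before being XOR-ed, so that buckets with identical hashes don't simply cancel each other out.

```rust
use std::collections::HashMap;

/// SplitMix64 mixing step: a cheap, well-distributed 64-bit mixer.
fn splitmix64(mut z: u64) -> u64 {
    z = z.wrapping_add(0x9e37_79b9_7f4a_7c15);
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    z ^ (z >> 31)
}

/// Contribution of one bucket; depends on both its ID and its hash.
pub fn mix(bucket_id: u64, bucket_hash: u64) -> u64 {
    splitmix64(bucket_id ^ splitmix64(bucket_hash))
}

/// Hypothetical incrementally maintained consistency digest.
#[derive(Default)]
pub struct XorDigest {
    buckets: HashMap<u64, u64>, // bucket ID -> current bucket hash
    digest: u64,                // XOR of mix(id, hash) over all buckets
}

impl XorDigest {
    /// O(1) update: XOR out the bucket's old contribution, XOR in the new one.
    pub fn set_bucket(&mut self, id: u64, new_hash: u64) {
        if let Some(old) = self.buckets.insert(id, new_hash) {
            self.digest ^= mix(id, old);
        }
        self.digest ^= mix(id, new_hash);
    }

    pub fn digest(&self) -> u64 {
        self.digest
    }

    /// Full O(n) recomputation, useful for checking the incremental value.
    pub fn recompute(&self) -> u64 {
        self.buckets.iter().fold(0, |acc, (&id, &h)| acc ^ mix(id, h))
    }
}
```

XOR combining is well suited to detecting accidental divergence between nodes, but it is linear and not collision-resistant against a deliberate adversary; a periodic full hash over all bucket hashes avoids that trade-off at the cost of a pass over the bucket table.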

I believe this depends on vlcn-io/cr-sqlite#344, so that the pk <-> bucket ID table doesn't grow too large. We'd be able to use cr-sqlite's encoded primary keys, which are varints and therefore much smaller.
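To illustrate why varint-encoded keys would keep the pk <-> bucket ID table small, here is a generic unsigned LEB128 varint (7 payload bits per byte, with the high bit as a continuation flag). This shows the general technique only and is not necessarily the exact encoding cr-sqlite uses; `encode_varint` and `decode_varint` are hypothetical names.

```rust
/// Unsigned LEB128: 7 data bits per byte, high bit set on every byte but the last.
pub fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode one varint from the front of `bytes`, returning (value, bytes consumed).
/// Returns None if the input ends mid-varint or runs past 10 bytes.
/// (Sketch only: it doesn't reject non-canonical or overflowing 10th bytes.)
pub fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut v: u64 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if i >= 10 {
            return None;
        }
        v |= ((b & 0x7f) as u64) << (7 * i);
        if b & 0x80 == 0 {
            return Some((v, i + 1));
        }
    }
    None
}
```

A small integer key such as 42 encodes to a single byte, versus 8 bytes for a fixed-width u64, and 300 takes two bytes (`0xac 0x02`); only keys near the top of the u64 range need the full 10 bytes.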

@jeromegn
Member Author

Update: this should be implemented directly in cr-sqlite, behind a feature flag or something similar.
