lookaside for primary keys #344

Closed
wants to merge 7 commits into from

tantaman
Collaborator

@tantaman tantaman commented Sep 6, 2023

Primary keys for CRRs are generally going to be UUIDs or some other large-ish thing.

Moving them out into a lookaside lets us keep a variable length encoded int in the clock tables instead of the full primary key. This is useful since clock tables keep an entry per cell rather than a single entry per row.

In tables with fewer than 65k entries this saves 14 bytes per clock entry. In tables not exceeding 4 billion entries, it saves 12 bytes. (Both figures assume a 16-byte UUID key.)

(aside: we can use 64 bit ints in many cases for primary keys in cases where we can segment the population similar to https://mariadb.com/kb/en/uuid_short/)


  • trigger function
  • merge function
  • pull changesets

We can do this since we're moving all user-controlled columns out of the clock tables and into `[table_name]__crsql_pks`.
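A minimal sketch of the lookaside idea, driven from Python's `sqlite3` module so it can be run as-is. The table `todo`, the column names, and the clock table layout are assumptions for illustration only, not the schema this PR produces.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookaside: one row per primary key of the user table "todo".
    CREATE TABLE todo__crsql_pks (
        key INTEGER PRIMARY KEY,   -- small integer handed out by SQLite
        id  BLOB NOT NULL UNIQUE   -- the user-provided primary key (a UUID here)
    );
    -- Hypothetical clock table: one row per cell, keyed by the small integer.
    CREATE TABLE todo__crsql_clock (
        key         INTEGER NOT NULL,
        col_name    TEXT    NOT NULL,
        col_version INTEGER NOT NULL,
        PRIMARY KEY (key, col_name)
    ) WITHOUT ROWID;
""")

pk = uuid.uuid4().bytes  # 16 bytes, stored once in the lookaside
cur = conn.execute("INSERT INTO todo__crsql_pks (id) VALUES (?)", (pk,))
key = cur.lastrowid

# Each cell gets its own clock row, but only the small integer is repeated.
for col in ("title", "done", "owner"):
    conn.execute("INSERT INTO todo__crsql_clock VALUES (?, ?, 1)", (key, col))

rows = conn.execute(
    "SELECT key, col_name FROM todo__crsql_clock ORDER BY col_name"
).fetchall()
```

The UUID is written once, while each of the three clock rows carries only the integer key, which is where the per-cell savings come from.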

Clock tables currently still use user-provided columns as their primary keys, so we can test this change on its own before making the sweeping change of using the lookaside table for the clock table primary key.

We may abandon triggers altogether in the near future and either:

- use the preupdate hook
- use the update hook
- strip trigger bodies to only invoke a single function or virtual table

The problem is trigger constraints.

We need the key from the lookup table. We can't set a variable in a trigger so I was going to do:

```sql
-- a scalar subquery passed as a function argument needs its own parentheses
SELECT set_value((SELECT key FROM lookaside WHERE ...));
```

and then use `get_value` later.

Problem is, the value may not get set: if the lookaside has no row for that primary key yet, the subquery returns nothing. So we need to conditionally insert and then set the value again after the insert.

Further problem is we can't do that in a trigger.
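For illustration, the "conditionally insert, then read the key back" step is easy to express once it lives in a function rather than a trigger body. This is a sketch in Python over `sqlite3`; the `lookaside` table shape and the helper name `get_or_create_key` are assumptions, not code from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookaside table: small integer key plus the user's primary key.
conn.execute(
    "CREATE TABLE lookaside (key INTEGER PRIMARY KEY, pk TEXT NOT NULL UNIQUE)"
)

def get_or_create_key(conn, pk):
    """Return the integer key for pk, creating the lookaside row if missing."""
    # No-op when the row already exists (the UNIQUE constraint on pk is hit).
    conn.execute("INSERT OR IGNORE INTO lookaside (pk) VALUES (?)", (pk,))
    # The row is now guaranteed to exist, so this lookup always returns a key.
    row = conn.execute("SELECT key FROM lookaside WHERE pk = ?", (pk,)).fetchone()
    return row[0]

first = get_or_create_key(conn, "uuid-a")
again = get_or_create_key(conn, "uuid-a")
other = get_or_create_key(conn, "uuid-b")
```

Calling the helper twice with the same primary key returns the same integer, and a new primary key gets a new one.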

So the new plan...

Stop using triggers, mostly.

Trigger bodies will now be:

```sql
VALUES (process_insert/update/delete('table', 'column', value, pks...));
```

and the `process_*` function(s) will take care of it.
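A rough sketch of a trigger body that does nothing but call one function, using Python's `sqlite3.Connection.create_function` to stand in for the extension's native function. The `todo` table, the trigger name, and the argument list are assumptions; the real `process_*` signatures may differ, and this stand-in only records its arguments.

```python
import sqlite3

calls = []  # records every invocation so the effect is observable

def process_insert(table, *pks):
    # Stand-in for the extension function: a real one would resolve the
    # lookaside key and write clock rows. Here we only record the call.
    calls.append((table, pks))
    return None

conn = sqlite3.connect(":memory:")
# narg=-1 lets the function accept a variable number of primary key columns.
conn.create_function("process_insert", -1, process_insert)
conn.executescript("""
    CREATE TABLE todo (id TEXT PRIMARY KEY, title TEXT);
    -- The trigger body is a single function call; all logic lives in the function.
    CREATE TRIGGER todo__after_insert AFTER INSERT ON todo
    BEGIN
        SELECT process_insert('todo', NEW.id);
    END;
""")
conn.execute("INSERT INTO todo VALUES ('a1', 'write docs')")
```

Because the function runs as ordinary host code, it is free to do the conditional insert and lookup that a trigger body cannot sequence.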

This means we should re-work our statement caching to:
1. No longer require an indirect lookup via b-tree
2. Be stored on `table_info` structs
3. Be re-computed after an alter of that specific `CRR`

These statements should probably be lazily computed to reduce extension load time, since serverless (ugh) environments may reload the extension between every request.
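The caching shape listed above could look roughly like the following. This is only a Python sketch: `TableInfo`, `stmt`, and `on_alter` are hypothetical names, and cached SQL strings stand in for the prepared statements the extension would actually hold.

```python
from dataclasses import dataclass, field

@dataclass
class TableInfo:
    name: str
    pk_cols: list
    # Per-table cache: nothing is built at load time.
    _stmts: dict = field(default_factory=dict)

    def stmt(self, kind):
        """Return the cached statement for kind, building it on first use."""
        if kind not in self._stmts:
            self._stmts[kind] = self._build(kind)
        return self._stmts[kind]

    def _build(self, kind):
        if kind == "select_key":
            where = " AND ".join(f'"{c}" = ?' for c in self.pk_cols)
            return f'SELECT key FROM "{self.name}__crsql_pks" WHERE {where}'
        raise KeyError(kind)

    def on_alter(self, pk_cols):
        """Drop this table's cached statements after its schema changes."""
        self.pk_cols = list(pk_cols)
        self._stmts.clear()

info = TableInfo("todo", ["id"])
select_key = info.stmt("select_key")  # built here, on first use
```

Holding the cache on the struct removes the separate lookup, and clearing it in `on_alter` limits recomputation to the table that changed.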
@jeromegn
Contributor

Will this be a breaking change or is there an easy non-breaking upgrade path? It's fine if not, but I'll need to know since we released 0.1.0 of Corrosion and I want to upgrade to this as soon as possible :)

@tantaman
Collaborator Author

tantaman commented Sep 22, 2023 via email
