Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pubsub: fix a subtle race in catch up logic for barely active subs #247

Merged
merged 1 commit into from
Jul 30, 2024

Conversation

pborzenkov
Copy link
Contributor

The order of events processing when Corrosion is handling a change is:

  • determine changes
  • send changes to the channel
  • commit changes to sub DB

When Corrosion client is catching up it subscribes to changes sent to
the channel and queries the full sub DB. It's possibly that the query
is run before the changes are committed, but after they've been sent
to the channel, so what happens is:

  • A change with ID N is processed by Corrosion and sent to the channel
  • catch up logic starts and subscribes to changes, but doesn't see
    this change
  • sub DB is read and sent to the client, last committed change ID is
    N - 1, so EndOfQuery{change_id: N - 1} is sent to the client
  • corrosion commits the last update so the last change in the DB becomes
    N
  • catch up handler has min_change_id = N - 1 + 1, so when it compares
    last sent change ID (N) with min_change_id (N) it ignores it becase
    the comparison is strictly greater.

min_change_id is what we expect to receive, so the fix is trivial -
comparison should be greater or equal.

The order of events processing when Corrosion is handling a change is:
  - determine changes
  - send changes to the channel
  - commit changes to sub DB

When Corrosion client is catching up it subscribes to changes sent to
the channel and queries the full sub DB. It's possibly that the query
is run *before* the changes are committed, but after they've been sent
to the channel, so what happens is:

  * A change with ID N is processed by Corrosion and sent to the channel
  * catch up logic starts and subscribes to changes, but doesn't see
    this change
  * sub DB is read and sent to the client, last committed change ID is
    N - 1, so EndOfQuery{change_id: N - 1} is sent to the client
  * corrosion commits the last update so the last change in the DB becomes
    N
  * catch up handler has min_change_id = N - 1 + 1, so when it compares
    last sent change ID (N) with min_change_id (N) it ignores it becase
    the comparison is strictly greater.

min_change_id is what we expect to receive, so the fix is trivial -
comparison should be greater or equal.
@jeromegn jeromegn merged commit f93048a into main Jul 30, 2024
4 checks passed
@jeromegn jeromegn deleted the fix-catchup-v2 branch July 30, 2024 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants