Tracking: Add ETL to HashingStages & HistoryStages #6909

Closed
2 tasks done
joshieDo opened this issue Mar 1, 2024 · 1 comment
Labels
A-staged-sync Related to staged sync (pipelines and stages) C-enhancement New feature or request M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity

Comments

@joshieDo
Collaborator

joshieDo commented Mar 1, 2024

Feature

We should use the ETL Collector in these stages to reduce both first-sync duration and initial write amplification. These stages use hashes as keys, which leads to performance and storage degradation when writing with tx.insert (the current approach) instead of tx.append (with ETL). More here.
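To illustrate why hashed keys rule out tx.append unless they are sorted first, here is a minimal sketch. It assumes a toy `toy_hash` built on the standard library's `DefaultHasher` (not the cryptographic hash the stages use) and a hypothetical helper `out_of_order` that counts keys which could not be appended to the end of a sorted table.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash for illustration only (assumption: any well-mixing hash
/// shows the same effect; the real stages use a cryptographic hash).
fn toy_hash(n: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    n.hash(&mut hasher);
    hasher.finish()
}

/// Counts keys that are smaller than an earlier key, i.e. keys that would
/// need a mid-table insert instead of an append at the end.
fn out_of_order(keys: &[u64]) -> usize {
    let mut max_seen = None;
    let mut count = 0;
    for &key in keys {
        match max_seen {
            Some(max) if key < max => count += 1,
            _ => max_seen = Some(key),
        }
    }
    count
}

fn main() {
    // Sequential inputs, e.g. entries in the order a stage walks them.
    let plain: Vec<u64> = (0..1000).collect();
    // Hashing destroys that ordering.
    let hashed: Vec<u64> = plain.iter().map(|&n| toy_hash(n)).collect();
    // Sorting first (what an ETL step does) restores an append-only order.
    let mut sorted = hashed.clone();
    sorted.sort_unstable();

    println!("plain  out of order: {}", out_of_order(&plain));
    println!("hashed out of order: {}", out_of_order(&hashed));
    println!("sorted out of order: {}", out_of_order(&sorted));
}
```

The hashed sequence needs mid-table inserts for most keys, while the sorted one needs none, which is the property tx.append relies on.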

The flow is very straightforward.

  • Insert all data into a Collector.
    • commit_threshold now becomes the maximum chunk size to hold in memory. More here. This also means that we commit all data in one go.
  • Iterate over the collector and:
    • if first sync: use tx.append
    • else: use tx.insert
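
The flow above can be sketched roughly as follows. This is a self-contained toy, not reth's API: `Table`, `Collector`, `load`, and `chunk_limit` are hypothetical stand-ins (`chunk_limit` plays the role of commit_threshold), chunks stay in memory, and keys are assumed to be unique within one collector.

```rust
/// Toy stand-in for a sorted database table (hypothetical, not reth's API).
#[derive(Default)]
struct Table {
    rows: Vec<(u64, u64)>, // kept sorted by key
}

impl Table {
    /// Cheap path: only valid when `key` is greater than every stored key.
    fn append(&mut self, key: u64, value: u64) -> Result<(), String> {
        if let Some(&(last, _)) = self.rows.last() {
            if key <= last {
                return Err(format!("append out of order: {} <= {}", key, last));
            }
        }
        self.rows.push((key, value));
        Ok(())
    }

    /// General path: finds the position for `key`, overwriting a duplicate.
    fn insert(&mut self, key: u64, value: u64) {
        match self.rows.binary_search_by_key(&key, |&(k, _)| k) {
            Ok(i) => self.rows[i].1 = value,
            Err(i) => self.rows.insert(i, (key, value)),
        }
    }
}

/// Toy collector: buffers entries and sorts each chunk once it is full.
struct Collector {
    chunk_limit: usize,
    buffer: Vec<(u64, u64)>,
    chunks: Vec<Vec<(u64, u64)>>,
}

impl Collector {
    fn new(chunk_limit: usize) -> Self {
        Self { chunk_limit, buffer: Vec::new(), chunks: Vec::new() }
    }

    fn insert(&mut self, key: u64, value: u64) {
        self.buffer.push((key, value));
        if self.buffer.len() >= self.chunk_limit {
            self.flush();
        }
    }

    /// Sorts the current buffer and stores it as a finished chunk.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let mut chunk = std::mem::take(&mut self.buffer);
        chunk.sort_by_key(|&(k, _)| k);
        self.chunks.push(chunk);
    }

    /// Returns all entries in key order. For brevity this re-sorts the
    /// concatenated chunks instead of merging them.
    fn into_sorted(mut self) -> Vec<(u64, u64)> {
        self.flush();
        let mut all: Vec<(u64, u64)> = self.chunks.into_iter().flatten().collect();
        all.sort_by_key(|&(k, _)| k);
        all
    }
}

/// Iterates the collector: append on first sync, insert otherwise.
fn load(table: &mut Table, collector: Collector, first_sync: bool) -> Result<(), String> {
    for (key, value) in collector.into_sorted() {
        if first_sync {
            table.append(key, value)?;
        } else {
            table.insert(key, value);
        }
    }
    Ok(())
}

fn main() {
    let mut table = Table::default();

    // First sync: the table is empty, so sorted entries can all be appended.
    let mut collector = Collector::new(2);
    for (key, value) in [(42, 1), (7, 2), (99, 3), (13, 4)] {
        collector.insert(key, value);
    }
    load(&mut table, collector, true).expect("sorted unique keys append cleanly");

    // Later run: new keys may land between existing ones, so use insert.
    let mut collector = Collector::new(2);
    collector.insert(50, 5);
    load(&mut table, collector, false).expect("insert path does not fail");

    let keys: Vec<u64> = table.rows.iter().map(|&(k, _)| k).collect();
    println!("{:?}", keys);
}
```

The append path is only safe because the collector hands entries over already sorted and the table starts empty; on later runs the table holds earlier keys, so the sketch falls back to insert.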

Example usage on:
TransactionLookupStage (went from >5h to 20-30 min, and from 157 GiB to 107 GiB)
HeaderStage

Stages:

Additional context

No response

@joshieDo joshieDo added C-enhancement New feature or request A-staged-sync Related to staged sync (pipelines and stages) labels Mar 1, 2024
@emhane emhane assigned emhane and unassigned emhane Mar 3, 2024
Contributor

This issue is stale because it has been open for 21 days with no activity.

@github-actions github-actions bot added the S-stale This issue/PR is stale and will close with no further activity label Mar 29, 2024
@emhane emhane added M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity and removed S-stale This issue/PR is stale and will close with no further activity labels Mar 29, 2024