History pruning takes a very long time #8995

Closed
shekhirin opened this issue Jun 20, 2024 · 6 comments

@shekhirin
Collaborator

shekhirin commented Jun 20, 2024

Problem

Sender Recovery pruning runs before Account History pruning, and we don't start pruning the next segment until the previous one is completed. Once Sender Recovery is finished (the chart on the right: you can see that it stops taking significant time and completes instantly), the pruner starts calling Account History pruning (chart on the right, spike in time).

image

The problem is that while we prune Sender Recovery (and do not prune Account History), the Account History tables accumulate data, so pruning them takes longer once Sender Recovery is done.
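To make the accumulation concrete, here is a toy model in Rust. It is only a sketch under stated assumptions, not reth's pruner: the `Segment` struct, the `backlog_when_first_done` function, and the per-run `budget` are all hypothetical names invented for illustration. It assumes each pruner run can delete a fixed number of entries and that the second segment is not touched until the first one is fully caught up.

```rust
/// Hypothetical toy segment: entries waiting to be pruned plus entries
/// that become prunable on every pruner run. Not a reth type.
#[derive(Debug, Clone, Copy)]
struct Segment {
    backlog: u64,
    inflow: u64,
}

/// Simulates strictly ordered pruning: every run spends its whole budget on
/// `first`, and `second` is skipped until `first` has nothing left to prune.
/// Returns (runs needed to finish `first`, backlog of `second` at that point).
fn backlog_when_first_done(mut first: Segment, mut second: Segment, budget: u64) -> (u32, u64) {
    // Without this, `first` would never catch up and the loop would not end.
    assert!(budget > first.inflow, "budget must exceed the inflow of the first segment");

    let mut runs = 0;
    loop {
        runs += 1;

        // New prunable data arrives in both segments on every run.
        first.backlog += first.inflow;
        second.backlog += second.inflow;

        // Only the first segment is pruned while it still has a backlog.
        let pruned = first.backlog.min(budget);
        first.backlog -= pruned;

        if first.backlog == 0 {
            return (runs, second.backlog);
        }
    }
}
```

With a first segment of 1000 entries (inflow 10 per run), a budget of 110 per run, and a second segment that gains 50 entries per run, the first segment finishes after 10 runs, by which time the second one has grown from 0 to 500 entries. That growth is the spike described above.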

Solution

  1. Fix history pruning, so that we don't need to walk all indices every pruner run
  2. Re-order segments, so that Account and Storage History segments go before Sender Recovery
  3. segment ring for fair + performant pruning on pruning interrupt #7343
  4. Since freelist isn't an issue anymore, on node startup prune what's left unpruned in the database (opt-out via a flag)
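The "segment ring" idea from item 3 could look roughly like the sketch below. This is an assumption about the general shape of such a scheduler, not the design from #7343 or reth's actual code: `SegmentRing`, its fields, and the entry-count `budget` are hypothetical. The ring remembers where the previous run stopped, so a segment that exhausts the budget does not starve the ones after it.

```rust
/// Hypothetical ring of per-segment backlogs; not reth's actual types.
struct SegmentRing {
    /// Number of entries still waiting to be pruned in each segment.
    backlogs: Vec<u64>,
    /// Index of the segment the next run starts with.
    cursor: usize,
}

impl SegmentRing {
    fn new(backlogs: Vec<u64>) -> Self {
        Self { backlogs, cursor: 0 }
    }

    /// Spends up to `budget` deletions, starting at the cursor and visiting
    /// each segment at most once. Returns the entries pruned per segment.
    fn run(&mut self, mut budget: u64) -> Vec<u64> {
        let n = self.backlogs.len();
        let mut pruned = vec![0; n];
        let start = self.cursor;

        for step in 0..n {
            if budget == 0 {
                break; // interrupted: the next run resumes at `self.cursor`
            }
            let i = (start + step) % n;
            let take = self.backlogs[i].min(budget);
            self.backlogs[i] -= take;
            budget -= take;
            pruned[i] = take;
            // The next run starts after the last segment we touched, so a
            // large segment cannot monopolize consecutive runs.
            self.cursor = (i + 1) % n;
        }
        pruned
    }
}
```

For example, with backlogs of 300, 50, and 50 entries and a budget of 100 per run, the first run prunes 100 entries from the first segment only, and the second run starts at the second segment and clears both small segments before the large one gets another turn.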
@shekhirin shekhirin added C-bug An unexpected or incorrect behavior A-pruning Related to pruning or full node labels Jun 20, 2024
@shekhirin shekhirin self-assigned this Jun 20, 2024
@deromik

deromik commented Jun 23, 2024

Faced the same on 06/21
image

At exactly that time I lost 4 nodes to a state root mismatch on rc2 and had to resync them on 06/21.
I lost 2 more nodes on rc2 for the same reason today and am resyncing them.
I suspect the pruning time is the root cause.

the logs are full of 'Hook is in progress, delaying forkchoice update. This may affect the performance of your node as a validator.' while the pruner is in progress

image

nodes are 980pro/990pro, ryzen 7900x/7950x

@lodotek

lodotek commented Jul 2, 2024

Is there any workaround for now? I brought up a new node, and it seems to be taking significantly longer to get fully synced and ready to perform than every other EL client I've used so far. Should I just be patient and let it finish, or is there a setting I should change?
image

@Rjected
Member

Rjected commented Jul 2, 2024

> Is there any workaround for now? I brought up a new node, and it seems to be taking significantly longer to get fully synced and ready to perform than every other EL client I've used so far. Should I just be patient and let it finish, or is there a setting I should change?

Is this referring to initial sync time, or time to prune?

@lodotek

lodotek commented Jul 2, 2024

> > Is there any workaround for now? I brought up a new node, and it seems to be taking significantly longer to get fully synced and ready to perform than every other EL client I've used so far. Should I just be patient and let it finish, or is there a setting I should change?
>
> Is this referring to initial sync time, or time to prune?

Sorry - initial sync time. I mean it took a very long time to sync (seemingly longer than geth and NM). And once it synced, it seems to have automatically started pruning, which also seems to be going extremely slowly ¯_(ツ)_/¯
image
image

@Rjected
Member

Rjected commented Jul 2, 2024

> Sorry - initial sync time. I mean it took a very long time to sync (seemingly longer than geth and NM)

This is because reth does not have a "snapshot sync" method, and executes the entire history by default, whereas geth and NM do snap sync by default iirc. The pruning behavior is more relevant here though

@shekhirin
Collaborator Author

Fixed by #9312
