Beta unable to resume Sync if stopped #7039

Closed
argakiig opened this issue Mar 7, 2024 · 6 comments
Labels
C-bug An unexpected or incorrect behavior


argakiig (Contributor) commented Mar 7, 2024

Describe the bug

When starting a sync attempt with the new db, specifically on the 0.2.0-beta.1 tag, everything works fine.
Logs are as expected:

2024-03-07T21:45:51.088706Z DEBUG provider::static_file: Commit segment=Transactions path="/root/.local/share/reth/mainnet/static_files/static_file_transactions_6500000_6999999" duration=8.485221ms
2024-03-07T21:45:51.089378Z DEBUG provider::static_file: Commit segment=Receipts path="/root/.local/share/reth/mainnet/static_files/static_file_receipts_0_499999" duration=574.081µs
2024-03-07T21:45:51.089980Z DEBUG provider::static_file: Commit segment=Headers path="/root/.local/share/reth/mainnet/static_files/static_file_headers_19000000_19499999" duration=530.265µs
2024-03-07T21:45:51.091622Z DEBUG storage::db::mdbx: Commit total_duration=1.558627ms commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 2, audit: 0, write: 0, sync: 96, ending: 0, whole: 99, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-03-07T21:45:51.091700Z  INFO reth_node_core::events::node: Executing stage pipeline_stages=2/12 stage=Bodies checkpoint=6955000 target=19385733
2024-03-07T21:45:51.091715Z DEBUG sync::stages::bodies: Commencing sync stage_progress=6955001 target=19385733 start_tx_id=365799100

But after stopping the node and starting it back up again, it is unable to connect to Prysm, and Prysm is unable to connect to it.
The logs change as follows:

2024-03-07T23:31:30.384083Z DEBUG storage::db::mdbx: Commit total_duration=1.347µs commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 0, audit: 0, write: 0, sync: 0, ending: 0, whole: 1, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=true
2024-03-07T23:31:30.527650Z DEBUG storage::db::mdbx: Commit total_duration=1.836µs commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 0, audit: 0, write: 0, sync: 0, ending: 0, whole: 1, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=true
2024-03-07T23:31:35.388570Z DEBUG storage::db::mdbx: Commit total_duration=1.967µs commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 0, audit: 0, write: 0, sync: 0, ending: 0, whole: 1, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=true
2024-03-07T23:31:35.530038Z DEBUG storage::db::mdbx: Commit total_duration=2.588µs commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 0, audit: 0, write: 0, sync: 0, ending: 0, whole: 1, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=true

I am able to get the db to start syncing again if I blow the db away.

Steps to reproduce

1. Start reth and your choice of consensus client.
2. Let it begin syncing.
3. Stop the node and let it gracefully shut down.
4. Restart reth and observe that it stops progressing (a rough docker-compose reproduction is sketched below).
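For reference, a rough reproduction using the docker-compose stack under ./etc might look like the sketch below. The compose file path and the reth service name are assumptions about that stack, so adjust to your setup:

# start reth and the rest of the stack in ./etc
docker compose -f etc/docker-compose.yml up -d

# let it sync for a while, then stop so reth shuts down gracefully
docker compose -f etc/docker-compose.yml stop

# bring it back up and watch reth's logs; the pipeline no longer advances
docker compose -f etc/docker-compose.yml up -d
docker compose -f etc/docker-compose.yml logs -f reth

# "blowing the db away" with this stack amounts to removing the named volumes
docker compose -f etc/docker-compose.yml down -v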

Node logs

2024-03-07T21:46:45.255151Z DEBUG storage::db::mdbx: Commit total_duration=1.569981ms commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 1, audit: 0, write: 0, sync: 97, ending: 0, whole: 99, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=false
2024-03-07T21:46:47.773384Z DEBUG reth::cli: shutting down gracefully
2024-03-07T21:46:47.774411Z  INFO reth::cli: Wrote network peers to file peers_file="/root/.local/share/reth/mainnet/known-peers.json"
2024-03-07T21:46:47.774433Z DEBUG reth_tasks: gracefully shut down
2024-03-07T23:05:15.163255Z  INFO reth::cli: Opening database path="/root/.local/share/reth/mainnet/db"
2024-03-07T23:05:15.172474Z  INFO reth::cli: Configuration loaded path="/root/.local/share/reth/mainnet/reth.toml"
2024-03-07T23:05:15.185811Z  INFO reth::cli: Database opened
2024-03-07T23:05:15.185829Z  INFO reth::cli: Starting metrics endpoint addr=0.0.0.0:9001
2024-03-07T23:05:15.186553Z DEBUG reth::cli: Initializing genesis chain=mainnet genesis=0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
2024-03-07T23:05:15.187690Z DEBUG reth_node_core::init: Genesis already written, skipping.
2024-03-07T23:05:15.187904Z  INFO reth::cli: Pre-merge hard forks (block based):
- Frontier                         @0
- Homestead                        @1150000
- Dao                              @1920000
- Tangerine                        @2463000
- SpuriousDragon                   @2675000
- Byzantium                        @4370000
- Constantinople                   @7280000
- Petersburg                       @7280000
- Istanbul                         @9069000
- MuirGlacier                      @9200000
- Berlin                           @12244000
- London                           @12965000
- ArrowGlacier                     @13773000
- GrayGlacier                      @15050000
Merge hard forks:
- Paris                            @58750000000000000000000 (network is known to be merged)

Post-merge hard forks (timestamp based):
- Shanghai                         @1681338455
- Cancun                           @1710338135

2024-03-07T23:05:15.187916Z DEBUG reth::cli: Spawning stages metrics listener task
2024-03-07T23:05:15.192744Z DEBUG reth::cli: configured blockchain tree
2024-03-07T23:05:15.193091Z DEBUG reth::cli: creating components
2024-03-07T23:05:15.193671Z DEBUG txpool::blob: Removed blob store directory blob_dir="/root/.local/share/reth/mainnet/blobstore"
2024-03-07T23:05:15.193683Z DEBUG txpool::blob: Creating blob store blob_dir="/root/.local/share/reth/mainnet/blobstore"
2024-03-07T23:05:15.245017Z  INFO reth::cli: Transaction pool initialized
2024-03-07T23:05:15.245269Z DEBUG reth::cli: Spawned txpool maintenance task
2024-03-07T23:05:15.245309Z  INFO reth::cli: Connecting to P2P network
2024-03-07T23:05:15.245320Z DEBUG reth::cli: Loading p2p key file network_secret_path="/root/.local/share/reth/mainnet/discovery-secret"
2024-03-07T23:05:15.246505Z  INFO net::peers: Loading saved peers file=/root/.local/share/reth/mainnet/known-peers.json
2024-03-07T23:05:15.252224Z DEBUG discv4: pinging boot node record=NodeRecord { address: 18.138.108.67, tcp_port: 30303, udp_port: 30303, id: 0xd860a01f9722d78051619d1e2351aba3f43f943f6f00718d1b9baa4101932a1f5011f16bb2b1bb35db20d6fe28fa0bf09636d26a87d31de9ec6203eeedb1f666 }
2024-03-07T23:05:15.252284Z DEBUG discv4: pinging boot node record=NodeRecord { address: 3.209.45.79, tcp_port: 30303, udp_port: 30303, id: 0x22a8232c3abc76a16ae9d6c3b164f98775fe226f0917b0ca871128a74a8e9630b458460865bab457221f1d448dd9791d24c4e5d88786180ac185df813a68d4de }
2024-03-07T23:05:15.252320Z DEBUG discv4: pinging boot node record=NodeRecord { address: 65.108.70.101, tcp_port: 30303, udp_port: 30303, id: 0x2b252ab6a1d0f971d9722cb839a42cb81db019ba44c08754628ab4a823487071b5695317c8ccd085219c3a03af063495b2f1da8d18218da2d6a82981b45e6ffc }
2024-03-07T23:05:15.252348Z DEBUG discv4: pinging boot node record=NodeRecord { address: 157.90.35.166, tcp_port: 30303, udp_port: 30303, id: 0x4aeb4ab6c14b23e2c4cfdce879c04b0748a20d8e9b59e25ded2a08143e265c6c25936e74cbc8e641e3312ca288673d91f2f93f8e277de3cfa444ecdaaf982052 }
2024-03-07T23:05:15.252876Z DEBUG discv4: pinging boot node record=NodeRecord { address: 18.138.108.67, tcp_port: 30303, udp_port: 30303, id: 0xd860a01f9722d78051619d1e2351aba3f43f943f6f00718d1b9baa4101932a1f5011f16bb2b1bb35db20d6fe28fa0bf09636d26a87d31de9ec6203eeedb1f666 }
2024-03-07T23:05:15.252908Z DEBUG discv4: pinging boot node record=NodeRecord { address: 3.209.45.79, tcp_port: 30303, udp_port: 30303, id: 0x22a8232c3abc76a16ae9d6c3b164f98775fe226f0917b0ca871128a74a8e9630b458460865bab457221f1d448dd9791d24c4e5d88786180ac185df813a68d4de }
2024-03-07T23:05:15.252911Z DEBUG discv4: pinging boot node record=NodeRecord { address: 65.108.70.101, tcp_port: 30303, udp_port: 30303, id: 0x2b252ab6a1d0f971d9722cb839a42cb81db019ba44c08754628ab4a823487071b5695317c8ccd085219c3a03af063495b2f1da8d18218da2d6a82981b45e6ffc }
2024-03-07T23:05:15.252912Z DEBUG discv4: pinging boot node record=NodeRecord { address: 157.90.35.166, tcp_port: 30303, udp_port: 30303, id: 0x4aeb4ab6c14b23e2c4cfdce879c04b0748a20d8e9b59e25ded2a08143e265c6c25936e74cbc8e641e3312ca288673d91f2f93f8e277de3cfa444ecdaaf982052 }
2024-03-07T23:05:15.252940Z DEBUG igd_next::aio::tokio: sending broadcast request to: 239.255.255.250:1900 on interface: Ok(0.0.0.0:51379)    
2024-03-07T23:05:15.253069Z DEBUG reth::cli: calling on_component_initialized hook
2024-03-07T23:05:15.253412Z  INFO reth::cli: StaticFileProducer initialized
2024-03-07T23:05:15.254803Z  INFO reth::cli: Pruner initialized prune_config=PruneConfig { block_interval: 5, segments: PruneModes { sender_recovery: None, transaction_lookup: None, receipts: None, account_history: None, storage_history: None, receipts_log_filter: ReceiptsLogPruneConfig({}) } }
2024-03-07T23:05:15.254875Z  INFO reth::cli: Consensus engine initialized
2024-03-07T23:05:15.254913Z  INFO reth::cli: Engine API handler initialized
2024-03-07T23:05:15.254921Z DEBUG reth::cli: Reading JWT auth secret file user_path="/root/jwt/jwt.hex"
2024-03-07T23:05:15.255187Z DEBUG reth::cli: Using RPC module config http=Some(Selection([Eth, Net, Web3])) ws=None
2024-03-07T23:05:15.266427Z  INFO reth::cli: RPC auth server started url=0.0.0.0:8551
2024-03-07T23:05:15.267047Z  INFO reth::cli: RPC IPC server started url=/tmp/reth.ipc
2024-03-07T23:05:15.267063Z  INFO reth::cli: RPC HTTP server started url=0.0.0.0:8545
2024-03-07T23:05:15.267096Z  INFO reth::cli: Starting consensus engine
2024-03-07T23:05:15.268700Z DEBUG consensus::engine::hooks: Next hook is not ready hook="StaticFile"
2024-03-07T23:05:15.283453Z DEBUG disc::dns: Failed to lookup entry err=invalid child hash in branch:  domain=all.mainnet.ethdisco.net hash="FDXN3SN67NA5DKA4J2GOK7BVQI"
2024-03-07T23:05:17.328108Z DEBUG connection{remote_addr=172.22.0.2:59228 conn_id=0}: jsonrpsee_server::server: Accepting new connection 1/500
2024-03-07T23:05:18.257686Z  INFO reth::cli: Status connected_peers=0 freelist=13 latest_block=0
2024-03-07T23:05:21.384556Z DEBUG disc::dns: No dns entry domain=all.mainnet.ethdisco.net hash="6QKNDVNXUKNWQNCURI5BUQ"
2024-03-07T23:05:24.379767Z DEBUG disc::dns: No dns entry domain=all.mainnet.ethdisco.net hash="WEEBRG5GPKFH5V563N7I4P"
2024-03-07T23:05:25.255606Z DEBUG net::nat: Failed to resolve external IP via UPnP: failed to find gateway err=IO error: search timed out
2024-03-07T23:05:25.289620Z DEBUG discv4: Updating external ip external_ip=
2024-03-07T23:05:25.508200Z DEBUG storage::db::mdbx: Commit total_duration=2.961µs commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 0, audit: 0, write: 0, sync: 0, ending: 0, whole: 1, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=true
2024-03-07T23:05:25.647629Z DEBUG storage::db::mdbx: Commit total_duration=2.844µs commit_latency=Some(CommitLatency(MDBX_commit_latency { preparation: 0, gc_wallclock: 0, audit: 0, write: 0, sync: 0, ending: 0, whole: 1, gc_cputime: 0, gc_prof: MDBX_commit_latency__bindgen_ty_1 { wloops: 0, coalescences: 0, wipes: 0, flushes: 0, kicks: 0, work_counter: 0, work_rtime_monotonic: 0, work_xtime_cpu: 0, work_rsteps: 0, work_xpages: 0, work_majflt: 0, self_counter: 0, self_rtime_monotonic: 0, self_xtime_cpu: 0, self_rsteps: 0, self_xpages: 0, self_majflt: 0 } })) is_read_only=true

Platform(s)

Linux (x86)

What version/commit are you on?

reth Version: 0.2.0-beta.1
Commit SHA: aec59b7
Build Timestamp: 2024-03-07T17:10:28.313032547Z
Build Features: jemalloc
Build Profile: maxperf

What database version are you on?

reth db version
Current database version: 2
Local database version: 2

What type of node are you running?

Archive (default)

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

docker build . -t local-reth-beta --build-arg BUILD_PROFILE=maxperf --build-arg FEATURES=jemalloc,asm-keccak

Code of Conduct

  • I agree to follow the Code of Conduct
argakiig added the C-bug (An unexpected or incorrect behavior) and S-needs-triage (This issue needs to be labelled) labels Mar 7, 2024
argakiig (Contributor, Author) commented Mar 8, 2024

Oh, I think I may know why this would be the case: I was using the existing docker-compose stack in ./etc, and it mounts the db as a volume but doesn't include blobstore or static_files.

Interestingly enough, though, the following diff, when applied, causes reth to crash:

index 7323163c..3ef4694c 100644
--- a/etc/docker-compose.yml
+++ b/etc/docker-compose.yml
@@ -11,8 +11,12 @@ services:
       - '8545:8545' # rpc
       - '8551:8551' # engine
     volumes:
-      - rethdata_mainnet:/root/.local/share/reth/mainnet/db
-      - rethdata_sepolia:/root/.local/share/reth/sepolia/db
+      - rethdata_mainnet_db:/root/.local/share/reth/mainnet/db
+      - rethdata_mainnet_blobstore:/root/.local/share/reth/mainnet/blobstore
+      - rethdata_mainnet_static_files:/root/.local/share/reth/mainnet/static_files
+      - rethdata_sepolia_db:/root/.local/share/reth/sepolia/db
+      - rethdata_sepolia_blobstore:/root/.local/share/reth/sepolia/blobstore
+      - rethdata_sepolia_static_files:/root/.local/share/reth/sepolia/static_files
       - rethlogs:/root/rethlogs
       - ./jwttoken:/root/jwt:ro
     # For Sepolia, replace `--chain mainnet` with `--chain sepolia`
@@ -64,9 +68,17 @@ services:
              /run.sh"
     
 volumes:
-  rethdata_mainnet:
+  rethdata_mainnet_db:
     driver: local
-  rethdata_sepolia:
+  rethdata_mainnet_blobstore:
+    driver: local
+  rethdata_mainnet_static_files:
+    driver: local
+  rethdata_sepolia_db:
+    driver: local
+  rethdata_sepolia_blobstore:
+    driver: local
+  rethdata_sepolia_static_files:
     driver: local
   rethlogs:
     driver: local

with the following error:

2024-03-07 16:32:03 2024-03-08T00:32:03.161578Z  INFO Opening database path="/root/.local/share/reth/mainnet/db"
2024-03-07 16:32:03 2024-03-08T00:32:03.166331Z  INFO Configuration loaded path="/root/.local/share/reth/mainnet/reth.toml"
2024-03-07 16:32:03 2024-03-08T00:32:03.166428Z  INFO Database opened
2024-03-07 16:32:03 2024-03-08T00:32:03.166451Z  INFO Starting metrics endpoint addr=0.0.0.0:9001
2024-03-07 16:32:03 2024-03-08T00:32:03.166602Z  INFO Pre-merge hard forks (block based):
2024-03-07 16:32:03 - Frontier                         @0
2024-03-07 16:32:03 - Homestead                        @1150000
2024-03-07 16:32:03 - Dao                              @1920000
2024-03-07 16:32:03 - Tangerine                        @2463000
2024-03-07 16:32:03 - SpuriousDragon                   @2675000
2024-03-07 16:32:03 - Byzantium                        @4370000
2024-03-07 16:32:03 - Constantinople                   @7280000
2024-03-07 16:32:03 - Petersburg                       @7280000
2024-03-07 16:32:03 - Istanbul                         @9069000
2024-03-07 16:32:03 - MuirGlacier                      @9200000
2024-03-07 16:32:03 - Berlin                           @12244000
2024-03-07 16:32:03 - London                           @12965000
2024-03-07 16:32:03 - ArrowGlacier                     @13773000
2024-03-07 16:32:03 - GrayGlacier                      @15050000
2024-03-07 16:32:03 Merge hard forks:
2024-03-07 16:32:03 - Paris                            @58750000000000000000000 (network is known to be merged)
2024-03-07 16:32:03 
2024-03-07 16:32:03 Post-merge hard forks (timestamp based):
2024-03-07 16:32:03 - Shanghai                         @1681338455
2024-03-07 16:32:03 - Cancun                           @1710338135
2024-03-07 16:32:03 
2024-03-07 16:32:03 2024-03-08T00:32:03.166837Z ERROR shutting down due to error

argakiig (Contributor, Author) commented Mar 8, 2024

    volumes:
      - rethdata_mainnet:/root/.local/share/reth/mainnet
      - rethdata_sepolia:/root/.local/share/reth/sepolia
      - rethlogs:/root/rethlogs
      - ./jwttoken:/root/jwt:ro

This also works fine.
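For completeness, a minimal sketch of what that simplified layout could look like in etc/docker-compose.yml, with the matching top-level volume definitions. The reth service name and image are assumptions, and unrelated service fields are elided:

services:
  reth:
    image: ghcr.io/paradigmxyz/reth
    # ports, command, etc. as in the existing stack
    volumes:
      # one volume per network keeps db, blobstore and static_files together
      - rethdata_mainnet:/root/.local/share/reth/mainnet
      - rethdata_sepolia:/root/.local/share/reth/sepolia
      - rethlogs:/root/rethlogs
      - ./jwttoken:/root/jwt:ro

volumes:
  rethdata_mainnet:
    driver: local
  rethdata_sepolia:
    driver: local
  rethlogs:
    driver: local

Mounting the parent mainnet/sepolia directories means everything reth persists (db, blobstore, static_files, known-peers.json) survives container recreation, rather than only the db subdirectory.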

gakonst (Member) commented Mar 10, 2024

Thanks! @joshieDo @shekhirin

shekhirin (Collaborator) commented Mar 10, 2024

@argakiig can you also share the CL logs? I don't see any forkchoice updates in the Reth logs, and we can't start the sync without at least one, so I think there may be an issue with the CL not sending updates.

argakiig (Contributor, Author) commented
It's currently a day and a half into the latest sync, so I don't wanna try and repro right now. Using Prysm 5.0.1, and it seems to be working currently.

argakiig (Contributor, Author) commented
Updated to the latest v0.2.0-beta.1 tag at 66c9403e and the networking issues seem to have gone away. Checking my logs, yes, Prysm was also having network issues at the time.
It also appears that the issue I was seeing, where my tracked peers decrease instead of increase over time, has gone away with the above update. I am going to close this out. Thanks for including the directory changes as well :)

DaniPopes removed the S-needs-triage label Apr 24, 2024