
Add chain gap/pause analysis from onchain data to CLI #14061

Merged
igor-aptos merged 1 commit into main from igor/cli_gap_analysis on Aug 13, 2024


igor-aptos (Contributor) commented Jul 19, 2024

Description

The analyze-validator-performance --analyze-mode network-health-over-time CLI command reports health statistics of the network over time. This PR adds gap analysis to it as well, as a useful indicator of whether there were any issues.

Type of Change

  • New feature

Which Components or Systems Does This Change Impact?

  • Aptos CLI/SDK

How Has This Been Tested?

cargo run -p aptos -- node analyze-validator-performance --analyze-mode network-health-over-time --url https://fullnode.mainnet.aptoslabs.com/ --start-epoch=-12

It returns:

Max non-epoch-change gaps: 2 rounds at version 1039252739 (avg 0.00), 3.44s no progress at version 1039252739 (avg 0.22s).
Max epoch-change gaps: 0 rounds at version 0 (avg 0.00), 3.92s no progress at version 1038729529 (avg 2.76s).
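The output above reports, separately for non-epoch-change and epoch-change block transitions, the largest round gap and the longest stretch with no progress, each with the version where it occurred and an average. Below is a minimal Rust sketch of that bookkeeping, not the CLI's implementation. It assumes a hypothetical BlockInfo record (version, epoch, round, timestamp in microseconds) already fetched from on-chain block metadata; BlockInfo, GapSummary, ChainGaps, and analyze_gaps are illustrative names, counting skipped rounds as round - prev_round - 1 is an assumed definition, and round gaps are only tracked within an epoch on the assumption that rounds restart at an epoch change.

```rust
// Hypothetical per-block record; field names are illustrative, not the CLI's types.
#[derive(Clone, Copy, Debug)]
struct BlockInfo {
    version: u64,
    epoch: u64,
    round: u64,
    timestamp_us: u64,
}

/// Max (with the version where it occurred) and running average for one kind of gap.
#[derive(Debug, Default)]
struct GapSummary {
    max_gap: f64,
    max_gap_version: u64,
    total: f64,
    count: u64,
}

impl GapSummary {
    /// Records a gap observed at `version`; the max only moves on a strictly larger gap,
    /// so a summary whose gaps are all zero keeps `max_gap_version` at 0.
    fn record(&mut self, gap: f64, version: u64) {
        if gap > self.max_gap {
            self.max_gap = gap;
            self.max_gap_version = version;
        }
        self.total += gap;
        self.count += 1;
    }

    fn avg(&self) -> f64 {
        if self.count == 0 {
            0.0
        } else {
            self.total / self.count as f64
        }
    }
}

#[derive(Debug, Default)]
struct ChainGaps {
    non_epoch_round_gap: GapSummary,
    non_epoch_time_gap: GapSummary,
    epoch_time_gap: GapSummary,
}

/// Compares each block with its predecessor and attributes the gap to the later block.
fn analyze_gaps(blocks: &[BlockInfo]) -> ChainGaps {
    let mut gaps = ChainGaps::default();
    for pair in blocks.windows(2) {
        let (prev, cur) = (pair[0], pair[1]);
        let secs = cur.timestamp_us.saturating_sub(prev.timestamp_us) as f64 / 1_000_000.0;
        if cur.epoch == prev.epoch {
            // Skipped rounds between consecutive blocks (assumed definition).
            let skipped = cur.round.saturating_sub(prev.round).saturating_sub(1);
            gaps.non_epoch_round_gap.record(skipped as f64, cur.version);
            gaps.non_epoch_time_gap.record(secs, cur.version);
        } else {
            // Assumed: rounds restart at an epoch change, so only elapsed time is tracked.
            gaps.epoch_time_gap.record(secs, cur.version);
        }
    }
    gaps
}

fn main() {
    // Made-up sample blocks for illustration only.
    let blocks = [
        BlockInfo { version: 100, epoch: 5, round: 10, timestamp_us: 0 },
        BlockInfo { version: 105, epoch: 5, round: 11, timestamp_us: 500_000 },
        BlockInfo { version: 112, epoch: 5, round: 14, timestamp_us: 4_000_000 },
        BlockInfo { version: 113, epoch: 6, round: 1, timestamp_us: 7_000_000 },
    ];
    let g = analyze_gaps(&blocks);
    let (r, t) = (&g.non_epoch_round_gap, &g.non_epoch_time_gap);
    // Prints: Max non-epoch-change gaps: 2 rounds at version 112 (avg 1.00), 3.50s no progress at version 112 (avg 2.00s).
    println!(
        "Max non-epoch-change gaps: {} rounds at version {} (avg {:.2}), {:.2}s no progress at version {} (avg {:.2}s).",
        r.max_gap, r.max_gap_version, r.avg(), t.max_gap, t.max_gap_version, t.avg()
    );
}
```

Using windows(2) keeps this a single pass over the blocks, and attributing each gap to the later block's version is a choice made for the sketch.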

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation


trunk-io bot commented Jul 19, 2024

⏱️ 16m total CI duration on this PR
| Job | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| rust-move-tests | 6m | 🟩 |
| rust-cargo-deny | 3m | 🟩 |
| general-lints | 3m | 🟩 |
| check-dynamic-deps | 1m | 🟩🟩 |
| rust-move-tests | 55s | |
| semgrep/ci | 40s | 🟩🟩 |
| file_change_determinator | 23s | 🟩🟩 |
| file_change_determinator | 21s | 🟩🟩 |
| permission-check | 11s | 🟩🟩 |
| permission-check | 8s | 🟩🟩 |
| permission-check | 5s | 🟩🟩 |
| permission-check | 4s | 🟩🟩 |


|| gap_info.non_epoch_time_gap.max_gap > chain_progress_threshold.max_non_epoch_no_progress_secs {
bail!("Failed non-epoch-change chain progress check. {}", &gap_text);
}
println!("Passed non-epoch-change progress check. {}", gap_text);
Contributor


Is there a reason println is used here but info is used above?
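The diff excerpt above fails the command when the worst non-epoch-change gap exceeds a configured limit and prints a pass message otherwise. Below is a minimal, std-only Rust sketch of such a check, not the PR's code. ChainProgressThreshold, NonEpochGaps, check_non_epoch_progress, and the max_non_epoch_round_gap field are hypothetical names (only max_non_epoch_no_progress_secs and the two messages come from the excerpt); the round-gap half of the condition is assumed because the excerpt starts mid-expression; and the 4-round and 15-second limits are taken from the Forge output later in this thread.

```rust
/// Hypothetical thresholds; only `max_non_epoch_no_progress_secs` appears in the PR excerpt.
struct ChainProgressThreshold {
    max_non_epoch_round_gap: f64,
    max_non_epoch_no_progress_secs: f64,
}

/// Hypothetical summary of the worst non-epoch-change gaps.
struct NonEpochGaps {
    max_round_gap: f64,
    max_time_gap_secs: f64,
}

/// Fails if either gap exceeds its limit; a gap exactly at the limit passes.
fn check_non_epoch_progress(
    gaps: &NonEpochGaps,
    threshold: &ChainProgressThreshold,
    gap_text: &str,
) -> Result<(), String> {
    if gaps.max_round_gap > threshold.max_non_epoch_round_gap
        || gaps.max_time_gap_secs > threshold.max_non_epoch_no_progress_secs
    {
        // The PR uses anyhow's `bail!`; returning `Err` keeps this sketch std-only.
        return Err(format!("Failed non-epoch-change chain progress check. {}", gap_text));
    }
    println!("Passed non-epoch-change progress check. {}", gap_text);
    Ok(())
}

fn main() {
    // Limits as printed by the Forge run on this PR: [limit 4] rounds, [limit 15] seconds.
    let threshold = ChainProgressThreshold {
        max_non_epoch_round_gap: 4.0,
        max_non_epoch_no_progress_secs: 15.0,
    };
    let healthy = NonEpochGaps { max_round_gap: 0.0, max_time_gap_secs: 0.74 };
    let stalled = NonEpochGaps { max_round_gap: 0.0, max_time_gap_secs: 20.0 };
    for (gaps, label) in [(&healthy, "healthy run"), (&stalled, "stalled run")] {
        if let Err(err) = check_non_epoch_progress(gaps, &threshold, label) {
            println!("{}", err);
        }
    }
}
```

Because both comparisons are strict, matching the excerpt's use of >, a gap exactly at its limit still passes.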

igor-aptos force-pushed the igor/cli_gap_analysis branch 2 times, most recently from 5ce3999 to 2a15732 on August 13, 2024 05:43.


Contributor

✅ Forge suite realistic_env_max_load success on 5924a44bc3e2a552284cf30782d23303308afc39

two traffics test: inner traffic : committed: 12356.52 txn/s, latency: 3221.32 ms, (p50: 3000 ms, p90: 3600 ms, p99: 5100 ms), latency samples: 4698220
two traffics test : committed: 100.09 txn/s, latency: 2794.70 ms, (p50: 2700 ms, p90: 3400 ms, p99: 5600 ms), latency samples: 1740
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.249, avg: 0.227", "QsPosToProposal: max: 0.369, avg: 0.296", "ConsensusProposalToOrdered: max: 0.331, avg: 0.317", "ConsensusOrderedToCommit: max: 0.735, avg: 0.688", "ConsensusProposalToCommit: max: 1.043, avg: 1.005"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.74s no progress at version 2598542 (avg 0.22s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 7.68s no progress at version 2598540 (avg 7.68s) [limit 15].
Test Ok

Contributor

✅ Forge suite compat success on 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 5924a44bc3e2a552284cf30782d23303308afc39

Compatibility test results for 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 5924a44bc3e2a552284cf30782d23303308afc39 (PR)
1. Check liveness of validators at old version: 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5
compatibility::simple-validator-upgrade::liveness-check : committed: 7408.65 txn/s, latency: 3953.01 ms, (p50: 3200 ms, p90: 3900 ms, p99: 25300 ms), latency samples: 300420
2. Upgrading first Validator to new version: 5924a44bc3e2a552284cf30782d23303308afc39
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6738.83 txn/s, latency: 4072.34 ms, (p50: 4300 ms, p90: 5700 ms, p99: 5900 ms), latency samples: 126680
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7204.09 txn/s, latency: 4412.25 ms, (p50: 4500 ms, p90: 6500 ms, p99: 6700 ms), latency samples: 240380
3. Upgrading rest of first batch to new version: 5924a44bc3e2a552284cf30782d23303308afc39
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6131.73 txn/s, latency: 4542.97 ms, (p50: 5000 ms, p90: 5900 ms, p99: 6300 ms), latency samples: 110360
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7423.14 txn/s, latency: 4316.27 ms, (p50: 4300 ms, p90: 6600 ms, p99: 6900 ms), latency samples: 241960
4. upgrading second batch to new version: 5924a44bc3e2a552284cf30782d23303308afc39
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 8916.04 txn/s, latency: 3076.22 ms, (p50: 2400 ms, p90: 5200 ms, p99: 5900 ms), latency samples: 165540
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 8801.73 txn/s, latency: 3615.80 ms, (p50: 3100 ms, p90: 4500 ms, p99: 10400 ms), latency samples: 346180
5. check swarm health
Compatibility test for 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 5924a44bc3e2a552284cf30782d23303308afc39 passed
Test Ok

Contributor

✅ Forge suite framework_upgrade success on 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 5924a44bc3e2a552284cf30782d23303308afc39

Compatibility test results for 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 5924a44bc3e2a552284cf30782d23303308afc39 (PR)
Upgrade the nodes to version: 5924a44bc3e2a552284cf30782d23303308afc39
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1170.64 txn/s, submitted: 1173.62 txn/s, failed submission: 2.98 txn/s, expired: 2.98 txn/s, latency: 2614.72 ms, (p50: 2100 ms, p90: 4800 ms, p99: 9300 ms), latency samples: 102100
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1132.21 txn/s, submitted: 1135.42 txn/s, failed submission: 3.21 txn/s, expired: 3.21 txn/s, latency: 2736.03 ms, (p50: 2100 ms, p90: 4800 ms, p99: 9000 ms), latency samples: 98820
5. check swarm health
Compatibility test for 1c2ee7082d6eff8c811ee25d6f5a7d00860a75d5 ==> 5924a44bc3e2a552284cf30782d23303308afc39 passed
Upgrade the remaining nodes to version: 5924a44bc3e2a552284cf30782d23303308afc39
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1110.84 txn/s, submitted: 1113.47 txn/s, failed submission: 2.63 txn/s, expired: 2.63 txn/s, latency: 2688.38 ms, (p50: 2100 ms, p90: 4600 ms, p99: 9100 ms), latency samples: 101400
Test Ok

igor-aptos merged commit 2064a1c into main on Aug 13, 2024
73 of 104 checks passed
igor-aptos deleted the igor/cli_gap_analysis branch on August 13, 2024 06:51

3 participants