Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new RPC endpoint: health check #3729

Closed
wants to merge 12 commits into from

Conversation

pavitthrap
Copy link
Contributor

Description

  • Implements v2/health endpoint + tests. A node is considered healthy if it is caught up with its bootstrap peers.
  • Makes a change to get_random_neighbors: it now checks that the obtained peers satisfy (peer_version & 0x000000ff) <= network_epoch (as opposed to (peer_version & 0x000000ff) >= network_epoch).

Applicable issues

Feedback needed

  • In get_valid_initial_neighbors I am passing in the peer_version. Alternatively I could pass in the network_epoch.
  • In handle_get_health in rpc.rs, I need feedback on how I computed last_processed_burn_height and burnchain_height. The former should be the last processed burn height (so less than the actual burn chain tip if ibd=true. The latter should be the actual current height of the burnchain. I want to mimic how ibd is calculated in pox_sync_wait.

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)

@pavitthrap pavitthrap mentioned this pull request May 26, 2023
4 tasks
@pavitthrap pavitthrap changed the title Feat/health check v2 Implement new RPC endpoint: health check May 26, 2023
@codecov
Copy link

codecov bot commented May 26, 2023

Codecov Report

Attention: 1041 lines in your changes are missing coverage. Please review.

Comparison is base (09ec5d3) 0.18% compared to head (4338707) 0.18%.
Report is 617 commits behind head on develop.

Files Patch % Lines
src/net/rpc.rs 0.00% 373 Missing ⚠️
src/net/codec.rs 0.00% 333 Missing ⚠️
src/net/db.rs 0.00% 123 Missing ⚠️
src/net/http.rs 0.00% 104 Missing ⚠️
src/net/mod.rs 0.00% 50 Missing ⚠️
src/net/inv.rs 0.00% 25 Missing ⚠️
src/chainstate/burn/db/sortdb.rs 0.00% 17 Missing ⚠️
stacks-common/src/codec/mod.rs 0.00% 8 Missing ⚠️
src/net/p2p.rs 0.00% 2 Missing ⚠️
testnet/stacks-node/src/neon_node.rs 0.00% 2 Missing ⚠️
... and 3 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3729      +/-   ##
===========================================
- Coverage     0.18%    0.18%   -0.01%     
===========================================
  Files          306      306              
  Lines       278640   281246    +2606     
===========================================
  Hits           512      512              
- Misses      278128   280734    +2606     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pavitthrap pavitthrap requested review from kantai and obycode May 30, 2023 15:50
Copy link
Contributor

@obycode obycode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, but I have a few comments/questions.

docs/rpc/openapi.yaml Outdated Show resolved Hide resolved
src/net/db.rs Outdated Show resolved Hide resolved
src/net/db.rs Outdated Show resolved Hide resolved
src/net/mod.rs Outdated Show resolved Hide resolved
src/net/http.rs Outdated Show resolved Hide resolved
@pavitthrap pavitthrap requested a review from obycode June 6, 2023 15:28
Copy link
Contributor

@obycode obycode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

src/net/rpc.rs Outdated Show resolved Hide resolved
Copy link
Member

@jcnelson jcnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that the two error paths for this endpoint are using unallocated HTTP status codes in addition to an application-level error message body. I think perhaps we should consider using a well-defined HTTP status code and perhaps make the message body more detailed (e.g. inform the caller to check /v2/neighbors to diagnose why the node can't report health data).

@pavitthrap
Copy link
Contributor Author

I notice that the two error paths for this endpoint are using unallocated HTTP status codes in addition to an application-level error message body. I think perhaps we should consider using a well-defined HTTP status code and perhaps make the message body more detailed (e.g. inform the caller to check /v2/neighbors to diagnose why the node can't report health data).

I decided to go ahead with the unallocated HTTP status codes based on a discussion on the old PR for this same issue. @wileyj do you have any opinions on how to proceed here?

@wileyj
Copy link
Contributor

wileyj commented Jul 13, 2023

I notice that the two error paths for this endpoint are using unallocated HTTP status codes in addition to an application-level error message body. I think perhaps we should consider using a well-defined HTTP status code and perhaps make the message body more detailed (e.g. inform the caller to check /v2/neighbors to diagnose why the node can't report health data).

I decided to go ahead with the unallocated HTTP status codes based on a discussion on the old PR for this same issue. @wileyj do you have any opinions on how to proceed here?

i see what you're doing - i tend to agree with @jcnelson here in that we shouldn't define unused http status codes (even though the endpoint errors don't specifically line up to a defined status).

I think a generic 500 error for a non-successful request would be more than sufficient here - the request either returns some data with a 200 that can be evaluated, or it has failed and throws a 500

@pavitthrap pavitthrap requested a review from jcnelson July 17, 2023 21:27
src/net/db.rs Outdated Show resolved Hide resolved
@@ -2091,6 +2102,27 @@ pub trait Requestable: std::fmt::Display {
fn make_request_type(&self, peer_host: PeerHost) -> HttpRequestType;
}

// TODO: DRY up from PoxSyncWatchdog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine if you want to address this in this PR :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that might not be straightforward given the structure of the codebase. Let me know if it is straightforward though.

src/net/rpc.rs Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Member

@jcnelson jcnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Only minor things need to be done before this can be merged. Thanks for getting this over the finish line, @pavitthrap!

@stacks-network stacks-network deleted a comment from OSTO79 Sep 14, 2023
@CLAassistant
Copy link

CLAassistant commented Nov 16, 2023

CLA assistant check
All committers have signed the CLA.

@kantai
Copy link
Member

kantai commented Nov 16, 2023

Closing this PR for now. It's stale, and would require some effort to revive. If someone wants to take it on, they can always revive it later.

@kantai kantai closed this Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants