
fix and document aggregate cluster behavior #13134

Open · markdroth opened this issue Sep 16, 2020 · 9 comments

@markdroth (Contributor) commented:

The aggregate cluster documentation describes how load balancing works with an aggregate cluster, but it doesn't describe how any of the cluster's other behavior works.
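
For context, here's a minimal sketch of what an aggregate cluster definition looks like (the cluster names are made up and the underlying cluster definitions are omitted); the questions below are about which of the two layers each of the other features should be read from:

```yaml
clusters:
- name: aggregate_cluster
  connect_timeout: 0.25s
  lb_policy: CLUSTER_PROVIDED
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
      # Underlying clusters, in priority order; both are defined elsewhere in the config.
      clusters:
      - primary_cluster
      - secondary_cluster
```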

Here are a few off-the-cuff examples (not a comprehensive list):

  • What happens for circuit breakers? Does the circuit breaking config in the aggregate cluster get used, or does it somehow delegate to the circuit breaking configs in the clusters being delegated to?
  • How does load reporting work? Does the load get reported as belonging to the underlying clusters or the aggregate cluster?
  • How does outlier detection work?

@yxue, can you please update the documentation to explain how these sorts of things are supposed to work? Ideally, the doc should cover the behavior for every feature configurable via the Cluster xDS resource.

Thanks!

@markdroth added the triage (Issue requires triage) label on Sep 16, 2020
@mattklein123 added the area/aggregate_cluster, area/docs, and help wanted (Needs help!) labels and removed the triage (Issue requires triage) label on Sep 16, 2020
@yxue (Contributor) commented Sep 16, 2020:

Sure, I will update the document. Please assign the issue to me. Thanks!

@markdroth (Contributor, Author) commented:

Also, how are EDS drops supposed to work in an aggregate cluster? Drops are top-level parameters of the EDS resource (they apply to the whole cluster), but what happens if an aggregate cluster points to an EDS cluster? Or worse, what happens if the aggregate cluster points to multiple EDS clusters, each with its own drop config?

Note that Envoy does not currently implement EDS drop support, but gRPC does, so we need to understand how this is supposed to work.
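
For reference, here's a rough sketch of where the drop config lives in the EDS resource (the cluster name, drop category, and numbers are made up); each EDS cluster behind the aggregate cluster could carry its own, different policy:

```yaml
# ClusterLoadAssignment for one of the underlying EDS clusters.
cluster_name: eds_cluster_1
policy:
  drop_overloads:
  - category: throttle
    drop_percentage:
      numerator: 10
      denominator: HUNDRED
endpoints:
- lb_endpoints:
  - endpoint:
      address:
        socket_address: {address: "10.0.0.1", port_value: 8080}
# A second EDS cluster referenced by the same aggregate cluster could carry a
# completely different drop_overloads policy.
```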

@markdroth (Contributor, Author) commented:

Similar question for the overprovisioning factor.

@markdroth (Contributor, Author) commented:

gRPC is about to implement support for aggregate clusters, and I want to make sure we agree on the expected semantics, so that we don't wind up making a decision that is incompatible with what Envoy does or may do in the future.

I've put together the following doc proposing what I think the semantics should be:

https://docs.google.com/document/d/17fc0rAktQpc5oMkerOLHSTnXwX1xWAqW7vZ6_LL47NY/edit#

Already discussed with @yxue, @htuch, @ejona86, and @dfawley.

I'd appreciate feedback from @mattklein123 and @snowp, as well as from any other @envoyproxy/api-shepherds who may be interested.

@markdroth (Contributor, Author) commented:

https://docs.google.com/document/d/17fc0rAktQpc5oMkerOLHSTnXwX1xWAqW7vZ6_LL47NY/edit# has now been approved as specifying the expected behavior of aggregate clusters.

This issue should now cover the following:

  • Updating the aggregate cluster tests to cover every behavior described in that doc.
  • Fixing any behavior that the tests show does not currently match what is specified in the doc.
  • Updating the aggregate cluster documentation to specify the behavior in all of these cases.

@markdroth changed the title from "improve documentation of aggregate cluster behavior" to "fix and document aggregate cluster behavior" on Oct 14, 2020
@markdroth (Contributor, Author) commented:

See discussion in #17412 about the max_retries circuit breaker. In general, we do want circuit breakers to come from the underlying clusters instead of from the aggregate cluster, as per our previous discussions, but the max_retries circuit breaker will need to come from the aggregate cluster instead. I've updated the design doc mentioned above to reflect this.
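
For anyone following along, here's a rough sketch of what that split would look like in config (names made up, cluster definitions abridged): max_retries is read from the aggregate cluster, while the other thresholds come from each underlying cluster.

```yaml
- name: aggregate_cluster
  lb_policy: CLUSTER_PROVIDED
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_retries: 3            # read from the aggregate cluster (see #17412)
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
      clusters: [primary_cluster, secondary_cluster]
- name: primary_cluster
  type: EDS
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_requests: 1000        # connection/request limits read from the underlying cluster
```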

@markdroth (Contributor, Author) commented:

@yanavlasov has just pointed out to me that we got this slightly wrong.

Our previous discussions in this issue assumed that the core idea of an aggregate cluster was that it concatenated the priorities from the underlying clusters into a single list, so that it could use a single LB policy defined at the aggregate cluster layer to choose a priority from that combined list. This made sense as a way to use the normal logic for choosing a priority, since in Envoy priorities are normally handled by the LB policy. Therefore, we assumed that aggregate clusters always used the LB policy defined in the aggregate cluster rather than the ones defined in the underlying clusters.

However, Yan pointed out to me that aggregate clusters don't actually define the LB policy in the aggregate cluster; instead, the aggregate cluster uses a special cluster-provided LB policy that first chooses the underlying cluster and then delegates to the LB policy of the underlying cluster. This is actually the documented behavior, and it looks like the Envoy implementation here is indeed doing a two-level pick.

I think this means that we really do want basically all of the configuration to come from the underlying cluster, with the sole exception of the max_retries circuit breaker, as per discussion in #17412 -- and it seems fine to say that that particular circuit breaker simply won't work for aggregate clusters.

Note that the current Envoy implementation has the aggregate cluster pick the underlying cluster only when the LB pick is requested. However, since we also want features that are unrelated to the LB pick (e.g., circuit breaking, outlier detection, and EDS drops (see #31345)) to be configured at the underlying cluster level, Envoy will probably need to decouple this from the LB pick somehow. It may need a layer of indirection whereby it first asks the cluster for its configuration and then uses that configuration for all of the above features plus the LB pick.
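
To illustrate the two-level pick with a made-up config sketch (underlying cluster definitions abridged): the aggregate cluster's cluster-provided LB first picks one of the underlying clusters, and the chosen cluster's own lb_policy then picks the host, so the per-cluster lb_policy settings below are what actually take effect.

```yaml
- name: aggregate_cluster
  lb_policy: CLUSTER_PROVIDED     # step 1: pick an underlying cluster
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
      clusters: [primary_cluster, secondary_cluster]
- name: primary_cluster
  type: EDS
  lb_policy: ROUND_ROBIN          # step 2: picks the host when primary_cluster is chosen
- name: secondary_cluster
  type: EDS
  lb_policy: LEAST_REQUEST        # step 2: picks the host when secondary_cluster is chosen
```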

@markdroth (Contributor, Author) commented:

A couple of other things to note from a conversation with @yanavlasov:

  • Apparently, Envoy currently does not support an aggregate cluster pointing to another aggregate cluster. gRPC does support this, so we should probably add it to Envoy too (since that's easier than removing support from gRPC). Note that when we do this, we need to be careful about config validation; in gRPC, we impose a recursion depth limit of 16, and we deal with diamond dependencies in the graph by simply ignoring any node we've already seen (see the sketch after this list).
  • Envoy expects the aggregate cluster to be configured with a CLUSTER_PROVIDED LB policy. It's not clear what its behavior would be if an aggregate cluster were configured with a different LB policy, but we should probably define this. Note that we currently need to define this in gRPC, because we incorrectly implemented aggregate clusters to use the LB policy in the aggregate cluster instead of the policy in the underlying cluster, which means there are probably existing control planes that set the LB policy in the aggregate cluster to something other than CLUSTER_PROVIDED, and we don't want to break them when we fix our implementation. We're probably going to change gRPC to simply ignore the LB policy config in the aggregate cluster -- i.e., it will basically always use the cluster-provided implementation. So we could decide to do the same thing in Envoy to be consistent.
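
Here's the kind of aggregate-over-aggregate diamond topology mentioned in the first bullet (all names made up; as noted, Envoy does not currently accept this):

```yaml
# aggregate_a -> {aggregate_b, shared_cluster}; aggregate_b -> {shared_cluster}
# gRPC caps the recursion depth at 16 and skips clusters it has already visited,
# so shared_cluster only contributes its endpoints once.
- name: aggregate_a
  lb_policy: CLUSTER_PROVIDED
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
      clusters: [aggregate_b, shared_cluster]
- name: aggregate_b
  lb_policy: CLUSTER_PROVIDED
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.aggregate.v3.ClusterConfig
      clusters: [shared_cluster]
- name: shared_cluster
  type: EDS
```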

@markdroth (Contributor, Author) commented:

FYI, I've written a design for fixing the aggregate cluster behavior in gRPC in grpc/proposal#405.
