
Handle local externalTrafficPolicy correctly #54

Merged

Conversation

tommasopozzetti (Contributor)

LoadBalancer services can specify an extra property called `externalTrafficPolicy` which defines how kube-proxy sets up load balancing on the nodes. If this property is set to `Local`, kube-proxy only forwards traffic for externalIPs or LoadBalancer IPs to pods backing the service that are running on the node on which the packet arrived. If no such pods are running on that node, the traffic gets dropped.

This PR introduces a vrrp_script for services that set this property to `Local`, which health-checks the Kubernetes-provided health endpoint for the service. This endpoint returns 200 if at least one pod backing the service is running on the node, or 503 if none is. With this healthcheck script in place, keepalived will deem the node unhealthy if the check fails, thus ensuring the VIP moves to a different owner that does have at least one pod backing the service running on it. This way no traffic is lost and the `Local` `externalTrafficPolicy` works as intended.

This PR requires the keepalived image to contain the /usr/bin/curl binary, which is used to perform the healthcheck.

This PR should address and fix #13
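
For a concrete picture of the mechanism, the resulting keepalived configuration would look roughly like the sketch below. This is only an illustration: the script name, node port (30123), interface, VRID, priority, and VIP are hypothetical placeholders, not the exact output of the operator's template.

    vrrp_script chk_local_endpoints {
        # kube-proxy answers 200 on the service's healthCheckNodePort when this node
        # has at least one local endpoint for the service, and 503 otherwise;
        # --fail makes curl exit non-zero on 503 so keepalived sees the check as failed.
        script "/usr/bin/curl --fail --silent --output /dev/null http://localhost:30123/health"
        interval 2
        fall 2
        rise 2
    }

    vrrp_instance lb_vip {
        state BACKUP
        interface eth0
        virtual_router_id 51
        priority 100
        virtual_ipaddress {
            192.0.2.10
        }
        # With no weight set on the script, a failing check puts this instance into
        # FAULT state and releases the VIP, so another member of the keepalived
        # group (one that does have local endpoints) takes it over.
        track_script {
            chk_local_endpoints
        }
    }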

@tommasopozzetti (Contributor, Author)

This PR modifies a file that is also slightly modified by #53. If both are of interest, after the first one is merged the second should be adjusted to reflect the changes. It should be a trivial change, but I'm noting it here to avoid forgetting.

@raffaelespazzoli (Collaborator)

I have two concerns about this PR:

  1. it's unclear what we expect to happen when the target pods are not running on the keepalived group nodes.
  2. it seems like we are assuming that all pods expose the /health endpoint. This will not be true. The healthcheck should just check for TCP connectivity.

@tommasopozzetti (Contributor, Author)

@raffaelespazzoli, let me try to address the two concerns:

  1. If no target pod is running on any of the keepalived group nodes, then the VIP will not be owned by any of them and traffic toward that VIP will not reach a backend pod. Notice, however, that this is exactly the expected behavior when externalTrafficPolicy is set to Local. It is, in fact, the responsibility of the user, when setting this option, to ensure that the nodes receiving the traffic do have backing pods running on them; otherwise delivering the traffic locally is simply impossible. Also notice that this is the same behavior exhibited even without this PR: if externalTrafficPolicy is set to Local, kube-proxy only sets up load balancing towards pods on the node itself, so if no pods are running on the node, traffic is lost. The issue right now is that, without this PR, traffic is lost even when a backing pod is running on one of the keepalived group nodes but the VIP is owned by another node. This PR aims at making keepalived aware of the situation so the VIP moves to a node that does have backing pods (if at least one exists).

  2. The healthCheckNodePort field of the service is a port that Kubernetes sets up only for services with a Local externalTrafficPolicy, and it does not directly expose the backing pods. It exposes a healthcheck service provided by kube-proxy, which always answers requests to the /health endpoint, whether a backing pod is running on that specific node or not. The HTTP response code is 200 if at least one backing pod is running on the given node, or 503 if none is. This is an example of such a response, and a sketch of how to look up and probe this port by hand follows it:

     $ curl "http://localhost:<healthCheckNodePort>/health"
     {
         "service": {
             "namespace": "<namespace>",
             "name": "<service-name>"
         },
         "localEndpoints": 1
     }
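
To check both points above by hand on a node, something along these lines works; the port value 32456 is a hypothetical example, and the final 200 assumes the node currently has a local endpoint (on a node with none, the same command would print 503):

    $ kubectl -n <namespace> get service <service-name> \
          -o jsonpath='{.spec.externalTrafficPolicy} {.spec.healthCheckNodePort}'
    Local 32456
    $ curl --silent --output /dev/null --write-out '%{http_code}\n' \
          "http://localhost:32456/health"
    200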
    

@raffaelespazzoli (Collaborator)

OK, perfect. Then I think we should incorporate this PR. Let's merge this one first since it's simple. curl should be on board the keepalived image.

Successfully merging this pull request may close these issues: Handling of Local externalTrafficPolicy.