
Add annotation to allow spreading VIPs across keepalived instances #53

Merged
2 commits merged into redhat-cop:master on Mar 3, 2021

Conversation

tommasopozzetti
Contributor

A common use case for keepalived in environments with higher load is to have multiple VIPs for HA configured in such a way that they are spread across the nodes. This configuration allows for maximum load balancing, for example through DNS round robin, while maintaining HA, as a failure of a node will still result in another node getting ownership of the VIP.

This PR adds support for an extra annotation, keepalived-operator.redhat-cop.io/spreadvips: "true", on services requesting keepalived-managed IPs, which instructs keepalived-operator to spread the VIPs of the service across all of the keepalived instances of the KeepalivedGroup. If the annotation is found, keepalived-operator adds extra configuration that sets a preferred owner for each VIP in the service, assigned round-robin among the keepalived pods. The PR also adds the ability to watch for pod changes, so that any change in the pods backing the DaemonSet is reflected in the generated configuration.

This PR aims to be 100% backwards compatible by making this functionality opt-in. Services that do not specify the new annotation will continue to behave as they always have.
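For illustration, a Service opting into this behavior might look like the sketch below (names, namespace, and the keepalivedgroup annotation value are illustrative; the spreadvips annotation is the only new part introduced by this PR):

    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-ingress
      annotations:
        # binds the service to a KeepalivedGroup (value is illustrative)
        keepalived-operator.redhat-cop.io/keepalivedgroup: keepalived-operator/keepalivedgroup-workers
        # new, opt-in: spread the VIPs across the keepalived pods
        keepalived-operator.redhat-cop.io/spreadvips: "true"
    spec:
      selector:
        app: nginx-ingress
      ports:
      - name: http
        port: 80
      externalIPs:
      - 172.20.20.1
      - 172.20.20.2
      - 172.20.20.3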

@raffaelespazzoli
Collaborator

I don't understand the intent of this PR. Would you be able to elaborate with an example or perhaps provide a small demo?
Also, I keep forgetting to do it, but I intend to deprecate this operator in favor of MetalLB. Do you think MetalLB would address your requirements?

@tommasopozzetti
Contributor Author

Hi @raffaelespazzoli, thanks for taking a look at this.
To answer your first question, let me give you an example of why I think this can be useful.
Let's assume I have a KeepalivedGroup targeting three nodes, Node A, Node B and Node C. Let's also assume I have a service backed by multiple pods that I want to expose, for example nginx-ingress-controller.
Let's say I specify the annotation to tie this service to that KeepalivedGroup and I specify three externalIPs:

externalIPs:
- 172.20.20.1
- 172.20.20.2
- 172.20.20.3

The current behavior of keepalived-operator is to assign all IPs to the same VRRP instance, which means they will all be owned by the same keepalived pod and therefore all belong to the same node, for example Node A.
This effectively makes it useless to use multiple IPs, because all ingress traffic towards that service will pass through Node A, even if the backing pods are spread across multiple nodes. Node A can therefore easily become a bottleneck if the amount of traffic is high enough.
A better configuration would be for each of those 3 externalIPs to be owned by a different node, so that, if the ingress traffic is balanced across the 3 IPs (for example via DNS round robin), the three nodes share the load, providing much better load balancing.
The new annotation introduced in this PR lets the user request this behavior for a given service: it assigns a preferred owner per VIP so that, as long as that owner is up, it wins over the others and owns the VIP. Owners are assigned in a round-robin fashion among the keepalived pods in the KeepalivedGroup, maximizing the spread and therefore the load balancing.
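For illustration only, the rendered keepalived configuration for one of these VIPs could look roughly like the sketch below (instance name, pod name, interface, and virtual_router_id are made up; the @<pod-name> conditional line is what marks the preferred owner, while the other pods fall back to keepalived's default BACKUP state):

    vrrp_instance service_172.20.20.1 {
        @keepalived-workers-abcde state MASTER
        interface ens3
        virtual_router_id 101
        virtual_ipaddress {
            172.20.20.1
        }
    }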

Unfortunately, MetalLB either requires BGP-based advertisement (which the underlying networking infrastructure must support, and that is not necessarily a common case), or it suffers from the same issue when using the layer-2 approach, as explained in their docs (they call it single-node bottlenecking). Therefore, I still believe keepalived is a valuable alternative that can provide additional benefits.

@tommasopozzetti
Contributor Author

This feature has also been requested in MetalLB in metallb/metallb#439, so it might eventually get implemented there, but for now there appears to be no simple way to achieve it.

@raffaelespazzoli
Collaborator

raffaelespazzoli commented Feb 28, 2021 via email

@tommasopozzetti
Contributor Author

@raffaelespazzoli the current implementation should already work for externalIPs, LoadBalancer IPs, or a mixture of the two.
If we make it the default, do you think it makes sense to still have an annotation to allow opting out of this behavior?
The downside of spreading VIPs is that each VIP now requires its own VRRP instance, which means the 255-instance limit (one virtual router ID per instance) could be reached faster. In cases where this spread is not necessary, users might prefer to have the possibility of opting out.
Let me know your thoughts and I can modify the PR to reflect the intended behavior.
One more note I forgot to mention: for this PR to work, the keepalived image needs to contain the bash binary, because it is used to pass the pod name from an environment variable to the keepalived process. We have it in the image we use, but I have not been able to verify whether it is present in the default registry.redhat.io/openshift4/ose-keepalived-ipfailover image.
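For context, the wiring could look something like the sketch below (this is an assumption about the mechanism, not necessarily exactly what the PR generates): the pod name is exposed via the downward API and a bash wrapper hands it to keepalived as its config id, so that the @<pod-name> conditional lines only match on the owning pod:

    containers:
    - name: keepalived
      image: example.com/keepalived:latest   # hypothetical image; must include bash
      env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      command:
      - /bin/bash
      - -c
      - exec keepalived --dont-fork --log-console --config-id "$POD_NAME"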

@raffaelespazzoli
Collaborator

@tommasopozzetti thanks for the clarifications. So would it be fair to say that I can achieve the same result as this PR by creating multiple externalIP Services in Kubernetes?

@tommasopozzetti
Contributor Author

@raffaelespazzoli You would get the same number of VRRP instances, but not necessarily the same result. If you create multiple Services with one externalIP each and the same selector, there is no guarantee that those VIPs will be owned by different nodes; they might all end up on the same node, creating the bottleneck. Also, even if they were perfectly spread out, if one node went down, its VIP would move to another node and remain there even after the original owner came back up (leaving one node taking twice the traffic and another none).
This PR instead generates configuration that selects a preferred owner per VIP, so the spread holds at all times. Multiple VIPs are guaranteed to be owned by different nodes (unless, of course, there are more VIPs than nodes), and if a node goes down its VIP does move to another node for HA, but it returns to its original owner as soon as that owner is available again.

@raffaelespazzoli
Collaborator

ok, let's keep this as it is. I'll try to do some tests in the next few days. In the meantime, can you add a paragraph to the docs about this functionality?

@tommasopozzetti
Contributor Author

@raffaelespazzoli I added some info about this feature to the docs. Let me know if you think it is clear enough.

@tommasopozzetti
Contributor Author

I rebased this PR onto master to include the capabilities from #54 and handle them correctly even with the spreadvips annotation.

@David-Igou
Contributor

@raffaelespazzoli I have tested these changes and they LGTM

@David-Igou merged commit da6d67b into redhat-cop:master on Mar 3, 2021
@baryluk

baryluk commented Dec 9, 2021

Correct me if I am wrong, but from analyzing the code and template, it looks like, if this option is enabled, the first IP of every service will end up being placed on the first keepalived (load balancer) pod/node.

      {{- $owner := index $root.KeepalivedPods (modulus $i (len $root.KeepalivedPods)) }}
      vrrp_instance {{ $namespacedNameForIP }} {
          @{{ $owner.ObjectMeta.Name }} state MASTER

That is not great.

Considering that keepalived's default preemption back to the master takes about 5 minutes, after a rolling restart one can end up with the first IP of ALL services (which is often the ONLY IP of the service) on the same node.

It would be good to use a hash of the service name as an offset to ensure more uniformity, or to apply some permutation of the indexes based on the service name (preferably using consistent hashing, to handle changes in the length of $root.KeepalivedPods gracefully).
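A minimal Go sketch of the hash-offset idea (illustrative only, not the operator's actual code): hash the service's namespaced name and use it to offset the round-robin starting index, so that different services start on different pods:

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // chooseOwner returns the preferred keepalived pod for the i-th VIP of a service.
    // The round-robin start is offset by a hash of the service's namespaced name,
    // so the first VIP of every service does not always land on the first pod.
    func chooseOwner(namespacedName string, vipIndex int, pods []string) string {
        h := fnv.New32a()
        h.Write([]byte(namespacedName))
        offset := int(h.Sum32() % uint32(len(pods)))
        return pods[(offset+vipIndex)%len(pods)]
    }

    func main() {
        pods := []string{"keepalived-0", "keepalived-1", "keepalived-2"}
        for i := 0; i < 3; i++ {
            fmt.Println(chooseOwner("default/nginx-ingress", i, pods))
        }
    }

As noted above, a consistent-hashing scheme would handle changes in the number of keepalived pods more gracefully than this simple modulus, since it would move only a small fraction of the preferred owners.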
