Deprecate AvailabilitySet based clusters #396

Open
dkistner opened this issue Nov 9, 2021 · 3 comments
Labels
area/robustness: Robustness, reliability, resilience related
kind/enhancement: Enhancement, improvement, extension
kind/roadmap: Roadmap BLI
kind/technical-debt: Something that is only solved on the surface, but requires more (re)work to be done properly
lifecycle/rotten: Nobody worked on this for 12 months (final aging stage)
platform/azure: Microsoft Azure platform/infrastructure
priority/2: Priority (lower number equals higher priority)
status/blocked: Issue is blocked (e.g. because of dependencies)

Comments

@dkistner
Member

dkistner commented Nov 9, 2021

How to categorize this issue?
/area robustness
/kind technical-debt
/priority 2
/platform azure

What would you like to be added:

We currently use AvailabilitySets to ensure that machines are distributed across compute units for non-zonal deployments (primarily in regions that do not consist of multiple zones).

For legacy reasons, we use a single AvailabilitySet for all machines in the cluster (even with multiple worker pools). This approach comes with several drawbacks (basic load balancer, no different hardware SKUs, etc.).

Therefore, some time ago we started to also support Azure clusters based on VirtualMachineScaleSets with flexible orchestration (VMSS flex/VMO). So far this was only usable as an alpha feature (activated via the annotation alpha.azure.provider.extensions.gardener.cloud/vmo=true on the Shoot resource), as the feature was not generally available on Azure.
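
As a minimal sketch of how the alpha opt-in described above could be detected, the Go snippet below checks a set of Shoot annotations for the key named in this issue. The helper name `usesVMO` and the plain `map[string]string` input are assumptions made for illustration; this is not the extension's actual code.

```go
package main

import "fmt"

// vmoAnnotation is the alpha annotation key quoted in this issue.
const vmoAnnotation = "alpha.azure.provider.extensions.gardener.cloud/vmo"

// usesVMO is a hypothetical helper: it reports whether the given Shoot
// annotations opt in to VMSS flex (VMO) by setting the annotation to "true".
func usesVMO(annotations map[string]string) bool {
	// Reading from a nil map is safe in Go and yields the empty string.
	return annotations[vmoAnnotation] == "true"
}

func main() {
	shootAnnotations := map[string]string{vmoAnnotation: "true"}
	fmt.Println(usesVMO(shootAnnotations)) // true
	fmt.Println(usesVMO(nil))              // false
}
```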

Now that VirtualMachineScaleSets with flexible orchestration has become GA on Azure, we should start making it the default for non-zonal deployments. Ref

This is an umbrella issue to track what needs to be done to deprecate (probably first forbid creating new ones?) AvailabilitySet based clusters and to establish VMSS flex as the new default deployment model for non-zonal clusters. The goal should be to get rid of the AvailabilitySet deployment model entirely at some point.

Why is this needed:
More robust and flexible machine distribution support in non-zonal regions.

cc @kon-angelo, @MSSedusch, @HappyTobi

@dkistner dkistner added the kind/enhancement Enhancement, improvement, extension label Nov 9, 2021
@gardener-robot gardener-robot added area/robustness Robustness, reliability, resilience related kind/technical-debt Something that is only solved on the surface, but requires more (re)work to be done properly platform/azure Microsoft Azure platform/infrastructure priority/2 Priority (lower number equals higher priority) labels Nov 9, 2021
@MSSedusch

Limitations are documented here:

https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#a-comparison-of-flexible-uniform-and-availability-sets

for example:

Supported SKUs: D series, E series, F series, A series, B series, Intel, AMD. Specialty SKUs (G, H, L, M, N) are not supported.
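
To make the limitation concrete, here is a rough Go sketch that classifies a VM size by its series letter, based only on the list quoted above (a snapshot that may be outdated). The helper `supportedByVMSSFlex`, the first-letter heuristic, and the assumed `Standard_<Family>...` naming are all illustrative assumptions, not an official check.

```go
package main

import (
	"fmt"
	"strings"
)

// flexSupportedFamilies holds the series letters listed as supported in the
// comment above (D, E, F, A, B). It is a snapshot, not a maintained list.
var flexSupportedFamilies = map[byte]bool{'D': true, 'E': true, 'F': true, 'A': true, 'B': true}

// supportedByVMSSFlex is a hypothetical helper. It assumes VM size names of
// the form "Standard_<Family>..." and looks only at the first family letter.
func supportedByVMSSFlex(vmSize string) bool {
	name := strings.TrimPrefix(vmSize, "Standard_")
	if name == "" {
		return false
	}
	return flexSupportedFamilies[name[0]]
}

func main() {
	for _, size := range []string{"Standard_D4s_v3", "Standard_NC6", "Standard_M128s"} {
		fmt.Printf("%s: %v\n", size, supportedByVMSSFlex(size))
	}
}
```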

@timebertt
Member

what needs to be done to deprecate (probably first forbid creating new ones?) AvailabilitySet based clusters

For deprecating AVS based clusters, it would be nice to make use of the warning headers in the validation webhook for Azure shoots (see https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#response).
I.e., return a warning to users who create AVS based clusters.
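
A minimal sketch of that idea in Go, assuming a webhook that admits the request but attaches a warning: the structs below are simplified local mirrors of the `admission.k8s.io/v1` AdmissionReview response (only `uid`, `allowed`, and `warnings` are modeled), and `reviewShoot`, its boolean input, and the warning text are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// admissionResponse is a simplified local mirror of the "response" object of
// an admission.k8s.io/v1 AdmissionReview; only the fields used here are modeled.
type admissionResponse struct {
	UID      string   `json:"uid"`
	Allowed  bool     `json:"allowed"`
	Warnings []string `json:"warnings,omitempty"`
}

// admissionReview wraps the response the way the API server expects it.
type admissionReview struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Response   admissionResponse `json:"response"`
}

// reviewShoot is a hypothetical validation step: it still admits the request,
// but adds a warning when the cluster would be AvailabilitySet based.
func reviewShoot(uid string, availabilitySetBased bool) admissionReview {
	resp := admissionResponse{UID: uid, Allowed: true}
	if availabilitySetBased {
		// The warning wording is an assumption for illustration.
		resp.Warnings = append(resp.Warnings,
			"AvailabilitySet based clusters are deprecated; consider VMSS flex (VMO)")
	}
	return admissionReview{APIVersion: "admission.k8s.io/v1", Kind: "AdmissionReview", Response: resp}
}

func main() {
	out, err := json.MarshalIndent(reviewShoot("example-uid", true), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the request is still allowed, existing workflows keep working while clients such as kubectl surface the warning to the user.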

@dkistner
Member Author

dkistner commented Dec 3, 2021

Status update:
At the moment we cannot remove availability set based deployments entirely, as the replacement, VMSS flex (VMO), currently lacks support for important machine type series (see here).
We need to hold off on the deprecation until this gap is closed on the Azure side.

We might want to consider making VMSS flex (VMO) based clusters the default deployment model for non-zonal clusters.

/status blocked

@gardener-robot gardener-robot added the status/blocked Issue is blocked (e.g. because of dependencies) label Dec 3, 2021
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jun 2, 2022
@dkistner dkistner removed the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jun 29, 2022
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Dec 26, 2022
@gardener-robot gardener-robot added kind/roadmap Roadmap BLI and removed roadmap/cloud labels Mar 23, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Dec 1, 2023