Upgrade across multiple major versions #4064

Open
hmi12 opened this issue Jun 6, 2024 · 8 comments
Labels
question Further information is requested waiting-on-user-response Waiting on more information from the original user before progressing.

Comments

@hmi12

hmi12 commented Jun 6, 2024

Describe the current behavior
Currently, we're running ASO v2.0.0-beta.2.
Is it possible to upgrade directly to the latest version?

@matthchr
Member

matthchr commented Jun 6, 2024

You cannot go directly from beta.2 to the latest GA version, as there were a number of resource management changes between those two versions.

You need to pay close attention to the following changes:

  • breaking changes in -beta.4
    • Manual migration of certain resources may be required. Check if you're using those resources/properties (documented in link above).
  • breaking changes in v2.0.0
    • Alpha versions of the CRDs were removed. If you ever installed them, you need to run asoctl to clean up the alpha resources. If you never installed the alpha CRDs, you're fine.
    • Must upgrade from v2.0.0-beta.5 due to Helm chart changes. Doesn't apply if you didn't use Helm to install/upgrade.
  • breaking changes in v2.1.0
    • Need to use --crd-patterns to install new CRDs. Your existing CRDs will be upgraded.
  • breaking changes in v2.4.0
    • Beta CRDs are deprecated. You've definitely used these so you'll need to run the asoctl command once you're on v2.3.0.

I would recommend you read the other breaking change notices as well just to make sure you're not using the resources impacted.

The recommended upgrade pattern would be to go to every individual ASO version. This isn't strictly required but it's safest and is what we recommend. That way if something goes wrong it's obvious what the old/new versions are and the changes that might be causing the problem. You're more likely to get quality support from us following this pattern.

A (relatively) cautious but still slightly more risky upgrade would be:
v2.0.0-beta.2 -> v2.0.0-beta.4 -> v2.0.0-beta.5 -> v2.0.0 -> v2.1.0 -> v2.3.0 -> v2.4.0 -> v2.7.0

This hits all of the versions that contain major changes but skips over some of the minor version releases that don't have major changes.

A risky upgrade that might still work would be:
v2.0.0-beta.2 -> v2.0.0-beta.4 -> v2.0.0-beta.5 -> v2.0.0 -> v2.7.0

This hits the minimum versions that you MUST hit to get from where you are to latest.
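
The two shortened paths above can be written down as data, so that a script or checklist can tell you which hops remain from wherever you currently are. This is only a sketch: the version lists are copied from this comment, while the names `CAUTIOUS_PATH`, `MINIMAL_PATH`, and `remaining_hops` are made up for illustration and are not part of ASO or asoctl.

```python
# Upgrade waypoints copied from the comment above; nothing here talks to a cluster.
CAUTIOUS_PATH = [
    "v2.0.0-beta.2", "v2.0.0-beta.4", "v2.0.0-beta.5", "v2.0.0",
    "v2.1.0", "v2.3.0", "v2.4.0", "v2.7.0",
]
MINIMAL_PATH = ["v2.0.0-beta.2", "v2.0.0-beta.4", "v2.0.0-beta.5", "v2.0.0", "v2.7.0"]


def remaining_hops(path, current):
    """Return the versions still to be installed, in order, after `current`."""
    if current not in path:
        raise ValueError(f"{current} is not a waypoint on this path")
    return path[path.index(current) + 1:]


if __name__ == "__main__":
    for target in remaining_hops(CAUTIOUS_PATH, "v2.0.0-beta.2"):
        print("upgrade to", target)
```

Raising on an unknown version is deliberate: if the cluster is on a version that is not a waypoint, the safest reading of the advice above is to stop and work out the path by hand.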

Note that in all cases, when you upgrade from v2.0.0+ to v2.4.0+ you must follow the v2.4.0 instructions on beta CRD deprecation and swap your CRDs to the GA versions. I would recommend you just do it one ASO version at a time (the recommended pattern). You don't need to spend lots of time at each ASO version: you can upgrade to a version, ensure the ASO pod launches successfully with no errors, maybe re-apply one of your resources with a simple edit (change tags or similar) to make sure things are working, and then upgrade again to the next version.

@hmi12
Author

hmi12 commented Jun 7, 2024

@matthchr Really appreciate your detailed recommendation. We need to conduct some verification in the testing environment.
Or, is the following solution feasible?

  1. Add the "skip-reconcile" annotation to all Azure resources;
  2. Uninstall the old version of ASO from AKS;
  3. Install the latest version of ASO directly;
  4. Finally import the Azure resources using asoctl.

@matthchr matthchr added the question Further information is requested label Jun 7, 2024
@matthchr
Member

matthchr commented Jun 7, 2024

That should at least in theory also work. Note that the annotation is reconcile-policy.
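
For reference, here is a minimal Python sketch of what step 1 amounts to on a single manifest. It assumes the full annotation key is `serviceoperator.azure.com/reconcile-policy` with the value `skip`; the thread only gives the short name, so please confirm the key against the ASO documentation before relying on it. The helper name `mark_skip` is hypothetical.

```python
import copy

# Assumed full annotation key; the thread only names it "reconcile-policy".
RECONCILE_POLICY = "serviceoperator.azure.com/reconcile-policy"


def mark_skip(manifest):
    """Return a copy of a resource manifest (as a dict) annotated with reconcile-policy: skip."""
    updated = copy.deepcopy(manifest)
    annotations = updated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[RECONCILE_POLICY] = "skip"
    return updated


if __name__ == "__main__":
    rg = {
        "apiVersion": "resources.azure.com/v1beta20200601",
        "kind": "ResourceGroup",
        "metadata": {"name": "example-rg"},
    }
    print(mark_skip(rg)["metadata"]["annotations"])
```

The function copies the manifest rather than editing it in place, so the original YAML you have on disk stays available for comparison.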

You'll need to make sure that you uninstall the CRDs too (which Helm won't do by default but you can do manually once you've deleted all of the instances of the CRDs).

asoctl gives you YAML that you might still need to massage a bit (to provide secrets, etc.), and you already have some (beta) YAML whose shape is likely very similar to the GA YAML. So it's not clear to me whether it would be easier to start completely from scratch with asoctl-imported resources, or to modify your existing YAMLs locally to move from the beta to the GA version of the CRDs and then reapply them. If you follow the breaking change documentation, that should just be the version itself and maybe a few other small things.

As to which is easier, the full upgrade outlined above or this approach, it probably depends on how many ASO resources you have. If you have hundreds or thousands of resources that you'd need to re-import, it'll probably be easier to just do the upgrade, even accounting for the fact that some of those resources may need to be updated due to the breaking changes mentioned above. Most resources will just need their version changed by swapping the v1beta prefix for v1api (for example, v1beta20200601 becomes v1api20200601). On the other hand, if you don't have that many resources, marking them as reconcile-policy: skip, deleting them (in Kubernetes but not in Azure), and then re-importing with asoctl might be easier.
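
If you go the "edit your YAMLs locally" route, the version swap itself is mechanical. Below is a rough sketch of that one step in Python, assuming your manifests use top-level `apiVersion:` lines in `*.azure.com` groups with versions shaped like the `v1beta20200601` seen later in this thread. The function name `beta_to_ga` is invented for illustration, and this does not handle the resources that need more than a version change per the breaking change notes.

```python
import re

# Matches an apiVersion line in an *.azure.com group whose version starts with "v1beta".
_BETA_API_VERSION = re.compile(
    r"^([ \t]*apiVersion:[ \t]*\S+\.azure\.com/v1)beta(\d+)", re.MULTILINE
)


def beta_to_ga(yaml_text):
    """Rewrite v1beta<date> apiVersions to v1api<date>, leaving everything else untouched."""
    return _BETA_API_VERSION.sub(r"\1api\2", yaml_text)


if __name__ == "__main__":
    sample = "apiVersion: resources.azure.com/v1beta20200601\nkind: ResourceGroup\n"
    print(beta_to_ga(sample))
```

Working on the text rather than parsing the YAML keeps comments and formatting intact, at the cost of only recognising `apiVersion` at the start of a line.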

@matthchr
Member

matthchr commented Jun 7, 2024

It's also worth noting that while the above involves a lot of special cases and gotchas, that's primarily because of the large amount of time between beta2 and 2.7.0, the fact that the beta CRDs were deprecated, and the fact that in beta.5 we added so many CRDs that the chart became too large for Helm to manage them, so we had to start managing them ourselves.

  • Removal of CRD versions that were once the storage version in Kubernetes requires special handling: running the asoctl tool and updating all of your applied CR versions.
  • Moving from Helm managing the CRDs to the ASO pod managing the CRDs itself required some manual updates of the CRDs to remove Helm annotations for the upgrade path from beta4 to beta5.
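
The first point can be checked on a live cluster, because Kubernetes records every version that has ever been persisted for a CRD in its `status.storedVersions` field. A rough sketch follows, assuming you have saved the output of `kubectl get crd -o json` and loaded it with `json.load`. The function name `crds_with_stored_beta` is invented here, and the `v1beta` prefix check is an assumption based on the version names in this thread.

```python
def crds_with_stored_beta(crd_list):
    """Map CRD name -> beta versions still listed in status.storedVersions."""
    result = {}
    for crd in crd_list.get("items", []):
        stored = crd.get("status", {}).get("storedVersions", [])
        beta = [v for v in stored if v.startswith("v1beta")]
        if beta:
            result[crd["metadata"]["name"]] = beta
    return result


if __name__ == "__main__":
    # Hand-written stand-in for `kubectl get crd -o json` output.
    example = {
        "items": [
            {
                "metadata": {"name": "resourcegroups.resources.azure.com"},
                "status": {"storedVersions": ["v1beta20200601"]},
            }
        ]
    }
    print(crds_with_stored_beta(example))
```

An empty result would suggest that no beta version is recorded as stored, but it is not a substitute for following the documented asoctl steps.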

Once you're into the GA version (2.0.0+), there are technically small breaking changes here and there but none that are going to impact every resource like the beta->GA migration does. I wouldn't expect a hypothetical v2.5.0 -> 2.14.0 to be this complicated.

@matthchr matthchr added waiting-on-user-response Waiting on more information from the original user before progressing. and removed needs-triage 🔍 labels Jun 10, 2024
@theunrepentantgeek
Member

How did you get on? Did you successfully upgrade - and which route did you take?

@hmi12
Author

hmi12 commented Jul 2, 2024

The upgrade is still pending on our task list. We might test both methods in the test environment, but we haven't started yet. We will post an update here if we have any new findings.

@hmi12
Author

hmi12 commented Jul 10, 2024

@matthchr @theunrepentantgeek
We tried uninstalling ASO 2.0.0-beta.2, including removing the old CRDs, and then installing the latest version, but an error occurred while installing the new CRDs. It seems that the deprecated version is still present in etcd and cannot be manually removed. We might need to use asoctl clean crds to migrate the deprecated CRDs, but the prerequisite is: Ensure the current ASO v2 version in your cluster is beta.5....
Therefore, it seems we have to follow the recommended solution and upgrade through each individual ASO version sequentially.

Error message while installing the latest CRDs: request to convert CR from an invalid group/version: resources.azure.com/v1beta20200601

@matthchr
Member

That upgrade documentation was definitely written with the "upgrade 1 version at a time" approach in mind. The reason for the "must be beta.5" requirement is that asoctl clean crds will only remove the beta CRDs if there are other versions "ahead" of them (the GA versions). So it won't work in your case, because the CRDs are still old and don't have the new versions yet. BUT: if you've already deleted all of your old Custom Resources and it's just the CRDs that are left, you could just delete the ASO CRDs too and then reinstall them.

Normally deleting CRDs is scary/bad, but if you know there are no instances of the CRs in the cluster it should work.
Going 1 version at a time should also work.
