EKS Upgrades Are No Longer a One-Way Door: Kubernetes Version Rollback

Category: Kubernetes & EKS
Quick Summary: For years, upgrading an EKS control plane was a one-way door - bump the Kubernetes minor version, and if something broke, you had to fix forward. AWS just changed that. Amazon EKS now supports Kubernetes version rollback: revert the control plane to the previous minor version within 7 days of an in-place upgrade, from the console, CLI, or SDK, at no additional cost. Here's how it works and how it should change your upgrade strategy.

The problem: upgrades you couldn't take back

If you've run Amazon EKS for any length of time, you know the quiet anxiety of a version upgrade. You move a cluster from, say, 1.29 to 1.30. The upgrade succeeds. And then, hours later, something misbehaves - a deprecated API your workloads still call, an add-on that isn't happy on the new version, a controller acting strangely under real traffic.

Until now, there was no going back. The control plane version was a one-way ratchet. Your only option was to fix forward - debug and patch under pressure, in production, with no escape hatch. That fear is exactly why so many teams sit several versions behind: upgrading felt like a leap of faith.

What AWS just shipped

Amazon EKS now supports Kubernetes version rollback. In plain terms: if an upgrade goes sideways, you can put the control plane back the way it was.

  • Revert to the previous minor version (e.g. roll 1.30 back to 1.29).
  • Within 7 days of the upgrade completing.
  • Trigger it from the console, AWS CLI, or SDK.
  • No additional cost, in all regions where EKS is available.
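To make "previous minor version" concrete, here is a minimal sketch of the version arithmetic. The helper name `previous_minor` is my own and is not part of any AWS SDK; it only shows which version a rollback would target.

```python
def previous_minor(version: str) -> str:
    """Return the minor version immediately below `version`, e.g. "1.30" -> "1.29".

    Hypothetical helper for illustration; it is not an EKS API.
    """
    major_text, minor_text = version.split(".")
    major, minor = int(major_text), int(minor_text)
    if minor == 0:
        raise ValueError(f"{version} has no previous minor version")
    return f"{major}.{minor - 1}"


print(previous_minor("1.30"))  # 1.29
```

Note that the comparison has to be numeric: treating "1.30" and "1.29" as decimals or strings gives wrong answers once minors reach two digits.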

It's not a blind revert: rollback readiness checks

The part I like most is that EKS doesn't just yank the version back and hope. Before it rolls back, it evaluates rollback readiness insights - automated checks that include:

  • API compatibility - will your workloads still work on the older version?
  • Version skew - the allowed gap between control plane and nodes (Kubernetes lets kubelets run up to three minor versions behind the API server, or two before 1.28, but never ahead of it).
  • Add-on compatibility - managed add-ons that need to match the target version.
  • Cluster health - general readiness signals.

So the rollback is guided, not a shot in the dark.
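The version skew check is the easiest of these to reason about yourself. The sketch below is my own approximation, not how EKS implements its insights: it assumes the upstream Kubernetes skew policy (a kubelet may be up to three minor versions older than the API server and never newer), and the function and node names are hypothetical.

```python
from typing import Dict, Optional

# Assumption: upstream Kubernetes skew policy for 1.28 and later.
MAX_KUBELET_LAG = 3


def minor_of(version: str) -> int:
    """Extract the minor number from a "major.minor" string (major is assumed to be 1)."""
    return int(version.split(".")[1])


def skew_problem(control_plane: str, kubelet: str) -> Optional[str]:
    """Describe the skew violation, or return None when the pair is allowed."""
    lag = minor_of(control_plane) - minor_of(kubelet)
    if lag < 0:
        return f"kubelet {kubelet} is newer than control plane {control_plane}"
    if lag > MAX_KUBELET_LAG:
        return f"kubelet {kubelet} is too far behind control plane {control_plane}"
    return None


def nodes_blocking_rollback(target: str, kubelets: Dict[str, str]) -> Dict[str, str]:
    """Map each node that would violate skew at `target` to the reason."""
    blocking = {}
    for node, version in kubelets.items():
        problem = skew_problem(target, version)
        if problem is not None:
            blocking[node] = problem
    return blocking


# Hypothetical fleet: one node already upgraded, one still on the old version.
fleet = {"node-a": "1.30", "node-b": "1.29"}
print(nodes_blocking_rollback("1.29", fleet))
```

With that fleet, only node-a is reported, because its kubelet would end up ahead of a 1.29 control plane.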

EKS Auto Mode gets special handling

For clusters running EKS Auto Mode, EKS orchestrates the rollback in the right order: it rolls back the worker nodes first, then the control plane, while honoring your configured disruption controls. That ordering matters because of version skew - you don't want kubelets stranded ahead of the control plane.
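A toy simulation shows why the order matters. This is only a model of the ordering argument, with hypothetical names and a fixed three-node cluster; it says nothing about how EKS performs the rollback internally.

```python
def minor_of(version):
    """Extract the minor number from a "major.minor" string."""
    return int(version.split(".")[1])


def skew_ok(control_plane, kubelets):
    """True when no kubelet is newer than the control plane."""
    return all(minor_of(k) <= minor_of(control_plane) for k in kubelets)


def simulate(order, upgraded="1.30", target="1.29", node_count=3):
    """Apply rollback steps in `order` and record whether skew holds after each one."""
    control_plane = upgraded
    nodes = [upgraded] * node_count
    history = []
    for step in order:
        if step == "nodes":
            nodes = [target] * node_count
        elif step == "control_plane":
            control_plane = target
        else:
            raise ValueError(f"unknown step: {step}")
        history.append((step, skew_ok(control_plane, nodes)))
    return history


print(simulate(["nodes", "control_plane"]))  # [('nodes', True), ('control_plane', True)]
print(simulate(["control_plane", "nodes"]))  # [('control_plane', False), ('nodes', True)]
```

Rolling nodes back first keeps every intermediate state valid. Rolling the control plane back first passes through a state where kubelets are newer than the API server.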

"In-place upgrade only" - what that means

The rollback applies only to in-place upgrades. There are two ways to move an EKS cluster to a newer version, and the distinction matters:

                | In-place upgrade                                           | Blue/green (migration)
What upgrades   | The existing cluster's version (1.29 → 1.30, same cluster) | A brand-new cluster on the new version; you migrate workloads over
Rollback method | AWS's new 7-day rollback                                   | Point traffic back to the old cluster (it still exists)
Effort          | Low - a click / one API call                               | High - build, migrate, cut over

With a blue/green upgrade you already have your own rollback path - the old cluster is still running, so you just shift back and delete the new one. There's nothing for AWS to revert. The new feature is for the far more common in-place path, where previously you had no safety net at all.

The catch worth understanding

This is a safety net, not an "undo anytime" button. Two limits define it:

  • In-place upgrades only - blue/green migrations aren't covered (and don't need it).
  • 7-day window - once it closes, the previous version is gone and you cannot roll back.

The intent is clear: use the window to validate the new version under real production load, and pull the ripcord fast if it goes wrong. It's not a standing option to undo an upgrade months later.
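It helps to put the deadline on a calendar the moment the upgrade finishes. Here is a minimal sketch of that arithmetic. It assumes the window is exactly 7 × 24 hours from the completion time; the timestamps and function names are hypothetical, and how EKS treats the exact boundary is something to confirm in the documentation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the window is exactly seven 24-hour days from upgrade completion.
ROLLBACK_WINDOW = timedelta(days=7)


def rollback_deadline(upgrade_completed_at: datetime) -> datetime:
    """Last moment at which a rollback could still be started."""
    return upgrade_completed_at + ROLLBACK_WINDOW


def time_left(upgrade_completed_at: datetime, now: datetime) -> timedelta:
    """Remaining rollback time, clamped at zero once the window has closed."""
    remaining = rollback_deadline(upgrade_completed_at) - now
    return max(remaining, timedelta(0))


# Hypothetical completion time; use timezone-aware datetimes throughout.
completed = datetime(2025, 1, 6, 14, 0, tzinfo=timezone.utc)
print(rollback_deadline(completed).isoformat())  # 2025-01-13T14:00:00+00:00
print(time_left(completed, datetime(2025, 1, 10, 14, 0, tzinfo=timezone.utc)))  # 3 days, 0:00:00
```

Wiring the deadline into an alert or a ticket due date keeps the decision to keep or revert from drifting past the cutoff.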

How this should change your upgrade playbook

  1. Still test in non-prod first. Rollback lowers the risk; it doesn't remove the need to test. Catch what you can before prod.
  2. Treat the 7 days as an active validation window. After upgrading prod, deliberately exercise the risky paths - the deprecated APIs, the finicky add-ons, peak load - while you still have the escape hatch.
  3. Know your rollback command before you need it. Don't learn the API during an incident. Confirm the console flow / CLI call in a lower environment first.
  4. Mind version skew on nodes. If you upgraded the nodes along with the control plane, rolling only the control plane back would leave kubelets newer than the API server, which Kubernetes does not support. The readiness checks flag that gap, and it is exactly why Auto Mode rolls nodes back first.

Frequently Asked Questions

Can you roll back an EKS version upgrade?

Yes - EKS now supports rolling the control plane back to the previous minor version within 7 days of an in-place upgrade, via console, CLI, or SDK, at no extra cost.

How long is the rollback window?

7 days from when the in-place upgrade completes. After that, the previous version is no longer available.

Does it cost extra?

No. It's available at no additional cost in all regions where EKS is available.

What is an in-place upgrade vs blue/green?

An in-place upgrade changes the version of the existing cluster; a blue/green upgrade builds a new cluster and migrates workloads to it. Rollback only applies to in-place upgrades - with blue/green you fall back by pointing traffic at the old cluster.

My Takeaway

This quietly removes one of the scariest parts of operating EKS. Upgrades stop being a leap of faith and become something you can validate and, if needed, reverse. It won't replace proper testing, and the 7-day limit means you still need discipline - but the days of being trapped on a bad version with no way back are over.

If you've been sitting a few versions behind because upgrading felt too risky, this is a good moment to revisit that decision.

References: AWS What's New announcement, the AWS Containers blog, and the EKS rollback documentation.