Upgrade Guide
Upgrading Nantian Gateway follows a straightforward Helm workflow. The control plane and data plane use rolling updates, so there’s no downtime for traffic during the upgrade. The two planes can tolerate brief version skew during the rollout window.
This guide covers Helm-based upgrades, version compatibility rules, and what to do if something goes wrong.
Before you upgrade
Run through these checks before starting any upgrade:
```shell
# Confirm the current version
helm list -n nantian-gw

# Check that all pods are healthy
kubectl get pods -n nantian-gw

# Verify the GatewayClass is present
kubectl get gatewayclass nantian-gw

# Check for any stuck or failing Gateways
kubectl get gateways -A

# Save your current values, including chart defaults
helm get values nantian-gw -n nantian-gw --all > pre-upgrade-values.yaml
```

The last step is important. By default, helm get values returns only the values you supplied; the --all flag also captures the computed values from the current release, including defaults you didn’t explicitly set. If the upgrade changes any defaults, you’ll want to reference this file.
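The checks above can be bundled into one function that refuses to continue if any pod isn’t ready and saves the values file on success. This is a minimal sketch, not part of the chart: preflight_check is a hypothetical name, and it assumes the release and namespace are both nantian-gw and that every pod in the namespace is long-running (a completed Job pod would be flagged).

```shell
#!/usr/bin/env bash
# Hypothetical pre-upgrade helper (preflight_check is not part of the chart).
# Assumes release and namespace are both "nantian-gw", as in this guide,
# and that every pod in the namespace is long-running.
NS="${NS:-nantian-gw}"
RELEASE="${RELEASE:-nantian-gw}"

preflight_check() {
  out="${1:-pre-upgrade-values.yaml}"

  # One line per pod: "<name> <phase> <ready flag per container>".
  # A pod is flagged if it isn't Running, has no container statuses yet,
  # or has any container that isn't ready.
  not_ready=$(kubectl get pods -n "$NS" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.status.containerStatuses[*].ready}{"\n"}{end}' \
    | awk 'NF == 0 { next }
           { bad = ($2 != "Running" || NF < 3)
             for (i = 3; i <= NF; i++) if ($i != "true") bad = 1 }
           bad')

  if [ -n "$not_ready" ]; then
    echo "Pods not ready; resolve these before upgrading:" >&2
    echo "$not_ready" >&2
    return 1
  fi

  # --all saves computed values (chart defaults included), not just overrides.
  helm get values "$RELEASE" -n "$NS" --all > "$out"
}
```

Run it as preflight_check && echo "ready to upgrade"; pass a path as the first argument to write the values somewhere other than pre-upgrade-values.yaml.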
Version compatibility
Nantian Gateway follows these compatibility rules:
| Component | Compatibility |
|---|---|
| Control plane and data plane | Small version skew tolerated (e.g., CP v0.1.0 with DP v0.1.1) |
| Gateway API version | Must match the Gateway API CRD version installed in the cluster |
| Kubernetes version | 1.24+ required |
The control plane and data plane communicate over gRPC/xDS. The protocol is designed to handle minor version differences. The control plane can push configuration to a data plane running a slightly newer or older version, as long as the xDS protocol version is compatible.
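To spot skew between the planes on a live cluster, you can compare the image tags of the running pods. This is a sketch under several assumptions: the image tag carries the release version (for example :v0.1.1), control plane pods carry the label app.kubernetes.io/component=controlplane (only the dataplane label appears in this guide), and “tolerated skew” means the same major.minor with any patch version, matching the example in the table. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical skew check. Assumes image tags are versions (e.g. ...:v0.1.1),
# a "controlplane" component label (assumption), and that tolerated skew
# means "same major.minor, patch may differ".
NS="${NS:-nantian-gw}"

# Print "major.minor" from an image reference, e.g. repo/gw:v0.1.1 -> 0.1
major_minor() {
  tag="${1##*:}"   # text after the last colon (the tag)
  tag="${tag#v}"   # drop a leading "v"
  echo "$tag" | cut -d. -f1,2
}

# Succeed if every image reference read from stdin shares one major.minor.
same_major_minor() {
  count=$(while read -r img || [ -n "$img" ]; do
            [ -n "$img" ] && major_minor "$img"
          done | sort -u | wc -l | tr -d ' ')
  [ "$count" -le 1 ]
}

# Compare the first container image of every control/data plane pod.
check_running_skew() {
  kubectl get pods -n "$NS" \
    -l 'app.kubernetes.io/component in (controlplane,dataplane)' \
    -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}' \
    | same_major_minor
}
```

Running check_running_skew mid-rollout is expected to succeed for a patch upgrade and fail for a minor one, which is the point at which the release notes matter most.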
Standard upgrade procedure
Step 1: Update the Helm repository (if using a repo)

```shell
helm repo update
```

If you’re installing from a local chart directory, pull the latest version instead:

```shell
git pull origin main
```

Step 2: Diff the changes
Before applying, see what will change:

```shell
helm diff upgrade nantian-gw nantian-gw/nantian-gw \
  -f my-values.yaml \
  -n nantian-gw
```

This requires the helm-diff plugin. If you don’t have it, install it:

```shell
helm plugin install https://github.com/databus23/helm-diff
```

Step 3: Apply the upgrade
```shell
helm upgrade nantian-gw nantian-gw/nantian-gw \
  -f my-values.yaml \
  -n nantian-gw
```

Helm computes the difference between the current release and the new chart, then applies only the changes. The Deployment controller handles the rolling update.
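Without extra flags, helm upgrade returns once the new manifests are applied, and the rolling update carries on afterwards. If you script upgrades, a loop like this waits on the three Deployments named in this guide and stops at the first one that doesn’t finish. wait_for_rollouts is a hypothetical helper; --timeout is a standard kubectl rollout status flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper (wait_for_rollouts is not part of the chart).
# Deployment names are the ones this guide uses; adjust for your release.
NS="${NS:-nantian-gw}"

wait_for_rollouts() {
  limit="${1:-10m}"   # per-Deployment limit passed to kubectl
  for d in nantian-gw-controlplane nantian-gw-dataplane nantian-gw-dashboard; do
    echo "Waiting for deployment/$d ..."
    if ! kubectl rollout status "deployment/$d" -n "$NS" --timeout="$limit"; then
      echo "deployment/$d did not finish rolling out within $limit" >&2
      return 1
    fi
  done
}
```

For example: helm upgrade nantian-gw nantian-gw/nantian-gw -f my-values.yaml -n nantian-gw && wait_for_rollouts 10m.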
Step 4: Watch the rollout
```shell
kubectl rollout status deployment/nantian-gw-controlplane -n nantian-gw
kubectl rollout status deployment/nantian-gw-dataplane -n nantian-gw
kubectl rollout status deployment/nantian-gw-dashboard -n nantian-gw
```

Step 5: Verify
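For a scripted version of the Gateway check in this step, you can read each Gateway’s Programmed condition (a standard Gateway API status condition) rather than scanning the table by eye. This sketch uses the fully qualified resource name so it can’t pick up another project’s Gateway kind; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-upgrade check: list Gateways whose "Programmed"
# condition is missing or not "True".
unprogrammed_gateways() {
  kubectl get gateways.gateway.networking.k8s.io -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{" "}{.status.conditions[?(@.type=="Programmed")].status}{"\n"}{end}' \
    | awk 'NF > 0 && $2 != "True" { print $1 }'
}

verify_gateways() {
  bad=$(unprogrammed_gateways)
  if [ -n "$bad" ]; then
    echo "Gateways not programmed:" >&2
    echo "$bad" >&2
    return 1
  fi
  echo "All Gateways report Programmed=True"
}
```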
After all rollouts complete:
```shell
# All pods should be Running and Ready
kubectl get pods -n nantian-gw

# Check that Gateways are still programmed
kubectl get gateways -A

# Verify data planes are connected to the control plane
kubectl logs -n nantian-gw deployment/nantian-gw-controlplane | grep "data plane connected"
```

What happens during the upgrade
The upgrade process varies by component:
Dashboard: Rolls out alongside the other components; its order relative to them isn’t guaranteed. Because it’s stateless and doesn’t handle traffic, its single replica is replaced quickly with no impact on gateway traffic.
Control plane: Rolling update with maxSurge: 1 and maxUnavailable: 0. The new pod starts while the old pod is still running. Once the new pod passes its readiness probe, the old pod is terminated. If the old pod was the leader, the new pod (or another standby) takes over via leader election. Data planes may briefly disconnect and reconnect to the new leader.
Data plane: Rolling update with maxSurge: 1 and maxUnavailable: 0. One new pod starts before one old pod is removed. The new pod connects to the control plane, receives the current configuration snapshot, and becomes ready. Traffic is routed to the new pod, and the old pod is terminated. This process repeats until all replicas are updated.
Data plane upgrades take the longest because each new pod must establish a gRPC/xDS connection and receive a full configuration snapshot before it can accept traffic. With the default startup probe (failureThreshold: 45, periodSeconds: 2), each new pod has up to 90 seconds to become ready.
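The startup probe window is failureThreshold × periodSeconds, so the defaults give 45 × 2 = 90 seconds per pod. Because maxSurge: 1 and maxUnavailable: 0 replace data plane pods roughly one at a time, multiplying by the replica count gives a rough ceiling on time spent waiting for startup probes. This is an estimate, not a guarantee: it ignores image pulls, initialDelaySeconds and termination grace periods, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate only (hypothetical helpers). Assumes data plane pods are
# replaced one at a time (maxSurge: 1, maxUnavailable: 0) and ignores image
# pulls, initialDelaySeconds and termination grace periods.

# Longest a single pod may spend in its startup probe, in seconds.
startup_window() {       # failureThreshold periodSeconds
  echo $(( $1 * $2 ))
}

# Upper bound for a whole data plane rollout, in seconds.
worst_case_rollout() {   # replicas failureThreshold periodSeconds
  echo $(( $1 * $(startup_window "$2" "$3") ))
}

echo "per pod:    $(startup_window 45 2)s"          # defaults: 90s
echo "3 replicas: $(worst_case_rollout 3 45 2)s"    # 270s
```

Setting the --timeout of kubectl rollout status comfortably above this figure avoids false alarms on larger data plane deployments.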
Rolling back
If the upgrade causes problems, roll back to the previous release:
```shell
helm rollback nantian-gw -n nantian-gw
```

Without a revision number, helm rollback returns the release to its previous revision. Helm keeps past revisions up to the history limit (set with --history-max on helm upgrade; the default is 10). To see available rollback targets:

```shell
helm history nantian-gw -n nantian-gw
```

To roll back to a specific revision:

```shell
helm rollback nantian-gw 3 -n nantian-gw
```

Upgrading Gateway API CRDs
Gateway API CRDs are versioned separately from Nantian Gateway. Before upgrading to a version of Nantian Gateway that targets a newer Gateway API version, install the corresponding CRDs:

```shell
# Install Gateway API v1.2 CRDs (example)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml
```

Check the release notes for the required Gateway API version. Installing newer CRDs before upgrading the gateway is generally safe, because the new CRDs add fields and resources that the old controller simply ignores; check the Gateway API release notes for any removed API versions, which are the exception.
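To check what’s installed before upgrading the gateway, you can read the gateway.networking.k8s.io/bundle-version annotation that upstream Gateway API CRDs carry. This is a sketch: the helper names are hypothetical, the required version is something you take from the release notes, and the version comparison relies on sort -V (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical check that the installed Gateway API CRDs are at least the
# version a release requires (e.g. v1.2.0).
installed_gwapi_version() {
  kubectl get crd gateways.gateway.networking.k8s.io \
    -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/bundle-version}'
}

# Succeed if version $1 >= version $2 (a leading "v" is ignored).
version_ge() {
  a="${1#v}"; b="${2#v}"
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

require_gwapi() {   # required version, e.g. v1.2.0
  have=$(installed_gwapi_version)
  if [ -z "$have" ] || ! version_ge "$have" "$1"; then
    echo "Gateway API CRDs are ${have:-missing}; need $1 or newer" >&2
    return 1
  fi
}
```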
Upgrading across multiple minor versions
Skipping versions is generally supported, but test in a staging environment first. The safest path is sequential upgrades:
- Upgrade to the next minor version (e.g., v0.1.0 to v0.2.0)
- Verify the upgrade is stable
- Repeat for the next minor version
If you must skip versions, pay extra attention to the release notes for any breaking changes in the versions you’re skipping.
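The sequential path above can be scripted as a loop that upgrades one chart version at a time and stops at the first failed rollout. This is a hypothetical sketch: it assumes the chart versions you pass are the releases you want to step through (helm search repo nantian-gw/nantian-gw --versions lists what’s available) and still leaves the “verify it’s stable” judgment to you between runs.

```shell
#!/usr/bin/env bash
# Hypothetical sequential upgrade (upgrade_through is not part of the chart).
# Pass chart versions in order; stops at the first failure.
NS="${NS:-nantian-gw}"

upgrade_through() {   # e.g. upgrade_through 0.2.0 0.3.0
  for v in "$@"; do
    echo "Upgrading to chart version $v"
    helm upgrade nantian-gw nantian-gw/nantian-gw \
      --version "$v" -f my-values.yaml -n "$NS" || return 1
    # Wait for every component before moving to the next version.
    for d in nantian-gw-controlplane nantian-gw-dataplane nantian-gw-dashboard; do
      kubectl rollout status "deployment/$d" -n "$NS" --timeout=10m || return 1
    done
  done
}
```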
Troubleshooting failed upgrades
Pods stuck in Pending
If new pods can’t schedule, check node resources and anti-affinity rules:
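```shell
kubectl describe pod -n nantian-gw -l app.kubernetes.io/component=dataplane | grep -A5 Events
```

Common causes: insufficient CPU/memory, or anti-affinity rules that can’t be satisfied. Because the rollout uses maxSurge: 1, the cluster needs room for one pod beyond the current replica count; with required per-node anti-affinity, that means at least one more eligible node than there are replicas.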
Data plane pods failing readiness
If new data plane pods cycle through CrashLoopBackOff or never become ready:
```shell
kubectl logs -n nantian-gw deployment/nantian-gw-dataplane --tail=50
```

Common causes: the data plane can’t connect to the control plane (wrong controlPlaneAddr), the TLS handshake fails (certificate mismatch), or the control plane isn’t sending a snapshot (check the control plane logs).
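With CrashLoopBackOff, the current container may not have logged anything useful yet; kubectl logs --previous shows output from the container that crashed. This sketch prints that for every data plane pod that isn’t ready, using the app.kubernetes.io/component=dataplane label from the Pending check above. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump the previous container's logs for each
# data plane pod that is not ready.
NS="${NS:-nantian-gw}"

# Pods with no container statuses yet, or any container not ready.
not_ready_dataplane_pods() {
  kubectl get pods -n "$NS" -l app.kubernetes.io/component=dataplane \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].ready}{"\n"}{end}' \
    | awk 'NF == 1 { print $1; next }
           { for (i = 2; i <= NF; i++) if ($i != "true") { print $1; break } }'
}

dump_crash_logs() {
  for pod in $(not_ready_dataplane_pods); do
    echo "== $pod (previous container) =="
    kubectl logs -n "$NS" "$pod" --previous --tail=50 \
      || echo "(no previous container for $pod)"
  done
}
```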
Control plane leader flapping
If the control plane leader flips repeatedly after the upgrade:
```shell
kubectl logs -n nantian-gw deployment/nantian-gw-controlplane | grep "leader"
```

Common causes: API server latency higher than renewDeadline, network issues between pods and the API server, or a misconfigured leader election ID.
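Logs show flapping after the fact; the leader election Lease shows it directly, because its leaseTransitions counter goes up each time the holder changes. This sketch assumes the control plane uses a coordination.k8s.io Lease in the nantian-gw namespace (with Kubernetes leader election the Lease name usually follows the leader election ID); the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical flapping check. Assumes a Lease-based leader election in
# the nantian-gw namespace.
NS="${NS:-nantian-gw}"

# Current holder, transition count and last renewal for each Lease.
show_leases() {
  kubectl get leases -n "$NS" \
    -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity,TRANSITIONS:.spec.leaseTransitions,RENEWED:.spec.renewTime
}

# Print how many leader transitions one Lease saw during an interval.
transitions_during() {   # lease-name seconds
  t0=$(kubectl get lease "$1" -n "$NS" -o jsonpath='{.spec.leaseTransitions}')
  sleep "$2"
  t1=$(kubectl get lease "$1" -n "$NS" -o jsonpath='{.spec.leaseTransitions}')
  echo $(( ${t1:-0} - ${t0:-0} ))
}
```

A non-zero result from transitions_during over a quiet minute points at the causes listed above rather than at the rollout itself.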
Helm upgrade fails with schema validation
If helm upgrade rejects your values file:
```shell
# See what the new chart expects
helm show values nantian-gw/nantian-gw

# Compare with your current values (process substitution needs bash or zsh)
diff <(helm get values nantian-gw -n nantian-gw) <(helm show values nantian-gw/nantian-gw)
```

The chart’s values schema may have changed between versions. Update your values file to match the new schema, then retry the upgrade.
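You can catch schema errors before touching the release by rendering the new chart locally: helm template validates values against the chart’s values.schema.json when the chart ships one. The schema file itself isn’t printed by helm show values, but helm pull --untar gives you a copy to read. A sketch with hypothetical helper names, assuming the chart directory is named nantian-gw:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight for schema errors.
# Render the new chart with your values; a schema violation fails here.
validate_values() {   # values file, optional chart version
  if [ -n "$2" ]; then
    helm template nantian-gw nantian-gw/nantian-gw -f "$1" --version "$2" > /dev/null
  else
    helm template nantian-gw nantian-gw/nantian-gw -f "$1" > /dev/null
  fi
}

# Print the chart's values.schema.json, if it has one.
show_schema() {
  dir=$(mktemp -d)
  helm pull nantian-gw/nantian-gw --untar --untardir "$dir" \
    && cat "$dir"/nantian-gw/values.schema.json
  rm -rf "$dir"
}
```

For example, validate_values my-values.yaml && echo "values match the new chart" before rerunning helm upgrade.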