Upgrade Guide

Upgrading Nantian Gateway follows a straightforward Helm workflow. The control plane and data plane use rolling updates, so there’s no downtime for traffic during the upgrade. The two planes can tolerate brief version skew during the rollout window.

This guide covers Helm-based upgrades, version compatibility rules, and what to do if something goes wrong.

Run through these checks before starting any upgrade:

Terminal window
# Confirm the current version
helm list -n nantian-gw
# Check that all pods are healthy
kubectl get pods -n nantian-gw
# Verify the GatewayClass is present
kubectl get gatewayclass nantian-gw
# Check for any stuck or failing Gateways
kubectl get gateways -A
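# Save your current values (user-supplied overrides only)
helm get values nantian-gw -n nantian-gw -o yaml > pre-upgrade-values.yaml
# Also save the computed values, including chart defaults
helm get values nantian-gw -n nantian-gw --all -o yaml > pre-upgrade-values-all.yaml

The last step is important. By default, helm get values returns only the values you set explicitly; the --all flag adds the chart defaults that were in effect for the release. If the upgrade changes any defaults, you’ll want to reference the second file.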
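
The pod health check above is easy to misread when the namespace holds many pods. As a minimal sketch, the helper below filters the output of kubectl get pods down to the pods that need attention. The function name not_ready_pods is illustrative, and the column layout (NAME READY STATUS RESTARTS AGE) is assumed to be the default kubectl output.

```shell
# Print the names of pods that are not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
not_ready_pods() {
  awk '{
    split($2, ready, "/")   # READY looks like "1/1"
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage (requires cluster access):
#   kubectl get pods -n nantian-gw --no-headers | not_ready_pods
```

An empty result means every pod is Running with all containers ready. Pods from completed Jobs would also be listed, since their status is not Running.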

Nantian Gateway follows these compatibility rules:

| Component | Compatibility |
| --- | --- |
| Control plane and data plane | Minor version skew tolerated (e.g., CP v0.1.0 with DP v0.1.1) |
| Gateway API version | Must match the Gateway API CRD version installed in the cluster |
| Kubernetes version | 1.24+ required |

The control plane and data plane communicate over gRPC/xDS. The protocol is designed to handle minor version differences. The control plane can push configuration to a data plane running a slightly newer or older version, as long as the xDS protocol version is compatible.
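
To see how far apart two component versions are, it can help to classify the difference rather than compare strings by eye. The sketch below is a hypothetical helper (version_skew is not part of Nantian Gateway) that reports which semantic-version component differs. It does not decide whether the skew is acceptable; that remains a call for the compatibility rules and the release notes.

```shell
# Classify the difference between two semantic versions.
# Prints one of: same, patch, minor, major.
version_skew() (
  a=${1#v}; b=${2#v}      # drop an optional leading "v"
  IFS=.                   # split on dots (local to this subshell)
  set -- $a $b            # $1..$3 = first version, $4..$6 = second
  if   [ "$1" != "$4" ]; then echo major
  elif [ "$2" != "$5" ]; then echo minor
  elif [ "$3" != "$6" ]; then echo patch
  else echo same
  fi
)

# Usage:
#   version_skew v0.1.0 v0.1.1
```

The versions themselves have to come from somewhere, for example the image tags on the two Deployments, assuming the tags track the release version.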

Step 1: Update the Helm repository (if using a repo)

Terminal window
helm repo update

If you’re installing from a local chart directory, pull the latest version:

Terminal window
git pull origin main

Before applying, see what will change:

Terminal window
helm diff upgrade nantian-gw nantian-gw/nantian-gw \
-f my-values.yaml \
-n nantian-gw

This requires the helm-diff plugin. If you don’t have it, install it:

Terminal window
helm plugin install https://github.com/databus23/helm-diff

Once the diff looks right, apply the upgrade:

Terminal window
helm upgrade nantian-gw nantian-gw/nantian-gw \
-f my-values.yaml \
-n nantian-gw

Helm computes the difference between the current release and the new chart, then applies only the changes. The Deployment controller handles the rolling update. Watch each Deployment until its rollout completes:

Terminal window
kubectl rollout status deployment/nantian-gw-controlplane -n nantian-gw
kubectl rollout status deployment/nantian-gw-dataplane -n nantian-gw
kubectl rollout status deployment/nantian-gw-dashboard -n nantian-gw
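
In a script, the three rollout checks above can be wrapped in a loop that stops at the first Deployment that fails to finish in time. This is a minimal sketch: the function name wait_for_rollouts and the 120s timeout in the usage line are illustrative choices, not chart defaults.

```shell
# Wait for each Deployment in turn; stop at the first one that does
# not finish rolling out within the timeout.
# Arguments: <namespace> <timeout> <deployment>...
wait_for_rollouts() {
  ns=$1; timeout=$2; shift 2
  for deploy in "$@"; do
    if ! kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"; then
      echo "rollout of $deploy did not complete" >&2
      return 1
    fi
  done
}

# Usage (requires cluster access):
#   wait_for_rollouts nantian-gw 120s \
#     nantian-gw-controlplane nantian-gw-dataplane nantian-gw-dashboard
```

A non-zero exit status is a natural trigger for a rollback step in CI.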

After all rollouts complete:

Terminal window
# All pods should be Running and Ready
kubectl get pods -n nantian-gw
# Check that Gateways are still programmed
kubectl get gateways -A
# Verify data planes are connected to the control plane
kubectl logs -n nantian-gw deployment/nantian-gw-controlplane | grep "data plane connected"
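
With many Gateways, scanning the table output for one that is no longer programmed is error-prone. The sketch below filters on the Programmed status condition defined by Gateway API. The helper name unprogrammed_gateways is illustrative, and the jsonpath expression in the usage comment is one possible way to produce its input.

```shell
# Print Gateways whose Programmed condition is not "True".
# Reads "<namespace>/<name><TAB><status>" lines on stdin.
unprogrammed_gateways() {
  awk -F '\t' '$2 != "True" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get gateways -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Programmed")].status}{"\n"}{end}' \
#     | unprogrammed_gateways
```

Gateways with no Programmed condition at all are listed too, since their status field is empty.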

The upgrade process varies by component:

Dashboard: Upgraded first or last, depending on Deployment ordering. It is stateless and does not handle gateway traffic, so its single replica is replaced quickly with no impact on routing.

Control plane: Rolling update with maxSurge: 1 and maxUnavailable: 0. The new pod starts while the old pod is still running. Once the new pod passes its readiness probe, the old pod is terminated. If the old pod was the leader, the new pod (or another standby) takes over via leader election. Data planes may briefly disconnect and reconnect to the new leader.

Data plane: Rolling update with maxSurge: 1 and maxUnavailable: 0. One new pod starts before one old pod is removed. The new pod connects to the control plane, receives the current configuration snapshot, and becomes ready. Traffic is routed to the new pod, and the old pod is terminated. This process repeats until all replicas are updated.

Data plane upgrades take the longest because each new pod must establish a gRPC/xDS connection and receive a full configuration snapshot before it can accept traffic. With the default startup probe (failureThreshold: 45, periodSeconds: 2), each new pod has up to 90 seconds to become ready.
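
The 90-second figure is failureThreshold multiplied by periodSeconds. Because pods are replaced one at a time, a rough upper bound for the whole data plane rollout is that budget multiplied by the replica count. The sketch below works through the arithmetic; the replica count of 3 in the usage comment is a hypothetical example, and the bound ignores scheduling, image pulls, and pod termination time.

```shell
# Seconds a pod may take to pass its startup probe.
# Arguments: <failureThreshold> <periodSeconds>
startup_budget() {
  echo $(( $1 * $2 ))
}

# Rough upper bound, in seconds, for a rolling update that replaces
# one pod at a time.
# Arguments: <replicas> <failureThreshold> <periodSeconds>
worst_case_rollout() {
  echo $(( $1 * $(startup_budget "$2" "$3") ))
}

# Usage:
#   startup_budget 45 2          # per-pod budget with the default probe
#   worst_case_rollout 3 45 2    # three replicas (hypothetical count)
```

The result is a reasonable starting point for the --timeout value passed to kubectl rollout status.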

If the upgrade causes problems, roll back to the previous release:

Terminal window
helm rollback nantian-gw -n nantian-gw

With no revision argument, this rolls back to the immediately preceding revision. Helm keeps previous revisions up to the history limit (--history-max, 10 by default). To see available rollback targets:

Terminal window
helm history nantian-gw -n nantian-gw

Roll back to a specific revision:

Terminal window
helm rollback nantian-gw 3 -n nantian-gw
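
When several upgrade attempts have failed in a row, the revision just before the current one may itself be a failed release. As a sketch, the helper below picks the most recent revision marked superseded in the helm history table, which is a revision that was deployed and later replaced. The function name is illustrative, and the parsing assumes the default table output.

```shell
# Print the highest revision number whose STATUS is "superseded".
# Reads `helm history` table output on stdin.
last_superseded_revision() {
  awk '/[ \t]superseded[ \t]/ { rev = $1 } END { if (rev != "") print rev }'
}

# Usage (requires cluster access):
#   rev=$(helm history nantian-gw -n nantian-gw | last_superseded_revision)
#   helm rollback nantian-gw "$rev" -n nantian-gw
```

Check the chosen revision against the full history before rolling back, since a superseded revision is not guaranteed to be one you want to return to.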

Gateway API CRDs are versioned separately from Nantian Gateway. Before upgrading to a version of Nantian Gateway that targets a newer Gateway API version, install the corresponding CRDs:

Terminal window
# Install Gateway API v1.2 CRDs (example)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml

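Check the release notes for the required Gateway API version. Installing newer CRDs before upgrading the gateway is generally safe: new CRDs mostly add fields and resources that the old controller simply ignores. Read the Gateway API release notes as well, because a release can also remove older API versions.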
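
Before installing new CRDs, it is worth confirming which Gateway API bundle is already in the cluster. The upstream CRDs record this in the gateway.networking.k8s.io/bundle-version annotation. The sketch below reads that annotation and compares it with a required version; the function names are illustrative, and the exact-match comparison is an assumption based on the "must match" rule in the compatibility table.

```shell
# Print the Gateway API bundle version recorded on the installed
# Gateway CRD (empty if the annotation is missing).
installed_gateway_api_version() {
  kubectl get crd gateways.gateway.networking.k8s.io \
    -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/bundle-version}'
}

# Succeed only if the installed bundle version equals the required one.
# Argument: <required version>, e.g. v1.2.0
gateway_api_matches() {
  [ "$(installed_gateway_api_version)" = "$1" ]
}

# Usage (requires cluster access):
#   gateway_api_matches v1.2.0 || echo "install the v1.2.0 CRDs first"
```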

Skipping versions is generally supported, but test in a staging environment first. The safest path is sequential upgrades:

  1. Upgrade to the next minor version (e.g., v0.1.0 to v0.2.0)
  2. Verify the upgrade is stable
  3. Repeat for the next minor version

If you must skip versions, pay extra attention to the release notes for any breaking changes in the versions you’re skipping.
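
The sequential path above can be scripted so that each step must succeed before the next begins. This is a sketch under several assumptions: the chart versions passed in are hypothetical, my-values.yaml is valid for every intermediate version, and a 10-minute timeout is long enough for your replica count. It stops at the first failed upgrade and leaves any rollback to you.

```shell
# Upgrade through a list of chart versions, one at a time.
# Stops at the first version whose upgrade fails.
# Arguments: <chart version>...
upgrade_in_steps() {
  for chart_version in "$@"; do
    echo "upgrading to chart version $chart_version"
    helm upgrade nantian-gw nantian-gw/nantian-gw \
      --version "$chart_version" \
      -f my-values.yaml \
      -n nantian-gw \
      --wait --timeout 10m || return 1
  done
}

# Usage (requires cluster access; versions are examples only):
#   upgrade_in_steps 0.2.0 0.3.0
```

The --wait flag makes Helm block until the upgraded resources report ready, so a step only counts as done once its rollout has finished.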

If new pods can’t schedule, check node resources and anti-affinity rules:

Terminal window
kubectl describe pod -n nantian-gw -l app.kubernetes.io/component=dataplane | grep -A5 Events

Common causes: insufficient CPU/memory, anti-affinity rules that can’t be satisfied, or PDBs blocking evictions.
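
The scheduler records why a pod could not be placed as FailedScheduling events, which are often quicker to read than full pod descriptions. A minimal sketch, with an illustrative function name:

```shell
# List scheduler failures in a namespace, oldest first.
# Argument: <namespace>
scheduling_failures() {
  kubectl get events -n "$1" \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp
}

# Usage (requires cluster access):
#   scheduling_failures nantian-gw
```

The event message usually names the unsatisfied constraint, such as insufficient CPU or an anti-affinity rule.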

If new data plane pods cycle through CrashLoopBackOff or never become ready:

Terminal window
kubectl logs -n nantian-gw deployment/nantian-gw-dataplane --tail=50

Common causes: can’t connect to the control plane (wrong controlPlaneAddr), TLS handshake failure (certificate mismatch), or the control plane isn’t sending a snapshot (check control plane logs).
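
For a pod in CrashLoopBackOff, the useful log lines are usually in the container that just exited, not the one that is restarting. The sketch below finds pods with a non-zero restart count so you can fetch their previous logs with kubectl logs --previous. The helper name restarting_pods is illustrative, and it assumes the default kubectl get pods column layout.

```shell
# Print the names of pods that have restarted at least once.
# Reads `kubectl get pods --no-headers` output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
restarting_pods() {
  awk '$4 + 0 > 0 { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get pods -n nantian-gw -l app.kubernetes.io/component=dataplane --no-headers \
#     | restarting_pods \
#     | while read -r pod; do
#         kubectl logs -n nantian-gw "$pod" --previous --tail=50
#       done
```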

If the control plane leader flips repeatedly after upgrade:

Terminal window
kubectl logs -n nantian-gw deployment/nantian-gw-controlplane | grep "leader"

Common causes: API server latency higher than renewDeadline, network issues between pods and the API server, or a misconfigured leader election ID.
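
If the control plane uses Lease-based leader election (an assumption here; it is the common mechanism for Kubernetes controllers, but this guide does not confirm it), the Lease object shows the current holder and how many times leadership has changed. A minimal sketch:

```shell
# Show holder, transition count, and last renewal for each Lease
# in a namespace.
# Argument: <namespace>
show_leases() {
  kubectl get leases -n "$1" \
    -o custom-columns='NAME:.metadata.name,HOLDER:.spec.holderIdentity,TRANSITIONS:.spec.leaseTransitions,RENEWED:.spec.renewTime'
}

# Usage (requires cluster access):
#   show_leases nantian-gw
```

Running it twice a minute or so apart and comparing the TRANSITIONS column gives a quick read on whether leadership is still flipping.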

If helm upgrade rejects your values file:

Terminal window
# See what the new chart expects
helm show values nantian-gw/nantian-gw
# Compare with your current computed values
diff <(helm get values nantian-gw -n nantian-gw --all -o yaml) <(helm show values nantian-gw/nantian-gw)

The chart’s values schema may have changed between versions. Update your values file to match the new schema, then retry the upgrade.
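
You can test an edited values file without touching the cluster by rendering the chart locally, since helm template applies the same schema validation as helm upgrade. A minimal sketch, with an illustrative function name:

```shell
# Render the chart with a values file and discard the output.
# Fails if the values do not pass the chart's schema or the templates
# do not render.
# Argument: <values file>
validate_values() {
  helm template nantian-gw nantian-gw/nantian-gw -f "$1" -n nantian-gw > /dev/null
}

# Usage:
#   validate_values my-values.yaml && echo "values accepted"
```

Any validation errors still reach the terminal, because only standard output is discarded.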