Upgrade Guide

Upgrading Nantian Gateway follows a standard Helm workflow. The control plane and data plane both use rolling updates, so traffic keeps flowing during the upgrade, and the two planes tolerate brief version skew while the rollout is in progress.

This guide covers Helm-based upgrades, version compatibility rules, and what to do if something goes wrong.

Before you upgrade

Run through these checks before starting any upgrade:

# Confirm the current version
helm list -n nantian-gw

# Check that all pods are healthy
kubectl get pods -n nantian-gw

# Verify the GatewayClass is present
kubectl get gatewayclass nantian-gw

# Check for any stuck or failing Gateways
kubectl get gateways -A

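# Save your current values, including chart defaults
helm get values nantian-gw -n nantian-gw --all -o yaml > pre-upgrade-values.yaml

The last step is important. With --all, helm get values captures the computed values from the current release, including defaults you didn’t explicitly set; without it, Helm returns only the values you supplied. The -o yaml flag keeps the saved file free of Helm’s header line. If the upgrade changes any defaults, you’ll want to reference this file.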
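The pod health check above can be turned into a pass/fail gate for scripted upgrades. The following is a minimal sketch, not part of Nantian Gateway: the function name preflight_pods is hypothetical, and it assumes the default column layout of kubectl get pods.

```shell
# Hypothetical helper (not part of Nantian Gateway): fail if any pod in the
# namespace is not Running and fully ready.
# Assumes the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
preflight_pods() {
  ns="${1:-nantian-gw}"
  # Fail closed if the API call itself fails.
  pods=$(kubectl get pods -n "$ns" --no-headers) || return 1
  bad=$(printf '%s\n' "$pods" | awk 'NF {
    split($2, ready, "/")                       # READY looks like "2/2"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }')
  if [ -n "$bad" ]; then
    printf 'pods not ready in %s:\n%s\n' "$ns" "$bad" >&2
    return 1
  fi
}
```

Calling preflight_pods nantian-gw before helm upgrade stops the script early when something is already unhealthy. Note that pods in the Completed state (for example, from Jobs) would also be reported by this sketch.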

Version compatibility

Nantian Gateway follows these compatibility rules:

Component	Compatibility
Control plane and data plane	Minor version skew tolerated (e.g., CP v0.1.0 with DP v0.1.1)
Gateway API version	Must match the Gateway API CRD version installed in the cluster
Kubernetes version	1.24+ required

The control plane and data plane communicate over gRPC/xDS. The protocol is designed to handle minor version differences. The control plane can push configuration to a data plane running a slightly newer or older version, as long as the xDS protocol version is compatible.
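To see how much skew actually exists at a given moment, you can compare the image tags of the two Deployments. This sketch assumes that the first container in each pod template is the gateway container and that images are referenced by tag rather than by digest; both function names are hypothetical.

```shell
# Print the tag portion of an image reference such as "repo/name:v0.1.1".
# Simplification: digest-pinned references (name@sha256:...) and untagged
# images on a registry with a port (localhost:5000/name) are not handled.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Print "<deployment> <tag>" for the control plane and the data plane.
running_versions() {
  for d in nantian-gw-controlplane nantian-gw-dataplane; do
    img=$(kubectl get deployment "$d" -n nantian-gw \
      -o jsonpath='{.spec.template.spec.containers[0].image}') || return 1
    echo "$d $(image_tag "$img")"
  done
}
```

Running running_versions before and after the upgrade shows whether both planes ended up on the intended version.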

Standard upgrade procedure

Step 1: Update the Helm repository (if using a repo)

helm repo update

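If you’re installing from a local chart directory, pull the latest version instead, and pass the chart directory’s path in place of nantian-gw/nantian-gw in the commands that follow: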

git pull origin main

Step 2: Diff the changes

Before applying, see what will change:

helm diff upgrade nantian-gw nantian-gw/nantian-gw \
  -f my-values.yaml \
  -n nantian-gw

This requires the helm-diff plugin. If you don’t have it, install it:

helm plugin install https://github.com/databus23/helm-diff
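In automation, it can be convenient to install the plugin only when it is missing. A small sketch, assuming the plugin registers itself under the name diff in the output of helm plugin list; the function name ensure_helm_diff is hypothetical.

```shell
# Install helm-diff only if `helm plugin list` does not already show it.
# Assumes the plugin appears with the name "diff" in the first column.
ensure_helm_diff() {
  if helm plugin list | awk 'NR > 1 && $1 == "diff" { found = 1 } END { exit !found }'; then
    return 0
  fi
  helm plugin install https://github.com/databus23/helm-diff
}
```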

Step 3: Apply the upgrade

helm upgrade nantian-gw nantian-gw/nantian-gw \
  -f my-values.yaml \
  -n nantian-gw

Helm computes the difference between the current release and the new chart, then applies only the changes. The Deployment controller handles the rolling update.
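If you would rather have Helm undo a failed upgrade on its own, the same command can be run with --atomic, which waits for the release to become ready and rolls back when it does not. This is an optional variation rather than something this guide requires. It assumes Helm 3, and the 10-minute default timeout and the function name upgrade_gateway are illustrative choices.

```shell
# Optional variation on Step 3: wait for readiness and roll back on failure.
# Assumes Helm 3; the 10m default timeout is an illustrative value.
upgrade_gateway() {
  helm upgrade nantian-gw nantian-gw/nantian-gw \
    -f my-values.yaml \
    -n nantian-gw \
    --atomic \
    --timeout "${1:-10m}"
}
```

Pick a timeout that comfortably covers the data plane rollout, since Helm treats an expired timeout as a failure and triggers the rollback.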

Step 4: Watch the rollout

kubectl rollout status deployment/nantian-gw-controlplane -n nantian-gw
kubectl rollout status deployment/nantian-gw-dataplane -n nantian-gw
kubectl rollout status deployment/nantian-gw-dashboard -n nantian-gw
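Without a timeout, kubectl rollout status can wait for a long time on a stuck Deployment. The sketch below wraps the three commands in a loop that stops at the first Deployment that fails to finish; the 5-minute default and the function name wait_for_rollouts are assumptions.

```shell
# Wait for each Deployment in turn; stop at the first one that does not
# finish within the timeout (default 5m, an illustrative value).
wait_for_rollouts() {
  timeout="${1:-5m}"
  for d in nantian-gw-controlplane nantian-gw-dataplane nantian-gw-dashboard; do
    if ! kubectl rollout status "deployment/$d" -n nantian-gw --timeout="$timeout"; then
      echo "rollout of $d did not finish within $timeout" >&2
      return 1
    fi
  done
}
```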

Step 5: Verify

After all rollouts complete:

# All pods should be Running and Ready
kubectl get pods -n nantian-gw

# Check that Gateways are still programmed
kubectl get gateways -A

# Verify data planes are connected to the control plane
kubectl logs -n nantian-gw deployment/nantian-gw-controlplane | grep "data plane connected"
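Scanning the Gateway list by eye gets tedious in larger clusters. As a sketch, the following prints only the Gateways whose Programmed condition is not True, assuming the installed Gateway API version reports that condition; the function name unprogrammed_gateways is hypothetical.

```shell
# List Gateways (namespace/name) whose Programmed condition is not "True".
# A Gateway with no Programmed condition at all is listed as well.
unprogrammed_gateways() {
  kubectl get gateways -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Programmed")].status}{"\n"}{end}' |
    awk -F'\t' 'NF && $2 != "True" { print $1 }'
}
```

Empty output means every Gateway in the cluster reports Programmed as True.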

What happens during the upgrade

The upgrade process varies by component:

Dashboard: Rolls out alongside the other Deployments; the order in which it finishes isn’t significant. It’s stateless and doesn’t handle gateway traffic, so its single replica is replaced quickly with no impact on the data path.

Control plane: Rolling update with maxSurge: 1 and maxUnavailable: 0. The new pod starts while the old pod is still running. Once the new pod passes its readiness probe, the old pod is terminated. If the old pod was the leader, the new pod (or another standby) takes over via leader election. Data planes may briefly disconnect and reconnect to the new leader.

Data plane: Rolling update with maxSurge: 1 and maxUnavailable: 0. One new pod starts before one old pod is removed. The new pod connects to the control plane, receives the current configuration snapshot, and becomes ready. Traffic is routed to the new pod, and the old pod is terminated. This process repeats until all replicas are updated.

Data plane upgrades take the longest because each new pod must establish a gRPC/xDS connection and receive a full configuration snapshot before it can accept traffic. With the default startup probe (failureThreshold: 45, periodSeconds: 2), each new pod has up to 90 seconds (45 checks, 2 seconds apart) to pass the probe before Kubernetes restarts its container.
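The probe arithmetic also gives a rough upper bound for the whole data plane rollout, which is useful when choosing a timeout. The sketch below multiplies the per-pod budget by the replica count, on the assumption that pods are replaced one at a time as described above. The replica count of 3 in the example is hypothetical, and the estimate ignores image pulls, scheduling delays, and termination grace periods.

```shell
# Startup budget in seconds: failureThreshold * periodSeconds.
startup_budget() {
  echo $(( $1 * $2 ))
}

# Rough worst-case rollout time when pods are replaced one at a time:
# replicas * per-pod startup budget.
worst_case_rollout() {
  echo $(( $1 * $(startup_budget "$2" "$3") ))
}

startup_budget 45 2          # prints 90
worst_case_rollout 3 45 2    # prints 270 (3 replicas is a hypothetical count)
```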

Rolling back

If the upgrade causes problems, roll back to the previous release:

helm rollback nantian-gw -n nantian-gw

With no revision number, this reverts to the previous revision, whether or not that revision was healthy. Helm keeps up to 10 revisions per release by default (the --history-max setting). To see available rollback targets:

helm history nantian-gw -n nantian-gw

To roll back to a specific revision:

helm rollback nantian-gw 3 -n nantian-gw
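When several upgrade attempts have failed in a row, the previous revision may itself be a failed one. The sketch below picks the newest earlier revision whose status is deployed or superseded by scanning the table printed by helm history. It is a heuristic with a hypothetical function name (rollback_target), and it assumes that those status words do not appear in the free-text description column.

```shell
# Print the newest revision, other than the latest, whose status is
# "deployed" or "superseded". Returns non-zero if none exists.
rollback_target() {
  helm history nantian-gw -n nantian-gw |
    awk 'NR > 1 {
      if (candidate != "") target = candidate   # a newer row exists, so commit
      candidate = ""
      for (i = 2; i <= NF; i++)
        if ($i == "superseded" || $i == "deployed") candidate = $1
    }
    END { if (target == "") exit 1; print target }'
}
```

A possible use is rev=$(rollback_target) && helm rollback nantian-gw "$rev" -n nantian-gw. Check the chosen revision against helm history before relying on it.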

Upgrading Gateway API CRDs

Gateway API CRDs are versioned separately from Nantian Gateway. Before upgrading to a version of Nantian Gateway that targets a newer Gateway API version, install the corresponding CRDs:

# Install Gateway API v1.2 CRDs (example)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml

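Check the Nantian Gateway release notes for the required Gateway API version. Installing newer CRDs before upgrading the gateway is generally safe, because the old controller ignores fields and resources it doesn’t know about. Also check the Gateway API release notes, since a new release can remove older API versions that existing resources still use.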
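Before applying new CRDs, it helps to know which bundle is currently installed. The upstream Gateway API CRDs carry a gateway.networking.k8s.io/bundle-version annotation; the sketch below reads it from the Gateway CRD and compares it with an expected value. The function names are hypothetical, and CRDs installed from a modified bundle may lack the annotation.

```shell
# Print the Gateway API bundle version recorded on the installed Gateway CRD.
gateway_api_crd_version() {
  kubectl get crd gateways.gateway.networking.k8s.io \
    -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/bundle-version}'
}

# Succeed only if the installed bundle version equals the expected one.
crds_at() {
  have=$(gateway_api_crd_version) || return 1
  if [ "$have" != "$1" ]; then
    echo "Gateway API CRDs are at ${have:-unknown}, expected $1" >&2
    return 1
  fi
}
```

For example, crds_at v1.2.0 can gate the helm upgrade step in a script.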

Upgrading across multiple minor versions

Skipping versions is generally supported, but test in a staging environment first. The safest path is sequential upgrades:

1. Upgrade to the next minor version (e.g., v0.1.0 to v0.2.0).
2. Verify the upgrade is stable.
3. Repeat for the next minor version.

If you must skip versions, pay extra attention to the release notes for any breaking changes in the versions you’re skipping.
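The sequential path can be scripted as a loop that upgrades one chart version at a time and stops at the first failure. This is a sketch under assumptions: the function name upgrade_through is hypothetical, the chart versions you pass in must come from your own repository listing (for example, helm search repo nantian-gw/nantian-gw --versions), and waiting on the data plane rollout is only a stand-in for a fuller stability check.

```shell
# Upgrade through the given chart versions in order, stopping at the first
# failure. Versions are supplied by the caller; none are assumed here.
upgrade_through() {
  for v in "$@"; do
    echo "upgrading to chart version $v"
    helm upgrade nantian-gw nantian-gw/nantian-gw \
      -f my-values.yaml -n nantian-gw \
      --version "$v" --wait || return 1
    # Minimal stability check: the data plane rollout must complete.
    kubectl rollout status deployment/nantian-gw-dataplane \
      -n nantian-gw --timeout=5m || return 1
  done
}
```

Keep in mind that a values file written for one version may need changes before it is accepted by the next.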

Troubleshooting failed upgrades

Pods stuck in Pending

If new pods can’t schedule, check node resources and anti-affinity rules:

kubectl describe pod -n nantian-gw -l app.kubernetes.io/component=dataplane | grep -A5 Events

Common causes: insufficient CPU/memory, anti-affinity rules that can’t be satisfied, or PDBs blocking evictions.

Data plane pods failing readiness

If new data plane pods cycle through CrashLoopBackOff or never become ready:

kubectl logs -n nantian-gw deployment/nantian-gw-dataplane --tail=50

Common causes: can’t connect to the control plane (wrong controlPlaneAddr), TLS handshake failure (certificate mismatch), or the control plane isn’t sending a snapshot (check control plane logs).
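Addressing the Deployment with kubectl logs selects a single pod, which may be a healthy old one, and a restarted container’s current log often lacks the original error. The sketch below fetches the previous container’s log from every data plane pod in CrashLoopBackOff instead. It reuses the app.kubernetes.io/component=dataplane label from the Pending check above; the function name is hypothetical.

```shell
# Print the last 50 log lines of the previous (crashed) container for each
# data plane pod that is currently in CrashLoopBackOff.
crashing_dataplane_logs() {
  kubectl get pods -n nantian-gw -l app.kubernetes.io/component=dataplane --no-headers |
    awk '$3 == "CrashLoopBackOff" { print $1 }' |
    while read -r pod; do
      echo "== $pod =="
      kubectl logs -n nantian-gw "$pod" --previous --tail=50
    done
}
```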

Control plane leader flapping

If the control plane leader flips repeatedly after upgrade:

kubectl logs -n nantian-gw deployment/nantian-gw-controlplane | grep "leader"

Common causes: API server latency higher than renewDeadline, network issues between pods and the API server, or a misconfigured leader election ID.
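If the control plane uses a Kubernetes Lease object for leader election (an assumption; this guide does not say which lock type is used), the Lease’s leaseTransitions counter gives a more direct signal than grepping logs. The sketch below samples the counter twice and reports flapping when leadership changed more than once in between. The Lease name is whatever your installation uses, and both function names are hypothetical.

```shell
# Print the leaseTransitions counter of the named Lease.
lease_transitions() {
  kubectl get lease "$1" -n nantian-gw -o jsonpath='{.spec.leaseTransitions}'
}

# Return 0 if leadership changed more than once during the sampling window
# (default 60 seconds), 1 if it did not, 2 if the Lease could not be read.
leader_flapping() {
  before=$(lease_transitions "$1") || return 2
  sleep "${2:-60}"
  after=$(lease_transitions "$1") || return 2
  [ $(( after - before )) -gt 1 ]
}
```

Running kubectl get lease -n nantian-gw first shows which Lease names exist in the namespace.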

Helm upgrade fails with schema validation

If helm upgrade rejects your values file:

# See what the new chart expects
helm show values nantian-gw/nantian-gw

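# Compare the current release's computed values with the new chart's defaults
diff <(helm get values nantian-gw -n nantian-gw --all -o yaml) <(helm show values nantian-gw/nantian-gw)

The chart’s values schema may have changed between versions. Expect a noisy diff, since key order and comments differ between the two outputs; look for keys that were renamed, removed, or changed type. Update your values file to match the new schema, then retry the upgrade.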