Use Cases

Nantian Gateway’s architecture makes it suitable for several distinct deployment patterns. This page walks through real-world scenarios where the gateway provides particular value.

AI Gateway: Centralized AI Traffic Management

Organizations deploying large language models and AI APIs face a set of problems that traditional API gateways don’t address: multiple provider formats, token-based billing, prompt-level security, and model A/B testing.

Nantian Gateway’s built-in AI gateway module handles these concerns at the proxy layer.

AI providers use different API formats. OpenAI, Anthropic, and Ollama each have their own request and response schemas. Nantian Gateway normalizes these differences, letting your applications speak one format while the gateway translates to whatever each backend expects.

apiVersion: gateway.nantian.io/v1alpha1
kind: AIService
metadata:
  name: llm-routing
spec:
  backends:
    - name: openai-gpt4
      provider: openai
      endpoint: https://api.openai.com/v1
      models:
        - gpt-4
        - gpt-4-turbo
    - name: anthropic-claude
      provider: anthropic
      endpoint: https://api.anthropic.com
      models:
        - claude-3-opus
        - claude-3-sonnet
    - name: ollama-local
      provider: ollama
      endpoint: http://ollama.internal:11434
      models:
        - llama3
        - mistral

Applications send requests to a single gateway endpoint. The gateway routes to the correct provider based on the requested model, handling protocol conversion transparently.
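The model-based routing described above can be pictured as a lookup from model name to backend. The following is a minimal sketch, not the gateway's actual implementation: the `BACKENDS` table mirrors the manifest, while `build_model_index` and `route` are hypothetical helper names introduced only for illustration.

```python
# Hypothetical in-memory copy of the backends declared in the AIService manifest.
BACKENDS = {
    "openai-gpt4": {
        "provider": "openai",
        "endpoint": "https://api.openai.com/v1",
        "models": ["gpt-4", "gpt-4-turbo"],
    },
    "anthropic-claude": {
        "provider": "anthropic",
        "endpoint": "https://api.anthropic.com",
        "models": ["claude-3-opus", "claude-3-sonnet"],
    },
    "ollama-local": {
        "provider": "ollama",
        "endpoint": "http://ollama.internal:11434",
        "models": ["llama3", "mistral"],
    },
}


def build_model_index(backends):
    """Map each model name to the single backend that serves it."""
    index = {}
    for backend_name, backend in backends.items():
        for model in backend["models"]:
            if model in index:
                raise ValueError(f"model {model!r} is declared by two backends")
            index[model] = backend_name
    return index


def route(model, index):
    """Return the backend name for a requested model, or raise LookupError."""
    if model not in index:
        raise LookupError(f"no backend serves model {model!r}")
    return index[model]


MODEL_INDEX = build_model_index(BACKENDS)
```

A request for `claude-3-opus` would resolve to `anthropic-claude` under this sketch; protocol conversion would then happen after the backend is chosen.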

AI APIs charge by token count, not by request count. A single request with a 10,000-token prompt consumes as many input tokens as a hundred requests with 100-token prompts, so traditional rate limiting that counts requests cannot bound cost.

Nantian Gateway’s TokenPolicy CRD enforces limits based on actual token usage:

apiVersion: gateway.nantian.io/v1alpha1
kind: TokenPolicy
metadata:
  name: team-quotas
spec:
  rules:
    - selector:
        team: engineering
      limit:
        tokensPerMinute: 100000
        tokensPerDay: 5000000
    - selector:
        team: marketing
      limit:
        tokensPerMinute: 50000
        tokensPerDay: 2000000

The gateway counts tokens on every request and response, rejecting traffic that exceeds quotas before it reaches the provider. This prevents surprise bills and gives teams predictable cost boundaries.
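To make the difference from request counting concrete, here is a rough sketch of a token quota with per-minute and per-day limits. It assumes fixed time windows, which is a simplification chosen for this example; the source does not say which windowing algorithm the gateway uses, and `TokenQuota` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class TokenQuota:
    """Fixed-window token budget (an assumed algorithm, for illustration only)."""

    tokens_per_minute: int
    tokens_per_day: int
    _minute_window: int = field(default=-1, init=False)
    _day_window: int = field(default=-1, init=False)
    _minute_used: int = field(default=0, init=False)
    _day_used: int = field(default=0, init=False)

    def allow(self, tokens: int, now: float) -> bool:
        """Charge `tokens` against both windows, or reject without charging."""
        minute, day = int(now // 60), int(now // 86400)
        if minute != self._minute_window:
            # A new minute started: reset the per-minute counter.
            self._minute_window, self._minute_used = minute, 0
        if day != self._day_window:
            # A new day started: reset the per-day counter.
            self._day_window, self._day_used = day, 0
        over_minute = self._minute_used + tokens > self.tokens_per_minute
        over_day = self._day_used + tokens > self.tokens_per_day
        if over_minute or over_day:
            return False
        self._minute_used += tokens
        self._day_used += tokens
        return True
```

With the engineering limits from the manifest, one 10,000-token request and a hundred 100-token requests each consume the same 10,000 tokens of the per-minute budget, which is the behavior a request counter cannot express.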

Sending sensitive data to external AI providers creates compliance risk. Nantian Gateway can mask personally identifiable information in prompts before they leave your network. The masking happens at the proxy level, so no separate service is needed.
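As a rough illustration of prompt masking, the sketch below replaces two kinds of identifiers with placeholders before a prompt is forwarded. The patterns, the placeholder strings, and the `mask_pii` name are assumptions made for this example; the source does not describe which detectors the gateway applies, and real PII detection needs far broader coverage.

```python
import re

# Illustrative patterns only: email addresses and US-style SSNs.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_pii(prompt: str) -> str:
    """Replace matched identifiers with fixed placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    return SSN.sub("[SSN]", prompt)
```

Text without matches passes through unchanged, so the function can be applied to every outbound prompt.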

When rolling out a new model version or comparing providers, you need to split traffic and compare results. Nantian Gateway supports percentage-based traffic splitting between AI backends, with the same routing primitives used for regular HTTP traffic:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: model-ab-test
spec:
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat
      backendRefs:
        - name: model-v1
          port: 8080
          weight: 90
        - name: model-v2
          port: 8080
          weight: 10
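In Gateway API, `weight` values are relative: each backend receives its weight divided by the sum of all weights. The sketch below shows one way such a split could be computed; `pick_backend` is a hypothetical helper and not necessarily how the gateway's data plane selects a backend.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_backend(backends, point):
    """Choose a backend for a value `point` in [0, 1).

    `backends` is a list of (name, weight) pairs. Each backend owns a slice
    of [0, 1) proportional to its weight; zero-weight backends get no traffic.
    """
    if not 0.0 <= point < 1.0:
        raise ValueError("point must be in [0, 1)")
    active = [(name, weight) for name, weight in backends if weight > 0]
    if not active:
        raise ValueError("no backend with a positive weight")
    cumulative = list(accumulate(weight for _, weight in active))
    index = bisect_right(cumulative, point * cumulative[-1])
    return active[index][0]
```

Passing `random.random()` as `point` for each request would send roughly 90% of traffic to `model-v1` and 10% to `model-v2` with the weights shown above.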

The most common deployment pattern is Kubernetes ingress: Nantian Gateway sits at the edge of your cluster, routing external traffic to internal services.

Since Nantian Gateway implements the Gateway API spec, your routing configuration looks the same as it would with any compliant implementation:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
spec:
  gatewayClassName: nantian
  listeners:
    - name: http
      port: 80
      protocol: HTTP
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-routing
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: frontend-service
          port: 3000

This pattern works for any standard HTTP workload: REST APIs, gRPC services, WebSocket connections, and static file serving.
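Both rules in `app-routing` match a request such as `/api/users`, so the order of precedence matters. The Gateway API specification matches `PathPrefix` one path element at a time and prefers the longest matching prefix. A minimal sketch of that selection follows; the function names are hypothetical and the path is assumed to have its query string already removed.

```python
def prefix_matches(prefix, path):
    """PathPrefix semantics: match whole path elements, not raw characters."""
    if prefix == "/":
        return path.startswith("/")
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def select_backend(rules, path):
    """Return the backend of the longest matching prefix, or None."""
    candidates = [(prefix, backend) for prefix, backend in rules
                  if prefix_matches(prefix, path)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: len(rule[0]))[1]


# The two rules from the app-routing HTTPRoute.
RULES = [("/api", "api-service"), ("/", "frontend-service")]
```

Note that `/apix` does not match the `/api` prefix, because `apix` is a different path element; it falls through to the `/` rule.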

Nantian Gateway supports TLS termination, passthrough, and mixed modes on the same listener. You can terminate TLS for most routes while passing through encrypted connections to backends that handle their own certificate validation:

listeners:
  - name: mixed-tls
    port: 443
    protocol: TLS
    tls:
      mode: Passthrough
    allowedRoutes:
      namespaces:
        from: All

Combined with BackendTLSPolicy for upstream TLS and SAN validation, you get end-to-end encryption without gaps.

Not all workloads speak HTTP. Databases, message queues, game servers, and legacy protocols need TCP or UDP proxying. Nantian Gateway handles these alongside HTTP traffic on the same gateway.

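# TCPRoute and UDPRoute are served as v1alpha2 in the Gateway API experimental channel.
# The parent Gateway needs a TCP listener for this route to attach.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: postgres-route
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - backendRefs:
        - name: postgres-primary
          port: 5432
---
# The parent Gateway needs a UDP listener for this route to attach.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: dns-route
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - backendRefs:
        - name: coredns
          port: 53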

gRPC services benefit from method-level routing, which Nantian Gateway supports through GRPCRoute with named route rules:

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: grpc-service-routing
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - matches:
        - method:
            service: users.v1.UserService
            method: GetUser
      backendRefs:
        - name: user-service-read
          port: 9090
    - matches:
        - method:
            service: users.v1.UserService
            method: UpdateUser
      backendRefs:
        - name: user-service-write
          port: 9090

This lets you route read and write operations to different backends, apply different rate limits per method, or direct specific RPCs to canary deployments.
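Method-level routing is possible because gRPC encodes the target in the HTTP/2 request path as `/{service}/{method}`. The sketch below parses that path and looks up a backend using the two rules from the GRPCRoute; `parse_grpc_path` and `route_grpc` are hypothetical names used only for this illustration.

```python
def parse_grpc_path(path):
    """Split a gRPC request path of the form /{service}/{method}."""
    parts = path.split("/")
    if len(parts) != 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError(f"not a gRPC method path: {path!r}")
    return parts[1], parts[2]


# (service, method) -> backend, mirroring the GRPCRoute rules.
GRPC_ROUTES = {
    ("users.v1.UserService", "GetUser"): "user-service-read",
    ("users.v1.UserService", "UpdateUser"): "user-service-write",
}


def route_grpc(path):
    """Return the backend for a gRPC path, or None when no rule matches."""
    return GRPC_ROUTES.get(parse_grpc_path(path))
```

A call to a method without a matching rule, such as a hypothetical `DeleteUser`, returns `None` here, which corresponds to the route not matching the request.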

Nantian Gateway implements the Gateway API Mesh model, which provides mesh capabilities without injecting sidecars into every pod.

Instead of a sidecar per pod, the mesh model uses a shared gateway proxy that handles inter-service traffic. This reduces resource consumption (no per-pod proxy overhead), eliminates sidecar lifecycle ordering issues, and simplifies debugging.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: mesh-gateway
spec:
  gatewayClassName: nantian-mesh
  listeners:
    - name: mesh-http
      port: 8080
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: service-a-to-service-b
spec:
  parentRefs:
    - name: mesh-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: service-b
          port: 8080
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Routed-By
                value: nantian-mesh

The sidecar-free mesh model works well when:

  • You’re running in resource-constrained environments where per-pod proxies would consume too much CPU and memory
  • You have a large number of short-lived pods (batch jobs, serverless functions) where sidecar injection adds unacceptable startup latency
  • You want mesh features (header modification, traffic splitting, redirects) without the full complexity of a mesh control plane
  • You’re already using Nantian Gateway as an ingress gateway and want to extend it for east-west traffic

When built-in filters aren’t enough, Nantian Gateway’s Wasm plugin system lets you inject custom logic at the proxy level.

Custom authentication: Validate JWTs against an internal identity provider, check API keys against a custom database, or implement request signing that doesn’t match any standard.

Request transformation: Reshape JSON payloads, convert between API versions, or enrich requests with data from internal services before forwarding to backends.

Custom observability: Emit metrics or traces in formats specific to your observability stack, sample requests based on custom criteria, or log structured data that your existing tooling expects.

Compliance and governance: Enforce data residency rules by inspecting request content, block requests to disallowed domains, or inject compliance headers.

Wasm plugins are distributed as .wasm binaries and managed through the WasmPlugin CRD:

apiVersion: gateway.nantian.io/v1alpha1
kind: WasmPlugin
metadata:
  name: custom-auth
spec:
  pluginURL: oci://registry.example.com/plugins/auth:v1.2.0
  phase: OnRequestHeaders
  selector:
    matchLabels:
      app: payment-service

The plugin runs inside the data plane’s wasmtime runtime, with access to host functions for logging, metrics, and HTTP dispatch. Plugins are sandboxed and cannot access the host filesystem or network unless explicitly granted.

Nantian Gateway ships a Wasm SDK (Rust crate: ntgw-wasm-sdk) that provides typed bindings for plugin development. You write your plugin in Rust, compile it to the wasm32-wasi target (named wasm32-wasip1 in recent Rust toolchains), and deploy it through the WasmPlugin CRD.

For latency-sensitive workloads, Nantian Gateway’s Rust data plane provides predictable performance with low tail latency. Combined with the observability stack, you get full visibility into gateway behavior.

The gateway exposes Prometheus metrics covering:

  • Request rate, latency (p50/p90/p99), and error rate per route and backend
  • TLS handshake duration and certificate expiry
  • xDS configuration staleness and update latency
  • AI gateway token usage and rate limit status
  • Wasm plugin execution time and error counts

A typical production deployment runs:

  • Two or more control plane replicas with leader election for high availability
  • Multiple data plane replicas behind a Kubernetes Service of type LoadBalancer
  • Horizontal Pod Autoscaling on the data plane based on CPU and memory
  • Prometheus scraping both control plane and data plane metrics endpoints
  • Grafana dashboards for cluster-level visibility
  • Alertmanager rules for gateway health, TLS expiry, and error rate thresholds

See the Production Installation guide for a complete walkthrough.
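When sizing the data plane, it helps to know how Horizontal Pod Autoscaling arrives at a replica count. Kubernetes documents the calculation as the current replica count scaled by the ratio of observed to target metric value, rounded up, with a default tolerance of 10% and the largest result winning when several metrics are configured. The sketch below reproduces that formula; the utilization figures in the usage note are made-up examples, not measurements of Nantian Gateway.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Kubernetes HPA formula: ceil(currentReplicas * current / target)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: keep the current replica count.
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_for_metrics(current_replicas, metrics, tolerance=0.1):
    """With several metrics (e.g. CPU and memory), the largest proposal wins."""
    return max(
        desired_replicas(current_replicas, current, target, tolerance)
        for current, target in metrics
    )
```

For example, four replicas at 90% CPU against a 60% target would scale to six, even if memory utilization alone would allow scaling down.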

Nantian Gateway’s strength is that these use cases compose. A single deployment can:

  • Serve as your primary Kubernetes ingress gateway
  • Route AI traffic to multiple providers with token-based rate limiting
  • Proxy TCP connections to databases and message queues
  • Run Wasm plugins for custom authentication and request transformation
  • Expose Prometheus metrics to your existing observability stack

You don’t need separate gateways for HTTP, AI, and TCP traffic. One Nantian Gateway deployment handles all of them, configured through the same Gateway API resources and managed through the same dashboard.