Use Cases
Nantian Gateway’s architecture makes it suitable for several distinct deployment patterns. This page walks through real-world scenarios where the gateway provides particular value.
AI Gateway: Centralized AI Traffic Management
Organizations deploying large language models and AI APIs face a set of problems that traditional API gateways don’t address: multiple provider formats, token-based billing, prompt-level security, and model A/B testing.
Nantian Gateway’s built-in AI gateway module handles these concerns at the proxy layer.
Multi-Provider Protocol Adaptation
AI providers use different API formats. OpenAI, Anthropic, and Ollama each have their own request and response schemas. Nantian Gateway normalizes these differences, letting your applications speak one format while the gateway translates to whatever each backend expects.

    apiVersion: gateway.nantian.io/v1alpha1
    kind: AIService
    metadata:
      name: llm-routing
    spec:
      backends:
        - name: openai-gpt4
          provider: openai
          endpoint: https://api.openai.com/v1
          models:
            - gpt-4
            - gpt-4-turbo
        - name: anthropic-claude
          provider: anthropic
          endpoint: https://api.anthropic.com
          models:
            - claude-3-opus
            - claude-3-sonnet
        - name: ollama-local
          provider: ollama
          endpoint: http://ollama.internal:11434
          models:
            - llama3
            - mistral

Applications send requests to a single gateway endpoint. The gateway routes to the correct provider based on the requested model, handling protocol conversion transparently.
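To make the model-based routing concrete, here is a minimal Python sketch of how a proxy could pick a backend from the `model` field of an incoming request. It is an illustration only, not Nantian Gateway's implementation: the `BACKENDS` table mirrors the manifest above, and the `select_backend` function and the request shape are assumptions made for this example.

```python
# Hypothetical routing table mirroring the AIService manifest above.
BACKENDS = [
    {"name": "openai-gpt4", "provider": "openai",
     "models": ["gpt-4", "gpt-4-turbo"]},
    {"name": "anthropic-claude", "provider": "anthropic",
     "models": ["claude-3-opus", "claude-3-sonnet"]},
    {"name": "ollama-local", "provider": "ollama",
     "models": ["llama3", "mistral"]},
]


def select_backend(request_body: dict) -> dict:
    """Return the backend entry whose model list contains the requested model."""
    model = request_body.get("model")
    for backend in BACKENDS:
        if model in backend["models"]:
            return backend
    # No backend declares this model, so the request cannot be routed.
    raise LookupError(f"no backend serves model {model!r}")
```

A request body such as `{"model": "claude-3-opus"}` would resolve to the `anthropic-claude` entry; the protocol translation for that provider would happen after this lookup.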
Token-Based Rate Limiting
AI APIs charge by token count, not by request count. A single request with a 10,000-token prompt costs the same as a hundred requests with 100-token prompts. Traditional rate limiting that counts requests misses this entirely.
Nantian Gateway’s TokenPolicy CRD enforces limits based on actual token usage:

    apiVersion: gateway.nantian.io/v1alpha1
    kind: TokenPolicy
    metadata:
      name: team-quotas
    spec:
      rules:
        - selector:
            team: engineering
          limit:
            tokensPerMinute: 100000
            tokensPerDay: 5000000
        - selector:
            team: marketing
          limit:
            tokensPerMinute: 50000
            tokensPerDay: 2000000

The gateway counts tokens on every request and response, rejecting traffic that exceeds quotas before it reaches the provider. This prevents surprise bills and gives teams predictable cost boundaries.
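The difference between counting requests and counting tokens can be sketched in a few lines of Python. This is a simplified fixed-window counter, assuming one quota object per team; the `TokenQuota` class and its `allow` method are hypothetical names for this example, and the gateway's actual accounting algorithm is not described here.

```python
import time


class TokenQuota:
    """Fixed-window per-minute token counter for one team (illustrative only)."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock              # injectable so the window can be tested
        self.window_start = clock()
        self.used = 0

    def allow(self, tokens: int) -> bool:
        """Charge `tokens` against the current window; False means reject."""
        now = self.clock()
        if now - self.window_start >= 60:
            # A new one-minute window begins: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True
```

With a limit of 100,000 tokens per minute, two requests of 60,000 and 40,000 tokens exhaust the window, and even a one-token request is rejected until the next minute starts. A request counter would have seen only two requests.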
PII Masking and Security
Sending sensitive data to external AI providers creates compliance risk. Nantian Gateway can mask personally identifiable information in prompts before they leave your network. The masking happens at the proxy level, so no separate service is needed.
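As a rough picture of what prompt masking involves, the Python sketch below replaces email addresses and US Social Security numbers with placeholders before a prompt is forwarded. The patterns, the placeholder format, and the `mask_pii` function are assumptions for illustration; the detectors Nantian Gateway actually ships are not specified on this page, and production PII detection generally needs more than two regular expressions.

```python
import re

# Hypothetical detectors; a real deployment would cover many more PII types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(prompt: str) -> str:
    """Replace each detected PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


print(mask_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
# prints: Contact [EMAIL], SSN [SSN].
```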
A/B Testing Model Deployments
When rolling out a new model version or comparing providers, you need to split traffic and compare results. Nantian Gateway supports percentage-based traffic splitting between AI backends, with the same routing primitives used for regular HTTP traffic:

    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: model-ab-test
    spec:
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /v1/chat
          backendRefs:
            - name: model-v1
              port: 8080
              weight: 90
            - name: model-v2
              port: 8080
              weight: 10

Kubernetes Ingress and Edge Routing
The most common deployment pattern: Nantian Gateway sits at the edge of your Kubernetes cluster, routing external traffic to internal services.
Standard Gateway API Configuration
Since Nantian Gateway implements the Gateway API spec, your routing configuration looks the same as it would with any compliant implementation:

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: public-gateway
    spec:
      gatewayClassName: nantian
      listeners:
        - name: http
          port: 80
          protocol: HTTP
        - name: https
          port: 443
          protocol: HTTPS
          tls:
            mode: Terminate
            certificateRefs:
              - name: wildcard-tls
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: app-routing
    spec:
      parentRefs:
        - name: public-gateway
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /api
          backendRefs:
            - name: api-service
              port: 8080
        - matches:
            - path:
                type: PathPrefix
                value: /
          backendRefs:
            - name: frontend-service
              port: 3000

This pattern works for any standard HTTP workload: REST APIs, gRPC services, WebSocket connections, and static file serving.
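Both rules in the route above can match the same request, so it helps to see how the winner is chosen. Under the Gateway API rules for `PathPrefix`, matching is done on whole path elements and the longest matching prefix takes precedence. The Python sketch below models just that behavior for the two rules shown; the `RULES` table and function names are hypothetical, and header, method, and exact-path precedence are left out.

```python
# Hypothetical model of the two PathPrefix rules in the HTTPRoute above.
RULES = [
    ("/api", "api-service:8080"),
    ("/", "frontend-service:3000"),
]


def prefix_matches(prefix: str, path: str) -> bool:
    """PathPrefix matches on path-element boundaries, not raw string prefixes."""
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def pick_backend(path: str):
    """Return the backend of the longest matching prefix, or None."""
    candidates = [(p, b) for p, b in RULES if prefix_matches(p, path)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c[0]))[1]
```

A request for `/api/users` goes to `api-service`, while `/apiary` falls through to `frontend-service` because `/api` only matches complete path elements.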
TLS Everywhere
Nantian Gateway supports TLS termination, passthrough, and mixed modes on the same listener. You can terminate TLS for most routes while passing through encrypted connections to backends that handle their own certificate validation:

    listeners:
      - name: mixed-tls
        port: 443
        protocol: TLS
        tls:
          mode: Passthrough
        allowedRoutes:
          namespaces:
            from: All

Combined with BackendTLSPolicy for upstream TLS and SAN validation, you get end-to-end encryption without gaps.
Multi-Protocol Proxy
Not all workloads speak HTTP. Databases, message queues, game servers, and legacy protocols need TCP or UDP proxying. Nantian Gateway handles these alongside HTTP traffic on the same gateway.
TCP and UDP Routing

    apiVersion: gateway.networking.k8s.io/v1alpha2
    kind: TCPRoute
    metadata:
      name: postgres-route
    spec:
      parentRefs:
        - name: public-gateway
      rules:
        - backendRefs:
            - name: postgres-primary
              port: 5432
    ---
    apiVersion: gateway.networking.k8s.io/v1alpha2
    kind: UDPRoute
    metadata:
      name: dns-route
    spec:
      parentRefs:
        - name: public-gateway
      rules:
        - backendRefs:
            - name: coredns
              port: 53

gRPC with Method-Level Routing
gRPC services benefit from method-level routing, which Nantian Gateway supports through GRPCRoute with named route rules:

    apiVersion: gateway.networking.k8s.io/v1
    kind: GRPCRoute
    metadata:
      name: grpc-service-routing
    spec:
      parentRefs:
        - name: public-gateway
      rules:
        - matches:
            - method:
                service: users.v1.UserService
                method: GetUser
          backendRefs:
            - name: user-service-read
              port: 9090
        - matches:
            - method:
                service: users.v1.UserService
                method: UpdateUser
          backendRefs:
            - name: user-service-write
              port: 9090

This lets you route read and write operations to different backends, apply different rate limits per method, or direct specific RPCs to canary deployments.
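Method-level routing is possible because every gRPC call carries its target in the HTTP/2 request path, in the form `/<service>/<method>`. The Python sketch below shows that lookup for the two rules above. The `ROUTES` table and the `route_grpc` function are hypothetical and only model exact service and method matches.

```python
# Hypothetical model of the two GRPCRoute rules above.
ROUTES = [
    ({"service": "users.v1.UserService", "method": "GetUser"}, "user-service-read"),
    ({"service": "users.v1.UserService", "method": "UpdateUser"}, "user-service-write"),
]


def route_grpc(path: str):
    """Map a gRPC request path ("/<service>/<method>") to a backend name."""
    service, _, method = path.lstrip("/").partition("/")
    for match, backend in ROUTES:
        if match["service"] == service and match["method"] == method:
            return backend
    return None  # no rule matched this RPC
```

A call to `/users.v1.UserService/GetUser` resolves to `user-service-read`, and an RPC with no matching rule returns `None`.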
Sidecar-Free Service Mesh
Nantian Gateway implements the Gateway API Mesh model, which provides mesh capabilities without injecting sidecars into every pod.
How It Works
Instead of a sidecar per pod, the mesh model uses a shared gateway proxy that handles inter-service traffic. This reduces resource consumption (no per-pod proxy overhead), eliminates sidecar lifecycle ordering issues, and simplifies debugging.

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: mesh-gateway
    spec:
      gatewayClassName: nantian-mesh
      listeners:
        - name: mesh-http
          port: 8080
          protocol: HTTP
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: service-a-to-service-b
    spec:
      parentRefs:
        - name: mesh-gateway
          kind: Gateway
          group: gateway.networking.k8s.io
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /
          backendRefs:
            - name: service-b
              port: 8080
          filters:
            - type: RequestHeaderModifier
              requestHeaderModifier:
                add:
                  - name: X-Routed-By
                    value: nantian-mesh

When to Use This Pattern
The sidecar-free mesh model works well when:
- You’re running in resource-constrained environments where per-pod proxies would consume too much CPU and memory
- You have a large number of short-lived pods (batch jobs, serverless functions) where sidecar injection adds unacceptable startup latency
- You want mesh features (header modification, traffic splitting, redirects) without the full complexity of a mesh control plane
- You’re already using Nantian Gateway as an ingress gateway and want to extend it for east-west traffic
Wasm-Powered Custom Extensions
When built-in filters aren’t enough, Nantian Gateway’s Wasm plugin system lets you inject custom logic at the proxy level.
Use Cases for Wasm Plugins
Custom authentication: Validate JWTs against an internal identity provider, check API keys against a custom database, or implement request signing that doesn’t match any standard.
Request transformation: Reshape JSON payloads, convert between API versions, or enrich requests with data from internal services before forwarding to backends.
Custom observability: Emit metrics or traces in formats specific to your observability stack, sample requests based on custom criteria, or log structured data that your existing tooling expects.
Compliance and governance: Enforce data residency rules by inspecting request content, block requests to disallowed domains, or inject compliance headers.
Plugin Lifecycle
Wasm plugins are distributed as .wasm binaries and managed through the WasmPlugin CRD:

    apiVersion: gateway.nantian.io/v1alpha1
    kind: WasmPlugin
    metadata:
      name: custom-auth
    spec:
      pluginURL: oci://registry.example.com/plugins/auth:v1.2.0
      phase: OnRequestHeaders
      selector:
        matchLabels:
          app: payment-service

The plugin runs inside the data plane’s wasmtime runtime, with access to host functions for logging, metrics, and HTTP dispatch. Plugins are sandboxed and cannot access the host filesystem or network unless explicitly granted.
SDK and Development
Nantian Gateway ships a Wasm SDK (Rust crate: ntgw-wasm-sdk) that provides typed bindings for plugin development. You write your plugin in Rust, compile to wasm32-wasi (named wasm32-wasip1 in recent Rust toolchains), and deploy through the CRD.
High-Performance Edge with Observability
For latency-sensitive workloads, Nantian Gateway’s Rust data plane provides predictable performance with low tail latency. Combined with the observability stack, you get full visibility into gateway behavior.
Metrics That Matter
The gateway exposes Prometheus metrics covering:
- Request rate, latency (p50/p90/p99), and error rate per route and backend
- TLS handshake duration and certificate expiry
- xDS configuration staleness and update latency
- AI gateway token usage and rate limit status
- Wasm plugin execution time and error counts
Production Deployment Pattern
A typical production deployment runs:
- Two or more control plane replicas with leader election for high availability
- Multiple data plane replicas behind a Kubernetes Service of type LoadBalancer
- Horizontal Pod Autoscaling on the data plane based on CPU and memory
- Prometheus scraping both control plane and data plane metrics endpoints
- Grafana dashboards for cluster-level visibility
- Alertmanager rules for gateway health, TLS expiry, and error rate thresholds
See the Production Installation guide for a complete walkthrough.
Combining Use Cases
Nantian Gateway’s strength is that these use cases compose. A single deployment can:
- Serve as your primary Kubernetes ingress gateway
- Route AI traffic to multiple providers with token-based rate limiting
- Proxy TCP connections to databases and message queues
- Run Wasm plugins for custom authentication and request transformation
- Expose Prometheus metrics to your existing observability stack
You don’t need separate gateways for HTTP, AI, and TCP traffic. One Nantian Gateway deployment handles all of them, configured through the same Gateway API resources and managed through the same dashboard.