Use Cases

Nantian Gateway’s architecture makes it suitable for several distinct deployment patterns. This page walks through real-world scenarios where the gateway provides particular value.

AI Gateway: Centralized AI Traffic Management

Organizations deploying large language models and AI APIs face a set of problems that traditional API gateways don’t address: multiple provider formats, token-based billing, prompt-level security, and model A/B testing.

Nantian Gateway’s built-in AI gateway module handles these concerns at the proxy layer.

Multi-Provider Protocol Adaptation

AI providers use different API formats. OpenAI, Anthropic, and Ollama each have their own request and response schemas. Nantian Gateway normalizes these differences, letting your applications speak one format while the gateway translates to whatever each backend expects.

apiVersion: gateway.nantian.io/v1alpha1
kind: AIService
metadata:
  name: llm-routing
spec:
  backends:
    - name: openai-gpt4
      provider: openai
      endpoint: https://api.openai.com/v1
      models:
        - gpt-4
        - gpt-4-turbo
    - name: anthropic-claude
      provider: anthropic
      endpoint: https://api.anthropic.com
      models:
        - claude-3-opus
        - claude-3-sonnet
    - name: ollama-local
      provider: ollama
      endpoint: http://ollama.internal:11434
      models:
        - llama3
        - mistral

Applications send requests to a single gateway endpoint. The gateway routes to the correct provider based on the requested model, handling protocol conversion transparently.

Token-Based Rate Limiting

Hosted AI APIs typically bill by token count, not by request count. A single request with a 10,000-token prompt consumes as many input tokens as a hundred requests with 100-token prompts, so rate limiting that only counts requests misses the real cost driver.

Nantian Gateway’s TokenPolicy CRD enforces limits based on actual token usage:

apiVersion: gateway.nantian.io/v1alpha1
kind: TokenPolicy
metadata:
  name: team-quotas
spec:
  rules:
    - selector:
        team: engineering
      limit:
        tokensPerMinute: 100000
        tokensPerDay: 5000000
    - selector:
        team: marketing
      limit:
        tokensPerMinute: 50000
        tokensPerDay: 2000000

The gateway counts tokens on every request and response and charges them against the matching quota. Once a quota is exhausted, further requests are rejected at the gateway before they reach the provider. This prevents surprise bills and gives teams predictable cost boundaries.

PII Masking and Security

Sending sensitive data to external AI providers creates compliance risk. Nantian Gateway can mask personally identifiable information in prompts before they leave your network. The masking happens at the proxy level, so no separate service is needed.

A/B Testing Model Deployments

When rolling out a new model version or comparing providers, you need to split traffic and compare results. Nantian Gateway supports percentage-based traffic splitting between AI backends, with the same routing primitives used for regular HTTP traffic:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: model-ab-test
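spec:
  # Assumption: attach the route to public-gateway, the Gateway defined
  # under "Standard Gateway API Configuration". Without parentRefs the
  # route is not bound to any Gateway.
  parentRefs:
    - name: public-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat
      backendRefs:
        # Weights are relative proportions: about 90% of requests go to
        # model-v1 and about 10% to model-v2.
        - name: model-v1
          port: 8080
          weight: 90
        - name: model-v2
          port: 8080
          weight: 10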

Kubernetes Ingress and Edge Routing

The most common deployment pattern: Nantian Gateway sits at the edge of your Kubernetes cluster, routing external traffic to internal services.

Standard Gateway API Configuration

Since Nantian Gateway implements the Gateway API spec, your routing configuration looks the same as it would with any compliant implementation:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
spec:
  gatewayClassName: nantian
  listeners:
    - name: http
      port: 80
      protocol: HTTP
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-routing
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: frontend-service
          port: 3000

This pattern works for any standard HTTP workload: REST APIs, gRPC services, WebSocket connections, and static file serving.

TLS Everywhere

Nantian Gateway supports TLS termination and TLS passthrough, and can mix the two on one gateway. You can terminate TLS for most routes while passing encrypted connections through untouched to backends that terminate TLS themselves:

listeners:
  - name: mixed-tls
    port: 443
    protocol: TLS
    tls:
      mode: Passthrough
    allowedRoutes:
      namespaces:
        from: All
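
In the standard Gateway API schema, each listener declares a single tls.mode, so one common way to combine termination and passthrough is to put two listeners on the same port and tell them apart by hostname (SNI). The sketch below illustrates that layout. The hostnames, the secure-backend Service, and the TLSRoute name are placeholders, and the TLSRoute apiVersion served in your cluster depends on the installed Gateway API release.

```yaml
# Sketch only: two listeners share port 443 and are selected by SNI hostname.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
spec:
  gatewayClassName: nantian
  listeners:
    # TLS is terminated at the gateway; HTTPRoutes attach here.
    - name: https-terminate
      hostname: app.example.com        # placeholder hostname
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls
    # TLS is forwarded untouched; only TLSRoutes attach here.
    - name: tls-passthrough
      hostname: secure.example.com     # placeholder hostname
      port: 443
      protocol: TLS
      tls:
        mode: Passthrough
      allowedRoutes:
        kinds:
          - kind: TLSRoute
---
apiVersion: gateway.networking.k8s.io/v1alpha2   # may differ by Gateway API release
kind: TLSRoute
metadata:
  name: secure-passthrough             # placeholder name
spec:
  parentRefs:
    - name: public-gateway
      sectionName: tls-passthrough     # bind to the passthrough listener only
  hostnames:
    - secure.example.com
  rules:
    - backendRefs:
        - name: secure-backend         # placeholder Service that terminates TLS itself
          port: 8443
```

With this layout, traffic for app.example.com is decrypted at the gateway and can be matched by HTTPRoutes, while traffic for secure.example.com reaches secure-backend still encrypted.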

Combined with BackendTLSPolicy for upstream TLS and SAN validation, you get end-to-end encryption without gaps.
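
As a rough illustration of the upstream half, a BackendTLSPolicy attaches to a backend Service and tells the gateway which CA, hostname, and SANs to validate. This sketch assumes the api-service backend from the routing example, a hypothetical ConfigMap named backend-ca holding the CA bundle, and a placeholder hostname. The served apiVersion for BackendTLSPolicy varies across Gateway API releases, so adjust it to match your cluster.

```yaml
# Sketch only: validate the certificate presented by api-service.
apiVersion: gateway.networking.k8s.io/v1alpha3   # may differ by Gateway API release
kind: BackendTLSPolicy
metadata:
  name: api-service-tls                # placeholder name
spec:
  targetRefs:
    - group: ""                        # core API group
      kind: Service
      name: api-service
  validation:
    caCertificateRefs:
      - group: ""
        kind: ConfigMap
        name: backend-ca               # hypothetical ConfigMap with the CA bundle
    hostname: api.internal.example.com # placeholder; used for SNI and verification
    subjectAltNames:
      - type: Hostname
        hostname: api.internal.example.com
```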

Multi-Protocol Proxy

Not all workloads speak HTTP. Databases, message queues, game servers, and legacy protocols need TCP or UDP proxying. Nantian Gateway handles these alongside HTTP traffic on the same gateway.

TCP and UDP Routing

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: postgres-route
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - backendRefs:
        - name: postgres-primary
          port: 5432
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: dns-route
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - backendRefs:
        - name: coredns
          port: 53
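
In the standard Gateway API model, a TCPRoute or UDPRoute only attaches to a Gateway that has a listener for the matching protocol. A minimal sketch of those listeners on public-gateway follows. The listener names postgres and dns are assumptions, and the HTTP and HTTPS listeners from the ingress example would sit alongside them.

```yaml
# Sketch only: L4 listeners that the TCPRoute and UDPRoute can bind to.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
spec:
  gatewayClassName: nantian
  listeners:
    # ...existing http/https listeners omitted for brevity...
    - name: postgres                   # assumed listener name
      port: 5432
      protocol: TCP
      allowedRoutes:
        kinds:
          - kind: TCPRoute
    - name: dns                        # assumed listener name
      port: 53
      protocol: UDP
      allowedRoutes:
        kinds:
          - kind: UDPRoute
```

Each route can then pin itself to one listener by adding sectionName: postgres or sectionName: dns to its parentRefs entry.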

gRPC with Method-Level Routing

gRPC services benefit from method-level routing, which Nantian Gateway supports through GRPCRoute with named route rules:

apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: grpc-service-routing
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - matches:
        - method:
            service: users.v1.UserService
            method: GetUser
      backendRefs:
        - name: user-service-read
          port: 9090
    - matches:
        - method:
            service: users.v1.UserService
            method: UpdateUser
      backendRefs:
        - name: user-service-write
          port: 9090

This lets you route read and write operations to different backends, apply different rate limits per method, or direct specific RPCs to canary deployments.
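
For the canary case, one possible shape is to keep the method match and split its backendRefs by weight, using the same weight field as HTTPRoute. The route name, the user-service-read-canary Service, and the 95/5 split are illustrative assumptions.

```yaml
# Sketch only: send a small share of GetUser calls to a canary backend.
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: get-user-canary                # placeholder name
spec:
  parentRefs:
    - name: public-gateway
  rules:
    - matches:
        - method:
            service: users.v1.UserService
            method: GetUser
      backendRefs:
        - name: user-service-read
          port: 9090
          weight: 95                   # about 95% of GetUser calls
        - name: user-service-read-canary   # hypothetical canary Service
          port: 9090
          weight: 5                    # about 5% of GetUser calls
```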

Sidecar-Free Service Mesh

Nantian Gateway implements the Gateway API Mesh model, which provides mesh capabilities without injecting sidecars into every pod.

How It Works

Instead of a sidecar per pod, the mesh model uses a shared gateway proxy that handles inter-service traffic. This reduces resource consumption (no per-pod proxy overhead), eliminates sidecar lifecycle ordering issues, and simplifies debugging.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: mesh-gateway
spec:
  gatewayClassName: nantian-mesh
  listeners:
    - name: mesh-http
      port: 8080
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: service-a-to-service-b
spec:
  parentRefs:
    - name: mesh-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: service-b
          port: 8080
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Routed-By
                value: nantian-mesh

When to Use This Pattern

The sidecar-free mesh model works well when:

You’re running in resource-constrained environments where per-pod proxies would consume too much CPU and memory
You have a large number of short-lived pods (batch jobs, serverless functions) where sidecar injection adds unacceptable startup latency
You want mesh features (header modification, traffic splitting, redirects) without the full complexity of a mesh control plane
You’re already using Nantian Gateway as an ingress gateway and want to extend it for east-west traffic

Wasm-Powered Custom Extensions

When built-in filters aren’t enough, Nantian Gateway’s Wasm plugin system lets you inject custom logic at the proxy level.

Use Cases for Wasm Plugins

Custom authentication: Validate JWTs against an internal identity provider, check API keys against a custom database, or implement request signing that doesn’t match any standard.

Request transformation: Reshape JSON payloads, convert between API versions, or enrich requests with data from internal services before forwarding to backends.

Custom observability: Emit metrics or traces in formats specific to your observability stack, sample requests based on custom criteria, or log structured data that your existing tooling expects.

Compliance and governance: Enforce data residency rules by inspecting request content, block requests to disallowed domains, or inject compliance headers.

Plugin Lifecycle

Wasm plugins are distributed as .wasm binaries and managed through the WasmPlugin CRD:

apiVersion: gateway.nantian.io/v1alpha1
kind: WasmPlugin
metadata:
  name: custom-auth
spec:
  pluginURL: oci://registry.example.com/plugins/auth:v1.2.0
  phase: OnRequestHeaders
  selector:
    matchLabels:
      app: payment-service

The plugin runs inside the data plane’s Wasmtime runtime, with access to host functions for logging, metrics, and HTTP dispatch. Plugins are sandboxed and cannot access the host filesystem or network unless explicitly granted.

SDK and Development

Nantian Gateway ships a Wasm SDK (Rust crate: ntgw-wasm-sdk) that provides typed bindings for plugin development. You write your plugin in Rust, compile it to the wasm32-wasi target (named wasm32-wasip1 in recent Rust toolchains), and deploy it through the WasmPlugin CRD.

High-Performance Edge with Observability

For latency-sensitive workloads, Nantian Gateway’s Rust data plane provides predictable performance with low tail latency. Combined with the observability stack, you get full visibility into gateway behavior.

Metrics That Matter

The gateway exposes Prometheus metrics covering:

Request rate, latency (p50/p90/p99), and error rate per route and backend
TLS handshake duration and certificate expiry
xDS configuration staleness and update latency
AI gateway token usage and rate limit status
Wasm plugin execution time and error counts

Production Deployment Pattern

A typical production deployment runs:

Two or more control plane replicas with leader election for high availability
Multiple data plane replicas behind a Kubernetes Service of type LoadBalancer
Horizontal Pod Autoscaling on the data plane based on CPU and memory
Prometheus scraping both control plane and data plane metrics endpoints
Grafana dashboards for cluster-level visibility
Alertmanager rules for gateway health, TLS expiry, and error rate thresholds

See the Production Installation guide for a complete walkthrough.
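
As one possible shape for the autoscaling item above, a standard autoscaling/v2 HorizontalPodAutoscaler can scale the data plane on CPU and memory. The Deployment name nantian-dataplane, the replica bounds, and the utilization targets are all assumptions for illustration; use the names and values from your own installation.

```yaml
# Sketch only: scale the data plane Deployment on CPU and memory utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nantian-dataplane              # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nantian-dataplane            # hypothetical data plane Deployment
  minReplicas: 2                       # keep at least two replicas for availability
  maxReplicas: 10                      # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70       # illustrative target
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80       # illustrative target
```

Utilization targets are computed against container resource requests, so the data plane pods need CPU and memory requests set for this to work.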

Combining Use Cases

Nantian Gateway’s strength is that these use cases compose. A single deployment can:

Serve as your primary Kubernetes ingress gateway
Route AI traffic to multiple providers with token-based rate limiting
Proxy TCP connections to databases and message queues
Run Wasm plugins for custom authentication and request transformation
Expose Prometheus metrics to your existing observability stack

You don’t need separate gateways for HTTP, AI, and TCP traffic. One Nantian Gateway deployment handles all of them, configured through the same Gateway API resources and managed through the same dashboard.