Custom CRDs

Nantian Gateway defines four custom resources (CRDs) on top of the standard Gateway API. They extend it with AI service management, token quota control, Wasm plugin injection, and custom load balancing. All four belong to the gateway.nantian.io API group at version v1alpha1.

| CRD | Purpose | Scope |
| --- | --- | --- |
| AIService | Manage connection configuration for AI model providers, including authentication, model routing, provider selection, and timeout control | Namespaced |
| TokenPolicy | Rate-limit and quota-control AI API calls based on token count | Namespaced |
| WasmPlugin | Bind Wasm extensions at the listener or route level to inject custom request/response processing logic | Namespaced |
| BackendLBPolicy | Configure backend load balancing strategy and session persistence | Namespaced |

AIService manages the configuration for connecting to AI model providers. It defines the provider type, API endpoint, supported model list, and authentication method. A single AIService can contain multiple backends, each corresponding to an AI provider or model.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.backends[] | Backend | Yes | List of AI backends |
| spec.backends[].name | string | Yes | Backend name, used for referencing |
| spec.backends[].provider | string | Yes | Provider type: openai, anthropic, ollama |
| spec.backends[].endpoint | string | Yes | API endpoint URL |
| spec.backends[].models[] | string | No | List of models supported by this backend |
| spec.backends[].auth | AuthConfig | No | Authentication configuration |
| spec.backends[].auth.type | string | No | Auth type: bearer, api_key |
| spec.backends[].auth.secretRef | string | No | Referenced Kubernetes Secret name |
| spec.backends[].auth.header | string | No | HTTP header name carrying auth credentials, default Authorization |
| spec.backends[].timeout | Duration | No | Request timeout |

```yaml
apiVersion: gateway.nantian.io/v1alpha1
kind: AIService
metadata:
  name: llm-routing
  namespace: default
spec:
  backends:
    - name: openai-gpt4
      provider: openai
      endpoint: https://api.openai.com/v1
      models:
        - gpt-4
        - gpt-4-turbo
        - gpt-4o
      auth:
        type: bearer
        secretRef: openai-api-key
        header: Authorization
      timeout: 60s
    - name: anthropic-claude
      provider: anthropic
      endpoint: https://api.anthropic.com
      models:
        - claude-3-opus
        - claude-3-sonnet
        - claude-3-haiku
      auth:
        type: api_key
        secretRef: anthropic-api-key
        header: x-api-key
      timeout: 90s
    - name: ollama-local
      provider: ollama
      endpoint: http://ollama.internal:11434
      models:
        - llama3
        - mistral
        - codellama
```

When an AIService is created, the control plane translates it into an AIServiceConfig in the internal IR. After the data plane receives the configuration, it enables AI protocol translation and model routing capabilities within the proxy process. Applications simply send requests to the gateway’s AI endpoint. The gateway automatically routes to the appropriate provider based on the model specified in the request and performs protocol format translation.
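Model-based routing can be pictured as a lookup from model name to backend. The sketch below is a minimal Python illustration, not the gateway's actual implementation: the `Backend` dataclass, `build_model_index`, and `route` are hypothetical names. It also assumes that the model is read from a `model` field in the JSON request body, and that the first backend listing a model wins when several backends list the same one.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical in-memory mirror of one spec.backends[] entry."""
    name: str
    provider: str
    endpoint: str
    models: list[str] = field(default_factory=list)


def build_model_index(backends: list[Backend]) -> dict[str, Backend]:
    """Map each model name to the first backend that lists it (assumed rule)."""
    index: dict[str, Backend] = {}
    for backend in backends:
        for model in backend.models:
            index.setdefault(model, backend)
    return index


def route(index: dict[str, Backend], request_body: dict) -> Backend:
    """Pick a backend from the "model" field of the request body (assumed location)."""
    model = request_body.get("model")
    if model not in index:
        raise LookupError(f"no backend serves model {model!r}")
    return index[model]


# Backends copied from the llm-routing example manifest.
BACKENDS = [
    Backend("openai-gpt4", "openai", "https://api.openai.com/v1",
            ["gpt-4", "gpt-4-turbo", "gpt-4o"]),
    Backend("anthropic-claude", "anthropic", "https://api.anthropic.com",
            ["claude-3-opus", "claude-3-sonnet", "claude-3-haiku"]),
    Backend("ollama-local", "ollama", "http://ollama.internal:11434",
            ["llama3", "mistral", "codellama"]),
]

INDEX = build_model_index(BACKENDS)
```

With this index, `route(INDEX, {"model": "claude-3-haiku"})` returns the `anthropic-claude` backend, and a request naming a model that no backend lists raises `LookupError`. How the real gateway handles unknown models is not specified here.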

TokenPolicy enforces rate limiting on AI API calls based on actual token usage. Unlike traditional request-count-based rate limiting, TokenPolicy counts the number of tokens in each request and response and enforces limits based on total token volume.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.rules[] | Rule | Yes | List of rate limit rules |
| spec.rules[].selector | map[string]string | No | Label selector to match request sources |
| spec.rules[].limit.tokensPerMinute | uint64 | No | Maximum tokens per minute |
| spec.rules[].limit.tokensPerHour | uint64 | No | Maximum tokens per hour |
| spec.rules[].limit.tokensPerDay | uint64 | No | Maximum tokens per day |
| spec.rules[].limit.requestsPerMinute | uint64 | No | Maximum requests per minute |
| spec.rules[].limit.burst | float64 | No | Burst multiplier, allows temporary over-limit |
| spec.rules[].limit.onLimit | string | No | Action on limit: reject or queue |

```yaml
apiVersion: gateway.nantian.io/v1alpha1
kind: TokenPolicy
metadata:
  name: team-quotas
  namespace: default
spec:
  rules:
    - selector:
        team: engineering
      limit:
        tokensPerMinute: 100000
        tokensPerHour: 5000000
        tokensPerDay: 50000000
        requestsPerMinute: 1000
        burst: 1.5
        onLimit: reject
    - selector:
        team: marketing
      limit:
        tokensPerMinute: 50000
        tokensPerHour: 2000000
        tokensPerDay: 10000000
        requestsPerMinute: 500
        burst: 1.2
        onLimit: reject
```

The data plane counts tokens on every AI API request and response. Token counts are based on the actual usage returned by the AI provider, not on estimates, so usage is recorded after each response arrives. Once a source's accumulated token usage exceeds a limit configured in the TokenPolicy, the gateway rejects further requests from that source before they reach the provider and returns an HTTP 429 status code.
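The accounting described above can be sketched as a per-source counter that is checked before proxying and updated after the provider responds. This is a minimal Python illustration under stated assumptions, not the data plane's implementation: `TokenWindowLimiter` is a hypothetical name, the fixed 60-second window is an assumption (the real windowing algorithm is not documented here), and `burst` is assumed to scale the per-window ceiling.

```python
import time


class TokenWindowLimiter:
    """Fixed-window token accounting for one request source (illustrative only)."""

    def __init__(self, tokens_per_minute: int, burst: float = 1.0,
                 clock=time.monotonic):
        # Assumption: the burst multiplier raises the per-window ceiling.
        self.ceiling = round(tokens_per_minute * burst)
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def _roll_window(self) -> None:
        """Start a fresh window once 60 seconds have elapsed."""
        now = self.clock()
        if now - self.window_start >= 60.0:
            self.window_start = now
            self.used = 0

    def admit(self) -> bool:
        """True if the next request may be proxied; False means answer HTTP 429."""
        self._roll_window()
        # Requests are refused only once recorded usage exceeds the ceiling.
        return self.used <= self.ceiling

    def record_usage(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the provider-reported usage after the response has arrived."""
        self._roll_window()
        self.used += prompt_tokens + completion_tokens
```

Because usage is only known after a response, a single large response can push a source past its ceiling; in this sketch the overshoot is caught by the next `admit()` call rather than by the request that caused it.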

WasmPlugin manages the loading and binding of WebAssembly extensions. Plugins are distributed as .wasm module files. The WasmPlugin CRD declares which request lifecycle hooks should execute plugin logic.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.wasmBytes | []byte | No | Wasm module byte content (inline mode) |
| spec.wasmRef | string | No | Referenced Wasm module (OCI image or URL) |
| spec.sha256 | string | No | SHA256 checksum of the Wasm module |
| spec.hooks[] | string | Yes | List of execution hooks |
| spec.config | string | No | Configuration passed to the plugin (JSON string) |
| spec.sandbox.maxMemoryBytes | uint64 | No | Sandbox maximum memory limit |
| spec.sandbox.maxExecutionTimeMs | uint64 | No | Maximum execution time per invocation |
| spec.sandbox.allowNetwork | bool | No | Whether the plugin is allowed network access |
| spec.sandbox.allowFileSystem | bool | No | Whether the plugin is allowed filesystem access |

| Hook | Phase | Description |
| --- | --- | --- |
| on_request_headers | Request | Executes after request headers are received, before route matching |
| on_request_body | Request | Executes after the full request body is received |
| on_response_headers | Response | Executes after backend response headers are received |
| on_response_body | Response | Executes after the full response body is received, before sending to the client |

```yaml
apiVersion: gateway.nantian.io/v1alpha1
kind: WasmPlugin
metadata:
  name: pii-sanitizer
  namespace: default
spec:
  wasmRef: oci://registry.example.com/plugins/pii-sanitizer:v1.2.0
  sha256: "sha256:abc123def456..."
  hooks:
    - on_request_body
    - on_response_body
  config: |
    {
      "patterns": ["email", "phone", "credit_card"],
      "replacement": "[REDACTED]"
    }
  sandbox:
    maxMemoryBytes: 67108864
    maxExecutionTimeMs: 100
    allowNetwork: false
    allowFileSystem: false
```
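To fill in `spec.sha256`, the checksum of a module can be computed locally. The Python sketch below uses only the standard library. The helper names `wasm_sha256` and `matches_checksum` are hypothetical, and the handling of the `sha256:` prefix is an assumption drawn from the example manifest. Whether and how the gateway itself verifies the checksum is not specified here.

```python
import hashlib
import hmac


def wasm_sha256(module_bytes: bytes) -> str:
    """Hex-encoded SHA-256 digest of a Wasm module's raw bytes."""
    return hashlib.sha256(module_bytes).hexdigest()


def matches_checksum(module_bytes: bytes, declared: str) -> bool:
    """Compare a module against a declared checksum.

    Assumption: the declared value may carry a "sha256:" prefix, as in the
    example manifest; the prefix is stripped and case is ignored.
    """
    expected = declared.removeprefix("sha256:").lower()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(wasm_sha256(module_bytes), expected)
```

A typical use would read the module with `pathlib.Path("plugin.wasm").read_bytes()` (the file name is illustrative) and paste `"sha256:" + wasm_sha256(data)` into the manifest.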

WasmPlugin can be bound to traffic in the following ways:

  • Listener level: Specified in Gateway extension references, affecting all traffic on that listener
  • Route level: Referenced in HTTPRoute or GRPCRoute filter extensions, affecting only traffic matching that route
  • Backend level: Referenced in BackendRef filter extensions, affecting only traffic to a specific backend

BackendLBPolicy configures backend service load balancing strategy and session persistence behavior.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| spec.loadBalancing.type | string | No | Load balancing algorithm: RoundRobin (default), ConsistentHash |
| spec.loadBalancing.consistentHash.keyType | string | No | Consistent hash key type: SourceIP, Header, Hostname |
| spec.loadBalancing.consistentHash.headerName | string | No | Header name when keyType is Header |
| spec.sessionPersistence.type | string | No | Session persistence type: Cookie, Header |
| spec.sessionPersistence.sessionName | string | No | Session identifier name |
| spec.sessionPersistence.absoluteTimeout | Duration | No | Session absolute timeout |
| spec.sessionPersistence.idleTimeout | Duration | No | Session idle timeout |

```yaml
apiVersion: gateway.nantian.io/v1alpha1
kind: BackendLBPolicy
metadata:
  name: sticky-sessions
  namespace: default
spec:
  loadBalancing:
    type: ConsistentHash
    consistentHash:
      keyType: Header
      headerName: x-session-id
  sessionPersistence:
    type: Cookie
    sessionName: NANTIAN_SESSION
    absoluteTimeout: 3600s
    idleTimeout: 600s
```

BackendLBPolicy can be referenced in HTTPRoute or GRPCRoute backend references. The control plane translates it into the LoadBalancingPolicy and SessionPersistence fields of the BackendCluster configuration, and the data plane applies these policies when proxying requests.
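For intuition about the ConsistentHash algorithm with keyType Header, the Python sketch below builds a small hash ring and picks an endpoint from a header value. It is a generic textbook construction, not the data plane's actual algorithm. `HashRing` and `pick_backend` are hypothetical names. The use of SHA-256, 100 virtual nodes per endpoint, lowercase header keys, and an empty-string fallback for a missing header are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, endpoints: list[str], vnodes: int = 100):
        # Each endpoint is placed on the ring at `vnodes` pseudo-random points.
        self._ring: list[tuple[int, str]] = sorted(
            (self._hash(f"{ep}#{i}"), ep)
            for ep in endpoints
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        """First 8 bytes of SHA-256 as an unsigned integer."""
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def pick(self, key: str) -> str:
        """Return the endpoint owning the first ring point at or after hash(key)."""
        if not self._ring:
            raise ValueError("ring has no endpoints")
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


def pick_backend(ring: HashRing, headers: dict[str, str], header_name: str) -> str:
    """keyType: Header -- hash the value of the configured header."""
    return ring.pick(headers.get(header_name, ""))
```

Requests carrying the same `x-session-id` value always land on the same endpoint. Removing an unrelated endpoint from the ring leaves that mapping unchanged, which is the property that distinguishes consistent hashing from a plain modulo over the endpoint list.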

  • See Admin API for runtime diagnostics endpoints
  • See xDS Protocol for the control plane/data plane communication protocol
  • See Use Cases for real-world AI gateway usage examples