Custom CRDs
Nantian Gateway defines four custom resource definitions (CRDs) on top of the standard Gateway API. They extend it with AI service management, token quota control, Wasm plugin injection, and custom load balancing. All four CRDs belong to the gateway.nantian.io API group at version v1alpha1.
Overview
| CRD | Purpose | Scope |
|---|---|---|
| AIService | Manage connection configuration for AI model providers, including authentication, model routing, provider selection, and timeout control | Namespaced |
| TokenPolicy | Rate-limit and quota-control AI API calls based on token count | Namespaced |
| WasmPlugin | Bind Wasm extensions at the listener or route level to inject custom request/response processing logic | Namespaced |
| BackendLBPolicy | Configure backend load balancing strategy and session persistence | Namespaced |
AIService
AIService manages the configuration for connecting to AI model providers. It defines the provider type, API endpoint, supported model list, and authentication method. A single AIService can contain multiple backends, each corresponding to an AI provider or model.
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| spec.backends | []Backend | Yes | List of AI backends |
| spec.backends[].name | string | Yes | Backend name, used for referencing |
| spec.backends[].provider | string | Yes | Provider type: openai, anthropic, ollama |
| spec.backends[].endpoint | string | Yes | API endpoint URL |
| spec.backends[].models | []string | No | List of models supported by this backend |
| spec.backends[].auth | AuthConfig | No | Authentication configuration |
| spec.backends[].auth.type | string | No | Auth type: bearer, api_key |
| spec.backends[].auth.secretRef | string | No | Referenced Kubernetes Secret name |
| spec.backends[].auth.header | string | No | HTTP header name carrying auth credentials, default Authorization |
| spec.backends[].timeout | Duration | No | Request timeout |
Example

    apiVersion: gateway.nantian.io/v1alpha1
    kind: AIService
    metadata:
      name: llm-routing
      namespace: default
    spec:
      backends:
        - name: openai-gpt4
          provider: openai
          endpoint: https://api.openai.com/v1
          models:
            - gpt-4
            - gpt-4-turbo
            - gpt-4o
          auth:
            type: bearer
            secretRef: openai-api-key
            header: Authorization
          timeout: 60s
        - name: anthropic-claude
          provider: anthropic
          endpoint: https://api.anthropic.com
          models:
            - claude-3-opus
            - claude-3-sonnet
            - claude-3-haiku
          auth:
            type: api_key
            secretRef: anthropic-api-key
            header: x-api-key
          timeout: 90s
        - name: ollama-local
          provider: ollama
          endpoint: http://ollama.internal:11434
          models:
            - llama3
            - mistral
            - codellama

How It Works
When an AIService is created, the control plane translates it into an AIServiceConfig in the internal IR. After the data plane receives the configuration, it enables AI protocol translation and model routing capabilities within the proxy process. Applications simply send requests to the gateway’s AI endpoint. The gateway automatically routes to the appropriate provider based on the model specified in the request and performs protocol format translation.
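
The model-based routing step can be pictured with a small lookup. The following is a minimal sketch, not the gateway's implementation: it assumes the request body carries an OpenAI-style `model` field and that the first backend listing a model wins. The helper names `build_model_index` and `select_backend` are hypothetical.

```python
from typing import Optional

# Backend names and model lists copied from the AIService example on this page.
BACKENDS = [
    {"name": "openai-gpt4", "models": ["gpt-4", "gpt-4-turbo", "gpt-4o"]},
    {"name": "anthropic-claude", "models": ["claude-3-opus", "claude-3-sonnet", "claude-3-haiku"]},
    {"name": "ollama-local", "models": ["llama3", "mistral", "codellama"]},
]


def build_model_index(backends):
    """Map each model name to the first backend that lists it (assumed tie-break)."""
    index = {}
    for backend in backends:
        for model in backend.get("models", []):
            index.setdefault(model, backend["name"])
    return index


def select_backend(index, request_body) -> Optional[str]:
    """Return the backend name for the requested model, or None if no backend lists it."""
    return index.get(request_body.get("model"))


MODEL_INDEX = build_model_index(BACKENDS)
print(select_backend(MODEL_INDEX, {"model": "claude-3-sonnet", "messages": []}))
# prints: anthropic-claude
```

A request naming a model that no backend lists returns None here. How the gateway itself handles that case is not specified on this page.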
TokenPolicy
TokenPolicy enforces rate limiting on AI API calls based on actual token usage. Unlike traditional request-count-based rate limiting, TokenPolicy counts the number of tokens in each request and response and enforces limits based on total token volume.
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| spec.rules | []Rule | Yes | List of rate limit rules |
| spec.rules[].selector | map[string]string | No | Label selector to match request sources |
| spec.rules[].limit.tokensPerMinute | uint64 | No | Maximum tokens per minute |
| spec.rules[].limit.tokensPerHour | uint64 | No | Maximum tokens per hour |
| spec.rules[].limit.tokensPerDay | uint64 | No | Maximum tokens per day |
| spec.rules[].limit.requestsPerMinute | uint64 | No | Maximum requests per minute |
| spec.rules[].limit.burst | float64 | No | Burst multiplier, allows temporary over-limit |
| spec.rules[].limit.onLimit | string | No | Action on limit: reject or queue |
Example

    apiVersion: gateway.nantian.io/v1alpha1
    kind: TokenPolicy
    metadata:
      name: team-quotas
      namespace: default
    spec:
      rules:
        - selector:
            team: engineering
          limit:
            tokensPerMinute: 100000
            tokensPerHour: 5000000
            tokensPerDay: 50000000
            requestsPerMinute: 1000
            burst: 1.5
            onLimit: reject
        - selector:
            team: marketing
          limit:
            tokensPerMinute: 50000
            tokensPerHour: 2000000
            tokensPerDay: 10000000
            requestsPerMinute: 500
            burst: 1.2
            onLimit: reject

How It Works
The data plane counts tokens on every AI API request and response. Token counts are based on the actual usage reported by the AI provider, not estimates. Once a source’s accumulated usage exceeds a limit configured in the TokenPolicy, the gateway rejects that source’s subsequent requests before they reach the provider and returns an HTTP 429 status code.
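
The interplay between post-response accounting and pre-request rejection can be sketched as follows. This is an illustration under stated assumptions, not the gateway's algorithm: it uses fixed one-minute windows, treats `burst` as a multiplier on the per-window ceiling, and covers only `tokensPerMinute`. The class name `TokenWindowLimiter` and its methods are hypothetical.

```python
import time
from collections import defaultdict


class TokenWindowLimiter:
    """Per-source token counter over fixed one-minute windows (illustrative only)."""

    def __init__(self, tokens_per_minute, burst=1.0, clock=time.monotonic):
        # Assumption: the burst multiplier scales the per-window ceiling.
        self.ceiling = int(tokens_per_minute * burst)
        self.clock = clock
        self.usage = defaultdict(int)  # (source, window index) -> tokens consumed

    def _window(self):
        return int(self.clock() // 60)

    def admit(self, source):
        """False means the gateway would answer 429 without calling the provider."""
        return self.usage[(source, self._window())] < self.ceiling

    def record(self, source, prompt_tokens, completion_tokens):
        """Add the usage reported by the provider once the response has arrived."""
        self.usage[(source, self._window())] += prompt_tokens + completion_tokens


# Fake clock so the example is deterministic.
now = [0.0]
limiter = TokenWindowLimiter(tokens_per_minute=1000, burst=1.5, clock=lambda: now[0])

first = limiter.admit("team=engineering")     # True: nothing consumed yet
limiter.record("team=engineering", 900, 700)  # provider reports 1600 tokens
second = limiter.admit("team=engineering")    # False: 1600 >= 1500
now[0] = 61.0                                 # next one-minute window
third = limiter.admit("team=engineering")     # True: counter starts fresh
```

Because usage is only known after the provider responds, a single large response can push a source past its ceiling. The check then takes effect on the next request, which matches the behavior described above.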
WasmPlugin
WasmPlugin manages the loading and binding of WebAssembly extensions. Plugins are distributed as .wasm module files. The WasmPlugin CRD declares which request lifecycle hooks should execute plugin logic.
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| spec.wasmBytes | []byte | No | Wasm module byte content (inline mode) |
| spec.wasmRef | string | No | Referenced Wasm module (OCI image or URL) |
| spec.sha256 | string | No | SHA256 checksum of the Wasm module |
| spec.hooks | []string | Yes | List of execution hooks |
| spec.config | string | No | Configuration passed to the plugin (JSON string) |
| spec.sandbox.maxMemoryBytes | uint64 | No | Sandbox maximum memory limit |
| spec.sandbox.maxExecutionTimeMs | uint64 | No | Maximum execution time per invocation |
| spec.sandbox.allowNetwork | bool | No | Whether the plugin is allowed network access |
| spec.sandbox.allowFileSystem | bool | No | Whether the plugin is allowed filesystem access |
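
To make the role of spec.sha256 concrete, the sketch below shows one way a fetched module could be checked against a declared checksum. It is an assumption-laden illustration: this page does not describe how or when the gateway verifies the checksum, and the function name `wasm_digest_matches` and the tolerance for an optional `sha256:` prefix are hypothetical.

```python
import hashlib


def wasm_digest_matches(module_bytes: bytes, expected: str) -> bool:
    """Compare the module's SHA-256 hex digest with the declared checksum."""
    # Assumption: the declared value may carry an optional "sha256:" prefix.
    expected_hex = expected.removeprefix("sha256:").lower()
    actual_hex = hashlib.sha256(module_bytes).hexdigest()
    return actual_hex == expected_hex


# Smallest valid Wasm binary: magic number "\0asm" followed by version 1.
sample_module = b"\x00asm\x01\x00\x00\x00"
declared = "sha256:" + hashlib.sha256(sample_module).hexdigest()

ok = wasm_digest_matches(sample_module, declared)
tampered = wasm_digest_matches(sample_module + b"\x00", declared)
```

Here `ok` is true and `tampered` is false, since any change to the module bytes changes the digest. `str.removeprefix` requires Python 3.9 or later.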
Supported Lifecycle Hooks
| Hook | Phase | Description |
|---|---|---|
| on_request_headers | Request | Executes after request headers are received, before route matching |
| on_request_body | Request | Executes after the full request body is received |
| on_response_headers | Response | Executes after backend response headers are received |
| on_response_body | Response | Executes after the full response body is received, before sending to the client |
Example

    apiVersion: gateway.nantian.io/v1alpha1
    kind: WasmPlugin
    metadata:
      name: pii-sanitizer
      namespace: default
    spec:
      wasmRef: oci://registry.example.com/plugins/pii-sanitizer:v1.2.0
      sha256: "sha256:abc123def456..."
      hooks:
        - on_request_body
        - on_response_body
      config: |
        {
          "patterns": ["email", "phone", "credit_card"],
          "replacement": "[REDACTED]"
        }
      sandbox:
        maxMemoryBytes: 67108864
        maxExecutionTimeMs: 100
        allowNetwork: false
        allowFileSystem: false

Binding Methods
WasmPlugin can be bound to traffic in the following ways:
- Listener level: Specified in Gateway extension references, affecting all traffic on that listener
- Route level: Referenced in HTTPRoute or GRPCRoute filter extensions, affecting only traffic matching that route
- Backend level: Referenced in BackendRef filter extensions, affecting only traffic to a specific backend
BackendLBPolicy
BackendLBPolicy configures backend service load balancing strategy and session persistence behavior.
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| spec.loadBalancing.type | string | No | Load balancing algorithm: RoundRobin (default), ConsistentHash |
| spec.loadBalancing.consistentHash.keyType | string | No | Consistent hash key type: SourceIP, Header, Hostname |
| spec.loadBalancing.consistentHash.headerName | string | No | Header name when keyType is Header |
| spec.sessionPersistence.type | string | No | Session persistence type: Cookie, Header |
| spec.sessionPersistence.sessionName | string | No | Session identifier name |
| spec.sessionPersistence.absoluteTimeout | Duration | No | Session absolute timeout |
| spec.sessionPersistence.idleTimeout | Duration | No | Session idle timeout |
Example

    apiVersion: gateway.nantian.io/v1alpha1
    kind: BackendLBPolicy
    metadata:
      name: sticky-sessions
      namespace: default
    spec:
      loadBalancing:
        type: ConsistentHash
        consistentHash:
          keyType: Header
          headerName: x-session-id
      sessionPersistence:
        type: Cookie
        sessionName: NANTIAN_SESSION
        absoluteTimeout: 3600s
        idleTimeout: 600s

How It Works
BackendLBPolicy can be referenced in HTTPRoute or GRPCRoute backend references. The control plane translates it into the LoadBalancingPolicy and SessionPersistence fields of the BackendCluster configuration, and the data plane applies these policies when proxying requests.
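
For readers unfamiliar with the ConsistentHash option, the sketch below shows the general technique with a header value as the hash key. It is a generic hash ring with virtual nodes, offered as an illustration only: this page does not state which consistent hashing variant the data plane uses, and the `HashRing` class, the `vnodes` parameter, and the backend names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, backends, vnodes=64):
        if not backends:
            raise ValueError("at least one backend is required")
        # Each backend is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def pick(self, key):
        """Return the backend owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["pod-a", "pod-b", "pod-c"])
headers = {"x-session-id": "user-42"}  # header name taken from the example manifest
chosen = ring.pick(headers["x-session-id"])
```

Requests carrying the same x-session-id value always land on the same backend. When a backend is removed, only the keys it owned move elsewhere, which is the property that distinguishes consistent hashing from a plain modulo over the backend count.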
Next Steps
- See Admin API for runtime diagnostics endpoints
- See xDS Protocol for the control plane/data plane communication protocol
- See Use Cases for real-world AI gateway usage examples