Performance Tuning
This page covers configuration options that affect throughput, latency, and resource usage in the data plane. The default values are appropriate for moderate traffic volumes (hundreds to low thousands of requests per second). For higher-throughput scenarios or resource-constrained environments, the parameters described here let you trade off performance, memory consumption, and operational safety against one another.
Threading and Concurrency
The data plane uses an async runtime built on Tokio. The following settings control how work is distributed across threads:
| Parameter | Type | Default | Description |
|---|---|---|---|
| runtimeTuning.httpCapacity.workerThreads | int | 0 | Worker thread count (0 = CPU count) |
| runtimeTuning.httpCapacity.acceptConcurrency | int | 16 | Concurrent TCP accept operations |
Worker Threads
The default value of 0 causes the runtime to create one worker thread per available CPU core. This configuration is optimal for CPU-bound workloads where the proxy performs significant processing: TLS termination, header manipulation, and Wasm filter execution all benefit from parallel CPU access.
For I/O-bound workloads where most time is spent waiting for upstream responses, reducing the worker count frees CPU resources for other processes. Set this to a value lower than the CPU count when the proxy primarily forwards requests with minimal transformation.
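As a rough illustration of the `0 = CPU count` rule, the sketch below resolves a configured value to the thread count it implies. The function name `effective_worker_threads` is hypothetical and is not part of the data plane. It only mirrors the rule stated in the table, and the core count the real runtime detects may differ from what Python reports.

```python
import os
from typing import Optional


def effective_worker_threads(configured: int, cpu_count: Optional[int] = None) -> int:
    """Resolve a workerThreads setting to a thread count (hypothetical helper)."""
    if configured < 0:
        raise ValueError("workerThreads must be >= 0")
    if cpu_count is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    # Per the table above, 0 means one worker thread per available CPU core.
    return cpu_count if configured == 0 else configured


print(effective_worker_threads(0, cpu_count=8))  # 8
print(effective_worker_threads(2, cpu_count=8))  # 2
```

On an 8-core host, leaving the default in place yields 8 workers, while an explicit value of 2 leaves the remaining cores to other processes.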
Accept Concurrency
acceptConcurrency controls how many threads can simultaneously invoke accept() on the listener socket. The default of 16 handles connection bursts effectively. Increase this value for workloads with very high connection rates (tens of thousands of new connections per second). Decrease it if profiling reveals accept-related contention.
Connection Pools
Reusing connections to upstream services avoids the overhead of TCP handshakes and TLS negotiation on every request. Connection pooling is critical for latency-sensitive workloads.
| Parameter | Type | Default | Description |
|---|---|---|---|
| runtimeTuning.httpCapacity.upstreamKeepalivePoolSize | int | 32768 | Maximum idle connections in the upstream pool |
The upstream keepalive pool maintains idle connections to backend services. When the data plane needs to forward a request, it checks the pool for an existing connection before establishing a new one. Each idle connection consumes a small amount of memory (several kilobytes for socket buffers), so the memory cost scales linearly with pool size.
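To make the linear scaling concrete, here is a back-of-the-envelope estimate of the memory held by a full pool. The 4 KiB per idle connection is an assumed figure, chosen only because the text says "several kilobytes". Measure the real per-connection cost in your environment before relying on the result.

```python
def idle_pool_memory_bytes(pool_size: int, bytes_per_idle_conn: int) -> int:
    """Upper bound on memory held by idle pooled connections (linear in pool size)."""
    return pool_size * bytes_per_idle_conn


DEFAULT_POOL_SIZE = 32768          # default upstreamKeepalivePoolSize from the table
ASSUMED_BYTES_PER_CONN = 4 * 1024  # assumption: 4 KiB per idle connection

total = idle_pool_memory_bytes(DEFAULT_POOL_SIZE, ASSUMED_BYTES_PER_CONN)
print(f"{total / 2**20:.0f} MiB")  # 128 MiB
```

This is the cost only when the pool is completely full. A pool that rarely fills costs proportionally less.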
Buffer Sizes
Buffer sizes directly affect memory usage and throughput for proxied data:
| Parameter | Type | Default | Description |
|---|---|---|---|
| runtimeTuning.buffer.limit | int | 65536 | Maximum buffer size in bytes |
| runtimeTuning.buffer.initialCapacity | int | 8192 | Initial buffer capacity in bytes |
The buffer limit caps the maximum amount of data buffered for a single request or response. Increase this for workloads with large payloads (file uploads, streaming responses). In the worst case, when every in-flight request fills its buffer, memory use is limit × concurrent_requests, so raising the limit increases peak memory proportionally.
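The limit × concurrent_requests relationship can be checked with quick arithmetic before changing the setting. The concurrency figure of 10,000 below is an illustrative assumption, not a measured or recommended value.

```python
def peak_buffer_memory_bytes(limit: int, concurrent_requests: int) -> int:
    """Worst-case buffer memory: every in-flight request fills its buffer to the limit."""
    return limit * concurrent_requests


CONCURRENT_REQUESTS = 10_000  # assumption: illustrative concurrency level

default_peak = peak_buffer_memory_bytes(65536, CONCURRENT_REQUESTS)
raised_peak = peak_buffer_memory_bytes(1024 * 1024, CONCURRENT_REQUESTS)

print(f"default limit (64 KiB): {default_peak / 2**20:.0f} MiB")  # 625 MiB
print(f"raised limit (1 MiB):   {raised_peak / 2**20:.0f} MiB")   # 10000 MiB
```

Raising the limit by a factor of 16 raises the worst-case figure by the same factor, which is why the limit should be sized against expected concurrency rather than the largest possible payload.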
initialCapacity sets the starting buffer size. Setting this close to the expected average payload size reduces allocation overhead by avoiding repeated buffer growth. For APIs with predictable response sizes (JSON APIs with known schemas), matching the initial capacity to the typical payload size improves throughput.
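The effect of initialCapacity on allocation overhead can be illustrated by counting how many times a buffer must grow before a payload fits. This sketch assumes the buffer doubles on each growth step and never exceeds the limit. The actual growth strategy of the data plane is not documented here, so treat the step counts as indicative only.

```python
def growth_steps(initial_capacity: int, payload_size: int, limit: int = 65536) -> int:
    """Count buffer reallocations, assuming capacity doubles until the payload fits."""
    if initial_capacity <= 0:
        raise ValueError("initial_capacity must be positive")
    capacity = initial_capacity
    target = min(payload_size, limit)  # the buffer never grows past the limit
    steps = 0
    while capacity < target:
        capacity = min(capacity * 2, limit)  # assumed doubling strategy
        steps += 1
    return steps


# A 40,000-byte payload with the default 8192-byte initial capacity:
print(growth_steps(8192, 40_000))    # 3  (8192 -> 16384 -> 32768 -> 65536)
# The same payload with initialCapacity sized close to the payload:
print(growth_steps(40_960, 40_000))  # 0
```

Under these assumptions, matching the initial capacity to the typical payload removes all three reallocations, at the cost of a larger up-front allocation for every request.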
Timeouts
Timeout configuration prevents resource leaks and ensures the proxy does not hold connections indefinitely:
| Parameter | Type | Default | Description |
|---|---|---|---|
| runtimeTuning.timeout.requestHeader | duration | "10s" | Maximum time to receive request headers |
| runtimeTuning.timeout.requestBody | duration | "30s" | Maximum time to receive the request body |
| runtimeTuning.timeout.responseHeader | duration | "60s" | Maximum time to receive upstream response headers |
| runtimeTuning.timeout.responseBody | duration | "0s" | Maximum time to receive the full upstream response (0 = no limit) |
| runtimeTuning.timeout.idleConnection | duration | "300s" | Maximum idle time before closing a client connection |
Setting responseBody to 0 disables the body timeout, which is appropriate for long-lived connections such as WebSocket or SSE streams. For standard HTTP APIs, set this to a reasonable value to prevent stalled upstream connections from consuming proxy resources.
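One way to reason about these values is to add up how long a single request could occupy proxy resources. The sketch below assumes the four phases run one after another and that each is bounded only by its own timer, which may not match the data plane's actual behaviour. The helper names and the accepted duration units (`ms`, `s`, `m`, `h`) are also assumptions.

```python
import re
from typing import Dict, Optional

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Parse strings such as "10s" or "10m" into seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def worst_case_request_seconds(timeouts: Dict[str, str]) -> Optional[float]:
    """Sum the per-phase timeouts; None means the request lifetime is unbounded."""
    if parse_duration(timeouts["responseBody"]) == 0:
        return None  # documented above: responseBody of 0 disables the limit
    phases = ("requestHeader", "requestBody", "responseHeader", "responseBody")
    return sum(parse_duration(timeouts[name]) for name in phases)


defaults = {"requestHeader": "10s", "requestBody": "30s",
            "responseHeader": "60s", "responseBody": "0s"}
print(worst_case_request_seconds(defaults))                              # None
print(worst_case_request_seconds({**defaults, "responseBody": "30s"}))  # 130.0
```

With the defaults the bound is undefined because the body timeout is disabled. Setting responseBody to "30s" caps a single request at roughly 130 seconds under these assumptions.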
TLS Tuning
TLS configuration affects both handshake latency and CPU consumption:
| Parameter | Type | Default | Description |
|---|---|---|---|
| runtimeTuning.tls.sessionCacheSize | int | 2048 | TLS session cache entries for resumption |
| runtimeTuning.tls.sessionCacheLifetime | duration | "10m" | Lifetime of cached TLS sessions |
TLS session caching enables session resumption, which reduces the CPU cost of repeated handshakes by reusing previously negotiated parameters. This is particularly beneficial when many clients establish short-lived connections to the same backend services.
Increase sessionCacheSize when the data plane connects to a large number of distinct backends that support session resumption. Each cache entry consumes a small amount of memory.
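A rough way to size the cache is to compare it with the rate at which new sessions are created. The sketch assumes each full handshake adds one entry that stays cached for the whole lifetime, so the steady-state entry count is approximately rate × lifetime. How the data plane actually evicts entries is not described here, and the rate of 50 sessions per second is an illustrative figure.

```python
import math


def max_new_sessions_per_second(cache_size: int, lifetime_seconds: float) -> float:
    """Highest new-session rate the cache can hold without evicting live entries."""
    return cache_size / lifetime_seconds


def required_cache_size(new_sessions_per_second: float, lifetime_seconds: float) -> int:
    """Entries needed so sessions created at the given rate survive their lifetime."""
    return math.ceil(new_sessions_per_second * lifetime_seconds)


# Defaults from the table: 2048 entries, 10-minute lifetime.
print(f"{max_new_sessions_per_second(2048, 600):.1f} sessions/s")  # 3.4 sessions/s
# Assumed workload: 50 new sessions per second with the same lifetime.
print(required_cache_size(50, 600))                                # 30000
```

If the observed rate of new sessions is well above the first figure, entries are likely being displaced before they can be reused, which is a signal to raise sessionCacheSize or shorten the lifetime.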
Profiling and Monitoring
Before adjusting tuning parameters, establish a performance baseline using the available metrics:
- Latency distribution — track p50, p95, and p99 response times via the data plane metrics endpoint
- Error rates — monitor HTTP 5xx responses and connection errors
- Resource usage — observe CPU and memory consumption at current traffic levels
- Connection pool efficiency — track upstream_keepalive_pool_hits vs upstream_keepalive_pool_misses
Make parameter changes incrementally, measuring the impact on each metric before proceeding to the next adjustment. Document the baseline and each change for future reference.
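For the connection pool metrics, a hit ratio is often easier to compare across changes than the two raw counts. The helper below is a minimal sketch. It assumes both metrics are plain counts over the same time window, so if they are exposed as cumulative counters, pass the difference between two scrapes. The sample numbers are made up for illustration.

```python
from typing import Optional


def pool_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Fraction of upstream requests served from a pooled connection."""
    total = hits + misses
    if total == 0:
        return None  # no traffic in the window, so the ratio is undefined
    return hits / total


# Illustrative values for upstream_keepalive_pool_hits / _misses over one window.
baseline = pool_hit_ratio(hits=9_500, misses=500)
after_change = pool_hit_ratio(hits=9_900, misses=100)

print(f"baseline:     {baseline:.2%}")      # 95.00%
print(f"after change: {after_change:.2%}")  # 99.00%
```

Recording this ratio alongside the latency percentiles for each adjustment makes it easier to attribute an improvement to a specific parameter change.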
Next Steps
- Observability — configure metrics, logs, and traces for monitoring
- Metrics Reference — detailed metric descriptions
- Production Deployment — production resource sizing guidance