Performance Tuning

This page covers configuration options that affect throughput, latency, and resource usage in the data plane. The default values are appropriate for moderate traffic volumes (hundreds to low thousands of requests per second). For higher-throughput scenarios or resource-constrained environments, the parameters described here let you trade off performance, memory consumption, and operational safety.

Threading and Concurrency

The data plane uses an async runtime built on Tokio. The following settings control how work is distributed across threads:

Parameter	Type	Default	Description
`runtimeTuning.httpCapacity.workerThreads`	int	`0`	Worker thread count (0 = CPU count)
`runtimeTuning.httpCapacity.acceptConcurrency`	int	`16`	Concurrent TCP accept operations

Worker Threads

The default value of 0 causes the runtime to create one worker thread per available CPU core. This configuration is optimal for CPU-bound workloads where the proxy performs significant processing: TLS termination, header manipulation, and Wasm filter execution all benefit from parallel CPU access.

For I/O-bound workloads where most time is spent waiting for upstream responses, reducing the worker count frees CPU resources for other processes. Set this to a value lower than the CPU count when the proxy primarily forwards requests with minimal transformation.
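
As a rough illustration of the "0 = CPU count" rule, the sketch below resolves a configured value to an effective thread count. The function name is hypothetical, rejecting negative values is an assumption, and how the runtime itself detects CPUs is not specified on this page.

```python
import os
from typing import Optional


def effective_worker_threads(configured: int, cpu_count: Optional[int] = None) -> int:
    """Resolve a workerThreads setting to a thread count.

    0 means "one worker per CPU", as described in the parameter table.
    Treating negative values as invalid is an assumption of this sketch.
    """
    if configured < 0:
        raise ValueError("workerThreads must be 0 or a positive integer")
    if cpu_count is None:
        # os.cpu_count() can return None; fall back to a single worker.
        cpu_count = os.cpu_count() or 1
    return cpu_count if configured == 0 else configured


# On a hypothetical 8-core host:
print(effective_worker_threads(0, cpu_count=8))  # 8
print(effective_worker_threads(2, cpu_count=8))  # 2
```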

Accept Concurrency

acceptConcurrency controls how many threads can simultaneously invoke accept() on the listener socket. The default of 16 handles connection bursts at moderate traffic volumes. Increase this value for workloads with very high connection rates (tens of thousands of new connections per second). Decrease it if profiling reveals contention around accept().

Connection Pools

Reusing connections to upstream services avoids the overhead of TCP handshakes and TLS negotiation on every request. Connection pooling is critical for latency-sensitive workloads.

Parameter	Type	Default	Description
`runtimeTuning.httpCapacity.upstreamKeepalivePoolSize`	int	`32768`	Maximum idle connections in the upstream pool

The upstream keepalive pool maintains idle connections to backend services. When the data plane needs to forward a request, it checks the pool for an existing connection before establishing a new one. Each idle connection consumes a small amount of memory (several kilobytes for socket buffers), so the memory cost scales linearly with pool size.
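
The linear scaling described above can be estimated with a one-line calculation. This is a back-of-the-envelope sketch: the 4 KiB per-connection figure is an assumed stand-in for "several kilobytes", not a measured value, and the function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def idle_pool_memory_bytes(pool_size: int, bytes_per_idle_conn: int) -> int:
    """Upper bound on memory held by idle upstream connections.

    Assumes the pool is completely full and every idle connection
    costs the same amount of memory.
    """
    return pool_size * bytes_per_idle_conn


default_pool_size = 32768      # documented default for upstreamKeepalivePoolSize
assumed_conn_cost = 4 * KIB    # assumption: "several kilobytes" taken as 4 KiB

total = idle_pool_memory_bytes(default_pool_size, assumed_conn_cost)
print(f"{total / MIB:.0f} MiB")  # 128 MiB
```

The bound is only reached when the pool is full; a lightly loaded proxy holds far fewer idle connections.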

Buffer Sizes

Buffer sizes directly affect memory usage and throughput for proxied data:

Parameter	Type	Default	Description
`runtimeTuning.buffer.limit`	int	`65536`	Maximum buffer size in bytes
`runtimeTuning.buffer.initialCapacity`	int	`8192`	Initial buffer capacity in bytes

The buffer limit caps the amount of data buffered for a single request or response. Increase it for workloads with large payloads (file uploads, streaming responses). In the worst case, buffer memory approaches limit × concurrent_requests, so raising the limit raises peak memory proportionally.
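
To make the limit × concurrent_requests relationship concrete, the sketch below compares the default limit with a hypothetical fourfold increase. The figure of 10,000 concurrent requests is an arbitrary example, and the result is a worst case in which every request fills its buffer.

```python
MIB = 1024 * 1024


def worst_case_buffer_bytes(limit: int, concurrent_requests: int) -> int:
    """Peak buffer memory if every in-flight request fills its buffer."""
    return limit * concurrent_requests


# 65536 is the documented default; 262144 is a hypothetical raised limit.
for limit in (65536, 262144):
    total = worst_case_buffer_bytes(limit, 10_000)
    print(f"limit={limit}: {total / MIB:.0f} MiB")
# limit=65536: 625 MiB
# limit=262144: 2500 MiB
```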

initialCapacity sets the starting buffer size. Setting it close to the expected typical payload size avoids repeated buffer growth and the allocations that come with it. This helps most for APIs with predictable response sizes, such as JSON APIs with known schemas.
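
The effect of the initial capacity on allocation overhead can be sketched by counting how often a buffer has to grow before it holds a payload. The doubling strategy below is an assumption made for illustration; this page does not state how the data plane actually grows its buffers.

```python
def growth_steps(initial_capacity: int, payload_size: int, limit: int = 65536) -> int:
    """Count buffer growth steps needed to hold payload_size bytes.

    Assumes the buffer doubles on each growth (hypothetical strategy)
    and never exceeds the configured limit.
    """
    if initial_capacity <= 0:
        raise ValueError("initial_capacity must be positive")
    target = min(payload_size, limit)
    capacity = initial_capacity
    steps = 0
    while capacity < target:
        capacity = min(capacity * 2, limit)
        steps += 1
    return steps


# A 20,000-byte payload with the default vs. a larger initial capacity:
print(growth_steps(8192, 20_000))   # 2
print(growth_steps(32768, 20_000))  # 0
```

Under this assumption, a larger initial capacity trades a higher per-request memory floor for fewer reallocations.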

Timeouts

Timeout configuration prevents resource leaks and ensures the proxy does not hold connections indefinitely:

Parameter	Type	Default	Description
`runtimeTuning.timeout.requestHeader`	duration	`"10s"`	Maximum time to receive request headers
`runtimeTuning.timeout.requestBody`	duration	`"30s"`	Maximum time to receive the request body
`runtimeTuning.timeout.responseHeader`	duration	`"60s"`	Maximum time to receive upstream response headers
`runtimeTuning.timeout.responseBody`	duration	`"0s"`	Maximum time to receive the full upstream response (0 = no limit)
`runtimeTuning.timeout.idleConnection`	duration	`"300s"`	Maximum idle time before closing a client connection

Setting responseBody to 0 disables the body timeout, which is appropriate for long-lived connections such as WebSocket or SSE streams. For standard HTTP APIs, set a finite value somewhat above the slowest response you expect, so that stalled upstream connections do not hold proxy resources indefinitely.
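
A configuration check along the following lines can flag timeouts that are effectively disabled. It is a minimal sketch: the accepted unit suffixes (ms, s, m, h) are an assumption based on the duration values shown on this page, and both function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumed unit suffixes; the page only shows "s" and "m" in practice.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION = re.compile(r"^(\d+)(ms|s|m|h)$")


def parse_duration(text: str) -> float:
    """Convert a duration string such as "30s" or "10m" to seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def unlimited_timeouts(timeouts: Dict[str, str]) -> List[str]:
    """Return the names of timeouts whose value is zero (no limit)."""
    return [name for name, value in timeouts.items() if parse_duration(value) == 0]


defaults = {
    "requestHeader": "10s",
    "requestBody": "30s",
    "responseHeader": "60s",
    "responseBody": "0s",
    "idleConnection": "300s",
}
print(unlimited_timeouts(defaults))  # ['responseBody']
```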

TLS Tuning

TLS configuration affects both handshake latency and CPU consumption:

Parameter	Type	Default	Description
`runtimeTuning.tls.sessionCacheSize`	int	`2048`	TLS session cache entries for resumption
`runtimeTuning.tls.sessionCacheLifetime`	duration	`"10m"`	Lifetime of cached TLS sessions

TLS session caching enables session resumption, which reduces the CPU cost of repeated handshakes by reusing previously negotiated parameters. This is particularly beneficial when the data plane repeatedly establishes short-lived TLS connections to the same backend services.

Increase sessionCacheSize when the data plane connects to a large number of distinct backends that support session resumption. Each cache entry consumes a small amount of memory.

Profiling and Monitoring

Before adjusting tuning parameters, establish a performance baseline using the available metrics:

Latency distribution — track p50, p95, and p99 response times via the data plane metrics endpoint
Error rates — monitor HTTP 5xx responses and connection errors
Resource usage — observe CPU and memory consumption at current traffic levels
Connection pool efficiency — track upstream_keepalive_pool_hits vs upstream_keepalive_pool_misses

Make parameter changes incrementally, measuring the impact on each metric before proceeding to the next adjustment. Document the baseline and each change for future reference.
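
Connection pool efficiency can be summarized as a hit ratio computed from the two pool counters. The sketch below assumes both metrics are cumulative counters, so it works on the difference between two samples; the dictionary keys and function names are hypothetical.

```python
from typing import Dict


def pool_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of upstream requests served from a pooled connection."""
    total = hits + misses
    return hits / total if total else 0.0


def interval_hit_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Hit ratio over the interval between two counter samples.

    Assumes upstream_keepalive_pool_hits and upstream_keepalive_pool_misses
    only ever increase (cumulative counters).
    """
    hits = after["hits"] - before["hits"]
    misses = after["misses"] - before["misses"]
    return pool_hit_ratio(hits, misses)


# Two hypothetical samples taken before and after a tuning change:
baseline = {"hits": 1000, "misses": 200}
current = {"hits": 1950, "misses": 250}
print(f"{interval_hit_ratio(baseline, current):.2%}")  # 95.00%
```

Comparing the ratio before and after a pool size change gives a direct measure of whether the change helped.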

Next Steps

Observability — configure metrics, logs, and traces for monitoring
Metrics Reference — detailed metric descriptions
Production Deployment — production resource sizing guidance