Performance Tuning

This page covers configuration options that affect throughput, latency, and resource usage in the data plane. The default values are appropriate for moderate traffic volumes (hundreds to low thousands of requests per second). For higher-throughput scenarios or resource-constrained environments, the parameters described here let you trade off performance, memory consumption, and operational safety.

The data plane uses an async runtime built on Tokio. The following settings control how work is distributed across threads:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| runtimeTuning.httpCapacity.workerThreads | int | 0 | Worker thread count (0 = CPU count) |
| runtimeTuning.httpCapacity.acceptConcurrency | int | 16 | Concurrent TCP accept operations |

The default value of 0 causes the runtime to create one worker thread per available CPU core. This configuration is optimal for CPU-bound workloads where the proxy performs significant processing: TLS termination, header manipulation, and Wasm filter execution all benefit from parallel CPU access.

For I/O-bound workloads where most time is spent waiting for upstream responses, reducing the worker count frees CPU resources for other processes. Set workerThreads to a value lower than the CPU count when the proxy primarily forwards requests with minimal transformation.
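
As a minimal sketch of the workerThreads semantics described above, the following Python function resolves a configured value to an effective thread count. The function name and the fallback to a single thread when the CPU count cannot be detected are assumptions made for illustration; the data plane's actual resolution logic is not shown in this page.

```python
import os
from typing import Optional


def effective_worker_threads(configured: int, cpu_count: Optional[int] = None) -> int:
    """Resolve a workerThreads setting: 0 means one worker per available CPU.

    Hypothetical helper for illustration only.
    """
    if configured < 0:
        raise ValueError("workerThreads must be >= 0")
    if configured == 0:
        # Assumption: fall back to 1 if the CPU count cannot be detected.
        detected = cpu_count if cpu_count is not None else os.cpu_count()
        return detected or 1
    return configured


# On an 8-core host: the default uses all cores, an explicit value is used as-is.
default_threads = effective_worker_threads(0, cpu_count=8)
reduced_threads = effective_worker_threads(4, cpu_count=8)
```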

acceptConcurrency controls how many threads can simultaneously invoke accept() on the listener socket. The default of 16 handles connection bursts effectively. Increase it for workloads with very high connection rates (tens of thousands of new connections per second); decrease it if profiling reveals accept-related contention.

Reusing connections to upstream services avoids the overhead of TCP handshakes and TLS negotiation on every request. Connection pooling is critical for latency-sensitive workloads.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| runtimeTuning.httpCapacity.upstreamKeepalivePoolSize | int | 32768 | Maximum idle connections in the upstream pool |

The upstream keepalive pool maintains idle connections to backend services. When the data plane needs to forward a request, it checks the pool for an existing connection before establishing a new one. Each idle connection consumes a small amount of memory (several kilobytes for socket buffers), so the memory cost scales linearly with pool size.
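
The check-then-connect behavior can be sketched in Python as follows. This is an illustrative model, not the data plane's implementation: the class name, the per-backend grouping, and the choice to reject a returned connection once the pool is full are all assumptions.

```python
from collections import defaultdict, deque


class KeepalivePool:
    """Toy model of a bounded pool of idle upstream connections (hypothetical)."""

    def __init__(self, max_idle: int):
        self.max_idle = max_idle          # plays the role of upstreamKeepalivePoolSize
        self.idle = defaultdict(deque)    # backend -> idle connections
        self.idle_count = 0
        self.hits = 0
        self.misses = 0

    def checkout(self, backend, connect):
        """Reuse an idle connection if one exists, otherwise open a new one."""
        conns = self.idle[backend]
        if conns:
            self.idle_count -= 1
            self.hits += 1
            return conns.pop()
        self.misses += 1
        return connect(backend)

    def checkin(self, backend, conn) -> bool:
        """Return True if the connection was kept idle, False if the pool is full."""
        if self.idle_count >= self.max_idle:
            return False  # caller should close the connection
        self.idle[backend].append(conn)
        self.idle_count += 1
        return True


pool = KeepalivePool(max_idle=1)
first = pool.checkout("backend-a", connect=lambda b: object())   # miss: new connection
kept = pool.checkin("backend-a", first)                          # stored as idle
second = pool.checkout("backend-a", connect=lambda b: object())  # hit: reused
```

Because memory scales linearly with pool size, a full pool costs roughly the pool size multiplied by the per-connection footprint. Under an assumed footprint of 4 KiB per idle connection, a full default pool of 32768 connections would hold about 128 MiB.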

Buffer sizes directly affect memory usage and throughput for proxied data:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| runtimeTuning.buffer.limit | int | 65536 | Maximum buffer size in bytes |
| runtimeTuning.buffer.initialCapacity | int | 8192 | Initial buffer capacity in bytes |

The buffer limit caps the amount of data buffered for a single request or response. Increase it for workloads with large payloads (file uploads, streaming responses). Worst-case buffer memory is roughly limit × concurrent_requests, so raising the limit raises peak memory proportionally.
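
To make the proportionality concrete, here is a small Python calculation. The concurrency figure of 10,000 requests and the raised limit of 1 MiB are illustrative assumptions, not recommendations, and the model assumes one full buffer per in-flight request.

```python
def peak_buffer_bytes(limit: int, concurrent_requests: int) -> int:
    """Worst-case buffer memory if every in-flight request fills its buffer."""
    return limit * concurrent_requests


MIB = 2**20
CONCURRENT_REQUESTS = 10_000  # assumed load, for illustration only

default_peak = peak_buffer_bytes(65536, CONCURRENT_REQUESTS)      # default limit
raised_peak = peak_buffer_bytes(1_048_576, CONCURRENT_REQUESTS)   # hypothetical 1 MiB limit

print(default_peak / MIB)  # 625.0
print(raised_peak / MIB)   # 10000.0
```

In this example, raising the limit by a factor of 16 raises the worst-case footprint by the same factor, from 625 MiB to about 9.8 GiB.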

initialCapacity sets the starting buffer size. Setting this close to the expected average payload size reduces allocation overhead by avoiding repeated buffer growth. For APIs with predictable response sizes (JSON APIs with known schemas), matching the initial capacity to the typical payload size improves throughput.
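
The effect of initialCapacity on allocation overhead can be approximated by counting how often a buffer must grow before it fits a payload. The Python sketch below assumes a capacity-doubling growth strategy capped at the buffer limit; the actual growth policy of the data plane is not documented here, so treat the numbers as illustrative.

```python
def growth_steps(initial_capacity: int, payload_size: int, limit: int) -> int:
    """Count reallocations needed to fit a payload, assuming capacity doubles."""
    if initial_capacity <= 0:
        raise ValueError("initial_capacity must be positive")
    needed = min(payload_size, limit)  # never buffer more than the limit
    capacity = initial_capacity
    steps = 0
    while capacity < needed:
        capacity = min(capacity * 2, limit)
        steps += 1
    return steps


# A 50,000-byte payload with the default 8192-byte initial capacity:
default_steps = growth_steps(8192, 50_000, 65536)   # 8192 -> 16384 -> 32768 -> 65536
# The same payload when the initial capacity already covers it:
tuned_steps = growth_steps(65536, 50_000, 65536)
print(default_steps, tuned_steps)  # 3 0
```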

Timeout configuration prevents resource leaks and ensures the proxy does not hold connections indefinitely:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| runtimeTuning.timeout.requestHeader | duration | "10s" | Maximum time to receive request headers |
| runtimeTuning.timeout.requestBody | duration | "30s" | Maximum time to receive the request body |
| runtimeTuning.timeout.responseHeader | duration | "60s" | Maximum time to receive upstream response headers |
| runtimeTuning.timeout.responseBody | duration | "0s" | Maximum time to receive the full upstream response (0 = no limit) |
| runtimeTuning.timeout.idleConnection | duration | "300s" | Maximum idle time before closing a client connection |

Setting responseBody to "0s" disables the body timeout, which is appropriate for long-lived streams such as WebSocket or SSE. For standard HTTP APIs, set a finite value so that stalled upstream connections cannot hold proxy resources indefinitely.
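
One way to reason about these settings is to compute the longest time a single request could occupy the proxy. The Python sketch below is a simplified model: it assumes the four phases run strictly one after another (real proxies may overlap them) and that durations use the suffixes ms, s, m, or h. The parser and function names are hypothetical.

```python
import re
from typing import Dict, Optional

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Parse strings such as "10s" or "10m" into seconds (assumed grammar)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def max_request_lifetime(timeouts: Dict[str, str]) -> Optional[float]:
    """Sum the per-phase timeouts; None means the request is unbounded."""
    total = 0.0
    for phase in ("requestHeader", "requestBody", "responseHeader", "responseBody"):
        seconds = parse_duration(timeouts[phase])
        if phase == "responseBody" and seconds == 0:
            return None  # 0 disables the response body timeout
        total += seconds
    return total


defaults = {"requestHeader": "10s", "requestBody": "30s",
            "responseHeader": "60s", "responseBody": "0s"}
bounded = dict(defaults, responseBody="120s")  # hypothetical finite value

print(max_request_lifetime(defaults))  # None
print(max_request_lifetime(bounded))   # 220.0
```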

TLS configuration affects both handshake latency and CPU consumption:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| runtimeTuning.tls.sessionCacheSize | int | 2048 | TLS session cache entries for resumption |
| runtimeTuning.tls.sessionCacheLifetime | duration | "10m" | Lifetime of cached TLS sessions |

TLS session caching enables session resumption, which reduces the CPU cost of repeated handshakes by reusing previously negotiated parameters. This is particularly beneficial when many clients establish short-lived connections to the same backend services.

Increase sessionCacheSize when the data plane connects to a large number of distinct backends that support session resumption. Each cache entry consumes a small amount of memory.
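
The interaction between the two settings can be modeled as a bounded cache whose entries expire. The Python sketch below is a conceptual illustration only: the class, the keying by backend name, and the oldest-first eviction policy are assumptions, and the data plane's real cache may behave differently.

```python
from collections import OrderedDict


class SessionCache:
    """Toy bounded cache with per-entry expiry (hypothetical model)."""

    def __init__(self, max_entries: int, lifetime_seconds: float):
        self.max_entries = max_entries    # plays the role of sessionCacheSize
        self.lifetime = lifetime_seconds  # plays the role of sessionCacheLifetime
        self._entries = OrderedDict()     # key -> (stored_at, session)

    def put(self, key, session, now: float) -> None:
        self._entries.pop(key, None)
        self._entries[key] = (now, session)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # assumption: evict the oldest entry

    def get(self, key, now: float):
        item = self._entries.get(key)
        if item is None:
            return None  # never cached or already evicted: full handshake needed
        stored_at, session = item
        if now - stored_at >= self.lifetime:
            del self._entries[key]
            return None  # expired: full handshake needed
        return session


cache = SessionCache(max_entries=2, lifetime_seconds=600)
cache.put("backend-a", "session-a", now=0)
cache.put("backend-b", "session-b", now=1)
cache.put("backend-c", "session-c", now=2)  # exceeds capacity, evicts backend-a
```

With more distinct backends than cache entries, sessions are evicted before they can be reused, which is the situation where a larger sessionCacheSize helps.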

Before adjusting tuning parameters, establish a performance baseline using the available metrics:

  1. Latency distribution — track p50, p95, and p99 response times via the data plane metrics endpoint
  2. Error rates — monitor HTTP 5xx responses and connection errors
  3. Resource usage — observe CPU and memory consumption at current traffic levels
  4. Connection pool efficiency — track upstream_keepalive_pool_hits vs upstream_keepalive_pool_misses
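
A baseline for items 1 and 4 can be derived from raw samples and counters. The Python sketch below assumes you have already collected latency samples and the two pool counters by some means; the helper names are hypothetical, and the percentile uses the nearest-rank method, which may differ from how the metrics endpoint aggregates values.

```python
import math
from typing import Sequence


def pool_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of upstream requests served from an idle pooled connection."""
    total = hits + misses
    return hits / total if total else 0.0


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in the range 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative data: latencies of 1..100 ms and made-up counter values.
latencies_ms = list(range(1, 101))
baseline = {
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
    "pool_hit_ratio": pool_hit_ratio(hits=900, misses=100),
}
```

A low hit ratio suggests that connections are being closed or evicted before reuse, which is a signal to revisit the keepalive pool size.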

Make parameter changes incrementally, measuring the impact on each metric before proceeding to the next adjustment. Document the baseline and each change for future reference.