Understanding and Managing Collector Overhead

Epoch's collectors are designed to minimize overhead while collecting real-time metrics. This guide explains the techniques and tradeoffs for managing that overhead.

Functional View of Overhead

The following is the functional view of overhead in Epoch Collectors:

  • Packet capture is achieved with the rpcap or sslsplit component. It has minimal CPU overhead and must be co-located with the host. Typical overhead is ~1% of CPU runtime on average and 5-10 MB of RAM.
  • Stream processing incurs the highest CPU and memory overhead (when doing layer7 analysis) but can be offloaded to another host. For layer7 protocol analysis, typical overhead is 3-8% of CPU runtime on average and 200-600 MB of RAM. For layer4 protocol analysis, typical overhead is less than 0.25% of CPU and less than 50 MB of RAM.
  • Infrastructure metrics collection has a predictable, recurring overhead of about 1% of CPU runtime and ~120 MB of RAM. It must run co-located with the host.

Default Overhead

With the default collector settings (layer4 protocol analysis mode), the typical overhead is 1-2% of CPU time and 150-200 MB of RAM.

With the layer7 protocol analysis settings, all the functional overhead is on the host, with no sampling or resource limits applied. Typical workloads incur 5-10% of CPU time on average for the Collectors; the actual overhead depends on the throughput of network transactions being processed. The typical memory overhead is 300-700 MB.

The outgoing network bandwidth is 5-20 KB/s. The exact number depends on the number of unique series.
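
To see how a specific deployment compares against these figures, you can check the collector process directly. The commands below are a minimal sketch; the process and container names are placeholders for whatever your collector actually runs as.

# CPU and resident memory of the collector process (name is a placeholder):
ps -C epoch-collector -o pid,%cpu,%mem,rss,comm

# For a containerized collector, docker stats shows live CPU and memory usage:
docker stats --no-stream epoch-collector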

Controlling Overhead

The following techniques for controlling overhead can be used alone or in conjunction with one another.

Sampling Rate

This parameter controls the sampling of network traffic at the TCP flow level for layer7 analysis; it does not affect layer4 analysis. The sampling rate is specified as a percentage of total traffic (1-100), so a sampling rate of 50 samples 50% of the flows. By default, sampling is turned off. The resulting metrics are normalized based on the sampling rate. This parameter can significantly reduce CPU overhead, but the tradeoff is lower fidelity of protocol metrics. For more information, see Sampling Rate.
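
A rough sketch of how that normalization works (this only illustrates the arithmetic; the numbers are hypothetical): at a sampling rate of 25, each observed flow stands in for four, so observed counts are scaled by 100/25.

# Hypothetical numbers: 1,200 transactions observed at a 25% sampling rate.
# The normalized estimate is observed * 100 / rate.
observed=1200
rate=25
echo $(( observed * 100 / rate ))    # prints 4800, the estimated total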

Nice Value

Run the collector under the standard Unix/Linux nice command with the specified value. nice ensures that higher-priority processes get a larger share of CPU time than lower-priority ones, so running the collector at a higher nice value (lower priority) lets it yield the CPU to your workloads under contention. For more information, see Nice Value.
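
For example, to start the collector at a lower scheduling priority (epoch-collector is a placeholder for your actual collector command):

# Launch the collector with a nice value of 10 (lower priority than the default of 0):
nice -n 10 epoch-collector

# Or lower the priority of an already-running collector process by PID:
renice -n 10 -p <collector-pid>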

OS Resource Limits

Overhead can also be limited using OS resource limits. The following examples show how to limit resources in some common environments. The Docker examples cap the collector at 50% of a single CPU and 1024 MB of memory; the Kubernetes example requests half a CPU and caps the collector at one full CPU and 1024 MiB of memory.

Docker

Before running your collector container, set the following parameters.

--cpu-period=100000 --cpu-quota=50000 --memory=1024m

If you use Docker 1.13 or higher, use --cpus instead.

--cpus=0.5
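
Putting the flags together, a full invocation might look like the following sketch (the image name is a placeholder for your actual collector image):

docker run -d \
  --cpus=0.5 \
  --memory=1024m \
  epoch/collector:latest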

Kubernetes

Before deploying your collector, set the following resources block on the collector container in your manifest.

resources:
  requests:
    memory: "512Mi"
    cpu: "0.5"
  limits:
    memory: "1024Mi"
    cpu: "1.0"
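
The same limits can also be applied without editing the manifest by hand, using kubectl set resources (the deployment name epoch-collector is a placeholder):

kubectl set resources deployment epoch-collector \
  --requests=cpu=0.5,memory=512Mi \
  --limits=cpu=1.0,memory=1024Mi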

Remote Stream Processing

The CPU and memory overhead on the host is much smaller if stream processing is done outside the host where the collector is running. The outgoing bandwidth is higher in this mode but does not incur any overhead if the stream processor(s) are running in the same VPC as the collectors. For more information, see standalone stream processor.