Infrastructure Integration
Configuration
- Configure the agent by editing /etc/nutanix/epoch-dd-agent/conf.d/process.yaml in the collectors.
Example:
init_config:
  # The check refreshes the matching pid list every X seconds
  # unless it detects a change before then. You might want to set it
  # low if you want to alert on process service checks.
  # pid_cache_duration: 120
  #
  # Used to override the default procfs path, e.g. for docker containers with the outside fs mounted at /host/proc
  # procfs_path: /proc

instances:
  # The `system.processes.cpu.pct` metric sent by this check is only accurate for processes that live
  # for more than 30 seconds. Do not expect its value to be accurate for shorter-lived processes.
  #
  # One and only one of search_string, pid or pid_file must be specified.
  # - name: (required) STRING. Uniquely identifies your metrics, which are tagged with this name.
  #   search_string: LIST OF STRINGS. If one of the elements in the list matches,
  #                  return the count of all the processes that contain the string.
  #   pid: STRING. A process ID.
  #   pid_file: STRING. A pid file.
  #   exact_match: (optional) BOOLEAN. Defaults to True. To look for an arbitrary
  #                string, use search_string with exact_match: False.
  #   ignore_denied_access: (optional) BOOLEAN. Defaults to True. When getting the number of file descriptors, the dd-agent user might
  #                         be denied access. Set this to True to not issue a warning if that happens.
  #   thresholds: (optional) Two ranges: critical and warning.
  #     warning: (optional) List of two values. If the number of processes found is below the first value or
  #              above the second one, the process check returns WARNING.
  #     critical: (optional) List of two values. If the number of processes found is below the first value or
  #               above the second one, the process check returns CRITICAL.
  #     For example, with warning: [3, 5] and critical: [1, 7], the process check returns OK for 3 to 5 processes,
  #     WARNING for 1, 2, 6 or 7 processes, and CRITICAL below 1 or above 7.
  #     CRITICAL always takes precedence when the ranges overlap.
  #   collect_children: BOOLEAN. If True, the check also collects metrics from all child processes of a matched process. Defaults to False.
  #                     Be aware that the collection is recursive and might take some time depending on the use case.
  #
  # Examples:
  #
  # - name: ssh
  #   search_string: ['ssh', 'sshd']
  #   tags:
  #     - env:staging
  #     - cluster:big-data
  #   thresholds:
  #     # critical if no sshd or more than 7 sshd are running
  #     critical: [1, 7]
  #     # warning if 1, 2, 6, 7 sshd processes are running
  #     warning: [3, 5]
  #     # ok if 3, 4, 5 processes are running
  #
  # - name: postgres
  #   search_string: ['postgres']
  #   ignore_denied_access: True
  #
  # - name: nodeserver
  #   search_string: ['node server.js']
  #
  # - name: pid_process
  #   pid: 1278
  #   # Do not use search_string when searching by pid, or multiple processes will be grabbed
  #
  # - name: pid_file
  #   pid_file: /var/run/sshd.pid
- Check that all YAML files are valid with the following command:
  /etc/init.d/epoch-collectors configcheck
- Restart the Agent using the following command:
  /etc/init.d/epoch-collectors restart
- Execute the info command to verify that the integration check has passed:
  /etc/init.d/epoch-collectors info
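The two rules in the sample configuration that are easiest to get wrong are the "one and only one of search_string, pid or pid_file" constraint and the threshold ranges. The following Python sketch illustrates both as described in the comments above. It is not the agent's actual code: the function names `validate_instance`, `outside` and `status_for_count` are hypothetical, and the instance is written as a plain dict standing in for one parsed `instances` entry.

```python
# Illustrative sketch only; names are hypothetical, not the agent's implementation.

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"
SELECTORS = ("search_string", "pid", "pid_file")


def validate_instance(instance):
    """Return a list of problems found in one `instances` entry (a dict)."""
    problems = []
    if not instance.get("name"):
        problems.append("name is required")
    given = [key for key in SELECTORS if key in instance]
    if len(given) != 1:
        problems.append(
            "exactly one of search_string, pid or pid_file must be set, "
            "got %d" % len(given)
        )
    return problems


def outside(count, bounds):
    """True if count is below the first bound or above the second one."""
    low, high = bounds
    return count < low or count > high


def status_for_count(count, thresholds=None):
    """Map a process count to a status; CRITICAL wins when ranges overlap."""
    thresholds = thresholds or {}
    critical = thresholds.get("critical")
    warning = thresholds.get("warning")
    if critical is not None and outside(count, critical):
        return CRITICAL
    if warning is not None and outside(count, warning):
        return WARNING
    return OK


# The ssh example from the sample configuration, as a dict.
ssh = {
    "name": "ssh",
    "search_string": ["ssh", "sshd"],
    "thresholds": {"critical": [1, 7], "warning": [3, 5]},
}

if __name__ == "__main__":
    print(validate_instance(ssh))
    for n in range(9):
        print(n, status_for_count(n, ssh["thresholds"]))
```

Checking the critical range first is what makes CRITICAL dominant: a count of 0 is outside both ranges, but it is reported as CRITICAL and never as WARNING.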
Infrastructure Datasources
Datasource | Available Aggregations | Unit | Description |
---|---|---|---|
system.processes.cpu.pct | avg, max, min, sum | percent | The process CPU utilization. |
system.processes.involuntary_ctx_switches | avg, max, min, sum | event | The number of involuntary context switches performed by this process. |
system.processes.ioread_bytes | avg, max, min, sum | byte | The number of bytes read from disk by this process. |
system.processes.ioread_count | avg, max, min, sum | read | The number of disk reads by this process. |
system.processes.iowrite_bytes | avg, max, min, sum | byte | The number of bytes written to disk by this process. |
system.processes.iowrite_count | avg, max, min, sum | write | The number of disk writes by this process. |
system.processes.mem.page_faults.minor_faults | avg, max, min, sum | occurrence/second | The number of minor page faults per second for this process. |
system.processes.mem.page_faults.children_minor_faults | avg, max, min, sum | occurrence/second | The number of minor page faults per second for children of this process. |
system.processes.mem.page_faults.major_faults | avg, max, min, sum | occurrence/second | The number of major page faults per second for this process. |
system.processes.mem.page_faults.children_major_faults | avg, max, min, sum | occurrence/second | The number of major page faults per second for children of this process. |
system.processes.mem.pct | avg, max, min, sum | percent | The process memory consumption. |
system.processes.mem.real | avg, max, min, sum | byte | The non-swapped physical memory that a process has used and that cannot be shared with another process. |
system.processes.mem.rss | avg, max, min, sum | byte | The non-swapped physical memory a process has used, also known as "Resident Set Size". |
system.processes.mem.vms | avg, max, min, sum | byte | The total amount of virtual memory used by the process, also known as "Virtual Memory Size". |
system.processes.number | avg, max, min, sum | process | The number of processes. |
system.processes.open_file_descriptors | avg, max, min, sum | | The number of file descriptors used by this process. |
system.processes.open_handles | avg, max, min, sum | | The number of handles used by this process. |
system.processes.threads | avg, max, min, sum | thread | The number of threads used by this process. |
system.processes.voluntary_ctx_switches | avg, max, min, sum | event | The number of voluntary context switches performed by this process. |
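As a rough illustration of the Available Aggregations column, the Python sketch below applies avg, max, min and sum to a handful of samples of one datasource. This page does not say whether the aggregations run over time or across hosts, so treat this as an assumption about how the four functions combine a series of values. The `aggregate` helper and the sample values are made up for the example.

```python
from statistics import mean

# The four aggregations listed in the table, mapped to plain Python functions.
AGGREGATIONS = {"avg": mean, "max": max, "min": min, "sum": sum}


def aggregate(samples, how):
    """Combine a non-empty list of numeric samples with one named aggregation."""
    if how not in AGGREGATIONS:
        raise ValueError("unknown aggregation: %r" % how)
    return AGGREGATIONS[how](samples)


# Hypothetical system.processes.mem.rss samples, in bytes (100, 200 and 150 MiB).
rss_bytes = [104857600, 209715200, 157286400]

if __name__ == "__main__":
    for how in ("avg", "max", "min", "sum"):
        print(how, aggregate(rss_bytes, how))
```

Which aggregation is meaningful depends on the datasource: sum suits system.processes.number when adding up counts from several matched groups, while avg or max is usually more useful for percentages such as system.processes.cpu.pct.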