Infrastructure Integration
Configuration
- Configure the Agent by editing /etc/nutanix/epoch-dd-agent/conf.d/mesos_master.yaml and /etc/nutanix/epoch-dd-agent/conf.d/mesos_slave.yaml in the collectors.
Example:
On master nodes, for the Mesos master:
    init_config:
      default_timeout: 10

    instances:
      - url: "http://localhost:5050"
        # The (optional) disable_ssl_validation will instruct the check
        # to skip the validation of the SSL certificate of the master.
        # This is mostly useful for certificates that are not signed by a
        # public authority.
        # When true, the check logs a warning in collector.log.
        # Defaults to false; set to true if you want to disable
        # SSL certificate validation.
        #
        # disable_ssl_validation: true
On slave nodes, for the Mesos slave:
    init_config:
      default_timeout: 10

    instances:
      - url: "http://localhost:5051"
        # master_port: 5050
        # tasks:
        #   - "hello"
        # The (optional) disable_ssl_validation will instruct the check
        # to skip the validation of the SSL certificate of the slave metric endpoint.
        # This is mostly useful for certificates that are not signed by a
        # public authority.
        # When true, the check logs a warning in collector.log.
        # Defaults to false; set to true if you want to disable
        # SSL certificate validation.
        #
        # disable_ssl_validation: true
- Check that all YAML files are valid with the following command:
    /etc/init.d/epoch-collectors configcheck
- Restart the Agent for both files using the following command:
    /etc/init.d/epoch-collectors restart
- Execute the info command to verify that the integration check has passed:
    /etc/init.d/epoch-collectors info
The output of the info command should contain a section similar to the following:
    Checks
    ======
      [...]
      mesos_master
      ----------
        - instance #0 [OK]
        - Collected 8 metrics & 0 events
      [...]
      mesos_slave
      ----------
        - instance #0 [OK]
        - Collected 8 metrics & 0 events
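
The verification step above can also be scripted. The following is a minimal sketch, not part of the product, that reads text shaped like the sample info output and reports the status of each check instance. The function names parse_check_status and all_ok are hypothetical, and the sketch assumes that each check section is a name line followed by a dashed underline and that any status other than OK means the check did not pass.

```python
import re

# Sample text copied from the info output shown in this section.
SAMPLE = """\
Checks
======
[...]
mesos_master
----------
- instance #0 [OK]
- Collected 8 metrics & 0 events
[...]
mesos_slave
----------
- instance #0 [OK]
- Collected 8 metrics & 0 events
"""


def parse_check_status(info_output):
    """Map each check name to the list of its instance statuses.

    Assumes a check section looks like a name line followed by a line
    made only of dashes, as in the sample output.
    """
    statuses = {}
    current = None
    previous = ""
    for raw in info_output.splitlines():
        line = raw.strip()
        is_underline = len(line) >= 3 and set(line) == {"-"}
        if is_underline and previous and previous != "[...]":
            # The line before a dashed underline is the check name.
            current = previous
            statuses[current] = []
        else:
            match = re.match(r"-\s*instance #\d+ \[(\w+)\]", line)
            if match and current is not None:
                statuses[current].append(match.group(1))
        if line:
            previous = line
    return statuses


def all_ok(statuses, check_names):
    """Return True only if every named check has instances and all are OK."""
    for name in check_names:
        instance_statuses = statuses.get(name)
        if not instance_statuses:
            return False
        if any(status != "OK" for status in instance_statuses):
            return False
    return True


result = parse_check_status(SAMPLE)
print(result)
print(all_ok(result, ["mesos_master", "mesos_slave"]))
```

To use this against a live node, capture the output of the info command (for example with subprocess.run and capture_output=True) and pass the resulting text to parse_check_status in place of SAMPLE.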
Infrastructure Datasources
Datasource | Available Aggregations | Unit | Description |
---|---|---|---|
mesos.framework.cpu | avg max min sum | | Framework CPU |
mesos.framework.mem | avg max min sum | mebibyte | Framework mem |
mesos.framework.disk | avg max min sum | mebibyte | Framework disk |
mesos.role.cpu | avg max min sum | | Role CPU |
mesos.role.mem | avg max min sum | mebibyte | Role mem |
mesos.role.disk | avg max min sum | mebibyte | Role disk |
mesos.cluster.tasks_error | avg max min sum | task | Number of tasks that were invalid |
mesos.cluster.tasks_failed | avg max min sum | task | Number of failed tasks |
mesos.cluster.tasks_finished | avg max min sum | task | Number of finished tasks |
mesos.cluster.tasks_killed | avg max min sum | task | Number of killed tasks |
mesos.cluster.tasks_lost | avg max min sum | task | Number of lost tasks |
mesos.cluster.tasks_running | avg max min sum | task | Number of running tasks |
mesos.cluster.tasks_staging | avg max min sum | task | Number of staging tasks |
mesos.cluster.tasks_starting | avg max min sum | task | Number of starting tasks |
mesos.cluster.slave_registrations | avg max min sum | | Number of slaves that were able to cleanly re-join the cluster and connect back to the master after the master is disconnected. |
mesos.cluster.slave_removals | avg max min sum | | Number of slaves removed for various reasons, including maintenance |
mesos.cluster.slave_reregistrations | avg max min sum | | Number of slave re-registrations |
mesos.cluster.slave_shutdowns_canceled | avg max min sum | | Number of cancelled slave shutdowns |
mesos.cluster.slave_shutdowns_scheduled | avg max min sum | | Number of slaves which have failed their health check and are scheduled to be removed |
mesos.cluster.slaves_active | avg max min sum | | Number of active slaves |
mesos.cluster.slaves_connected | avg max min sum | | Number of connected slaves |
mesos.cluster.slaves_disconnected | avg max min sum | | Number of disconnected slaves |
mesos.cluster.slaves_inactive | avg max min sum | | Number of inactive slaves |
mesos.cluster.cpus_percent | avg max min sum | percent | Percentage of allocated CPUs |
mesos.cluster.cpus_used | avg max min sum | | Number of allocated CPUs |
mesos.cluster.cpus_total | avg max min sum | | Number of CPUs |
mesos.cluster.disk_percent | avg max min sum | percent | Percentage of allocated disk space |
mesos.cluster.disk_used | avg max min sum | mebibyte | Allocated disk space |
mesos.cluster.disk_total | avg max min sum | mebibyte | Disk space |
mesos.cluster.mem_percent | avg max min sum | percent | Percentage of allocated memory |
mesos.cluster.mem_used | avg max min sum | mebibyte | Allocated memory |
mesos.cluster.mem_total | avg max min sum | mebibyte | Total memory |
mesos.registrar.queued_operations | avg max min sum | | Number of queued operations |
mesos.registrar.registry_size_bytes | avg max min sum | byte | Registry size |
mesos.registrar.state_fetch_ms | avg max min sum | millisecond | Registry read latency |
mesos.registrar.state_store_ms | avg max min sum | millisecond | Registry write latency |
mesos.registrar.state_store_ms.count | avg max min sum | | Registry write count |
mesos.registrar.state_store_ms.max | avg max min sum | millisecond | Maximum registry write latency |
mesos.registrar.state_store_ms.min | avg max min sum | millisecond | Minimum registry write latency |
mesos.registrar.state_store_ms.p50 | avg max min sum | millisecond | Median registry write latency |
mesos.registrar.state_store_ms.p90 | avg max min sum | millisecond | 90th percentile registry write latency |
mesos.registrar.state_store_ms.p95 | avg max min sum | millisecond | 95th percentile registry write latency |
mesos.registrar.state_store_ms.p99 | avg max min sum | millisecond | 99th percentile registry write latency |
mesos.registrar.state_store_ms.p999 | avg max min sum | millisecond | 99.9th percentile registry write latency |
mesos.registrar.state_store_ms.p9999 | avg max min sum | millisecond | 99.99th percentile registry write latency |
mesos.registrar.log.recovered | avg max min sum | | Registrar log recovered |
mesos.cluster.frameworks_active | avg max min sum | | Number of active frameworks |
mesos.cluster.frameworks_connected | avg max min sum | | Number of connected frameworks |
mesos.cluster.frameworks_disconnected | avg max min sum | | Number of disconnected frameworks |
mesos.cluster.frameworks_inactive | avg max min sum | | Number of inactive frameworks |
mesos.stats.system.cpus_total | avg max min sum | | Number of CPUs available |
mesos.stats.system.load_15min | avg max min sum | | Load average for the past 15 minutes |
mesos.stats.system.load_1min | avg max min sum | | Load average for the past minute |
mesos.stats.system.load_5min | avg max min sum | | Load average for the past 5 minutes |
mesos.stats.system.mem_free_bytes | avg max min sum | byte | Free memory |
mesos.stats.system.mem_total_bytes | avg max min sum | byte | Total memory |
mesos.stats.elected | avg max min sum | | Whether this is the elected master |
mesos.stats.uptime_secs | avg max min sum | second | Uptime |
mesos.cluster.dropped_messages | avg max min sum | message | Number of dropped messages |
mesos.cluster.outstanding_offers | avg max min sum | | Number of outstanding resource offers |
mesos.cluster.event_queue_dispatches | avg max min sum | | Number of dispatches in the event queue |
mesos.cluster.event_queue_http_requests | avg max min sum | request | Number of HTTP requests in the event queue |
mesos.cluster.event_queue_messages | avg max min sum | message | Number of messages in the event queue |
mesos.cluster.invalid_framework_to_executor_messages | avg max min sum | message | Number of invalid framework messages |
mesos.cluster.invalid_status_update_acknowledgements | avg max min sum | | Number of invalid status update acknowledgements |
mesos.cluster.invalid_status_updates | avg max min sum | | Number of invalid status updates |
mesos.cluster.valid_framework_to_executor_messages | avg max min sum | message | Number of valid framework messages |
mesos.cluster.valid_status_update_acknowledgements | avg max min sum | | Number of valid status update acknowledgements |
mesos.cluster.valid_status_updates | avg max min sum | | Number of valid status updates |
mesos.stats.registered | avg max min sum | | Whether this slave is registered with a master |
mesos.stats.system.cpus_total | avg max min sum | | Number of CPUs available |
mesos.stats.system.load_15min | avg max min sum | | Load average for the past 15 minutes |
mesos.stats.system.load_1min | avg max min sum | | Load average for the past minute |
mesos.stats.system.load_5min | avg max min sum | | Load average for the past 5 minutes |
mesos.stats.system.mem_free_bytes | avg max min sum | byte | Free memory |
mesos.stats.system.mem_total_bytes | avg max min sum | byte | Total memory |
mesos.state.task.cpu | avg max min sum | | Task CPU |
mesos.state.task.mem | avg max min sum | mebibyte | Task memory |
mesos.state.task.disk | avg max min sum | mebibyte | Task disk |
mesos.slave.tasks_failed | avg max min sum | task | Number of failed tasks |
mesos.slave.tasks_finished | avg max min sum | task | Number of finished tasks |
mesos.slave.tasks_killed | avg max min sum | task | Number of killed tasks |
mesos.slave.tasks_lost | avg max min sum | task | Number of lost tasks |
mesos.slave.tasks_running | avg max min sum | task | Number of running tasks |
mesos.slave.tasks_staging | avg max min sum | task | Number of staging tasks |
mesos.slave.tasks_starting | avg max min sum | task | Number of starting tasks |
mesos.stats.registered | avg max min sum | | Whether this slave is registered with a master |
mesos.stats.uptime_secs | avg max min sum | | Slave uptime |
mesos.slave.cpus_percent | avg max min sum | percent | Percentage of allocated CPUs |
mesos.slave.cpus_used | avg max min sum | | Number of allocated CPUs |
mesos.slave.cpus_total | avg max min sum | | Number of CPUs |
mesos.slave.disk_percent | avg max min sum | percent | Percentage of allocated disk space |
mesos.slave.disk_used | avg max min sum | mebibyte | Allocated disk space |
mesos.slave.disk_total | avg max min sum | mebibyte | Disk space |
mesos.slave.mem_percent | avg max min sum | percent | Percentage of allocated memory |
mesos.slave.mem_used | avg max min sum | mebibyte | Allocated memory |
mesos.slave.mem_total | avg max min sum | mebibyte | Total memory |
mesos.slave.executors_registering | avg max min sum | | Number of executors registering |
mesos.slave.executors_running | avg max min sum | | Number of executors running |
mesos.slave.executors_terminated | avg max min sum | | Number of terminated executors |
mesos.slave.executors_terminating | avg max min sum | | Number of terminating executors |
mesos.slave.frameworks_active | avg max min sum | | Number of active frameworks |
mesos.slave.invalid_framework_messages | avg max min sum | message | Number of invalid framework messages |
mesos.slave.invalid_status_updates | avg max min sum | | Number of invalid status updates |
mesos.slave.recovery_errors | avg max min sum | error | Number of errors encountered during slave recovery |
mesos.slave.valid_framework_messages | avg max min sum | message | Number of valid framework messages |
mesos.slave.valid_status_updates | avg max min sum | | Number of valid status updates |
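
To cross-check the allocation datasources in the table against what Mesos itself reports, the raw metrics can be read from the Mesos /metrics/snapshot HTTP endpoint on the URLs used in the configuration examples. The sketch below makes several assumptions: that raw keys such as master/cpus_used and master/cpus_total correspond to mesos.cluster.cpus_used and mesos.cluster.cpus_total (and slave/... to mesos.slave....), and that a percentage is simply used divided by total, scaled to 0-100. This page does not state whether the *_percent datasources are reported on a 0-1 or a 0-100 scale. The helper names and the sample values are hypothetical.

```python
import json
from urllib.request import urlopen


def snapshot_url(base_url):
    """Build the metrics snapshot URL for a Mesos master or slave."""
    return base_url.rstrip("/") + "/metrics/snapshot"


def fetch_snapshot(base_url, timeout=10):
    """Fetch the raw metrics as a dict. Needs a reachable Mesos endpoint."""
    with urlopen(snapshot_url(base_url), timeout=timeout) as response:
        return json.load(response)


def allocation_percent(snapshot, prefix, resource):
    """Return used/total for one resource as a 0-100 percentage.

    prefix is "master" or "slave"; resource is "cpus", "disk" or "mem".
    Returns 0.0 when the total is zero to avoid dividing by zero.
    """
    used = snapshot[f"{prefix}/{resource}_used"]
    total = snapshot[f"{prefix}/{resource}_total"]
    if total == 0:
        return 0.0
    return 100.0 * used / total


# Made-up sample values, shaped like a fragment of a master snapshot.
sample = {
    "master/cpus_used": 6.0,
    "master/cpus_total": 24.0,
    "master/mem_used": 2048.0,
    "master/mem_total": 8192.0,
}

print(snapshot_url("http://localhost:5050"))
print(allocation_percent(sample, "master", "cpus"))
print(allocation_percent(sample, "master", "mem"))
```

On a master node, fetch_snapshot("http://localhost:5050") would replace the sample dictionary; on a slave node, use port 5051 and the "slave" prefix.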