Infrastructure Integration

Configuration

  1. Configure the agent by editing the following files in the collectors: /etc/nutanix/epoch-dd-agent/conf.d/mesos_master.yaml on master nodes and /etc/nutanix/epoch-dd-agent/conf.d/mesos_slave.yaml on slave nodes.

Example:

On master nodes, for the Mesos master:

    init_config:
      default_timeout: 10

    instances:
      - url: "http://localhost:5050"

        # The (optional) disable_ssl_validation will instruct the check
        # to skip the validation of the SSL certificate of the master.
        # This is mostly useful for certificates that are not signed by a
        # public authority.
        # When true, the check logs a warning in collector.log
        # Defaults to false, set to true if you want to disable
        # SSL certificate validation.
        #
        # disable_ssl_validation: true

On slave nodes, for the Mesos slave:

    init_config:
      default_timeout: 10

    instances:
      - url: "http://localhost:5051"
        #  master_port: 5050
        #  tasks:
        #    - "hello"

        # The (optional) disable_ssl_validation will instruct the check
        # to skip the validation of the SSL certificate of the slave metric endpoint.
        # This is mostly useful for certificates that are not signed by a
        # public authority.
        # When true, the check logs a warning in collector.log
        # Defaults to false, set to true if you want to disable
        # SSL certificate validation.
        #
        # disable_ssl_validation: true
  2. Verify that all YAML files are valid with the following command:

    /etc/init.d/epoch-collectors configcheck
    
  3. Restart the agent so that it picks up both files:

    /etc/init.d/epoch-collectors restart
    
  4. Run the info command to verify that the integration checks pass:

    /etc/init.d/epoch-collectors info
    

The output of the info command should contain a section similar to the following:

    Checks
    ======
      [...]
      mesos_master
      ----------
          - instance #0 [OK]
          - Collected 8 metrics & 0 events


      [...]
      mesos_slave
      ----------
          - instance #0 [OK]
          - Collected 8 metrics & 0 events
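
The check, restart, and verify steps above can be scripted. The following is a minimal sketch, not part of the product: the function names are hypothetical, the parser assumes the info output keeps the layout of the sample shown above, and it assumes the init script exits with a non-zero status when a step fails.

```python
import re
import subprocess

# Path taken from the steps above.
COLLECTORS = "/etc/init.d/epoch-collectors"


def check_statuses(info_output):
    """Map each check name to the list of its instance statuses.

    Assumes the layout of the sample output: a check name on its own line,
    a dashed underline, then "- instance #N [STATUS]" lines.
    """
    statuses = {}
    current = None
    previous = ""
    for raw in info_output.splitlines():
        line = raw.strip()
        if line and set(line) == {"-"} and previous:
            # A dashed underline marks the previous line as a check name.
            current = previous
            statuses.setdefault(current, [])
        else:
            match = re.match(r"- instance #\d+ \[(\w+)\]", line)
            if match and current is not None:
                statuses[current].append(match.group(1))
        previous = line
    return statuses


def failing_checks(info_output, expected=("mesos_master", "mesos_slave")):
    """Return the expected checks that are missing or have a non-OK instance."""
    statuses = check_statuses(info_output)
    return [
        name for name in expected
        if not statuses.get(name) or any(s != "OK" for s in statuses[name])
    ]


def apply_and_verify(expected):
    """Run configcheck, restart, and info in order; stop at the first failure.

    Assumption: each command exits non-zero on failure, so check=True raises.
    """
    for action in ("configcheck", "restart"):
        subprocess.run([COLLECTORS, action], check=True)
    info = subprocess.run([COLLECTORS, "info"], check=True,
                          capture_output=True, text=True)
    return failing_checks(info.stdout, expected)
```

On a master node, `apply_and_verify(("mesos_master",))` would return an empty list when the check reports `[OK]`; on a slave node, pass `("mesos_slave",)` instead. A check may not have run yet immediately after a restart, so an empty section is reported as failing and the call may need to be repeated.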

Infrastructure Datasources

| Datasource | Available Aggregations | Unit | Description |
|---|---|---|---|
| mesos.framework.cpu | avg max min sum | | Framework cpu |
| mesos.framework.mem | avg max min sum | mebibyte | Framework mem |
| mesos.framework.disk | avg max min sum | mebibyte | Framework disk |
| mesos.role.cpu | avg max min sum | | Role cpu |
| mesos.role.mem | avg max min sum | mebibyte | Role mem |
| mesos.role.disk | avg max min sum | mebibyte | Role disk |
| mesos.cluster.tasks_error | avg max min sum | task | Number of tasks that were invalid |
| mesos.cluster.tasks_failed | avg max min sum | task | Number of failed tasks |
| mesos.cluster.tasks_finished | avg max min sum | task | Number of finished tasks |
| mesos.cluster.tasks_killed | avg max min sum | task | Number of killed tasks |
| mesos.cluster.tasks_lost | avg max min sum | task | Number of lost tasks |
| mesos.cluster.tasks_running | avg max min sum | task | Number of running tasks |
| mesos.cluster.tasks_staging | avg max min sum | task | Number of staging tasks |
| mesos.cluster.tasks_starting | avg max min sum | task | Number of starting tasks |
| mesos.cluster.slave_registrations | avg max min sum | | Number of slaves that were able to cleanly re-join the cluster and connect back to the master after the master is disconnected |
| mesos.cluster.slave_removals | avg max min sum | | Number of slaves removed for various reasons, including maintenance |
| mesos.cluster.slave_reregistrations | avg max min sum | | Number of slave re-registrations |
| mesos.cluster.slave_shutdowns_canceled | avg max min sum | | Number of cancelled slave shutdowns |
| mesos.cluster.slave_shutdowns_scheduled | avg max min sum | | Number of slaves which have failed their health check and are scheduled to be removed |
| mesos.cluster.slaves_active | avg max min sum | | Number of active slaves |
| mesos.cluster.slaves_connected | avg max min sum | | Number of connected slaves |
| mesos.cluster.slaves_disconnected | avg max min sum | | Number of disconnected slaves |
| mesos.cluster.slaves_inactive | avg max min sum | | Number of inactive slaves |
| mesos.cluster.cpus_percent | avg max min sum | percent | Percentage of allocated CPUs |
| mesos.cluster.cpus_used | avg max min sum | | Number of allocated CPUs |
| mesos.cluster.cpus_total | avg max min sum | | Number of CPUs |
| mesos.cluster.disk_percent | avg max min sum | percent | Percentage of allocated disk space |
| mesos.cluster.disk_used | avg max min sum | mebibyte | Allocated disk space |
| mesos.cluster.disk_total | avg max min sum | mebibyte | Disk space |
| mesos.cluster.mem_percent | avg max min sum | percent | Percentage of allocated memory |
| mesos.cluster.mem_used | avg max min sum | mebibyte | Allocated memory |
| mesos.cluster.mem_total | avg max min sum | mebibyte | Total memory |
| mesos.registrar.queued_operations | avg max min sum | | Number of queued operations |
| mesos.registrar.registry_size_bytes | avg max min sum | byte | Registry size |
| mesos.registrar.state_fetch_ms | avg max min sum | millisecond | Registry read latency |
| mesos.registrar.state_store_ms | avg max min sum | millisecond | Registry write latency |
| mesos.registrar.state_store_ms.count | avg max min sum | | Registry write count |
| mesos.registrar.state_store_ms.max | avg max min sum | millisecond | Maximum registry write latency |
| mesos.registrar.state_store_ms.min | avg max min sum | millisecond | Minimum registry write latency |
| mesos.registrar.state_store_ms.p50 | avg max min sum | millisecond | Median registry write latency |
| mesos.registrar.state_store_ms.p90 | avg max min sum | millisecond | 90th percentile registry write latency |
| mesos.registrar.state_store_ms.p95 | avg max min sum | millisecond | 95th percentile registry write latency |
| mesos.registrar.state_store_ms.p99 | avg max min sum | millisecond | 99th percentile registry write latency |
| mesos.registrar.state_store_ms.p999 | avg max min sum | millisecond | 99.9th percentile registry write latency |
| mesos.registrar.state_store_ms.p9999 | avg max min sum | millisecond | 99.99th percentile registry write latency |
| mesos.registrar.log.recovered | avg max min sum | | Registrar log recovered |
| mesos.cluster.frameworks_active | avg max min sum | | Number of active frameworks |
| mesos.cluster.frameworks_connected | avg max min sum | | Number of connected frameworks |
| mesos.cluster.frameworks_disconnected | avg max min sum | | Number of disconnected frameworks |
| mesos.cluster.frameworks_inactive | avg max min sum | | Number of inactive frameworks |
| mesos.stats.system.cpus_total | avg max min sum | | Number of CPUs available |
| mesos.stats.system.load_15min | avg max min sum | | Load average for the past 15 minutes |
| mesos.stats.system.load_1min | avg max min sum | | Load average for the past minute |
| mesos.stats.system.load_5min | avg max min sum | | Load average for the past 5 minutes |
| mesos.stats.system.mem_free_bytes | avg max min sum | byte | Free memory |
| mesos.stats.system.mem_total_bytes | avg max min sum | byte | Total memory |
| mesos.stats.elected | avg max min sum | | Whether this is the elected master |
| mesos.stats.uptime_secs | avg max min sum | second | Uptime |
| mesos.cluster.dropped_messages | avg max min sum | message | Number of dropped messages |
| mesos.cluster.outstanding_offers | avg max min sum | | Number of outstanding resource offers |
| mesos.cluster.event_queue_dispatches | avg max min sum | | Number of dispatches in the event queue |
| mesos.cluster.event_queue_http_requests | avg max min sum | request | Number of HTTP requests in the event queue |
| mesos.cluster.event_queue_messages | avg max min sum | message | Number of messages in the event queue |
| mesos.cluster.invalid_framework_to_executor_messages | avg max min sum | message | Number of invalid framework messages |
| mesos.cluster.invalid_status_update_acknowledgements | avg max min sum | | Number of invalid status update acknowledgements |
| mesos.cluster.invalid_status_updates | avg max min sum | | Number of invalid status updates |
| mesos.cluster.valid_framework_to_executor_messages | avg max min sum | message | Number of valid framework messages |
| mesos.cluster.valid_status_update_acknowledgements | avg max min sum | | Number of valid status update acknowledgements |
| mesos.cluster.valid_status_updates | avg max min sum | | Number of valid status updates |
| mesos.stats.registered | avg max min sum | | Whether this slave is registered with a master |
| mesos.stats.system.cpus_total | avg max min sum | | Number of CPUs available |
| mesos.stats.system.load_15min | avg max min sum | | Load average for the past 15 minutes |
| mesos.stats.system.load_1min | avg max min sum | | Load average for the past minute |
| mesos.stats.system.load_5min | avg max min sum | | Load average for the past 5 minutes |
| mesos.stats.system.mem_free_bytes | avg max min sum | byte | Free memory |
| mesos.stats.system.mem_total_bytes | avg max min sum | byte | Total memory |
| mesos.state.task.cpu | avg max min sum | | Task cpu |
| mesos.state.task.mem | avg max min sum | mebibyte | Task memory |
| mesos.state.task.disk | avg max min sum | mebibyte | Task disk |
| mesos.slave.tasks_failed | avg max min sum | task | Number of failed tasks |
| mesos.slave.tasks_finished | avg max min sum | task | Number of finished tasks |
| mesos.slave.tasks_killed | avg max min sum | task | Number of killed tasks |
| mesos.slave.tasks_lost | avg max min sum | task | Number of lost tasks |
| mesos.slave.tasks_running | avg max min sum | task | Number of running tasks |
| mesos.slave.tasks_staging | avg max min sum | task | Number of staging tasks |
| mesos.slave.tasks_starting | avg max min sum | task | Number of starting tasks |
| mesos.stats.registered | avg max min sum | | Whether this slave is registered with a master |
| mesos.stats.uptime_secs | avg max min sum | | Slave uptime |
| mesos.slave.cpus_percent | avg max min sum | percent | Percentage of allocated CPUs |
| mesos.slave.cpus_used | avg max min sum | | Number of allocated CPUs |
| mesos.slave.cpus_total | avg max min sum | | Number of CPUs |
| mesos.slave.disk_percent | avg max min sum | percent | Percentage of allocated disk space |
| mesos.slave.disk_used | avg max min sum | mebibyte | Allocated disk space |
| mesos.slave.disk_total | avg max min sum | mebibyte | Disk space |
| mesos.slave.mem_percent | avg max min sum | percent | Percentage of allocated memory |
| mesos.slave.mem_used | avg max min sum | mebibyte | Allocated memory |
| mesos.slave.mem_total | avg max min sum | mebibyte | Total memory |
| mesos.slave.executors_registering | avg max min sum | | Number of executors registering |
| mesos.slave.executors_running | avg max min sum | | Number of executors running |
| mesos.slave.executors_terminated | avg max min sum | | Number of terminated executors |
| mesos.slave.executors_terminating | avg max min sum | | Number of terminating executors |
| mesos.slave.frameworks_active | avg max min sum | | Number of active frameworks |
| mesos.slave.invalid_framework_messages | avg max min sum | message | Number of invalid framework messages |
| mesos.slave.invalid_status_updates | avg max min sum | | Number of invalid status updates |
| mesos.slave.recovery_errors | avg max min sum | error | Number of errors encountered during slave recovery |
| mesos.slave.valid_framework_messages | avg max min sum | message | Number of valid framework messages |
| mesos.slave.valid_status_updates | avg max min sum | | Number of valid status updates |
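
Every datasource above lists the same four aggregations. As a rough illustration of what they compute, here is a minimal sketch over a list of samples. The `aggregate` helper and the sample values are hypothetical, and the page does not say whether the product aggregates over time, over hosts, or both, so treat this only as the arithmetic.

```python
def aggregate(samples):
    """Return the avg, max, min, and sum aggregations for one datasource."""
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "avg": sum(samples) / len(samples),
        "max": max(samples),
        "min": min(samples),
        "sum": sum(samples),
    }


# Hypothetical mesos.cluster.tasks_running samples from three collections.
tasks_running = [4, 6, 8]
print(aggregate(tasks_running))
```

For a gauge such as mesos.cluster.tasks_running, avg, max, and min are usually the meaningful choices; sum tends to be more useful when combining values reported by several nodes.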