Infrastructure Integration
Configuration
- Configure the Agent by editing /etc/nutanix/epoch-dd-agent/conf.d/yarn.yaml in the collectors. For example:

    init_config:

    instances:
      # The YARN check retrieves metrics from YARN's ResourceManager. This
      # check must be run from the Master Node and the ResourceManager URI must
      # be specified below. The ResourceManager URI is composed of the
      # ResourceManager's hostname and port.
      #
      # The ResourceManager hostname can be found in the yarn-site.xml conf file
      # under the property yarn.resourcemanager.address.
      #
      # The ResourceManager port can be found in the yarn-site.xml conf file under
      # the property yarn.resourcemanager.webapp.address.
      #
      - resourcemanager_uri: http://localhost:8088

        # A required friendly name for the cluster.
        # cluster_name: MyYarnCluster

        # Optional tags to be applied to every emitted metric.
        # tags:
        #   - "key:value"
        #   - "instance:production"

        # Optional tags retrieved from the application data to be applied to the
        # application metrics.
        # application_tags:
        #   # tag_prefix: yarn_key
        #   app_queue: queue
        # This will add a tag 'app_queue:name_of_the_queue' to the app metrics,
        # app_queue being the tag_prefix and queue the actual YARN key.
        # Allowed YARN keys: applicationType, applicationTags, name, queue, user
        # By default, the application name is collected with the prefix app_name.
- Check that all YAML files are valid with the following command:

    /etc/init.d/epoch-collectors configcheck
- Restart the Agent using the following command:

    /etc/init.d/epoch-collectors restart
- Execute the info command to verify that the integration check has passed:

    /etc/init.d/epoch-collectors info

  The output of the command should contain a section similar to the following:

    Checks
    ======

      [...]

      yarn
      ----
        - instance #0 [OK]
        - Collected 8 metrics & 0 events
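The comments in yarn.yaml say the ResourceManager URI combines the hostname from yarn.resourcemanager.address with the port from yarn.resourcemanager.webapp.address. The Python sketch below shows that derivation, assuming a yarn-site.xml that sets both properties explicitly as host:port values and a ResourceManager web UI served over plain HTTP. The helpers read_yarn_properties and resourcemanager_uri are hypothetical and not part of the Agent; Hadoop variable references such as ${yarn.resourcemanager.hostname} are not expanded.

```python
import xml.etree.ElementTree as ET


def read_yarn_properties(xml_text):
    """Hypothetical helper: return {property name: value} from yarn-site.xml text."""
    root = ET.fromstring(xml_text)
    props = {}
    # Each <property> is a direct child of <configuration>.
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resourcemanager_uri(props, scheme="http"):
    """Hypothetical helper: hostname from yarn.resourcemanager.address,
    port from yarn.resourcemanager.webapp.address (both assumed host:port)."""
    host = props["yarn.resourcemanager.address"].rsplit(":", 1)[0]
    port = props["yarn.resourcemanager.webapp.address"].rsplit(":", 1)[1]
    return f"{scheme}://{host}:{port}"


SAMPLE_YARN_SITE = """\
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rm-host.example.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm-host.example.com:8088</value>
  </property>
</configuration>
"""

print(resourcemanager_uri(read_yarn_properties(SAMPLE_YARN_SITE)))
# http://rm-host.example.com:8088
```

The printed value is what would go into resourcemanager_uri; rm-host.example.com is a placeholder hostname. To use a real file, read its text and pass it to read_yarn_properties (the location of yarn-site.xml varies by distribution).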
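If the verification step is scripted, the info output can be checked programmatically. The Python sketch below assumes the layout shown in the sample output (a check name, a line of dashes, then "- instance #N [STATUS]" lines); that layout is taken from the sample rather than from a documented format, so treat the parser as illustrative. instance_statuses is a hypothetical helper.

```python
import re

# Matches lines such as "- instance #0 [OK]" (layout assumed from the sample output).
INSTANCE_RE = re.compile(r"^-\s*instance #(\d+)\s*\[(\w+)\]")


def _is_header(lines, i):
    """A check header is a non-empty line followed by a line made only of dashes."""
    return (
        bool(lines[i].strip())
        and i + 1 < len(lines)
        and set(lines[i + 1].strip()) == {"-"}
    )


def instance_statuses(info_output, check_name):
    """Hypothetical helper: map instance index -> status for one check's section."""
    lines = info_output.splitlines()
    statuses = {}
    in_section = False
    for i, line in enumerate(lines):
        if _is_header(lines, i):
            # Entering a new check section; remember whether it is the one we want.
            in_section = line.strip() == check_name
            continue
        match = INSTANCE_RE.match(line.strip())
        if in_section and match:
            statuses[int(match.group(1))] = match.group(2)
    return statuses


SAMPLE_INFO = """\
Checks
======

  [...]

  yarn
  ----
    - instance #0 [OK]
    - Collected 8 metrics & 0 events
"""

print(instance_statuses(SAMPLE_INFO, "yarn"))  # {0: 'OK'}
```

To feed it real output, one option is subprocess.run(["/etc/init.d/epoch-collectors", "info"], capture_output=True, text=True).stdout, followed by a check that the result is non-empty and every status is "OK".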
Infrastructure Datasources
Datasource | Available Aggregations | Unit | Description |
---|---|---|---|
yarn.metrics.apps_submitted | avg max min sum | task | The number of submitted apps |
yarn.metrics.apps_completed | avg max min sum | task | The number of completed apps |
yarn.metrics.apps_pending | avg max min sum | task | The number of pending apps |
yarn.metrics.apps_running | avg max min sum | task | The number of running apps |
yarn.metrics.apps_failed | avg max min sum | task | The number of failed apps |
yarn.metrics.apps_killed | avg max min sum | task | The number of killed apps |
yarn.metrics.reserved_mb | avg max min sum | mebibyte | The amount of reserved memory |
yarn.metrics.available_mb | avg max min sum | mebibyte | The amount of available memory |
yarn.metrics.allocated_mb | avg max min sum | mebibyte | The amount of allocated memory |
yarn.metrics.total_mb | avg max min sum | mebibyte | The total amount of memory |
yarn.metrics.reserved_virtual_cores | avg max min sum | core | The number of reserved virtual cores |
yarn.metrics.available_virtual_cores | avg max min sum | core | The number of available virtual cores |
yarn.metrics.allocated_virtual_cores | avg max min sum | core | The number of allocated virtual cores |
yarn.metrics.total_virtual_cores | avg max min sum | core | The total number of virtual cores |
yarn.metrics.containers_allocated | avg max min sum | | The number of containers allocated |
yarn.metrics.containers_reserved | avg max min sum | | The number of containers reserved |
yarn.metrics.containers_pending | avg max min sum | | The number of containers pending |
yarn.metrics.total_nodes | avg max min sum | node | The total number of nodes |
yarn.metrics.active_nodes | avg max min sum | node | The number of active nodes |
yarn.metrics.lost_nodes | avg max min sum | node | The number of lost nodes |
yarn.metrics.unhealthy_nodes | avg max min sum | node | The number of unhealthy nodes |
yarn.metrics.decommissioned_nodes | avg max min sum | node | The number of decommissioned nodes |
yarn.metrics.rebooted_nodes | avg max min sum | node | The number of rebooted nodes |
yarn.apps.progress | avg max min sum | percent | The progress of the application as a percentage |
yarn.apps.started_time | avg max min sum | millisecond | The time at which the application started (in ms since epoch) |
yarn.apps.finished_time | avg max min sum | millisecond | The time at which the application finished (in ms since epoch) |
yarn.apps.elapsed_time | avg max min sum | millisecond | The elapsed time since the application started (in ms) |
yarn.apps.allocated_mb | avg max min sum | mebibyte | The sum of memory in MB allocated to the application's running containers |
yarn.apps.allocated_vcores | avg max min sum | core | The sum of virtual cores allocated to the application's running containers |
yarn.apps.running_containers | avg max min sum | | The number of containers currently running for the application |
yarn.apps.memory_seconds | avg max min sum | second | The amount of memory the application has allocated (megabyte-seconds) |
yarn.apps.vcore_seconds | avg max min sum | second | The amount of CPU resources the application has allocated (virtual core-seconds) |
yarn.node.last_health_update | avg max min sum | millisecond | The last time the node reported its health (in ms since epoch) |
yarn.node.used_memory_mb | avg max min sum | mebibyte | The total amount of memory currently used on the node (in MB) |
yarn.node.avail_memory_mb | avg max min sum | mebibyte | The total amount of memory currently available on the node (in MB) |
yarn.node.used_virtual_cores | avg max min sum | core | The total number of vCores currently used on the node |
yarn.node.available_virtual_cores | avg max min sum | core | The total number of vCores available on the node |
yarn.node.num_containers | avg max min sum | | The total number of containers currently running on the node |
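The yarn.metrics.* datasources correspond by name to the fields of the ResourceManager's cluster metrics REST resource (GET <resourcemanager_uri>/ws/v1/cluster/metrics, which returns a clusterMetrics object with fields such as appsSubmitted and availableMB). The Python sketch below illustrates that correspondence. The mapping is inferred from the naming, not taken from the Epoch collector's source, and fetch_cluster_metrics and cluster_datapoints are hypothetical helpers.

```python
import json
import urllib.request

# Assumed mapping: YARN clusterMetrics field -> datasource name from the table above.
CLUSTER_METRICS = {
    "appsSubmitted": "yarn.metrics.apps_submitted",
    "appsCompleted": "yarn.metrics.apps_completed",
    "appsPending": "yarn.metrics.apps_pending",
    "appsRunning": "yarn.metrics.apps_running",
    "appsFailed": "yarn.metrics.apps_failed",
    "appsKilled": "yarn.metrics.apps_killed",
    "reservedMB": "yarn.metrics.reserved_mb",
    "availableMB": "yarn.metrics.available_mb",
    "allocatedMB": "yarn.metrics.allocated_mb",
    "totalMB": "yarn.metrics.total_mb",
    "reservedVirtualCores": "yarn.metrics.reserved_virtual_cores",
    "availableVirtualCores": "yarn.metrics.available_virtual_cores",
    "allocatedVirtualCores": "yarn.metrics.allocated_virtual_cores",
    "totalVirtualCores": "yarn.metrics.total_virtual_cores",
    "containersAllocated": "yarn.metrics.containers_allocated",
    "containersReserved": "yarn.metrics.containers_reserved",
    "containersPending": "yarn.metrics.containers_pending",
    "totalNodes": "yarn.metrics.total_nodes",
    "activeNodes": "yarn.metrics.active_nodes",
    "lostNodes": "yarn.metrics.lost_nodes",
    "unhealthyNodes": "yarn.metrics.unhealthy_nodes",
    "decommissionedNodes": "yarn.metrics.decommissioned_nodes",
    "rebootedNodes": "yarn.metrics.rebooted_nodes",
}


def fetch_cluster_metrics(resourcemanager_uri, timeout=10):
    """Hypothetical helper: GET the raw JSON body from the ResourceManager REST API."""
    url = resourcemanager_uri.rstrip("/") + "/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def cluster_datapoints(body):
    """Map a /ws/v1/cluster/metrics JSON body to {datasource: value}."""
    metrics = json.loads(body)["clusterMetrics"]
    return {
        name: metrics[field]
        for field, name in CLUSTER_METRICS.items()
        if field in metrics  # skip fields missing from this response
    }


SAMPLE_METRICS = json.dumps({"clusterMetrics": {
    "appsSubmitted": 12, "appsRunning": 3, "availableMB": 17408,
    "totalNodes": 4, "activeNodes": 4,
}})

print(cluster_datapoints(SAMPLE_METRICS)["yarn.metrics.available_mb"])  # 17408
```

On a live cluster, cluster_datapoints(fetch_cluster_metrics("http://localhost:8088")) would return one value per datasource, using the resourcemanager_uri from the example configuration.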
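The yarn.apps.* datasources are per-application values, and the application_tags comment in yarn.yaml describes how tags such as app_queue:<queue> are attached to them. The sketch below assumes the ResourceManager apps REST resource (GET <resourcemanager_uri>/ws/v1/cluster/apps, which returns {"apps": {"app": [...]}}, or "apps": null when there are none) and turns each application into (datasource, value, tags) tuples. It assumes the default app_name tag is always added alongside any configured prefixes; the field mapping and the helpers app_tags and app_datapoints are illustrative, not the collector's actual code. Time fields are passed through unchanged in milliseconds.

```python
import json

# Assumed mapping: YARN app field -> datasource name from the table above.
APP_METRICS = {
    "progress": "yarn.apps.progress",
    "startedTime": "yarn.apps.started_time",    # ms since epoch
    "finishedTime": "yarn.apps.finished_time",  # ms since epoch
    "elapsedTime": "yarn.apps.elapsed_time",    # ms
    "allocatedMB": "yarn.apps.allocated_mb",
    "allocatedVCores": "yarn.apps.allocated_vcores",
    "runningContainers": "yarn.apps.running_containers",
    "memorySeconds": "yarn.apps.memory_seconds",
    "vcoreSeconds": "yarn.apps.vcore_seconds",
}

# Keys permitted in application_tags, per the yarn.yaml comments.
ALLOWED_APP_KEYS = {"applicationType", "applicationTags", "name", "queue", "user"}


def app_tags(app, application_tags=None, base_tags=()):
    """Hypothetical helper: build the tag tuple for one application record."""
    # Assumption: the default app_name tag is kept in addition to configured ones.
    mapping = {**(application_tags or {}), "app_name": "name"}
    tags = list(base_tags)
    for prefix, key in mapping.items():
        if key in ALLOWED_APP_KEYS and app.get(key) is not None:
            tags.append(f"{prefix}:{app[key]}")
    return tuple(tags)


def app_datapoints(apps_body, application_tags=None, base_tags=()):
    """Hypothetical helper: turn a /ws/v1/cluster/apps body into (datasource, value, tags)."""
    apps = (json.loads(apps_body).get("apps") or {}).get("app", [])
    points = []
    for app in apps:
        tags = app_tags(app, application_tags, base_tags)
        for field, name in APP_METRICS.items():
            if field in app:
                points.append((name, app[field], tags))
    return points


SAMPLE_APPS = json.dumps({"apps": {"app": [{
    "name": "wordcount", "queue": "default", "user": "hdfs",
    "progress": 42.5, "allocatedMB": 2048, "runningContainers": 2,
}]}})

for point in app_datapoints(SAMPLE_APPS, {"app_queue": "queue"}, ("instance:production",)):
    print(point)
```

With application_tags set to {"app_queue": "queue"}, each point for the sample application carries instance:production, app_queue:default and app_name:wordcount, matching the behavior described in the configuration comments.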