Infrastructure Integration

Configuration

  1. Configure the agent by editing /etc/nutanix/epoch-dd-agent/conf.d/yarn.yaml in the collectors. Example:

    init_config:
    
    instances:
      # The YARN check retrieves metrics from YARN's ResourceManager. This
      # check must be run from the Master Node and the ResourceManager URI must
      # be specified below. The ResourceManager URI is composed of the
      # ResourceManager's hostname and port.
      #
      # The ResourceManager hostname can be found in the yarn-site.xml conf file
      # under the property yarn.resourcemanager.address
      #
      # The ResourceManager port can be found in the yarn-site.xml conf file under
      # the property yarn.resourcemanager.webapp.address
      #
      - resourcemanager_uri: http://localhost:8088
    
        # A required friendly name for the cluster.
        # cluster_name: MyYarnCluster
    
        # Optional tags to be applied to every emitted metric.
        # tags:
        #   - "key:value"
        #   - "instance:production"
    
        # Optional tags retrieved from the application data to be applied to the
        # application metrics.
        # application_tags:
        # # tag_prefix: yarn_key
        #   app_queue: queue
        # This will add a tag 'app_queue:name_of_the_queue' to the app metrics,
        # app_queue being the tag_prefix and queue the actual YARN key.
        # Allowed yarn keys: applicationType, applicationTags, name, queue, user
        # By default, the application name is collected with the prefix app_name.
    
  2. Verify that all YAML files are valid with the following command:

    /etc/init.d/epoch-collectors configcheck
    
  3. Restart the agent using the following command:

    /etc/init.d/epoch-collectors restart
    
  4. Execute the info command to verify that the integration check has passed:

    /etc/init.d/epoch-collectors info
    

The output of the command should contain a section similar to the following:

    Checks
    ======

      [...]

      yarn
      ----
          - instance #0 [OK]
          - Collected 8 metrics & 0 events
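
If the yarn section is missing or does not report [OK], one thing worth ruling out is an unreachable or mistyped resourcemanager_uri. The sketch below is a minimal, non-authoritative check. It assumes the ResourceManager serves the standard Hadoop REST endpoint /ws/v1/cluster/metrics at that URI. The helper names metrics_url and fetch_cluster_metrics are hypothetical and are not part of the agent.

```python
import json
import urllib.request

# Path of the ResourceManager cluster-metrics endpoint in the Hadoop YARN REST API.
CLUSTER_METRICS_PATH = "/ws/v1/cluster/metrics"


def metrics_url(resourcemanager_uri):
    """Build the cluster-metrics URL from the resourcemanager_uri set in yarn.yaml."""
    return resourcemanager_uri.rstrip("/") + CLUSTER_METRICS_PATH


def fetch_cluster_metrics(resourcemanager_uri, timeout=5):
    """Return the 'clusterMetrics' object reported by the ResourceManager.

    Raises an exception (for example URLError, ValueError, or KeyError) if the
    URI is unreachable or the response is not the expected JSON document.
    """
    url = metrics_url(resourcemanager_uri)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["clusterMetrics"]


# Example call (needs a reachable ResourceManager, so it is left commented out):
# metrics = fetch_cluster_metrics("http://localhost:8088")
# print(metrics.get("activeNodes"), metrics.get("appsRunning"))
```

Running the commented call on the Master Node with the same URI as in yarn.yaml should return a JSON object; a connection error suggests the hostname or port in the configuration needs another look.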

Infrastructure Datasources

| Datasource | Available Aggregations | Unit | Description |
| --- | --- | --- | --- |
| yarn.metrics.apps_submitted | avg max min sum | task | The number of submitted apps |
| yarn.metrics.apps_completed | avg max min sum | task | The number of completed apps |
| yarn.metrics.apps_pending | avg max min sum | task | The number of pending apps |
| yarn.metrics.apps_running | avg max min sum | task | The number of running apps |
| yarn.metrics.apps_failed | avg max min sum | task | The number of failed apps |
| yarn.metrics.apps_killed | avg max min sum | task | The number of killed apps |
| yarn.metrics.reserved_mb | avg max min sum | mebibyte | The size of reserved memory |
| yarn.metrics.available_mb | avg max min sum | mebibyte | The amount of available memory |
| yarn.metrics.allocated_mb | avg max min sum | mebibyte | The amount of allocated memory |
| yarn.metrics.total_mb | avg max min sum | mebibyte | The amount of total memory |
| yarn.metrics.reserved_virtual_cores | avg max min sum | core | The number of reserved virtual cores |
| yarn.metrics.available_virtual_cores | avg max min sum | core | The number of available virtual cores |
| yarn.metrics.allocated_virtual_cores | avg max min sum | core | The number of allocated virtual cores |
| yarn.metrics.total_virtual_cores | avg max min sum | core | The total number of virtual cores |
| yarn.metrics.containers_allocated | avg max min sum | | The number of containers allocated |
| yarn.metrics.containers_reserved | avg max min sum | | The number of containers reserved |
| yarn.metrics.containers_pending | avg max min sum | | The number of containers pending |
| yarn.metrics.total_nodes | avg max min sum | node | The total number of nodes |
| yarn.metrics.active_nodes | avg max min sum | node | The number of active nodes |
| yarn.metrics.lost_nodes | avg max min sum | node | The number of lost nodes |
| yarn.metrics.unhealthy_nodes | avg max min sum | node | The number of unhealthy nodes |
| yarn.metrics.decommissioned_nodes | avg max min sum | node | The number of decommissioned nodes |
| yarn.metrics.rebooted_nodes | avg max min sum | node | The number of rebooted nodes |
| yarn.apps.progress | avg max min sum | percent | The progress of the application as a percent |
| yarn.apps.started_time | avg max min sum | second | The time in which application started (in ms since epoch) |
| yarn.apps.finished_time | avg max min sum | second | The time in which the application finished (in ms since epoch) |
| yarn.apps.elapsed_time | avg max min sum | second | The elapsed time since the application started (in ms) |
| yarn.apps.allocated_mb | avg max min sum | mebibyte | The sum of memory in MB allocated to the applications running containers |
| yarn.apps.allocated_vcores | avg max min sum | core | The sum of virtual cores allocated to the applications running containers |
| yarn.apps.running_containers | avg max min sum | | The number of containers currently running for the application |
| yarn.apps.memory_seconds | avg max min sum | second | The amount of memory the application has allocated (megabyte-seconds) |
| yarn.apps.vcore_seconds | avg max min sum | second | The amount of CPU resources the application has allocated (virtual core-seconds) |
| yarn.node.last_health_update | avg max min sum | millisecond | The last time the node reported its health (in ms since epoch) |
| yarn.node.used_memory_mb | avg max min sum | mebibyte | The total amount of memory currently used on the node (in MB) |
| yarn.node.avail_memory_mb | avg max min sum | mebibyte | The total amount of memory currently available on the node (in MB) |
| yarn.node.used_virtual_cores | avg max min sum | core | The total number of vCores currently used on the node |
| yarn.node.available_virtual_cores | avg max min sum | core | The total number of vCores available on the node |
| yarn.node.num_containers | avg max min sum | | The total number of containers currently running on the node |
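
As a small illustration of how these datasources can be combined, the sketch below derives cluster memory and vCore utilization from the allocated and total values. The function name utilization_percent and the sample readings are hypothetical, and the integration itself is not known to compute these ratios.

```python
def utilization_percent(allocated, total):
    """Return allocated/total as a percentage, or None when total is zero or missing."""
    if not total:
        return None
    return 100.0 * allocated / total


# Hypothetical sample readings, keyed by datasource name from the list above.
sample = {
    "yarn.metrics.allocated_mb": 6144,
    "yarn.metrics.total_mb": 24576,
    "yarn.metrics.allocated_virtual_cores": 3,
    "yarn.metrics.total_virtual_cores": 12,
}

memory_pct = utilization_percent(
    sample["yarn.metrics.allocated_mb"], sample["yarn.metrics.total_mb"]
)
vcore_pct = utilization_percent(
    sample["yarn.metrics.allocated_virtual_cores"],
    sample["yarn.metrics.total_virtual_cores"],
)
print(f"memory: {memory_pct:.1f}%  vcores: {vcore_pct:.1f}%")
```

Returning None for a zero total avoids a division error when a cluster has not yet reported any capacity.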