Infrastructure Integration

Configuration

  1. Configure the agent by editing /etc/nutanix/epoch-dd-agent/conf.d/hdfs_datanode.yaml and /etc/nutanix/epoch-dd-agent/conf.d/hdfs_namenode.yaml on the collectors.

Examples:

To configure the agent for a DataNode:

        init_config:

        instances:
          #
          # The HDFS DataNode check retrieves metrics from the HDFS DataNode's JMX
          # interface. This check must be installed on an HDFS DataNode. The HDFS
          # DataNode JMX URI is composed of the DataNode's hostname and port.
          #
          # The hostname and port can be found in the hdfs-site.xml conf file under
          # the property dfs.datanode.http.address
          #
          - hdfs_datanode_jmx_uri: http://localhost:50075

To configure the agent for the NameNode:

        init_config:

        instances:
          #
          # The HDFS NameNode check retrieves metrics from the HDFS NameNode's JMX
          # interface. This check must be installed on the NameNode. The HDFS
          # NameNode JMX URI is composed of the NameNode's hostname and port.
          #
          # The hostname and port can be found in the hdfs-site.xml conf file under
          # the property dfs.http.address or dfs.namenode.http-address
          #
          - hdfs_namenode_jmx_uri: http://localhost:50070
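
The comments in both examples point to hdfs-site.xml as the source of the hostname and port. As a minimal sketch (not part of the collectors), the following Python snippet reads those properties and prints the matching JMX URIs. It assumes the usual Hadoop layout of `<configuration>` containing `<property>` elements with `<name>` and `<value>` children, and plain HTTP. The helper names `read_hdfs_properties` and `jmx_uri`, the inline sample XML, and the host `namenode.example.com` are hypothetical and used for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; in practice, read your cluster's own hdfs-site.xml.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>namenode.example.com:50070</value>
  </property>
</configuration>
"""


def read_hdfs_properties(xml_text):
    """Return a {name: value} dict of the properties in an hdfs-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def jmx_uri(props, *property_names, local_host="localhost"):
    """Build an http://host:port URI from the first listed property that is set."""
    for name in property_names:
        address = props.get(name)
        if not address:
            continue
        host, sep, port = address.rpartition(":")
        if not sep:
            continue  # not in host:port form; try the next property
        # A wildcard bind address is not a usable target, so fall back to local_host.
        if host in ("", "0.0.0.0"):
            host = local_host
        return f"http://{host}:{port}"
    return None


props = read_hdfs_properties(SAMPLE_HDFS_SITE)
datanode_uri = jmx_uri(props, "dfs.datanode.http.address")
namenode_uri = jmx_uri(props, "dfs.namenode.http-address", "dfs.http.address")
print("hdfs_datanode_jmx_uri:", datanode_uri)
print("hdfs_namenode_jmx_uri:", namenode_uri)
```

The printed values can then be copied into the `instances` section of the corresponding YAML file. Replacing a wildcard bind address with `localhost` is a choice made for this sketch, on the basis that each check runs on the node it monitors.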

  2. Verify that all YAML files are valid by running the following command:

    /etc/init.d/epoch-collectors configcheck
    
  3. After configuring both files, restart the agent with the following command:

    /etc/init.d/epoch-collectors restart
    
  4. Run the info command to verify that the integration checks for both files have passed:

    /etc/init.d/epoch-collectors info
    

The output of the info command should contain a section similar to the following:

    Checks
    ======
      [...]
      hdfs_datanode
      ----------
          - instance #0 [OK]
          - Collected 8 metrics & 0 events

      [...]
      hdfs_namenode
      ----------
          - instance #0 [OK]
          - Collected 8 metrics & 0 events
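
To check this output from a script rather than by eye, a sketch like the following could scan the text for the two HDFS checks and report the status of each instance. It assumes the layout shown above: a check name, a dashed underline, then `- instance #N [STATUS]` lines. `parse_check_status` is a hypothetical helper, not part of the collectors, and the sample text is copied from the example output.

```python
import re

# Sample text copied from the example output above.
SAMPLE_INFO = """\
Checks
======
  [...]
  hdfs_datanode
  ----------
      - instance #0 [OK]
      - Collected 8 metrics & 0 events

  [...]
  hdfs_namenode
  ----------
      - instance #0 [OK]
      - Collected 8 metrics & 0 events
"""

INSTANCE_RE = re.compile(r"-\s*instance #(\d+) \[(\w+)\]")


def parse_check_status(info_text):
    """Map each check name to a list of (instance number, status) tuples."""
    statuses = {}
    current = None
    lines = info_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A check name is a non-empty line followed by a dashed underline.
        if stripped and following and set(following) == {"-"}:
            current = stripped
            statuses[current] = []
            continue
        match = INSTANCE_RE.search(stripped)
        if match and current is not None:
            statuses[current].append((int(match.group(1)), match.group(2)))
    return statuses


statuses = parse_check_status(SAMPLE_INFO)
for check in ("hdfs_datanode", "hdfs_namenode"):
    instances = statuses.get(check, [])
    healthy = bool(instances) and all(status == "OK" for _, status in instances)
    print(check, "OK" if healthy else "needs attention")
```

In practice the text would come from the info command itself, for example via `subprocess.run(["/etc/init.d/epoch-collectors", "info"], capture_output=True, text=True).stdout`. A check that is missing from the output is treated as needing attention, since a missing section usually means the YAML file was not picked up.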

Infrastructure Datasources

| Datasource | Available Aggregations | Unit | Description |
|---|---|---|---|
| hdfs.datanode.dfs_remaining | avg, max, min, sum | byte | Remaining disk space in bytes |
| hdfs.datanode.dfs_capacity | avg, max, min, sum | byte | Disk capacity in bytes |
| hdfs.datanode.dfs_used | avg, max, min, sum | byte | Disk usage in bytes |
| hdfs.datanode.cache_capacity | avg, max, min, sum | byte | Cache capacity in bytes |
| hdfs.datanode.cache_used | avg, max, min, sum | byte | Cache used in bytes |
| hdfs.datanode.num_failed_volumes | avg, max, min, sum | | Number of failed volumes |
| hdfs.datanode.last_volume_failure_date | avg, max, min, sum | millisecond | The date/time of the last volume failure in milliseconds since epoch |
| hdfs.datanode.estimated_capacity_lost_total | avg, max, min, sum | byte | The estimated capacity lost in bytes |
| hdfs.datanode.num_blocks_cached | avg, max, min, sum | block | The number of blocks cached |
| hdfs.datanode.num_blocks_failed_to_cache | avg, max, min, sum | block | The number of blocks that failed to cache |
| hdfs.datanode.num_blocks_failed_to_uncache | avg, max, min, sum | block | The number of blocks that failed to be removed from the cache |
| hdfs.namenode.capacity_total | avg, max, min, sum | byte | Total disk capacity in bytes |
| hdfs.namenode.capacity_used | avg, max, min, sum | byte | Disk usage in bytes |
| hdfs.namenode.capacity_remaining | avg, max, min, sum | byte | Remaining disk space in bytes |
| hdfs.namenode.total_load | avg, max, min, sum | | Total load on the file system |
| hdfs.namenode.fs_lock_queue_length | avg, max, min, sum | | Lock queue length |
| hdfs.namenode.blocks_total | avg, max, min, sum | block | Total number of blocks |
| hdfs.namenode.max_objects | avg, max, min, sum | object | Maximum number of files HDFS supports |
| hdfs.namenode.files_total | avg, max, min, sum | file | Total number of files |
| hdfs.namenode.pending_replication_blocks | avg, max, min, sum | block | Number of blocks pending replication |
| hdfs.namenode.under_replicated_blocks | avg, max, min, sum | block | Number of under-replicated blocks |
| hdfs.namenode.scheduled_replication_blocks | avg, max, min, sum | block | Number of blocks scheduled for replication |
| hdfs.namenode.pending_deletion_blocks | avg, max, min, sum | block | Number of blocks pending deletion |
| hdfs.namenode.num_live_data_nodes | avg, max, min, sum | node | Total number of live data nodes |
| hdfs.namenode.num_dead_data_nodes | avg, max, min, sum | node | Total number of dead data nodes |
| hdfs.namenode.num_decom_live_data_nodes | avg, max, min, sum | node | Number of decommissioning live data nodes |
| hdfs.namenode.num_decom_dead_data_nodes | avg, max, min, sum | node | Number of decommissioning dead data nodes |
| hdfs.namenode.volume_failures_total | avg, max, min, sum | | Total volume failures |
| hdfs.namenode.estimated_capacity_lost_total | avg, max, min, sum | byte | Estimated capacity lost in bytes |
| hdfs.namenode.num_decommissioning_data_nodes | avg, max, min, sum | node | Number of decommissioning data nodes |
| hdfs.namenode.num_stale_data_nodes | avg, max, min, sum | node | Number of stale data nodes |
| hdfs.namenode.num_stale_storages | avg, max, min, sum | | Number of stale storages |
| hdfs.namenode.missing_blocks | avg, max, min, sum | block | Number of missing blocks |
| hdfs.namenode.corrupt_blocks | avg, max, min, sum | block | Number of corrupt blocks |
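
The datasources above report raw byte values and counts. As an illustration of how they might be combined, the sketch below computes DataNode disk utilization and the share of under-replicated blocks from a set of sample readings. The sample numbers and the helper `ratio` are invented for this example, and how the readings are fetched from Epoch is outside the scope of the sketch.

```python
# Hypothetical readings keyed by datasource name (values invented for the example).
sample = {
    "hdfs.datanode.dfs_capacity": 500 * 1024**3,  # 500 GiB
    "hdfs.datanode.dfs_used": 125 * 1024**3,      # 125 GiB
    "hdfs.namenode.blocks_total": 20000,
    "hdfs.namenode.under_replicated_blocks": 50,
}


def ratio(numerator, denominator):
    """Return numerator / denominator as a percentage, or None if undefined."""
    if not denominator:
        return None
    return 100.0 * numerator / denominator


disk_used_pct = ratio(
    sample["hdfs.datanode.dfs_used"],
    sample["hdfs.datanode.dfs_capacity"],
)
under_replicated_pct = ratio(
    sample["hdfs.namenode.under_replicated_blocks"],
    sample["hdfs.namenode.blocks_total"],
)
print(f"DataNode disk used: {disk_used_pct:.1f}%")
print(f"Under-replicated blocks: {under_replicated_pct:.2f}%")
```

Returning `None` for a zero denominator keeps a node that has not yet reported its capacity from raising a division error.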