Infrastructure Integration
Configuration
- Configure the agent by editing /etc/nutanix/epoch-dd-agent/conf.d/hdfs_datanode.yaml and /etc/nutanix/epoch-dd-agent/conf.d/hdfs_namenode.yaml in the collectors.
Examples:
To configure the DataNode agent:
init_config:
instances:
  #
  # The HDFS DataNode check retrieves metrics from the HDFS DataNode's JMX
  # interface. This check must be installed on a HDFS DataNode. The HDFS
  # DataNode JMX URI is composed of the DataNode's hostname and port.
  #
  # The hostname and port can be found in the hdfs-site.xml conf file under
  # the property dfs.datanode.http.address
  #
  - hdfs_datanode_jmx_uri: http://localhost:50075
To configure the NameNode agent:
init_config:
instances:
  #
  # The HDFS NameNode check retrieves metrics from the HDFS NameNode's JMX
  # interface. This check must be installed on the NameNode. The HDFS
  # NameNode JMX URI is composed of the NameNode's hostname and port.
  #
  # The hostname and port can be found in the hdfs-site.xml conf file under
  # the property dfs.http.address or dfs.namenode.http-address
  #
  - hdfs_namenode_jmx_uri: http://localhost:50070
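The comments in both files say that the hostname and port come from hdfs-site.xml. The following is a minimal Python sketch of that lookup, not part of the agent itself. It assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout and a plain `http://` scheme. The helper names `http_address` and `jmx_uri` and the sample XML are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def http_address(hdfs_site_xml, property_names):
    """Return the value of the first listed property found in the XML, or None."""
    root = ET.fromstring(hdfs_site_xml)
    values = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value:
            values[name.strip()] = value.strip()
    # Try the property names in the order given by the caller.
    for name in property_names:
        if name in values:
            return values[name]
    return None


def jmx_uri(address):
    """Build the URI used in the YAML files (assumes plain HTTP)."""
    return "http://" + address


# Illustrative hdfs-site.xml content; the addresses are sample values.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>localhost:50075</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>localhost:50070</value>
  </property>
</configuration>"""

datanode = http_address(SAMPLE, ["dfs.datanode.http.address"])
namenode = http_address(SAMPLE, ["dfs.namenode.http-address", "dfs.http.address"])
print("hdfs_datanode_jmx_uri:", jmx_uri(datanode))
print("hdfs_namenode_jmx_uri:", jmx_uri(namenode))
```

If a property is absent from hdfs-site.xml, `http_address` returns `None`; in that case Hadoop is using its built-in default, and the address has to be confirmed on the node itself.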
- Verify that all YAML files are valid with the following command:
/etc/init.d/epoch-collectors configcheck
- Restart the Agent after configuring both files, using the following command:
/etc/init.d/epoch-collectors restart
- Run the info command to verify that both integration checks have passed:
/etc/init.d/epoch-collectors info
The output of the info command should contain a section similar to the following:
Checks
======
[...]
hdfs_datanode
----------
- instance #0 [OK]
- Collected 8 metrics & 0 events
[...]
hdfs_namenode
----------
- instance #0 [OK]
- Collected 8 metrics & 0 events
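When the verification step needs to be scripted, the info output can be scanned for the two checks. The sketch below is one possible approach in Python. It assumes the output keeps the layout shown above (a check name, a row of dashes, then `- instance #N [STATUS]` lines); the function names `parse_checks` and `failing_checks` are hypothetical, and the sample text is copied from the example output rather than captured from a live collector.

```python
import re

# Matches lines such as "- instance #0 [OK]" and captures the status word.
INSTANCE_RE = re.compile(r"^-\s*instance #(\d+) \[(\w+)\]")


def parse_checks(info_output):
    """Map each check name to the list of its instance statuses."""
    lines = [line.strip() for line in info_output.splitlines()]
    statuses = {}
    current = None
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A check name is the line directly above a row made only of dashes.
        if line and nxt and set(nxt) == {"-"}:
            current = line
            statuses[current] = []
            continue
        match = INSTANCE_RE.match(line)
        if match and current is not None:
            statuses[current].append(match.group(2))
    return statuses


def failing_checks(statuses, expected):
    """Return expected checks that are missing or have a non-OK instance."""
    return [
        name for name in expected
        if not statuses.get(name) or any(s != "OK" for s in statuses[name])
    ]


SAMPLE = """Checks
======
hdfs_datanode
----------
- instance #0 [OK]
- Collected 8 metrics & 0 events
hdfs_namenode
----------
- instance #0 [OK]
- Collected 8 metrics & 0 events
"""

result = parse_checks(SAMPLE)
print(result)
print("failing checks:", failing_checks(result, ["hdfs_datanode", "hdfs_namenode"]))
```

In practice the text passed to `parse_checks` would be the captured output of the info command; a check that is absent from the output is reported as failing, which catches a YAML file that was never picked up.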
Infrastructure Datasources
Datasource | Available Aggregations | Unit | Description |
---|---|---|---|
hdfs.datanode.dfs_remaining | avg max min sum | byte | The remaining disk space left in bytes |
hdfs.datanode.dfs_capacity | avg max min sum | byte | Disk capacity in bytes |
hdfs.datanode.dfs_used | avg max min sum | byte | Disk usage in bytes |
hdfs.datanode.cache_capacity | avg max min sum | byte | Cache capacity in bytes |
hdfs.datanode.cache_used | avg max min sum | byte | Cache used in bytes |
hdfs.datanode.num_failed_volumes | avg max min sum | | Number of failed volumes |
hdfs.datanode.last_volume_failure_date | avg max min sum | millisecond | The date/time of the last volume failure in milliseconds since epoch |
hdfs.datanode.estimated_capacity_lost_total | avg max min sum | byte | The estimated capacity lost in bytes |
hdfs.datanode.num_blocks_cached | avg max min sum | block | The number of blocks cached |
hdfs.datanode.num_blocks_failed_to_cache | avg max min sum | block | The number of blocks that failed to cache |
hdfs.datanode.num_blocks_failed_to_uncache | avg max min sum | block | The number of failed blocks to remove from cache |
hdfs.namenode.capacity_total | avg max min sum | byte | Total disk capacity in bytes |
hdfs.namenode.capacity_used | avg max min sum | byte | Disk usage in bytes |
hdfs.namenode.capacity_remaining | avg max min sum | byte | Remaining disk space left in bytes |
hdfs.namenode.total_load | avg max min sum | | Total load on the file system |
hdfs.namenode.fs_lock_queue_length | avg max min sum | | Lock queue length |
hdfs.namenode.blocks_total | avg max min sum | block | Total number of blocks |
hdfs.namenode.max_objects | avg max min sum | object | Maximum number of files HDFS supports |
hdfs.namenode.files_total | avg max min sum | file | Total number of files |
hdfs.namenode.pending_replication_blocks | avg max min sum | block | Number of blocks pending replication |
hdfs.namenode.under_replicated_blocks | avg max min sum | block | Number of under replicated blocks |
hdfs.namenode.scheduled_replication_blocks | avg max min sum | block | Number of blocks scheduled for replication |
hdfs.namenode.pending_deletion_blocks | avg max min sum | block | Number of pending deletion blocks |
hdfs.namenode.num_live_data_nodes | avg max min sum | node | Total number of live data nodes |
hdfs.namenode.num_dead_data_nodes | avg max min sum | node | Total number of dead data nodes |
hdfs.namenode.num_decom_live_data_nodes | avg max min sum | node | Number of decommissioning live data nodes |
hdfs.namenode.num_decom_dead_data_nodes | avg max min sum | node | Number of decommissioning dead data nodes |
hdfs.namenode.volume_failures_total | avg max min sum | | Total volume failures |
hdfs.namenode.estimated_capacity_lost_total | avg max min sum | byte | Estimated capacity lost in bytes |
hdfs.namenode.num_decommissioning_data_nodes | avg max min sum | node | Number of decommissioning data nodes |
hdfs.namenode.num_stale_data_nodes | avg max min sum | node | Number of stale data nodes |
hdfs.namenode.num_stale_storages | avg max min sum | | Number of stale storages |
hdfs.namenode.missing_blocks | avg max min sum | block | Number of missing blocks |
hdfs.namenode.corrupt_blocks | avg max min sum | block | Number of corrupt blocks |
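Several of these datasources are most useful in combination, for example capacity used as a share of total capacity. The following Python sketch shows that arithmetic on a dictionary keyed by datasource name. The sample values are made up, and both the helper names and the choice of which datasources to flag when non-zero are assumptions of this sketch, not behaviour of the product.

```python
def percent_used(used_bytes, capacity_bytes):
    """Return used capacity as a percentage, or None if capacity is not positive."""
    if capacity_bytes <= 0:
        return None
    return 100.0 * used_bytes / capacity_bytes


# Datasources where a non-zero value is flagged (an assumption of this sketch).
ALERT_ON_NONZERO = (
    "hdfs.namenode.missing_blocks",
    "hdfs.namenode.corrupt_blocks",
    "hdfs.namenode.num_dead_data_nodes",
)


def nonzero_alerts(sample):
    """List the flagged datasources whose value is greater than zero."""
    return [name for name in ALERT_ON_NONZERO if sample.get(name, 0) > 0]


# Hypothetical values, keyed by the datasource names from the table.
SAMPLE = {
    "hdfs.namenode.capacity_total": 1000,
    "hdfs.namenode.capacity_used": 250,
    "hdfs.namenode.missing_blocks": 0,
    "hdfs.namenode.corrupt_blocks": 2,
    "hdfs.namenode.num_dead_data_nodes": 0,
}

usage = percent_used(
    SAMPLE["hdfs.namenode.capacity_used"],
    SAMPLE["hdfs.namenode.capacity_total"],
)
print("NameNode capacity used (%):", usage)
print("datasources needing attention:", nonzero_alerts(SAMPLE))
```

The same `percent_used` helper applies to the DataNode pair hdfs.datanode.dfs_used and hdfs.datanode.dfs_capacity, since both are reported in bytes.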