Infrastructure Integration¶
Configuration¶
- Configure the agent by editing /etc/nutanix/epoch-dd-agent/conf.d/mapreduce.yaml on the collectors.
Example:

instances:
  #
  # The MapReduce check retrieves metrics from YARN's ResourceManager. This
  # check must be run from the Master Node, and the ResourceManager URI must
  # be specified below. The ResourceManager URI is composed of the
  # ResourceManager's hostname and port.
  #
  # The ResourceManager hostname can be found in the yarn-site.xml conf file
  # under the property yarn.resourcemanager.address.
  #
  # The ResourceManager port can be found in the yarn-site.xml conf file under
  # the property yarn.resourcemanager.webapp.address.
  #
  - resourcemanager_uri: http://localhost:8088

    # Required: a friendly name for the cluster.
    # cluster_name: MyMapReduceCluster

    # Set to true to collect histograms on the elapsed time of
    # map and reduce tasks (default: false).
    # collect_task_metrics: false

    # Optional tags to be applied to every emitted metric.
    # tags:
    #   - key:value
    #   - instance:production

init_config:
  #
  # Optional metrics can be specified for counters. For more information on
  # counters, visit the MapReduce documentation page:
  # https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html#Job_Counters_API
  #
  general_counters:
    #
    # general_counters are job-agnostic metrics that create a metric for each
    # specified counter.
    #
    # - counter_group_name: 'org.apache.hadoop.mapreduce.TaskCounter'
    #   counters:
    #     - counter_name: 'MAP_INPUT_RECORDS'
    #     - counter_name: 'MAP_OUTPUT_RECORDS'
    #     - counter_name: 'REDUCE_INPUT_RECORDS'
    #     - counter_name: 'REDUCE_OUTPUT_RECORDS'
    #
    # Additional counters can be specified as follows:
    #
    # - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
    #   counters:
    #     - counter_name: 'HDFS_BYTES_READ'

  job_specific_counters:
    #
    # job_specific_counters are metrics that are specific to a particular job.
    # The following example specifies counters for the jobs 'Foo' and 'Bar'.
    #
    # - job_name: 'Foo'
    #   metrics:
    #     - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
    #       counters:
    #         - counter_name: 'FILE_BYTES_WRITTEN'
    #         - counter_name: 'HDFS_BYTES_WRITTEN'
    #     - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
    #       counters:
    #         - counter_name: 'HDFS_BYTES_READ'
    # - job_name: 'Bar'
    #   metrics:
    #     - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
    #       counters:
    #         - counter_name: 'FILE_BYTES_WRITTEN'
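The two yarn-site.xml properties referenced in the comments above can also be read programmatically when filling in resourcemanager_uri. A minimal sketch, assuming the standard Hadoop-style XML configuration layout; the sample host and port values below are illustrative, not taken from any real cluster:

```python
import xml.etree.ElementTree as ET

def hadoop_property(conf_xml: str, name: str):
    """Return the value of a named property from Hadoop-style XML text, or None."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Illustrative yarn-site.xml fragment (in a real deployment, read the file
# from the Hadoop configuration directory instead):
YARN_SITE = """<configuration>
  <property><name>yarn.resourcemanager.address</name><value>rm-host:8032</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>rm-host:8088</value></property>
</configuration>"""

webapp = hadoop_property(YARN_SITE, "yarn.resourcemanager.webapp.address")
print("resourcemanager_uri: http://" + webapp)  # → resourcemanager_uri: http://rm-host:8088
```

Note that yarn.resourcemanager.webapp.address (the web UI/REST port, 8088 by default) is the one the check needs, not the RPC address.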
- Check that all YAML files are valid with the following command:
  /etc/init.d/epoch-collectors configcheck
- Restart the agent using the following command:
  /etc/init.d/epoch-collectors restart
- Execute the info command to verify that the integration check has passed:
  /etc/init.d/epoch-collectors info
  The output of the command should contain a section similar to the following:
    Checks
    ======
    [...]
    mapreduce
    ---------
      - instance #0 [OK]
      - Collected 8 metrics & 0 events
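Under the hood, the check scrapes YARN's ResourceManager REST API. When an instance reports an error instead of [OK], querying that API by hand can isolate whether the ResourceManager is reachable. A rough sketch; the /ws/v1/cluster/apps endpoint and the elapsedTime field come from Hadoop's ResourceManager REST documentation, and the sample response below is illustrative:

```python
import json
from urllib.request import urlopen  # used only by the commented-out live query

RESOURCEMANAGER_URI = "http://localhost:8088"  # should match resourcemanager_uri above

def running_apps(raw_json: str):
    """Extract (id, elapsedTime) pairs from a /ws/v1/cluster/apps response body."""
    apps = json.loads(raw_json).get("apps") or {}
    return [(a["id"], a["elapsedTime"]) for a in apps.get("app", [])]

# Live query (requires a reachable ResourceManager):
# raw = urlopen(RESOURCEMANAGER_URI + "/ws/v1/cluster/apps?states=RUNNING").read()

# Illustrative response in the documented JSON shape:
raw = '{"apps": {"app": [{"id": "application_1453738555560_0001", "elapsedTime": 4200}]}}'
for app_id, elapsed_ms in running_apps(raw):
    print(app_id, "elapsed:", elapsed_ms, "ms")
```

If this query fails, fix ResourceManager connectivity (host, port, firewall) before debugging the agent configuration itself.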
Infrastructure Datasources¶
Datasource | Available Aggregations | Unit | Description |
---|---|---|---|
mapreduce.job.elapsed_time.max | avg max min sum | millisecond | Max elapsed time since the application started |
mapreduce.job.elapsed_time.avg | avg max min sum | millisecond | Average elapsed time since the application started |
mapreduce.job.elapsed_time.median | avg max min sum | millisecond | Median elapsed time since the application started |
mapreduce.job.elapsed_time.95percentile | avg max min sum | millisecond | 95th percentile elapsed time since the application started |
mapreduce.job.elapsed_time.count | avg max min sum | | Number of times the elapsed time was sampled |
mapreduce.job.maps_total | avg max min sum | task/second | Total number of maps |
mapreduce.job.maps_completed | avg max min sum | task/second | Number of completed maps |
mapreduce.job.reduces_total | avg max min sum | task/second | Total number of reduces |
mapreduce.job.reduces_completed | avg max min sum | task/second | Number of completed reduces |
mapreduce.job.maps_pending | avg max min sum | task/second | Number of pending maps |
mapreduce.job.maps_running | avg max min sum | task/second | Number of running maps |
mapreduce.job.reduces_pending | avg max min sum | task/second | Number of pending reduces |
mapreduce.job.reduces_running | avg max min sum | task/second | Number of running reduces |
mapreduce.job.new_reduce_attempts | avg max min sum | task/second | Number of new reduce attempts |
mapreduce.job.running_reduce_attempts | avg max min sum | task/second | Number of running reduce attempts |
mapreduce.job.failed_reduce_attempts | avg max min sum | task/second | Number of failed reduce attempts |
mapreduce.job.killed_reduce_attempts | avg max min sum | task/second | Number of killed reduce attempts |
mapreduce.job.successful_reduce_attempts | avg max min sum | task/second | Number of successful reduce attempts |
mapreduce.job.new_map_attempts | avg max min sum | task/second | Number of new map attempts |
mapreduce.job.running_map_attempts | avg max min sum | task/second | Number of running map attempts |
mapreduce.job.failed_map_attempts | avg max min sum | task/second | Number of failed map attempts |
mapreduce.job.killed_map_attempts | avg max min sum | task/second | Number of killed map attempts |
mapreduce.job.successful_map_attempts | avg max min sum | task/second | Number of successful map attempts |
mapreduce.job.counter.reduce_counter_value | avg max min sum | task/second | Counter value of reduce tasks |
mapreduce.job.counter.map_counter_value | avg max min sum | task/second | Counter value of map tasks |
mapreduce.job.counter.total_counter_value | avg max min sum | task/second | Counter value of all tasks |
mapreduce.job.map.task.elapsed_time.max | avg max min sum | millisecond | Max elapsed time of all map tasks |
mapreduce.job.map.task.elapsed_time.avg | avg max min sum | millisecond | Average elapsed time of all map tasks |
mapreduce.job.map.task.elapsed_time.median | avg max min sum | millisecond | Median elapsed time of all map tasks |
mapreduce.job.map.task.elapsed_time.95percentile | avg max min sum | millisecond | 95th percentile elapsed time of all map tasks |
mapreduce.job.map.task.elapsed_time.count | avg max min sum | | Number of times the map tasks' elapsed time was sampled |
mapreduce.job.reduce.task.elapsed_time.max | avg max min sum | millisecond | Max elapsed time of all reduce tasks |
mapreduce.job.reduce.task.elapsed_time.avg | avg max min sum | millisecond | Average elapsed time of all reduce tasks |
mapreduce.job.reduce.task.elapsed_time.median | avg max min sum | millisecond | Median elapsed time of all reduce tasks |
mapreduce.job.reduce.task.elapsed_time.95percentile | avg max min sum | millisecond | 95th percentile elapsed time of all reduce tasks |
mapreduce.job.reduce.task.elapsed_time.count | avg max min sum | | Number of times the reduce tasks' elapsed time was sampled |