Hadoop

Hadoop monitoring in DESK provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, so you can analyze both current and historical Hadoop performance in depth.

Prerequisites

  • DESK OneAgent version 1.103+
  • For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster
  • Linux OS
  • Hadoop version 2.4.1+
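
The version minimums above are easy to get wrong when compared as plain strings ("1.99" sorts after "1.103" as text, but is the older version). The following is a minimal sketch of a component-wise comparison, assuming version strings consist only of dot-separated integers; `parse_version` and `meets_minimum` are hypothetical helper names, not part of DESK or Hadoop, and suffixes such as "-SNAPSHOT" are not handled.

```python
def parse_version(text):
    """Turn '1.103' or '2.4.1' into a tuple of ints, e.g. (2, 4, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum.

    Shorter versions are padded with zeros, so '2.4' compares as '2.4.0'.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimums taken from the prerequisites list.
MIN_ONEAGENT = "1.103"
MIN_HADOOP = "2.4.1"

print(meets_minimum("1.99", MIN_ONEAGENT))  # False, because 99 < 103
print(meets_minimum("2.7.3", MIN_HADOOP))   # True
```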

Enabling Hadoop monitoring globally

  1. In the navigation menu, select Settings.
  2. Select Monitoring > Monitored technologies.
  3. On the Supported technologies tab, find the Hadoop entry.
  4. Set the Hadoop switch to the On position.

With Hadoop monitoring enabled globally, DESK automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.

Analyzing your Hadoop components

  1. In the navigation menu, select Technologies.
  2. Click the Hadoop tile on the Technology overview page.
  3. Click an individual Hadoop component in the Process group table to view metrics and a timeline chart specific to that component.

Enhanced insights for HDFS

Viewing NameNode metrics

  1. In the Process group table, select a NameNode process group.
  2. Click the Process group details button.
  3. On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and DataNode health.
  4. Further down the page, you’ll find a number of cluster-specific charts.

Viewing DataNode metrics

  1. In the Process group table, select a DataNode process group.
  2. Click the Process group details button.
  3. On the Process group details page, select the Technology-specific metrics tab and select the DataNode.
  4. Select the Hadoop HDFS metrics tab.

Enhanced insights for MapReduce

Viewing ResourceManager metrics

  1. Expand the Details section of the ResourceManager process group.
  2. Click the Process group details button.
  3. On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
  4. Further down the page, you’ll find a number of ResourceManager-specific charts.

Viewing MRAppMaster metrics

  1. Expand the Details section of an MRAppMaster process group.
  2. Click the Process group details button.
  3. On the Process group details page, click the Technology-specific metrics tab and select the MRAppMaster process.
  4. Click the Hadoop MapReduce tab.

Viewing NodeManager metrics

  1. Expand the Details section of a NodeManager process group.
  2. Click the Process group details button.
  3. On the Process group details page, click the Technology-specific metrics tab and select a NodeManager process.
  4. Click the Hadoop MapReduce tab.

NameNode metrics

Metric | Description
Total | Raw capacity of DataNodes in bytes.
Used | Used capacity across all DataNodes in bytes.
Remaining | Remaining capacity in bytes.
Total load | The number of connections.
Total | The number of allocated blocks in the system.
Pending deletion | The number of blocks pending deletion.
Files total | Total number of files.
Pending replication | The number of blocks pending replication.
Under replicated | The number of under-replicated blocks.
Scheduled replication | The number of blocks scheduled for replication.
Live | The number of live DataNodes.
Dead | The number of dead DataNodes.
Decommission Live | The number of decommissioning live DataNodes.
Decommission Dead | The number of decommissioning dead DataNodes.
Usage – Volume failures total | Total volume failures.
Estimated capacity lost total | Estimated capacity lost in bytes.
Decommission Decommissioning | The number of decommissioning DataNodes.
Stale | The number of stale DataNodes.
Blocks missing and corrupt – Missing | The number of missing blocks.
Capacity | Cache capacity in bytes.
Used | Cache used in bytes.
Blocks missing and corrupt – Corrupt | The number of corrupt blocks.
Capacity in bytes – Used, non-DFS | Capacity used by non-DFS data in bytes.
Appended | The number of files appended.
Created | The number of files and directories created by create or mkdir operations.
Deleted | The number of files and directories deleted by delete or rename operations.
Renamed | The number of rename operations.
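
The raw NameNode values are often easier to read as ratios. The sketch below shows one way to derive capacity and block-health percentages from the Total, Used, and Remaining capacity values and the block counts listed above. The function name `hdfs_summary` and the dictionary keys are hypothetical names chosen for this example, not DESK or Hadoop identifiers, and the sample numbers are made up.

```python
def hdfs_summary(m):
    """Derive a few ratios from NameNode metric values.

    `m` is a plain dict; its keys are hypothetical names for this sketch.
    """
    total = m["capacity_total_bytes"]
    blocks = m["blocks_total"]
    return {
        # Guard against division by zero on an empty or unreported cluster.
        "used_pct": 100.0 * m["capacity_used_bytes"] / total if total else 0.0,
        "remaining_pct": 100.0 * m["capacity_remaining_bytes"] / total if total else 0.0,
        "under_replicated_pct": 100.0 * m["blocks_under_replicated"] / blocks if blocks else 0.0,
        # Any missing or corrupt block means some data may be unreadable.
        "has_data_at_risk": m["blocks_missing"] > 0 or m["blocks_corrupt"] > 0,
    }


sample = {  # made-up values for illustration only
    "capacity_total_bytes": 1000,
    "capacity_used_bytes": 250,
    "capacity_remaining_bytes": 700,
    "blocks_total": 200,
    "blocks_under_replicated": 5,
    "blocks_missing": 0,
    "blocks_corrupt": 1,
}
print(hdfs_summary(sample))
```

In this sample, used and remaining capacity do not add up to the total; the difference would correspond to non-DFS usage, which is reported as a separate metric.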

DataNode metrics

Metric | Description
Live | The number of live DataNodes.
Dead | The number of dead DataNodes.
Decommission Live | The number of decommissioning live DataNodes.
Decommission Dead | The number of decommissioning dead DataNodes.
Decommission Decommissioning | The number of decommissioning DataNodes.
Stale | The number of stale DataNodes.
Capacity | Cache capacity in bytes.
Used | Cache used in bytes.
Capacity | Disk capacity in bytes.
DfsUsed | Disk usage in bytes.
Cached | The number of blocks cached.
Failed to cache | The number of blocks that failed to cache.
Failed to uncache | The number of blocks that failed to be removed from the cache.
Number of failed volumes | The number of volume failures that have occurred.
Capacity in bytes – Remaining | The remaining disk space in bytes.
Blocks | The number of blocks read from the DataNode.
Removed | The number of blocks removed.
Replicated | The number of blocks replicated.
Verified | The number of blocks verified.
Blocks | The number of blocks written to the DataNode.
Bytes | The number of bytes read from the DataNode.
Bytes | The number of bytes written to the DataNode.
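
If you export the block and byte values and they turn out to be cumulative counters (an assumption; the descriptions above do not say whether values are totals or rates), a throughput figure can be derived from two samples. The following is a minimal sketch with a hypothetical helper name, `rate_per_second`.

```python
def rate_per_second(previous, current, interval_seconds):
    """Average per-second rate between two samples of a cumulative counter.

    Returns None when the counter went backwards (for example after a
    process restart) or when the interval is not positive.
    """
    if interval_seconds <= 0 or current < previous:
        return None
    return (current - previous) / interval_seconds


# Made-up samples: bytes read at two points in time, 60 seconds apart.
print(rate_per_second(1000, 4000, 60))
```

Returning None instead of a negative number keeps a counter reset from showing up as a misleading dip in a chart.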

ResourceManager metrics

Metric | Description
Active | Number of active NodeManagers.
Decommissioned | Number of decommissioned NodeManagers.
Lost | Number of lost NodeManagers (no heartbeats).
Rebooted | Number of rebooted NodeManagers.
Unhealthy | Number of unhealthy NodeManagers.
Allocated | Number of allocated containers.
Allocated | Allocated memory in bytes.
Allocated | Allocated CPU in virtual cores.
Completed | Number of successfully completed applications.
Failed | Number of failed applications.
Killed | Number of killed applications.
Pending | Number of pending applications.
Running | Number of running applications.
Submitted | Number of submitted applications.
Available | Amount of available memory in bytes.
Available | Available CPU in virtual cores.
Pending | Amount of pending memory resource requests in bytes that are not yet fulfilled by the scheduler.
Pending | Pending CPU allocation requests in virtual cores that are not yet fulfilled by the scheduler.
Reserved | Amount of reserved memory in bytes.
Reserved | Reserved CPU in virtual cores.
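
The Allocated, Available, and Pending values for memory and virtual cores can be combined into a simple utilization figure. The sketch below assumes that cluster capacity equals allocated plus available, which is a simplification made for this example (reserved resources are ignored); the function names are hypothetical and the sample numbers are made up.

```python
def utilization_pct(allocated, available):
    """Share of capacity in use, assuming capacity = allocated + available."""
    capacity = allocated + available
    return 100.0 * allocated / capacity if capacity else 0.0


def is_backlogged(pending, available):
    """True when outstanding requests exceed what is currently free."""
    return pending > available


GIB = 2**30
# Made-up values: 6 GiB allocated, 2 GiB available, 3 GiB pending.
print(utilization_pct(6 * GIB, 2 * GIB))
print(is_backlogged(3 * GIB, 2 * GIB))
```

The same two functions apply to virtual cores, since they only depend on the allocated, available, and pending counts.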