Hadoop

Hadoop monitoring in DESK provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, providing you with in-depth Hadoop performance analysis of both current and historical data.

Prerequisites

DESK OneAgent version 1.103+
For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster
Linux OS
Hadoop version 2.4.1+
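
The version minimums above are easy to misjudge when compared as plain strings ("2.10.0" sorts before "2.4.1" alphabetically). The following is a minimal sketch of a numeric comparison, assuming purely numeric dotted version strings. The helper names `parse_version` and `meets_minimum` are hypothetical and are not part of DESK or Hadoop.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.4.1' into a tuple of ints.

    Assumption: the string contains only digits and dots (no vendor suffix).
    """
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '1.103' and '1.103.0' compare equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Minimums taken from the prerequisites list.
MIN_ONEAGENT = "1.103"
MIN_HADOOP = "2.4.1"
```

For example, `meets_minimum("2.10.0", MIN_HADOOP)` is true, while `meets_minimum("1.99", MIN_ONEAGENT)` is false because 99 is less than 103.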

Enabling Hadoop monitoring globally

In the navigation menu, select Settings.
Select Monitoring > Monitored technologies.
On the Supported technologies tab, find the Hadoop entry.
Set the Hadoop switch to the On position.

With Hadoop monitoring enabled globally, DESK automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.

Analyzing your Hadoop components

In the navigation menu, select Technologies.
Click the Hadoop tile on the Technology overview page.
Click an individual Hadoop component in the Process group table to view metrics and a timeline chart specific to that component.

Enhanced insights for HDFS

Viewing NameNode metrics

In the Process group table, select a NameNode process group.
Click the Process group details button.
On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and data-node health.
Further down the page, you’ll find a number of cluster-specific charts.

Viewing DataNode metrics

In the Process group table, select a DataNode process group.
Click the Process group details button.
On the Process group details page, select the Technology-specific metrics tab and select the DataNode.
Select the Hadoop HDFS metrics tab.

Enhanced insights for MapReduce

Viewing ResourceManager metrics

Expand the Details section of the ResourceManager process group.
Click the Process group details button.
On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
Further down the page, you’ll find a number of ResourceManager-specific charts.

Viewing MRAppMaster metrics

Expand the Details section of an MRAppMaster process group.
Click the Process group details button.
On the Process group details page, click the Technology-specific metrics tab and select the MRAppMaster process.
Click the Hadoop MapReduce tab.

Viewing NodeManager metrics

Expand the Details section of the NodeManager process group.
Click the Process group details button.
On the Process group details page, click the Technology-specific metrics tab and select a NodeManager process.
Click the Hadoop MapReduce tab.

NameNode metrics

Metric	Description
Total	Raw capacity of DataNodes in bytes.
Used	Used capacity across all DataNodes in bytes.
Remaining	Remaining capacity in bytes.
Total load	The number of connections.
Total	The number of allocated blocks in the system.
Pending deletion	The number of blocks pending deletion.
Files total	Total number of files.
Pending replication	The number of blocks waiting to be replicated.
Under replicated	The number of under-replicated blocks.
Scheduled replication	The number of blocks scheduled for replication.
Live	The number of live DataNodes.
Dead	The number of dead DataNodes.
Decommission Live	The number of decommissioning live DataNodes.
Decommission Dead	The number of decommissioning dead DataNodes.
Usage – Volume failures total	Total volume failures.
Estimated capacity lost total	Estimated capacity lost in bytes.
Decommission Decommissioning	The number of DataNodes that are currently decommissioning.
Stale	The number of stale DataNodes.
Blocks missing and corrupt – Missing	The number of missing blocks.
Capacity	Cache capacity in bytes.
Used	Cache used in bytes.
Blocks missing and corrupt – Corrupt	The number of corrupt blocks.
Capacity in bytes – Used, non-DFS	Capacity used, non-DFS in bytes.
Appended	The number of files appended.
Created	The number of files and directories created by create or mkdir operations.
Deleted	The number of files and directories deleted by delete or rename operations.
Renamed	The number of rename operations.
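
To cross-check the capacity and block figures in this table against the NameNode itself, the values can be read from the NameNode's JMX output and reduced to a few health indicators. This is a minimal sketch, not a description of how DESK collects its data. It assumes the JSON shape served by Hadoop's `/jmx` servlet (a top-level `beans` list) and the `FSNamesystem` attribute names shown in the code. The sample payload is made up for illustration, and `summarize_hdfs` is a hypothetical helper.

```python
# Assumed bean name for the NameNode's file system metrics.
FS_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"


def find_bean(jmx_payload, bean_name):
    """Return the bean with the given name from a /jmx-style payload, or None."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return None


def summarize_hdfs(jmx_payload):
    """Reduce FSNamesystem attributes to a handful of capacity and block figures."""
    bean = find_bean(jmx_payload, FS_BEAN)
    if bean is None:
        raise KeyError(f"bean not found: {FS_BEAN}")
    total = bean["CapacityTotal"]
    used = bean["CapacityUsed"]
    # Guard against an empty or still-starting cluster reporting zero capacity.
    used_pct = 100.0 * used / total if total else 0.0
    return {
        "used_pct": round(used_pct, 2),
        "remaining_bytes": bean["CapacityRemaining"],
        "under_replicated": bean["UnderReplicatedBlocks"],
        "blocks_ok": bean["MissingBlocks"] == 0 and bean["CorruptBlocks"] == 0,
    }


# Illustrative payload only; the numbers are invented.
SAMPLE = {
    "beans": [
        {
            "name": FS_BEAN,
            "CapacityTotal": 4_000_000_000_000,
            "CapacityUsed": 1_000_000_000_000,
            "CapacityRemaining": 2_800_000_000_000,
            "UnderReplicatedBlocks": 12,
            "MissingBlocks": 0,
            "CorruptBlocks": 0,
        }
    ]
}

summary = summarize_hdfs(SAMPLE)
```

Keeping the parsing separate from any HTTP call means the same function can be run against a saved JMX dump when the NameNode is not reachable.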

DataNode metrics

Metric	Description
Live	The number of live DataNodes.
Dead	The number of dead DataNodes.
Decommission Live	The number of decommissioning live DataNodes.
Decommission Dead	The number of decommissioning dead DataNodes.
Decommission Decommissioning	The number of DataNodes that are currently decommissioning.
Stale	The number of stale DataNodes.
Capacity	Cache capacity in bytes.
Used	Cache used in bytes.
Capacity	Disk capacity in bytes.
DfsUsed	Disk usage in bytes.
Cached	The number of blocks cached.
Failed to cache	The number of blocks that failed to cache.
Failed to uncache	The number of blocks that failed to be removed from cache.
Number of failed volumes	The number of volume failures that have occurred.
Capacity in bytes – Remaining	The remaining disk space left in bytes.
Blocks	The number of blocks read from DataNode.
Removed	The number of blocks removed.
Replicated	The number of blocks replicated.
Verified	The number of blocks verified.
Blocks	The number of blocks written to DataNode.
Bytes	The number of bytes read from DataNode.
Bytes	The number of bytes written to DataNode.
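
Block and byte counts like the ones above are often exposed by a DataNode as ever-increasing counters, so throughput has to be derived from two samples. The following is a minimal sketch of that calculation. It assumes cumulative counters that restart from zero when the DataNode process restarts; whether DESK charts these values as totals or as rates is not stated here. `rate_per_second` is a hypothetical helper.

```python
def rate_per_second(previous, current, interval_seconds):
    """Derive a per-second rate from two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # The counter went backwards, so assume the process restarted and
        # the counter began again from zero: everything in `current` is new.
        delta = current
    return delta / interval_seconds


# Two samples of a bytes-read counter taken 60 seconds apart (invented values).
read_rate = rate_per_second(1_000, 4_000, 60)
# Same interval, but the counter was reset between the samples.
rate_after_restart = rate_per_second(5_000, 600, 60)
```

Treating a backwards step as a restart slightly undercounts the bytes transferred just before the restart, which is usually preferable to reporting a negative rate.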

ResourceManager metrics

Metric	Description
Active	Number of active NodeManagers.
Decommissioned	Number of decommissioned NodeManagers.
Lost	Number of lost NodeManagers (no heartbeats received).
Rebooted	Number of rebooted NodeManagers.
Unhealthy	Number of unhealthy NodeManagers.
Allocated	Number of allocated containers.
Allocated	Allocated memory in bytes.
Allocated	Number of allocated CPU virtual cores.
Completed	Number of successfully completed applications.
Failed	Number of failed applications.
Killed	Number of killed applications.
Pending	Number of pending applications.
Running	Number of running applications.
Submitted	Number of submitted applications.
Available	Amount of available memory in bytes.
Available	Number of available CPU virtual cores.
Pending	Amount of pending memory resource requests in bytes that are not yet fulfilled by the scheduler.
Pending	Pending CPU allocation requests in virtual cores that are not yet fulfilled by the scheduler.
Reserved	Amount of reserved memory in bytes.
Reserved	Number of reserved CPU virtual cores.
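
The allocated and available figures in this table can be combined into utilization percentages. The following is a minimal sketch that assumes input shaped like the `clusterMetrics` object returned by the YARN ResourceManager REST API, where memory is reported in MB rather than bytes. Treating allocated plus available as the cluster total is an assumption of this sketch, the sample values are invented, and `summarize_yarn` is a hypothetical helper rather than anything DESK provides.

```python
MIB = 1024 * 1024  # YARN reports memory in MB; the table above uses bytes.


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 when the whole is zero."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def summarize_yarn(payload):
    """Convert memory to bytes and compute memory and vcore utilization."""
    m = payload["clusterMetrics"]
    allocated_bytes = m["allocatedMB"] * MIB
    available_bytes = m["availableMB"] * MIB
    # Assumption: total capacity is what is allocated plus what is still free.
    total_vcores = m["allocatedVirtualCores"] + m["availableVirtualCores"]
    return {
        "allocated_bytes": allocated_bytes,
        "available_bytes": available_bytes,
        "memory_used_pct": percent(allocated_bytes, allocated_bytes + available_bytes),
        "vcores_used_pct": percent(m["allocatedVirtualCores"], total_vcores),
        "nodes_needing_attention": m["lostNodes"] + m["unhealthyNodes"],
    }


# Illustrative payload only.
SAMPLE = {
    "clusterMetrics": {
        "allocatedMB": 6144,
        "availableMB": 2048,
        "allocatedVirtualCores": 6,
        "availableVirtualCores": 10,
        "lostNodes": 1,
        "unhealthyNodes": 0,
    }
}

yarn_summary = summarize_yarn(SAMPLE)
```

With the sample values, 6144 MB allocated out of 8192 MB gives 75 percent memory utilization, while 6 of 16 virtual cores gives 37.5 percent.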