Hadoop monitoring in DESK provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, providing you with in-depth Hadoop performance analysis of both current and historical data.
Prerequisites
- DESK OneAgent version 1.103+
- For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster
- Linux OS
- Hadoop version 2.4.1+
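The version and process prerequisites above can be checked with a short script before you roll out OneAgent. The following is a minimal sketch and not part of DESK: the helper names `parse_version`, `meets_minimum`, and `detect_roles` are hypothetical, and the version strings and process command lines would have to come from your own inventory (for example, the first line of `hadoop version` output). Role detection assumes that the standard Apache Hadoop main class names appear as separate tokens in each command line, which may not hold for custom launchers or secure DataNode starters.

```python
import re

# Minimum versions taken from the prerequisites list above.
MIN_ONEAGENT = "1.103"
MIN_HADOOP = "2.4.1"

# Standard Apache Hadoop main classes for the five monitored roles
# (assumption: your distribution launches them under these class names).
ROLE_CLASSES = {
    "NameNode": "org.apache.hadoop.hdfs.server.namenode.NameNode",
    "DataNode": "org.apache.hadoop.hdfs.server.datanode.DataNode",
    "ResourceManager": "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager",
    "NodeManager": "org.apache.hadoop.yarn.server.nodemanager.NodeManager",
    "MRAppMaster": "org.apache.hadoop.mapreduce.v2.app.MRAppMaster",
}


def parse_version(text):
    """Return the first dotted numeric version in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found, minimum):
    """Compare versions numerically, so 1.99 < 1.103 and 2.10.0 > 2.4.1."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so "2.4" compares like "2.4.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def detect_roles(command_lines):
    """Return the set of Hadoop roles whose main class appears in any command line."""
    roles = set()
    for line in command_lines:
        tokens = line.split()
        for role, cls in ROLE_CLASSES.items():
            # Exact token match avoids confusing NameNode with SecondaryNameNode.
            if cls in tokens:
                roles.add(role)
    return roles
```

For example, `meets_minimum("Hadoop 2.7.3", MIN_HADOOP)` evaluates to `True`, while `meets_minimum("1.99", MIN_ONEAGENT)` evaluates to `False` because the components are compared as integers rather than as text.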
Enabling Hadoop monitoring globally
- In the navigation menu, select Settings.
- Select Monitoring > Monitored technologies.
- On the Supported technologies tab, find the Hadoop entry.
- Set the Hadoop switch to the On position.
With Hadoop monitoring enabled globally, DESK automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.
Analyzing your Hadoop components
- In the navigation menu, select Technologies.
- Click the Hadoop tile on the Technology overview page.
- Click an individual Hadoop component in the Process group table to view metrics and a timeline chart specific to that component.
Enhanced insights for HDFS
Viewing NameNode metrics
- In the Process group table, select a NameNode process group.
- Click the Process group details button.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and DataNode health.
- Further down the page, you’ll find a number of cluster-specific charts.
Viewing DataNode metrics
- In the Process group table, select a DataNode process group.
- Click the Process group details button.
- On the Process group details page, select the Technology-specific metrics tab and select a DataNode process.
- Select the Hadoop HDFS metrics tab.
Enhanced insights for MapReduce
Viewing ResourceManager metrics
- Expand the Details section of the ResourceManager process group.
- Click the Process group details button.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
- Further down the page, you’ll find a number of ResourceManager-specific charts.
Viewing MRAppMaster metrics
- Expand the Details section of an MRAppMaster process group.
- Click the Process group details button.
- On the Process group details page, click the Technology-specific metrics tab and select the MRAppMaster process.
- Click the Hadoop MapReduce tab.
Viewing NodeManager metrics
- Expand the Details section of a NodeManager process group.
- Click the Process group details button.
- On the Process group details page, click the Technology-specific metrics tab and select a NodeManager process.
- Click the Hadoop MapReduce tab.
NameNode metrics
| Metric | Description |
| --- | --- |
| Total | Raw capacity of DataNodes in bytes. |
| Used | Used capacity across all DataNodes in bytes. |
| Remaining | Remaining capacity in bytes. |
| Total load | The number of connections. |
| Total | The number of allocated blocks in the system. |
| Pending deletion | The number of blocks pending deletion. |
| Files total | Total number of files. |
| Pending replication | The number of blocks pending replication. |
| Under replicated | The number of under-replicated blocks. |
| Scheduled replication | The number of blocks scheduled for replication. |
| Live | The number of live DataNodes. |
| Dead | The number of dead DataNodes. |
| Decommission Live | The number of decommissioning live DataNodes. |
| Decommission Dead | The number of decommissioning dead DataNodes. |
| Usage – Volume failures total | Total volume failures. |
| Estimated capacity lost total | Estimated capacity lost in bytes. |
| Decommission Decommissioning | The number of DataNodes currently being decommissioned. |
| Stale | The number of stale DataNodes. |
| Blocks missing and corrupt – Missing | The number of missing blocks. |
| Capacity | Cache capacity in bytes. |
| Used | Cache used in bytes. |
| Blocks missing and corrupt – Corrupt | The number of corrupt blocks. |
| Capacity in bytes – Used, non-DFS | Capacity used by non-DFS data in bytes. |
| Appended | The number of files appended. |
| Created | The number of files and directories created by create or mkdir operations. |
| Deleted | The number of files and directories deleted by delete or rename operations. |
| Renamed | The number of rename operations. |
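Several of the NameNode metrics above correspond to values that Apache Hadoop itself publishes through the NameNode's JMX-over-HTTP endpoint (`/jmx`), in the `Hadoop:service=NameNode,name=FSNamesystem` bean. The following is a minimal sketch of how such a payload could be summarized for a quick cross-check. It is an illustration only: the mapping between DESK display names and Hadoop attribute names is an assumption, the function names are hypothetical, and the sample values are made up rather than measured.

```python
import json

# Bean name used by Apache Hadoop for file system state on the NameNode.
FS_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"


def find_bean(jmx, name):
    """Return the bean with the given name from a parsed /jmx response."""
    for bean in jmx.get("beans", []):
        if bean.get("name") == name:
            return bean
    raise KeyError(f"bean not found: {name}")


def summarize_hdfs(jmx):
    """Derive a few health indicators from the FSNamesystem bean."""
    bean = find_bean(jmx, FS_BEAN)
    total = bean["CapacityTotal"]
    # Guard against division by zero on a cluster with no registered DataNodes.
    used_pct = 100.0 * bean["CapacityUsed"] / total if total else 0.0
    return {
        "used_pct": round(used_pct, 1),
        "remaining_bytes": bean["CapacityRemaining"],
        "under_replicated": bean["UnderReplicatedBlocks"],
        "needs_attention": bean["MissingBlocks"] > 0 or bean["CorruptBlocks"] > 0,
    }


# Illustrative payload only; real responses contain many more beans and attributes.
sample = json.loads("""
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=FSNamesystem",
      "CapacityTotal": 1000,
      "CapacityUsed": 250,
      "CapacityRemaining": 700,
      "UnderReplicatedBlocks": 3,
      "MissingBlocks": 0,
      "CorruptBlocks": 0
    }
  ]
}
""")

print(summarize_hdfs(sample))
```

In practice the payload would be fetched from the NameNode's web port, which differs between Hadoop versions and distributions, so the sketch deliberately works on an already parsed dictionary.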
DataNode metrics
| Metric | Description |
| --- | --- |
| Live | The number of live DataNodes. |
| Dead | The number of dead DataNodes. |
| Decommission Live | The number of decommissioning live DataNodes. |
| Decommission Dead | The number of decommissioning dead DataNodes. |
| Decommission Decommissioning | The number of DataNodes currently being decommissioned. |
| Stale | The number of stale DataNodes. |
| Capacity | Cache capacity in bytes. |
| Used | Cache used in bytes. |
| Capacity | Disk capacity in bytes. |
| DfsUsed | Disk usage in bytes. |
| Cached | The number of blocks cached. |
| Failed to cache | The number of blocks that failed to cache. |
| Failed to uncache | The number of blocks that could not be removed from the cache. |
| Number of failed volumes | The number of volume failures that have occurred. |
| Capacity in bytes – Remaining | The remaining disk space in bytes. |
| Blocks | The number of blocks read from the DataNode. |
| Removed | The number of blocks removed. |
| Replicated | The number of blocks replicated. |
| Verified | The number of blocks verified. |
| Blocks | The number of blocks written to the DataNode. |
| Bytes | The number of bytes read from the DataNode. |
| Bytes | The number of bytes written to the DataNode. |
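In Apache Hadoop, the DataNode block and byte metrics are cumulative counters that grow from process start, so a throughput figure has to be derived from two consecutive readings. The following is a minimal sketch of that calculation. Whether DESK charts these values as raw counters or as rates is not stated here, and the `Sample` type and `rate_per_second` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One reading of the cumulative DataNode byte counters."""
    timestamp: float    # seconds since an arbitrary epoch
    bytes_read: int     # cumulative bytes read from the DataNode
    bytes_written: int  # cumulative bytes written to the DataNode


def rate_per_second(prev, curr, field):
    """Return the per-second rate of `field` between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = getattr(curr, field) - getattr(prev, field)
    if delta < 0:
        # A smaller value means the counter was reset (for example, the
        # DataNode restarted), so count only what accumulated since the reset.
        delta = getattr(curr, field)
    return delta / elapsed


earlier = Sample(timestamp=0.0, bytes_read=1000, bytes_written=500)
later = Sample(timestamp=10.0, bytes_read=6000, bytes_written=500)
print(rate_per_second(earlier, later, "bytes_read"))
print(rate_per_second(earlier, later, "bytes_written"))
```

Treating a negative difference as a reset slightly undercounts traffic that occurred just before the restart, which is usually an acceptable trade-off for a monitoring chart.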
ResourceManager metrics
| Metric | Description |
| --- | --- |
| Active | Number of active NodeManagers. |
| Decommissioned | Number of decommissioned NodeManagers. |
| Lost | Number of lost NodeManagers (no heartbeats received). |
| Rebooted | Number of rebooted NodeManagers. |
| Unhealthy | Number of unhealthy NodeManagers. |
| Allocated | Number of allocated containers. |
| Allocated | Allocated memory in bytes. |
| Allocated | Number of allocated virtual CPU cores. |
| Completed | Number of successfully completed applications. |
| Failed | Number of failed applications. |
| Killed | Number of killed applications. |
| Pending | Number of pending applications. |
| Running | Number of running applications. |
| Submitted | Number of submitted applications. |
| Available | Amount of available memory in bytes. |
| Available | Number of available virtual CPU cores. |
| Pending | Amount of memory, in bytes, requested but not yet fulfilled by the scheduler. |
| Pending | Number of virtual CPU cores requested but not yet fulfilled by the scheduler. |
| Reserved | Amount of reserved memory in bytes. |
| Reserved | Number of reserved virtual CPU cores. |
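The allocated and available values in the table above can be combined into a utilization percentage. The following is a minimal sketch that works on the attribute names Apache Hadoop uses in its ResourceManager `QueueMetrics` bean (`AllocatedMB`, `AvailableMB`, `AllocatedVCores`, `AvailableVCores`), which report memory in megabytes rather than bytes. That DESK derives its values from these attributes is an assumption, the `cluster_utilization` helper is hypothetical, and the calculation treats allocated plus available as the total, which ignores reserved resources.

```python
MIB = 1024 * 1024  # bytes per mebibyte; Hadoop reports memory in MB


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def cluster_utilization(queue_metrics):
    """Convert QueueMetrics memory values to bytes and compute utilization."""
    allocated_mb = queue_metrics["AllocatedMB"]
    available_mb = queue_metrics["AvailableMB"]
    allocated_vcores = queue_metrics["AllocatedVCores"]
    available_vcores = queue_metrics["AvailableVCores"]
    return {
        "allocated_bytes": allocated_mb * MIB,
        "available_bytes": available_mb * MIB,
        "memory_used_pct": percent(allocated_mb, allocated_mb + available_mb),
        "vcores_used_pct": percent(allocated_vcores, allocated_vcores + available_vcores),
    }


# Illustrative values only, not measurements from a real cluster.
example = {
    "AllocatedMB": 2048,
    "AvailableMB": 6144,
    "AllocatedVCores": 2,
    "AvailableVCores": 6,
}
print(cluster_utilization(example))
```

A sustained high utilization together with a growing number of pending memory or core requests suggests that the scheduler cannot place new containers, which is typically the point at which adding NodeManagers is worth considering.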