The Apache Hadoop Administration training course gives participants a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, the course prepares participants for the real-world challenges faced by Hadoop administrators: they learn to configure, deploy, administer, maintain, monitor, and troubleshoot a Hadoop cluster.
Course Objectives
What will you learn?
- Determine the correct hardware and infrastructure for your cluster
- Setting up your own Hadoop cluster in single-node and multi-node configurations
- Cluster configuration and deployment to integrate with the data center
- Brand the Hadoop environment making it more suitable for operational applications using MapReduce and YARN
- Internals of YARN, MapReduce, Spark, and HDFS
- Extensive hands-on work with the hdfs dfs, hdfs dfsadmin, hadoop distcp, mapred, and yarn commands
- Best practices for preparing and maintaining Apache Hadoop in production
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues
- Cloudera Manager features that make managing your clusters easier, such as aggregated logging, configuration management, resource management, reports, alerts, and service management
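As a rough illustration of the command-line tools named in the objectives, the following minimal Python sketch assembles one typical invocation of each and prints it without running anything. The labels, the `/user/example` path, and the `nn1.example.com` / `nn2.example.com` NameNode addresses are hypothetical placeholders, not part of the course material.

```python
import shlex

# One representative invocation per command family named in the objectives.
# All paths and host names below are hypothetical placeholders.
ADMIN_COMMANDS = {
    "list_directory": ["hdfs", "dfs", "-ls", "/user/example"],
    "cluster_report": ["hdfs", "dfsadmin", "-report"],
    "copy_between_clusters": [
        "hadoop", "distcp",
        "hdfs://nn1.example.com:8020/data",
        "hdfs://nn2.example.com:8020/data",
    ],
    "list_mapreduce_jobs": ["mapred", "job", "-list"],
    "list_yarn_applications": ["yarn", "application", "-list"],
}


def render(argv):
    """Return the argument list as a single shell-quoted command line."""
    return shlex.join(argv)


def main():
    # Print the commands only; nothing is executed against a cluster.
    for label, argv in ADMIN_COMMANDS.items():
        print(f"{label}: {render(argv)}")


if __name__ == "__main__":
    main()
```

On a machine with the Hadoop client tools installed, any of these argument lists could be passed to `subprocess.run` instead of being printed.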
Who is this for?
Apache Hadoop
- Why Hadoop?
- Fundamental Concepts
- Hadoop Components
- Cluster Management Solution
- Cloudera Manager Installation
- Hadoop Installation
- HDFS Features
- Writing and Reading Files
- NameNode Memory Considerations
- HDFS Security
- Web UIs for HDFS
- Hadoop File Shell
- YARN: Cluster Resource Manager
- MapReduce Concepts
- Apache Spark Concepts
- Frameworks on YARN
- Exploring YARN
- Managing Configurations
- Managing Role Instances and Adding Services
- Configuring the HDFS Service
- Configuring Hadoop Logs
- Configuring the YARN Service
- Insert Data From External Sources With Flume
- Insert Data From Relational Databases With Sqoop
- REST Interfaces
- Best Practices for Importing Data
- Planning Considerations
- Hardware Considerations
- Network Considerations
- Configuring Nodes
- Single Node Cluster Configuration
- Multi-Node Cluster Configuration
Hadoop Clients
- Hadoop Clients
- Installing and Configuring Hadoop Clients
- Installing and Configuring Hue
- Hue Authentication and Authorization
- Configuration Parameters
- Configuring Hadoop Ports
- HDFS for Rack Awareness
- HDFS High Availability
- Hadoop Security Concepts
- Hadoop Cluster With Kerberos
- Fair Scheduler
- Configuring Dynamic Resource Pools
- YARN Memory and CPU Settings
- Query Scheduling
- Managing Jobs
- How to stop and start jobs running on the cluster
- HDFS Status
- Move Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing
- Cluster Upgrading
- NameNode Metadata Backup
- Monitoring Features
- Monitoring Hadoop Clusters
- Troubleshooting Hadoop Clusters
At the end of the course, you will work on a real-time project. You will receive a problem statement along with a dataset to work with. Once you complete the project successfully (as reviewed by an expert), you will be awarded a certificate with a performance-based grade. If your project is not approved on the first attempt, you can get additional assistance to understand the concepts better and reattempt the project free of cost.