The Big Data and Hadoop training course from Lemuria is designed to enhance your knowledge and skills to become a successful Hadoop developer. The course covers in-depth knowledge of core concepts along with hands-on implementation across varied industry use cases.
Course Objectives
By the end of the course, you will understand:
- Introduction to Big Data
- Introduction to Hadoop: What is Hadoop? Why Hadoop?
- The history of Hadoop
- The different types of components in Hadoop: HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper, and so on
- The scope of Hadoop
CIOs are making Hadoop their platform of choice in 2016. For better career prospects, bigger job opportunities and financial growth, Hadoop is a must-know.
Check out these Lemuria blogs to learn why Hadoop training is critical:
How essential is Hadoop training?
Gartner Predicts How Big Data Can Lead Economic Growth
You can master Hadoop, irrespective of your IT background. While basic knowledge of Core Java and SQL might help, it is not a prerequisite for learning Hadoop. In case you wish to brush up your Java skills, Lemuria offers you a complimentary self-paced course: "Java essentials for Hadoop".
Learning Objectives - In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop Architecture, HDFS, the Anatomy of File Write and Read, and how the MapReduce framework works.
Topics - Big Data, Limitations and Solutions of Existing Data Analytics Architecture, Hadoop, Hadoop Features, Hadoop Ecosystem, Hadoop 2.x Core Components, Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework, Different Hadoop Distributions.
HDFS (Storing the Data):
- Introduction to HDFS
- HDFS Design
- HDFS's Role in Hadoop
- Features of HDFS
- Daemons of Hadoop and Their Functionality: Name Node, Secondary Name Node, Job Tracker, Data Node, Task Tracker
- Anatomy of File Write
- Anatomy of File Read
- Network Topology: Nodes, Racks, Data Center
- Parallel Copying using DistCp
- Basic Configuration for HDFS
- Data Organization: Blocks and Replication
- Rack Awareness
- Heartbeat Signal
- How to Store Data in HDFS
- How to Read Data from HDFS (see the Java sketch after this list)
- Accessing HDFS (Introduction to Basic UNIX Commands)
- CLI Commands
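To make the write/read anatomy concrete, here is a minimal sketch using the HDFS Java API (`FileSystem`). The file path is a placeholder; a real run needs a reachable HDFS cluster with `core-site.xml` on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path

        // Write: the client streams data to DataNodes; the NameNode
        // only records metadata (blocks, replicas, locations)
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("Hello, HDFS!\n");
        }

        // Read: the client asks the NameNode for block locations,
        // then reads directly from the nearest DataNode
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}
```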
MapReduce using Java (Processing the Data):
- Introduction to MapReduce
- MapReduce Architecture
- Data Flow in MapReduce: Splits, Mapper, Partitioning, Sort and Shuffle, Combiner, Reducer
- Understanding the Difference Between a Block and an InputSplit
- Role of the RecordReader
- Basic Configuration of MapReduce
- MapReduce Life Cycle: Driver Code, Mapper, and Reducer
- How MapReduce Works
- Writing and Executing a Basic MapReduce Program in Java
- Submission and Initialization of a MapReduce Job
- File Input/Output Formats in MapReduce Jobs: Text Input Format, Key-Value Input Format, Sequence File Input Format, NLine Input Format
- Joins: Map-side Joins, Reduce-side Joins
- Word Count Example (see the sketch after this list)
- Partitioner MapReduce Program
- Side Data Distribution: Distributed Cache (with Program)
- Counters (with Program): Task Counters, Job Counters, User-Defined Counters, Propagation of Counters
- Job Scheduling
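The word count example referenced above is the canonical first MapReduce program. One compact Hadoop 2.x (new API) version looks like this, with the reducer doubling as a combiner:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emit (word, 1) for every token in the input split
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts per word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configure and submit the job
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```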
PIG:
- Introduction to Apache PIG
- Introduction to the PIG Data Flow Engine
- MapReduce vs. PIG in Detail
- When Should PIG Be Used?
- Data Types in PIG
- Basic PIG Programming
- Modes of Execution in PIG: Local Mode and MapReduce Mode
- Execution Mechanisms: Grunt Shell, Script, Embedded
- Operators/Transformations in PIG
- PIG UDFs with Program
- Word Count Example in PIG (see the embedded-mode sketch after this list)
- The Difference Between MapReduce and PIG
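As a taste of the Embedded execution mechanism listed above, here is a sketch that drives the PIG word count from Java through `PigServer`. The input file name and output directory are placeholders.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class EmbeddedPigWordCount {
    public static void main(String[] args) throws Exception {
        // LOCAL mode runs against the local filesystem; use
        // ExecType.MAPREDUCE to run on the cluster instead
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
        // store() triggers execution of the whole data flow
        pig.store("counts", "wordcount_out");
    }
}
```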
SQOOP:
- Introduction to SQOOP
- Uses of SQOOP
- Connecting to a MySQL Database
- SQOOP Commands: Import, Export, Eval, Codegen, etc. (see the sketch after this list)
- Joins in SQOOP
- Export to MySQL
- Export to HBase
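A hedged sketch of a SQOOP import invoked from Java: Sqoop 1.x can be embedded via `Sqoop.runTool`, which takes the same arguments as the `sqoop` command line. The MySQL host, credentials, and table below are placeholders; in practice the same import is usually run straight from the shell.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
    public static void main(String[] args) {
        // Equivalent to running `sqoop import ...` on the command line;
        // host, database, table, and credentials are placeholders
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://localhost:3306/retail_db",
            "--username", "demo",
            "--password", "secret",
            "--table", "customers",
            "--target-dir", "/user/demo/customers",
            "--num-mappers", "1"
        };
        int exitCode = Sqoop.runTool(importArgs);
        System.exit(exitCode);
    }
}
```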
HIVE:
- Introduction to HIVE
- HIVE Metastore
- HIVE Architecture
- Tables in HIVE: Managed Tables, External Tables (see the JDBC sketch after this list)
- HIVE Data Types: Primitive Types, Complex Types
- Partitions
- Joins in HIVE
- HIVE UDFs and UDAFs with Programs
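To illustrate external tables and querying, here is a sketch that talks to HiveServer2 over JDBC. The table schema, HDFS location, and connection details are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 typically listens on port 10000; adjust as needed
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "demo", "");
             Statement stmt = con.createStatement()) {

            // External table: Hive manages only metadata; dropping the
            // table leaves the underlying HDFS data in place
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS employees "
                       + "(id INT, name STRING, dept STRING) "
                       + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                       + "LOCATION '/user/demo/employees'");

            try (ResultSet rs = stmt.executeQuery(
                     "SELECT dept, COUNT(*) FROM employees GROUP BY dept")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```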
HBASE:
After successfully completing this module, you will understand:
- Introduction to HBASE
- Basic Configuration of HBASE
- Fundamentals of HBase
- What is NoSQL?
- The HBase Data Model: Table and Row; Column Family and Column Qualifier; Cell and its Versioning
- Categories of NoSQL Databases: Key-Value Database, Document Database, Column-Family Database
- HBASE Architecture: HMaster, Region Servers, Regions, MemStore, Store
- SQL vs. NoSQL
- How HBase Differs from an RDBMS
- HDFS vs. HBase
- Client-side Buffering and Bulk Uploads
- Designing HBase Tables
- HBase Operations: Get, Scan, Put, Delete (see the sketch after this list)
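A minimal sketch of the four operations through the HBase Java client API; the table name `users` and column family `info` are placeholders, and the table is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrudExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum, etc.)
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Put: row key -> column family:qualifier -> value
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                    Bytes.toBytes("Asha"));
            table.put(put);

            // Get: fetch a single row by key
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

            // Scan: iterate over a range of rows
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }

            // Delete: remove the row
            table.delete(new Delete(Bytes.toBytes("row1")));
        }
    }
}
```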
Zookeeper:
- Introduction to Zookeeper
- Data Model
- Operations (see the sketch after this list)
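A short sketch of the znode data model and basic operations using the ZooKeeper Java client; the ensemble address and znode path are placeholders.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZooKeeperExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Connect to a ZooKeeper ensemble (address is a placeholder)
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // The data model is a tree of znodes, addressed like file paths
        String path = zk.create("/demo-config", "v1".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data));

        zk.delete(path, -1); // version -1 matches any version
        zk.close();
    }
}
```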
OOZIE:
- Introduction to OOZIE
- Uses of OOZIE (see the client sketch after this list)
- Where to Use OOZIE?
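One common use is submitting a workflow programmatically. This sketch uses the Oozie Java client; the server URL, application path, and properties are placeholders, and the workflow definition itself is assumed to already sit in HDFS.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitExample {
    public static void main(String[] args) throws Exception {
        // URL, paths, and hostnames below are placeholders
        OozieClient client = new OozieClient("http://localhost:11000/oozie");

        Properties conf = client.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH,
                "hdfs://localhost:8020/user/demo/workflows/wordcount");
        conf.setProperty("nameNode", "hdfs://localhost:8020");
        conf.setProperty("jobTracker", "localhost:8032");

        // Submit and start the workflow, then poll its status
        String jobId = client.run(conf);
        while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10_000);
        }
        System.out.println("Final status: " + client.getJobInfo(jobId).getStatus());
    }
}
```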
Flume:
- Introduction to Flume
- Uses of Flume
- Flume Architecture: Flume Master, Flume Collectors, Flume Agents (see the sketch after this list)
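A sketch of feeding events to a running agent using Flume NG's client SDK (`RpcClientFactory`); the host and port are placeholders and must match an agent whose Avro source is listening there.

```java
import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeClientExample {
    public static void main(String[] args) throws Exception {
        // Points at an agent whose Avro source listens on port 41414
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            Event event = EventBuilder.withBody(
                    "sample log line", StandardCharsets.UTF_8);
            client.append(event); // the agent's channel and sink take it from here
        } finally {
            client.close();
        }
    }
}
```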
Hadoop Architecture and HDFS
Learning Objectives - In this module, you will learn the Hadoop cluster architecture, the important configuration files in a Hadoop cluster, data loading techniques, and how to set up single-node and multi-node Hadoop clusters.
Topics - Hadoop 2.x Cluster Architecture - Federation and High Availability, A Typical Production Hadoop Cluster, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Single-node and Multi-node Cluster Setup, Hadoop Administration.
Learning Objectives - In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will understand concepts like Input Splits in MapReduce, Combiner & Partitioner, and see demos of MapReduce on different data sets.
Topics - MapReduce Use Cases, Traditional Way vs. MapReduce Way, Why MapReduce, Hadoop 2.x MapReduce Architecture, Hadoop 2.x MapReduce Components, YARN MR Application Execution Flow, YARN Workflow, Anatomy of a MapReduce Program, Demo on MapReduce, Input Splits, Relation between Input Splits and HDFS Blocks, MapReduce: Combiner & Partitioner, Demo on De-identifying a Health Care Data Set, Demo on a Weather Data Set.
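To make the Partitioner concept concrete, here is a hedged sketch of a custom partitioner for the word count job; the two-reducer split on the first letter is an arbitrary illustration.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends words starting with a-m to reducer 0 and everything else to
// reducer 1; assumes the job is configured with two reduce tasks
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (numPartitions < 2 || key.getLength() == 0) {
            return 0; // nothing to route with one reducer or an empty key
        }
        char first = Character.toLowerCase(key.toString().charAt(0));
        return (first >= 'a' && first <= 'm') ? 0 : 1;
    }
}
```

It would be wired in from the driver with `job.setPartitionerClass(AlphabetPartitioner.class)` and `job.setNumReduceTasks(2)`.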
Learning Objectives - In this module, you will learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format, and XML parsing.
Topics - Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format, XML File Parsing using MapReduce.
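As a quick illustration of user-defined counters, here is a sketch of a mapper that tallies valid and malformed CSV records; the three-column schema is an assumption.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ValidatingMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    // User-defined counters are grouped by their enum type
    public enum RecordQuality { VALID, MALFORMED }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length == 3) { // expected column count is an assumption
            context.getCounter(RecordQuality.VALID).increment(1);
            context.write(value, NullWritable.get());
        } else {
            // Counter totals are aggregated across all tasks and shown
            // in the job's final status output
            context.getCounter(RecordQuality.MALFORMED).increment(1);
        }
    }
}
```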
Learning Objectives - In this module, you will learn Pig, the types of use cases where Pig can be used, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming, and testing Pig scripts, with a demo on a healthcare data set.
Topics - About Pig, MapReduce vs. Pig, Pig Use Cases, Programming Structure in Pig, Pig Running Modes, Pig Components, Pig Execution, Pig Latin Programs, Data Models in Pig, Pig Data Types, Shell and Utility Commands, Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Specialized Joins in Pig, Built-in Functions (Eval Functions, Load and Store Functions, Math Functions, String Functions, Date Functions), Pig UDFs, Piggybank, Parameter Substitution (PIG Macros and Pig Parameter Substitution), Pig Streaming, Testing Pig Scripts with PigUnit, Aviation Use Case in PIG, Pig Demo on a Healthcare Data Set.
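Beyond word count, the relational operators above (FILTER, JOIN, GROUP) compose naturally. Here is a hedged sketch in embedded mode, with hypothetical patient and visit files standing in for the healthcare data set:

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigJoinExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);
        // Input files and schemas below are placeholders
        pig.registerQuery("patients = LOAD 'patients.csv' USING PigStorage(',') "
                        + "AS (id:int, name:chararray, city:chararray);");
        pig.registerQuery("visits = LOAD 'visits.csv' USING PigStorage(',') "
                        + "AS (patient_id:int, visit_date:chararray);");
        // Relational operators in action: FILTER, JOIN, GROUP
        pig.registerQuery("chennai_patients = FILTER patients BY city == 'Chennai';");
        pig.registerQuery("joined = JOIN chennai_patients BY id, visits BY patient_id;");
        pig.registerQuery("grouped = GROUP joined BY chennai_patients::name;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(joined);");
        pig.store("counts", "visit_counts");
    }
}
```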
At the end of the course, you will work on a real-time project. You will receive a problem statement along with a data set to work on. Once you successfully complete the project (reviewed by an expert), you will be awarded a certificate with performance-based grading. If your project is not approved on the first attempt, you can take extra assistance to clear up any doubts and reattempt the project free of cost.