Course details

Apache Hadoop Data Analyst

Hadoop Introduction:
  • Why we need Hadoop
  • Why Hadoop is in demand in the market nowadays
  • Where expensive SQL-based tools are failing
  • Key points: why Hadoop is the leading tool in the current IT industry
  • Definition of Big Data
  • Hadoop nodes
  • Introduction to Hadoop Release-1
  • Hadoop Daemons in Hadoop Release-1
  • Introduction to Hadoop Release-2
  • Hadoop Daemons in Hadoop Release-2
  • Hadoop Cluster and Racks
  • Hadoop Cluster Demo
  • New projects on Hadoop
  • How open source tools are able to run jobs in less time
  • Hadoop Storage - HDFS (Hadoop Distributed File System)
  • Hadoop Processing Framework (Map Reduce / YARN)
  • Alternatives to Map Reduce
  • Why NOSQL is in greater demand than SQL
  • Distributed warehouse for HDFS
  • Hadoop Ecosystem and its uses
  • Data import/export tools
 
Hadoop Installation and Hands-on on Hadoop machine:
  • Hadoop installation
  • Introduction to Hadoop FS and Processing Environment's UIs
  • How to read and write files
  • Basic Unix commands for Hadoop
  • Hadoop FS shell
  • Hadoop releases practical
  • Hadoop daemons practical

ETL Tool (Pig) Introduction Level-1 (Basics)
  • Pig Introduction
  • Why Pig if Map Reduce is there?
  • How Pig is different from programming languages
  • Pig Data flow Introduction
  • How Schema is optional in Pig
  • Pig Data types
  • Pig Commands - Load, Store, Describe, Dump
  • Map Reduce job started by Pig Commands
  • Execution plan
 
ETL Tool (Pig) Level-2 (Complex)
  • Pig UDFs
  • Pig Use cases
  • Pig Assignment
  • Complex Use cases on Pig
  • Real time scenarios on Pig
  • When we should use Pig
  • When we shouldn't use Pig
 
Hive Warehouse
  • Hive Introduction
  • Meta storage and meta store
  • Introduction to Derby Database
  • Hive Data types
  • HQL
  • DDL, DML and sub languages of Hive
  • Internal, external and temporary tables in Hive
  • Differences between SQL-based data warehouses and Hive
 
Hive Level-2 (Complex)
  • Hive releases
  • Why Hive is not the best solution for OLTP
  • OLAP in Hive
  • Partitioning
  • Bucketing
  • Hive Architecture
  • Hue Interface for Hive
  • How to analyze data using Hive script
  • Differences between Hive and Impala
  • UDFs in Hive
  • Complex Use cases in Hive
  • Hive Advanced Assignment

Introduction to Map Reduce
  • How Map Reduce works as a Processing Framework
  • End-to-end execution flow of a Map Reduce job
  • Different tasks in a Map Reduce job
  • Why Reducer is optional while Mapper is mandatory?
  • Introduction to Combiner
  • Introduction to Partitioner
  • Programming languages for Map Reduce
  • Why Java is preferred for Map Reduce programming
 
NOSQL Databases and Introduction to HBase
  • Introduction to NOSQL
  • Why NOSQL when SQL has been in the market for many years
  • Databases in market based on NOSQL
  • CAP Theorem
  • ACID Vs. CAP
  • OLTP Solutions with different capabilities
  • Which NOSQL-based solution can handle specific requirements
  • Examples of companies that use NOSQL-based databases
  • HBase Architecture of column families
 
Zookeeper and SQOOP
  • Introduction to Zookeeper
  • How Zookeeper helps in Hadoop Ecosystem
  • How to load data from relational storage into Hadoop
  • Sqoop basics
  • Sqoop practical implementation
  • Sqoop alternative
  • Sqoop connector

Flume, Oozie and YARN
  • How to load streaming data without a fixed schema
  • How to load unstructured and semi-structured data into Hadoop
  • Introduction to Flume
  • Hands-on on Flume
  • How to load Twitter data into HDFS using Hadoop
  • Introduction to Oozie
  • How to schedule jobs using Oozie
  • What kind of jobs can be scheduled using Oozie
  • How to schedule jobs which are time based
  • Hadoop releases
  • Where to get Hadoop and other components to install
  • Introduction to YARN
  • Significance of YARN

Apache Spark Basics
  • Introduction to Spark
  • Basic features of Spark and Scala available in Hue
  • Why demand for Spark is increasing in the market
  • How we can use Spark with the Hadoop Ecosystem
  • Datasets for practice purposes
 
Emerging Trends of Big Data
  • YARN
  • Emerging Technologies of Big Data
  • Emerging use cases: IoT, Industrial Internet, New Applications
  • Certifications and Job Opportunities
Updated on 27 June, 2018

About Agilitics Pte. Ltd.

Agilitics Pte. Ltd. is a Singapore-headquartered company focused on data and business analytics. We are the real experts of the big data domain.

Established in 2013 and headquartered in Singapore, Agilitics Pte Ltd is a leading Big Data Analytics and Agile Consulting and Training solutions provider.

Our Tagline is Agility + Analytics Delivered.

We offer a comprehensive range of Big Data ecosystem and Agile management solutions, services and expertise for Information Management, Data Analytics, Machine Learning, Artificial Intelligence and Smart City Solutions.
