Course details
The internals of YARN, MapReduce, and HDFS
Determining the correct hardware and infrastructure for your cluster
Proper cluster configuration and deployment to integrate with the data center
How to load data into the cluster from dynamically-generated files using Flume and from RDBMS using Sqoop
Configuring the FairScheduler to provide service-level agreements for multiple users of a cluster
Best practices for preparing and maintaining Apache Hadoop in production
Troubleshooting, diagnosing, tuning, and solving Hadoop issues
Course Outline
Introduction
The Case for Apache Hadoop
Why Hadoop?
Core Hadoop Components
Fundamental Concepts
HDFS
HDFS Features
Writing and Reading Files
NameNode Memory Considerations
Overview of HDFS Security
Using the NameNode Web UI
Using the Hadoop File Shell
Getting Data into HDFS
Ingesting Data from External Sources with Flume
Ingesting Data from Relational Databases with Sqoop
Best Practices for Importing Data
YARN and MapReduce
What Is MapReduce?
Basic MapReduce Concepts
YARN Cluster Architecture
Resource Allocation
Failure Recovery
Using the YARN Web UI
MapReduce Version 1
Planning Your Hadoop Cluster
General Planning Considerations
Choosing the Right Hardware
Network Considerations
Configuring Nodes
Planning for Cluster Management
Hadoop Installation and Initial Configuration
Deployment Types
Installing Hadoop
Specifying the Hadoop Configuration
Performing Initial HDFS Configuration
Performing Initial YARN and MapReduce Configuration
Hadoop Logging
Installing and Configuring Hive, Impala, and Pig
Hive
Impala
Pig
Hadoop Clients
What is a Hadoop Client?
Installing and Configuring Hadoop Clients
Installing and Configuring Hue
Hue Authentication and Authorization
Cloudera Manager / Apache Ambari
The Motivation for Cloudera Manager / Apache Ambari
Cloudera Manager / Apache Ambari Features
Express and Enterprise Versions
Cloudera Manager / Apache Ambari Topology
Installing Cloudera Manager / Apache Ambari
Installing Hadoop Using Cloudera Manager / Apache Ambari
Performing Basic Administration Tasks Using Cloudera Manager / Apache Ambari
Advanced Cluster Configuration
Configuring Hadoop Ports
Explicitly Including and Excluding Hosts
Configuring HDFS for Rack Awareness
Configuring HDFS High Availability
Hadoop Security
Why Hadoop Security Is Important
Hadoop’s Security System Concepts
What Kerberos Is and How It Works
Cluster Maintenance
Checking HDFS Status
Copying Data Between Clusters
Adding and Removing Cluster Nodes
Rebalancing the Cluster
Cluster Upgrading
Cluster Monitoring and Troubleshooting
General System Monitoring
Monitoring Hadoop Clusters
Troubleshooting Hadoop Clusters
Common Misconfigurations
Conclusion
Updated on 27 June, 2018
About Agilitics Pte. Ltd.
Agilitics Pte. Ltd. is a Singapore-headquartered company focused on data and business analytics. We are experts in the big data domain.
Established in 2013 and headquartered in Singapore, Agilitics Pte. Ltd. is a leading provider of Big Data analytics and Agile consulting and training solutions.
Our tagline is "Agility + Analytics Delivered."
We offer a comprehensive range of Big Data ecosystem and Agile management solutions, services, and expertise for information management, data analytics, machine learning, artificial intelligence, and smart city solutions.