Course details

This course is for people who want to learn how to do things, not just fill their heads with important concepts, paradigms, and heaps of information they kind of know but have no idea how to use.

This course walks you through the full Big Data process:

  • Data Input
  • ETL
  • Predictive Modelling using Machine Learning
  • Data Visualization
  • Deployment to AWS using AWS Lambda together with Amazon EMR


Apache Hive is an easy-to-use, SQL-based tool for processing large amounts of data on Hadoop quickly. Hive became popular as soon as Hadoop MapReduce was widely adopted, because it lets you work with data through SQL queries. Many organisations use it to process their data. This course shows a number of interesting Hive queries and explains what Hive UDFs (User Defined Functions) are.

Apache HiveMall is a Machine Learning library of tomorrow. Like Hive, it lets you apply complex machine learning algorithms knowing only SQL. No need to code, compile and debug! It is really easy to use for programmers and non-programmers alike. The Apache HiveMall library implements many useful Machine Learning algorithms (supervised classification, LDA, Random Forest, etc.) as Hive UDFs. This course focuses on text classification when presenting HiveMall.
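In HiveMall all of this is driven from SQL, so you never write the following yourself. To make the idea behind text classification concrete, here is a minimal pure-Python sketch, not HiveMall's API. It assumes a binary labelled dataset, hashes whitespace tokens into a fixed feature space (the hypothetical `NUM_FEATURES`, using CRC32 from the standard library where a real deployment would use HiveMall's own feature-hashing functions), and fits a logistic regression with stochastic gradient descent, one of the classifier families HiveMall provides.

```python
import math
import zlib

NUM_FEATURES = 2 ** 18  # hypothetical size of the hashed feature space


def featurize(text):
    """Hash lowercased whitespace tokens into feature indices (bag of words)."""
    return [zlib.crc32(tok.encode("utf-8")) % NUM_FEATURES
            for tok in text.lower().split()]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=20, lr=0.1):
    """Logistic regression trained with SGD on (text, label) pairs, label in {0, 1}."""
    weights = {}  # sparse weight vector: feature index -> weight
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            z = bias + sum(weights.get(f, 0.0) for f in feats)
            grad = sigmoid(z) - label  # derivative of the log loss w.r.t. z
            bias -= lr * grad
            for f in feats:  # a repeated token is updated once per occurrence (count feature)
                weights[f] = weights.get(f, 0.0) - lr * grad
    return weights, bias


def predict(model, text):
    """Probability that `text` belongs to the positive class."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in featurize(text)))


if __name__ == "__main__":
    # Tiny made-up training set: 1 = spam, 0 = not spam.
    data = [("buy cheap pills now", 1), ("cheap offer buy now", 1),
            ("project meeting on monday", 0), ("notes from the meeting", 0)]
    model = train(data)
    print(predict(model, "buy cheap pills"))
```

A sparse dictionary of weights keeps memory proportional to the tokens actually seen rather than to `NUM_FEATURES`.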

Hive + HiveMall is no less (and maybe even more) attractive and efficient than Spark + Spark MLlib. Also, since HiveQL is more or less SQL, knowing SQL, and only SQL, lets many non-developers enter the Big Data world.

AWS Lambda is a must-know today. I show how to use it with Java so that it can become part of a Big Data pipeline. The AWS Lambda + Amazon EMR + Hive combination is also explained.
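The course writes its Lambda in Java; purely as an illustration, here is a minimal Python sketch of the idea, assuming the Hive script already sits in S3 and an EMR cluster is running. The hypothetical `handler` builds an EMR step that runs the script through `command-runner.jar` with the `hive-script --run-hive-script --args -f <script>` arguments described in the Amazon EMR documentation. The event keys `script` and `vars`, and the bucket names, are assumptions. Submitting the step (for example with boto3's `add_job_flow_steps`) is left as a comment so the sketch runs without AWS credentials.

```python
def build_hive_step(script_uri, variables=None, name="Run Hive script"):
    """Return an EMR step definition that runs a HiveQL script stored in S3.

    Each entry in `variables` becomes a `-d KEY=VALUE` argument, which the
    script can reference as ${KEY}.
    """
    args = ["hive-script", "--run-hive-script", "--args", "-f", script_uri]
    for key, value in sorted((variables or {}).items()):
        args += ["-d", f"{key}={value}"]
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def handler(event, context):
    """Hypothetical Lambda entry point; the event layout is an assumption."""
    step = build_hive_step(event["script"], event.get("vars"))
    # In a deployed function the step would be submitted, e.g.:
    #   boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return {"Steps": [step]}


if __name__ == "__main__":
    print(handler({"script": "s3://example-bucket/queries/classify.hql",
                   "vars": {"RUN_DATE": "2018-11-14"}}, None))
```

Building the step separately from submitting it keeps the handler easy to test locally; check the argument list against the EMR release you use.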

Solr and Hue are a search engine and visualisation dashboard combination; ElasticSearch and Kibana are another. Both rely on the same idea: connectors push data from Hive or Spark directly into Solr or ElasticSearch, and Hue and Kibana then use the properties and internal data representations of their corresponding search engines to display the data on a dashboard. This course shows how to integrate Hive with both technologies.
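Connectors such as the Elasticsearch Hadoop integration hide the transport, but it helps to see what indexing documents in bulk looks like on the wire. The sketch below is not the connector's code: it serializes rows (plain dicts standing in for Hive query results) into the newline-delimited JSON body accepted by Elasticsearch's Bulk API, one action line followed by one document line per row. The index name and the `id` column are hypothetical, and Elasticsearch versions before 7 also expect a document type.

```python
import json


def to_bulk_ndjson(rows, index, id_field=None):
    """Serialize dict rows into an Elasticsearch Bulk API request body (NDJSON)."""
    lines = []
    for row in rows:
        action = {"_index": index}
        if id_field is not None:
            action["_id"] = str(row[id_field])  # reuse a column as the document id
        lines.append(json.dumps({"index": action}))  # action/metadata line
        lines.append(json.dumps(row))  # document source line
    # The Bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = [{"id": 1, "text": "buy cheap pills", "label": "spam"},
            {"id": 2, "text": "monday meeting notes", "label": "ham"}]
    print(to_bulk_ndjson(rows, "messages", id_field="id"), end="")
```

The resulting body would be sent as a `POST` to the `_bulk` endpoint with the `application/x-ndjson` content type.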

Rather than being comprehensive, this course assumes some prior knowledge of the topic. It teaches by presenting solutions to problems that came up repeatedly while I worked on various Big Data projects. It shows how mastering small things gives you the ability to create a simple solution to almost any problem, from concept to delivery.

We start by importing data into Apache Hive correctly, and progress step by step towards quickly delivering the results of your work as an AWS service, a search engine service, or a Hue dashboard.

The course shows data processing with Hive (including how to write Hive User Defined Functions of increasing complexity: UDF, GenericUDF, UDAF and UDTF), applies Machine Learning to text classification using HiveMall, and then exports data from Hive to Solr & Hue or ElasticSearch & Kibana. You will also learn how to write an AWS Lambda function that runs Hive.
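Real Hive UDFs are Java classes written against Hive's own interfaces, so the following is only a plain-Python analogy of the three shapes, by input/output cardinality. A UDF maps one row to one value (a GenericUDF has the same shape but checks argument types at run time through ObjectInspectors). A UDAF folds many rows into one value; its `init`/`iterate`/`terminate_partial`/`merge`/`terminate` methods mirror the phases of Hive's UDAF evaluators, where partial results from mappers are merged on reducers. A UDTF turns one row into zero or more rows. All function and class names here are hypothetical.

```python
# One row in, one value out: the shape of a Hive UDF / GenericUDF.
def normalize(text):
    """Lowercase and collapse whitespace; NULL (None) stays NULL."""
    return None if text is None else " ".join(text.lower().split())


# Many rows in, one value out: the shape of a Hive UDAF.
class AverageLength:
    def __init__(self):
        self.init()

    def init(self):
        self.total, self.count = 0, 0

    def iterate(self, text):
        # Called once per input row (map side).
        if text is not None:
            self.total += len(text)
            self.count += 1

    def terminate_partial(self):
        # Partial state shipped from a mapper to a reducer.
        return (self.total, self.count)

    def merge(self, partial):
        # Combine a partial state produced by another mapper.
        total, count = partial
        self.total += total
        self.count += count

    def terminate(self):
        return None if self.count == 0 else self.total / self.count


# One row in, zero or more rows out: the shape of a Hive UDTF.
def explode_words(doc_id, text):
    for position, word in enumerate((text or "").split()):
        yield (doc_id, position, word)


if __name__ == "__main__":
    left, right = AverageLength(), AverageLength()  # two "mappers"
    left.iterate("ab")
    right.iterate("abcd")
    left.merge(right.terminate_partial())  # "reducer" combines partials
    print(normalize("  Big   DATA "), left.terminate(), list(explode_words(7, "big data")))
```

Splitting aggregation into partial and final steps is what lets a UDAF run in parallel across mappers before a single result is produced.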

Altogether, this gives you the ability to build a simple data processing pipeline: one that is simple, robust, and ready to be delivered and used in no time.

Updated on 14 November 2018