Course Details

Overview

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Duration    
3 days

Course Objectives

• Recognize use cases for data science on Hadoop
• Describe the Hadoop and YARN architecture
• Describe the differences between supervised and unsupervised learning
• Use Mahout to run a machine learning algorithm on Hadoop
• Describe the data science life cycle
• Use Pig to transform and prepare data on Hadoop
• Write a Python script
• Describe options for running Python code on a Hadoop cluster
• Write a Pig User-Defined Function in Python
• Use Pig streaming on Hadoop with a Python script
• Use machine learning algorithms
• Describe use cases for Natural Language Processing (NLP)
• Use the Natural Language Toolkit (NLTK)
• Describe the components of a Spark application
• Write a Spark application in Python
• Run machine learning algorithms using Spark MLlib
• Take data science into production
Updated on 27 June, 2018

Prerequisites

Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP 
