Apache Spark: Best Practices for High Performance (Udemy)
Price: USD 25

    Course details

    Apache Spark is an open-source framework that provides highly generalizable methods for processing data in parallel. On its own, Spark is not a data storage solution. Spark can run locally, on a single machine in a single JVM (called local mode). More often, Spark is used in tandem with two other systems: a distributed storage system (such as HDFS, Cassandra, or S3), which holds the data that Spark reads and writes, and a cluster manager, which distributes the application across the cluster. Spark currently supports three kinds of cluster managers: the manager included with Spark, called the Standalone Cluster Manager, which requires Spark to be installed on each node of the cluster; Apache Mesos; and Hadoop YARN.
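    The deployment choice described above surfaces in application code as the "master URL" handed to Spark. The following is a minimal sketch, assuming PySpark, that maps local mode and the three cluster managers to the master URL formats documented by Spark (`local[*]`, `spark://HOST:PORT` with default port 7077, `mesos://HOST:PORT` with default port 5050, and `yarn`). The helper names `master_url` and `build_session` and the host name `node1` are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical helper: default ports for the cluster managers that take a host.
# 7077 (Standalone) and 5050 (Mesos) are the defaults listed in Spark's docs.
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager: str, host: str = "", port: Optional[int] = None,
               cores: str = "*") -> str:
    """Return a Spark master URL for one of: local, standalone, mesos, yarn."""
    if manager == "local":
        # Local mode: everything runs in a single JVM on this machine;
        # "*" asks for one worker thread per logical core.
        return f"local[{cores}]"
    if manager in DEFAULT_PORTS:
        if not host:
            raise ValueError(f"{manager} mode needs the master's host name")
        scheme = "spark" if manager == "standalone" else "mesos"
        return f"{scheme}://{host}:{port or DEFAULT_PORTS[manager]}"
    if manager == "yarn":
        # The YARN ResourceManager address comes from HADOOP_CONF_DIR or
        # YARN_CONF_DIR, so the URL itself carries no host.
        return "yarn"
    raise ValueError(f"unknown cluster manager: {manager!r}")


def build_session(app_name: str, master: str):
    """Create (or reuse) a SparkSession; requires PySpark and a JVM."""
    # Imported lazily so master_url() can be used without Spark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master(master).appName(app_name).getOrCreate()
```

    For example, `build_session("demo", master_url("local"))` would start a local-mode session, while `master_url("standalone", host="node1")` yields `spark://node1:7077`. In practice the master is often supplied at launch time through `spark-submit --master ...` rather than hard-coded, which lets the same application move from local mode to a cluster unchanged.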

    Components of Spark

    Spark Core

    Spark SQL

    Spark Streaming

    Spark MLlib

    Spark GraphX

    Updated on 22 March, 2018