Course details
Apache Spark is an open source framework that provides highly generalizable methods to process data in parallel. On its own, Spark is not a data storage solution. Spark can be run locally, on a single machine with a single JVM (called local mode). More often, Spark is used in tandem with a distributed storage system (such as HDFS, Cassandra, or S3) that holds the data Spark reads and writes, and a cluster manager that distributes the application across the cluster. Spark currently supports three kinds of cluster managers: the manager included in Spark, called the Standalone Cluster Manager, which requires Spark to be installed on each node of the cluster; Apache Mesos; and Hadoop YARN.
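The choice between local mode and a cluster manager usually shows up as the "master URL" given to a Spark application. The sketch below is only an illustration: the helper name master_url and the example host name are hypothetical, and the default ports (7077 for the Standalone manager, 5050 for Mesos) are the commonly documented defaults and may differ on a given cluster.

```python
def master_url(manager, host=None, port=None, cores=None):
    """Build a Spark master URL string for a deployment choice.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    if manager == "local":
        # Local mode: one JVM on one machine.
        # local[*] uses all available cores, local[K] uses K worker threads.
        return "local[*]" if cores is None else f"local[{cores}]"
    if manager == "yarn":
        # With YARN, the cluster location comes from the Hadoop configuration.
        return "yarn"
    if manager in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"{manager} requires a master host")
        # Assumed default ports; check the actual cluster settings.
        scheme, default_port = {
            "standalone": ("spark", 7077),
            "mesos": ("mesos", 5050),
        }[manager]
        return f"{scheme}://{host}:{port or default_port}"
    raise ValueError(f"unknown cluster manager: {manager}")


if __name__ == "__main__":
    print(master_url("local"))
    print(master_url("standalone", host="master.example.com"))
    print(master_url("mesos", host="master.example.com"))
    print(master_url("yarn"))
```

A string like this is typically passed with the --master option of spark-submit, or to SparkSession.builder.master(...) in application code.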
Various components of Spark
Spark Core
Spark SQL
Spark Streaming
Spark MLlib
Spark GraphX
Updated on 22 March, 2018