Course details
Apache Spark is an open source framework that provides highly generalizable methods for processing data in parallel. On its own, Spark is not a data storage solution. Spark can run locally on a single machine with a single JVM (called local mode). More often, Spark is used in tandem with a distributed storage system (such as HDFS, Cassandra, or S3) to store the data processed with Spark, and with a cluster manager to manage the distribution of the application across the cluster. Spark currently supports three kinds of cluster managers: the Standalone Cluster Manager, which is included in Spark and requires Spark to be installed on each node of the cluster; Apache Mesos; and Hadoop YARN.
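In practice, the choice between local mode and a cluster manager is expressed as a "master" string given to Spark at launch. The sketch below shows the usual shape of that string for each option described above. The helper name `master_url` and its parameters are hypothetical and exist only for illustration; the URL schemes and default ports (7077 for Standalone, 5050 for Mesos) follow the Spark documentation.

```python
# Hypothetical helper (not part of Spark): builds the "master" string
# that tells Spark where an application should run.
def master_url(manager, host=None, port=None, threads="*"):
    """Return a Spark master URL for the given cluster manager."""
    if manager == "local":
        # Local mode: a single JVM on one machine.
        # "*" asks for one worker thread per available core.
        return f"local[{threads}]"
    if manager == "yarn":
        # YARN takes the cluster location from the Hadoop configuration,
        # so the master string carries no host or port.
        return "yarn"
    # URL scheme and default port for the remaining cluster managers.
    defaults = {"standalone": ("spark", 7077), "mesos": ("mesos", 5050)}
    if manager not in defaults:
        raise ValueError(f"unknown cluster manager: {manager}")
    if host is None:
        raise ValueError(f"{manager} requires a host")
    scheme, default_port = defaults[manager]
    return f"{scheme}://{host}:{port or default_port}"


if __name__ == "__main__":
    # "master-node" is a placeholder host name.
    for manager in ("local", "standalone", "mesos", "yarn"):
        print(manager, "->", master_url(manager, host="master-node"))
```

The resulting string would typically be passed to `spark-submit` through its `--master` option, or to `SparkSession.builder.master(...)` when the session is created in code.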
Components of Spark
Spark Core
Spark SQL
Spark Streaming
Spark MLlib
Spark GraphX
Updated on 22 March, 2018