تفاصيل الدورة

Apache Spark is an open source framework that provides highly generalizable methods to process data in parallel. On its own, Spark is not a data storage solution. Spark can be run locally, on a single machine with a single JVM (called local mode). More often Spark is used in tandem with a distributed storage system to write the data processed with Spark (such as HDFS, Cassandra, or S3) and a cluster manager to manage the distribution of the application across the cluster. Spark currently supports three kinds of cluster managers: the manager included in Spark, called the Standalone Cluster Manager, which requires Spark to be installed in each node of a cluster, Apache Mesos; and Hadoop YARN.

Various components of spark

Spark core

Spark Sql

Spark Streaming

Spark Mlib

Spark GraphLib

تحديث بتاريخ 22 March, 2018
دورات يمكنك الالتحاق بها على الفور... خذ دورة عبر الإنترنت على Data Science ابتداءً من الآن. See all courses