Course details

SparkR is an R package that provides a lightweight frontend for using Apache Spark from R. In Spark 2.0.2, SparkR provides a distributed data frame implementation that supports operations such as selection, filtering, and aggregation (similar to R data frames) but on large datasets. SparkR also supports distributed machine learning using MLlib.

You will learn how to create a Spark cluster in Databricks.

You will learn how to create DataFrames and how to group and aggregate data.

Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application runs in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.


Prerequisites:

You should have a basic knowledge of Spark and R.


  • Anyone with some R experience who wants to learn about big data solutions
  • Anyone interested in SparkR and Hadoop
  • Anyone interested in Spark and cluster computing


Updated on 20 February, 2018