Course details

This course explores several modern machine learning and data science techniques in R, one of the most widely used tools among data scientists. We showcase a wide array of statistical and machine learning techniques. In particular:

  • Use R's statistical functions to draw random numbers, calculate densities, build histograms, etc.
  • Solve supervised ML problems using the caret package
  • Process data using sqldf, caret, etc.
  • Apply unsupervised techniques such as PCA, DBSCAN, and K-means
  • Call deep learning models in Keras (Python) from R
  • Use the powerful XGBoost method for both regression and classification
  • Create interesting plots, such as geo-heatmaps and interactive plots
  • Tune hyperparameters for several ML methods using caret
  • Fit linear regressions in R, build log-log models, and run ANOVA
  • Estimate mixed effects models to explicitly model the covariances between observations
  • Train outlier-robust models using robust regression and quantile regression
  • Identify outliers and novel observations
  • Estimate ARIMA (time series) models to predict temporal variables

Most of the examples presented in this course come from real datasets collected from web sources such as Kaggle and the US Census Bureau. All the lectures can be downloaded and come with the corresponding material. The teaching approach is to briefly introduce each technique and then focus on its computational aspects. Mathematical formulas are avoided as much as possible in order to concentrate on practical implementation.

This course covers most of what you would need to work as a data scientist or to compete in Kaggle competitions. It is assumed that you already have some exposure to data science or statistics.

Updated on 18 February, 2018