Coursework for BerkeleyX CS120x Distributed Machine Learning with Apache Spark (edX), the second course of the Data Science and Engineering with Spark XSeries.
For learning and evaluation purposes, the course uses Python notebooks in Databricks Community Edition.
./lab0
: Lab 0 - Running Your First Notebook. Public notebook here.
./lab1a
: Lab 1a - Math and Python review. Public notebook here.
./lab1b
: Lab 1b - Word Count Lab: Building a word count application. Public notebook here.
./lab2
: Lab 2 - Linear Regression Lab. Public notebook here.
./lab3
: Lab 3 - Click-Through Rate Prediction Lab. Public notebook here.
./lab4
: Lab 4 - Principal Component Analysis Lab. Public notebook here.