Big Data Analysis with Scala and Spark

These are my notes from taking Professor Heather Miller's Coursera course "Big Data Analysis with Scala and Spark".

Week 01

  1. Data-Parallel to Distributed Data-Parallel
  2. Latency
  3. RDDs, Spark's Distributed Collection
  4. RDDs: Transformation and Actions
  5. Evaluation in Spark: Unlike Scala Collections!
  6. Cluster Topology Matters!
  7. Weekly Summary

Week 02

  1. Reduction Operations
  2. Pair RDDs
  3. Transformations and Actions on Pair RDDs
  4. Joins
  5. Weekly Summary

Week 03

  1. Shuffling: What it is and why it's important
  2. Partitioning
  3. Optimizing with Partitioners
  4. Wide vs Narrow Dependencies
  5. Weekly Summary

Week 04

  1. Weekly Summary

Scala Programming

  1. (In progress) Scala basics
  2. Scala vs Python in Apache Spark
  3. Set up Scala

Set Up

  1. Scala and Spark Version
  2. Apache Zeppelin
  3. Use SBT
  4. Jupyter Notebook

Reference