Big Data Bootcamp

Georgia Tech Big Data Bootcamp training material


Welcome to the Big Data Bootcamp. This training material has been developed by Sunlab and Polo Club. By the end of the training, you will have learned about the big data tools that are part of the Hadoop and Spark ecosystems.

The sample data in the training material comes from healthcare applications, but you can adapt what you learn to other domains. No healthcare background knowledge is required.

To get started, please set up the learning environment first.

Content Summary

The training material is divided into two chapters: Hadoop and Spark.

Hadoop Ecosystem

  1. HDFS Basics
  2. MapReduce Basics
  3. Hadoop HBase
  4. Hadoop Streaming
  5. Hadoop Pig
  6. Hadoop Hive

Spark Ecosystem

  1. Scala Basics
  2. Spark Basics
  3. Spark SQL
  4. Spark Application
  5. Spark MLlib
  6. Spark GraphX