Docker in Local OS

Georgia Tech big data bootcamp training material

To start this environment, we need to:

  • prepare a Docker environment on the local machine
  • pull the Docker image sunlab/bigdata:0.04.1 or sunlab/bigdata:0.05
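
A minimal sketch of checking both prerequisites from a shell is shown here. The helper names have_cmd and pull_bootcamp_image are illustrative and not part of the bootcamp material; docker pull is the standard Docker CLI command for downloading an image.

```shell
# Return success if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: pull the bootcamp image, defaulting to tag 0.05.
pull_bootcamp_image() {
    tag=${1:-0.05}
    if have_cmd docker; then
        docker pull "sunlab/bigdata:$tag"
    else
        echo "docker not found; install it first (see step 1)" >&2
        return 1
    fi
}
```

For example, pull_bootcamp_image 0.04.1 would fetch the older image once Docker is installed.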

1. Install Docker

The official Docker documentation provides an installation tutorial.


OS X users can also install Docker via Homebrew:

$ brew install Caskroom/cask/virtualbox
$ brew install docker-machine
$ brew install docker

To keep the Docker service active, we can use brew's service manager

$ brew services start docker-machine
==> Successfully started `docker-machine` (label: homebrew.mxcl.docker-machine)

Check the status:

$ brew services list
Name           Status  User Plist
docker-machine started name   /Users/name/Library/LaunchAgents/homebrew.mxcl.docker-machine.plist

We can create a default instance with the following command:

$ docker-machine create --driver virtualbox --virtualbox-memory 4096  default

The VM requires at least 4 GB of memory.

Execute the following command before using other docker commands.

$ eval $(docker-machine env default)

CentOS 7

Simply install and start Docker, then enable it at boot:

$ sudo yum install docker
$ sudo service docker start
$ sudo chkconfig docker on

Some common issues:

  1. When using SELinux with BTRFS, you may see an error message like the following:
# systemctl status docker.service -l
SELinux is not supported with the BTRFS graph driver!

Modify /etc/sysconfig/docker as follows:

# Modify these options if you want to change the way the docker daemon runs

Then restart the Docker service.

  2. Storage issue: the following error message appears in /var/log/upstart/docker.log:
[graphdriver] using prior storage driver "btrfs"...

Delete the directory /var/lib/docker and restart the Docker service. Note that this also removes all local images and containers.


2. Pull and run Docker image

(1) Start the container with:

Ver 0.05 for Spark 2.0, etc. (Jupyter and Zeppelin will be added soon)

docker run -it --privileged=true -m 4096m -h bootcamp1.docker sunlab/bigdata:0.05 /bin/bash

Ver 0.04.1 for Spark 1.5 with Jupyter and Zeppelin

docker run -it --privileged=true -m 4096m -p 2222:22 -p 8888:8888 -p 8889:8889 -h bootcamp1.docker sunlab/bigdata:0.04.1 /bin/bash

This exposes three container ports (22, 8888, and 8889) to the host environment.

-p host-port:vm-port

means that traffic sent to host-port on your host is forwarded to vm-port in the Docker container. If you get an error such as "port already in use", change host-port to a free port.

Each vm-port is linked to:

22 - SSH (sshd)
8888 - Jupyter Notebook
8889 - Zeppelin Notebook
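
The port mapping described above can be sketched as a small helper that prints the 0.04.1 run command with a different host port for Jupyter, for instance when 8888 is already in use on the host. The function name bootcamp_run_cmd and the example port 9999 are illustrative assumptions; the command itself is the one shown earlier.

```shell
# Hypothetical helper: print the docker run command for image 0.04.1,
# mapping the given host port (default 8888) to Jupyter's container port 8888.
bootcamp_run_cmd() {
    jupyter_host_port=${1:-8888}
    echo docker run -it --privileged=true -m 4096m \
        -p 2222:22 -p "${jupyter_host_port}:8888" -p 8889:8889 \
        -h bootcamp1.docker sunlab/bigdata:0.04.1 /bin/bash
}

# Example: host port 9999 forwards to Jupyter inside the container.
bootcamp_run_cmd 9999
```

With this mapping, Jupyter would be reachable on host port 9999 instead of 8888, while the container side stays unchanged.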

After you run the container and start the services as described in (2) below, you can access each notebook with your web browser:

On Linux (Ubuntu, CentOS, Fedora, ...)

Just visit "localhost:8888", or another port number if you changed host-port.


With docker-machine (for example on OS X)

First get the Docker host's IP with the following command:

$ printenv  | grep "DOCKER_HOST"

Then visit <Docker IP>:8888 for Jupyter or <Docker IP>:8889 for Zeppelin, or another port number if you changed host-port.
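
As a sketch of this step: with docker-machine, DOCKER_HOST typically has the form tcp://IP:PORT, so the IP can be extracted with shell parameter expansion. The function name docker_host_ip and the sample address 192.168.99.100 are assumptions for illustration only; your address may differ.

```shell
# Print the host part of a DOCKER_HOST-style value (tcp://HOST:PORT).
docker_host_ip() {
    value=${1#tcp://}              # strip the leading scheme, if present
    printf '%s\n' "${value%%:*}"   # strip the trailing :PORT
}

# Sample value for illustration only; normally pass "$DOCKER_HOST".
ip=$(docker_host_ip "tcp://192.168.99.100:2376")
echo "Jupyter:  http://${ip}:8888"
echo "Zeppelin: http://${ip}:8889"
```

The command docker-machine ip default prints the address of the default machine directly and can be used instead.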

(2) Start all necessary services

sudo service sshd start
sudo service zookeeper-server start
sudo service hadoop-yarn-proxyserver start
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-datanode start
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-mapreduce-historyserver start
sudo service hadoop-yarn-nodemanager start
sudo service spark-worker start
sudo service spark-master start
sudo service hbase-regionserver start
sudo service hbase-master start
sudo service hbase-thrift start

If you chose image 0.04.1 and want to use Zeppelin, start one more service:

sudo service zeppelin start
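
The start-up commands above can be wrapped in a loop, as in the following sketch. The function name start_services and the DRY_RUN and WITH_ZEPPELIN switches are illustrative assumptions; the service names and their order are taken from the commands listed above.

```shell
# Services in the order listed above.
SERVICES="sshd zookeeper-server hadoop-yarn-proxyserver hadoop-hdfs-namenode
hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver
hadoop-yarn-nodemanager spark-worker spark-master hbase-regionserver
hbase-master hbase-thrift"

# Start every service in order.
# With DRY_RUN=1, only print the commands.
# With WITH_ZEPPELIN=1 (image 0.04.1), also start zeppelin last.
start_services() {
    list=$SERVICES
    if [ "${WITH_ZEPPELIN:-0}" = 1 ]; then
        list="$list zeppelin"
    fi
    for svc in $list; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "sudo service $svc start"
        else
            sudo service "$svc" start || echo "failed to start $svc" >&2
        fi
    done
}
```

Running the function with DRY_RUN=1 set prints the commands without executing them, which is a cheap way to check the order before starting anything.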

To use Jupyter Notebook:

jupyter notebook --ip=

The --ip= parameter sets the address Jupyter listens on, which lets you reach the service from your web browser. Jupyter starts a web service listening on port 8888.

(3) Stop all services

You can stop services if you want:

sudo service zookeeper-server stop
sudo service hadoop-yarn-proxyserver stop
sudo service hadoop-hdfs-namenode stop
sudo service hadoop-hdfs-datanode stop
sudo service hadoop-yarn-resourcemanager stop
sudo service hadoop-mapreduce-historyserver stop
sudo service hadoop-yarn-nodemanager stop
sudo service spark-worker stop
sudo service spark-master stop
sudo service hbase-regionserver stop
sudo service hbase-master stop
sudo service hbase-thrift stop

If you used Zeppelin, also stop it:

sudo service zeppelin stop

(4) Detach or Exit

To detach from the instance while keeping it running, press

ctrl + p, ctrl + q

To exit, type exit in the container shell, which stops the container.


(5) Re-attach

If you detached from an instance and want to attach to it again, look up its CONTAINER ID or NAMES:

$ docker ps -a
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS                      PORTS                                                    NAMES
c6b265ebd7d2        sunlab/bigdata:0.04.1   "/tini -- /bin/bash"   9 minutes ago       Up 4 seconds      >8888-8889/tcp,>22/tcp   berserk_hypatia
cd6b3e243157        sunlab/bigdata:0.04.1   "/tini -- /bin/bash"   23 minutes ago      Created                                                                              loving_hoover
92169a84b9a1        sunlab/bigdata:0.04.1   "/tini -- /bin/bash"   2 hours ago         Exited (0) 22 minutes ago                                                            peaceful_franklin

Then attach it by:

$ docker attach <CONTAINER ID or NAMES>
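
Note that docker attach only works on a running container; one in the Exited or Created state has to be started first. The following minimal sketch picks the command from the STATUS column of docker ps -a. The function name reattach_cmd is an illustrative assumption; docker start -ai starts a stopped container and attaches to it interactively.

```shell
# Print the command to get back into a container, given its name (or ID)
# and the STATUS text shown by `docker ps -a`.
reattach_cmd() {
    name=$1
    status=$2
    case $status in
        Up*) echo "docker attach $name" ;;      # already running
        *)   echo "docker start -ai $name" ;;   # Exited or Created: start, then attach
    esac
}

# Examples using the names and statuses from the listing above.
reattach_cmd berserk_hypatia "Up 4 seconds"
reattach_cmd peaceful_franklin "Exited (0) 22 minutes ago"
```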

(6) Destroy instance

To permanently remove an instance:

$ docker rm <CONTAINER ID or NAMES>
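
docker rm refuses to remove a running container, so a common pattern is to stop it first. The sketch below assumes a hypothetical helper name destroy_instance and a DRY_RUN switch that prints the commands instead of running them.

```shell
# Stop a container and then remove it.
# With DRY_RUN=1, print the commands instead of executing them.
destroy_instance() {
    name=$1
    for action in stop rm; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "docker $action $name"
        else
            docker "$action" "$name" || return 1
        fi
    done
}
```

For example, destroy_instance loving_hoover would remove the container with that name. Alternatively, docker rm -f force-removes a running container in one step.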