
Automated containerized Apache Spark Cluster

Description

This repository contains instructions for installing an Apache Spark cluster using Docker containers. The following sections explain the steps to build the Apache Spark cluster using this repository.

How to run

Step 0: Pre-requisites

  1. All host machines should have the Docker engine installed (follow the Docker installation guide: https://docs.docker.com/engine/install/).
  2. Add the master host and the worker hosts to a Docker swarm network, as described below.

----> Initialize the Docker swarm (only on the master host) with the command below:

$ docker swarm init --advertise-addr <master-ip-address>  # Replace <master-ip-address> with the master host's IP address.

----> Add the worker hosts to the Docker swarm (run only on the worker hosts). The swarm init command above generates a token and a complete join command (similar to the one shown below) that can be executed on each worker host to add it to the swarm network.

$ docker swarm join --token <token-from-swarm-init-output> <master-ip-address>:2377
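Once all worker hosts have joined, you can verify the swarm membership from the master host; every node should be listed with the status Ready.

$ docker node ls   # Run on the master host; lists all manager and worker nodes in the swarm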

Step 1: Clone this git repository.

$ git clone https://github.com/gprasad09/Apache-Spark-Cluster-Project.git
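The scripts used in the following steps are run from the repository root, so change into the cloned directory first (the directory name matches the repository name):

$ cd Apache-Spark-Cluster-Project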

Step 2: Build and start the Spark cluster by executing the command below.

$ ./start.sh
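If start.sh deploys the cluster as Docker swarm services (an assumption about how the script works, not something stated here), you can confirm that everything came up with the command below; alternatively, docker ps on each host shows the running containers.

$ docker service ls   # Lists swarm services and their replica counts; assumes the cluster runs as swarm services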

Step 3: Launch the Jupyter notebook by executing the command below.

$ ./jupyter.sh
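Jupyter normally prints an access URL containing a token when it starts. If the script runs Jupyter inside a container, the URL can be recovered from the container logs; the container name jupyter below is only an illustrative assumption, so adjust it to whatever name your setup uses.

$ docker logs jupyter 2>&1 | grep token   # "jupyter" is a hypothetical container name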

Step 4: Execute the port-forwarding command below so that you can access the Spark UI (port 8080) and Jupyter (port 8888) on your local machine.

$ ssh -L 8888:localhost:8888 -L 8080:localhost:8080 userid@master-host-address

Step 5: Stop the cluster once you are done with your work on the Apache Spark cluster.

$ ./stop.sh
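Optionally, if you also want to dismantle the swarm created in Step 0, each worker host can leave the swarm and the master host can then remove itself. These are standard Docker commands rather than scripts from this repository.

$ docker swarm leave            # Run on each worker host
$ docker swarm leave --force    # Run on the master host after the workers have left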