[ECE NTUA] Advanced Topics in Databases - Course project (2022-2023)

dbakalis8/Advanced_Topics_in_DataBases_NTUA_2022-2023

Advanced Topics in Databases 2022-2023 NTUA

Installation

You need a local network with two or more connected VMs.

To install Hadoop and Spark, follow the steps below:

First, install Hadoop by following the link -> Hadoop

Then, set up the YARN cluster by following the link -> Yarn

Finally, install Spark using the PDF above or by following the link -> Spark

How to execute the program (with 2 workers)

1. Connect to the master VM:

ssh (master vm connection string)

2. Start HDFS and the Spark master on the master VM:

start-dfs.sh
start-master.sh

3. Upload the data to the Hadoop Distributed File System (HDFS), as in the example below:

hadoop fs -put ./yellow_tripdata_2022-01.parquet hdfs://master:9000/par/yellow_tripdata_2022-01.parquet
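When several Parquet files need uploading, the single `-put` above can be wrapped in a loop. The following is a minimal sketch, assuming the same NameNode address (`hdfs://master:9000`) and target directory (`/par`) as the example; the names `HDFS_DIR`, `hdfs_dest` and `upload_all` are hypothetical helpers, not part of the project.

```shell
# Assumed target directory, taken from the example command above.
HDFS_DIR="hdfs://master:9000/par"

# Print the HDFS destination for a local file (keeps only the file name).
hdfs_dest() {
  printf '%s/%s\n' "$HDFS_DIR" "$(basename "$1")"
}

# Upload every Parquet file in the current directory; requires a running HDFS.
upload_all() {
  # Create the target directory if it does not exist yet.
  hadoop fs -mkdir -p "$HDFS_DIR"
  for f in ./*.parquet; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    # -put refuses to overwrite an existing file; add -f to replace it.
    hadoop fs -put "$f" "$(hdfs_dest "$f")"
  done
  # List the directory to confirm the uploads.
  hadoop fs -ls "$HDFS_DIR"
}
```

Calling `upload_all` from the directory that holds the data files performs the uploads and lists the result.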

4. Start a worker on the master VM:

start-worker.sh spark://192.168.0.2:7077

5. Start a worker on the slave VM by running the following commands from the master VM:

ssh (slave vm connection string)
start-worker.sh spark://192.168.0.2:7077
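Before submitting a job, it can help to confirm that the daemons started in the previous steps are actually running. One possible check, sketched here under the assumption that the JDK's `jps` tool is on the `PATH`, filters its output for the usual HDFS and Spark standalone process names; `filter_daemons` is a hypothetical helper name.

```shell
# Keep only the HDFS / Spark standalone daemon lines from `jps` output on stdin.
# jps prints one "<pid> <name>" pair per line, e.g. "2301 Worker".
filter_daemons() {
  grep -E ' (NameNode|DataNode|SecondaryNameNode|Master|Worker)$'
}

# Usage on a cluster VM (requires a JDK):
#   jps | filter_daemons
```

Each VM that runs a worker should show a `Worker` line, and the master VM should also show `Master` and `NameNode`.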

6. Submit the job to Spark (from the master VM, in the directory that contains the file):

spark-submit (filename)
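If `spark.master` is not configured in `spark-defaults.conf`, `spark-submit` will not use the standalone cluster unless the master URL is passed explicitly. A small wrapper along these lines could make that explicit. It is only a sketch: the URL is assumed to be the one used when starting the workers, and `SPARK_MASTER_URL` and `submit_job` are hypothetical names.

```shell
# Assumed master URL, the same one the workers were started with.
SPARK_MASTER_URL="spark://192.168.0.2:7077"

# Submit a script to the standalone master after checking that it exists.
submit_job() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  spark-submit --master "$SPARK_MASTER_URL" "$1"
}
```

It is called with the script to run as its only argument, in place of the plain `spark-submit` call.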

7. Results

The results are printed in the terminal.

Team members:

Dimitris Kalathas, Dimitris Bakalis

The assignment is in Greek.
