This repository contains all the assignments of the Big Data Computing course.
The purpose of these homeworks is to get acquainted with Spark and with its use to implement MapReduce algorithms.
The last two homeworks focus on the k-center with z outliers problem, a robust version of the k-center problem that is useful in the analysis of noisy data.
---
Homework 1:
In the first homework I developed a Spark program to analyze the dataset of an online retailer, which records a large number of customer transactions, where each transaction lists the products purchased by a customer.

---
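The exact queries of this homework are not reproduced here; as an illustrative sketch, the pure-Python snippet below mimics the MapReduce pattern such a Spark program relies on (the map phase emits key-value pairs, the reduce phase aggregates by key, like `flatMap` followed by `reduceByKey` on an RDD). The transaction fields and values are hypothetical, not the course's actual schema.

```python
from collections import defaultdict

# Hypothetical transaction lines: (TransactionID, ProductID, Quantity, CustomerID, Country).
# These field names are illustrative only.
rows = [
    ("T1", "P1", 2, "C1", "Italy"),
    ("T1", "P2", 1, "C1", "Italy"),
    ("T2", "P1", 5, "C2", "France"),
    ("T3", "P2", 3, "C1", "Italy"),
]

def map_phase(rows):
    # Map: emit (ProductID, 1) for every transaction line with positive quantity.
    for _, product, qty, _, _ in rows:
        if qty > 0:
            yield (product, 1)

def reduce_phase(pairs):
    # Reduce: sum the values of each key, like reduceByKey(lambda a, b: a + b) in Spark.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Number of transaction lines in which each product appears.
popularity = reduce_phase(map_phase(rows))
```

In Spark the same pattern would be distributed across partitions, with the reduce step performed first locally and then across executors.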
Homework 2:
In the second homework I implemented the 3-approximation sequential algorithm for the k-center with z outliers problem. This algorithm, proposed by Charikar et al., is simple to implement but has superlinear complexity.

---
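As a rough illustration of how the algorithm works, here is an unweighted pure-Python sketch of the Charikar et al. scheme: guess a radius r, greedily pick the point whose small ball covers the most uncovered points, discard everything in a larger ball around it, and double r if more than z points remain uncovered. The homework version also carries per-point weights and an epsilon parameter; this simplified sketch fixes eps and assumes distinct points.

```python
import math
from itertools import combinations

def seq_outliers(P, k, z, eps=0.0):
    """Sketch of the sequential 3-approximation for k-center with z outliers."""
    # Initial guess: half the minimum pairwise distance among the first k+z+1 points.
    first = P[:k + z + 1]
    r = min(math.dist(p, q) for p, q in combinations(first, 2)) / 2
    while True:
        Z = list(P)   # points not yet covered
        S = []        # chosen centers
        while len(S) < k and Z:
            # Pick the point whose ball of radius (1+2eps)r covers the most uncovered points.
            best_center, best_ball = None, []
            for x in P:
                ball = [y for y in Z if math.dist(x, y) <= (1 + 2 * eps) * r]
                if len(ball) > len(best_ball):
                    best_center, best_ball = x, ball
            S.append(best_center)
            # Remove every uncovered point within (3+4eps)r of the new center.
            Z = [y for y in Z if math.dist(best_center, y) > (3 + 4 * eps) * r]
        if len(Z) <= z:   # at most z outliers left uncovered: done
            return S
        r *= 2            # radius guess too small: double it and retry
```

Each attempt scans all points for every candidate center, which is where the superlinear cost mentioned above comes from.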
Homework 3:
In the third homework I implemented a 2-round MapReduce coreset-based algorithm for the k-center with z outliers problem, where the use of the inefficient 3-approximation algorithm is confined to a small coreset computed in parallel through the efficient Farthest-First Traversal.
This efficient implementation was run on a big dataset (about 1.2M points in 7 dimensions) on the CloudVeneto cluster.
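The two-round structure can be sketched in pure Python as follows: round 1 extracts k+z+1 coreset points from each partition with Farthest-First Traversal (in Spark this would be a `mapPartitions`), and round 2 runs a sequential algorithm on the small union of coresets only. In the actual homework round 2 runs the Charikar 3-approximation; here, to keep the sketch self-contained, plain Farthest-First Traversal stands in for it.

```python
import math

def farthest_first_traversal(points, k):
    # Gonzalez's heuristic: start from the first point, then repeatedly
    # add the point farthest from the current set of centers.
    centers = [points[0]]
    d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=lambda j: d[j])
        centers.append(points[i])
        for j, p in enumerate(points):
            d[j] = min(d[j], math.dist(p, points[i]))
    return centers

def two_round_coreset(points, k, z, n_partitions):
    # Round 1 (a mapPartitions in Spark): k+z+1 coreset points per partition.
    parts = [points[i::n_partitions] for i in range(n_partitions)]
    coreset = []
    for part in parts:
        coreset.extend(farthest_first_traversal(part, min(k + z + 1, len(part))))
    # Round 2 (sequential, on the driver): run the expensive algorithm on the
    # small coreset only. FFT stands in here for the 3-approximation.
    return farthest_first_traversal(coreset, k)
```

Because the expensive sequential step sees only `n_partitions * (k + z + 1)` points rather than the full dataset, the algorithm scales to inputs like the 1.2M-point dataset mentioned above.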
I also developed some programs that generate plots of the results of the different clustering algorithms.
Below are two examples of the results obtained on an Uber dataset.