xuyimeng/MapReduce-Framework: a MapReduce framework based on Storm that is flexible enough for any MapReduce job. It is built with a number of workers and a single master, and uses BerkeleyDB as temporary data storage when processing large data sets.

Repository contents:
.settings
WebContent
build/classes
classes
database
jetty
lib
resources
src
store
target
.classpath
.project
README
build.xml
master.war
submit-hw3.zip
words.txt
words.txt.0
words.txt.1
worker.war


Author name: Yimeng Xu
**********************************************************************
Introduction to this project:
This project builds a distributed framework that essentially emulates Apache Storm, and its processing model also emulates MapReduce. The framework is tested with a simple WordCount job.
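
For readers unfamiliar with WordCount, the following is a minimal single-process sketch of the map, shuffle, and reduce steps. It is not the code in this repository: the class name WordCountSketch and its methods are hypothetical, and everything runs in memory rather than across workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; these names are not taken from the repository.
class WordCountSketch {

    // Map step: emit one (word, 1) pair for every word in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce step: sum all counts that were emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map on every line, groups the pairs by key (the shuffle),
    // then reduces each group to a single count.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "the lazy dog")));
    }
}
```

In the distributed setting described in this README, the grouping step would instead route each key to the worker responsible for it.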

Storm performs computation over units of data in streams. I extend it to a MapReduce-style architecture with two types of nodes: a number of workers and a single master. Communication between the master and the worker nodes is handled by servlets running in a servlet container (Tomcat/Jetty).

Map and reduce functions are implemented within StormLite bolts. Workers run the spouts and bolts and store the data that the MapReduce framework is working on. The master coordinates the workers and provides a user interface. A key issue is that, unlike an unbounded Storm stream, the input of a MapReduce job ends, and the reduce phase can only start once all inputs have been read.
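
One common way to handle the end-of-input issue is to have every upstream sender emit an end-of-stream marker and to let the reducer buffer tuples until it has counted one marker per sender. The sketch below illustrates that idea under stated assumptions: the class ReduceBarrierSketch and its methods are hypothetical, the in-memory map stands in for the BerkeleyDB temporary store, and the reduce function is hard-coded as a sum. It is not a description of how the bolts in this repository are actually written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative and not taken from the repository.
class ReduceBarrierSketch {
    private final int upstreamCount;   // how many senders must signal end of stream
    private int eosSeen = 0;           // end-of-stream markers received so far
    // In-memory stand-in for the temporary store (the project uses BerkeleyDB).
    private final Map<String, List<Integer>> buffered = new HashMap<>();
    private final Map<String, Integer> results = new TreeMap<>();

    ReduceBarrierSketch(int upstreamCount) {
        this.upstreamCount = upstreamCount;
    }

    // Buffer a (key, value) tuple; nothing is reduced yet.
    synchronized void onTuple(String key, int value) {
        if (isDone()) {
            throw new IllegalStateException("tuple received after end of input");
        }
        buffered.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Each upstream sender calls this once when it has no more input.
    synchronized void onEndOfStream() {
        eosSeen++;
        if (eosSeen == upstreamCount) {
            // All inputs have been read, so the reduce phase can start.
            for (Map.Entry<String, List<Integer>> e : buffered.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                results.put(e.getKey(), sum);
            }
            buffered.clear();
        }
    }

    // True once every upstream sender has signalled end of stream.
    synchronized boolean isDone() {
        return eosSeen >= upstreamCount;
    }

    // Returns a copy of the reduced output; empty until the barrier is passed.
    synchronized Map<String, Integer> results() {
        return new TreeMap<>(results);
    }
}
```

With two upstream senders, results() stays empty after the first onEndOfStream() call and is filled only after the second.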

**********************************************************************
Instructions for building and running the solution:

First, run masterservlet.java on Tomcat/Jetty.
Then run WorkerServer.java, specifying three arguments in the run configuration:
         input arguments: [Master IP:port],[storage dir],[worker port]
Then open localhost:8000/status in a browser to see the status of the workers and to submit a job.
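
As an illustration of the three worker arguments, here is a small sketch of how they could be validated. It assumes the values arrive as three separate program arguments in the order listed above; the class WorkerArgsSketch and its fields are hypothetical and are not the parsing code used by WorkerServer.java.

```java
// Hypothetical sketch: the class and field names are not taken from the repository.
class WorkerArgsSketch {
    final String masterHost;
    final int masterPort;
    final String storageDir;
    final int workerPort;

    private WorkerArgsSketch(String masterHost, int masterPort,
                             String storageDir, int workerPort) {
        this.masterHost = masterHost;
        this.masterPort = masterPort;
        this.storageDir = storageDir;
        this.workerPort = workerPort;
    }

    // Assumed argument order: <master host:port> <storage dir> <worker port>
    static WorkerArgsSketch parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "usage: <master host:port> <storage dir> <worker port>");
        }
        // Split the master address at the last colon into host and port.
        int colon = args[0].lastIndexOf(':');
        if (colon <= 0 || colon == args[0].length() - 1) {
            throw new IllegalArgumentException("master address must be host:port");
        }
        String host = args[0].substring(0, colon);
        int masterPort = Integer.parseInt(args[0].substring(colon + 1));
        int workerPort = Integer.parseInt(args[2]);
        return new WorkerArgsSketch(host, masterPort, args[1], workerPort);
    }
}
```

For example, WorkerArgsSketch.parse(new String[] {"127.0.0.1:8000", "store", "8001"}) yields a master at 127.0.0.1 on port 8000, a storage directory named store, and a worker port of 8001; these values are examples only.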
