GitHub - ghelmling/Wikipedia-noSQL-Benchmark: A simple benchmark of noSQL databases for both read/update and MapReduce performances

Please see http://www.nosqlbenchmarking.com for more informations

You can now build this project using Maven. As a few dependencies where not available in any maven repository, I have decided to provide them myself in the lib directory. You still have to add those jar to your local maven repository, to do so execute the script addToLocalRepo.sh or execute each of the three lines that it contains. Then you only have to run “mvn package” to build the jar that contains everything needed. It will be in the folder target and it will be called “Wikipedia-nosql-benchmark-1.0-SNAPSHOT.jar”.

For Cassandra and HBase, there is a little more configuration needed. For Cassandra edit the file storage-conf.xml to reflect you
cluster configuration. For HBase edit the file hbase-site.xml to reflect your cluster configuration and don’t forget to edit hbaseDB.java
to change the localisation of your configuration file on the line 39. Please note that this should be the path to the configuration file
on the machine on which you plan to run the jar file.

If you want to benchmark the read and update performances call the jarfile like this :

java -jar jarfile.jar dbType totalNumberOp readPercentage IP1 IP2 IP3 …

dbType can currently take the following values :

cassandra
scalaris (please note that the current implementation should be optimized)
voldemort (no MapReduce implementation so only read/update can be benchmarked)
terrastore (partial implementation, needs work)
riak
mongodb
hbase

totalNumberOp is the total number of operations that will be requested to the cluster.
readPercentage is the proportion of requests that will be read-only
IP1 IP2 IP3 … is a list of IP that can be as long as desired. For the systems without a load balancer, I put in the list the IP of each
active node in the cluster. For systems with a load balancer, I simply put the IP of the load balancer as many time as there are active
nodes in the cluster.

If you want to benchmark the MapReduce performances , call the jar file like this :

java -jar jarfile.jar dbType search numberOfRuns IP
dbType can take any value listed just before provided that this database has a MapReduce implementation.
search is a constant
numberOfRuns is the number of time that the search benchmark will be run
IP is the IP of one of the nodes of the cluster that accept MapReduce jobs

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.settings		.settings
bin		bin
lib		lib
src		src
.classpath		.classpath
.directory		.directory
.gitignore		.gitignore
.project		.project
APACHE-LICENSE.txt		APACHE-LICENSE.txt
GPL-LICENSE.txt		GPL-LICENSE.txt
README		README
README.textile		README.textile
addToLocalRepo.sh		addToLocalRepo.sh
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

About

Licenses found

Releases

Packages

License

Licenses found

ghelmling/Wikipedia-noSQL-Benchmark

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Packages