Skip to content

A simple benchmark of noSQL databases for both read/update and MapReduce performances

License

Apache-2.0, GPL-3.0 licenses found

Licenses found

Apache-2.0
APACHE-LICENSE.txt
GPL-3.0
GPL-LICENSE.txt
Notifications You must be signed in to change notification settings

ghelmling/Wikipedia-noSQL-Benchmark

 
 

Repository files navigation

Please see http://www.nosqlbenchmarking.com for more informations

You can now build this project using Maven. As a few dependencies where not available in any maven repository, I have decided to provide them myself in the lib directory. You still have to add those jar to your local maven repository, to do so execute the script addToLocalRepo.sh or execute each of the three lines that it contains. Then you only have to run “mvn package” to build the jar that contains everything needed. It will be in the folder target and it will be called “Wikipedia-nosql-benchmark-1.0-SNAPSHOT.jar”.

For Cassandra and HBase, there is a little more configuration needed. For Cassandra edit the file storage-conf.xml to reflect you
cluster configuration. For HBase edit the file hbase-site.xml to reflect your cluster configuration and don’t forget to edit hbaseDB.java
to change the localisation of your configuration file on the line 39. Please note that this should be the path to the configuration file
on the machine on which you plan to run the jar file.

If you want to benchmark the read and update performances call the jarfile like this :

java -jar jarfile.jar dbType totalNumberOp readPercentage IP1 IP2 IP3 …

dbType can currently take the following values :

  • cassandra
  • scalaris (please note that the current implementation should be optimized)
  • voldemort (no MapReduce implementation so only read/update can be benchmarked)
  • terrastore (partial implementation, needs work)
  • riak
  • mongodb
  • hbase

totalNumberOp is the total number of operations that will be requested to the cluster.
readPercentage is the proportion of requests that will be read-only
IP1 IP2 IP3 … is a list of IP that can be as long as desired. For the systems without a load balancer, I put in the list the IP of each
active node in the cluster. For systems with a load balancer, I simply put the IP of the load balancer as many time as there are active
nodes in the cluster.

If you want to benchmark the MapReduce performances , call the jar file like this :

java -jar jarfile.jar dbType search numberOfRuns IP
dbType can take any value listed just before provided that this database has a MapReduce implementation.
search is a constant
numberOfRuns is the number of time that the search benchmark will be run
IP is the IP of one of the nodes of the cluster that accept MapReduce jobs

About

A simple benchmark of noSQL databases for both read/update and MapReduce performances

Resources

License

Apache-2.0, GPL-3.0 licenses found

Licenses found

Apache-2.0
APACHE-LICENSE.txt
GPL-3.0
GPL-LICENSE.txt

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published