NoSQL

Definition

node: In the context of distributed data store, a node represents a server in such a system which is linked to other servers.

Introduction

In the context of a distributed data store the CAP theorem takes place which states that it is impossible for such data store to simutaneously provide more than two out of the following three guarantees:

Consistency: every node in the system have access to the same data at the same moment.
Availability: every request receives a response without garantee that it contains the most recent version of the data.
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

Source: Wikipedia.

Usually NoSQL database system are of: AP or CP.

When to use NoSQL?

Pros

Becomes relevant when having more than 10 millions lines in the database, below that mark relational databases can perfectly be as good as NoSQL performance-wise (indexes all the things).
It is easier to build a distributed and high-availability data source system using NoSQL than with relation databases.
Dynamic database structure. You don't have to define a structure thus two entities of the same type can have different structure.
Easy to increase computational and storage capacity by adding new servers (horizontal scaling).
It is easy and simple to request the system.

Cons

No transactional mode.
No join. In the context of distributed servers this would have a serious impact on the performance otherwise.
Dynamic database structure. As the structure is not forced upon the developer you can do whatever you want but it can lead to numerous problem if not done right. There are no integrity contraints.
Can lead to race conditions for data update depending on the location of the server and the time needed to access it. The update to win the race is usually the one that performs the closest to the node storing the data.
It is the developer's job to ensure data integrity through its code as the database system doesn't do it by itself.
No query language standard. When you switch to another NoSQL database you'd have to rewrite your app because they do not share the same language syntax.

Context

Capturing and requesting incoming data from a multitude of data sources such as surveillance network.
Data Service: Web oriented services, high performance/scalability, customer oriented services.

Database design

As NoSQL database systems do not offer join feature you need to duplicate data to make the relation between them. Depending on the way you choose to structure your data it can have an impact on either read or write performance. You have to decide whether you want data integrity or better read performance.

To handle relations between entities, you can either:

Fully embed the entities you are linked to in your document (good read performance and slow write performance) such as:

// document Person: Bob
{
  "id": "47c24738-abe7-4391-9a1a-84dac6da35de",
  "firstname": "Bob",
  "lastname": "Henri",
  "mother": {
    "id": "e415d0e8-919c-4b80-a82e-d0aecc6f4e06",
    "firstname": "Clara",
    "lastname": "Henri"
  }
}

// document Person: Clara
{
  "id": "e415d0e8-919c-4b80-a82e-d0aecc6f4e06",
  "firstname": "Clara",
  "lastname": "Henri",
}

So if you wanted to update Henri's mother information you'd have to update in both places. You should start to see how it can drastically impact your performance when you have to do multiple updates in different location. Let's imagine we are not talking about the mother relationship but rather the friends which can also have sub-properties plus they can be spread over different nodes that might not be on the same continent. The complexity of the update would be exponential. The longer the update the least you are sure to request up-to-date data. That's the reason you gain your read performance because each document includes everything you'd need to know but it comes with a write performance tradeoff.

Embed only the id of your document inside other linked documents (slow read performance and good write performance). If we continue with our previous example we would have:

// document Person: Bob
{
  "id": "47c24738-abe7-4391-9a1a-84dac6da35de",
  "firstname": "Bob",
  "lastname": "Henri",
  "mother": {
    "id": "e415d0e8-919c-4b80-a82e-d0aecc6f4e06"
  }
}

// document Person: Clara
{
  "id": "e415d0e8-919c-4b80-a82e-d0aecc6f4e06",
  "firstname": "Clara",
  "lastname": "Henri",
}

The update would be much faster because the data would be stored only in one document thus all other linked documents would also be updated if an update is performed on the source document. However it would slow down the read action because you would need to jump between documents to find the information you are looking for. Again in a distributed data source context this could be quite an expensive operation to perform.

In order to make the best design decision you'd need to know that upfront. For example, Facebook uses NoSQL database to store messages and posts because if they lose some information or if they are not all up-to-date that's not really important however your critical information that they sell are stored in relation databases where you can ensure data integrity.

Best solutions

Key/value

Redis
Oracle NoSQL Database
Riak-TS

Document oriented

MongoDB
CouchDB
CouchBase

Column oriented

Cassandra

Graph oriented

Neo4J

Search Engines

ElasticSearch

Conclusion

NoSQL will not replace relational databases but rather they solve problem that relation db are not good at.

NoSQL databases should not be used for management software such as ERP or accountability software where data integrity is a must have and not really a tradeoff you can afford.

Aside: NewSQL

A new brand of database system rose: NewSQL. It combines the best of both worlds: SQL and NoSQL.

Examples:

MemSQL
Google Scanner (only available in cloud)
CockRoach
PostgreSQL: there are extension to handle JSON data structure (key/value and document).
- As of 9.3, you can use JSON data types:
  - JSON: stored as string and as such you should not query on their content which would have big performance impact.
  - JSONB: stored as binary and content is searchable through @> operator with far less performance impact.
VoltDB
MariaDB

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NOSQL.md

NOSQL.md

NoSQL

Definition

Introduction

Categories

Key/value

Document oriented

Column oriented

Graph oriented

Search Engines

Time Series database

When to use NoSQL?

Pros

Cons

Context

Database design

Best solutions

Key/value

Document oriented

Column oriented

Graph oriented

Search Engines

Conclusion

Aside: NewSQL

Files

NOSQL.md

Latest commit

History

NOSQL.md

File metadata and controls

NoSQL

Definition

Introduction

Categories

Key/value

Document oriented

Column oriented

Graph oriented

Search Engines

Time Series database

When to use NoSQL?

Pros

Cons

Context

Database design

Best solutions

Key/value

Document oriented

Column oriented

Graph oriented

Search Engines

Conclusion

Aside: NewSQL