In this blog post, I will discuss NoSQL databases, how they differ from relational databases, and the different types of NoSQL databases.
Contrary to their name, NoSQL databases are not databases without SQL (Structured Query Language) capabilities, nor are they a single product or technology. NoSQL databases are a group of data storage and manipulation technologies that do not have relational capabilities. Some NoSQL databases do, in fact, permit querying in SQL or SQL-like languages, but they do not have fixed schemas. So a more appropriate name for this set of products could be NoREL (No Relational), or the acronym NoSQL can be thought of as shorthand for ‘Not Only SQL’.
Traditionally, Relational Databases (RDBMS) have been used to store the data that applications process. However, over the past couple of decades, as data volumes started exceeding the processing capacity of traditional databases, a need arose for alternative storage and retrieval mechanisms. With the advent of Big Data, the problem of having to process large amounts (Volume) of unstructured data (Variety) in real time (Velocity) became even more acute.
Relational Databases could not fully address this need, which led to the popularity and prevalence of NoSQL databases. NoSQL databases provide us with mechanisms to store and retrieve data for Big Data Analytics, along with capabilities for schema-less data structures, horizontal scaling, high availability and alternative query methods.
Differences between Relational and Non-Relational Databases (NoSQL Databases)
- Relational Databases are set-theory-based systems where data is stored in two-dimensional tables, whereas NoSQL databases are a set of technologies that were conceived to solve the challenges of distributed and parallel computing in scalable Internet applications.
- Relational Databases use schemas for storing their data (every row of data in a table has the same set of information) whereas there are no set schemas in NoSQL databases. NoSQL databases provide alternate mechanisms for storing data such as a Key-Value pair or a Graph. (More on that later)
- Relational Databases guarantee that all transactions conform to the ACID (Atomicity, Consistency, Isolation and Durability) properties, whereas many NoSQL databases relax these guarantees in exchange for scalability and availability. Such systems often provide only Eventual Consistency, meaning that once updates stop, every copy of a data item will eventually converge to the latest written value.
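To make eventual consistency a little more concrete, here is a minimal Python sketch. Everything in it (the `Replica` class, the `sync` function, the integer version numbers and the last-write-wins rule) is an illustrative assumption, not how any particular NoSQL product works. It only shows that a replica can briefly serve an older value and then converge once replication catches up.

```python
class Replica:
    """One copy of the data; stores a (version, value) pair per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last-write-wins (an assumed conflict rule): keep the entry
        # with the highest version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], version=1)
sync(r1, r2)

r1.write("cart", ["book", "pen"], version=2)  # only r1 has seen this write
stale = r2.read("cart")                        # r2 still serves the older value

sync(r1, r2)                                   # replication catches up
fresh = r2.read("cart")                        # r2 now returns the latest value
```

Between the second write and the second `sync`, a reader hitting `r2` sees the older cart. That window is the price such a system pays for accepting writes without coordinating every replica first.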
Considerations for Data Storage
- Relational Databases are useful where the data is structured and largely uniform, whereas NoSQL databases are well suited to processing huge volumes of unstructured or complex data in systems that need to scale out horizontally.
Eric Brewer of the University of California, Berkeley presented a theory known as the CAP Theorem, which identifies three important considerations for building applications in a distributed environment: Consistency, Availability and Partition Tolerance (hence the name CAP Theorem).
Further, it states that, in distributed applications, you can only guarantee two of these three considerations simultaneously. While typical Relational Databases guarantee Consistency and Availability, the architecture of NoSQL databases is more oriented towards providing either Consistency and Partition Tolerance or Availability and Partition Tolerance. Nathan Hurst has a nice visual representation of where the various available data stores lie on the CAP Theorem considerations.
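The trade-off can be illustrated with a toy two-node cluster. The sketch below is a deliberately simplified assumption (the `Cluster` and `Node` classes and the `"CP"` / `"AP"` mode strings are invented for this example), not a model of any real database. When the network is partitioned, the CP cluster rejects the write to stay consistent, while the AP cluster accepts it and lets the two nodes diverge.

```python
class Node:
    def __init__(self):
        self.value = None


class Cluster:
    """Toy two-node cluster; `mode` is either "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Node(), Node()]
        self.partitioned = False  # True when the nodes cannot reach each other

    def write(self, node_index, value):
        if not self.partitioned:
            # Healthy network: every node receives the write.
            for node in self.nodes:
                node.value = value
            return
        if self.mode == "CP":
            # Consistency over availability: refuse the write.
            raise RuntimeError("unavailable: cannot reach the other replica")
        # Availability over consistency: accept the write locally only.
        self.nodes[node_index].value = value

    def read(self, node_index):
        return self.nodes[node_index].value


cp = Cluster("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
    cp_accepted = True
except RuntimeError:
    cp_accepted = False

ap = Cluster("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
ap_views = (ap.read(0), ap.read(1))  # the two nodes now disagree
```

In the CP case both nodes still agree on `"v1"`, but the client was turned away. In the AP case the client was served, but node 1 is stale until the partition heals.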
Different types of NoSQL Databases
Key-Value Databases
Key-Value (KV) stores are the simplest form of NoSQL database. A KV store is implemented using a hash table (or a map) where a unique key points to a particular value or piece of data. Due to their simplicity, Key-Value databases are very efficient for accessing data by key. Some common examples of Key-Value databases are Redis, Riak and Voldemort.
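As a rough illustration of the idea, a key-value store can be sketched as a thin wrapper around a Python dictionary, which is itself a hash table. The `KeyValueStore` class and its method names are assumptions made for this example. Real products add persistence, replication and expiry on top of this basic shape.

```python
class KeyValueStore:
    """In-memory key-value store backed by a dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # The store does not interpret the value; it is an opaque blob.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        # Returns True if the key existed and was removed.
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "ada", "ttl": 3600})
session = store.get("session:42")
removed = store.delete("session:42")
after_delete = store.get("session:42")
```

Every operation goes through the key. There is no way to ask "which sessions belong to ada?" without scanning all values, which is the main limitation of this model.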
Column-Oriented or Wide Column Databases
Column-oriented databases are an extension of Key-Value data stores in which data from a given column is stored together. The columns are grouped into column families and are stored as key-value pairs within their respective families. Each column family acts as a key for the columns it contains, and the row key acts as the key for the data store. HBase and Cassandra are two well-known examples of Column-Oriented Databases.
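The nesting described above (row key, then column family, then column) can be sketched with nested dictionaries. This is a model of the logical layout only, under the assumption that column families are declared up front. The `WideColumnStore` class is hypothetical and says nothing about how HBase or Cassandra arrange data on disk.

```python
from collections import defaultdict


class WideColumnStore:
    """Logical model: row key -> column family -> column name -> value."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family are free-form and can differ per row.
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        # Return a copy of all columns stored under one family for a row.
        return dict(self.rows.get(row_key, {}).get(family, {}))


table = WideColumnStore(families=["profile", "activity"])
table.put("user1", "profile", "name", "Ada")
table.put("user1", "profile", "email", "ada@example.com")
table.put("user2", "profile", "name", "Lin")           # no email column
table.put("user2", "activity", "last_login", "2024-01-05")

user1_profile = table.get_family("user1", "profile")
user2_profile = table.get_family("user2", "profile")
```

Note that `user1` and `user2` hold different columns within the same family, something a fixed relational schema would not allow without nullable columns.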
Document Databases
In document databases, the data is stored as documents represented in JSON or XML format. These documents are collections of key-value pairs, and it's possible to nest key-value pairs within a document. Documents can be indexed on their unique identifier or on any other key within the document. They are highly flexible and provide a means for ad hoc querying and replication. Two major open source document databases are MongoDB and CouchDB.
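To show what nested documents and ad hoc queries look like in practice, here is a small in-memory sketch. The `DocumentCollection` class, the `_id` field name and the dotted-path query syntax are assumptions chosen for illustration. They resemble conventions used by some document databases but are not the API of MongoDB or CouchDB.

```python
import uuid

_MISSING = object()  # sentinel for "path not present in this document"


def _lookup(doc, path):
    """Follow a dotted path such as "address.city" into nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentCollection:
    def __init__(self):
        self.docs = {}  # unique identifier -> document

    def insert(self, doc):
        # Use the caller's identifier if given, otherwise generate one.
        doc_id = doc.get("_id") or uuid.uuid4().hex
        self.docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def find(self, path, value):
        # Ad hoc query: scan every document and compare one field.
        return [d for d in self.docs.values() if _lookup(d, path) == value]


users = DocumentCollection()
users.insert({"_id": "u1", "name": "Ada", "address": {"city": "London"}})
users.insert({"_id": "u2", "name": "Lin", "address": {"city": "Paris"},
              "tags": ["admin"]})  # extra field; no schema change needed

london = [d["name"] for d in users.find("address.city", "London")]
admins = [d["_id"] for d in users.find("tags", ["admin"])]
```

The `find` method here scans every document. A real document database would typically maintain an index on frequently queried keys to avoid that scan.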
Graph Databases
Graph databases, as their name suggests, are based on graph theory and provide a means of dealing with highly interconnected data. In these databases, data is represented as nodes, and relationships are defined between these nodes. Using these relationships, traversing through the nodes becomes easy and efficient. Neo4j and InfiniteGraph are two examples of graph databases.
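The node-and-relationship model can be sketched with an adjacency list and a breadth-first traversal. The `Graph` class, the relationship labels and the `reachable` method are hypothetical names for this example. Graph databases expose far richer query languages, but the underlying idea of following relationships hop by hop is the same.

```python
from collections import defaultdict, deque


class Graph:
    def __init__(self):
        # node -> list of (relationship label, target node)
        self.edges = defaultdict(list)

    def relate(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbors(self, node, relationship):
        return [t for r, t in self.edges.get(node, []) if r == relationship]

    def reachable(self, start, relationship, max_hops):
        """Breadth-first traversal along one relationship type."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self.neighbors(node, relationship):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, hops + 1))
        return found


g = Graph()
g.relate("alice", "FOLLOWS", "bob")
g.relate("bob", "FOLLOWS", "carol")
g.relate("carol", "FOLLOWS", "dave")
g.relate("alice", "LIKES", "post1")

within_two_hops = g.reachable("alice", "FOLLOWS", max_hops=2)
```

A question like "who is within two hops of alice?" is a short traversal here, whereas in a relational schema it would usually require a self-join per hop.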
Alongside Relational Databases, NoSQL Databases provide us with another way to store, retrieve and manage data, specifically unstructured data. It's important to realize that no single type of data store (relational or NoSQL) will be able to address all of your data requirements.
There are various flavors of NoSQL databases available, and it's best to understand your data requirements, usage patterns, service level agreements and available resources before deciding on a data storage setup.
In the coming blog posts, I will delve deeper into each of the categories of NoSQL databases with specific examples using some of the popular products. This should help in understanding the capabilities and the feature sets provided by the various NoSQL databases.