A new class of databases that follow BASE (basically available, soft state, eventual consistency) characteristics has emerged; these databases have been dubbed NoSQL databases. Examples include Amazon’s Dynamo and Google’s BigTable. Eventual consistency is acceptable as long as the application can tolerate stale data. Applications such as instant messaging, for example, usually tolerate the limitations of eventual consistency.
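To make "soft state" and "eventual consistency" concrete, here is a minimal in-memory sketch, assuming a hypothetical store with one primary and several follower replicas, where writes are acknowledged by the primary immediately and copied to the followers only when `sync()` runs. The class and method names are illustrative, not taken from Dynamo, BigTable, or any other product.

```python
from collections import deque


class Replica:
    """A single copy of the data (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on the primary (replica 0) and are copied
    to the other replicas asynchronously, when sync() is called."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = deque()  # replication log not yet applied to followers

    def write(self, key, value):
        primary = self.replicas[0]
        primary.data[key] = value          # acknowledged at once (available)
        self.pending.append((key, value))  # followers catch up later (soft state)

    def read(self, key, replica_index):
        # Reads are served by whichever replica is asked, even if it lags.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Apply the replication log to every follower; afterwards all replicas agree.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas[1:]:
                replica.data[key] = value


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("status:alice", "online")
print(store.read("status:alice", 0))  # online
print(store.read("status:alice", 2))  # None (stale: not replicated yet)
store.sync()
print(store.read("status:alice", 2))  # online
```

An instant-messaging client that briefly shows a contact as offline before the update propagates is exactly the kind of stale read such applications can live with.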
The typical characteristics of NoSQL databases include:
- No strict schema requirements
- No strict adherence to ACID properties for transactions
- Consistency traded in favor of availability
The trade-off with NoSQL databases is between ACID properties (most notably consistency) on the one hand and performance and scalability on the other.
Types of NoSQL Databases
Document Stores
In contrast to RDBMSs, where data is stored as records with fixed-length fields, document stores store each document in some standard format or encoding. The encoding may be XML or JSON, or an arbitrary binary format such as PDF or Office documents; such binary payloads are typically called binary large objects (BLOBs). Document stores became popular because they can hold a variety of data and their schema can change over time.
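A minimal sketch of the idea, assuming a hypothetical in-memory `DocumentStore` that keeps each document as a JSON string. Because nothing checks the documents against a schema, two records in the same store can carry completely different fields.

```python
import json


class DocumentStore:
    """Minimal in-memory document store (hypothetical): each document is
    kept as a JSON string, so documents need not share a schema."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id):
        return json.loads(self._docs[doc_id])

    def find(self, field, value):
        # Linear scan over every document; a real store would use an index.
        results = []
        for doc_id, encoded in self._docs.items():
            doc = json.loads(encoded)
            if doc.get(field) == value:
                results.append(doc_id)
        return results


store = DocumentStore()
store.put("u1", {"name": "Alice", "email": "alice@example.com"})
store.put("u2", {"name": "Bob", "phones": ["555-0100"], "city": "Oslo"})
print(store.find("city", "Oslo"))  # ['u2']
```

Adding a new field later requires no migration: new documents simply include it, and `find` treats documents without the field as non-matching.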
Graph Databases
In graph databases, graph structures such as vertices and edges are used to represent and store data. Graph databases can be used to store data that has network-like properties between elements (e.g., a social network graph).
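The sketch below models a small social network as a property graph, assuming a hypothetical `Graph` class in which vertices and edges each carry a dictionary of properties. A friends-of-friends query follows edges directly rather than joining tables, which is the kind of traversal graph databases are designed for.

```python
from collections import defaultdict


class Graph:
    """Toy property graph (hypothetical): vertices and edges carry property dicts."""

    def __init__(self):
        self.vertices = {}
        self.edges = defaultdict(list)  # vertex id -> list of (neighbor, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, dst, **props):
        # Undirected "friend" edge: store it in both directions.
        self.edges[src].append((dst, props))
        self.edges[dst].append((src, props))

    def neighbors(self, vid):
        return {dst for dst, _ in self.edges[vid]}

    def friends_of_friends(self, vid):
        # People two hops away who are not already direct friends.
        direct = self.neighbors(vid)
        result = set()
        for friend in direct:
            result |= self.neighbors(friend)
        return result - direct - {vid}


g = Graph()
for name in ["alice", "bob", "carol", "dave"]:
    g.add_vertex(name, kind="person")
g.add_edge("alice", "bob", since=2019)
g.add_edge("bob", "carol", since=2021)
g.add_edge("carol", "dave", since=2022)
print(g.friends_of_friends("alice"))  # {'carol'}
```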
Key-Value Stores
A key-value (KV) store is a database model that maps keys to values, which may be complex. This type of database offers the most flexibility because keys can be mapped to arbitrary values or to structures such as lists. Each key-value pair constitutes an individual record, and the keys are typically indexed with a hash table. Hash lookups are fast, and hashed keys are easy to distribute across machines; for this reason, key-value stores can be scaled horizontally.
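As a sketch of why hashing makes horizontal scaling straightforward, assume a hypothetical `ShardedKVStore` whose nodes are plain Python dicts (themselves hash tables). A stable hash of the key decides which node owns it, so any client can locate a record without a central index.

```python
import hashlib


class ShardedKVStore:
    """Hypothetical key-value store spread across several nodes.
    Each node is a plain dict; a hash of the key picks the node."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Use a stable hash: Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


kv = ShardedKVStore(num_nodes=4)
kv.put("user:42", {"name": "Alice", "roles": ["admin"]})  # value can be a structure
kv.put("session:abc", "token-123")                        # or a single value
print(kv.get("missing", "n/a"))  # n/a
```

The simple modulo scheme remaps most keys whenever a node is added; Dynamo instead uses consistent hashing so that only a fraction of keys move.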
Columnar Databases
Columnar databases are a hybrid of RDBMSs and KV stores. Like relational databases, they store values in groups of zero or more columns, and as in key-value stores, values are queried by matching keys. However, in columnar databases, data is physically transposed and stored in column order instead of in row order as in traditional RDBMSs. Operations such as modifying a subset of columns or aggregating a column across all rows become more efficient, because entire rows do not have to be read to obtain the value of a single column.
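The transposition can be illustrated with ordinary Python structures, assuming a small hypothetical sales table. The row layout keeps each record together, while the column layout keeps one list per column, so an aggregate over `amount` touches only that list.

```python
# The same three-row table, stored two ways.
rows = [
    {"id": 1, "name": "Alice", "city": "Paris", "amount": 120},
    {"id": 2, "name": "Bob", "city": "Oslo", "amount": 80},
    {"id": 3, "name": "Carol", "city": "Paris", "amount": 200},
]

# Column order: transpose the table into one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(table, column):
    # Row layout: every record is visited to pull out a single field.
    return sum(record[column] for record in table)


def total_column_store(table, column):
    # Column layout: only the one column that is needed is read.
    return sum(table[column])


print(total_row_store(rows, "amount"))        # 400
print(total_column_store(columns, "amount"))  # 400
```

On disk, the difference is that a columnar engine can read just the `amount` values sequentially, whereas a row store must read every full record from storage.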
The following are some of the advantages of NoSQL databases:
- Data flexibility: NoSQL databases are designed with non-relational models and hence typically do not enforce a rigid schema. Document stores allow arbitrary information to be stored in some form of encoding (XML/JSON, etc.) or even in binary. Graph databases have no fixed schema; instead, nodes and edges carry sets of properties, which can differ from one kind of node or edge to another. In key-value stores, the value associated with a key can be a single value or a larger, more complex data structure such as a hash or list. In columnar stores, it is fast and easy to alter a table to add more columns if required.
- Scalability: Several NoSQL systems employ a distributed architecture from the ground up, unlike RDBMSs, whose fundamental designs have changed little in decades. As a result, these NoSQL systems are built for high scalability. For example, Yahoo! has deployed a 1000+ node HBase cluster with 1 PB of data, and such large data stores are not uncommon at companies such as Google, Amazon, and Facebook.
- Performance: By relaxing some of the ACID guarantees, NoSQL systems can take advantage of parallel access to data and provide faster performance than their traditional SQL counterparts.
Disadvantages of NoSQL databases include the following:
- Application developers can no longer rely on ACID guarantees and have to design for the lack of consistency guarantees. They must account for the possibility that reads return stale data and that writes have not been fully committed to disk when the operation completes.
- NoSQL products risk vendor lock-in because of a lack of standards. Even if data formats are standardized through XML or JSON, each NoSQL product may have its own query and response formats. By comparison, moving between RDBMSs is easier because their data formats and query languages are largely standardized.
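One way applications design around the first disadvantage is to attach a version number to each value and make writes conditional on the version that was read (optimistic concurrency control); several NoSQL products offer some form of conditional write. The sketch below is single-process and does not model replication; `VersionedStore`, `put_if_version`, and `increment` are hypothetical names that only show the application-side read, check, and retry pattern.

```python
class VersionConflict(Exception):
    pass


class VersionedStore:
    """Hypothetical store where every value carries a version number.
    A write succeeds only if the caller's expected version is still current."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        return self._data.get(key, (0, None))

    def put_if_version(self, key, value, expected_version):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def increment(store, key, retries=3):
    """Read-modify-write that retries if the value it read turned out to be stale."""
    for _ in range(retries):
        version, value = store.get(key)
        try:
            return store.put_if_version(key, (value or 0) + 1, version)
        except VersionConflict:
            continue  # another writer got there first; re-read and retry
    raise VersionConflict(f"gave up on {key} after {retries} attempts")


store = VersionedStore()
increment(store, "likes")
increment(store, "likes")
print(store.get("likes"))  # (2, 2)
```

A write based on an outdated read, such as `store.put_if_version("likes", 100, 1)` at this point, raises `VersionConflict` instead of silently overwriting the newer value.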