NoSQL Databases in 500 Words or Less

September 25, 2012

By: Martin Fowler, Chief Scientist, ThoughtWorks, and Pramod Sadalage, Principal Consultant, ThoughtWorks

Co-authors of “NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence”

For a long time, relational databases have been the go-to data storage option. However, as organizations grapple with huge volumes of unstructured data generated at a rapid rate, a new data storage technology has emerged as the “king” of Big Data: NoSQL. The need for rapid access to large amounts of unstructured data has driven the growing use of NoSQL databases, which are designed to process large volumes of data on clusters of machines more efficiently than relational databases.

Characteristics of NoSQL databases

The NoSQL moniker is generally applied to a number of recent non-relational databases such as Cassandra, MongoDB, Redis, Neo4j and Riak. Common characteristics of NoSQL databases include:

Not using the relational model
Running well on clusters
Open-source
Built for 21st-century web estates
Schemaless
Horizontally scalable

Further, there is a common approach to categorizing NoSQL databases according to their data models. These include:

Key-Value Databases – Key-value stores are simple hash tables, primarily used when all access to the database is via a primary key. They are the simplest NoSQL data stores to use from an API perspective. Examples include Riak, Redis and MemcachedDB.
Document Databases – Document databases store and retrieve documents. These documents are self-describing, hierarchical tree data structures, which can consist of maps, collections, and scalar values. Examples include MongoDB, CouchDB, Terrastore and RavenDB.
Column-Family Stores – Column family stores, such as Cassandra, HBase and Amazon SimpleDB, allow you to store data with keys mapped to values and the values grouped into multiple column families, each column family being a map of data.
Graph Databases – Graph databases such as Neo4j, InfiniteGraph and OrientDB allow users to store entities, also known as nodes, and relationships between these entities.
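
To make the four data models a little more concrete, here is a minimal sketch that uses plain Python structures as stand-ins for each one. It is not any real database's client API: the store names, keys, sample records, and the `neighbors` helper are all hypothetical and exist only to show the shape of the data in each model.

```python
# Illustrative only: plain Python structures standing in for each data model.
# None of this is a real database client API; all names are hypothetical.

# Key-value: an opaque value that is looked up only by its primary key.
key_value_store = {"session:42": b'{"user": "ann", "cart": [101, 102]}'}

# Document: a self-describing tree of maps, collections, and scalar values.
document_store = {
    "order:1001": {
        "customer": {"name": "Ann", "city": "Chicago"},
        "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
    }
}

# Column-family: row key -> column family -> map of columns.
column_family_store = {
    "customer:ann": {
        "profile": {"name": "Ann", "city": "Chicago"},
        "orders": {"1001": "shipped", "1002": "pending"},
    }
}

# Graph: nodes (entities) plus explicit relationships between them.
nodes = {"ann": {"type": "person"}, "bob": {"type": "person"}, "nosql": {"type": "topic"}}
edges = [("ann", "FRIEND_OF", "bob"), ("bob", "LIKES", "nosql")]


def neighbors(node_id, relation):
    """Return the nodes reached from node_id by following one relation type."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]
```

The difference shows up in how you reach the data: the key-value store can only hand back the whole value for a key, the document and column-family stores expose structure inside each record, and the graph is queried by following relationships, for example `neighbors("ann", "FRIEND_OF")`.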

Why NoSQL?

The two main reasons for using NoSQL technology are to improve programmer productivity by using a database that better matches an application’s needs and to improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput.

But the MOST important outcome of the rise of NoSQL is the acceptance of database technologies beyond relational databases. NoSQL is only one set of data storage technologies, and other data storage technologies should be considered whether or not they bear the NoSQL label. Other options include file systems, event sourcing, memory image, version control, XML databases and object databases. This wider range of choices has led to a new era of “Polyglot Persistence.”

Polyglot persistence is about using different data storage technologies to handle varying data storage needs. It can apply across an enterprise or within a single application.
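
As a rough sketch of what polyglot persistence within a single application might look like, the example below routes three different storage needs to three different in-memory stand-ins. Everything here is an assumption made for illustration: the `Shop` application, the class and method names, and the sample orders are hypothetical, and a real system would put actual storage technologies behind each class.

```python
# Illustrative only: in-memory stand-ins for three storage technologies.
# The class and method names are hypothetical, not a real library's API.

class KeyValueSessions:
    """Stand-in for a key-value store: session data looked up by id."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, value):
        self._data[session_id] = value

    def get(self, session_id):
        return self._data.get(session_id)


class DocumentOrders:
    """Stand-in for a document store: each order saved as one document."""

    def __init__(self):
        self._docs = {}

    def save(self, order):
        self._docs[order["id"]] = order

    def find(self, order_id):
        return self._docs.get(order_id)


class GraphRecommendations:
    """Stand-in for a graph store: customer-BOUGHT-product relationships."""

    def __init__(self):
        self._bought = set()  # (customer, product) pairs

    def record_purchase(self, customer, product):
        self._bought.add((customer, product))

    def also_bought(self, product):
        # Find customers who bought the product, then their other products.
        customers = {c for c, p in self._bought if p == product}
        return sorted({p for c, p in self._bought if c in customers and p != product})


class Shop:
    """One application, three stores, each matched to a different need."""

    def __init__(self):
        self.sessions = KeyValueSessions()
        self.orders = DocumentOrders()
        self.recommendations = GraphRecommendations()

    def checkout(self, session_id, customer, order):
        self.orders.save(order)
        for item in order["items"]:
            self.recommendations.record_purchase(customer, item)
        self.sessions.put(session_id, {"customer": customer, "last_order": order["id"]})


shop = Shop()
shop.checkout("s1", "ann", {"id": 1, "items": ["book", "lamp"]})
shop.checkout("s2", "bob", {"id": 2, "items": ["book", "pen"]})
```

The application code talks to each store through a small interface suited to its access pattern, so one store could later be swapped for a different technology without touching the others.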

Only by working with NoSQL and other data storage technologies, and discovering their strengths and weaknesses, can IT architects understand these new options. In the future, organizations will use many data technologies. Data professionals will need to be familiar with these different approaches and know how to match them to different problems.

To learn more about NoSQL and other data storage technologies, check out “NoSQL Distilled” at http://martinfowler.com/nosql.html

BIO:

Pramod J. Sadalage, Principal Consultant at ThoughtWorks, enjoys the rare role of bridging the divide between database professionals and application developers. He regularly consults with clients who have particularly challenging data needs requiring new technologies and techniques. He developed pioneering techniques that allowed relational databases to be designed in an evolutionary manner based on version-controlled schema migrations. With Scott Ambler, he coauthored Refactoring Databases (Addison-Wesley, 2006).

Martin Fowler, Chief Scientist at ThoughtWorks, focuses on better ways to design software systems and improve developer productivity. His books include Patterns of Enterprise Application Architecture; UML Distilled, Third Edition; Domain-Specific Languages (with Rebecca Parsons); and Refactoring: Improving the Design of Existing Code (with Kent Beck, John Brant, and William Opdyke). All are published by Addison-Wesley.

One Comment to “NoSQL Databases in 500 Words or Less”

Xenofon says:

September 25, 2012 at 2:43 pm

I am currently working my way through your book. As far as I can comment on the matter at hand, I see that it will be all about a balanced approach and about choosing the right technology for the job.
Judging by the degree of SQL knowledge I have found in Java developers over the last 10-plus years (…), I rather doubt the decisions will be made with the right criteria. Not wanting to learn SQL is the wrong motive and will lead to very bad decisions and LOTS of failures.