NoSQL Databases in 500 Words or Less
September 25, 2012
By: Martin Fowler, Chief Scientist, ThoughtWorks and
Pramod Sadalage, Principal Consultant, ThoughtWorks
For a long time, relational databases have been the go-to data storage option. However, as organizations grapple with massive amounts of unstructured data generated at a rapid rate, a new class of data storage technology, NoSQL, has emerged as the “king” of Big Data. The need for rapid access to large volumes of unstructured data has driven the growing use of NoSQL databases, which process large volumes of data on clusters of machines more efficiently than relational databases.
Characteristics of NoSQL databases
The NoSQL moniker is generally applied to a number of recent non-relational databases such as Cassandra, MongoDB, Redis, Neo4j, and Riak. Common characteristics of NoSQL databases include:
- Not using the relational model
- Running well on clusters
- Built for 21st-century web estates
- Horizontally scalable
Further, there is a common approach to categorizing NoSQL databases according to their data models. These include:
- Key-Value Databases – Key-value stores are simple hash tables, primarily used when all access to the database is via a primary key. These are the simplest NoSQL data stores to use from an API perspective. Examples include Riak, Redis, and MemcachedDB.
- Document Databases – Document databases store and retrieve documents. The documents are self-describing, hierarchical tree data structures, which can consist of maps, collections, and scalar values. Examples include MongoDB, CouchDB, Terrastore, and RavenDB.
- Column-Family Stores – Column family stores, such as Cassandra, HBase and Amazon SimpleDB, allow you to store data with keys mapped to values and the values grouped into multiple column families, each column family being a map of data.
- Graph Databases – Graph databases such as Neo4j, Infinite Graph, or OrientDB allow users to store entities, also known as nodes, and relationships between these entities.
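To make the four data models above a little more concrete, here is a minimal sketch in Python that uses plain dictionaries and lists as stand-ins for each model. It is not the API of Riak, MongoDB, Cassandra, Neo4j, or any other product; all keys, field names, and the `neighbors` helper are hypothetical and chosen only for illustration.

```python
# Illustrative only: plain Python structures standing in for each data model.
# Nothing here is a real database client API; all names are made up.

# Key-value: an opaque value looked up by its primary key.
key_value_store = {"session:42": b"serialized-session-blob"}

# Document: a self-describing tree of maps, collections, and scalar values.
document = {
    "_id": "order-1001",
    "customer": {"name": "Ann", "city": "Chicago"},
    "items": [
        {"sku": "book-1", "qty": 2},
        {"sku": "pen-7", "qty": 1},
    ],
}

# Column-family: row key -> column family -> map of column names to values.
column_family_rows = {
    "customer-7": {
        "profile": {"name": "Ann", "city": "Chicago"},
        "orders": {"order-1001": "2012-09-25"},
    }
}

# Graph: entities (nodes) plus named relationships between them (edges).
nodes = {
    "ann": {"type": "person"},
    "bob": {"type": "person"},
    "nosql-distilled": {"type": "book"},
}
edges = [
    ("ann", "FRIEND", "bob"),
    ("bob", "LIKES", "nosql-distilled"),
]


def neighbors(node, relation):
    """Return the nodes reachable from `node` via edges labeled `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

In this sketch, the key-value entry can only be fetched by its key, the document and column-family row can be navigated by their internal structure, and the graph is queried by following relationships, for example `neighbors("ann", "FRIEND")`.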
The two main reasons for using NoSQL technology are to improve programmer productivity by using a database that better matches an application’s needs and to improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput.
But the most important outcome of the rise of NoSQL is the acceptance of database technologies beyond relational databases. NoSQL is only one set of data storage technologies, and other data storage technologies should be considered whether or not they bear the NoSQL label. Other options include file systems, event sourcing, memory image, version control, XML databases and object databases. This broader acceptance has led to a new era of “Polyglot Persistence.”
Polyglot persistence is about using different data storage technologies to handle varying data storage needs. It can apply across an enterprise or within a single application.
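As a rough illustration of polyglot persistence within a single application, the sketch below routes three kinds of data to three different stores. The stores are in-memory stand-ins rather than real database clients, and every class, method, and field name (`Shop`, `checkout`, `BOUGHT`, and so on) is a hypothetical choice for this example only.

```python
# Hypothetical sketch: one application, three storage styles.
# Each class is an in-memory stand-in, not a real database client.


class KeyValueSessions:
    """Stands in for a key-value store: access is by primary key only."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, value):
        self._data[session_id] = value

    def get(self, session_id):
        return self._data.get(session_id)


class DocumentOrders:
    """Stands in for a document store: whole orders saved as nested documents."""

    def __init__(self):
        self._docs = {}

    def save(self, order):
        self._docs[order["_id"]] = order

    def find_by_customer(self, customer):
        return [doc for doc in self._docs.values() if doc["customer"] == customer]


class GraphRecommendations:
    """Stands in for a graph store: entities linked by named relationships."""

    def __init__(self):
        self._edges = set()

    def relate(self, src, relation, dst):
        self._edges.add((src, relation, dst))

    def related(self, src, relation):
        return sorted(d for s, r, d in self._edges if s == src and r == relation)


class Shop:
    """The application picks a store per need instead of one store for everything."""

    def __init__(self):
        self.sessions = KeyValueSessions()
        self.orders = DocumentOrders()
        self.recommendations = GraphRecommendations()

    def checkout(self, session_id, order_id, customer, skus):
        # The order is a self-contained document.
        self.orders.save({"_id": order_id, "customer": customer, "items": list(skus)})
        # Purchases become relationships that can be traversed later.
        for sku in skus:
            self.recommendations.relate(customer, "BOUGHT", sku)
        # Session state is a simple lookup by key.
        self.sessions.put(session_id, {"last_order": order_id})
```

A call such as `Shop().checkout("s1", "order-1", "ann", ["book-1", "pen-7"])` would write to all three stand-ins. In a real system each class would wrap a different storage technology, and the trade-off is added operational complexity in exchange for a better fit per workload.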
Only by working with NoSQL and other data storage technologies, and discovering their strengths and weaknesses, can IT architects understand them. In the future, organizations will use many data technologies. Data professionals will need to be familiar with these different approaches and know how to match them to different problems.
To learn more about NoSQL and other data storage technologies, check out “NoSQL Distilled” at http://martinfowler.com/nosql.html
Pramod J. Sadalage, Principal Consultant at ThoughtWorks, enjoys the rare role of bridging the divide between database professionals and application developers. He regularly consults with clients who have particularly challenging data needs requiring new technologies and techniques. He developed pioneering techniques that allowed relational databases to be designed in an evolutionary manner based on version-controlled schema migrations. With Scott Ambler, he coauthored Refactoring Databases (Addison-Wesley, 2006).
Martin Fowler, Chief Scientist at ThoughtWorks, focuses on better ways to design software systems and improve developer productivity. His books include Patterns of Enterprise Application Architecture; UML Distilled, Third Edition; Domain-Specific Languages (with Rebecca Parsons); and Refactoring: Improving the Design of Existing Code (with Kent Beck, John Brant, and William Opdyke). All are published by Addison-Wesley.