SQL versus NoSQL – Why Not Have the Best of Both Worlds?
August 17, 2012
By: Will Johnson & Sid Probstein, Attivio
A recent Developer Zone article, Why Startups Should Not Choose NoSql, advises startups to “stick to a SQL solution” instead of choosing technologies such as Cassandra, MongoDB, HBase or Redis. The author gives two reasons: building good NoSQL data and query models is very difficult, and agility is key. When a startup learns a year or two later what it has to change, extending those models is daunting.
The article goes on to explain how the tools typically found in relational databases make such evolution relatively easy: if you need a new entity type, create a new table. If you need a new relationship, write a new JOIN.
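As a rough illustration of that point, here is a minimal sketch using Python's built-in sqlite3 module. The users and orders tables and their columns are hypothetical names chosen for this example; they do not come from the article being discussed.

```python
import sqlite3

# In-memory database standing in for a startup's first relational store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO users (id, name) VALUES (1, 'Ada'), (2, 'Grace');
""")

# A year later a new entity type is needed: create a new table.
# The existing users table and its data are left untouched.
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO orders (user_id, total_cents)
    VALUES (1, 1250), (1, 300), (2, 999);
""")

# A new relationship is needed: write a new JOIN.
rows = conn.execute("""
    SELECT u.name, COUNT(o.id), SUM(o.total_cents)
    FROM users u
    JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
    ORDER BY u.name
""").fetchall()

for name, order_count, total_cents in rows:
    print(name, order_count, total_cents)
```

Neither change required rewriting the existing data model, which is the kind of agility the article has in mind.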
While it might be true, as the article points out, that many startups don’t require the web scale benefits that NoSQL offers, what if you are one of the lucky ones with an instantly viral service? After all, are you going to sink your blood, sweat and tears into a business and assume that you will not be wildly successful? Not believing in and being prepared for that kind of success is a self-fulfilling disaster in the making.
In that context, NoSQL solutions certainly seem more reasonable. Simplified horizontal scaling for user and query capacity, along with distributed, sharded repository management, becomes important. But that still doesn’t address the ease-of-use and agility shortcomings of NoSQL that the article describes.
Some of the leading commercial suppliers have solved some of these problems, but they either have other limitations or are extremely complex and expensive to set up.
A hybrid system in which you model using a relational database, and then push the data out to a sharded system to serve queries, is yet another option. It’s not a difficult evolution, and you get the benefit of good tooling and modeling support early on. Then you can treat NoSQL as a “downstream” system that you just feed and query. The issue here is that you’ve done twice the work: you implement a relational database on day one and build a great application for your users, but then have to throw much of it away when you move to NoSQL. Anyone who has worked in a startup knows that throwing away something that ‘kinda works’ and starting from scratch is tough to do... until it’s too late.
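To make the hybrid approach concrete, here is a minimal sketch in which a relational database remains the system of record and denormalized documents are pushed to a set of shards that serve lookups. Everything in it is an assumption made for illustration: the schema, the shard count of three, the `shard_for` and `push_downstream` helpers, and the plain dictionaries standing in for a real NoSQL store.

```python
import sqlite3
import zlib

NUM_SHARDS = 3  # assumed shard count, for illustration only


def shard_for(key: str) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def push_downstream(conn, shards):
    """Denormalize each user with their orders and route the document to one shard."""
    rows = conn.execute("""
        SELECT u.id, u.name, o.total_cents
        FROM users u
        LEFT JOIN orders o ON o.user_id = u.id
        ORDER BY u.id, o.id
    """)
    for user_id, name, total_cents in rows:
        key = f"user:{user_id}"
        doc = shards[shard_for(key)].setdefault(key, {"name": name, "orders": []})
        if total_cents is not None:  # users without orders yield a NULL from the LEFT JOIN
            doc["orders"].append(total_cents)


def lookup(shards, key):
    # Queries go straight to the one shard that owns the key.
    return shards[shard_for(key)].get(key)


# The relational model is the source of truth.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (id, user_id, total_cents)
    VALUES (1, 1, 1250), (2, 1, 300), (3, 2, 999);
""")

# Dictionaries stand in for the downstream sharded store.
shards = [{} for _ in range(NUM_SHARDS)]
push_downstream(conn, shards)
print(lookup(shards, "user:1"))
```

Note that the JOIN happens once, at push time. Any new relationship means changing both the relational model and the downstream document shape, which is the duplicated effort described above.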
So is there a “Goldilocks” or “just right” solution for startups? Consider a unified information access (UIA) platform, and specifically one that supports a broad range of SQL to help developers get up and running quickly, and that provides the flexibility to prototype dashboards in a BI tool like QlikView, Tableau or Tibco Spotfire.
And what about those urgent line-of-business requests saying “We need to show the relationship between cats, dogs and mice in the next five minutes”? To support this kind of business agility, you need ad-hoc JOIN capability to connect records of any type. So far so good, but that sounds like a relational database, doesn’t it? How about including full-text search capability, something unavailable in most relational databases? This solution would indeed be “just right”.
So now what happens if, as we mentioned, you succeed and demand for your service soars? Well, now you’re ready for it. What you need is an indexing structure that supports massive, linear scalability: say, an index with 300 million documents and 1.7 TB of data, sharded across three servers and replicated to three additional machines to provide fault tolerance and additional query capacity. Provided you have selected a UIA platform that supports non-collocated JOINs, you can still query the entire data set, and you don’t have to manage the content on each shard, either. Then, as your startup evolves, pivots and ultimately connects with your audience, you’ll already have the ingestion, modeling, SQL, full-text search and scalability you need.
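For readers unfamiliar with the term, a non-collocated JOIN is one in which the records being joined may live on different shards. The sketch below shows the general idea with a simple scatter-gather hash join over three in-memory lists. It is an illustration under stated assumptions, not a description of how any particular UIA platform implements the feature; the record layout and the `scatter` and `join_across_shards` helpers are hypothetical. (If the example index above were distributed evenly, each of the three shards would hold roughly 100 million documents.)

```python
# Three in-memory lists stand in for three index shards. Owners and pets are
# placed without regard to each other, so matching records are often on
# different shards.
shards = [
    [{"type": "owner", "id": 1, "name": "Ann"},
     {"type": "pet", "id": 11, "owner_id": 2, "species": "dog"}],
    [{"type": "owner", "id": 2, "name": "Bob"},
     {"type": "pet", "id": 10, "owner_id": 1, "species": "cat"}],
    [{"type": "pet", "id": 12, "owner_id": 1, "species": "mouse"}],
]


def scatter(shards, predicate):
    """Run the same filter on every shard and merge the partial results."""
    return [rec for shard in shards for rec in shard if predicate(rec)]


def join_across_shards(shards):
    """Join pets to owners regardless of which shard holds each record."""
    # Build side: gather every owner from every shard into one hash table.
    owners = {r["id"]: r for r in scatter(shards, lambda r: r["type"] == "owner")}
    # Probe side: gather every pet and look up its owner in the hash table.
    pets = scatter(shards, lambda r: r["type"] == "pet")
    return sorted(
        (owners[p["owner_id"]]["name"], p["species"])
        for p in pets
        if p["owner_id"] in owners
    )


result = join_across_shards(shards)
print(result)
```

The caller never has to know which shard holds which record, which is what lets you avoid managing content placement by hand.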
Will Johnson, Chief Architect
After graduating from MIT with a degree in Computer Science, Will Johnson worked for Altavista and FAST for more than seven years. At Altavista he developed AV’s real-time indexing solution used by news aggregators who demanded instantaneous access to news as it arrived. Additionally, he was one of two engineers responsible for developing the Altavista QIndexer product that was used by the large majority of AV’s customers. At FAST, Will developed high-speed database connectors as well as search UIs and tool sets used across the organization. In addition, he worked on many of the largest and most complex customer engagements and deployments around the world, specializing in distributed systems for many of the largest internet publishers and directories, as well as internal knowledge management systems.
Sid Probstein is the Chief Technology Officer of Attivio, responsible for product & technology strategy, implementation and delivery. Sid has over 20 years of experience in managing R&D organizations and delivering award-winning, high-value enterprise software and solutions. Previously, he was CTO at GetConnected, Inc. (GCi), a market-leading transaction processing platform enabling the sale of digital services. He was also Vice President of Technology at Fast Search & Transfer, a global enterprise search company that is now part of Microsoft Corporation. Prior to Fast, Sid was Vice President of Engineering at Northern Light Technology, where he was responsible for production of the first enterprise version of the search engine. He also served as Director of Software Engineering at Freemark Communications, and a Principal Architect/System Manager at John Hancock Financial Services.