Cheap and Deep
July 31, 2012
By John Bantleman, CEO, RainStor
I recently had the privilege of attending an internal event on Big Data at a major U.S. bank. What set it apart from many other Big Data events was its focus on the issues and requirements surrounding the theme of "Cheap and Deep." The challenge of storing and retrieving data from multiple sources over long periods of time cannot be solved if economic cost is the limiting factor in meeting analytical demand. From the time I have spent with customers in other industries, it has become clear that "Cheap and Deep" is what everyone is looking for.

One major U.S. retailer has a data warehouse of more than a petabyte but can store only five quarters (5Qs) of transactional detail. To understand what happens when Christmas falls on a Saturday, it needs seven years of data. Another customer, a major U.S. bank, also has a warehouse of more than a petabyte but is forced to keep three petabytes on tape, with limited (or zero) ability to bring that data back online to support regulatory and analytical (quant) requirements. A major global investment bank is seeing trading data volumes grow at close to 100 percent, which is outstripping its infrastructure and driving up costs. Increased regulatory scrutiny and demand for deeper data analytics are further compounded by the stipulation that this data remain available for the next 7-10 years. Homeland security requirements to capture Internet activity and broader communication records also require companies to manage petabytes of data every year. Communications providers, which are among the largest data managers on the planet, now forecast a 10-100x increase in data volume once 4G and LTE are fully implemented over the next two years.
What is clear is that the requirement to manage massive data at scale is a recurring theme. Simply deleting the data, rolling it to tape or summarizing it no longer meets business imperatives. The solution traditionally applied to these needs, the data warehouse, is just too expensive.
Interestingly, these requirements are less about Hadoop's ability to handle unstructured data (Variety) and much more about the ability to scale to an unprecedented Volume and Velocity of data. In many cases we see genuine interest in the unlimited scalability of Hadoop, but its lack of SQL access and other standard interfaces becomes an issue. In other areas, such as scale-out NAS and object stores, the ability to use SQL technology on low-cost commodity storage and virtualized (cloud) servers is a key requirement.
Big Data is clearly about Hadoop, but it's not only about the technology stack. The primary consideration should be the ability to store unlimited data over virtually unlimited time periods, which is what businesses now call "Cheap and Deep." The availability of low-cost, massively scalable infrastructure, along with the ability to access data in a way that satisfies all business needs and requirements, offers a completely different strategy for managing data, and it opens the way for organizations to store and manage data without traditional limits!
John Bantleman has more than 20 years of experience managing software companies. Before RainStor, John grew LBMS into a $45 million business ahead of its successful NASDAQ flotation in 1997. LBMS technology is now part of CA's product portfolio. The following year John was instrumental in the launch of Evolve and drove the company through to a successful IPO on NASDAQ. Returning to the U.K. in 2003, John spent 12 months on the advisory boards of venture capital organizations such as Apax Partners. He joined RainStor Inc. as Chairman in 2004, became CEO at the start of 2007, and relocated back to the U.S. in 2009 to head up worldwide operations.