Drilling Into the Data Iceberg: How Unified, Intelligent Storage Can Fix the ‘Dark Data’ Problem
September 17, 2015
Featured article by Mohit Aron, CEO, Cohesity
A recent sci-fi movie could teach us something about today’s storage conundrum. Last year, Scarlett Johansson starred in Lucy—in which her character is implanted with drugs that leak into her system, allowing her to use “100 percent of her brain” instead of ten percent. The movie explores the infinite possibilities that could take flight if humans engaged all parts of their brain.
While the science behind this idea is questionable, it sheds light on a key data issue that companies today face: how to intelligently manage and use every byte of corporate data—even the vast stores of untapped data hidden beneath the surface.
New research indicates that most organizations are struggling to solve this challenge, no doubt exacerbated by the fact that the amount of data that needs to be stored is growing at breakneck speed. New data generated in 2020 will be 44 times greater than what was generated in 2009. What if enterprises were able to explore this “dark data” to realize its full potential?
Shedding Light on ‘Dark Data’
“Dark data” is information that is stored by an organization, but is not used effectively, if at all. Examples include server log files on website visitor behavior, customer call detail records, mobile geo-location information picked up by wireless routers, and more. How are companies managing, tracking, and mining this data?
According to a recent survey conducted by Cohesity, the majority of companies reported using three or more solutions to handle storage, yet they lack key metrics on how most data is accessed and utilized. These findings highlight a growing problem for many companies in which “dark data” becomes an unmanageable burden. To regain control, IT professionals need much greater insight into all data stored across different departments.
The Data Iceberg Hiding Below the Surface
To understand how companies manage data today, an iceberg serves as a useful analogy. Typically just 20 percent of stored data – the tip of the iceberg – is used for mission-critical operations. IT professionals have a pretty good window into what’s being stored here and how it’s being accessed. However, the other 80 percent of a company’s data is held in secondary storage, usually across multiple solutions designed to meet demands for data protection, fileshares, archiving, DevOps and analytics. Despite the size and importance of this data, IT professionals have little insight into what is stored here or how it is used.
The market has largely focused on primary storage solutions, leaving secondary storage to grow unchecked. Gartner predicts the market for data protection and archiving solutions will surpass $16 billion in 2015. Add to that the secondary storage required for fileshares and test and development, which is a big chunk of the $36 billion general external storage market, along with the new high-growth market for big data analytics storage that IDC estimates will reach $17 billion this year, and the total market for secondary storage likely exceeds $50 billion. Yet because storage innovation has concentrated on flash and raw performance, the technology that companies use to handle secondary data has seen few updates. Unified storage has proven highly successful in primary applications, with clustered, distributed storage solutions for ERPs and CRMs, but none of these benefits have crossed over to secondary storage. For companies running a patchwork of legacy systems, it’s difficult to get a clear picture of the data they have collected.
Drilling into the Data Iceberg
By not accessing all the data at their disposal, enterprises pass up opportunities to innovate for their customers. For instance, airports are currently looking for ways to use the data accumulated from smart devices, such as plane-arrival alerts, baggage-claim services, and service requests, to create more efficient processes. However, to achieve this, they need direct access to the secondary storage solutions that hold this data. Customer information mined from secondary data can help enterprises run more smoothly and with fewer errors.
However, only 28 percent of respondents said they have a good understanding of which users are accessing company data, and more than 60 percent said they lack good information about how many copies of data exist. While a majority of respondents had access to a few fundamental metrics, including the types of files stored, capacity utilization trends, and the amount of data being protected, this level of insight was hardly widespread: none of these metrics was available to more than 70 percent of the IT professionals who responded to the survey.
In order to manage growing data demands, companies need a much clearer window into their existing storage. If most IT professionals cannot say confidently how many copies of data are stored across the enterprise, it becomes very difficult to plan for growth. Fortunately, intelligent storage – which automatically shows key metrics on stored data – can help companies make smarter decisions about their storage needs. By showing which data is being accessed, how it’s being used, and what is just taking up space, intelligent storage empowers companies to provision resources in a much more efficient manner.
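To make the idea of surfacing such metrics concrete, here is a minimal sketch of the kind of reporting an intelligent storage layer might automate: walking a file tree and tallying capacity by file type, plus bytes that have not been accessed recently (i.e., data that is “just taking up space”). This is an illustrative example, not Cohesity’s product or any specific vendor’s API.

```python
import os
import time
from collections import Counter

def storage_report(root, stale_days=180):
    """Summarize a directory tree: bytes stored per file extension, and
    bytes not accessed within `stale_days` -- a stand-in for the access
    metrics an intelligent storage platform would surface automatically."""
    by_ext = Counter()
    stale_bytes = 0
    cutoff = time.time() - stale_days * 86400
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            ext = os.path.splitext(name)[1] or "<none>"
            by_ext[ext] += st.st_size
            if st.st_atime < cutoff:  # last access older than the cutoff
                stale_bytes += st.st_size
    return by_ext, stale_bytes
```

Even a crude report like this answers the questions most survey respondents could not: what is stored, how much of it there is, and how much is going untouched.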
Shrinking Data Demands on a Unified Platform
Gaining insight into all data stored across the enterprise can reveal opportunities to use storage more efficiently, but it takes a unified approach to data storage to realize them. Siloed IT architectures for different use cases (i.e., data protection, test and development, and analytics) typically require multiple copies of the same data, stored redundantly across systems with no reconciliation, along with multiple solutions to serve those copies. A single platform for all secondary use cases makes these redundant copies unnecessary.
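The copy sprawl described above can be measured directly. The sketch below groups files by a content fingerprint (SHA-256) to find identical copies scattered across silos, the same basic mechanism that lets a unified platform deduplicate them. It is a simplified illustration of the concept, not the deduplication implementation of any particular product.

```python
import hashlib

def find_duplicate_copies(paths):
    """Group files by SHA-256 content fingerprint and return only the
    groups with more than one member -- i.e., redundant copies of the
    same data that a unified secondary-storage platform could collapse
    into a single stored instance."""
    groups = {}
    for path in paths:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Hash in 1 MiB chunks so large files don't exhaust memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        groups.setdefault(h.hexdigest(), []).append(path)
    return {digest: ps for digest, ps in groups.items() if len(ps) > 1}
```

Run across backup targets, fileshares, and test/dev clones, a scan like this typically turns up the “multiple versions of the same data” the survey respondents could not count.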
Furthermore, a unified platform eliminates the headaches associated with managing multiple solutions. IT professionals surveyed by Cohesity rated “managing the complexity of different products” as a top concern, second only to growing storage costs. Combining secondary storage on a single platform allows enterprises to understand and plan for future storage needs intelligently and reap the web-scale economics of Amazon-like grow-as-you-go scalability.
Digging deeper into “dark data” is vital to gaining a competitive advantage. By shining a light on “dark data,” enterprises can fill in the gaps in information gathered from primary data sources. For example, Macy’s uses real-time demand and inventory data to constantly adjust pricing for 73 million items. Accessing large amounts of customer and market information quickly helps the company maintain its position as an industry leader.
The storage market has shifted dramatically in the past decade. In the early 2000s, the emphasis was on making point storage solutions more powerful and less expensive. However, converged storage architecture has allowed companies to take an entirely new approach to managing data. There’s a vast iceberg of data that enterprises have overlooked: a unified, intelligent platform can bring it to the surface.
Cohesity was founded in June 2013 by CEO Mohit Aron, who is regarded as the pioneer of hyper-convergence, the first architecture to converge compute and storage to simplify virtualization. Aron founded the infrastructure company Nutanix to bring hyper-convergence to market and served as its CTO before leaving to build Cohesity. Aron worked as a Staff Engineer at Google from 2003 to 2007, where he helped design the company’s innovative Google File System. He also served as Architect at AsterData, a leading big data analytics company that was later acquired by Teradata. He holds a Ph.D. in Computer Science from Rice University.