Toxic Data Puts A Crimp in Big Data Panacea

September 12, 2012

By David A. Kelly, Upside Research

In recent posts, I’ve talked about the potential “marriage made in heaven” of cloud and big data, a pairing most notably put forth by Hadoop and its brethren. I’ve also touched a bit on how savvy CIOs are using big data to identify and address security issues within their companies more effectively. Today, I want to spend a little time on the idea of “toxic data” and the impact it can have on the big data explosion.

Toxic data, as one recent definition puts it, is any data that has leaked out of an organization and could become harmful. Essentially, it is important information that the company has lost control of. A wide range of data can become toxic, from personal data such as Social Security numbers, credit card numbers, or health care information, to corporate data such as business plans, sales figures, or even product designs. Think of sensitive customer or corporate records and you have an idea of the type of data that could become “toxic” if it got into the wrong hands.

We’re all familiar with the data breaches that have captured headlines recently. From eHarmony and LinkedIn reporting significant breaches of their customers’ passwords to larger incidents like the Global Payments breach earlier this year, which affected 1.5 million credit card accounts (and temporarily got the company removed from Visa’s approved vendor list), data breaches are becoming more prevalent in today’s distributed transaction environment. The cost to U.S. businesses is substantial, with some estimates exceeding $6.5 billion for 2011 alone. Industry experts expect that number to keep rising as the amount of data continues to proliferate without the security controls needed to prevent theft.

Considering the potential damage to the brand of a company that doesn’t properly protect its data, it is important to limit the exposure of sensitive data before a leak happens, rather than trying to go back and protect data that is already “out there.” Securing the data at the outset is a great place to start. Naturally, a number of up-and-coming players are seeking to fill this niche market.

Since much of the work on big data in the cloud involves Hadoop, and to a lesser degree Cassandra and MongoDB, it makes sense to look at Hadoop’s security model to see where efforts are being made to tighten security and prevent toxic data leaks. Session encryption, access control, key management, audit capabilities, and policy management are all ways to secure the massive stores of big data that companies are capturing and attempting to manipulate. Vendors like Gazzang and Vormetric are stepping forward with tools that address many of these areas.

For companies that store and manage large amounts of sensitive customer information, it is absolutely imperative to have the correct security measures in place to prevent toxic data leaks. Doing so can protect your brand and help you avoid costly incidents. If you are tempted by Hadoop as a means of leveraging all of your data, be sure you have implemented the appropriate security controls from the outset to keep your company away from the “toxic data” demons.