Of Dark Data, Beware You Must

April 4, 2013

Big data there is. To master it you must learn, but of dark data, beware you must.

A Data Padawan, on his quest to become a Data Jedi, many dangers he will encounter.  As big data slips from the peak of inflated expectations and into the trough of disillusionment at intergalactic speed, temptations to stray beyond the limits of the Trade Federation abound.  Dark data that beyond these limits resides, if properly mastered, incredible opportunities for Data Jedis will create, for the Force to unleash and for their organization’s bottom line to levitate.

Dark data is usually defined as data that is kept “just in case” but hasn’t (so far) found a proper use, or data that can be harvested and leveraged beyond its primary (intended) use.

Examples abound but could include:

- Measurements collected by the hundreds of sensors built into a car (or the Millennium Falcon). These measurements are handy for the mechanic (or for Chewbacca) when the car/spacecraft is in the shop. But the manufacturer can also use them to diagnose patterns of failures, optimize performance, or even perform preventive maintenance.

- Access logs from facility doors (or from the shield of the Death Star). Beyond their primary use (to prevent unauthorized access by Rebel vessels), such logs make it possible to analyze visitor flow, optimize elevator traffic, better regulate HVAC, protect from total destruction, etc.

- Unstructured data, such as audio, video, 3D holograms, Death Star blueprints, etc. – stored on servers, in the Cloud or in R2 droids, that can be mined for information beyond the intended message they mean to convey.
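To make the access log example concrete, here is a minimal sketch of how door logs could be turned into a visitor flow count. The log format (timestamp, door ID, badge ID, outcome) and the sample lines are assumptions made up for illustration; a real access control system will have its own format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log format (an assumption, not a real product's format):
# "<ISO timestamp> <door id> <badge id> <GRANTED|DENIED>"
SAMPLE_LOG = """\
2013-04-04T08:02:11 D1 B100 GRANTED
2013-04-04T08:47:30 D1 B101 GRANTED
2013-04-04T08:59:02 D2 B666 DENIED
2013-04-04T09:15:45 D1 B102 GRANTED
"""

def hourly_flow(log_text):
    """Count granted entries per (door, hour of day)."""
    flow = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        timestamp, door, _badge, outcome = parts
        if outcome != "GRANTED":
            continue  # denied attempts are not visitor traffic
        hour = datetime.fromisoformat(timestamp).hour
        flow[(door, hour)] += 1
    return flow

flow = hourly_flow(SAMPLE_LOG)
for (door, hour), count in sorted(flow.items()):
    print(f"door {door}, {hour:02d}:00 -> {count} entries")
```

The same counts, collected over weeks, are the kind of input an elevator or HVAC schedule could be tuned against.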

The first challenge faced by the Data Padawan is to identify which data is available, and where. By definition, dark data is data that was not meant to be used in that particular way. It’s usually not stored in databases or systems managed by IT, and rarely inventoried in the enterprise’s metadata catalog (when such a catalog exists). Rather, logs are often kept as files stored on disk/in memory inside the system itself, or in an embedded database.  Another obstacle is dark data collection. Connectivity to the systems can be difficult, because of protocols, security/permissions, firewalls, or even simply lack of APIs.

The next step in the Data Padawan’s apprenticeship is to process this dark data, and to produce value – the kind of value that develops the Force of the organization. Thankfully, many tools and technologies are available. Hadoop and NoSQL databases, data integration and data quality tools generating native MapReduce code, optimized SQL query systems for Hadoop such as Hive/Stinger or Impala, all make the life of the Data Padawan easier. Because frankly, while a light saber may come in handy for slicing and dicing data, it is a bit crude for detailed analysis…
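For a Padawan who has never seen MapReduce, the idea behind it can be sketched in a few lines of plain Python. This is an in-process illustration of the map, shuffle and reduce phases only; it does not use Hadoop or any of the tools named above, and the sensor names and readings are invented for the example.

```python
from collections import defaultdict

# Hypothetical sensor readings: (sensor_id, status). Invented sample data.
READINGS = [
    ("hyperdrive", "FAIL"),
    ("hyperdrive", "OK"),
    ("shield", "OK"),
    ("hyperdrive", "FAIL"),
    ("shield", "FAIL"),
]

def map_phase(record):
    """Emit (sensor_id, 1) for a failure and (sensor_id, 0) otherwise."""
    sensor_id, status = record
    yield sensor_id, 1 if status == "FAIL" else 0

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the failure flags for one sensor."""
    return key, sum(values)

def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

failures = run_job(READINGS)
print(failures)
```

On a real cluster the map and reduce functions run in parallel across many machines, which is what makes the approach practical for volumes that a single server cannot hold.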

There remains one major obstacle on this quest: the dark data island. A dark data system is not, cannot be, an isolated system. Dark data must be used in conjunction with the rest of the information system. Dark data applications must be connected and must exchange data with other databases, applications, analytical platforms, etc. Only then will dark data embrace the Force, and forgo its Dark Side. To become simply data.
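As a small illustration of leaving the island, the sketch below joins door access events (the dark data) with an employee table (a managed system of record) to count entries per department. It uses an in-memory SQLite database from the Python standard library; the table names, columns and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Managed data: a hypothetical HR reference table.
CREATE TABLE employees (badge_id TEXT PRIMARY KEY, department TEXT);
-- Dark data: hypothetical events loaded from door access logs.
CREATE TABLE access_events (badge_id TEXT, door TEXT);

INSERT INTO employees VALUES
    ('B100', 'Engineering'), ('B101', 'Engineering'), ('B102', 'Marketing');
INSERT INTO access_events VALUES
    ('B100', 'D1'), ('B101', 'D1'), ('B102', 'D1'), ('B100', 'D2');
""")

# Joining the two sources answers a question neither can answer alone.
rows = conn.execute("""
    SELECT e.department, COUNT(*) AS entries
    FROM access_events AS a
    JOIN employees AS e ON e.badge_id = a.badge_id
    GROUP BY e.department
    ORDER BY e.department
""").fetchall()

for department, entries in rows:
    print(department, entries)
```

The join key (here a badge ID) is the part that usually takes the most work in practice, since dark data rarely shares clean identifiers with the systems around it.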

And only then, a Data Jedi the Padawan will become.

May the Force of data be with you.


Master Yves de Montcheuil is a Data Jedi and the Vice President of Marketing at Talend, the recognized leader in open source integration. Yves holds a master’s degree in electrical engineering and computer science and has 20 years of experience in software product management, product marketing and corporate marketing. He is also a presenter, author, blogger, social media enthusiast, Star Wars fan, and can be followed on Twitter: @ydemontcheuil.
