Steroids are not the Answer to Big Data
June 25, 2012
Yves de Montcheuil, Vice President of Marketing, Talend
Until recently, very few companies could reap the rewards that came with the harvesting and analysis of their big data. But as previously discussed, for those who could, big data is already old news and the rewards have been fantastic. So just how did these organizations harvest big data? With steroids-driven IT stacks that provide lots of computing power, massively parallel architectures, columnar databases, etc. They also used steroids-driven “data scientists” – individuals who could not only master this IT stack but also sift through the data, discover patterns and relationships, and more importantly, write all the code required by these operations.
What are data scientists, by the way? Definitions differ. I know some people who refer to them as “data analysts who live in California,” “Ph.D.s in MapReduceology” or even “parallel programming gurus.” Or as some market analysts put it, “data analysts on steroids.” There’s that word again – steroids!
For companies that can afford IT stacks and data scientists on steroids, these approaches are clearly performance enhancers, and lawful ones at that. Indeed, business isn’t sports, and there is no premise that everyone should start with the same chances. An organization that can afford steroids does gain an unfair competitive advantage. Who can blame them?
However, with the advent of big data technology such as Hadoop, the technology cliff shrinks. Highly scalable data management platforms become accessible to any organization. Open source plays its democratization role by letting any organization adopt and deploy technology, regardless of budget or level of expertise. It levels the field. Injecting steroids into IT areas becomes an investment affordable to anyone.
As a result, the competitive differentiation shifts to the talent involved. If the technology playing field is level, organizations must have the best data scientists to (re-)gain their competitive edge. And that means more steroids for data scientists.
There is a downside, of course. For starters, steroids are unhealthy! They are unhealthy for data scientists, unhealthy for businesses, and unhealthy for the general acceptance of big data. In the short run, they may pay off performance-wise. But in the long run, they have unpredictable side effects.
Instead, big data technology needs to move even further into the democratization cycle, as it is now both affordable and scalable. It needs to also become easy to use. Today, the bar is too high before an organization can reap the rewards of big data. One has to install and configure Hadoop clusters, understand the (often subtle) differences between HDFS, Hive and HBase, use Sqoop and Flume to load data, populate HCatalog to get metadata, learn HiveQL and Pig Latin to process data, and keep up to date with new ecosystem projects and ongoing patches for each of the above.
Is your head spinning? Maybe one of those purple pills would help…
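To make the coding bar concrete: even the canonical “hello world” of Hadoop, a word count, requires hand-writing a mapper and a reducer. The sketch below imitates the Hadoop Streaming model in plain Python (mapper emits key–value pairs, reducer receives them grouped by key); the function names and the in-process driver are illustrative assumptions – a real job would ship these as separate scripts to a cluster.

```python
# Illustrative sketch of the kind of MapReduce code a "data scientist"
# must write by hand: a word count in the Hadoop Streaming style.
# (Not a real Hadoop job -- the shuffle/sort is simulated in-process.)
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.strip().lower().split():
            yield (word, 1)


def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Hadoop delivers pairs sorted and grouped by key; sorting here
    simulates that shuffle step.
    """
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


if __name__ == "__main__":
    docs = ["big data big rewards", "big steroids"]
    counts = dict(reducer(mapper(docs)))
    print(counts)  # {'big': 3, 'data': 1, 'rewards': 1, 'steroids': 1}
```

Roughly fifteen lines for the simplest possible analysis – and that is before cluster configuration, data loading with Sqoop or Flume, or metadata in HCatalog. That gap between “simple question” and “working job” is exactly the complexity the article argues must be removed.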
The biggest challenge, though, is the following: How can we make Hadoop simpler and easier, without decreasing its power? By removing this complexity, the requirement for the elusive data-scientist-on-steroids goes away. Once this is achieved, any data analyst can become a data scientist – without ingesting these purple pills.
–Yves de Montcheuil
Yves de Montcheuil is the Vice President of Marketing at Talend, the recognized leader in open source integration. Yves holds a master’s degree in electrical engineering and computer science and has 20 years of experience in software product management, product marketing and corporate marketing. He is also a presenter, author, blogger, social media enthusiast, and can be followed on Twitter: @ydemontcheuil.