Explaining Hadoop to the Non Data Geeks at the Office
December 19, 2013
Remember when you first heard the term Hadoop? More than likely a distributed file system wasn’t the first thing that came to mind, but since then, the IT space has become intimately familiar with Hadoop, big data and the additional tools built on top (from Pig to Hive as a Service). For those of us closely following the developments of big data technology, we start to throw these terms around with the full meaning of the technology hidden in IT lingo.
Imagine the reaction you would get if you made a statement such as the following to a non-IT co-worker or a business-minded CEO: “I think we should sign up with a cloud provider that offers big data as a service, including the entire Hadoop ecosystem, as it would be a great companion to our MySQL database.” Yes, those are blank stares looking back at you.
Think about how much backstory is in just that one statement: what Hadoop is, why Hadoop matters, what the Hadoop ecosystem is, why it matters, what cloud computing is… you get my point. For the average person, the IT world is a completely different landscape, which makes it difficult to explain to business executives why an investment in Hadoop would be a good idea or perhaps explain to the marketing department how Hadoop is going to help them reach out to the customer better. So how do you explain Hadoop to those non data geeks out there exactly?
Explaining what Hadoop is
When explaining what Hadoop actually is, it’s best to focus on the two main components of Hadoop: the distributed file system and the system that processes the data, MapReduce.
What is the distributed file system? The technical details of the Hadoop Distributed File System (HDFS) aren’t all that important to explain; what matters is the problem HDFS solves. A good comparison is saving a file on a computer. If the file is bigger than the computer’s hard drive can hold, it has to be made smaller or not stored at all. HDFS, on the other hand, acts as if many computers were connected together: a file that is too big for one machine is split into pieces that are spread across the others.
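For the more technical people in the room, the idea of a file spreading across several machines can be sketched in a few lines of Python. This is only a toy model, not how HDFS is actually implemented: the function names, the node names and the tiny block size are all made up for illustration, and real HDFS uses very large blocks and also keeps several copies of each block so that a failed machine does not lose data.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file's contents into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: list, nodes: list) -> dict:
    """Hand the blocks out to the machines in turn (round-robin, an assumption)."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        node = nodes[index % len(nodes)]
        # Remember each block's position so the file can be rebuilt later.
        placement[node].append((index, block))
    return placement


def read_file(placement: dict) -> bytes:
    """Collect the blocks from every machine and put them back in order."""
    indexed = [pair for stored in placement.values() for pair in stored]
    indexed.sort(key=lambda pair: pair[0])
    return b"".join(block for _, block in indexed)


if __name__ == "__main__":
    contents = b"a file that is too big for one computer"
    blocks = split_into_blocks(contents, block_size=8)
    placement = place_blocks(blocks, nodes=["node-a", "node-b", "node-c"])
    for node, stored in placement.items():
        print(node, [index for index, _ in stored])
    print(read_file(placement) == contents)
```

No single machine holds the whole file, yet the file can still be read back in full, which is the point to get across to a non-technical audience.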
Alright, now what is MapReduce? MapReduce is the framework Hadoop uses to process the data it stores. In a traditional database, data must be moved from where it is stored to the software that processes it. The problem is that moving large data files takes a long time, much like uploading a large email attachment. Hadoop removes this problem by bringing the processing software to the data.
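If someone asks what the processing itself looks like, the classic illustration is counting words. The sketch below imitates the map, shuffle and reduce steps in plain Python on a single machine; it does not use Hadoop's actual API, and the function names and sample text are assumptions chosen for the example. In a real cluster, the map step would run on each machine against the pieces of the file stored there, and only the small intermediate results would travel over the network.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list:
    """Map: turn one piece of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list) -> dict:
    """Shuffle: group all the values that share the same key (word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict) -> dict:
    """Reduce: combine each word's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: list) -> dict:
    """Run map on every chunk, then shuffle and reduce the combined pairs."""
    mapped = []
    for chunk in chunks:  # each chunk stands in for data held on one machine
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["big data big", "data lake"]
    print(word_count(chunks))
```

Each chunk is processed where it sits, which is the "bring the software to the data" idea in miniature.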
This Forrester video explains HDFS and MapReduce in a similar way if a visual would be helpful.
Explaining why Hadoop Matters
When it comes to executives, in particular, an explanation of what Hadoop is often won’t be enough. You’ll need to explain the actual business benefit of implementing a Hadoop solution. The exact answer to this question will vary from company to company; however, there are some basic ideas to follow.
- Hadoop’s ability to store and process large data sets can resolve an enterprise’s data storage problem in a much more affordable manner as the volume of data continues to grow at exponential rates.
- Hadoop eliminates the need to silo data, allowing companies to find new insights that were previously hidden.
- Hadoop’s ability to process unstructured data allows companies to use information from social media and other web activity to better target consumers and improve products.
- Hadoop offers benefits to particular industries, such as detecting fraud in the financial industry.
Prepare yourself for this particular conversation by thinking about the current problems your company faces as well as missed opportunities and then deciding if Hadoop can help with those problems.
By eliminating jargon and focusing on business benefits, your push for a Hadoop solution will, hopefully, lead to productive dialogue rather than blank stares and confusion.
Gil Allouche is the Vice President of Marketing at Qubole. Gil began his marketing career as a product strategist at SAP while earning his MBA at Babson College and is a former software engineer.