Hadoop May Fade, Big Data Will Endure

August 24, 2015

Featured article by Moshe Kranc, Chief Data Officer, Ness SES

If you are an IT manager, the number of Big Data products contending for your attention these days is simply mind-blowing. Hive, Impala, HBase, MongoDB, Cassandra, Drill, Redis, Couchbase, Aerospike, Flink, Spark, Tez, YARN, Mesos, Pig, Storm, Heron: if you haven't heard all of these buzzwords yet, rest assured that you soon will. How do you choose the right platform to solve your Big Data problem?

You may try to reduce risk by going as "mainstream" as possible, reasoning that since most of these products are built on the Hadoop infrastructure, Hadoop is the natural anchor to start from. Unfortunately, this logic is flawed. To explain why, a bit of history is required.

More than a decade ago, Doug Cutting faced two problems in building a web search engine: how to store all that information reliably, and how to build a massive lookup index. Out of that work came Hadoop, which included a distributed, fault-tolerant file system as well as the Map-Reduce framework for massively parallel computation. Hadoop is by no means a new technology, so we can evaluate it with the benefit of hindsight and perspective.
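
To make the Map-Reduce idea concrete, here is a minimal single-machine sketch in Python of the programming model applied to the indexing problem described above: a map step emits (term, document) pairs, the framework groups them by term (the "shuffle"), and a reduce step turns each group into a posting list. The function names and sample documents are hypothetical, and this is not Hadoop's actual (Java) API; in a real cluster each phase runs in parallel across many machines, with intermediate results written to disk between phases.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for each distinct term in one document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term, doc_ids):
    """Reduce: merge all document ids for one term into a sorted posting list."""
    return term, sorted(doc_ids)

def build_index(docs):
    """Run map over every document, shuffle, then reduce each key group."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

docs = {1: "Big Data will endure", 2: "Hadoop may fade", 3: "big data tools"}
index = build_index(docs)
print(index["big"])     # [1, 3]
print(index["hadoop"])  # [2]
```

Because map and reduce only see one document or one key group at a time, the framework is free to spread them across machines, which is what made previously intractable jobs feasible.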

Map-Reduce was indeed revolutionary: previously intractable problems could now be solved in a matter of minutes. But it did not take advantage of memory to improve performance, and it was terrible at handling incremental changes, e.g., adding the index entries for a single new tweet to the existing full web index. In time, the original Map-Reduce framework was replaced by Tez, which models a parallel job as a directed acyclic graph of processing steps, an approach based on Microsoft's 2007 Dryad paper. But Tez has been upstaged by another Dryad-inspired product: Spark. Spark's implementation is more general purpose; for example, data at various stages of a computation can be efficiently checkpointed and restored. Spark can run in the Hadoop ecosystem (where it will soon replace Tez), or it can run in its own stand-alone environment. More and more projects are choosing Spark as their Big Data solution and then, as a secondary decision, choosing between Spark on Hadoop and Spark standalone. Over 25% of Spark projects today run outside of Hadoop, and the percentage is rising.
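
The incremental-update weakness can be illustrated with a simplified sketch, assuming the index is a plain Python dictionary mapping each term to a set of document ids, and that the batch job, like a classic Map-Reduce job, recomputes its output from the complete input. The names (`batch_build`, `incremental_add`) and the synthetic corpus are hypothetical; real indexers are far more elaborate. Both paths end with the same index, but the batch path reprocesses every document just to absorb one new tweet.

```python
from collections import defaultdict

def batch_build(docs):
    """Batch style: rebuild the whole index by scanning every document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index, len(docs)  # (index, number of documents processed)

def incremental_add(index, doc_id, text):
    """Incremental style: update the existing index in place with one document."""
    for term in text.lower().split():
        index[term].add(doc_id)
    return index, 1  # (index, number of documents processed)

# Hypothetical corpus of 10,000 pages, indexed once up front.
corpus = {i: f"page {i} about big data" for i in range(10_000)}
index, _ = batch_build(corpus)

# One new tweet arrives.
new_id, new_text = 10_000, "spark streaming tweet"
corpus[new_id] = new_text

rebuilt, batch_work = batch_build(corpus)                     # rescans all 10,001 documents
updated, inc_work = incremental_add(index, new_id, new_text)  # touches only the new one
print(batch_work, inc_work, rebuilt == updated)  # 10001 1 True
```

At web scale the gap between "reprocess everything" and "touch one document" is the difference between hours of cluster time and milliseconds.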

The Hadoop Distributed File System (HDFS) is also showing its age. For example, it requires an active NameNode in order to function, and in its high-availability configuration it relies on ZooKeeper to monitor the NameNode's availability. As a result, it can experience "brown-outs" of up to a minute while ZooKeeper detects that the active NameNode has crashed. Hadoop has evolved mechanisms to improve availability, but other Big Data storage systems, such as Cassandra, achieve high availability without the risk of brown-outs, and with finer-grained control over data consistency.
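
Cassandra's finer-grained consistency control comes from letting each request choose how many replicas must respond (levels such as ONE, QUORUM, or ALL). A common way to reason about this is the quorum rule: with N replicas, a read from R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The helper functions below are hypothetical and model only that arithmetic for a single data center; they are not Cassandra's driver API and ignore concurrent writes and repair.

```python
def quorum(n):
    """Replicas needed for a majority: floor(n / 2) + 1 (QUORUM in one data center)."""
    return n // 2 + 1

def analyze(n, r, w):
    """Quorum arithmetic for n replicas, r read replicas and w write replicas."""
    return {
        # Every read set overlaps every write set only if r + w > n.
        "strongly_consistent": r + w > n,
        # Requests still succeed while at least r (or w) replicas are up.
        "read_failures_tolerated": n - r,
        "write_failures_tolerated": n - w,
    }

n = 3  # hypothetical replication factor
levels = {
    "ONE/ONE": (1, 1),
    "QUORUM/QUORUM": (quorum(n), quorum(n)),
    "ONE read/ALL write": (1, n),
}
for name, (r, w) in levels.items():
    print(name, analyze(n, r, w))
```

With three replicas, QUORUM reads and writes (2 + 2 > 3) keep reads consistent while tolerating one failed replica, whereas ONE/ONE stays available with two replicas down but may return stale data. Because there is no single active node to fail over, no one crash stalls the whole system.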

The trend is clear. Hadoop as a concept revolutionized the world of data processing and ushered in the era of Big Data. But Hadoop as a product ecosystem is certainly showing its age, and for many use cases it has been upstaged by more modern technologies. So, unfortunately, Hadoop is not necessarily the "safe" choice for your Big Data use case. In the long run, Hadoop may lose out to newer products like Spark and Cassandra, which had the benefit of learning from Hadoop's growing pains.
