Looking to the Future of Computing in a Big Data Environment

March 3, 2014

Featured Article by Jill King, Adaptive Computing

Ask a hundred pundits, and you’ll get a hundred definitions of big data. Some suggest a specific size (“anything over 50 TB is big data”); others like to talk about the 3 Vs (volume, velocity, variety) or the 4 Vs (3 Vs + veracity). But I think the simplest definition is best:

Big data is any data too overwhelming to mine for insight with naive methods.

Notice that I said “naive methods”—not “easy methods” or “familiar methods” or “old methods.” If you can think of a straightforward and practical way to get what you want out of the data, off the top of your head, it’s not big data. Even if your solution is expensive, big, or time-consuming. On the other hand, if using the data requires thoughtful weighing of tradeoffs and expenses, discussions with stakeholders, the creation of custom tools, trial and error, or the resetting of expectations, then you’ve met a big data test.

Big data takes some head-scratching.

The other half of the definition is also significant: “mine for insight.” If all you want to do is dump data onto massive tape libraries and archive it for a decade, it’s not really in the big data sweet spot. You may be wrestling data, and it may be big, but you’re not really pursuing the problem that’s got the whole tech industry buzzing.

Big data’s raison d’être is insight.

Which leads me to cloud. I claim that tackling big data without a cloud-centric worldview is sort of like building a skyscraper without doing a soil study first: you might make some initial progress, but sooner or later you’ll discover that you need to understand and thoroughly adapt an (inadequate) foundation. At a minimum, you’ll experience false starts and thrashing; in many cases, you may never place a capstone.

My reason for this claim goes back to the two one-line assertions above: big data takes head-scratching, and it exists for insight. Cloud is all about dynamic environments, agility, adjusting, experimenting… If you’re going to do some head-scratching, you want to do it without massive CapEx, so you can learn while the price is affordable. That’s cloud.

Cloud is also about flexible applications: scaling out, plumbing connections when they’re needed, renting access to world-class tools you could not otherwise afford… And that’s what you need for insight. Most of us don’t have the deep pockets to build or buy the computational horsepower of Google BigQuery, or of Amazon’s DynamoDB, CloudSearch, or Elastic MapReduce. But with cloud, we can rent it. This makes entire categories of insight accessible to mere mortals.

The CIA didn’t hire Amazon to create an internal cloud just so they could run an intranet and internal wiki. They are building insight factories out of their intelligence, and they need a cloud to make it work.

Organizations in the following industries all need their compute jobs to finish in minutes or hours, not the days, weeks, or even months that the current datacenter logjam imposes.

Oil and gas companies running subterranean simulations to aid their oil exploration efforts
Hospitals conducting research on childhood ailments such as cancer or autism
Satellite imaging companies assisting first responders to keep people out of harm’s way
Weather organizations chartered with warning the public of dangerous environmental conditions
Aircraft manufacturers putting their latest designs through flight simulations

However, their data sets demand intensive processing and analytics before anyone reaches that “eureka” moment, and getting there has historically been a very manual, time-consuming process.

As big data collides with cloud, datacenter, and supercomputing resources, companies must tackle Big Data with a whole-datacenter approach rather than siloed, specialized environments that overutilize one area of the datacenter while others sit underutilized. Using all available resources within the datacenter, including virtual machines, bare metal, hybrid cloud, big data, and HPC environments, allows IT to streamline the simulation and analysis process, creating a solid foundation to leverage Big Data for game-changing results.

To perform simulations and analyze the accompanying Big Data, it’s key to manage and optimize all available datacenter resources and operate them as one, turning the logjam into an orderly workflow that greatly increases throughput and productivity. This enables companies to solve Big Data challenges faster, more accurately, and more cost-effectively, allowing them to bring new products to market rapidly and remain competitive.

The industry is entering a new age where Big Data will transform cloud, datacenters and supercomputing into a hub for creative uses. Not all compelling tech problems live at this nexus, but an amazing number do—and the collision will only continue to intensify.