In the Quest to Achieve Big Data Nirvana, Data Integration is Mistakenly Being Overlooked

November 15, 2012

Big Data is everywhere, and the buzz keeps getting louder. Take, for example, a simple Google search. If you were to Google “Big Data,” your search would net over 1.18 billion results, and that’s a single search run today. Imagine running that same search next week, next month, and next year. Just like the buzz around Big Data, it’s a safe bet that the number of results will increase exponentially.

For organizations currently struggling with their data processes, it’s easy to get wrapped up in the hype, especially since Gartner predicts that unstructured data will grow 80% over the next five years. While many CIOs and CEOs are jumping into the conversation, many are struggling to handle the influx of data well. The problem isn’t information overload, but a failure to harness, prioritize, and understand the data that is flowing in. A common practice that is exacerbating the issue is an isolated focus on analytics strategies to close the gap. Data integration is also at the center of this information revolution, yet it continues to be overlooked as a crucial component of the Big Data engine. Data integration not only presents significant performance and scalability challenges but also remains largely isolated from business users, with minimal communication and collaboration.

Last year, Syncsort asked business leaders what impact Big Data was having on their data integration strategy. It was clear that for many organizations their data integration tools have failed to keep up with the rapidly evolving data performance requirements. For example, 68 percent of the executives surveyed pointed to data integration tools as impeding the organization’s ability to achieve strategic business objectives.

Conventional IT approaches still do not generate the results that businesses expect in today’s Big Data era. In a survey conducted by Enterprise Strategy Group, data integration complexity was cited as the number one data analytics challenge.

Today, data integration is evolving from old ‘tried and true’ technology to tools that can optimize data performance to deliver relevant, actionable insights and intelligence to drive the business forward. It has become a strategic imperative for many businesses to leverage the insights they can glean from their key asset: the data they generate on a daily basis from a variety of sources. A tremendous amount of integration work needs to be scaled in a cost-effective manner to handle this recent data deluge efficiently. Data integration’s role in meeting the Big Data demand should not be approached as a one-time tactical requirement, but as a fundamental strategy to provide the insight that will drive optimal business outcomes through faster, better decisions.

As a first step to remedying integration challenges, organizations should take a closer look at how they approach information processing, since simply trying to process larger data volumes will only increase the amount of Big Data noise and hinder the ability to uncover valuable insights. This is where the adage “not all data is created equal” comes into play, and where having the right data integration tools can help an organization execute its strategy.

A key approach to successfully integrating Big Data and eliminating the complexity associated with data integration is to bring all the data transformations back into a high-performance, in-memory ETL layer. This gives organizations the ability not only to optimize existing data integration environments but also to enhance emerging Big Data frameworks such as Hadoop. To successfully implement this approach, organizations should ensure that it encompasses these four fundamental principles:

* Organizations should think about performance in strategic, rather than tactical, terms. This requires a proactive approach which relies on performance and scalability being at the core of any decision throughout the entire development cycle. To start, organizations must attack the root of the problem with tools that are specifically designed for performance.
* Next, organizations need to look at improving the efficiency of their data integration architecture, which will optimize hardware resource utilization while minimizing infrastructure costs and complexity.
* Productivity is achieved through self-optimization techniques, which means removing as much manual tuning of data transformations as possible. Constant tuning of databases can overextend valued time and resources, so simplicity is key.
* Cost savings can then be realized through a combination of performance, efficiency, and productivity. This is a result of deferring server costs, eliminating temporary data staging areas, and improving IT staff productivity to refocus resources on value-added initiatives. With a reduction of costs, organizations can now leverage Big Data for competitive advantage.
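
To make the idea of an in-memory ETL layer with no temporary staging areas more concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular product: the sample data, the column names, and the `extract`, `transform`, `load`, and `run_pipeline` functions are all hypothetical, and a real pipeline would read from source systems and write to a warehouse.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input; a real pipeline would read from source systems.
RAW_ORDERS = """order_id,region,amount
1,east,100.50
2,west,
3,east,20.00
4,west,75.25
"""


def extract(text):
    """Yield one dict per CSV row, one row at a time."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Drop rows with a missing amount and convert amounts to float."""
    for row in rows:
        if row["amount"]:
            yield {"region": row["region"], "amount": float(row["amount"])}


def load(rows):
    """Aggregate totals per region; stands in for a warehouse load."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_pipeline(text):
    # The stages are chained generators, so rows flow through memory
    # and no intermediate staging file or table is written.
    return load(transform(extract(text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_ORDERS))
```

Because each stage hands rows directly to the next, the cleanup and aggregation happen in a single pass, which is the small-scale version of eliminating temporary data staging areas.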

The ultimate benefit of this approach is that the business user gains quicker access to cleaner, more relevant data to drive business insights and optimize decision making. This is particularly important given the pressure Big Data places on organizations to make sense of the increasing volume, velocity and variety of data.

At the end of the day, when evaluating a strategy to capture, understand and explore data to see where its value lies, cost efficiency and scalability need to be factored into the decision. Data integration should not be approached with just a tactical ‘means to an end’ mentality, but rather be seen as a strategy to provide faster insights that will drive business results. By staying focused on the bottom line, organizations can cut through the Big Data hype to create significant business opportunity from their information assets.

Jorge A. Lopez, Senior Manager, Data Integration at Syncsort Incorporated, has more than a decade of experience in the Data Integration and Business Intelligence market. He is based in Reston, Virginia and can be reached via email at jlopez@syncsort.com.