Will PL/SQL Skills Become Obsolete in the Big Data Era?

July 26, 2012

By: Jorge A. Lopez, Senior Manager – Data Integration, Syncsort Incorporated

Conventional data integration tools are struggling to keep pace with the increasing data performance and scalability requirements of Big Data. In fact, underperforming tools are forcing many IT developers to build complex and lengthy PL/SQL scripts to perform critical data processing tasks within databases. PL/SQL scripts are present in nearly every organization today, but couple constantly evolving business requirements with increasing data volume and variety, and the multitude of PL/SQL lines of code becomes nearly impossible to manage. As business requirements continue to change and data volumes grow exponentially, the need to tune, maintain and extend thousands of PL/SQL lines can hinder business agility and dramatically increase costs.

PL/SQL was never designed with data integration in mind. Instead, its original purpose was to add procedural constructs to SQL, including variables, conditions, loops and exceptions, for work that involves large volumes of data with relatively small result sets. Today, PL/SQL scripts are being pushed to the limit as data sources have grown, not just in volume but also in variety and velocity. As scripts grow to thousands of lines, IT staff must spend countless hours manually tuning for what amounts to no net gain. This maintenance burden can hurt organizations over time in both performance and resources, because complex PL/SQL scripts require specialized skills. Moreover, using PL/SQL forces organizations to maintain large temporary staging areas to perform processing, resulting in significantly higher storage and database costs.

In this challenging environment, organizations can take control of their data integration processes by bringing all data transformations back into a high-performance, self-tuning Extract, Transform, and Load (ETL) engine. A proper ETL technology provides fast, efficient, simple, cost-effective data integration, which translates into benefits across the entire organization. These include not only operational, financial and business gains, but also the ultimate benefit: business users get quicker access to cleaner, more relevant data to drive Big Data insights and optimize decision making.

The first step towards fast, efficient, simple, cost-effective data integration requires organizations to think about performance in strategic rather than tactical terms. Performance and scalability should be at the core of every decision throughout the entire development cycle. This in turn carries through to efficiency, which should be improved by optimizing hardware resource utilization to minimize infrastructure costs and complexity. In terms of productivity, it’s critical to look at technologies with self-optimization techniques. Allowing designers to focus on business rules without needing to constantly tune for performance can free up many hours and resources to reduce the ubiquitous IT backlog.

Perhaps most importantly, cost savings are realized through a combination of performance, efficiency, and productivity. A high-performance, high-efficiency ETL technology can eliminate costly staging areas while also delivering significant server, database, and storage savings. Similarly, a self-optimized engine can improve developer productivity, which allows for a more strategic allocation of time on IT projects.

Many organizations are turning to alternative data integration methods, including this high-performance, high-scalability ETL approach, to eliminate the need for manual coding of PL/SQL scripts. As a result, organizations are able to get closer to their data, benefiting from faster time-to-insight in order to make better business decisions. Take, for example, the National Education Association (NEA), which replaced its PL/SQL environment with a high-performance ETL approach. As the largest professional organization and labor union in the United States, the NEA collects data from many sources to operate, aggregate, improve performance, and better interact with members, constituents and the community. However, growing amounts of data left the NEA IT department facing long processing times and disparate data systems. The complex processing involved made it a challenge for the team to change any type of data on demand. In addition, many experienced IT staff members were becoming eligible for retirement, so the NEA needed a way to transfer their knowledge to newer staff and ensure a seamless transition.

By deploying a data integration strategy that leveraged a high-performance ETL approach, the NEA removed complex processing from the database and made way for faster joins. The NEA was able to accelerate performance, maximize business agility and reduce costs. The NEA saw up to a 25x improvement in processing times for key data integration tasks, and was able to efficiently train its staff so that no knowledge would be lost if the team member who originally created the PL/SQL scripts retired. Now, the IT team is able to move its processes forward and can easily understand the data and load scripts with quick turnaround.

In today’s data-driven world, organizations are feeling pressure to make sense of the increasing volume, velocity and variety of data while maintaining cost and operational efficiencies. With PL/SQL present in nearly every organization, implementing a high-performance ETL approach can help significantly by reducing the cost and complexity of developing and maintaining your data integration environment. With this approach, organizations can take control of Big Data and maximize ETL performance to increase the ROI of existing IT investments. This makes it possible to maximize business agility to gain faster time-to-insight and optimize decision making.

Jorge A. Lopez, Senior Manager, Data Integration at Syncsort Incorporated, has more than a decade of experience in the Data Integration and Business Intelligence market. He is based in Reston, Virginia and can be reached via email at jlopez@syncsort.com.