IT Briefcase Exclusive Interview: Open Source Data Management with Charles Zedlewski, Cloudera
March 18, 2013
In the interview below, Charles Zedlewski of Cloudera outlines the key benefits that an optimally scalable, flexible, and affordable open source data management platform can offer businesses today.
- Q. How do you see Big Data transforming the way people view data management today?
A. Change is happening on every level, from the technology to the processes to the addressable business problems. With the benefit of hindsight, I think it's surprising how much of the money and time allocated to traditional data management has gone to reporting. That's certainly important and necessary, but many of the use cases for a big data platform like Cloudera's are more about creating competitive advantage and lowering infrastructure costs.
- Q. What advice can you offer to businesses trying to overcome current Big Data challenges?
A. I think the challenges of adopting big data technologies are not unlike the challenges organizations have faced in the past when adopting other technologies that were new and quickly becoming popular. The first important step is to identify a use case that's a really good fit for the technology. Trying to fit a square peg into a round hole will only color everyone's impression of the technology negatively and impair its adoption.
If that first use case is a good fit, there's a high likelihood that it will succeed, and that success will generate a lot of internal demand for additional use cases. That brings me to a second piece of advice, which is to be deliberate about how you roll out this technology to a wider range of use cases, because you can always acquire software faster than you can acquire expertise. One of the most common ways our customers use our services group is to develop a process and a concentration of expertise that lets the organization expand its use of the technology while keeping some control over quality and downside risk.
- Q. What differentiates an open source data management platform from a more traditional approach to data management?
A. Cloudera's data management platform, CDH, is 100% open source and distributed under an Apache license. The fact that it is open source gives customers a number of built-in advantages.
The first is the software is very easy to try out. Customers can start using the software without concern for the cost or the friction of reviewing license agreements.
The second advantage is extensive functionality. CDH includes Apache Hadoop as well as a number of related components like Apache HBase and Apache Hive. Today there are dozens of additional open source projects that build off these open source frameworks. Free tools & libraries for the platform are readily available.
The third advantage is customers have the benefit of portability. There are multiple competing distributions that all build from the same open source projects and so you have the option to switch among them if you decide you don’t like the service or functionality provided by a particular vendor.
- Q. What are the key benefits to having an optimally scalable, flexible, and affordable data management platform, and how does Cloudera work to provide this to their customers?
A. The technical strengths of an Apache Hadoop-based platform like Cloudera's have made it a strong fit for three broad categories of workloads:
1. Data processing – this includes things like processing machine data (e.g. sessionizing logs), scaling back-office processes (e.g. reconciling trades) and optimizing data warehouses (e.g. offloading batch processing workloads).
2. Advanced analytics – these are workloads where customers are taking advantage of the platform’s scale & flexibility to build predictive models that are helpful in recommending products, targeting ads or identifying fraud.
3. Real-time analytics – in these situations customers use the platform not just to build a predictive model but also to respond in real time to millions of new events. An example would be responding in less than a second to a new event indicating that an organization might be dealing with a cybersecurity threat.
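To make the first category more concrete, "sessionizing logs" means grouping raw machine-log events into per-user sessions, starting a new session whenever a user has been inactive for longer than some timeout. The sketch below is a minimal, single-machine illustration, not Cloudera's implementation; it assumes the log lines have already been parsed into hypothetical (user_id, unix_timestamp) pairs and uses an assumed 30-minute timeout. On a Hadoop cluster the same logic would typically run as a distributed job keyed by user.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed inactivity timeout (hypothetical): a gap longer than this starts a new session.
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(events: List[Tuple[str, int]],
               timeout: int = SESSION_TIMEOUT_SECONDS) -> Dict[str, List[List[int]]]:
    """Group (user_id, unix_timestamp) log events into per-user sessions."""
    # Collect every timestamp seen for each user.
    by_user: Dict[str, List[int]] = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions: Dict[str, List[List[int]]] = {}
    for user_id, timestamps in by_user.items():
        timestamps.sort()  # events may arrive out of order
        user_sessions: List[List[int]] = [[timestamps[0]]]
        # Compare each event with the one before it.
        for prev, curr in zip(timestamps, timestamps[1:]):
            if curr - prev > timeout:
                user_sessions.append([curr])  # gap too long: open a new session
            else:
                user_sessions[-1].append(curr)  # same session continues
        sessions[user_id] = user_sessions
    return sessions


if __name__ == "__main__":
    logs = [("alice", 0), ("alice", 600), ("bob", 100), ("alice", 5000)]
    print(sessionize(logs))
```

With these sample events, Alice's 4,400-second gap exceeds the timeout, so her activity splits into two sessions ([0, 600] and [5000]), while Bob has a single one-event session.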
- Q. Can you provide some use case examples of how Cloudera’s services are being implemented today?
A. There are several hundred production deployments of Cloudera’s platform so it’s hard to pick just a few. Nokia is using our platform to build rich maps that are used by millions of users every day in their cars and phones. Opower is using our platform to lower the electricity bills for millions of households. CBS is using our platform to recommend more relevant news stories to their readers.
- Q. How is Cloudera working to expand upon their current services and revolutionize data management platform capabilities?
A. We're always looking for new ways to expand the usefulness of this new data management platform. A recent example is Cloudera Impala, an open source parallel query engine that lets users run interactive-speed SQL natively on Hadoop data. By running queries 10-30X faster than was previously possible, Impala has completely changed the user experience when it comes to analytics. This is bringing a lot of new users to the Hadoop platform, especially those who have been weaned on BI tools. Before Impala, the BI experience on Hadoop just wasn't very usable (it was extremely slow), and we never want to see usability become a barrier to adoption.
Charles Zedlewski, VP of Product at Cloudera
Charles (@zedlewski) joined Cloudera from SAP, where he held various management roles in product strategy, management, and operations. During his time at SAP, Charles led the development of half a dozen new and follow-on releases for products that supported some of SAP's major growth initiatives in GRC, Sustainability, and EPM. Many of these products received substantial critical acclaim and collectively generated more than a hundred million dollars in new product revenues. Prior to SAP, Charles held product roles at BEA Systems and at venture-backed software startups. Charles holds a bachelor's degree from Carleton College and an MBA from MIT.