A New Data Warehouse Architecture for the Brave New BI World

February 18, 2015

Featured article by Paul Moxon, Senior Director of Product Management at Denodo Technologies

Big Data, Internet of Things, Data Lakes, Streaming Analytics, Machine Learning…these are just a few of the buzzwords being thrown around in the world of data management today. They provide us with new sources of data, new forms of analytics, and new ways of storing, managing, and utilizing our data. The reality, however, is that traditional data warehouse architectures are no longer able to handle many of these new technologies, and a new data architecture is required.

Traditional data warehouse architectures are designed to replicate data from operational systems into a central data warehouse on a scheduled basis – daily, weekly, monthly, etc. During this replication process, the data is transformed, cleansed, and often aggregated, and the result must conform to the pre-defined data model of the target data warehouse. (This is referred to as ‘schema on write’, i.e. the data must conform to the pre-defined target schema as it is written to the data warehouse.) Once the data is in the warehouse, it is considered trustworthy and can be used to create the numerous reports that oil the organizational workings. However, the whole process of replicating and cleansing the data takes time to build and execute, and time is in short supply in today’s business world.
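To make ‘schema on write’ concrete, here is a minimal sketch in Python (the table, column names, and source record format are illustrative assumptions, not taken from the article): each incoming operational record is cleansed and conformed to a pre-defined target schema at the moment it is written, and records that cannot be conformed are rejected.

    # Minimal 'schema on write' sketch: records must match the target schema
    # at load time. Table and field names are illustrative assumptions.
    import sqlite3
    from datetime import datetime

    TARGET_SCHEMA = """
    CREATE TABLE IF NOT EXISTS daily_sales (
        sale_date TEXT NOT NULL,   -- ISO date, cleansed during load
        region    TEXT NOT NULL,
        revenue   REAL NOT NULL
    );
    """

    def conform(raw_row):
        """Clean one operational record and force it into the warehouse schema."""
        return (
            datetime.strptime(raw_row["date"], "%m/%d/%Y").date().isoformat(),
            raw_row["region"].strip().upper(),
            float(raw_row["amount"]),
        )

    def load(conn, raw_rows):
        conn.execute(TARGET_SCHEMA)
        clean = []
        for row in raw_rows:
            try:
                clean.append(conform(row))
            except (KeyError, ValueError):
                continue  # in practice, rejected rows would go to an error queue
        conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", clean)
        conn.commit()

    conn = sqlite3.connect(":memory:")
    load(conn, [{"date": "02/18/2015", "region": " east ", "amount": "1250.00"}])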

These traditional architectures are not designed to cope with the vast amounts of data and the varying data formats that are generated today – whether data from mobile devices, sensor data, web data, and so on. Compared to the structured, clean conformity of the data warehouse, the new data and data sources represent anarchy! All kinds of data can be thrown into your Big Data store without needing to be cleaned or made to conform to a defined data schema. The idea of the Big Data store is “store the data and we’ll work out how to read it later”. This is classic ‘schema on read’, i.e. you work out the format of the data when you come to read it. It turns the traditional data warehouse architecture on its head.
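By contrast, a ‘schema on read’ store keeps the raw records exactly as they arrive and imposes structure only when someone reads them. The following minimal sketch assumes hypothetical JSON events and field names, purely for illustration.

    # Minimal 'schema on read' sketch: store raw events untouched, decide how
    # to interpret them at read time. Event shapes are illustrative assumptions.
    import json

    raw_store = [
        '{"device": "sensor-7", "temp_c": 21.4, "ts": 1424217600}',
        '{"user": "u42", "page": "/pricing", "ts": 1424217660}',
    ]

    def read_sensor_readings(store):
        """Apply a schema at read time: keep only records that look like sensor
        readings and project them into the shape this particular analysis needs."""
        for line in store:
            record = json.loads(line)
            if "device" in record and "temp_c" in record:
                yield {"device": record["device"], "temp_c": float(record["temp_c"])}

    print(list(read_sensor_readings(raw_store)))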

Based on this seeming incompatibility between traditional architectures and the requirements for handling the new data, some vendors and analysts are suggesting that organizations adopt a ‘data lake’ architecture in which all data is poured into the ‘lake’ (based on Hadoop) to provide a single enterprise-wide data repository. The reality, however, is that existing data warehouses are not going anywhere soon. Companies are simply not going to throw away up to three decades’ worth of investment in these technologies and tools. The data warehouse will remain a source of clean, verified corporate data for the foreseeable future.

So, the new architecture needs to extend the existing data warehouse to accommodate and incorporate all the good things that are coming from the ‘new data’ world. Leading industry experts Claudia Imhoff and Colin White have expounded the idea of an Extended Data Warehouse architecture which encompasses both the traditional data warehouse and the exciting new data environment. The new Extended Data Warehouse architecture (shown in Figure 1) contains a number of key components, namely:

Traditional EDW Environment – This is exactly what it says…the traditional data warehouse environment, complete with BI and reporting tools. The data stored in this environment is typically aggregated, highly structured, conforms to the data warehouse schema, and has been cleansed and verified. This is the environment for traditional reporting and analysis, e.g. production reporting, historical comparisons, customer analysis, forecasting, etc. In many companies, the data in this environment is the oil that keeps the organization running.

 

Figure 1 - Extended Data Warehouse Architecture

Investigative Computing Platform – The investigative computing platform contains technologies such as Hadoop, in-memory computing, columnar storage, data compression, etc., and is intended to provide the environment for managing and analyzing massive amounts of detailed data – only some of which might actually be useful. This is the environment where data scientists perform data mining, predictive modeling, cause-and-effect analysis, pattern analysis, and other advanced analytical investigations.
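The kind of fail-fast, exploratory modeling described above might look like the sketch below, which assumes pandas and scikit-learn are available and uses entirely invented columns and values; it illustrates the pattern, not any specific platform or model.

    # Exploratory predictive modeling over detailed data -- a quick experiment
    # that is kept only if it proves useful. Columns and values are invented.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    events = pd.DataFrame({
        "sessions_last_30d": [3, 12, 1, 25, 7, 0],
        "support_tickets":   [0, 2, 1, 5, 0, 3],
        "churned":           [0, 0, 1, 1, 0, 1],
    })

    model = LogisticRegression()
    model.fit(events[["sessions_last_30d", "support_tickets"]], events["churned"])

    new_customer = pd.DataFrame({"sessions_last_30d": [4], "support_tickets": [1]})
    print(model.predict_proba(new_customer))  # fail fast: discard if not useful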

Data Integration Platform – The data integration platform is where the heavy lifting of extracting, cleaning, transforming, and loading the data into the data warehouse is performed. Traditionally this has been done as a batch load (ETL/ELT), but it can also be done via a trickle feed (Change Data Capture), and data virtualization can also be used within the data integration platform. Because the data integration platform loads data into the trusted data warehouse, this environment requires more formal data governance policies to manage data security, privacy, data quality, and so on.
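To make the two load styles concrete, the sketch below contrasts a scheduled batch load with a trickle feed of change-data-capture events. The table, the event format, and the use of SQLite are assumptions made for illustration, not any particular product’s interface.

    # Batch load (ETL) versus trickle feed (CDC) into a warehouse table.
    # Table and event shapes are illustrative assumptions.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT)")

    def batch_load(rows):
        """Scheduled ETL: reload a full, cleansed extract in one pass."""
        conn.execute("DELETE FROM customers")
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        conn.commit()

    def apply_cdc_event(event):
        """Trickle feed: apply each captured change as soon as it arrives."""
        if event["op"] == "insert":
            conn.execute("INSERT INTO customers VALUES (?, ?, ?)",
                         (event["id"], event["name"], event["tier"]))
        elif event["op"] == "update":
            conn.execute("UPDATE customers SET tier = ? WHERE id = ?",
                         (event["tier"], event["id"]))
        elif event["op"] == "delete":
            conn.execute("DELETE FROM customers WHERE id = ?", (event["id"],))
        conn.commit()

    batch_load([(1, "Acme", "gold"), (2, "Globex", "silver")])
    apply_cdc_event({"op": "update", "id": 2, "tier": "gold"})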

Data Refinery – The data refinery allows the users of the investigative computing platform to access and filter the data that they need for their analysis. The data refinery – as its name suggests – refines the raw data to provide useful data to be analyzed. Because of the nature of the investigative computing platform, the data refinery is more flexible with its governance policies – quick access and fail-fast analysis are more important than strict governance. Data virtualization is a key part of the data refinery.
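A data refinery step can be as lightweight as the following sketch: filter and project raw, mixed-quality records into just the fields an analysis needs, with far looser checks than the formal cleansing applied on the path into the warehouse. The record shapes and the quality threshold are assumptions for illustration.

    # Lightweight, fail-fast refinement of raw records for investigative use --
    # not the formal cleansing done on the way into the data warehouse.
    raw_readings = [
        {"device": "sensor-7", "temp_c": 21.4, "quality": 0.98},
        {"device": "sensor-7", "temp_c": None, "quality": 0.10},  # broken reading
        {"device": "sensor-9", "temp_c": 19.8, "quality": 0.95},
    ]

    def refine(readings, min_quality=0.9):
        """Keep only usable readings and only the fields the analysis needs."""
        for r in readings:
            if r["temp_c"] is not None and r["quality"] >= min_quality:
                yield {"device": r["device"], "temp_c": r["temp_c"]}

    print(list(refine(raw_readings)))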

Others – In addition to the above components, there are the operational systems and a real-time analysis engine, feeding off the operational systems and real-time streaming data, to support use cases such as real-time fraud detection, stock trading analysis, location-based offers, etc.
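The real-time path differs from the batch and investigative paths in that events are evaluated as they arrive. The toy rule below is purely illustrative of that pattern (the events, the rule, and the threshold are invented); real fraud detection would use far richer models and a streaming engine.

    # Evaluate a simple rule over a stream of operational events as they arrive.
    # Events, rule, and threshold are invented for illustration.
    from collections import defaultdict

    recent_totals = defaultdict(float)  # running spend per card in the current window

    def on_transaction(event, limit=5000.0):
        """Flag a card the moment its recent spend exceeds the limit."""
        recent_totals[event["card"]] += event["amount"]
        if recent_totals[event["card"]] > limit:
            print(f"ALERT: card {event['card']} exceeded {limit} in this window")

    for e in [{"card": "4111", "amount": 3200.0}, {"card": "4111", "amount": 2500.0}]:
        on_transaction(e)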

None of these components can work in isolation – that just results in data silos that inhibit the organization’s ability and agility to react to business events. Investigative analysis cannot be performed in a vacuum; it must be performed in the context of the business, and that context is typically contained in the information stored in the traditional data warehouse. So, if a data scientist is building predictive customer behavior models, the customer data they need for the context of the model is typically stored within the data warehouse. Figure 2 illustrates this type of scenario.

 

Figure 2 - Extended Data Warehouse Architecture Interactions

Data virtualization is the glue that binds the various components together. Using a data virtualization layer, data scientists can access any data that they need for their modeling and analysis – whether it is in the traditional data warehouse, in newer data sources such as Hadoop or NoSQL databases, or completely external to the organization. The data is presented in the form that is most useful to the data scientists. If they are using advanced visualization tools, such as Tableau, the data appears as if it is a relational table. If they are writing their own statistical algorithms using R, the data looks like a JDBC data source. If they are creating mobile or web applications, the data is available as a web service, and so on. The agility and ease of accessing data through the data virtualization layer means that the data scientist spends more time analyzing the data and less time trying to figure out how to get the data and get it into a usable format. After all, you want your valuable (and expensive) data scientists and statisticians providing insights into the business, not spending their time writing data access and conversion utilities!
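The consumption side of such a layer can be sketched as follows: one logical view served both to a SQL/BI client and to a web client as JSON. The virtualization layer is imitated here with an in-memory SQLite database and an invented ‘customer_360’ view; this is a sketch of the pattern, not Denodo’s actual interface.

    # One logical view, consumed two ways: as a relational table (SQL/BI tools)
    # and as a JSON payload (web or mobile applications). The view name and
    # columns are invented; SQLite stands in for the virtualization layer.
    import json
    import sqlite3

    virtual_layer = sqlite3.connect(":memory:")
    virtual_layer.execute(
        "CREATE TABLE customer_360 (customer_id INTEGER, name TEXT, churn_risk REAL)"
    )
    virtual_layer.execute("INSERT INTO customer_360 VALUES (42, 'Acme', 0.17)")

    # 1. BI and statistics tools see a relational table and simply issue SQL.
    rows = virtual_layer.execute(
        "SELECT customer_id, churn_risk FROM customer_360 WHERE churn_risk > 0.1"
    ).fetchall()

    # 2. A web or mobile application gets the same view as a JSON payload.
    payload = json.dumps([{"customer_id": cid, "churn_risk": risk} for cid, risk in rows])
    print(rows, payload)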

If you are interested in learning more, listen to Claudia Imhoff talk about the Extended Data Warehouse Architecture and examples of how companies are using data virtualization to implement an investigative computing platform, combining ‘new data’ with more traditional data from existing systems. To listen, visit “Extended Data Warehouse – a New Data Architecture for Modern BI”.


Paul Moxon is Senior Director of Product Management responsible for product management and solution architecture at Denodo Technologies, a leader in Data Virtualization software. He has over 20 years of experience with leading integration companies such as Progress Software, BEA Systems, and Axway. For more information contact him at pmoxon@denodo.com.
