Sunday, February 1, 2015

Data Warehousing - The Inmon View

The Inmon view of Data Warehousing is the original view. In fact, the Inmon definition is still cited in research papers. This definition sees the Data Warehouse as "a subject oriented, integrated, time-varying, and nonvolatile collection of data that is used primarily in organizational decision making." This seems like a serviceable definition until we pick it apart a little. First, consider a major subset of that definition: "a subject oriented, integrated... collection of data that is used primarily in organizational decision making." Nothing in that subset distinguishes data warehousing from enterprise databases. So the adjectives "time-varying" and "nonvolatile" must be the difference. And indeed they are. But in this definition the key elements are buried in a flurry of other generic attributes.

Even if we highlight and embolden these terms, they still fail to capture the essence of a Data Warehouse. For example, in a traditional transaction processing system, time-stamped transactions would satisfy "time-varying", so what is special about the Data Warehouse? And "nonvolatile" suggests, for the most part correctly, that the information is not updated. That is not entirely true, however. And in the cases where it is true, why is it true?

One final problem we have with the traditional definition is the name "Data Warehousing" itself. This is a problematic metaphor that fails to capture the essence of a Data Warehouse. The term was selected many years ago to give the impression of high-volume, low-cost storage, where you go into the warehouse to retrieve information that is not readily at hand. Thus, over time, Data Warehouses came to be seen as large junk heaps of historical data. This leaves us with some nontrivial philosophical problems, such as "what does the data refer to?", and design problems, such as "what are we trying to achieve with the Data Warehouse?"