ETL and Mashups
This post expands a bit on my recent observation in “Re: What Not to Build” that new web data systems and associated services might act as incubators for socially constructive innovations in education.
Today on ProgrammableWeb, John Musser previewed a book by Michael Ogrinz called Mashup Patterns. In the book, Ogrinz discusses 34 types of mashups arranged in 5 categories. It was the language Ogrinz used to describe the 5 categories that got my attention. He calls them harvesting, enhancing, assembling, managing, and testing.
Ogrinz is talking about the web, of course. But in the world of enterprise data warehouses, there is an interesting parallel in something called ETL. ETL is shorthand for extract, transform, and load. Data is first extracted from one or more source systems. It is then cleaned, aggregated, or otherwise manipulated in the transformation step. Finally, it is loaded into the warehouse, where it is available to users, business analysts, statisticians, and policy researchers.
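To make the three ETL steps concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the two CSV "source systems," the column names, the school names, and the `school_attendance` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse. A real ETL pipeline would involve far more validation and scheduling than this.

```python
import csv
import io
import sqlite3

# Hypothetical extracts from two separate source systems (made-up data).
ENROLLMENT_CSV = """student_id,school,grade
1, Lincoln ,9
2,lincoln,10
3,Roosevelt,9
4,Roosevelt,10
"""

ATTENDANCE_CSV = """student_id,days_present
1,170
2,165
3,180
4,
"""


def extract(text):
    """Extract: read rows from one source system (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(enrollment, attendance):
    """Transform: clean values, join the two sources, aggregate per school."""
    # Keep only attendance records that actually have a value.
    days = {
        row["student_id"]: int(row["days_present"])
        for row in attendance
        if row["days_present"].strip()
    }
    totals = {}
    for row in enrollment:
        if row["student_id"] not in days:
            continue  # skip students with missing attendance
        # Normalize inconsistent spellings such as " Lincoln " and "lincoln".
        school = row["school"].strip().title()
        count, total = totals.get(school, (0, 0))
        totals[school] = (count + 1, total + days[row["student_id"]])
    return [
        (school, count, total / count)
        for school, (count, total) in sorted(totals.items())
    ]


def load(rows, conn):
    """Load: write the cleaned, aggregated rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS school_attendance ("
        "school TEXT PRIMARY KEY, students INTEGER, avg_days_present REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO school_attendance VALUES (?, ?, ?)", rows
    )
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load in order; return the loaded rows."""
    rows = transform(extract(ENROLLMENT_CSV), extract(ATTENDANCE_CSV))
    load(rows, conn)
    return rows


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    for loaded_row in run_etl(warehouse):
        print(loaded_row)
```

The point of keeping the three functions separate is that each source system only has to be understood once, in `extract`, while everyone downstream queries the single cleaned table.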
Data warehouses act as a middle layer between the source systems and the people who use the data in their jobs. This architecture solves many problems: it allows data from disparate systems to be combined in a single place, invites folks to ask new business questions, and creates an enterprise vocabulary that helps people compare apples to apples. Most importantly, warehouses provide relatively simple access points to reliable data.
I’m not suggesting that data warehouses should serve as web data systems or that mashups should comprise part of the ETL. I’m really not suggesting anything at this point, but merely using the serendipity of Ogrinz’s words to think about the basic design and development issues intrinsic to any data system build-out.
