Business Intelligence Automation

How to build a data warehouse? I used to work with SSIS on Microsoft SQL Server to generate all my pipelines. This was fast and convenient and a lot of people in the industry are working like that. Define a source, do some transformations and load them to a target table. Very easy, very fast.

But what can we do, if we don’t want to work over and over again on transformations which are looking always the same? Dan Linstedt pointed me to the Capability Maturity Model to find out how mature a data warehouse is. Translated to BI this would mean:

Maturity LevelNameDefinition
1InitialProcess unpredictable
Process poorly controlled
2ManagedImplementation Standards
Pattern Based Architecture
3DefinedDefined Development Process
Defined Deployment Process
Defined Other Processes
4MeasuredEstimates and Actuals Captured
Measured Lead Times
Measured Complexity
Measured Defects
5OptimizedAutomation Tools
Parallel Teams
Rapid Delivery

After some work I reached level 3 easily. But my automating efforts to deploy fast my data flow according to my pattern, turned out to be very time consuming and difficult. My ambitions where very high.

Many people have been trying to accomplish that too and some of them have sold there software what they found out about automating things. All are building a meta data repository and deriving from that what needs to be done to get to a desired result.

So I did an extensive research on that topic to find out, that such a meta data tool doesn’t exist and my own definitions were never finished and I added more and more „exceptions“ to that. Why should I spend time researching it while others have already solved it?

So I turned to commercial tools and tested some. Finally I decided to go with WhereScape. When we bought it, it had only a couple of supported databases and wizards helping to get the patterns right. But it is turning more and more into a „target-less“ meta data repository which can talk to any database and any data structure through templates. In the following posts I’m going to share some of my findings, how to process data fast.