Virtual Data Warehouse

So far, we have been discussing the creation of stored procedures to pump data into our hubs, satellites, facts and all the other objects.

In an ideal world, I would specify my objects and good is. All data would come into their place automatically. Unfortunately, that doesn’t happen too often.

Currently we create tables and according to the template produce stored procedures to load the data. Then we create jobs to process them. We would need to check that they are in the right order, that we didn’t forget a task and that we maintain the dependencies. A lot of auxiliary work.

To remove all this auxiliary work, we would need to specify our objects as views instead of tables. So far WhereScape doesn’t offer this choice. Even not in stage objects. Would that be really achievable?

In our templates, I have written a lot of code and tried to give „intelligence“ to it. What could I reuse to create views based on them? It’s the SELECT statement of the INSERT statement. No UPDATES and no delta loading. This pure SELECT statement I placed in a block template which can be reused in a procedure and in a DDL statement.

You will find now in my library all objects for

  • Stage
  • Stage Merge
  • Hubs
  • Links
  • Satellites
  • Dimensions
  • Facts

which you can create now as

  • TABLE (with a stored procedure) or
  • VIEW.

Unfortuately, there is a little inconvenience with it. WhereScape warns currently when we want to recreate an object which has a DDL creating a VIEW. You would need to delete that manually from the database and recreate it with WhereScape.

Hopefully you have a machine which is powerful enough to calculate only views based on our history (DataStore) layer a.k.a. Persistent Staging Area.

In my case we have too much data to only work with views. What I do is using stage views as much as possible. Especially in multi-step stage preparation. No need to fiddle creating nesting jobs for them. And in loading business vault satellites I use stage views only. For the raw vault we would need a stage table to load hubs, links and satellites in parallel. But for all the cleaning and calculation while our only target is another satellite, there is no need to create a table for that. A view would be faster.

I reached now my main goal of having all objects created as tables or views. If you liked so far my contribution, please leave a message.