Data Modeling

In my previous company I had been working on loading the data, preparing the data and creating directly dimensions and facts to be consumed by our analysts. This was tiresome and buggy as this style of modeling resulted in a lot of redesigns and re-engineering. In the meantime I learned different techniques to accomplish what I needed.

In short, here an overview of some techniques:

3rd Normal Form

Best method to model Business Processes. Every piece of software makes use of this modelling style.

Setting a business keys to every table. If the system needs a relationship to another table just add a column pointing to it (foreign key). Straight forward industrial thinking.

But useless for BI. If the source system is modeling after that, I’m very happy about it. Because somebody thought about the business process and added constrains to make sure that the data is accurate. But modeling our BI with that style is of no use. The user has already an application on top of their data.

If you have a closer look onto the picture, you will discover, that a table has different things in it: Business Keys, Relationships, Attributes. I found a much better model, to separate those information in their own objects.

Data Vault

Best method to model Data Knowledge and aggregate information for each subject (NSA-style).

This model consists of Hubs, Links and Satellites. A HUB consists ONLY of a business key. Relationships are built with LINKS and any data is saved in SATELLITES. That’s it! As you can see adding value to any Hub doesn’t need to re-engineer everything. Adding a satellite is the only thing needed. Google for more information. There is plenty of information out there.

For me this was a game-changer. Now I don’t need to work with tables of 100s of columns. I split data into satellites and this is my work of unit. Getting the perfect result into that data. And if something is wrong, change it there. No need to change it all over our data pipeline.

Dimensional modeling

Best method for end users to access data and understanding it. This is the standard approach to deliver self-service data.

Save countable values in FACTS and link DIMENSIONS to it. Nowadays in modern systems this is flattened down to one table consisting of all facts and dimensions in hundreds of columns.

Since I manage detail data in satellites, it is just a way of joining all those information together into a satellite.