Love-hate relationship with WhereScape

[Update see below]

Well, I don’t know if this is the right way. But I want to share my experience about it with others.

First, I want to say that WhereScape enabled me to reach my goals faster. At least now. It took a year to get to full speed. This is due to the fact that I had first to discover, how it works and what is important to me. The first sentence of the sales consultants was: „We can do whatever you want!“.

You have to take that really literally! Don’t underestimate it. It is a blessing that you can form this tool to your likings. It is a curse that it takes time and has no blue print attached. Well, WhereScapes has own templates which are filled with options, accumulated from the beginning to present by user requests. Either I click through the whole process over and over again or I develop my own templates to remove manual work.

Object creation

It is a dream to design transfers. Create the first load object and then drag the objects or the columns from one step to next. E.g. Load → Stage → Data Store → Stage → Satellite → Stage → Dimension → Stage → Export. So easy with drag & drop. From one or multiple source tables. And then just building the right JOIN statements and an optional WHERE and GROUP BY clause.

The objects are created with the right data type with a mouse click. Stored Procedures are created without further input. Indexes are added automatically if needed. I just have to worry about business logic implementation.

But to get there has taken us a long time. And it has some limitations.

32 bit vs 64 bit

The application is still running in 32 bit. For a meta data management platform this is ok. It gets nasty when I have to work with execution.

My loading works with SSIS, ODBC or PowerShell. In ODBC I have to use a 32 bit ODBC driver. Calculate yourself, how much memory I can use for a data transfer from a database? Depending on row size crashes happen at around 50 to 100 Mio. records.

PowerShell is another beast. Windows has 3 versions of it. The 32 bit Version in the 64 bit Windows directory and vice versa. And then there is a third version as well. Figuring out which works in combination with WhereScape for a given job is tiresome. And if you need to extend PowerShell with more functionality, most modules are in 64 bit. So I have to start a 32 bit PowerShell, switching to a 64 bit version and then start executing.

Object name length

This limitation is really bad! RED has a limit of 64 characters for any object like tables, columns, indexes, stored procedures, etc. Only „parameters“ (kind of global variable) have a length of 256 characters but the system can address only 64 of them?!?

The 64 characters is for all objects. So if I create a table with 64 characters, I still get errors because of other objects appending to its string.

  • Stored Procedure: update_{object name}
  • Index: {object name}_idx_BK

With this calculation I have only 57 characters left.

In 3D there is a limit at 1024 characters. Enough space to evolve.

I hate abbreviated object names. Nobody except me is going to understand them. If you think of something like stage_{satellite}_{source system}_{source table}_{calculation step}, you can see that I might run out of space easily. And when I need to compose a multi source mapping and adding src_{hub}_ to the front, this runs out even faster.

3D to RED deployment

As described, there is a discrepancy of 1024 to 64 characters for object names. When bringing objects back to RED, object names are cut at 64 characters. For custom indexes it gets cut at 40 characters already. I don’t know why.

So I need to take care about object naming inside 3D. 3D is a nice program to design. But if I need to take care about object length for later usage in RED, it gets cumbersome.

My aim is currently to get a rough idea in 3D and then do the transfers in RED directly. With my customized templates this is as fast as working in 3D.

Object defaults

In 3D, I can set default column names with default data types. This is not possible in RED. E.g. I improved performance by switching from CHAR(32) to BINARY(16). Or, all of my date system columns should be datetime2(7). Not possible to set. Unfortunately until recently I had to manually modify them or using a QA query to modify the meta data in the database regularly. When I started writing this blog entry, I thought why not using triggers for that?!? Well, after such along time without it, I have that now in place.

In my opinion, users shouldn’t interfere with the application database directly. This is really really bad behavior, but unfortunately necessary.

Pebble

Pebble is the templating language WhereScape introduced. The intention was right to give the user the flexibility to develop their own transfers. It has taken me a year to develop fail prove transfers to get industries standards mapped. It is a trade off swapping manual time with development time. But satisfying when I get more insights, how things should work.

With successful usage the appetit grows to get more manual work incorporated into my templates. Then the problems are starting to appear. Pebble doesn’t scale with complexity. Macros suddently stop running, variables are getting forgotten while compiling. If you don’t check each and every creation, suddenly a bad stored procedure is there. Very unstable. Then time is wasted to find alternative ways to accomplish the same output. Another developer would question my bad design. But it has to work and I can’t wait!

For small and easy templates without logic, Pebble is running well.

Object Types

I like the split for every type of table I need: Stage, Data Store, Hub, Link, Satellite, Dimension, Fact, Export, etc. I love the concept of having a load object for getting outside data into the data warehouse. This is far more logical than having a stage table directly and load that content to the target.

Stage is my workhorse. It is there to prepare the data from target to target. Load is also a target.

7 object types = 7 templates to rule them all. But bringing logic inside it was not easy. A long and steep learning curve.

Stage Table and Stage View

See the Key Features graphic on their website: https://www.wherescape.com/solutions/automation-software/wherescape-red/

You see that there are options for:

  • Stage View
  • Stage Table

Stage View doesn’t exist!

All target objects like Stage, Data Store, Hub, Satellite are tables. And there is an additional object type called View. This is the only view.

But when developing a transfer, I don’t want to recreate the object as a view when I want to find out, if this runs faster. If I have a stage table, I want to be able to switch between a view and a table.

Objects are grouped for each target type. So switching between the stage menu node and view menu node is cumbersome.

There is an option to create a DDL statement. I developed a template for that. But once a view, it stays a view. I can’t switch back to a table without going manually to the database and dropping the view.

Data Vault

WhereScape is proud of their Data Vault skills. But it has taken them very long time to get that right. To add columns to a HashKey or ChangeHash is a dream!

The defaults are running ok. But they forgot that hubs are loaded from multiple stage tables. And just until recently I had to circumvent that with creating fake objects and using fancy templates. And in my opinion Links are not built correctly. The Unique Constrain of Hub Hash Keys are not enforced.

Advanced forms don’t work:

  • Multi-Active Satellites
  • Links with peg-legged Hubs (e.g. sales order item id for which I don’t want to form a hub)
  • Point-in-time (no template)
  • Bridge (no template)

Meta Data

The application collects all the meta data necessary to implement an object. It is really an open application with nearly total freedom. You are absolutely free to implement bugs and stuff which doesn’t work.

At first, every team member evolved their own development style. Through discussions and documentation we agreed on some principles and layout of our objects. But things are getting forgotten or are not getting checked.

The need arose to write some QA queries to check for issues. This proved to be very unreliable as there are too many bits and pieces to configure for each object.

This is the downside of having total freedom. Other programs which are guiding you through the development steps might be easier and more consistent to work with.

Jobs

When developing, I create tables and stored procedure. Through lineage I see which object depend on others. This is really a great feature.

When adding a job based on my work, I click the needed objects together. There it should end. But… I have to group them by execution layer and setting the dependencies manually?!? By default it maps EVERY object from one layer to EVERY object in the next layer. Then I need to delete the obsolete mappings. I mean, the system knows the dependencies of each object. Why the hell I need to teach the program it again?!? I see the need of a manual overwrite. But the first draft should be done by itself.

Another issue is that I can set the dependencies on another job. This is great and runs well. But sometimes, especially when developing, I want to force start this job. If the dependent job didn’t run, this job will not run either. Either I delete the dependency or I add another copy of the same job without a dependency. The later we do currently.

Support

The support is based in New Zealand. If I have an issue which I want to show somebody, I will have to schedule here in Europe a slot in the evening to midnight.

Usually this is not necessary. The help provided is fast. For bugs and feature requests, you will receive a number and then the issue is forgotten. Sometimes you see according to the low number it is known for years. The development focus is on delivering more fancy features than ironing out old bugs.

In every release I get newly introduced bugs of stuff which worked before. Then I get an info that they will fix it for the next release. Release cycles are roughly 9 to 12 months.

Support for Pebble doesn’t exist. Somehow understandable, as this is an area for a lot of user questions and resources would be bound. And I believe they already experienced themselves, that Pebble is unpredictable and don’t want to waste time on it. Or maybe we are too small and they make rather big players happy. I don’t know.

Implementation

I envy those companies which are able to implement WhereScape in one week. The success stories of WhereScape are full of this. Maybe they sent a full team of success engineers in and get them to full speed or did that themselves.

Our success engineer was great. Helping us here and there with the applications default functionality. Maybe our „We can do whatever you want“ was too complicated. I can’t imagine how others are doing it. I mean, I have only 7 object types. Loading e.g. a Fact or a Data Store should be everywhere the same. My aim was to get standards implemented accurately, automatically and without too much manual work. And this has taken us a lot of time!

Final Words

I love WhereScape for their flexibility. I hate it because I have to (re-)invent everything. Their default is too „free“ and only basic. I love the drag & drop of columns. I love it for creating tables, views and stored procedures instantly. I can’t do without it anymore. I hate it for their limitations and bugs. I hate/love the job engine. I love it for the speed of deploying objects to production. I hate Pebble.

I love/hate WhereScape!

[Update 09.12.2019]

I see that this blog post is sometimes on the top of google search. In the meantime I have to rectify my view about WhereScape.

Some of the limitations above are still true. But overall I love WhereScape. With my perfectly tuned templates we are terribly fast in developing data products. The effect is, that we don’t maintain any release cycles anymore. Sure we can do that, but my users love the rather prompt help I can offer. Telling the issue, showing a draft, finishing up.

I can’t imagine to work with any box moving applications like SSIS. Rather write the query, select the template and it is done. And then going to the next issue.