OPTION(RECOMPILE)

Again, a technical topic for SQL Server. As an analyst you start out by learning to write standard SQL queries to get your desired results.

With time your skills evolve, and you learn how to optimize queries, maybe with a temporary table or a common table expression. The more you learn, the more of it you use. And then suddenly it appears as if your queries are stuck. You rewrite parts of the query and try to ease the pain of the SQL engine.

And at some point you come across OPTIONs, which can be added to your tables, your joins, your query or your stored procedure. They tell the SQL engine how it should optimize the execution plans. Reading and optimizing execution plans is a huge playing field, and I'm far from being an expert in it. I would like to discuss and explain certain behaviors I discovered in BI, and BI only: no OLTP optimization or single-record stuff. In BI I work with data sets.

As you have probably discovered by now, my loading routines are all built as stored procedures. My typical layout looks like this:

CREATE PROCEDURE [update_stage]
AS

-- Initialize
TRUNCATE TABLE [STAGE]

-- Temporary Table for Hash Calculation

CREATE TABLE #temp (...)

INSERT INTO #temp
SELECT *
FROM [SOURCE]
JOIN ...

-- Insert new records

INSERT INTO [STAGE]
SELECT *
,HASHBYTES(...)
FROM #temp

This is my typical layout. The temporary table is needed because sometimes SQL Server doesn't know whether it should calculate 100 million hashes and then filter out the 100,000 it actually needs, or filter first and hash afterwards.
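For contrast, the single-statement version would look roughly like the sketch below (column lists and join conditions elided, just like above). Here the optimizer alone decides whether to hash first and filter afterwards or the other way around, and that is exactly the gamble the temporary table avoids.

-- Hypothetical single-statement version: hashing and filtering in one go,
-- so the optimizer alone picks the order in which they happen
INSERT INTO [STAGE]
SELECT s.*
,HASHBYTES(...)
FROM [SOURCE] s
JOIN ...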

When the stored procedure is executed for the first time, SQL Server compiles an execution plan and caches it for the procedure. If the tables already contain data, it looks at row counts and other metadata and tries to figure out the best way to process them.
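If you want to check whether such a plan is actually sitting in the plan cache, the plan cache DMVs will tell you. A minimal sketch, assuming the procedure name from above and the current database:

-- Rough check for a cached plan belonging to [update_stage]
SELECT cp.usecounts
,cp.cacheobjtype
,cp.objtype
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.objectid = OBJECT_ID('update_stage')
AND st.dbid = DB_ID()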

Usually, in my business vault satellites the first load is the biggest; the following loads are deltas. So SQL Server sees all the data and thinks: wow, a lot to process. Well, THIS execution plan would be the best to process THAT much data.

On the following loads, it sees the data and the already existing execution plan and thinks: okay, we have a plan, so let's get it done. It takes the assumptions of the previous run and applies them to the small delta load. And this can lead to longer-running queries or even "hanging" queries. Again, I'm no expert, I just observe.
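One manual way out of such a stale plan is to mark the procedure for recompilation, so the plan is thrown away and rebuilt on the next run. A quick sketch, assuming the procedure name from above; doing this by hand after every big load is of course not a real solution.

-- Invalidates the cached plan; the next execution compiles a fresh one
EXEC sp_recompile N'update_stage'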

Now the OPTIONs come into play, in particular OPTION(RECOMPILE). If you google for information about it, there are plenty of voices suggesting not to use it: improve your query-writing skills instead, build better indexes, and so on. What they don't mention is that they are writing those queries for OLTP applications. There, speed is king. If you have tons of transactions and need to RECOMPILE every time, it is a waste of time. Totally agreed.

But in BI we have a little more time. At least on my system. It still needs to be fast, but I don't need to update my objects every minute. I'm willing to wait a second to get a decent execution plan; it hardly takes longer than that.

So, what options do we have:

RECOMPILE Stored Procedure

CREATE PROCEDURE [update_stage]
WITH RECOMPILE
AS
...

WITH RECOMPILE at the stored procedure level is a very drastic measure. Always recompile everything, each and every piece. As a consequence there is no cached plan and no stats, and no way to see afterwards what SQL Server actually did when the procedure ran.
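You can see that effect in the procedure-level runtime DMV. A minimal sketch for the current database; if the above holds, a procedure created WITH RECOMPILE should simply not show up here, because nothing of it stays in the cache.

-- Runtime statistics of cached procedure plans in the current database
SELECT OBJECT_NAME(ps.object_id) AS procedure_name
,ps.cached_time
,ps.execution_count
,ps.last_elapsed_time
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.database_id = DB_ID()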

RECOMPILE Query

CREATE PROCEDURE [update_stage]
AS

-- Temporary Table for Hash Calculation
INSERT INTO #temp
SELECT *
FROM [SOURCE]
JOIN ...
OPTION(RECOMPILE)

-- Insert new records
INSERT INTO [STAGE]
SELECT *
,HASHBYTES(...)
FROM #temp
OPTION(RECOMPILE)

As you can see, each statement gets its own OPTION(RECOMPILE). That way we are still able to analyze the stats afterwards, if you ever want to.

It is just nicer to keep some control than to reach for the sledgehammer that is WITH RECOMPILE at the stored procedure level.
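If you ever do want to look at those stats per statement, Query Store is the most comfortable place, because it also records statements that carry OPTION(RECOMPILE). A minimal sketch, assuming SQL Server 2016 or later, Query Store enabled on the database, and the procedure name from above:

-- Per-statement runtime statistics for [update_stage] from Query Store
SELECT qt.query_sql_text
,rs.count_executions
,rs.avg_duration
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE q.object_id = OBJECT_ID('update_stage')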

Conclusion

Since I introduced this measure in my procedures, I am no longer surprised by waits or "hanging" queries.