This series of posts attempt to look at the pros and cons of using Stored Procedures as data access method for Tableau reports. The points made here are valid as of Tableau 8.2.
Part 1 gives a general introduction and identifies areas where some of SProcs advantages become irrelevant when it comes to Tableau
Part 2 looks at metadata considerations
Part 3 looks at Execution Plan considerations
Part 4 looks at how Tableau uses stored procedures to answer different queries
Behind the scenes look
Tableau has a more involved process when getting data from a stored procedure compared to a ad-hoc query it generates from the tables and relations present in it’s Data Source.
A high level outline of the process looks like this:
- Some housekeeping queries such as getting the session_id, setting datefirst etc.
- Getting the input parameter information for the proc in use
- Preparing the Execution Plan for the procedure in use
- Getting the metadata of the output result set
- Some temp tables creation for internal use by Tableau
- Execution of the procedure and getting the data returned by the proc
- Creation of a temporary table whose definition matches the metadata of the proc’s output result set (from Step 4 above)
- Getting the metadata/column information of the above temp table (Step 7)
- Creating individual Insert statements for each row returned by the output in Step 6 into the temp table from Step 7
- Wrapping the individual insert statements into a single transaction and executing the insert statements
- Getting DISTINCT COUNTS for each column in the Data Source’s Dimensions area
- Creating required indexes on the temp table from Step 7
- Creating the Query that would satisfy the request against the temp table from Step 7
- Preparing a execution plan for the query from Step 13
- Executing the plan and thus selecting data from the temp table from Step 7
- The above process is repeated for any new change to the input parameters.
As can be seen, there is more to getting data from a stored procedure compared to a ad-hoc query Tableau generates from the tables and relations present in it’s Data Source. Specifically, everything from Step 7 to 15 represent extra work that is not present in the ad-hoc query scenario.
Tableau makes use of this temp table in the background for some of its internal functioning. However, the drawbacks on the database end are as follows:
- Increases the load on temp db. And if you have a lot of users running stored procedure based queries, allocation contention can be seen
- Since the process writes back all the rows returned from the output of the proc into the temp table, consider the amount of writes when big fact tables are involved. Instead of simply reading the data, we are reading-writing-and-again-reading the data.
- Thankfully, the insert statements are wrapped in one transaction thus limiting tempdb log growth.
- Creation of indexes (and statistics) on the columns in the tempdb results in more resource uses.
- Creation of new execution plans for different queries that are run against this temp table. To some extent this negates the “one-time execution plan” advantage we get from using a stored procedure.
- Another potential disadvantage is as follows:
- Lets say the stored procedure has the following SELECT in itSELECT A.Month_Name, B.Store_Name, C.Customer_Name, F.Amount FROM dbo.Fact F JOIN ON dbo.Month A (JOIN Condition) JOIN ON dbo.Store B (JOIN Condition) JOIN ON dbo.Customer C (JOIN Condition)
- If you need a query that looks to SUM the Amounts by Month_Name from the above query, Tableau executes the whole stored procedure, writes the output into a temp table and then from the temp table it issues as query as follows:SELECT Month_Name, SUM(Amount) FROM #temp_table GROUP BY Month_Name
- As can be seen, in this case, the second query but against the tables involved is more efficient; SQL Server is forced to read all the columns that form the output of the procedure even if they are not required in the final output required. And if you are using Column Store Indexes, it defeats the purpose of them.
Thus when deciding between using a stored procedure or a ad-hoc query that Tableau generates from the tables and relations present in it’s Data Source, it should be evaluated if the extra work done is worth the effort. Typically, this can be true in two scenarios: one is when the query is quite complex and cant be written as a single SELECT statement and the second scenario is when the Compilation time for the single SELECT Statment is so high that this extra work is less time consuming. The second scenario is less practical than the first. Thus, it is mostly justified when the query is quite complex and cant be written as a single SELECT statement.
So far, we have considered the various advantages that Stored Procedures give us in general. However, as we have observed, most of those advantages become irrelevant for various reasons that are out of the database developer’s control. The one advantage that Stored Procedures still give is that SQL Server doesn’t have to prepare the execution plan multiple times. However, the way Tableau goes about executing a stored procedure and making use of that data makes certain things inefficient on the database end and even the one advantage that we gain by using Stored Procedure is rendered negligible in most cases.
Keep in mind the various side effects and other considerations when using Stored Procedures and use them when absolutely needed.
Let know if your experience has been different.