Partition Management in Hybrid SSAS Tabular Models

Andrew Smith Posted on

SSAS tabular models can be constructed in hybrid mode, where one partition of each table is set as a DirectQuery partition and (optionally) one or more others are set as in-memory partitions.

There are several business scenarios that tabular hybrid models can satisfy – in this post I look at one in particular, and provide an end-to-end view of how it is constructed.

The scenario is that one business unit wants to see real-time data for the current day, and another business unit wants to see complete days’ data from yesterday to as far back as data exists. DirectQuery mode provides for real-time access via tabular models, since it translates incoming DAX queries from client tools into T-SQL that’s executed live upon the SQL Server back-end. In-memory mode, by contrast, runs on data that’s been processed into the tabular model’s cache at some time in the past, so it may be a day/week/month etc. since it was last refreshed.

This scenario can be fulfilled by using the right combination of tabular partitions, tabular model query mode and tabular model connection string settings. Considering fact table partitions first, I have:

tabularhybridpartitions

Here we can see the DirectQuery partition (there can only be one) for today, and in-memory partitions for all preceding years for which we have data. The latest year (2016 in my example) is only complete up until yesterday. The T-SQL for the Today partition is:

SELECT F.[ProductKey], F.[OrderDateKey], F.[SalesOrderNumber], F.[SalesOrderLineNumber], F.[OrderQuantity], F.[UnitPrice], F.[TotalProductCost], F.[SalesAmount]
FROM [dbo].[FactInternetSales] AS F
WHERE F.OrderDateKey = CAST(FORMAT(SYSDATETIME(), ‘yyyyMMdd’) AS INT)

This means that DirectQuery access to the model is limited to today’s data only, which (in this scenario) has no overlap with any of the in-memory partitions. The year partitions all have the same T-SQL structure, e.g. for 2016:

SELECT F.[ProductKey], F.[OrderDateKey], F.[SalesOrderNumber], F.[SalesOrderLineNumber], F.[OrderQuantity], F.[UnitPrice], F.[TotalProductCost], F.[SalesAmount]
FROM [dbo].[FactInternetSales] AS F
WHERE F.OrderDateKey >= 20160101
AND F.OrderDateKey <
  CASE WHEN YEAR(SYSDATETIME()) = 2016 THEN
   CAST(FORMAT(SYSDATETIME(), ‘yyyyMMdd’) AS INT)
  ELSE
   20170101
  END

So if the partition is for the current year it will load up until yesterday, else the full year is loaded.

Once the model is up and running the in-memory partitions aren’t processed using this SQL on a regular basis, since the “Process Add” operation described below is used to populate each year day-by-day as time goes on. It’s still necessary to have this SQL though, in case any year partition needs a full reload, e.g. because some historical data in the back-end has been changed. In cases where data volumes are high, this year granularity for partition reloads may be too coarse, in which case smaller partitions should be used instead, e.g. quarterly or monthly. As each of these smaller-partitioned years becomes older and unlikely to change, its partitions could then be merged to become a single year partition.

DirectQuery partitions have a Processing Option property. For the scenario we’re considering the fact table DirectQuery partition has this set to “Never process this partition”:

tabularhybriddirectquerypartition

which means it never gets loaded into the in-memory cache, i.e. it’s only visible to DirectQuery connections.

Aside from the fact table, all the dimension tables just have one partition each, which is a DirectQuery partition that has its Processing Option set as follows:

tabularhybriddirectquerydimpartition

This makes the dimension data available to both DirectQuery and in-memory connections.

Hybrid models have a Query Mode parameter that can either be set to “DirectQuery with In-Memory” or “In-Memory with DirectQuery”. The difference is that if set to the former, client connections to the model will access it as DirectQuery by default, whereas if set to the latter, client connections to the model will access it as In-Memory by default. The default can be over-ridden in the client’s connection string to the model, i.e. using either DirectQueryMode=InMemory or DirectQueryMode= DirectQuery.

For my test setup, I used Power BI Desktop to access the model as DirectQuery (to view today’s data only), and Excel to access it as in-memory (to view history prior to today). Power BI Desktop (and Power Query) have no means to add properties to the SSAS Tabular connection string, so I set my tabular model’s Query Mode parameter to “DirectQuery with In-Memory”. Thus Power BI will use the model as Direct Query by default, and then I just needed to add DirectQueryMode=InMemory to the Excel tabular model connection string (via Excel’s Connection Properties dialog) so that it accessed the in-memory history partitions.

Of note for hybrid scenarios is that DirectQuery connections can only use DAX to query the model, whereas as in-memory connections can use either DAX or MDX. Power BI (and Power Query) use DAX and so are suitable as DirectQuery clients. Excel pivot tables by contrast use MDX to query tabular models and so must use in-memory connections. (Other significant differences between DirectQuery and in-memory mode are given in BOL here.)

With that all working, an important remaining task is to process each completed day’s data into the latest year partition, just after midnight each day. This can be achieved using the SSAS tabular “Process Add” processing option for partitions, which adds new data into an existing partition. The running of “Process Add” each night would be an automated task, that can be run using any of: (i) SSIS, (ii) a program that uses the Analysis Management Objects (AMO) object model, or (iii) anything that can run an XMLA script, such as SQL Server Agent (see this blog post for more information on all 3 options).

I chose the XMLA/SQL Server Agent option, creating the XMLA script by using the XMLA given in the previously mentioned blog post as a template, and updating it with the GUIDs from my deployed model (these were acquired by clicking the model fact table’s “Partitions…” right-click option in SSMS and clicking “Script Action”). The SQL in the script just loads data from yesterday:

tabularhybridxmla

I entered this into a SQL Server Agent job step that has type SQL Services Analysis Services command, making sure that the account running the job was assigned membership to a tabular model role having the Process Database permission.

When “Process Add” runs on a partition, any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated. So no “Process Recalc” is required afterwards, as would be the case after a “Process Data”.

In addition to running “Process Add” on the latest fact table partition, a “Process Full” is also required each night for dimension tables that are subject to updates at the back-end, so that these updates are processed into the in-memory model. This can also be scripted in XMLA and run via SQL Agent.

Finally, as Cathy Dumas notes in her blog post here, it is a good idea to do a “Process Defrag” on tables when doing incremental processing, in order to maximise compression of the in-memory partitions. (Doing a “Process Full” is another option). A regular scheduled “Process Defrag” can be set up via SQL Server Agent using XMLA – example script is:

tabularhybridxmladefrag

There is maintenance required at year end with the design described above – a new partition for the coming year must be created and the PartitionID GUID in the “Process Add” XMLA script above needs updating to reference the new partition. The partitions for subsequent years could be created in advance in the model via SSDT though, leaving just the XMLA script needing updating once per year. As this resides within SQL Server Agent it could be updated by a DBA – no access to SSDT or deep knowledge of the model is required. (The new year partitions could be created via SSMS too, but care must be taken so that they aren’t overwritten the next time the model is deployed – see here for details.)