Whilst completing my review on the Expert Cube Development with SSAS Multidimensional Models, I came across the Distinct Count Measure within SSAS and how this can affect query performance as well as processing performance.
You can find the reference on page 114
And here is a link to the actual book that I am reviewing:
What I wanted to do is instead of using the Distinct Count Function which we can use within SQL Server Analysis Services (SSAS), (which as we know has some performance issues along with creating its own measure group), I wanted to find a way where I could use the SUM Function in our Measure group but still return the distinct count.
And then use this distinct count using a dimension to slice by
In our example below we want to extract from the AdventureWorksDW2012 Analysis Services Database
You can get a copy of this here: http://msftdbprodsamples.codeplex.com/releases/view/55330
· I used the AdventureWorksDW2012 Data File and AdventureWorks Multidimensional Models SQL Server 2012
· We are going to get the distinct number of Customers (CustomerKey) from the [dbo].[FactInternetSales]
o So this will enable us to get a distinct count based on any of the attributes from our Customer Dimension
· NOTE: The thing to note is that you will always want to create your distinct calculation on your lowest granularity.
· As with our example in our [dbo].[FactInternetSales] the lowest level was the Date or OrderDateKey.
· In order to follow how this works, we will be using the Order date of 03 March 2008.
o The distinct number of Customers on 03 March 2008 is 51 Customers.
1. The first thing is we will be creating our distinct calculation and then insert this into a staging table.
a. NOTE: What you could do is to use your staging table later as part of the loading into your Fact Table.
i. But with our example we are going to update our Fact Table with our calculations.
ii. We also insert the data into a Staging table so that we can use it as part of our Update Statement.
2. Next below is the TSQL syntax that we used to create our distinct number of Customers Calculation, with an explanation afterwards
a. The only part that we are going to concentrate on is the actual calculation.
b. We will start the explanation from the inside out, due to this being the logical way that I built this up.
i. The part highlighted in RED is where we first are selecting the Distinct CustomerKey
ii. The next section highlighted in PURPLE is where we are doing a count of the Distinct CustomerKey
iii. The next section highlighted in GREEN is where we are converting our values to a Float
iv. The final part of the calculation is highlighted in BLUE, where we are dividing our distinct count by the number of rows.
1. / Count(1)
2. This is so that we can then get the correct calculation.
v. We also have the OrderDateKey, because this is our lowest level of granularity, we need to group by this so that the number of rows is grouped per day to work out the calculation correctly.
vi. NOTE: The reason that we are converting this to a float is so that later when we sum the row details it will sum back up to the correct number.
3. For the simplicity to understand below is the complete TSQL Syntax where we will be truncating and inserting our data into our Staging table in our AdventureWorksDW2012 database
a. As you can see above we are also grouping the calculation by OrderDateKey so that we can use this to insert the calculation for multiple days.
4. With our example explained above if we had to look at our calculation for our Distinct Customers on the 03 March 2008 it would be the following:
1. The first thing is that you need to ensure that you have your column created in your Fact Table.
a. I manually created the following column in the dbo.FactInternetSales Table:
2. Next is our Update TSQL Statement where we are updating the column we created in Step 1 above, with an explanation below:
a. From above you can see that we are joining from our Fact table to our Staging table and using the OrderDateKey in our join.
i. NOTE: Once again this is our lowest level of granularity
3. As with our example if we now have a look at our Fact Table for 03 March 2008 we will see the following for our DistinctCustomers column.
1. Open up your SSAS Project in SQL Server Data Tools (SSDT)
2. Next open the Data Source View and refresh it.
a. As our example we opened up the Adventure Works DW.dsv
b. Once open we clicked Refresh, which you will then see the following:
c. Click Ok.
3. Next open your cube where you want to add your new Distinct Measure to.
a. As with our example we opened the Adventure Works cube
4. Under Measures if you click on the plus sign, and next to Internet Sales, right click and select New Measure
5. This will open the New Measure Window
a. Now as shown below under Usage: Sum
b. Source Table: Internet Sales Facts
c. Source Column: DistinctCustomers
e. Then click Ok.
6. You will now see your measure under the Internet Sales Measure Group
b. Now we are going to rename the Distinct Customers measure so that we can use this name in our calculation member which will be explained in the next steps.
i. NOTE: The reason we are creating a calculation is because we need to round up our values from our database in order to make the distinct count a whole number.
c. Right click on the Distinct Customers Measure and select rename
d. We are renaming it to Distinct Customers – Calc
i. NOTE: The reason that we give it this name is so that we know it is used as part of a calculation.
e. The final thing to do is to right click on our Distinct Customers – Calc and select properties.
i. Then change the Visible Property to False
f. Then save your cube.
1. The final step is to create our calculated member so that when we display the values to the client tool that the numbers look correct.
2. Click on the Calculations Tab.
3. Then click on New Calculated Member
4. We then configured it with the following as shown below:
a. Name: [Distinct Customers]
b. Parent Hierarchy: Measures
c. Expression: Round([Measures].[Distinct Customers – Calc],0)
i. NOTE: Here we are using the Round function and setting it to not keep any decimal points by specifying the zero (0).
1. This will also enable if the value is higher than 0.5 to round up to 1.
2. This is so that when the Calculated Measure is displayed it will always be a whole number.
d. Format String: Standard
e. Visible: True
f. Non-empty behavior: Distinct Customers – Calc
g. Associated Measure Group: Internet Sales
5. Now finally process your cube.
a. NOTE: If you are using the Adventure Works MultiDimensional project in SSDT, they are by default set to Query binding for the partitions and due to this you will have to add our new column to the following Measure Groups:
i. Part to Add:
ii. Measure Group partitions:
iii. If you do not add the column name in there the processing will fail.
1. Finally we now can view our new Distinct Count Measure that we created.
2. As with our example I am expecting to see for 03 March 2008 the Distinct Customers to be 51
3. And this was the goal of this exercise to enable to get a Distinct Count that can be used on our Dimensions using the SUM Function
If you wanted to get a distinct count for another dimension, you would then need to create another column as well as all of the steps above.
1. For our above example if we wanted to get the distinct count of Currency (CurrencyKey) we would need to modify our query to include the following:
2. And then we would create all the column names and details in SSAS as detailed above.