Using Aggregate Functions

It is very common to sum measures when you aggregate values along dimension hierarchies, but sometimes you need to apply a different aggregation method . For example, if you want to calculate average sales per customer, you divide total sales by number of customers . You can

Creating Calculated Members Outside of the Measures Dimension . . 164 Calculation Scripting . . . . 167 Creating KPIs . . . . 171

sum sales amount to get total sales, but to get the number of customers, you need to count customers, making sure to count each customer only once, regardless of how many purchases each customer has made . Suppose you want to compare customer sales and credit balances . For a single day, you can sum all customer values to get daily sales and credit balances . But suppose you want to look at monthly values . You want to sum daily sales, but if you sum daily credit balance, you will significantly overstate the value . Instead, you will probably want to use the credit balance on the last day of the month as the value for monthly credit balance . Analysis Services provides a variety of aggregate functions to enable complex aggregation rules to be applied to a measure . The four categories of aggregate functions are additive, pseudo-additive, non-additive, and semi-additive . You determine which category an aggre-gate function belongs to based on how the values for the parent members in the dimension hierarchies are derived from the values of their children .

When a measure uses an additive aggregate function, the value of a parent hierarchy mem-ber is equal to the sum of the values of all its children . If a measure uses a pseudo-additive aggregate function, it does not perform addition, but you can determine the value of a par-ent dimension member by applying an aggregation rule to the values of the children of the member . For example, if a measure uses the Max aggregate function, the value of a parent is the maximum value of the children . Because the additive and pseudo-additive aggregate functions allow you to calculate the value of a parent from the value of its children, without examining any other data, these aggregate functions provide the fastest query performance . The term pseudo-additive does not yet have widespread acceptance, but this book will use the term, and we hope its popularity will grow .

When a measure uses a non-additive aggregate function, the value of a parent member can’t be derived from the values of its children . The two non-additive aggregate functions are None and DistinctCount . No aggregate values are created by the None aggregate function . If a measure uses the DistinctCount aggregate function, the value of a parent member isn’t the sum, average, or minimum of the values of its children . It may be the maximum value of its children, but this is not guaranteed . For example, the distinct count of customers for a quar-ter is not the sum of the values of the months in that quarquar-ter . If DistinctCount did simply sum the values, then any customers that appear in more than one month would be counted more than once . Instead, the list of customers for the quarter must be created and counted to get the value for the quarter . Because the DistinctCount aggregate function requires calculation at the granularity of the attribute that is being counted, this aggregate function provides the slowest query performance .

When a measure uses a semi-additive aggregate function, the value of a parent member is equal to the sum of its children, unless the member is in the time dimension . In the time di-mension, the value of a parent member is equal to a point in time represented by one of its children . For example, if you wanted to measure beginning credit balance, you could use the FirstChild aggregate function . The value of the beginning credit balance for a month would be the sum of each customer’s credit balance on the first day of the month . The value of the

beginning credit balance for a quarter would be the value of the first month in the quarter . Academically, an aggregate function could be semi-additive along another dimension, but the time dimension is by far the most common occurrence, and the only one that Analysis Services natively supports . Table 6-1 provides a brief description of all of the aggregate functions .

TABLe 6-1 AggRegATe FUNCTIONS

Aggregate Function Category Description

Sum Additive The value of a parent member is the sum of the values of its children . Sum is the default aggregate function .

Count Additive Counts number of rows in a fact table where are particular column is non-empty or counts fact table rows . The value of a parent member can also be calculated by summing of the values of its children .

Min

Pseudo-additive

The value of a parent member is the minimum value of its children .

Max

Pseudo-additive

The value of a parent member is the maximum value of its children .

DistinctCount

Non-additive

Counts unique values of a column in the fact table . The value of a member is determined by counting unique values for the member .

The value of a parent member is the sum of the value of its children, except for a member in the Time dimension . In the Time dimension, the value of a parent member is the value of its first child .

LastChild

Semi-additive

The value of a parent member is the sum of the value of its children, except for a member in the Time dimension . In the Time dimension, the value of a parent member is the value of its last child .

FirstNonEmpty Semi-additive

The value of a parent member is the sum of the value of its children, except for a member in the Time dimension . In the Time dimension, the value of a parent member is the value of its first non-empty child .

LastNonEmpty

Semi-additive

The value of a parent member is the sum of the value of its children, except for a member in the Time dimension . In the Time dimension, the value of a parent member is the value of its last non-empty child .

AverageOfChildren Semi-additive

The value for a member derived by summing along all dimen-sions at the lowest level of granularity of the cube’s time di-mension and then averaging .

ByAccount

Semi-additive

The ByAccount aggregate function is used when the cube contains an account type dimension . The aggregate function applied to the measure is a property of the members of the Account dimension . This aggregate function will be covered in detail in Chapter 8, “Working with Account Intelligence .”

Now that you have been introduced to aggregate functions, you are ready to add measures that use these functions to a cube . In the next several procedures, you will create measures using the Count, Min, and DistinctCount aggregate functions . Then you will browse the cube and compare the effect of using these different functions .

Create a measure using the Count aggregate function

In this procedure, you will add a measure that uses the Count aggregate function to count the number of rows in the FactResellerSales table . Measures that count the number of rows in a table or number of values in a column are frequently used as the denominator in a calculated measure that computes an average value . You can also use row count measures to validate a measure group . You can query the source data to find out how many rows are in a fact table and compare that with the value of the row count measure . Before you can add measures to the cube, you need to open this chapter’s version of the AdventureWorks BI solution .

1. Use Business Intelligence Development Studio (BIDS) to open the AdventureWorks BI solution contained in the C:\Microsoft Press\Analysis Services 2008 SBS\Chapter 06\

AdventureWorks BI folder .

2. In Solution Explorer, expand the Cubes folder, right-click the AdventureWorks .cube, and select View Designer .

3. On the Cube Structure tab in the Measures pane, right-click the Reseller Sales measure group and select New Measure .

4. In the New Measure dialog box, expand the Usage list . The Usage list displays the types of aggregation rules you can apply to the new measure . Table 6-2 shows a correspon-dence between the aggregation rules displayed in the Usage list and the aggregate functions .

TABLe 6-2 USAge LIST AggRegATe FUNCTIONS

Usage List Item Aggregate Function

Sum Sum

Count of rows Count (applied to fact table rows) Count of non-empty values Count (applied to a fact table column)

Minimum Min

5. In the Usage list, select Count Of Rows . Click OK .

The new measure appears in the Reseller Sales measure group . Because the new mea-sure usage is count of rows, its name is the name of the meamea-sure group followed by the word count .

6. Each row in the FactResellerSales table contains data from a single line of an order . You should rename the measure so that its name more clearly reflects the data it contains . Right-click the Reseller Sales Count measure and select Rename . Change the name of the measure to Reseller Order Line Count . You will also want to apply a format to the measure to make it easy to read .

7. Right-click the Reseller Order Line Count measure and select Properties . In the Properties window, set the value of the FormatString property to #,# . In the Properties window, you can also see that the value of the AggregateFunction property is Count .

Creating a measure using the Max aggregate function

In this procedure, you will create a measure that displays the value from the FactResellerSales table row that contains the largest number of units based on the dimension members se-lected . You can use this type of measure to find products that are ordered in bulk .

1. In the Measures pane, right-click the Reseller Sales measure group and select New Measure .

2. In the New Measure dialog box Usage list, select Maximum and in the Source Column list, select Order Quantity . Click OK .

3. The new measure name, Maximum Order Quantity, is the measure usage combined with the name of the fact table column . The new measure actually displays the maxi-mum quantity from an order line, so you should rename the measure .

4. Right-click the Maximum Order Quantity measure and select Rename . Change the name of the measure to Max Order Line Quantity . You should also apply a format to the measure .

5. Right-click the Max Order Line Quantity measure and select Properties . In the

Properties window, set the value of the FormatString property to #,# . In the Properties window, you can also see that the value of the AggregateFunction property is Max .

Creating measures using the DistinctCount aggregate function

In this procedure, you will create a measure that counts orders and a measure that counts resellers . You can use order count by time period to help gauge call center performance and use reseller count to measure product penetration . You can use either measure as the de-nominator in a calculated measure that computes an average . Because the fact table contains a record for each order line and an order may have multiple lines, a single order number may appear many times . Because a reseller may place multiple orders, a single reseller will most likely appear many times in the fact table . Because individual order numbers and re-sellers may appear multiple times in the fact table, the measures you create have to use the DistinctCount aggregate function .

1. In the Measures pane, right-click the Reseller Sales measure group and select New Measure .

2. In the New Measure dialog box Usage list, select Distinct Count and in the Source Column list, select SalesOrderNumber .

3. Click OK . A new measure group, Reseller Sales 1, has been created . BIDS will create a separate measure group for each distinct count measure that you create . When Analysis

Services processes a distinct count measure group, the SQL query used to select data from the fact table sorts the records by the distinct count column so that measure group physical data storage is optimized for the distinct count measure . Therefore, each distinct count measure should be in a separate measure group .

Tip Because the SQL query executed when a distinct count measure group is processed sorts the fact records by the distinct count column, processing a measure group that con-tains a distinct count measure takes longer and uses more source database resources than processing a similar measure group that doesn’t contain a distinct count measure . Because the DistinctCount aggregate function is non-additive, an Analysis Services query that con-tains a distinct count measure executes slower than a similar query that doesn’t contain a distinct count measure .

4. The names of the new measure group and measure are not very user friendly, so you should rename them . Right-click the Reseller Sales 1 measure group and select Rename . Change the name of the measure group to Reseller Orders .

5. Right-click the Sales Order Number Distinct Count measure and select Rename . Change the name of the measure to Reseller Order Count . You will also want to apply a format to the new measure .

6. Right-click the Reseller Order Count measure and select Properties . In the Properties window, set the value of the FormatString property to #,# . In the Properties window, you can also see that the value of the AggregateFunction property is DistinctCount .

Now that you have created the Reseller Order Count measure, you are ready to create a measure that counts resellers .

7. Repeat steps 1 through 6 to create a new measure with the properties listed in the fol-lowing table .

Property Value

Usage Distinct Count

Source Column ResellerKey Measure Group Name Resellers Measure Name Reseller Count

FormatString #,#

You have created four new measures . The Measures pane should now look like this:

In the next procedure, you will deploy the Analysis Services project, process the database, and then browse the cube to view the new measures .

Deploy and browse the cube

1. On the Build menu, select Deploy AdventureWorks SSAS . If the AdventureWorks SSAS database already exists on the server, a dialog box may appear warning you that the database will be overwritten . If the warning appears, click Yes . The prior version of the database will be deleted and the current deployment will continue .

2. After the database has been successfully deployed and processed, click the Browser tab of the Cube Designer .

3. In the Metadata pane, expand the Measures folder and then expand the Reseller Orders, Reseller Sales, and Resellers measure groups .

4. Drag the Reseller Count, Reseller Order Count, Reseller Order Line Count, Max Order Line Quantity, and Reseller Order Quantity measures to the area labeled Drop Totals Or Detail Fields Here in the Report pane .

The report shows the effect of using the DistinctCount, Count, Max, and Sum aggregate functions . It shows that 635 resellers placed 3,796 orders that had 60,855 order lines . The largest order line was for 44 units, and a total of 214,378 units were ordered .

5. Drag the Product dimension to the area labeled Drop Row Fields Here in the Report pane .

6. Expand the Accessories product category . The first two measures, Reseller Count and Reseller Order Count, are non-additive distinct count measures . You can easily see that a parent value of these two measures can’t be derived from the value of the children . Because individual resellers order multiple products from the Accessories category, the sum of the subcategory values (Bike Racks, Bottles and Cages, and so on) is greater than the value of the parent category .

The last three measures use additive or pseudo-additive Count, Max, and Sum ag-gregate functions . For these measures, you can derive the value of a parent from the value of the children . For the count and sum measures, Reseller Order Line Count and Reseller Order Quantity, the value of a parent is just the sum of the values of the chil-dren . For the max measure, Max Order Line Quantity, the value of the parent is the maximum of the values of the children .

If you are interested, you can expand the Helmets subcategory and verify the behavior of the measures as product values are aggregated into product category values . Analysis Services contains semi-additive aggregate functions to enable you to include mea-sures that represent balances in a cube . A balance is a count or sum at a particular point in time . One characteristic of a balance is that the same item may be counted or summed mul-tiple times . For example, using shipping and receiving records, the inventory in a warehouse can be derived and recorded on a daily basis . An item may be counted multiple times, once for each day it is in the warehouse .

When you aggregate a measure that represents a balance, you want to avoid counting items more than once . The value of a parent member is equal to the sum of its children for every dimension except for the time dimension . For the time dimension, the value of a par-ent member is the value of one of its children . In the warehouse example, suppose there are two measures, Opening Inventory and Closing Inventory, supported by values recorded in the underlying relational data . Opening Inventory is the count of items at the beginning of the day and Closing Inventory is the count of items at the end of the day . For a month, the value of Opening Inventory is the value of Opening Inventory on the first day of the month, and the value of Closing Inventory is the value of Closing Inventory on the last day of the month . You can continue to aggregate up the time dimension in a similar fashion . For ex-ample, the Opening Inventory for a quarter is the Opening Inventory for the first month in

the quarter, and the Closing Inventory for a year is the Closing Inventory for the fourth quar-ter . Continuing with our warehouse example, the Opening Inventory for the Bikes category is the sum of the Opening Inventory values for each of the children of Bikes (Mountain Bikes + Road Bikes + Touring Bikes) .

When a measure uses the FirstChild or LastChild aggregate functions, the value of a parent member in the time dimension is equal to the first or last child of the member . This works well except in those situations where the first or last child is blank . In those situations, when the first child is blank, you may want to use the value of the first child that isn’t blank . When the last child is blank, you may want to use the value of the last child that isn’t blank . For example, suppose the warehouse is closed on weekends and holidays, and inventory is re-corded only on workdays . For some months, the first or last day of the month occurs during a

In document Printed and bound in the United States of America QWT Distributed in Canada by H.B. Fenn and Company Ltd. (Page 159-170)

Table of Contents