Category Archives: Analytics and MDX

MDX – Multi Dimensional Expression for doing OLAP Queries

MDX Query Design for Data Extraction

MDX Query Design

The following section discusses the MDX Query design for extracting the data into a CSV file for Multi Dimensional data. The 2 main Dimensions with hierarchies are Time ( Year/Quarter/Month) as the hierarchy and Product ( Market / Product / Package ) as the Hierarchy. These 2 dimensions would be present in any data extract.

This is divided into 2 sections. Section 1 is for extracting the Hierarchy data for time dimension and Section 2 is for extracting the Hierarchy data for the Product dimension.

Section 1 – MDX for Time Hierarchy

If the user chooses the data extract to include data in the time dimension at the most granular level – so that it has – Year, Quarter and Month – the MDX for the Time Dimension part would look like

SELECT

NON EMPTY {Descendants([Time.Time].[All Members], 3)} ON COLUMNS,

The integer 3 above indicates that the data should be returned at the most granular level – with all the 3 levels of the Time dimension included.

For this Case :

When the data is to be extracted at the quarter level – the MDX :

SELECT

NON EMPTY {Descendants([Time.Time].[All Members], 2)} ON COLUMNS,

The above MDX means – that we need data for only 2 levels in the Time Dimension starting with data for All Years.

Section 2 – When extracting data for Product Dimension Hierarchy

Case 1

Assume the user has selected the Market ( which is from the Product Dimension ) to be ALLERGY and the user wants to get data for all the Products and Packages which come under the Market – ALLERGY.

MDX for the above choice would look like below

SELECT

… on Columns,

NON EMPTY {Descendants([Product].[ALLERGY], [Product].[package_name])} ON ROWS

FROM [Cube]

WHERE ([Measures].[MeasureName])

Results of the above

Case 2 :

Assume that the user has selected the ALLERGY Market from the Product Dimension and wants to get the data values at the Product Level.

MDX for the above would be like this

SELECT

.. ON COLUMNS,

NON EMPTY {Descendants([Product].[market_name].[ALLERGY], [Product].[product_name])} ON ROWS

FROM [Cube]

WHERE ([Measures].[MeasureName])

Result

Case 3 :

The use case where the user selects the Product Dimension and wants to get the data extract – for all Products and all Packages under them – the MDX Query would look like :

SELECT

.. ON COLUMNS,

NON EMPTY {Descendants([Product.Product_Package].[All Members], [Product.Product_Package].[package_name])} ON ROWS

FROM [Cube]

WHERE ([Measures].[TRx])

Results:

Case 4 :

The use case where the end user wants to get a data extract for all Packages under a specific Product – called – RESTASIS – the MDX query would look like :

SELECT

.. ON COLUMNS,

NON EMPTY {Descendants([Product.Product_Package].[RESTASIS], [Product.Product_Package].[package_name])} ON ROWS

FROM [Cube]

WHERE ([Measures].[TRx])

Explanation

Advertisements

Calculated Measure and MDX to calculate Average Weekly Change (AWC) for Time Series in Mondrian

Assuming we have a set of data points in a Weekly Time Series for a measure – TRXUnits. The data below shows the weekly data for a given Product.

WeekName TRXUnits
Week1 123
Week2 345
Week3 146
Week4 234
Week5 156
Week6 198
Week7 212
Week8 200
Week9 178
Week10 167
Week11 199
Week12 170
Week13 211

The Line Chart for the above Table Data – with a Least Square Fit line is shown below.

The AWC calculation Strategy is given below. Assume that the starting time for the above data is called T0 and the final time is called Tn

  1. Assume that the equation of the least square fit line is – y = mx + C, where m is the slope of the line and C is the intercept.
  2. Get the values of  y at Time T0 and  Tn.

Let these be :                                       yn = xn * m + C           &

y0 = x0 * m + C

AWC = ( yn – y0 ) = m * (xn – x0)

Functional Requirements for Comptuing AWC

This should be computable for any granularity for any dimension at any level.

For example – if we want to Calculate the AWC for the last 18 weeks of data here is the Calculated Measure – that is used to calculate the AWC in 2 steps.

The 1st Calculated Member is used to calculate the Slope of the line using the build in function called – LinRegSlope. This Calculated Measure uses the RANK function.

<CalculatedMember dimension=”Measures”

formula=”LinRegSlope( {LastPeriods( 18,  ([Period].[Week].Members.Item(18-1)  ) )  }, [Measures].[TRx], Rank([Period].[Week].CurrentMember,

LastPeriods( 18,  ([Period].[Week].Members.Item(18-1)  )    ) ))”>

</CalculatedMember>

The 2nd Calculated Member is then just used to multiply the Slope obtained from the 1st calculated measure and multiply it by ( 18 -1 ) – which essentially is ( yn-y0) – as derived in the formula above.

<CalculatedMember dimension=”Measures” formula=” ( [Measures].[ Slope18Week] *  (18-1) )”>

</CalculatedMember>

** MDX — Item function is Zero Based

Sample MDX Queries using the above Summary Variable

A sample MDX using the above Calculaated Measure would look like

select NON EMPTY {[Measures].[ AWC18] } ON COLUMNS,

NON EMPTY {[Alignment].[TerritoryName].Members} ON ROWS

from [Cube]

MDX for doing the Percent of Total in Magic Square – using Mondrian OLAP Engine

Assume we had a magic square where one can have any measure in the squares and the values are drilled down in both the Axes – Rows and Columns – by dimensions. The Magic Square also shows the row totals and column totals and at the lower right hand side we show the total of everything – which we call the cross total – see figure 3. The picture below illustrates it explicitly. In the columns we have the dimension – Patient Cosmetic, while the rows have the dimension – Toric/Multifocal/Spherical. The squares have the measure – Value.

Figure 1

Showing the above table with the Percent of Total would yield the following values.

Figure 2

Notice how the Total Column adds up to 100%, but each row does not. This is the desired behavior. If the user wants to see the percent of total for the Rows, they can transpose the chart.

The figure below shows what is meant by Horizontal and Vertical Total and Cross Total.

Figure 3.

Figure 4

MDX for generating the Percent of Total for all Squares and Cross, Vertical and Horizontal Total in Mondrian

Original MDX Query

The MDX query for generating the data values in the MagicSquare as in Figure 1 would be

SELECT

{[Dimension1.Dimension1].members} ON COLUMNS,

{[Dimension2.Dimension2]..members}  ON ROWS

FROM [Cube]

WHERE ([Measures].[Value])

Let us see how we modifiy the above MDX Query for the Percent of Total values so that we get the results as in Figure 2.

WITH   member [Measures].[POT]  as [Measures].[Value]  /  ([Measures].[Value],[Dimension2.Dimension2].CurrentMember.Parent),format_string=’00.00%’

SELECT

{[Dimension1.Dimension1].members}, {[Dimension1.Dimension1].[All Members]} ON COLUMNS,

{[Dimension2.Dimension2].members} }} ON ROWS

FROM [Cube]

WHERE ([Measures].[POT])

An important addition to this MDX Query is – the section in BOLD – , {[Dimension1.Dimension].[All Members]}

This gives the Vertical Total automatically for Percent of Total calculations.

Since Percent of Total is based on Sum/Volume – it is additive – the Horizontal Totals & Cross total is then manually calculated for each column by adding the Percent of Total value result is obtained in each column. The highlighted sections in Figure 4 below – have to be manually calculated by the Application layer before presenting the results to the end user.

Later on we found an important issue with the above MDX.

The above MDX works only when there is no Filter in the Column Segments. So in other words the above Query works if all the Dimension1 members are to be shown in the Magic Square column. If the user applies any filter on Dimension1 – for example if the user wants to show only 2 members of Dimension1 in the Column the above MDX query will NOT return correct data values for the Vertical Total – since it calculates the Vertical Totals which is based on – [All Members] – which means it takes into consideration all members of Dimension1 and NOT just the filtered ones OR the ones the end user needs.

[All Members] – is actually the default member for any Level’s Parent Level in Mondrian.

So how to Resolve this issue

The simplest way to resolve this is to do this new Query for the Total Column. Initially the thought was  a single MDX Query would be easier to formulate and execute and better to handle the results – but actually it fails when as illustrated above the Columns has Segments from the SG filter.

It has to be understood that – 1 MDX query cannot help us – to get the values of the Squares as well as the Total Column values. There has to be 2 MDX Query. The application layer  then assembles the results from the 2 MDX Queries and merges them together in a Data Structure and populates the UI accordingly.

The MDX Query to get the values in the squares is exactly as the one above – but without [All Members ] member. Assume that the user has selected to be shown on the columns only the following members of Dimension1 – { member1, member2 }.

WITH  member [Measures].[POT] as

[Measures].[Value] /

([Measures].[Value],[Dimension2.Dimension2]. CurrentMember.Parent),format_string=’00.00%’

SELECT

{[Dimension1.Dimension1].[member1]}, {[Dimension1.Dimension1].[member2]} }} ON COLUMNS,

{[Dimension2]. members} }} ON ROWS

FROM [Cube]

WHERE

([Measures].[POT])

While the MDX Query to get the Vertical Column is :

WITH  member [Measures].[POT] as [Measures].[Value] / ([Measures].[Value],[Dimension2.Dimension2].CurrentMember.Parent),format_string=’00.00%’

member [Measures].[PartialSum] as

Sum({[Dimension1]. [member1], [Dimension1].[member2]}, [Measures].[POT] )

member [Measures].[TOTAL_POT] as

[Measures].[PartialSum] / ([Measures].[PartialSum],[Dimension2.Dimension2].CurrentMember.Parent),format_string=’00.00%’

SELECT  { [Measures].[TOTAL_POT] }  ON COLUMNS,

{[Dimension2.Dimension2]. members} ON ROWS

FROM [Cube]

The MDX query above has 3 Calculated Measure defined dynamically at run time.

Let us understand each measure:

1.

WITH  member [Measures].[POT] as [Measures].[Value] / ([Measures].[Value],[Dimension2.Dimension2].CurrentMember.Parent),format_string=’00.00%’

This is the same as the Percent of Total Measure

2.

member [Measures].[PartialSum] as

Sum({[Dimension1]. [member1], [Dimension1].[member2]}, [Measures].[POT] )

This measure calculates the POT  for the 2 Segments which are filtered by – [Dimension1]. [member1], [Dimension1]. [member2]

3.

member [Measures].[TOTAL_POT] as

[Measures].[PartialSum] / ([Measures].[PartialSum],[Dimension2.Dimension2].CurrentMember.Parent),format_string=’00.00%’

The above defines the Percent of Total based on the 2nd Measure ( as defined above ) – [Measures].[PartialSum]

This Measure calculates the POT based on the Totals for the 2 Segments in the SG which are shown in the UI

So the actual MDX Query for the Total Column is

SELECT  { [Measures].[TOTAL_POT] }  ON COLUMNS,

{{{[Manufacturer.Manuf_Brand_H].[Manufacturer].members} }} ON ROWS

FROM [Prescriber Product PeriodRollup]

WHERE

([Product.Product_Package].[product_name].[ProjectedTotalLenses])

The above Query for the total Column – uses the measure – [Measures].[TOTAL_POT]

MDX in Mondrian – for Percent of Total calculation for – Table Charts

What are we trying to do?  Assume we have a Table Chart – which lists some measures which are grouped by members of Dimension1. See below.

BreakoutByDimension1 Measure1 Measure2
Segment1 9,722,863 9,432,877
Segment2 184,569,451 187,198,313
Segment3 26,471,424 24,360,739
Segment4 348,214 313,446
Segment5 830,859 889,048
Total 221,942,812 222,194,423

We would like to get the Percent of Total for Each measure – rather than actual measure values.

A B
Dimension1 POT Measure1 POT Measure2
Segment1 4.38% 4.25% 1
Segment2 83.16% 84.25% 2
Segment3 11.93% 10.96% 3
Segment4 0.16% 0.14% 4
Segment5 0.37% 0.40% 5
Total 100.00% 100.00% 6

MDX Query

with member [Measures].[POT Measure1] as

‘[Measures].[Measure1]

/

([Measures].[Measure1], [Dimension1.Dimension1].[Level1].CurrentMember.Parent)’,

format_string=’00.00000%’

member [Measures].[POT Measure2] as

‘[Measures].[Measure2]

/

([Measures].[Measure2], [Dimension1.Dimension1].[Level1].CurrentMember.Parent)’,

format_string=’00.00000%’

SELECT {[Measures].[POT Measure1], [Measures].[POT Measure2]} ON COLUMNS ,

{[Dimension1.Dimension1].[Level1].members} ON ROWS

FROM [Cube]

The highlighted line shows the Breakout by Dimension and the Mondrian Calculated Measure – [Measures].[ POT Measure1] – is part of the MDX Query that is generated at Run time. This has to be run time – because of actually 2 reasons

  1. Look at the denominator of the [Measures].[ POT Measure1] – it is – ([Measures].[Measure1], [Dimension1.Dimension1].[Level1].CurrentMember.Parent)’

The above denominator – is what gives the Total value for the Measure – Measure1 for the Parent Member of Dimension1.

Now the above Denominator is only known at run time since it depends on what the breakout by Dimension the user has selected in the AdHoc Query tool.

  1. All the above changes related to forming the modified MDX would not have been necessary had Mondrian supported the Axis(). If that were the case then the POT Measure1 – would be a Calculated Measure as below in the Mondrian Schema

<CalculatedMember name=” POT Measure1″ caption=” POT Measure1″ dimension=”Measures” formula=”[Measures].[Measure1]

/ ([Measures].[Measure1], Axis(1).Item(0).CurrentMember.Parent)”>

</CalculatedMember>

Where – the highlighted section above – would be automatically understood by Mondrian as the breakout by Dimension.

Slight Changes to the above MDX Query for POT

All the above MDX – is fine – when the breakout by dimension is a flat dimension or the end user wants a POT comparison at the same level. However, if the end user wants to get the POT Values at the global level – For example Look at multiple breakouts by :

Dim1-Level1 Dim1-Level2 SUM POT ( from 1st MDX ) Actual POT at Global Level

( Desirable Answer )

Manuf1 Brand1 20 20/(20+30) = 40% 20/100 = 20%
Manuf1 Brand2 30 30/(20+30) = 60% 30/100 = 30%
Manuf2 Brand21 15 15/(15+25+10) = 30% 15/100 = 15%
Manuf2 Brand22 25 25/(15+25+10) = 50% 25/100 = 25%
Manuf2 Brand23 10 10/(15+25+10) =10% 10/100=10%

What happens is that the MDX Query as it was before – for a Hierarchical case – would calculate the POT ( intra Level ) and not global across all the Segments in all levels.

So I had to change the MDX Query to this

WITH member [Measures].[POT Measure1] as [Measures].[Measure1] / ([Measures].[Measure1],[Dim1.Dim1].[All Members]),format_string=’00.00%’
SELECT { [Measures].[POT Measure1]} ON COLUMNS ,
{[Dim1.Dim1].[Level1].Children} }} ON ROWS
FROM [Cube]

Now we are computing the POT at the Global Level since we have added
([Measures].[Measure1],[Dim1.Dim1].[All Members])

as the Denominator – where [All Members] means the Root Level ( Global Level )

MDX Trick

So what was the problem I was trying to solve.

I need to compute something called – Change Over Previous Year, so for example if my Quarterly data looks like

2004-Q1 2004-Q2 2004-Q3 2004-Q4 2005-Q1 2005-Q2 2005-Q3 2005-Q4 2006-Q1 2006-Q2
123 456 789 987 654 321 123 456 789 987

I wanted to get the Change of each quarters data for previous year. The output should look like

2004-Q1 2004-Q2 2004-Q3 2004-Q4 2005-Q1 2005-Q2 2005-Q3 2005-Q4 2006-Q1 2006-Q2
654-123 321-456 123-789 456-987 789-654 987-321
531 -135 -666 -531 135 666

The 2nd row above shows the actual values that are desirable.

The simplest MDX Query to do the above would be something like this

With Member [Measures].[Value Change] as

‘(([Time].CurrentMember, [Measures].[Value]) – (ParallelPeriod ([Time.Time].[Year], 1, [Time].CurrentMember), [Measures].[Value]))’

SELECT {[Time.Time].[Quarter].Members} ON COLUMNS,

{[Product.Product_Package].[product_name].[Patient Visits]}  ON ROWS

FROM [CubeName]

WHERE ([Measures].[Value Change])

It was returning me the values as below – (the 1st 4 columns of which are undesirable to me)

2004-Q1 2004-Q2 2004-Q3 2004-Q4 2005-Q1 2005-Q2 2005-Q3 2005-Q4 2006-Q1 2006-Q2
123-NA 456-NA 789-NA 987-NA 654-123 321-456 123-789 456-987 789-654 987-321
123 456 789 987 531 -135 -666 -531 135 666

So I changed the MDX query to be this now

With Member [Measures].[Value Change] as

‘(IIF( IsEmpty( (ParallelPeriod ([Time.Time].[Year], 1, [Time].CurrentMember), [Measures].[Value]) ),  (([Time].CurrentMember, [Measures].[Value]) – (ParallelPeriod ([Time.Time].[Year], 1, [Time].CurrentMember), [Measures].[Value]))/ ( (ParallelPeriod ([Time.Time].[Year], 1, [Time].CurrentMember), [Measures].[Value])), ([Time].CurrentMember, [Measures].[Value]) – (ParallelPeriod ([Time.Time].[Year], 1, [Time].CurrentMember), [Measures].[Value])))’

SELECT {[Time.Time].[Quarter].Members} ON COLUMNS,

{[Product.Product_Package].[product_name].[Patient Visits]}  ON ROWS

FROM [CubeName]

WHERE ([Measures].[Value Change])

What I did – was essentially use 2 – functions provided by MDX.

IfEmpty( ) and IIF ( )

I first use the method – IsEmpty( ) on the [Measures].[Value] for the Previous Year’s Time Period – which is – ParallelPeriod ([Time.Time].[Year], 1, [Time].CurrentMember)

and if that is EMPTY – then I do a calculation that is sure to return me Infinity or NAN ( not a Number ) – otherwise I return the original calculated value that I want to.

Once, I get a values of NAN / Infinity from the OLAP Engine – my Downstream Code and Visualizing tool picks that up as a NA and does not show any Number

So initially if my Visualizing tool was showing data as below ( with a different data set from the one shown above in the table )

Now with the modified MDX Query the visualization looks more like this

Now It correctly shows the data as