Top MDX Queries – Parsing from JBOSS Log files using Unix Command Line

Assume we have the following JBOSS log file – which has a lot of MDX ( Multi Dimensional Expression for OLAP Analytic ) queries coming from the application

2012-07-27 13:20:29,726 WARN [com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName: ORA_ORAD_20120715_3, MDX Query = SELECT {{ {[Product.Market_Product_Package].[market_name].[Arestin Dom (Field)]}}} ON COLUMNS , {{{[National_Territory_Curr.NATIONAL_CURR_ALIGN_H].[National_Territory_Curr].members} }} ON ROWS FROM [Prescriber Product_Package Period] WHERE ([Measures].[BOXES_197Dy_Sum]), time = 4499 ms

2012-07-27 13:21:11,003 WARN [com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName: ORA_ORAD_20120715_3, MDX Query = SELECT {{ {[Product.Market_Product_Package].[market_name].[Arestin Dom (Field)]}}} ON COLUMNS , {{{[National_Territory_Curr.NATIONAL_CURR_ALIGN_H].[National_Territory_Curr].members} }} ON ROWS FROM [Prescriber Product_Package Period] WHERE ([Measures].[BOXES_197x365_Prev_Sum]), time = 41277 ms

2012-07-27 13:22:20,329 WARN [com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName: ORA_ORAD_20120715_3, MDX Query = WITH MEMBER [National_Territory_Curr.NATIONAL_CURR_ALIGN_H].[Total] as ‘Sum({[National_Territory_Curr.NATIONAL_CURR_ALIGN_H].CurrentMember}, [Measures].[BOXES_197x365_Pct_Chg])’ MEMBER [Business_Unit_Curr.NATIONAL_CURR_ALIGN_H].[x] AS ‘Aggregate( {[Business_Unit_Curr.NATIONAL_CURR_ALIGN_H].[Business_Unit_Curr].[1000 – SALES-NORTHEAST]} )’ SELECT {{ {[Product.Market_Product_Package].[market_name].[Arestin Dom (Field)]}}} ON COLUMNS , {{{[National_Territory_Curr.NATIONAL_CURR_ALIGN_H].[National_Territory_Curr].members} , [National_Territory_Curr.NATIONAL_CURR_ALIGN_H].[Total]}} ON ROWS FROM [Prescriber Product_Package Period] WHERE ([Measures].[BOXES_197x365_Pct_Chg],[Business_Unit_Curr.NATIONAL_CURR_ALIGN_H].[x]), time = 217953 ms

2012-07-27 13:22:20,368 WARN [com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName: ORA_ORAD_20120715_3, MDX Query = SELECT {{ {[Product.Market_Product_Package].[market_name].[Arestin Dom (Field)]}}} ON COLUMNS , {{{[National_Territory_Curr.NATIONAL_CURR_ALIGN_H].[National_Territory_Curr].members} }} ON ROWS FROM [Prescriber Product_Package Period] WHERE ([Measures].[BOXES_197x365_Chg]), time = 69364 ms
2012-07-27 13:22:20,369 WARN
[com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName: ORA_ORAD_20120715_3, MDX Query = WITH MEMBER [Business_Unit_Curr.NATIONAL_CURR_ALIGN_H].[x] AS ‘Aggregate( {[Business_Unit_Curr.NATIONAL_CURR_ALIGN_H].[Business_Unit_Curr].[1000 – SALES-NORTHEAST]} )’ SELECT { { {[Product.Market_Product_Package].[market_name].[Arestin Dom (Field)]}}} ON COLUMNS , { {[Measures].[BOXES_197x365_Pct_Chg]}} ON ROWS FROM [Prescriber Product_Package Period] WHERE ([Business_Unit_Curr.NATIONAL_CURR_ALIGN_H].[x] ), time = 40 ms
2012-07-27 13:22:20,374 INFO

The general structure of a line with the queries is like this

2012-07-27 13:20:29,726 WARN [com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName:, MDX Query = , time = ms

If you look carefully each of the queries have the text – , time = ms at the end which shows the time the query took to execute.

Our JBOSS Log – has 100s and 1000s of such queries.

During a day – all queries on this Application Server are logged and at night we need to figure out – what are the top 100-200 queries based on the time of the query
So the output I want is like this :

List of top 100 or 200 MDX Queries – which are sorted in Desc order of time taken for them to execute.

My output should just include the MDX Queries and not the timing – the timing is to be used only for sorting

Here is a Unix Command Line solution for this same

grep “, time =” server.log | awk -F= {‘print $3, $2′} | grep “time” | sort -rn | head -200 | sed ‘s/, time//g’ | sed ‘s/ ms//’ | cut -d’ ‘ -f3-5000

 

Let me explain how I arrived at the above command line

This shows the enormous power of Unix Command Line Scripting and the amount of thought the great men who designed Unix put into

grep “, time =” server.log : This part – only brings into lines from the server file – which have the timing information ( Since the JBOSS Log has a lot of other lines too interspersed with the MDX Query )

awk -F= {‘print $3, $2′} : if you notice carefully the lines in the Server log – which have the query have this structure

2012-07-27 13:20:29,726 WARN [com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName: , MDX Query = , time = ms

The -F option to AWK – tells us that we want to separate the line which has the query timings into 3 parts where the record separator is the character =

$1 : 2012-07-27 13:20:29,726 WARN [com.leapfrogrx.gpharma.service.viewerservice.model.mondrian.QueryProcessor] InstanceName: , MDX Query

$2 :

$3:

Now we print $3 and $2 — columns from the above record (since $1 is useless for us )
$2 is the actual MDX Query
$3 is the timing information

grep “time” : I pass the above information for only lines which have the timing information

sort -rn : sort on Column $3 ( from Awk Output ) and sort it in revered way – so that we get the most expensive queries first

head -200 : Gives me the top 200 queries

sed ‘s/, time//g’ : replaces the string “, time” from the line with a blank

sed ‘s/ ms//’ : replaces the string ” ms” ( which actually shows the time in mSec ) with a blank

Till here the output would be like this
Time1
Time2
.
.
.

Now we just need query1 and query2 and so on as out output – which can be logged to a file.
So we just use ‘cut’ – to remove the 1st column ( which is timing info ) and everything after that is included
– cut -d’ ‘ -f3-5000 : Now I just remove the 1st column and since my separator here is ‘ ‘ character and the query has a lot of spaces I just get everything till 5000th column ( hopefully my query will not have 5000 words / columns )

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s