Monday, 23 May 2011

Tutorial Aggregated Tables for Mondrian (Web)

Posted on 06:39 by Unknown
Tutorial Aggregated Tables for Mondrian
(Republished Tutorial - originally published July 2009)

For better readability, please open this article as a Google Document here.
  1. Requirements
  2. Know what you are up to
    1. How to enable MDX and SQL logging on BI Server 3
    2. How to configure Mondrian to recognize aggregated tables
    3. Table Names
    4. Column Names
  3. Your first aggregated tables
    1. Aggregated table for login channel
    2. Aggregated table for country
  4. Make use of aggregates in other programs

I started using the Pentaho BI Suite in early 2009. One of my main objectives was to create a multidimensional model for DSS (Decision Support Systems) and analysis. As the BI Server already comes with Mondrian, it is fairly easy to get started.
I created my first cube definition with the Schema Workbench. The cube definition can be saved as an XML file, which you can then export to your solution folder on the BI Server.
Soon we realized that we would need aggregated tables, as some MDX queries took rather long. I had a look at the Pentaho Aggregation Designer 1.1 and thought that this would be a fairly easy task.
One thing the Aggregation Designer does is check whether your schema (the cube definition XML file) is valid and whether your tables are all properly set up, which is indeed very useful functionality, as it shows you any errors so you can correct them.
I used the Advisor function within the Aggregation Designer, created all the recommended tables with the DDL and DML (which the Aggregation Designer generates automatically) and exported the new cube schema as well.
The problem was that it didn't work. I had assumed that the Aggregation Designer would take care of everything, but I was quite wrong about that (at the time of writing, Aggregation Designer 1.1 was the current version), or maybe I was just doing something wrong. After some unsuccessful attempts I decided not to use the Aggregation Designer, but to create everything from scratch. I am not saying that you shouldn't use the Aggregation Designer; it is definitely worth a look. I am simply describing the approach that worked for me. I am also assuming that you use the standard configuration of Mondrian.

Requirements

This article assumes that you are familiar with
  • MDX
  • MySQL (or SQL in general)
  • Pentaho BI Suite
  • Pentaho Schema Workbench
  • Pentaho Design Studio

Know what you are up to

While creating aggregated tables it makes sense to test whether they are actually used for the purpose you are creating them for. You might have an MDX query that takes quite long to execute, and hence you will want to create a dedicated aggregated table for it. Have a look at the mondrian_sql.log to see which SQL query Mondrian actually generates for your MDX; your aggregated table will look quite similar to it.
When you execute your MDX queries, always have a look at these logs in the beginning: you will learn a lot!
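
To give you an idea of what to look for, here is a purely hypothetical example of the kind of SQL you will typically find in mondrian_sql.log for an MDX query that groups one measure by one dimension level (the table and column names are made up for illustration and this is not actual Mondrian output):

-- Hypothetical example only: a grouped join of a fact table with a dimension table.
-- This is roughly the shape that a matching aggregated table will later collapse into a single table.
select
    `dim_time`.`the_year` as `c0`,
    sum(`fact_table`.`sessions`) as `m0`
from
    `fact_table` as `fact_table`,
    `dim_time` as `dim_time`
where
    `fact_table`.`date_id` = `dim_time`.`time_dimension_id`
group by
    `dim_time`.`the_year`;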

How to enable MDX and SQL logging on BI Server 3

Whenever an MDX query is executed, Mondrian transforms it into a SQL query. In some cases you want to see what the resulting SQL query is, for example to check if Mondrian is using aggregated tables.

Go to \tomcat\webapps\pentaho\WEB-INF\classes and change log4j.xml. Just uncomment the last half of the file where it says:
  
   
   <!-- ========================================================= -->
   <!-- Special Log File specifically for Mondrian                -->
   <!-- ========================================================= -->


   <appender name="MONDRIAN" class="org.apache.log4j.RollingFileAppender">
     <param name="File" value="mondrian.log"/>
     <param name="Append" value="false"/>
     <param name="MaxFileSize" value="500KB"/>
     <param name="MaxBackupIndex" value="1"/>

     <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
     </layout>
   </appender>
   
   <category name="mondrian">
      <priority value="DEBUG"/>
      <appender-ref ref="MONDRIAN"/>
   </category> 
   


   <!-- ========================================================= -->
   <!-- Special Log File specifically for Mondrian MDX Statements -->
   <!-- ========================================================= -->

   <appender name="MDXLOG" class="org.apache.log4j.RollingFileAppender">
     <param name="File" value="mondrian_mdx.log"/>
     <param name="Append" value="false"/>
     <param name="MaxFileSize" value="500KB"/>
     <param name="MaxBackupIndex" value="1"/>
     <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
     </layout>
   </appender>

   <category name="mondrian.mdx">
      <priority value="DEBUG"/>
      <appender-ref ref="MDXLOG"/>
   </category>


   <!-- ========================================================= -->
   <!-- Special Log File specifically for Mondrian SQL Statements -->
   <!-- ========================================================= -->


   <appender name="SQLLOG" class="org.apache.log4j.RollingFileAppender">
     <param name="File" value="mondrian_sql.log"/>
     <param name="Append" value="false"/>
     <param name="MaxFileSize" value="500KB"/>
     <param name="MaxBackupIndex" value="1"/>
     <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
     </layout>
   </appender>

   <category name="mondrian.sql">
      <priority value="DEBUG"/>
      <appender-ref ref="SQLLOG"/>
   </category>

If you want the SQL to be displayed in a more readable form, add this to the mondrian.properties in pentaho-solutions\system\mondrian (BI Server 3):

mondrian.rolap.generate.formatted.sql=true

Restart the server, then run an xaction that uses JPivot/MDX. In tomcat/bin you will now find a mondrian.log, a mondrian_sql.log and a mondrian_mdx.log. 

How to configure Mondrian to recognize aggregated tables

You also have to tell Mondrian that the aggregated tables exist. The first step is to add the following lines to mondrian.properties, which you can find in pentaho-solutions\system\mondrian (BI Server 3):

mondrian.rolap.aggregates.Use=true
mondrian.rolap.aggregates.Read=true

Restart the server.
Furthermore, with the standard setup the table names have to follow a convention (I've copied here a bit from Julian Hyde's excellent documentation):

Table Names

As an example of applying the aggregate table name rule, let the fact table be called sales_fact_1997, the Sales cube's fact table from the FoodMart schema. Applying the specific fact table name to the regular expression template creates the following regular expression:

agg_.+_sales_fact_1997

This will match the following table names:
  • agg_l_05_sales_fact_1997
  • agg_c_14_sales_fact_1997
  • agg_lc_100_sales_fact_1997
  • agg_c_special_sales_fact_1997
  • AGG_45_SALES_FACT_1997
  • AGG_drop_time_id_sales_fact_1997
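
If you are unsure whether your table names match the convention, you can list the candidate tables straight from MySQL. Here is a quick sketch using the information_schema (the database name 'foodmart' is just an assumption; substitute your own database name and fact table name):

-- List all tables whose names match the aggregate naming pattern for sales_fact_1997
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'foodmart'   -- your database name
  AND table_name REGEXP '^agg_.+_sales_fact_1997$';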


Column Names

  • There has to be one column called "fact_count", which stores a plain count of the fact table rows behind each aggregate row.
  • If there are any ID columns in the aggregated table, they have to have the same name as in the fact or dimension table.
  • Level columns: 
    • ${usage_prefix}${level_column_name}: If you have a level that is used more than once, e.g. Year appearing both in a week hierarchy and in a month hierarchy within the time dimension, then set a "usagePrefix" in the schema so that the two occurrences can be distinguished, e.g. usage prefix "TW_" for the year in the week hierarchy and "TM_" for the year in the month hierarchy (combined with the level column the_year this results in "TW_the_year" and "TM_the_year").
    • ${hierarchy_name}_${level_name}
    • ${hierarchy_name}_${level_column_name}
    • ${level_column_name}

If any of these parameters have space characters, ' ', these are mapped to underscore characters, '_', and, similarly, dot characters, '.', are also mapped to underscores. So, if the hierarchy_name is "Time", level_name is "Month" and level_column_name is month_of_year, the possible aggregate table column names are:

  • time_month

  • time_month_of_year

  • month_of_year

For this rule, the "hierarchy_name" and "level_name" are converted to lower case while the "level_column_name" must match exactly.

Make sure the column names are in lower case!

  • Measure columns: there are three patterns for matching aggregate table columns to measures:
    1. ${measure_name}: the cube's measure name
    2. ${measure_column_name}: the measure's column name in the fact table
    3. ${measure_column_name}_${aggregate_name}: the measure's column name in the fact table combined with the measure's aggregator name (sum, avg, max, etc.)

The measure name is converted to lower case, while both the measure column name and the aggregate name are matched as they appear. So if the cube's measure name is "Avg Unit Sales", the measure's column in the fact table is "unit_sales" and the measure's aggregator is "avg", then the possible aggregate table column names that would match are:
  • avg_unit_sales

  • unit_sales

  • unit_sales_avg


For example: a measure called "Distinct Users" has to be referenced as "distinct_users".

Your first aggregated tables

We have the following tables:

The fact table:

DROP TABLE IF EXISTS `pentaho`.`sn_login`;
CREATE TABLE  `pentaho`.`sn_login` (
  `msisdn` bigint(20) NOT NULL DEFAULT '0',
  `login_date_id` bigint(20) NOT NULL DEFAULT '0',
  `login_channel_id` bigint(20) NOT NULL DEFAULT '6' COMMENT 'Default Value Unknown',
  `sessions` int(255) NOT NULL DEFAULT '0',
  PRIMARY KEY (`msisdn`,`login_date_id`,`login_channel_id`),
  KEY `FK_sn_login_date_id` (`login_date_id`),
  KEY `FK_login_channel_id` (`login_channel_id`),
  KEY `login_msisdn` (`msisdn`),
  CONSTRAINT `FK_login_channel_id` FOREIGN KEY (`login_channel_id`) REFERENCES `sn_lt_communication_channels` (`communication_channel_id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `FK_login_msisdn` FOREIGN KEY (`msisdn`) REFERENCES `sn_user` (`msisdn`),
  CONSTRAINT `FK_sn_login_date_id` FOREIGN KEY (`login_date_id`) REFERENCES `sn_lt_time_dimension` (`time_dimension_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The time dimension table:

DROP TABLE IF EXISTS `pentaho`.`sn_lt_time_dimension`;
CREATE TABLE  `pentaho`.`sn_lt_time_dimension` (
  `the_year` int(11) DEFAULT NULL,
  `the_week_of_the_year` int(11) DEFAULT NULL,
  `the_month` int(11) DEFAULT NULL,
  `the_day_of_the_year` int(11) DEFAULT NULL,
  `the_day_of_the_week` int(11) DEFAULT NULL,
  `the_day_of_the_month` int(11) DEFAULT NULL,
  `the_month_name` varchar(9) DEFAULT NULL,
  `tenure_years` int(11) DEFAULT NULL,
  `tenure_days` int(11) DEFAULT NULL,
  `time_dimension_id` bigint(20) NOT NULL DEFAULT '0',
  `the_date` date DEFAULT NULL,
  `the_date_as_string` varchar(20) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `the_quarter` int(3) DEFAULT NULL,
  PRIMARY KEY (`time_dimension_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


The channel dimension table:

DROP TABLE IF EXISTS `pentaho`.`sn_lt_communication_channels`;
CREATE TABLE  `pentaho`.`sn_lt_communication_channels` (
  `communication_channel_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `channel_name` varchar(45) NOT NULL,
  `channel_system_name` varchar(45) NOT NULL,
  PRIMARY KEY (`communication_channel_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;

The user dimension table:

DROP TABLE IF EXISTS `pentaho`.`sn_user`;
CREATE TABLE  `pentaho`.`sn_user` (
  `msisdn` bigint(20) NOT NULL DEFAULT '0',
  `sex` char(1) DEFAULT NULL,
  `registration_date_id` bigint(20) NOT NULL DEFAULT '0',
  `country_id` bigint(20) NOT NULL DEFAULT '0',
  `birthday_date_id` bigint(20) NOT NULL,
  `registration_channel_id` bigint(20) NOT NULL,
  PRIMARY KEY (`msisdn`),
  KEY `msisdn` (`msisdn`),
  KEY `FK_birthday_date_id` (`birthday_date_id`),
  KEY `FK_registration_date_id` (`registration_date_id`),
  KEY `FK_country_id_2` (`country_id`),
  KEY `FK_registration_channel_id` (`registration_channel_id`),
  CONSTRAINT `FK_birthday_date_id` FOREIGN KEY (`birthday_date_id`) REFERENCES `sn_lt_time_dimension` (`time_dimension_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `FK_country_id_2` FOREIGN KEY (`country_id`) REFERENCES `sn_lt_countries` (`country_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `FK_registration_channel_id` FOREIGN KEY (`registration_channel_id`) REFERENCES `sn_lt_communication_channels` (`communication_channel_id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `FK_registration_date_id` FOREIGN KEY (`registration_date_id`) REFERENCES `sn_lt_time_dimension` (`time_dimension_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The country dimension table:

DROP TABLE IF EXISTS `pentaho`.`sn_lt_countries`;
CREATE TABLE  `pentaho`.`sn_lt_countries` (
  `country_id` bigint(20) NOT NULL DEFAULT '0',
  `country_name` tinytext,
  `country_number` double DEFAULT NULL,
  `iso_country_code` tinytext,
  PRIMARY KEY (`country_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


We have created an XML schema (cube definition) that looks like this:

<Schema name="Test">
<Dimension type="TimeDimension" usagePrefix="TM_" name="Time Monthly">
   <Hierarchy name="All Times Monthly" hasAll="true" allMemberName="All Times Monthly" allMemberCaption="All Times Monthly" primaryKey="time_dimension_id">
       <Table name="sn_lt_time_dimension">
       </Table>
       <Level name="Year" column="the_year" type="Integer" uniqueMembers="true" levelType="TimeYears" hideMemberIf="Never">
       </Level>
       <Level name="Quarter" column="the_quarter" type="Integer" uniqueMembers="false" levelType="TimeQuarters" hideMemberIf="Never">
       </Level>
       <Level name="Month" column="the_month" nameColumn="the_month_name" type="String" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never">
       </Level>
       <Level name="Day" column="the_day_of_the_month" type="Integer" uniqueMembers="false" levelType="TimeDays" hideMemberIf="Never">
       </Level>
   </Hierarchy>
</Dimension>
<Dimension type="StandardDimension" name="Country">
   <Hierarchy name="All Countries" hasAll="true" allMemberName="All Countries" allMemberCaption="All Countries" primaryKey="country_id">
       <Table name="sn_lt_countries">
       </Table>
       <Level name="Country" column="country_name" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
       </Level>
   </Hierarchy>
</Dimension>
<Dimension type="TimeDimension" usagePrefix="TW_" name="Time Weekly">
   <Hierarchy name="All Times Weekly" hasAll="true" allMemberName="All Times Weekly" allMemberCaption="All Times Weekly" primaryKey="time_dimension_id">
       <Table name="sn_lt_time_dimension">
       </Table>
       <Level name="Year" column="the_year" type="Integer" uniqueMembers="true" levelType="TimeYears" hideMemberIf="Never">
       </Level>
       <Level name="Week" column="the_week_of_the_year" type="Integer" uniqueMembers="false" levelType="TimeWeeks" hideMemberIf="Never">
       </Level>
       <Level name="Day" column="the_day_of_the_week" type="Numeric" uniqueMembers="false" levelType="TimeDays" hideMemberIf="Never">
       </Level>
   </Hierarchy>
</Dimension>
<Cube name="Users" cache="true" enabled="true">
   <Table name="sn_user">
   </Table>
   <Dimension type="StandardDimension" name="Gender">
       <Hierarchy name="All Genders" hasAll="true" allMemberName="All Genders" allMemberCaption="All Genders">
           <Level name="Gender" column="sex" nameColumn="sex" type="String" uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
           </Level>
       </Hierarchy>
   </Dimension>
   <Dimension type="StandardDimension" foreignKey="registration_channel_id" name="Registration Channel">
       <Hierarchy name="All Registration Channels" hasAll="true" allMemberName="All Registration Channels" allMemberCaption="All Registration Channels" primaryKey="communication_channel_id">
           <Table name="sn_lt_communication_channels">
           </Table>
           <Level name="Registration Channel" column="channel_name" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
           </Level>
       </Hierarchy>
   </Dimension>
   <Dimension type="StandardDimension" foreignKey="birthday_date_id" name="Age">
       <Hierarchy name="All Ages" hasAll="true" allMemberName="All Ages" allMemberCaption="All Ages" primaryKey="time_dimension_id">
           <Table name="sn_lt_time_dimension">
           </Table>
           <Level name="Age" column="age" type="Integer" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
           </Level>
       </Hierarchy>
   </Dimension>
   <DimensionUsage source="Time Monthly" name="Registration Time Monthly" caption="Registration Time Monthly" foreignKey="registration_date_id">
   </DimensionUsage>
   <DimensionUsage source="Country" name="Country" caption="Country" foreignKey="country_id">
   </DimensionUsage>
   <DimensionUsage source="Time Weekly" name="Registration Time Weekly" caption="Registration Time Weekly" foreignKey="registration_date_id">
   </DimensionUsage>
   <Measure name="New Registrations" column="msisdn" datatype="Numeric" aggregator="count" visible="true">
   </Measure>
</Cube>

</Schema>

We are going to pay attention to three dimensions:
  • Country: You can see that this dimension is derived by joining three tables: sn_login, sn_user and sn_lt_countries. In fact, our schema is a snowflake schema (see the join sketch after this list). 
  • Login Channel: This is a simple dimension which is derived from just one join: sn_login and sn_lt_communication_channels.
  • Login Time: This dimension references a global (shared) dimension. Within a cube definition, a global dimension can be referenced by several cubes in the same schema. This is very handy, as you don't have to recreate the same dimension over and over again.
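
To make the snowflake explicit, here is a hand-written sketch of the join Mondrian has to resolve for the Country dimension, based on the foreign keys of the tables above (this is for illustration only and is not taken from the Mondrian log; the distinct user count mirrors the measure we will build into the aggregated tables below):

-- Country is only reachable from the login fact table via the user table:
-- sn_login -> sn_user -> sn_lt_countries
select
    `sn_lt_countries`.`country_name` as `country`,
    count(distinct `sn_login`.`msisdn`) as `distinct_users`
from
    `sn_login`,
    `sn_user`,
    `sn_lt_countries`
where
    `sn_login`.`msisdn` = `sn_user`.`msisdn`
    and `sn_user`.`country_id` = `sn_lt_countries`.`country_id`
group by
    `sn_lt_countries`.`country_name`;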

Aggregated table for login channel

We basically want to see how many users log in per week in each of the channels.

The MDX query therefore looks like:

WITH 
MEMBER [Measures].[Prior Period Distinct Logins] AS
(
[Login Time Weekly].CurrentMember.PrevMember,
[Measures].[Distinct Users]
)
MEMBER [Measures].[Change in Distinct Logins] AS
(
[Measures].[Distinct Users] - [Measures].[Prior Period Distinct Logins]
)
MEMBER [Measures].[Percent Change] AS
(
([Measures].[Change in Distinct Logins]) / 
([Measures].[Prior Period Distinct Logins])
)
 
, FORMAT_STRING = Iif(
([Measures].[Percent Change] < 0),
"|#.00%|style='red'|arrow='down'",
"|#.00%|style='green'|arrow='up'"
)
SELECT 
NON EMPTY { 

LastPeriods(
6,
[Login Time Weekly].[{week1_year}].[{week1_weeknr}]
)

} 
ON COLUMNS,
NON EMPTY Union (
Crossjoin (
{[Login Channel].[All Login Channels]},
{[Measures].[Distinct Users],
[Measures].[Percent Change]
}
),
Crossjoin (
{[Login Channel].[All Login Channels].Children},
{[Measures].[Distinct Users]
}
)
)
ON ROWS
from [Logins]

We create an Xaction with a JPivot viewer in the Design Studio, save it into a solution folder and execute it online. Now we open the mondrian_sql.log and see that Mondrian generates a lot of queries. Analyse these queries carefully.

Having looked at the queries in the mondrian_sql.log, I decided to create the following aggregated tables:

-- channel / week -------------------------------

DROP TABLE IF EXISTS `agg_channel_week_sn_login`;

CREATE TABLE `agg_channel_week_sn_login` (
`login_channel_id` BIGINT(20),
`TW_the_year` INTEGER(11),
`the_week_of_the_year` INTEGER(11),
`distinct_users` INTEGER,
`sessions` DOUBLE,
`fact_count` INTEGER,
KEY `agg_channel_week_login_channel_id` (`login_channel_id`),
KEY `agg_channel_week_TW_the_year` (`TW_the_year`),
KEY `agg_channel_week_the_week_of_the_year` (`the_week_of_the_year`)    
);

INSERT INTO `agg_channel_week_sn_login` (
`login_channel_id`,
`TW_the_year`,
`the_week_of_the_year`,
`distinct_users`,
`sessions`,
`fact_count`)
select 
`sn_login`.`login_channel_id` as `login_channel_id`, 
`sn_lt_time_dimension_3`.`the_year` as `TW_the_year`, 
`sn_lt_time_dimension_3`.`the_week_of_the_year` as `the_week_of_the_year`, 
count(distinct `sn_login`.`msisdn`) as `distinct_users`, 
sum(`sn_login`.`sessions`) as `sessions`, 
count(*) as `fact_count`
from 
`sn_login` as `sn_login`, 
`sn_lt_time_dimension` as `sn_lt_time_dimension_3`
where 
`sn_login`.`login_date_id` = `sn_lt_time_dimension_3`.`time_dimension_id`
group by 1,2,3
;

And also a higher-level one:

-- week ---------------------------------

DROP TABLE IF EXISTS `agg_week_sn_login`;

CREATE TABLE `agg_week_sn_login` (
`TW_the_year` INTEGER(11),
`the_week_of_the_year` INTEGER(11),
`distinct_users` INTEGER,
`sessions` DOUBLE,
`fact_count` INTEGER,
KEY `agg_week_TW_the_year` (`TW_the_year`),
KEY `agg_week_the_week_of_the_year` (`the_week_of_the_year`)    
);

INSERT INTO `agg_week_sn_login` (
`TW_the_year`,
`the_week_of_the_year`,
`distinct_users`,
`sessions`,
`fact_count`)
select 
`sn_lt_time_dimension_3`.`the_year` as `TW_the_year`, 
`sn_lt_time_dimension_3`.`the_week_of_the_year` as `the_week_of_the_year`, 
count(distinct `sn_login`.`msisdn`) as `distinct_users`, 
sum(`sn_login`.`sessions`) as `sessions`, 
count(*) as `fact_count`
from 
`sn_login` as `sn_login`, 
`sn_lt_time_dimension` as `sn_lt_time_dimension_3`
where 
`sn_login`.`login_date_id` = `sn_lt_time_dimension_3`.`time_dimension_id`
group by 1,2
;

Pay special attention to the naming:
  • The year column is named TW_the_year. Why the "TW_" prefix, you wonder? Because the year level actually shows up twice in our cube schema: in the Time Weekly and in the Time Monthly dimension. We defined a usagePrefix in the cube schema in order to distinguish between these two occurrences. Here in the table definition you have to use this prefix, otherwise Mondrian is not able to distinguish between the two occurrences of the year level and will throw an error.
  • Every table has a column called fact_count, which is a simple count of the fact table rows behind each aggregate row.
  • Most columns are in lower case; also, have you noticed how our measure "Distinct Users" has turned into "distinct_users"?
  • All columns have the same name as they have in the dimension table (the year column obviously carries the usage prefix).
  • All the table names follow the pattern agg_*_yourfacttablename. This is very important!
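
Before wiring the new tables into the schema it is worth a quick sanity check that the aggregate actually reconciles with the fact table. The sketch below compares the week-level aggregate with the base table; the sessions total and the row count should match, since every login_date_id has a matching row in the time dimension (enforced by the foreign key). distinct_users is left out on purpose, because a distinct count cannot simply be summed up across weeks:

-- Totals from the aggregate: the pre-aggregated sessions and the stored fact_count
select sum(`sessions`) as `total_sessions`, sum(`fact_count`) as `total_rows`
from `agg_week_sn_login`;

-- Totals straight from the fact table; both queries should return the same numbers
select sum(`sessions`) as `total_sessions`, count(*) as `total_rows`
from `sn_login`;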


Now we have to add the following lines to the cube schema (the cube definition) within the <Table> tag of the fact table, in order to tell Mondrian which aggregate tables to use:

<Table name="sn_login">
       <AggName name="agg_channel_week_sn_login">
           <AggFactCount column="fact_count">
           </AggFactCount>
           <AggForeignKey factColumn="login_channel_id" aggColumn="login_channel_id"/>
           <AggMeasure column="distinct_users" name="[Measures].[Distinct Users]">
           </AggMeasure>
           <AggMeasure column="sessions" name="[Measures].[Sessions]">
           </AggMeasure>
           <AggLevel column="TW_the_year" name="[Login Time Weekly.All Times Weekly].[Year]">
           </AggLevel>
           <AggLevel column="the_week_of_the_year" name="[Login Time Weekly.All Times Weekly].[Week]">
           </AggLevel>
       </AggName>
       <AggName name="agg_week_sn_login">
           <AggFactCount column="fact_count">
           </AggFactCount>
           <AggMeasure column="distinct_users" name="[Measures].[Distinct Users]">
           </AggMeasure>
           <AggMeasure column="sessions" name="[Measures].[Sessions]">
           </AggMeasure>
           <AggLevel column="TW_the_year" name="[Login Time Weekly.All Times Weekly].[Year]">
           </AggLevel>
           <AggLevel column="the_week_of_the_year" name="[Login Time Weekly.All Times Weekly].[Week]">
           </AggLevel>
       </AggName>
</Table>
  
As you can see from this schema, we are matching the column names of the aggregated table with the measures and dimensions that we defined in the schema itself. This way Mondrian can identify your columns and link them to a dimension or measure.

An important thing to notice here is that we actually don't have a level login_channel_id in the XML Schema (cube definition), but we can just use the AggForeignKey tag to reference it. Within the AggForeignKey you specify the name of the column in the fact table and the name of the same column in the aggregated table.

Now save everything, export everything to your solution folder on the BI Server, go back to the Pentaho website, refresh the repository, clear the Mondrian cache and try to run the Xaction. In case you get an error message, check the Mondrian log files to figure out what went wrong. It is very easy to make mistakes, especially with the spelling. 
Once you succeed, you will see in the mondrian_sql.log that Mondrian is in fact using your aggregated tables (and you will certainly notice that the execution is way faster).
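
For a query at the week level, the SQL in the log should now read from the aggregate rather than from sn_login, roughly along these lines (this is only an illustration of the shape of such a query; the exact aliases and statement Mondrian generates will differ):

-- Illustrative shape of the rewritten query: no join back to the fact table,
-- the measure comes straight from the pre-aggregated column.
select
    `agg_week_sn_login`.`TW_the_year` as `c0`,
    `agg_week_sn_login`.`the_week_of_the_year` as `c1`,
    sum(`agg_week_sn_login`.`distinct_users`) as `m0`
from
    `agg_week_sn_login` as `agg_week_sn_login`
group by
    `agg_week_sn_login`.`TW_the_year`,
    `agg_week_sn_login`.`the_week_of_the_year`;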

Aggregated table for country

We have the following MDX query, which gives us the Top 3 countries by week:

SELECT 
[Measures].[Distinct Users]
ON COLUMNS,
Generate (

{LastPeriods(
6,
[Login Time Weekly].[{week1_year}].[{week1_weeknr}]
)}
,
{[Login Time Weekly].CurrentMember}*
TopCount(
[Country.All ISO Country Codes].Children,
3,

[Measures].[Distinct Users]

),
ALL
)
ON ROWS
FROM
[Logins]

Again, we create an Xaction, execute it on the website, and check the mondrian_sql.log. Then we decide to create the following table:

-- country / week -------------------------------

DROP TABLE IF EXISTS `agg_country_week_sn_login`;

CREATE TABLE `agg_country_week_sn_login` (
`iso_country_code` VARCHAR(100),
`TW_the_year` INTEGER(11),
`the_week_of_the_year` INTEGER(11),
`distinct_users` INTEGER,
`sessions` DOUBLE,
`fact_count` INTEGER,
KEY `agg_country_week_iso_country_code` (`iso_country_code`),
KEY `agg_country_week_TW_the_year` (`TW_the_year`),
KEY `agg_country_week_the_week_of_the_year` (`the_week_of_the_year`)    
);
 
 INSERT INTO `agg_country_week_sn_login` (
`iso_country_code`,
`TW_the_year`,
`the_week_of_the_year`,
`distinct_users`,
`sessions`,
`fact_count`)
select 
`sn_lt_countries`.`iso_country_code` as `iso_country_code`,
`sn_lt_time_dimension_3`.`the_year` as `TW_the_year`, 
`sn_lt_time_dimension_3`.`the_week_of_the_year` as `the_week_of_the_year`, 
count(distinct `sn_login`.`msisdn`) as `distinct_users`, 
sum(`sn_login`.`sessions`) as `sessions`, 
count(*) as `fact_count`
from 
`sn_login` as `sn_login`,
`sn_user` as `sn_user`,
`sn_lt_time_dimension` as `sn_lt_time_dimension_3`,
`sn_lt_countries` as `sn_lt_countries`
where
`sn_login`.`msisdn`=`sn_user`.`msisdn` AND
`sn_login`.`login_date_id` = `sn_lt_time_dimension_3`.`time_dimension_id` AND
`sn_lt_countries`.`country_id` = `sn_user`.`country_id`
group by 1,2,3
;

As you can see, this is a bit more complex, as we have to join four tables in total, and the iso_country_code itself is derived by joining three of them (sn_login, sn_user and sn_lt_countries).

We add the following <AggName> element to the cube definition (XML schema file), so that the full <Table> element now looks like this:

<Table name="sn_login">
       <AggName name="agg_channel_week_sn_login">
           <AggFactCount column="fact_count">
           </AggFactCount>
           <AggForeignKey factColumn="login_channel_id" aggColumn="login_channel_id"/>
           <AggMeasure column="distinct_users" name="[Measures].[Distinct Users]">
           </AggMeasure>
           <AggMeasure column="sessions" name="[Measures].[Sessions]">
           </AggMeasure>
           <AggLevel column="TW_the_year" name="[Login Time Weekly.All Times Weekly].[Year]">
           </AggLevel>
           <AggLevel column="the_week_of_the_year" name="[Login Time Weekly.All Times Weekly].[Week]">
           </AggLevel>
       </AggName>
       <AggName name="agg_week_sn_login">
           <AggFactCount column="fact_count">
           </AggFactCount>
           <AggMeasure column="distinct_users" name="[Measures].[Distinct Users]">
           </AggMeasure>
           <AggMeasure column="sessions" name="[Measures].[Sessions]">
           </AggMeasure>
           <AggLevel column="TW_the_year" name="[Login Time Weekly.All Times Weekly].[Year]">
           </AggLevel>
           <AggLevel column="the_week_of_the_year" name="[Login Time Weekly.All Times Weekly].[Week]">
           </AggLevel>
       </AggName>

        <AggName name="agg_country_week_sn_login">
           <AggFactCount column="fact_count">
           </AggFactCount>
           <AggMeasure column="distinct_users" name="[Measures].[Distinct Users]">
           </AggMeasure>
           <AggMeasure column="sessions" name="[Measures].[Sessions]">
           </AggMeasure>
           <AggLevel column="TW_the_year" name="[Login Time Weekly.All Times Weekly].[Year]">
           </AggLevel>
           <AggLevel column="the_week_of_the_year" name="[Login Time Weekly.All Times Weekly].[Week]">
           </AggLevel>
           <AggLevel column="iso_country_code" name="[Country.All ISO Country Codes].[ISO Country Code]">
           </AggLevel>
       </AggName>

</Table>

You can see that in this case we don't make use of an ID, as the country ID is not directly stored in our fact table. We have again collapsed the time dimension to the week level. 
We save everything, export it to the BI Server and run our Xaction. If we get an error, we check the log; if not, we are pleased that we finally got everything running!

Make use of aggregates in other programs

Other programs or packages usually don't take advantage of Mondrian aggregated tables by default. They usually provide a mondrian.properties file where you can configure the exact settings. Just open the properties file and enable the following properties:

# Allow the use of aggregates
mondrian.rolap.aggregates.Use=true
mondrian.rolap.aggregates.Read=true

Please find below a reference for some of the programs so that you can easily find the mondrian.properties file:
  • Pentaho Data Integration (Kettle): data-integration/libext/mondrian/config/mondrian.properties
  • Pentaho Report Designer: report-designer/resources/mondrian.properties
  • Schema Workbench: schema-workbench/mondrian.properties

This article is only intended as a brief introduction. In no way do I claim that this is the best way to create aggregated tables; it should only give you an idea of how it works, and you will have to find the best way for your own solution. I found no tutorial on the web on how to do this and it took me a lot of time to figure out how to get everything working, so I hope that with the help of this article you won't go through the same trouble. Again, creating everything manually is not easy, so always check your spelling etc.! Also read the excellent documentation by Julian Hyde, which you can find here.

