Friday, 4 February 2011

Pentaho Metadata Editor: Joining two fact tables

Posted on 14:14 by Unknown

This tutorial was possible with the help of Will Gorman, Vice President of Engineering at Pentaho

The data
Metadata Model
My Suggestions



Tutorial Details

  • Software: PME 3.7 (download here), a database of your liking, for example MySQL
  • Knowledge: Intermediate (to follow this tutorial you should have good knowledge of the software, hence not every single step will be described)
  • Tutorial files can be downloaded here


The Pentaho Metadata layer is amazing: it allows end users to build reports without writing a single line of SQL. Database and/or table changes can easily be accommodated by changing only the references in the metadata model, so reports, dashboards, etc. stay unaffected.

While it is fairly easy to set up a metadata model with one fact table (and some dimensions), using more than one fact table (which might also have different granularity) poses a challenge for many users. Is it really possible to do this in PME (Pentaho Metadata Editor)? The answer is yes, although in the current version I don't consider it an ideal solution, and I hope that in the future it will be possible to design this visually in the relationship diagram.

In this session I'd like to discuss the approach of integrating more than one fact table into your metadata model. 

The data


We will base our example on this simple schema:

CREATE SCHEMA metadatatest;

CREATE TABLE metadatatest.fact_sales (
    date_id INT(255),
    location VARCHAR(70),
    sales INT(255)
);

INSERT INTO metadatatest.fact_sales VALUES
    (1, "London", 34),
    (1, "Manchester", 44),
    (2, "London", 24),
    (2, "Manchester", 35),
    (2, "Bristol", 34),
    (3, "London", 53);

CREATE TABLE metadatatest.fact_stock (
    date_id INT(255),
    location VARCHAR(70),
    aisle VARCHAR(70),
    stock INT(255)
);

INSERT INTO metadatatest.fact_stock VALUES
    (1, "London", "aisle 1", 34),
    (1, "London", "aisle 2", 44),
    (1, "London", "aisle 3", 234),
    (1, "Manchester", "aisle 1", 44),
    (1, "Manchester", "aisle 2", 344),
    (2, "London", "aisle 1", 24),
    (2, "London", "aisle 2", 25),
    (2, "Manchester", "aisle 1", 35),
    (2, "Bristol", "aisle 1", 34),
    (2, "Bristol", "aisle 2", 324),
    (3, "London", "aisle 1", 53);

CREATE TABLE metadatatest.dim_date (
    date_id INT(255),
    date DATE
);

INSERT INTO metadatatest.dim_date VALUES
    (1, "2011-01-01"),
    (2, "2011-01-02"),
    (3, "2011-01-03");

What we want to achieve is basically to get the same result out of PME as with the following summary query:

SELECT
    *
FROM
    (
        SELECT
            date,
            SUM(sales) AS sales
        FROM metadatatest.fact_sales s
        INNER JOIN metadatatest.dim_date d
            ON s.date_id = d.date_id
        GROUP BY 1
    ) a
INNER JOIN
    (
        SELECT
            date,
            SUM(stock) AS stock
        FROM metadatatest.fact_stock s
        INNER JOIN metadatatest.dim_date d
            ON s.date_id = d.date_id
        GROUP BY 1
    ) b
    ON a.date = b.date
;

Open your favourite SQL client and run these queries.
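If you don't have a MySQL instance to hand, the same numbers can be reproduced with Python's built-in sqlite3 module. The sketch below mirrors the schema above (with simplified SQLite column types and no `metadatatest` schema prefix) and runs the summary query:

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema above
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE fact_sales (date_id INTEGER, location TEXT, sales INTEGER);
    CREATE TABLE fact_stock (date_id INTEGER, location TEXT, aisle TEXT, stock INTEGER);
    CREATE TABLE dim_date  (date_id INTEGER, date TEXT);
""")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, "London", 34), (1, "Manchester", 44), (2, "London", 24),
    (2, "Manchester", 35), (2, "Bristol", 34), (3, "London", 53),
])
cur.executemany("INSERT INTO fact_stock VALUES (?, ?, ?, ?)", [
    (1, "London", "aisle 1", 34), (1, "London", "aisle 2", 44),
    (1, "London", "aisle 3", 234), (1, "Manchester", "aisle 1", 44),
    (1, "Manchester", "aisle 2", 344), (2, "London", "aisle 1", 24),
    (2, "London", "aisle 2", 25), (2, "Manchester", "aisle 1", 35),
    (2, "Bristol", "aisle 1", 34), (2, "Bristol", "aisle 2", 324),
    (3, "London", "aisle 1", 53),
])
cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2011-01-01"), (2, "2011-01-02"), (3, "2011-01-03")])

# The summary query from above: one aggregated sub-select per fact table,
# joined on date
result = cur.execute("""
    SELECT a.date, a.sales, b.stock
    FROM (SELECT date, SUM(sales) AS sales
          FROM fact_sales s INNER JOIN dim_date d ON s.date_id = d.date_id
          GROUP BY 1) a
    INNER JOIN (SELECT date, SUM(stock) AS stock
                FROM fact_stock s INNER JOIN dim_date d ON s.date_id = d.date_id
                GROUP BY 1) b
        ON a.date = b.date
    ORDER BY a.date
""").fetchall()
```

With the sample rows, this returns sales 78 and stock 700 for 2011-01-01, 93 and 442 for 2011-01-02, and 53 and 53 for 2011-01-03.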

The aggregation query returns one row per date: sales 78 and stock 700 for 2011-01-01, sales 93 and stock 442 for 2011-01-02, and sales 53 and stock 53 for 2011-01-03.

Metadata Model

So let's start creating our Metadata model: One thing to keep in mind is that our two fact tables have different granularity. In PME you cannot just import them as physical tables, then place them both in a business model and create a relationship between them: Such a model will not return correct aggregations.
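To make the problem concrete, here is a minimal sketch (Python with sqlite3, using the same sample rows as the schema above): a plain table-to-table relationship between the two fact tables repeats every sales row once per matching aisle row, so SUM(sales) "fans out" and is inflated.

```python
import sqlite3

# Sample data from the tutorial schema, loaded into in-memory SQLite
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE fact_sales (date_id INTEGER, location TEXT, sales INTEGER);
    CREATE TABLE fact_stock (date_id INTEGER, location TEXT, aisle TEXT, stock INTEGER);
""")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, "London", 34), (1, "Manchester", 44), (2, "London", 24),
    (2, "Manchester", 35), (2, "Bristol", 34), (3, "London", 53),
])
cur.executemany("INSERT INTO fact_stock VALUES (?, ?, ?, ?)", [
    (1, "London", "aisle 1", 34), (1, "London", "aisle 2", 44),
    (1, "London", "aisle 3", 234), (1, "Manchester", "aisle 1", 44),
    (1, "Manchester", "aisle 2", 344), (2, "London", "aisle 1", 24),
    (2, "London", "aisle 2", 25), (2, "Manchester", "aisle 1", 35),
    (2, "Bristol", "aisle 1", 34), (2, "Bristol", "aisle 2", 324),
    (3, "London", "aisle 1", 53),
])

# Correct total: aggregate the sales fact table on its own
correct_total = cur.execute("SELECT SUM(sales) FROM fact_sales").fetchone()[0]

# Naive total: join the two fact tables directly, as a plain relationship
# would. Each sales row is duplicated once per matching aisle row in
# fact_stock, so the sum fans out.
naive_total = cur.execute("""
    SELECT SUM(sales)
    FROM fact_sales s
    INNER JOIN fact_stock t
        ON s.date_id = t.date_id AND s.location = t.location
""").fetchone()[0]
```

With the sample rows above, the correct sales total is 224, while the naive join reports 394.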

The critical element: you basically have to specify the "physical tables" as joins of your star schema. What I mean by this is that you write a query that joins your fact table with all the dimension tables. This query returns a result set which can act as a "physical table" in PME (although the wording "physical table" doesn't really fit in this context; it is more a derived table, or better said, a view).

In our case we will have two derived tables, and these two derived tables must have the same granularity:

  • Sales Table: a join of the fact_sales table with the date dimension, aggregated by date and location
  • Stock Table: a join of the fact_stock table with the date dimension, aggregated by date and location
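The grain of these two derived tables can be checked outside PME with the same sqlite3 sketch (sample data as in the schema above): both aggregations come out keyed by the identical set of (date, location) combinations.

```python
import sqlite3

# Sample data from the tutorial schema, loaded into in-memory SQLite
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE fact_sales (date_id INTEGER, location TEXT, sales INTEGER);
    CREATE TABLE fact_stock (date_id INTEGER, location TEXT, aisle TEXT, stock INTEGER);
    CREATE TABLE dim_date  (date_id INTEGER, date TEXT);
""")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, "London", 34), (1, "Manchester", 44), (2, "London", 24),
    (2, "Manchester", 35), (2, "Bristol", 34), (3, "London", 53),
])
cur.executemany("INSERT INTO fact_stock VALUES (?, ?, ?, ?)", [
    (1, "London", "aisle 1", 34), (1, "London", "aisle 2", 44),
    (1, "London", "aisle 3", 234), (1, "Manchester", "aisle 1", 44),
    (1, "Manchester", "aisle 2", 344), (2, "London", "aisle 1", 24),
    (2, "London", "aisle 2", 25), (2, "Manchester", "aisle 1", 35),
    (2, "Bristol", "aisle 1", 34), (2, "Bristol", "aisle 2", 324),
    (3, "London", "aisle 1", 53),
])
cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2011-01-01"), (2, "2011-01-02"), (3, "2011-01-03")])

# Derived Sales Table: fact_sales joined to the date dimension,
# aggregated by date and location
sales_rows = cur.execute("""
    SELECT d.date, s.location, SUM(s.sales) AS sales
    FROM fact_sales s INNER JOIN dim_date d ON s.date_id = d.date_id
    GROUP BY 1, 2
""").fetchall()

# Derived Stock Table: same grain, aisle detail aggregated away
stock_rows = cur.execute("""
    SELECT d.date, s.location, SUM(s.stock) AS stock
    FROM fact_stock s INNER JOIN dim_date d ON s.date_id = d.date_id
    GROUP BY 1, 2
""").fetchall()

# Both result sets are keyed by (date, location): identical grain
sales_grain = {(date, loc) for date, loc, _ in sales_rows}
stock_grain = {(date, loc) for date, loc, _ in stock_rows}
```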
  

  1. Open PME and create a new domain.
  2. Set up your database connection to the schema we just created.
  3. Reference our two fact tables.
  4. Double click on the fact_sales table. The properties window will be shown.
  5. Click on the green plus icon in the Properties section and add the Target Table property. Click OK.
  6. Fill out the Target Table field with this query:
     
    SELECT date date_1, location location_1, SUM(sales) AS sales FROM metadatatest.fact_sales s INNER JOIN metadatatest.dim_date d ON s.date_id=d.date_id GROUP BY 1, 2
    Note that we intentionally gave the date and location columns different names so that they are easier to distinguish from the other fact table's fields later on.
  7. Click OK to close the Properties window.
  8. Double click on the fact_stock table. The properties dialog will be shown.
  9. Click on the green plus icon and add the Target Table property. Click OK.
  10. Fill out the Target Table field with this query:
    SELECT date, location, SUM(stock) AS stock FROM metadatatest.fact_stock s INNER JOIN metadatatest.dim_date d ON s.date_id=d.date_id GROUP BY 1, 2
  11. Click OK to close the Properties window.
  12. Create a business model and drag and drop the physical tables into the Business Tables folder. Your tables should now look like this (as you can see, they now have the same granularity):

  13. Next create the relationship: Drag and drop the business tables onto the canvas, hold down the CTRL key, mark the first table and then the second one and right click. Choose Add Relationship ...
  14. Set Relationship to N:N and Join type to Inner. 
  15. Tick Complex Join? and copy the following expression into the Complex Join Expression box: 
    AND([BT_FACT_SALES_FACT_SALES.BC_FACT_SALES_DATE_1]=[BT_FACT_STOCK_FACT_STOCK.BC_FACT_STOCK_DATE]; [BT_FACT_SALES_FACT_SALES.BC_FACT_SALES_LOCATION_1]=[BT_FACT_STOCK_FACT_STOCK.BC_FACT_STOCK_LOCATION])


  16. Click OK.
  17. Save the model. Now we are ready to test if it is working properly.
  18. Click on the Query Editor icon. Select Date, Sales and Stock as columns, then press the Play button. Examine the result: it returns exactly the same values as the SQL query we ran at the beginning of the tutorial. If you are curious, you can also click on the SQL icon to see which SQL query PME created under the hood.
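The logic the model executes can also be emulated outside PME. The sketch below (Python sqlite3, sample data as above) is a sketch of the equivalent SQL, not the literal statement PME emits, and the date_1/location_1 aliases from step 6 are omitted for brevity: it joins the two derived tables on date and location, as the complex join specifies, then aggregates to date level, reproducing the summary query from the start of the tutorial.

```python
import sqlite3

# Sample data from the tutorial schema, loaded into in-memory SQLite
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE fact_sales (date_id INTEGER, location TEXT, sales INTEGER);
    CREATE TABLE fact_stock (date_id INTEGER, location TEXT, aisle TEXT, stock INTEGER);
    CREATE TABLE dim_date  (date_id INTEGER, date TEXT);
""")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, "London", 34), (1, "Manchester", 44), (2, "London", 24),
    (2, "Manchester", 35), (2, "Bristol", 34), (3, "London", 53),
])
cur.executemany("INSERT INTO fact_stock VALUES (?, ?, ?, ?)", [
    (1, "London", "aisle 1", 34), (1, "London", "aisle 2", 44),
    (1, "London", "aisle 3", 234), (1, "Manchester", "aisle 1", 44),
    (1, "Manchester", "aisle 2", 344), (2, "London", "aisle 1", 24),
    (2, "London", "aisle 2", 25), (2, "Manchester", "aisle 1", 35),
    (2, "Bristol", "aisle 1", 34), (2, "Bristol", "aisle 2", 324),
    (3, "London", "aisle 1", 53),
])
cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2011-01-01"), (2, "2011-01-02"), (3, "2011-01-03")])

# The two derived tables (steps 6 and 10), joined on date AND location
# (step 15's complex join), then rolled up to date level
model_result = cur.execute("""
    SELECT a.date, SUM(a.sales), SUM(b.stock)
    FROM (SELECT d.date, s.location, SUM(s.sales) AS sales
          FROM fact_sales s INNER JOIN dim_date d ON s.date_id = d.date_id
          GROUP BY 1, 2) a
    INNER JOIN (SELECT d.date, s.location, SUM(s.stock) AS stock
                FROM fact_stock s INNER JOIN dim_date d ON s.date_id = d.date_id
                GROUP BY 1, 2) b
        ON a.date = b.date AND a.location = b.location
    GROUP BY 1
    ORDER BY 1
""").fetchall()
```

The date-level result matches the summary query: 78/700, 93/442 and 53/53 for the three dates.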


My Suggestions

It is important to provide feedback to the developers of great products like PME. I can see in my daily work what a big impact this product makes: end users can finally create reports to their liking without running to a report designer and waiting for the result to be delivered. And when you create a lot of reports, you don't have to worry about a database change forcing you to update the references in every single report. From my point of view, a metadata model is a key element in a BI system.

At the beginning of this article I mentioned that I don't think the current way of setting up this solution is ideal. The reason is that it breaks the concept: normally you define the relationships between tables in a business model and the SQL is created automatically. When dealing with more than one fact table (with different granularity), you have to take a completely different approach and write a SQL query yourself, joining your fact table with all the dimension tables, and only then can you create the relationship diagram.
It's not a problem of "Oh, I have to write a query"; it is more a problem of a consistent approach. I'd really welcome it if we could design the complete model in the relationship diagram (a consistent approach) instead of working around the shortcomings by providing SQL queries. I think the user experience would be much better.
I set up a JIRA case on the Pentaho website; please show your support by voting for it, and we might see this feature implemented soon!
At the end of this session I'd like to thank Will for his great support!