Community Server

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Monday, 20 February 2012

Talend Open Studio: Populating a date dimension

Posted on 14:04 by Unknown

Populating a date dimension

Date dimensions are an essential part of a data warehouse. Usually they are only populated once. Scripts can be created on the database side (as outlined here), but if you are working on various projects involving a variety of databases, it is more efficient to create just one ETL job which can be used to populate any database.

In this tutorial we will have a look at creating such an ETL job with Talend Open Studio for Data Integration. We will create a basic date dimension which you can then extend even further. In order to follow this tutorial, the reader should be familiar with the basic functionality of Talend Open Studio.

Our date dimension will look like this one (partial screenshot):


The primary key of the dimension will be an integer representation of the date, which saves us the hassle of looking up the key when we transform the fact data.

Open Talend Open Studio for Data Integration and create a new job called populate_date_dimension. First we will define a variable called start date, because we will use this job in various projects and we might require a different start date each time:

Click on the Context tab and then on the + button to add a new context variable. Give it the name myStartDate of type Date and define a value for it.


Next add a tRowGenerator component to the design area and double click on it to activate the settings dialog. The idea is to create X amount of rows: The first row will hold our start date and each subsequent row will increment the date by one day.


  1. Click the + button to add a new column. Name it date and set the type to Date.
  2. Click in the Environment variables cell on the right hand side and then you will see the parameters displayed in the Function parameters tab on the bottom left hand side.
  3. Define the number of rows that should be generated in Number of Rows for RowGenerator.
  4. In the Function parameters tab set the date parameter value to context.myStartDate. This will ensure that the context variable which we defined earlier will be used.
  5. Set the nb parameter to Numeric.sequence(“s1”, 1, 1) - 1. Use the expression builder for a more convenient setup. This will create a sequence which we will use to add days to our start date. The reason why we subtract 1 at the end is because we want to keep our start date.
  6. Set the dateType parameter value to “dd”. This ensures that days will be added to our date.
  7. Click on the Preview tab and check if the result set looks as expected.
  8. Click Ok to close the component settings.


Now add a tMap component and create a row from the tRowGenerator to the tMap component. Double click the tMap component:




  1. Click the + button on the right hand side to create a new output table.
  2. Add new columns to the output table and for each of them define a specific date format using this approach: Integer.parseInt(TalendDate.formatDate("yyyyMMdd",row1.date)) for integer values and TalendDate.formatDate("MM",row1.date) for string values. Have a look at the Java SimpleDateFormat specs to get an understanding of all the formatting options. You will spend now some time setting all the various date formats up.
  3. Java SimpleDateFormat doesn’t provide a quarter format, hence we have to create our own in the form of a ceiled devision / covered quotient: (Integer.parseInt(TalendDate.formatDate("M",row1.date))+3-1) /  3   
  4. Click Ok.


Add a database output component of your choice (in my case I used one for PostgreSQL) and create a row from the tMap to the database output component. Double click the database output component and provide all the necessary settings. That’s it: Now you can run the job and examine the data in your table.

Email ThisBlogThis!Share to XShare to Facebook
Posted in Data Warehouse, Date Dimension, ETL, Kimball, Talend Open Studio | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Pentaho Kettle Parameters and Variables: Tips and Tricks
    Pentaho Kettle Parameters and Variables: Tips and Tricks This blog post is not intended to be a formal introduction to using parameters and ...
  • Using Parameters in Pentaho Report Designer
    Using Parameters in Pentaho Report Designer Introduction How to define a parameter Additional info about the new parameter dialog Hidden Par...
  • Pentaho Data Integration: Scheduling and command line arguments
    Pentaho Data Integration (Kettle): Command line arguments and scheduling Tutorial Details Software: PDI/Kettle 4.1 (download here ), MySQL S...
  • Jaspersoft iReport: How to pass a parameter to a sub-dataset
    Jaspersoft iReport: How to pass a parameter to a sub-dataset Let’s say our main report is grouped by continent and the details band holds sa...
  • Using regular expressions with Pentah...
    Using regular expressions with Pentaho Data Integration (Kettle) There are quite some transformations steps that allow you to work with regu...
  • Pentaho Data Integration and Infinidb Series: Bulk Upload
    Pentaho Data Integration and InfiniDB Series: Bulk Upload Introduction Prepare Tables Using mainly Kettle steps Check if file exists Setup I...
  • Pentaho Data Integration: Remote execution with Carte
    Pentaho Data Integration: Remote execution with Carte Tutorial Details Software: PDI/Kettle 4.1 (download  here ), installed on your PC and ...
  • How to create a loop in Pentaho Kettle
    I finished my first ever video tutorial! This video will demonstrate you how easy it is to create a loop in Pentaho Kettle. Enjoy!
  • Metadata Driven ETL and Reporting
    Metadata Driven ETL and Reporting with Pentaho Data Integration and Report Designer Tutorial Details  Software : If PDI Kettle 4.2 GA and PR...
  • Understanding the Pentaho Kettle Dimension Insert/Update Step Null Value Behaviour
    We will be using a very simple sample transformation to test the null value behaviour: We use the Data Grid step to provide some sample dat...

Categories

  • "Bulk Loader"
  • "Bulk Loading"
  • "Hadoop"
  • "Kettle"
  • "Pentaho Book"
  • "Pentaho Data Integration"
  • "Pentaho Kettle"
  • "Pentaho Report Designer MDX MQL JDBC Parameters How To"
  • "Pentaho Report Designer MDX Parameters"
  • "Pentaho Report Designer MQL Parameters"
  • "Pentaho Report Designer Parmaters"
  • "Pentaho Report Designer"
  • "Pentaho Reporting 3.5 for Java Developers"
  • "Pentaho Reporting Book"
  • "Routing"
  • "Schema Workbench"
  • "Testing"
  • "Unicode"
  • "Unit testing"
  • "UTF8"
  • Agile development
  • automated testing
  • Big Data
  • Book Review
  • C-Tools
  • CBF
  • Clustered transformation
  • Command Line Arguments
  • Community Build Framework
  • D3JS
  • Dashboarding
  • Data Integration
  • Data Warehouse
  • Database Change Management
  • Database Version Control
  • Date Dimension
  • DBFit
  • ETL
  • ETLFit
  • Federated database
  • Google Charts
  • Google Visualization API
  • Hadoop
  • HTML5
  • iReport
  • JasperReports
  • JasperSoft
  • JasperStudio
  • Kettle
  • Kimball
  • Loop
  • Master data management
  • Metadata
  • Metedata editor
  • Mondrian
  • multidimensional modeling
  • OLAP
  • Open Source
  • Parameter
  • Parameters
  • Pentaho
  • Pentaho BI Server
  • Pentaho Data Integration
  • Pentaho Data Integration 4 Cookbook
  • Pentaho Kettle
  • Pentaho Metadata Editor Tutorial
  • Pentaho Report Designer
  • PostgreSQL
  • PRD
  • Report Layout
  • REST
  • Routing
  • Saiku
  • Scheduling
  • Slowly Changing Dimension
  • Sqitch
  • SVG
  • Talend
  • Talend MDM
  • Talend Open Studio
  • Tutorial
  • Variable
  • Web service
  • Xactions

Blog Archive

  • ►  2013 (24)
    • ►  December (2)
    • ►  November (3)
    • ►  October (2)
    • ►  September (1)
    • ►  August (3)
    • ►  July (2)
    • ►  June (1)
    • ►  May (2)
    • ►  April (1)
    • ►  March (3)
    • ►  February (1)
    • ►  January (3)
  • ▼  2012 (20)
    • ►  November (3)
    • ►  October (3)
    • ►  August (1)
    • ►  June (1)
    • ►  April (1)
    • ►  March (3)
    • ▼  February (5)
      • Talend Open Studio: How to set up context variables
      • Talend Open Studio: Populating a date dimension
      • Talend Open Studio: Scheduling and command line ex...
      • Talend: Setting up database logging for a project
      • PostgreSQL: Auto generating a sample dataset
    • ►  January (3)
  • ►  2011 (19)
    • ►  November (3)
    • ►  July (2)
    • ►  June (1)
    • ►  May (4)
    • ►  April (2)
    • ►  March (1)
    • ►  February (3)
    • ►  January (3)
  • ►  2010 (17)
    • ►  December (1)
    • ►  November (6)
    • ►  September (1)
    • ►  August (1)
    • ►  June (2)
    • ►  May (1)
    • ►  April (3)
    • ►  February (1)
    • ►  January (1)
  • ►  2009 (18)
    • ►  December (3)
    • ►  November (1)
    • ►  October (5)
    • ►  September (7)
    • ►  July (2)
Powered by Blogger.

About Me

Unknown
View my complete profile