Community Server

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Sunday, 13 June 2010

Posted on 04:45 by Unknown
PDI: Clone your data in Kettle! (Video Tutorial)

In this tutorial we look at the cloning feature of Pentaho Data Integration/Kettle and use the power of the JavaScript step. Please watch the video here:


Part 1:



Part 2:

Part 3:

The files for this tutorial: Download the budget data here and the PDI/Kettle file here.


You can find the video's also on YouTube:


Video Part 1: click here
Video Part 2: click here
Video Part 3: click here





Please find below some more information:

Code for 1st JavaScript step:

function daysInMonth(iMonth, iYear)
{
iMonth=iMonth-1;
return 32 - new Date(iYear, iMonth, 32).getDate();
}

var days_in_month=daysInMonth(month,year);
var days_loop_nr=days_in_month-1;

var year=2010;

var bdgt_unsubs=bdgt_sub_base_start+bdgt_new_subs-bdgt_sub_base_end;




I found a nifty function (”daysInMonth”) on the internet that calculates the days of a particular month (we are neither JavaScript heros nor do we have time). This function provides us with the required number of days in a month, which we assign to the days_in_month variable. Important thing to remember is that we already have one row of data for each month, hence we have to deduct 1 from the days_in_month. We store this value in the “days_loop_nr” variable which we will use for the clone step.

We also add a default year and calculate how many users unsubscribed from our service (”bdgt_unsubs”). We need this figure as we want to calculate later on a daily figure for the subscription base (users subscribe and unsubscribe all the time).



Code for the 2nd JavaScript step:

var day_of_month;
// used to hold value of previous row:
var year_month_old;
var bdgt_sub_base_start_old;
var bdgt_new_subs_daily_old;
var bdgt_unsubs_daily_old;


//define day of month
var year_month=(year+"")+(month+"");

if(year_month!=year_month_old){
day_of_month=1;
}
else {
day_of_month=day_of_month+1;
}


//calculate the daily sub base at the start of the day
if(month==1 && day_of_month==1){
var bdgt_sub_base_start_daily=bdgt_sub_base_start;
}
else {
var bdgt_sub_base_start_daily=bdgt_sub_base_start_daily_old+bdgt_new_subs_daily_old-bdgt_unsubs_daily_old;
}


//keep value of current fields for next iteration
//assign them to a new variable so that they are available in the next iteration
//it is important that you only save the new value of year_month
//here as you want to work with the old value in the if statement
//above
year_month_old=year_month;
bdgt_sub_base_start_daily_old=bdgt_sub_base_start_daily;
bdgt_new_subs_daily_old=bdgt_new_subs_daily;
bdgt_unsubs_daily_old=bdgt_unsubs_daily;





We declare some variables first. Then we create the day number for each row: We use an if condition therefore. We have to start with 1 when a new month begins and then increase the number by 1.

It is important to understand the condition:
if(year_month!=year_month_old)

For the very first row of our dataset, the variable year_month_old won’t have a value. Hence, if(year_month!=year_month_old) will be true.

At the very end of the script we create a new variable call year_month_old which stores the current value of year_month for the next iteration. We only create this variable at the end of the script, so that we can use it in the next iteration for the condition before we overwrite it again.

A variable will be only overwritten if a new value is available.

So imagine we are in the second row of our dataset, the if(year_month!=year_month_old) condition will look like this: if(201001!=201001), hence the script will increase the value of the day by 1.  Once the month changes (i.e. if(201002!=201001)) then we have to start with 1 again.

That’s quite a nice solution. Again the javascript code might not be the best one, but it works. Please feel free to improve it.

We also have figures that we cannot just divide by the number of days in a month. The subscription base has to be calculated in a more complex way (well, not that complex). The calculation uses the same concept as above: saving the field value of the previous row in a new variable so that you can use it again in the current row.








Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Pentaho Kettle Parameters and Variables: Tips and Tricks
    Pentaho Kettle Parameters and Variables: Tips and Tricks This blog post is not intended to be a formal introduction to using parameters and ...
  • Using Parameters in Pentaho Report Designer
    Using Parameters in Pentaho Report Designer Introduction How to define a parameter Additional info about the new parameter dialog Hidden Par...
  • Pentaho Data Integration: Scheduling and command line arguments
    Pentaho Data Integration (Kettle): Command line arguments and scheduling Tutorial Details Software: PDI/Kettle 4.1 (download here ), MySQL S...
  • Jaspersoft iReport: How to pass a parameter to a sub-dataset
    Jaspersoft iReport: How to pass a parameter to a sub-dataset Let’s say our main report is grouped by continent and the details band holds sa...
  • Using regular expressions with Pentah...
    Using regular expressions with Pentaho Data Integration (Kettle) There are quite some transformations steps that allow you to work with regu...
  • Pentaho Data Integration and Infinidb Series: Bulk Upload
    Pentaho Data Integration and InfiniDB Series: Bulk Upload Introduction Prepare Tables Using mainly Kettle steps Check if file exists Setup I...
  • Pentaho Data Integration: Remote execution with Carte
    Pentaho Data Integration: Remote execution with Carte Tutorial Details Software: PDI/Kettle 4.1 (download  here ), installed on your PC and ...
  • How to create a loop in Pentaho Kettle
    I finished my first ever video tutorial! This video will demonstrate you how easy it is to create a loop in Pentaho Kettle. Enjoy!
  • Understanding the Pentaho Kettle Dimension Insert/Update Step Null Value Behaviour
    We will be using a very simple sample transformation to test the null value behaviour: We use the Data Grid step to provide some sample dat...
  • PDI: Full Outer Joins
    Pentaho Data Integration: Full Outer Joins Creating full outer joins in Pentaho Data Integartion (PDI aka Kettle) is a fairly straight forwa...

Categories

  • "Bulk Loader"
  • "Bulk Loading"
  • "Hadoop"
  • "Kettle"
  • "Pentaho Book"
  • "Pentaho Data Integration"
  • "Pentaho Kettle"
  • "Pentaho Report Designer MDX MQL JDBC Parameters How To"
  • "Pentaho Report Designer MDX Parameters"
  • "Pentaho Report Designer MQL Parameters"
  • "Pentaho Report Designer Parmaters"
  • "Pentaho Report Designer"
  • "Pentaho Reporting 3.5 for Java Developers"
  • "Pentaho Reporting Book"
  • "Routing"
  • "Schema Workbench"
  • "Testing"
  • "Unicode"
  • "Unit testing"
  • "UTF8"
  • Agile development
  • automated testing
  • Big Data
  • Book Review
  • C-Tools
  • CBF
  • Clustered transformation
  • Command Line Arguments
  • Community Build Framework
  • D3JS
  • Dashboarding
  • Data Integration
  • Data Warehouse
  • Database Change Management
  • Database Version Control
  • Date Dimension
  • DBFit
  • ETL
  • ETLFit
  • Federated database
  • Google Charts
  • Google Visualization API
  • Hadoop
  • HTML5
  • iReport
  • JasperReports
  • JasperSoft
  • JasperStudio
  • Kettle
  • Kimball
  • Loop
  • Master data management
  • Metadata
  • Metedata editor
  • Mondrian
  • multidimensional modeling
  • OLAP
  • Open Source
  • Parameter
  • Parameters
  • Pentaho
  • Pentaho BI Server
  • Pentaho Data Integration
  • Pentaho Data Integration 4 Cookbook
  • Pentaho Kettle
  • Pentaho Metadata Editor Tutorial
  • Pentaho Report Designer
  • PostgreSQL
  • PRD
  • Report Layout
  • REST
  • Routing
  • Saiku
  • Scheduling
  • Slowly Changing Dimension
  • Sqitch
  • SVG
  • Talend
  • Talend MDM
  • Talend Open Studio
  • Tutorial
  • Variable
  • Web service
  • Xactions

Blog Archive

  • ►  2013 (24)
    • ►  December (2)
    • ►  November (3)
    • ►  October (2)
    • ►  September (1)
    • ►  August (3)
    • ►  July (2)
    • ►  June (1)
    • ►  May (2)
    • ►  April (1)
    • ►  March (3)
    • ►  February (1)
    • ►  January (3)
  • ►  2012 (20)
    • ►  November (3)
    • ►  October (3)
    • ►  August (1)
    • ►  June (1)
    • ►  April (1)
    • ►  March (3)
    • ►  February (5)
    • ►  January (3)
  • ►  2011 (19)
    • ►  November (3)
    • ►  July (2)
    • ►  June (1)
    • ►  May (4)
    • ►  April (2)
    • ►  March (1)
    • ►  February (3)
    • ►  January (3)
  • ▼  2010 (17)
    • ►  December (1)
    • ►  November (6)
    • ►  September (1)
    • ►  August (1)
    • ▼  June (2)
      • PDI: Clone your data in Kettle! (Video Tutorial)In...
      • PDI: Full Outer Joins
    • ►  May (1)
    • ►  April (3)
    • ►  February (1)
    • ►  January (1)
  • ►  2009 (18)
    • ►  December (3)
    • ►  November (1)
    • ►  October (5)
    • ►  September (7)
    • ►  July (2)
Powered by Blogger.

About Me

Unknown
View my complete profile