Tuesday, 27 October 2009

Send Pentaho Report By Email


Pentaho Report Designer 3.5 brought along some important improvements. If you set up Xactions that send out reports by email before this release, there are a few changes you should be aware of.

In my case I prepared a report that uses an MDX query as its source and automatically selects the last 5 days of data. The report doesn't take any inputs, so it is fairly easy to integrate into an Xaction.

My idea is to have a simplified version of the report in the email body and the full report, including a chart, as an attachment. For the report that goes into the email body, I have to make sure that the CSS is not kept in an external file. You can do this by going to File/Configuration in Report Designer 3.5 and setting InlineStyle to true under output-table-html.

I set up the following steps in the Xaction (Pentaho Report Designer 3.5):
  • Message template for the email subject. Define an output name.
  • Message template for the file attachment name. Define an output name.
  • Message template for the email text. Define an output name.
  • Pentaho Report XLS: In this step we create the Excel file for the attachment. Tick "Use Content Repository". Define "Report-XLS" as the output-name. Make sure you set the output-format to application/vnd.ms-excel.
  • Pentaho Report HTML: Set the output-format to text/html. Tick "Use Content Repository". We will use this output for the HTML email body. Define "Report-HTML" as the output-name.
  • Now drag and drop both Report-HTML and Report-XLS into the output folder in the report-outputs area. For each of them, add an output destination. At this point it doesn't matter which destination you choose.
  • Now we have to edit the XML directly, so change to the XML view. Look for the outputs that you defined earlier, and if you chose e.g. a file destination beforehand, replace the <file></file> element with <contentrepo></contentrepo> (see the sketch after this list).
  • Now add the email step. Fill out the fields and reference the parameters that we defined earlier where applicable. For example: add "Report-HTML" as a parameter to the HTML message.
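For orientation, here is a rough sketch of how the outputs section could look after the edit. The surrounding element layout follows the usual xaction outputs structure, but treat it as an illustration and keep the names you defined in your own action sequence:

<outputs>
  <Report-HTML type="content">
    <destinations>
      <contentrepo>Report-HTML</contentrepo>
    </destinations>
  </Report-HTML>
  <Report-XLS type="content">
    <destinations>
      <contentrepo>Report-XLS</contentrepo>
    </destinations>
  </Report-XLS>
</outputs>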



Friday, 9 October 2009

Current date function on Mondrian

Pentaho BI Suite - Mondrian

I create quite a lot of operational dashboard views like "How many users did we acquire through which channels in the last 7 days, on a daily level?". In the past my solution was to write some JavaScript within an xaction to define the date variables and then pass these variables on as parameters to an MDX query. The result set of the MDX query would then be passed on to a JPivot component. While you can implement this fairly easily in an xaction, it is probably not the most convenient way to do it.
Ideally your OLAP server would offer this kind of functionality out of the box, and as I only found out recently, Mondrian has actually supported it for a while. Mondrian offers a function called CurrentDateMember() that gets the current date for you and maps it to your time dimension. As not everybody's time dimension is set up the same way, CurrentDateMember() solves this by accepting a formatting string that does the mapping for you.
But it doesn't stop there: in case your time dimension is coarser than daily, you can specify BEFORE, AFTER or EXACT as the last argument of the function and it will match the closest member of your time dimension accordingly.

So at a glance, CurrentDateMember() takes the following arguments:
  1. your time dimension, e.g. [Login Time]
  2. the mapping format, e.g. '["Login Time"]\.[yyyy]\.["Q"q]\.[m]\.[d]', which resolves to [Login Time].[2009].[Q4].[10].[7]
  3. the mapping method (optional): BEFORE, AFTER or EXACT

The mapping format uses the VB format syntax, which MDX uses in general as well. As you can see in the example above, you have to escape dots with a backslash (\.), and double quotes are needed to protect literal text (like the "Q" in ["Q"q]) from the formatting process.
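The examples further down don't use the optional third argument, so here is a minimal sketch of it, reusing the [Login Time] dimension from the argument list above. If the member the format string resolves to doesn't exist (say, today's data hasn't been loaded yet), BEFORE falls back to the closest earlier member:

select {CurrentDateMember([Login Time], '["Login Time"]\.[yyyy]\.["Q"q]\.[m]\.[d]', BEFORE)} ON COLUMNS,
{[Login Channel].[All Login Channels]} ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}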

By using CurrentDateMember(argument1, argument2).Lag(3) you can go 3 siblings back (in our example this would give you the date of 3 days ago). If you want to go 3 days ahead in time, specify Lag(-3).

Please find some working examples below:

Mapping [2009-10-12] date format:

select {CurrentDateMember([Login Date], '[yyyy-mm-dd]').Lag(3.0)} ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}

For some reason, although the documentation clearly states "MM" for a month number with a leading 0, only "mm" will work. It is a good idea to check the mondrian.log in case you experience errors, as you will see there whether the translation worked (e.g. if the Mondrian log shows [2009-MM-09], you know that the month number was not translated).


The above example was for a dimension with only one hierarchy. Below is an example for a dimension with more than one hierarchy:

SELECT
CurrentDateMember([Date].[Date],'[Date]\.[Date]\.[yyyy-mm-dd]').Lag(357.0) ON COLUMNS,
NON EMPTY {Hierarchize({[Measures].[Sales]})} ON ROWS
FROM [Sales]


Mapping [2009].[4].[October].[12] date format:

select {CurrentDateMember([Login Time Monthly], '[yyyy]\.[q]\.[mmmm]\.[d]').Lag(3)} ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}

Mapping [2009].[41].[6] date format:

select {CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]\.[w]').Lag(3)} ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}

So when should you mention the dimension in the formatting string? If you omit it, Mondrian has to search all dimensions for the existence of the various members, which costs processing time. If you have only one date dimension you could theoretically leave the dimension out of the string; if you have more than one, there is no way around it.

Imagine now that we want a monthly and a weekly summary of the last 6 periods. So how do we approach this?

Last 6 weeks:
select {
CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]').Lag(6) :
CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]')
}  ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}

Please keep in mind that here our time dimension looks like this: [year].[week].[day]

Last 6 months:

select {
CurrentDateMember([Login Time Monthly], '["Login Time Monthly"]\.[yyyy]\.[q]\.[mmmm]').Lag(6) :
CurrentDateMember([Login Time Monthly], '["Login Time Monthly"]\.[yyyy]\.[q]\.[mmmm]')
} ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}


Please keep in mind that here our time dimension looks like this: [year].[quarter].[month].[day]


I hope this tutorial showed you the power of CurrentDateMember(). It is a very useful function, especially if you have to do a lot of analysis across time. It took me quite some time to use this function correctly (especially as there are not many examples around), so I hope you can now implement it within 5 minutes.

Related info:
Julian Hyde's Blog: http://julianhyde.blogspot.com/2006/10/mondrian-22-cube-designer-and.html
VB Format Syntax: http://www.apostate.com/vb-format-syntax


Wednesday, 7 October 2009

How to change the Pentaho Login Screen

How to remove the sample users from the login box
Pentaho BI Server - Configuration

Have you ever wondered how to get rid of the sample users in the Pentaho BI Server login box? Tom Barber recently pointed out a simple solution on the Pentaho forum, originally posted by Paul Stoellenberger:


You can edit the login properties of mantle in:
biserver-ce\tomcat\webapps\pentaho\mantleLogin\loginsettings.properties

there is an option:
#showUsersList=true

change that to
showUsersList=false

That's easy enough, if you know it :)


Thursday, 1 October 2009

Converting binary data type to String in Kettle

Sometimes you might come across data that is stored in binary form in a table. To convert this data, choose one of the following approaches:

Convert directly using SQL in the database input step

One quick method would be to use the CONVERT or CAST function (test which one works best for you):


CONVERT(prodCode USING latin1) AS prodCode

CAST(prodCode AS SIGNED) AS prodCode
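For example, inside the query of a "Table input" step it would look like this minimal sketch (prodName and the products table are made up for illustration; the CONVERT syntax shown is MySQL's):

SELECT
CONVERT(prodCode USING latin1) AS prodCode,
prodName
FROM
products
;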

Select Values ... step

Go to the Meta-data tab, choose your binary field and set "Binary to Normal?" to Y.
Note: You can only use one tab at a time in the Select Values ... step!

Modified Java Script Value Step

If you have to import the binary data into Kettle, you can use this approach:

// decode the bytes of the binary field as a Java string, assuming UTF-8 encoded data
var string = new Packages.java.lang.String(yourField, "UTF-8");


Setting up an "All" Parameter


This short tutorial is fairly similar to the last one; let's say it's an improved version.
We work again with Pentaho Report Designer 3.5. Imagine you want to give your users the possibility to choose either one of the parameter values or all of them. So how can we implement this with a MySQL query?

The approach is as follows:

Set up the query for the parameter values like this:

SELECT
"All Countries" AS country_name
UNION
SELECT
country_name
FROM
table
;

We do the same for the channels query (see the sketch below). Please keep in mind that UNION will remove any duplicates. If you are sure that you have no duplicates, you can use UNION ALL to improve performance.
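Just for illustration, here is a minimal sketch of that channels query (the channels table name is made up); DISTINCT plus UNION ALL works here because the "All Channels" literal cannot collide with a real channel name:

SELECT
"All Channels" AS channel_name
UNION ALL
SELECT DISTINCT
channel_name
FROM
channels
;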

Set up a new parameter called "country_name", tick "mandatory", choose the above query as the source, set the type to string.

Now change the main query that feeds the report like this:

SELECT
[...]
WHERE
(cc.channel_name = ${channel_name} OR "All Channels" = ${channel_name}) AND
(country_name = ${country_name} OR "All Countries" = ${country_name}) AND
the_date >= ${start_date} AND
the_date <= ${end_date}
GROUP BY 1,2,3,4
;

Save everything and run the report ... you should now see the "All" options in your pull-down menu:


As some of you might use Oracle as well, a user named "zulu" from the Pentaho forum pointed out:
"Not sure if this helps you now, but depending on your SQL dialect, a NULL (meaning nothing) is treated differently to a "NULL" string.

In Oracle your predicate could be:
WHERE (${media} IS NULL OR media=${media}).
Oracle applies "lazy" logic, so if your Media parameter is not completed by the user (meaning it "IS NULL"), the condition will not even check the "media=${media}" part."

Just a remark from my side: My original post included a query like this one "(media = ${media} OR 'All Media' = ${media})"
You can find the original post here.

Update 2012-05-30: Somebody asked me how to achieve the same with multiselect parameters. Here is the approach that works for me (it's a different dataset, but you get the idea):

SELECT
     `city_sales`.`date`,
     `city_sales`.`continent`,
     `city_sales`.`country`,
     `city_sales`.`city`,
     `city_sales`.`revenue`
FROM
     `city_sales`
WHERE
(city IN (${CITY}) OR "All cities" IN (${CITY}))

Note: I defined the CITY parameter as type STRING.
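To see why this works, here is a quick illustration you can run directly in MySQL (the city values are made up):

SELECT "All cities" IN ("London", "Paris");  -- returns 0: the city filter applies
SELECT "All cities" IN ("All cities");       -- returns 1: TRUE for every row, so no filter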


Tuesday, 29 September 2009

How to set up non-mandatory parameters in Pentaho Report Designer 3.5


When creating parameters in Pentaho Report Designer 3.5, you can define whether a parameter is mandatory or not.

Imagine we have a query like this one to feed our report:


SELECT
*
FROM
table
WHERE
channel = ${channel} AND
media = ${media}
;


I define two parameters for this report: channel and media. The end user will be able to choose a value for these parameters when they call the report. In the query you reference a parameter like this: ${parameter_name}. This is fairly easy so far, but how do we configure the report if we want to allow users to choose either one media value or none (= showing all media types)?


This is not difficult to implement if you know how to do it. Basically, you have to change your query to this:


SELECT
*
FROM
table
WHERE
channel = ${channel} AND
(media = ${media} OR 0 = IFNULL(${media}, 0))
;


Basically, if the user doesn't choose any value for media, the media value will be set to NULL (as in "no value"). So I changed the query to include "(media = ${media} OR 0 = IFNULL(${media}, 0))", which basically says: if there is a media value, apply it as a filter; if not, return all media types (= no filter). So depending on which part is TRUE, the query result will change!
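A quick way to convince yourself of the logic is to run the IFNULL part on its own in MySQL (using a made-up numeric media code here; for string values, the "All" parameter approach shown earlier on this page is the safer variant):

SELECT IFNULL(NULL, 0);  -- returns 0, so "0 = IFNULL(${media}, 0)" is TRUE and no filter is applied
SELECT IFNULL(42, 0);    -- returns 42, so that side is FALSE and "media = ${media}" does the filtering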


Now this is a fairly workable way to implement this. The only problems are that the NULL value / no-filter option is only available when you first call the report; once you have changed media to "TV", for example, the drop-down menu (or whatever display type you chose) doesn't offer a NULL value any more. Also, it would be great to be able to rename the NULL value for the end user, for example to "All Media Types".



Monday, 28 September 2009

Pentaho Kettle: Using Full Outer Join


Today I came across a special use case for an outer join. I was planning to set up an ETL process that imports yesterday's revenue summary by summarizing the input of two tables.

The main problem was that we might not have made any revenue at all yesterday, or there might be revenue in one table but not the other.


Basically, if revenue figures exist in both tables, the process would be the following:
- input data from table 1: date, service name, revenue_cars
- input data from table 2: date, service name, revenue_carparts
- join the tables, so that the output looks like: date, service name, revenue_cars, revenue_carparts

The problem is, as mentioned, that data will not be available in both tables every day. So I would have to implement logic like this:

  • IF input data table 1 NOT EMPTY AND input data table 2 EMPTY THEN output only data of table 1.
  • IF input data table 2 NOT EMPTY AND input data table 1 EMPTY THEN output only data of table 2.
  • IF input data table 1 NOT EMPTY AND input data table 2 NOT EMPTY THEN JOIN everything and output it.

All this can be achieved by using a FULL OUTER JOIN.
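In plain SQL the goal would look like the sketch below. Note that some databases (MySQL, for example) do not support FULL OUTER JOIN at all, which is exactly why doing it in Kettle comes in handy (table and field names match the steps that follow):

SELECT t1.date, t1.service_name, t1.count, t2.rev_sum
FROM table1 t1
FULL OUTER JOIN table2 t2
ON t1.date = t2.date
AND t1.service_name = t2.service_name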

So your process in Kettle should be as follows:

  1. Drag and drop two "Table input" steps into the working area and define the settings. Our queries look like this:
    For table1:
    SELECT
    date,
    service_name,
    COUNT(*) AS count
    FROM
    table1
    GROUP BY 1,2
    For table2:
    SELECT
    date,
    service_name,
    COUNT(*) AS count,
    SUM(rev) AS rev_sum
    FROM
    table2
    GROUP BY 1,2
  2. For each of them add a "Sort rows" step and connect it with the Table input step. Make sure you sort the relevant fields in ascending order (in our case date and service_name).
  3. Drag and drop a "Merge Join" step onto the working area, connect it with the two sort steps, define the two sort steps as its inputs and choose FULL OUTER as the join type.




In this screenshot you can also see a "Group by" step. This one is not in use at all here. But in case you want to check whether there is any output of a step, the "Group by" step has an option called "Always give back a result row", which, when ticked, will set the aggregation to 0 (zero). This might be useful in some cases. This is different from the "Detect empty stream" step, which only ever gives back a completely empty row.

OK, so now we have the FULL OUTER JOIN working. Have a look at the output by exporting it into a text or Excel file. You will see that in cases where the two result sets didn't match, the number fields (count and rev_sum) are blank. As we want to use these fields later on in calculations, we have to assign a default value in case a number field is null.

Insert an "If field value is null" step, connect it to the previous step and double-click on it. Tick "Select fields", add count and rev_sum to the fields table, and assign a default value of 0 (zero) to them.


Another thing you will have noticed by looking at the output is that we now have two additional fields: service_name_1 and date_1.

So now we have to do a check. In case there are matching values in both result sets, both

  • date and date_1 and
  • service_name and service_name_1

will be populated. If a row exists in only one result set, only one field of each pair will be populated. So we are going to create two new fields called service_name_new and date_new that are going to replace the old ones, using a "Modified Java Script Value" step. Drag and drop this step onto the working area and connect it with the Merge Join step. Double-click on the Modified Java Script Value step, insert the following script and tick "Compatibility Mode":

// if service_name is missing (the row came only from table2), take it from service_name_1
if (service_name.isNull()) {
var service_name_new = service_name_1;
}
else
{
var service_name_new = service_name;
}

// same check for the date field
if (date.isNull()) {
var date_new = date_1;
}
else
{
var date_new = date;
}

Add service_name_new and date_new as output fields and define their types. Then add a "Select values" step and select all fields except service_name_1, service_name, date and date_1. Finally, add another step to export the result, to Excel for example. Our whole process should now look like this (please ignore that some connections show the copy icon; this will not be the case with your process):

In this small tutorial we have set up a nicely working FULL OUTER JOIN. I hope that you can now implement this in your own ETL processes without any problems.