- Make sure your MySQL table uses the UTF-8 encoding
- Make sure that in the database connection details in Kettle following options are set: characterEncoding=utf8, characterSetResult=utf8,useUnicode=true.
- Once you ran the ETL process and populated, don't worry if MySQL Query Studio displays the characters as an array of pipes (in this case the Cyrillic fonts are not installed). If you see question marks, well then, something is still wrong.
Wednesday, 2 December 2009
Exporting Characters as UTF-8 from Ke...
Tuesday, 3 November 2009
Using Parameters in Pentaho Report Designer
- Introduction
- How to define a parameter
- Additional info about the new parameter dialog
- Using parameters with database queries
- Using parameters with Metadata/MQL queries
- Using parameters with Mondrian/OLAP/MDX queries
- XPath
- Reflection/Java Method Call
- Pentaho Data Integration (Kettle)
- Scriptable
- Referencing parameters in the message field
Introduction
You wonder what all this fuss is about? Or are you one of the many who have been struggling to use parameters in your data queries?
Using parameters with Pentaho Report Designer 3.5 (PRD) and later versions is not a trivial topic. The main problem is, that for each data source the parameter definition is different. Using parameters with standard database queries is fairly easy, using it with MDX, MQL queries and other data sources is a bit more an advance topic.
We will start off with discussing the easiest implementation and then we will move on to discuss solutions for MQL and MDX queries.
How to define a parameter
PRD 3.5 is a major improvement over previous versions of this software. One of the new features is, that you can define parameters within your report, which means, that you do not have to use Xactions any more. (You can still use Xactions for more complex logic, but generally speaking, defining the parameters within your report will make your life easier).
Within the data tab in PRD you can see a section called "Parameters". Right click on it and select "Add parameter". In the dialog you can define your parameter name, define the input type (drop down menu etc.), add a data source to populate the input dialog and make the parameter mandatory. This is all fairly straight forward and easy to set up.
Note: You have to set up the parameter before you reference it in your data queries.
Additional info about the new parameter dialog
Hidden Parameters
By selecting a tiny little check-box, parameters can now be excluded from the UI-generation. The parameter itself still behaves like any other parameter and thus takes part in the formula-calculations and is part of the validation.
Parameter Validation can be loosened
When this flag is set to false, the validator will accept other values than the ones that have been specified in a list-parameter. With relaxed validation, you can easily write WHERE conditions, that accept partial matches or contain “LIKE” clauses. For the sake of security, we none-the-less default this feature to strict-checks.
Display-Formula
Every List-Parameter can now have computed display-texts. This way, you are no longer limited by what your data source can produce, and even a simple data-source like an XML file or a static table-datasource can produce better looking prompts.
Post-Processing Formula
A post-processing formula processes the parameter-input before it gets validated or passed into the report. The post-processing can either be defined on the parameter itself or can be defined as an extra field by using a hidden parameter. The computed value will still be validated by the engine and thus using formula’s does not necessarily mean you have to fear that illegal input comes through. Formulas allow you to access runtime information, like the current user’s roles or the user’s name via the ENV(..) function.
Using parameters with database queries
From PRD 3 to PRD 3.5 the way you define your parameters in a SQL query has slightly changed.You define a parameter within a SQL query like this: ${parametername}. Make sure you do not use enclosing single or double quotation marks, as PRD knows already how to display the parameter.
Example:
SELECT
*
FROM
table
WHERE
start_date <= ${my_start_date}
;
Using parameters with Metadata/MQL queries
<constraints>
<constraint>
<operator>AND</operator>
<condition>[TABLE.COLUMN] = [param:MY_PARAMETER]</condition>
</constraint>
</constraints>
Let's have a look now at how to use date parameters in MQL. Normally, if you are using a date in MQL, you would have to use DATEVALUE("2009-11-01"), but it seems that if you are using a data parameter, DATEVALUE() shouldn't be used. As Thomas (Taqua) points out "'Normally' probably refers to 'in the old days'. But in the old days, Metadata had no parametrization at all, and all the parametrization that happened in the BI-server was a simple string-manipulation exercise on the XML code that represents the query. Thats why you in reality only passed in strings and then had to convert it via a DATEVALUE function. (And of course, its just a horrible way to do parametrization, as you now have to be careful that your strings do not break the XML or even rewrite the XML for fun and profit.)"
Using parameters with Mondrian/OLAP/MDX queries
How to pass on a single value
Imagine you want to pass on a single value to the member of a simple dimension like [Login Date].[Date]. You can use a function called parameter, which basically looks like this: parameter("parametername", dimension, member default value).The member default value will be used (you guessed it) in case no parameter value is available. This is very useful if you are still designing your report in PRD.
Here is an example of using this approach:
SELECT
LastPeriods(6,Parameter("my_date",[Login Date],[Login Date].[2009-09-12]))
ON ROWS,
[Measures].[Distinct Users]
ON COLUMNS
from [Logins]
Since PRD 3.6 I've been also using the standard variable approach of ${variable} successfully:
WITH
MEMBER [Measures].[7daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(7.0),
[Measures].[CTR])
MEMBER [Measures].[6daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(6.0),
[Measures].[CTR])
MEMBER [Measures].[5daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(5.0),
[Measures].[CTR])
MEMBER [Measures].[4daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(4.0),
[Measures].[CTR])
MEMBER [Measures].[3daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(3.0),
[Measures].[CTR])
MEMBER [Measures].[2daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(2.0),
[Measures].[CTR])
MEMBER [Measures].[1daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(1.0),
[Measures].[CTR])
SET [Period] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(7.0) : CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(1.0))
MEMBER [Measures].[Period] AS
Avg([Period],
[Measures].[CTR])
MEMBER [Capsule].[Total Live Capsules] AS
SUM((FILTER([Capsule].[Capsule].Members,
(NOT ([Capsule].[Capsule].CurrentMember.Name MATCHES ".*test.*")))),([Measures].CurrentMember))
SELECT
{[Measures].[7daysago],
[Measures].[6daysago],
[Measures].[5daysago],
[Measures].[4daysago],
[Measures].[3daysago],
[Measures].[2daysago],
[Measures].[1daysago],
[Measures].[Period]}
ON COLUMNS,
NON EMPTY (UNION(({Filter({[Capsule].[Capsule].Members},
(NOT ([Capsule].[Capsule].CurrentMember.Name MATCHES ".*test.*")))} * {[Creative].Children}),
({[Capsule].[Total Live Capsules]} * {[Creative]}),
ALL))
ON ROWS
FROM [FUNNEL CUBE]
WHERE
({[Site].[${VAR_PARTNER_NAME}].[UNKNOWN].[${VAR_CLIENT_NAME}]} * {[Bucket Type].[TREATMENT]} * {[Ad Type].[Ad Type].[Retargeted]})
How to pass on multiple values
It is very likely that you have to populate nearly the whole dimension, i.e. [Login Date.Monthly Calendar].[Monthly Calendar All].[Year].[Quarter].[Month].[Day] which would look like this: [Login Date.Monthly Calendar].[Monthly Calendar All].[2009].[Q3].[Jul].[2].More info
Beginning with Milestone 2 of the Pentaho Report-Designer 3.6,we now support parameter injection via the${parameter} syntax. Parameter values injected via that syntax will not be checked in any way, so it is the designer's responsibility to ensure that everything is quoted correctly to cause no harm or to break the query. With great powers comes great responsibility.
The ${parameter} syntax for MDX is not just a toString() conversion. It follows the MessageFormat syntax and thus allows to format Date and Number objects properly before inserting them into the MDX query. An extended format rule allows to produce quoted MDX-string literals by specifying the subformat <string. These strings start and end with a double-quote and all double-quote characters found in the original string get escaped according to the MDX grammar.
So now I can finally answer the question on how to parametrize a Date-Axis from a Date-parameter. To produce a member string like [2009].[10].[4] from a parameter called dateparam use[${dateparam,date,"yyyy"}].[${dateparam,date,"MM"}].[${dateparam,date,"dd"}] in your MDX query.
I still haven't found out how to do the same with the PARAMETER function.
Support for the PARAMETER function will remain there (as in theory it is a good idea to have prepared/explicit parameter).
You can test this functionality with either the latest CI build or with the upcoming Milestone 2 of the Report-Designer 3.6.
by the reporting engine (if the parameter is a java.lang.String, otherwise we will throw a very visible exception). You are using a StrToMember which probably does not like the "{..}" syntax of the sets. Try a StrToSet instead, if you intend to pass a set.
You can also skip the parameter function and use the classical ${...} parametrization syntax (PRD 3.6 or newer), which performs a simple replacement on the MDX string. To prevent code injections you can specify a subformat along with those parameters: ${param} will include the parameter without any modification.
${param,string} will quote the parameter value as MDX string (slap double quotes around it and escape all inner double quote chars)
${param,formattype,formatstyle} will treat the parameter as a message format request, so that you can format numbers, dates and so on properly.
For the grammar of formattype and formatstyle have a look at http://download.oracle.com/doc
How to create a parameter for a set (posted by Bill W.):
select [selectedset] ON COLUMNS,
from [Monthly Influenza Summary]
I am using PRD parameters to get the set of customers in a multi-select, then I turn that set into a formatted array which I use in the query. The basics are there now.
XPath
- also uses the ${param} syntax to have parameters. (Same syntax as JDBC)
Reflection/Java Method Call
- parameters are given as plain names, no special encoding needed.
- parameter are separated by comma
- if the parameter-name itself contains a comma or double-quotes, quote the parameter-name by
- enclosing it in double quotes
- any inner double-quote character must be quoted by another double quote. (Parameter name : Joe"A" becomes "Joe""A""")
Pentaho Data Integration (Kettle)
- Parameter are declared explicitly
- PDI only accepts strings, so if the parameter is no string, then it will be converted via "toString()"
- KTR file needs to declare the parameter and arguments
Scriptable
- Scripts have a "dataRow" variable that implements org.pentaho.reporting.engine.classic.core.DataRow.
- Scripts also have access to
- "configuration" (org.pentaho.reporting.libraries.base.config.Configuration)
- "resourceBundleFactory" (org.pentaho.reporting.engine.classic.core.ResourceBundleFactory)
- "resourceManager" (org.pentaho.reporting.libraries.resourceloader.ResourceManager)
- "contextKey" (org.pentaho.reporting.libraries.resourceloader.ResourceKey)
Referencing parameters in the message field
Tuesday, 27 October 2009
Send Pentaho Report By Email
The Pentaho Report Designer 3.5 brought along some important improvements. If you set up Xactions before that sent out reports by email, there are some slight changes now that you want to be aware of.
In my case I prepared a report that uses a MDX query as source, which automatically selects the last 5 days of data. The report doesn't use an input, so it is fairly easy to integrate into an Xaction.
My idea is to have basically a simplified version of the report in the email body and the full report with a chart as attachment. For the report, that I intend to have in the email body, I have to make sure that the CSS is not an external file. You can do this by going to File/Configururation in the Report Designer 3.5 and setting InlineStyle to true in output-table-html.
I set up following steps in the Xaction (Pentaho Report Designer 3.5):
- Message template for email subject name. Define an output name.
- Message template for file attachment name. Define an output name.
- Message template for email text. Define an output name.
- Pentaho Report XLS: In this step we create the Excel file for the attachment. Tag "Use Content Repository". Define "Report XLS" as output-name. Make sure you set the output-format to application/vnd.ms-excel
- Pentaho Report HTML: Set to output-format to text/html. Tag "Use Content Repository". We will use this output for the email html body. Define "Report-HTML" as output-name.
- Now drag and drop both Report-HTML and Report-XLS into the output folder in the report-outputs area. For each of them, add a output destination. At this point it doesn't matter which destination you choose.
- Now we have to edit the XML directly, hence change to the XML view. Look for the outputs that you defined earlier, and if you defined i.e. a file destination beforehand, then replace the <file></file> by <contentrepo></contentrepo>.
- Now add the email step. Fill out the fields and reference the parameters that we define earlier on where applicable. In example: Add a parameter to HTML message: "Report-HTML".
Friday, 9 October 2009
Current date function on Mondrian
- your time dimension, i.e. [Login Time]
- the mapping format, i.e. '["Login Time"]\.[yyyy]\.["Q"q]\.[m]\.[d]', which will resolve to [Login Time].[2009].[Q4].[10].[7]
- mapping method (optional): BEFORE, AFTER or EXACT
Mapping [2009-10-12] date format:
select {CurrentDateMember([Login Date], '[yyyy-mm-dd]').Lag(3.0)} ON COLUMNS,
{[Login Channel].[All Login Channels]} ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}
For some reason, although the documentation clearly states "MM" for
month number with a leading 0, only "mm" will work. It is a good idea
to check the mondrian.log in case you experience errors, as you will
see there if the translation works (i.e.: if the Mondrian log shows
[2009-MM-09], you know that the month number was not translated).
The above example was for a dimension with one hierarchy only. Please find below an example with a dimension with more than one hierarchy:
SELECT
CurrentDateMember([Date].[Date],'[Date]\.[Date]\.[yyyy-mm-dd]').Lag(357.0) ON COLUMNS,
NON EMPTY {Hierarchize({[Measures].[Sales]})} ON ROWS
FROM [Sales]
Mapping [2009].[4].[October].[12] date format:
select {CurrentDateMember([Login Time Monthly], '[yyyy]\.[q]\.[mmmm]\.[d]').Lag(3)} ON COLUMNS,
{[Login Channel].[All Login Channels]} ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}
Mapping [2009].[41].[6] date format:
select {CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]\.[w]').Lag(3)} ON COLUMNS,
{[Login Channel].[All Login Channels]} ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}
So when should you now mention the dimension in the formating string? If you avoid mentioning it, Mondrian will have to search for the existence of the various members within all the dimensions, which takes processing time. If you have only one date dimension, then you could theoretically not mention the dimension string, if you have more, then there is no way around it.
Imagine now, we want to have a monthly and weekly summary of the last 6 periods. So how do we approach this?
Last 6 weeks:
select {
CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]').Lag(6) :
CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]')
} ON COLUMNS,
{[Login Channel].[All Login Channels]} ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}
Please keep in mind that here our time dimension looks like this: [year].[week].[day]
Last 6 months:
select {
CurrentDateMember([Login Time Monthly], '["Login Time Monthly"]\.[yyyy]\.[q]\.[mmmm]').Lag(6) :
CurrentDateMember([Login Time Monthly], '["Login Time Monthly"]\.[yyyy]\.[q]\.[mmmm]')
} ON COLUMNS,
{[Login Channel].[All Login Channels]} ON ROWS
from [Logins]
WHERE
{[Measures].[Distinct Users]}
Please keep in mind that here our time dimension looks like this: [year].[quarter].[month].[day]
I hope that this tutorial showed you the power of CurrentDateTime(). It is a very useful function, especially if you have to do a lot of analysis across time. It tooks me quite some time to use this function correctly (especially as there are not many examples), so I hope you can implement it within 5 minutes.
Wednesday, 7 October 2009
How to change the Pentaho Login Screen
Have you ever wondered how to get rid of the sample users in the Pentaho BI Server login box? Tom Barber pointed out a simple solution recently on the Pentaho forum that was originally posted by Paul Stoellenberger:
biserver-ce\tomcat\webapps\pentaho\mantleLogin\loginsetting s.properties
there is an option:
#showUsersList=true
change that to
showUsersList=false
Thursday, 1 October 2009
Converting binary data type to string in...
Sometimes you might come across data that is store in in binary form in a table. To convert this data you chose one of the following approaches:
Convert directly using SQL in the database input step
One quick method would be to use the CONVERT or CAST function (test which one works best for you):
Select Values ... step
Modified Java Script Value Step
If you have to import the binary data into Kettle, you can use this approach:var string = new Packages.java.lang.String(yourField, "UTF-8");
Setting up an "All" Parameter
We do the same for the channels query. Please keep in mind that UNION will remove any duplicates. If you are sure that you have no duplicates, you can use UNION ALL to improve the performance.
As some of you might use Oracle as well, a user named "zulu" from the pentaho forum pointed out that:
"Not sure if this helps you now, but depending on your SQL dialect, a
NULL (meaning nothing) is treated differently to a "NULL" string.
In Oracle your predicate could be:
WHERE (${media} IS NULL OR media=${media}).
Oracle applies "lazy" logic, so if your Media parameter is not
completed by the user (meaning it "IS NULL"), the condition will not
even check the "media=${media}" part."
Just a remark from my side: My original post included a query like this one "(media = ${media} OR 'All Media' = ${media})"
You can find the original post here.
Update 2012-05-30: Somebody asked me how to achieve the same with mulitselect parameters. Here is the approach that works for me (It's a different dataset, but you get the idea):
SELECT
`city_sales`.`date`,
`city_sales`.`continent`,
`city_sales`.`country`,
`city_sales`.`city`,
`city_sales`.`revenue`
FROM
`city_sales`
WHERE
(city IN (${CITY}) OR "All cities" IN (${CITY}))
Note: I defined the CITY parameter as type STRING.