Community Server

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Saturday, 30 July 2011

Pentaho Data Integration 4 Cookbook - Book Review

Posted on 02:37 by Unknown

Book Review: Pentaho Data Integration 4 Cookbook

Pentaho Data Integration 4 Cookbook by María Carina Roldan and Adrián Sergio Pulvirenti illustrates with plenty examples how data can be imported from various sources, transformed and exported to various sources with the open source tool Kettle (Pentaho Data Integration). It's all a hands-down approach: Follow the step by step guide of the examples and you will have the right knowledge at the end of the book to create your own data integration solutions. The book is easy to read even for novices and offers a great introduction to the world of Pentaho Kettle.

In times of economic downturn and an extremely competitive market companies are trying to save money where ever possible. In the last few years we have seen a trend towards open source business intelligence solutions. The quality and functionality of open source business intelligence tools has increased over time as well and nowadays they are a very convincing alternative to very costly commercial business intelligence solutions.

For a lot of open source business intelligence tools, documentation has been a weak point in the past. Some products have a really strong community following and maybe a Wiki, which were often you main source of information. In the last 2 years this situation changed: There are now some publications available that cover some of these open source business intelligence packages, one of the very recent one being Pentaho Data Integration 4 Cookbook by María Carina Roldan and Adrián Sergio Pulvirenti.

Data Integration (ETL, short for extract, transform, load data) is one of the most important building blocks of a business intelligence solution. You can have the fanciest dashboards and reports, but if you don't have the correct data for them, they are pretty useless. ETL involves sourcing that data from a database, text file, Excel spreadsheet etc, transforming the data and outputting it to any convenient file format or database. Most ETL tools offer a graphical user interface which allows you to create data flows. The beauty of this is that there is hardly any coding involved, it is easy to understand (as you are looking at a flow diagram) and you are working quicker as well (you are not coding everything). Nowadays, the probably two most popular open source ETL tools are Pentaho Data Integration and Talend.  

Pentaho Data Integration 4 Cookbook is an ideal book for somebody who looks for a hands-down tutorial-style book. It is filled with an abundance of examples and you can download the accompanying files from the dedicated website as well. Now what does this book cover:

The first chapter is all about how to get data from databases, parameterizing SQL queries and updating/deleting records from database tables. Especially interesting is the section about using parameters with your SQL query to dynamically source your data. The authors also explain how you can generate primary key values for your dataset in Kettle in case your database table requires them.
The second chapter focuses on getting data from text files. Rich examples are provided to handle structured as well as instructed text files. Especially the section about working with various forms of unstructured text files is really informative. The last section of the chapter is all about retrieving data from Excel spreadsheets. Chapter 3 focuses on working with XML files: Kettle offers features to read, validate and generate XML documents. 
Kettle is not only an ETL tool, but also offers workflow management as well: This book gives a good overview on how to create jobs, which allow to run transformations and other tasks (e.g. send email email notifications, create folders, check if a file exists, etc) in sequential order. File management is covered next, which looks at copying, moving and deleting files and folders. Once you imported the data, you also might have to look up some additional data from other sources, so chapter 5 gives you a good overview on this topic. There is really one nifty example: It is about transforming data about museums and looking about the opening hours by using a web service. 
Up next: Learn how to split, merge and compare data streams. Chapter 7 shows you how to execute and reuse jobs and transformations. Then it's all about how you can integrate your Kettle ETL processes with the other Pentaho products (Business Intelligence Server, Report Designer, Dashboards etc). The final chapter learns you the tricks on how get the most out of Kettle.

All in all, this book is in excellent read for somebody who wants to have a quick start in open source ETL process design. The book is fully based on examples, which explained in an easy to understand language. There are many illustrations as well and at the end of each example you will find a summary and sometimes hints on where to find more info. This book is ideal for somebody who wants to get up and running with this popular open source ETL tool in a short amount of time. If you interested in this book, you can get it directly from here.

Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Pentaho Kettle Parameters and Variables: Tips and Tricks
    Pentaho Kettle Parameters and Variables: Tips and Tricks This blog post is not intended to be a formal introduction to using parameters and ...
  • Using Parameters in Pentaho Report Designer
    Using Parameters in Pentaho Report Designer Introduction How to define a parameter Additional info about the new parameter dialog Hidden Par...
  • Pentaho Data Integration: Scheduling and command line arguments
    Pentaho Data Integration (Kettle): Command line arguments and scheduling Tutorial Details Software: PDI/Kettle 4.1 (download here ), MySQL S...
  • Jaspersoft iReport: How to pass a parameter to a sub-dataset
    Jaspersoft iReport: How to pass a parameter to a sub-dataset Let’s say our main report is grouped by continent and the details band holds sa...
  • Using regular expressions with Pentah...
    Using regular expressions with Pentaho Data Integration (Kettle) There are quite some transformations steps that allow you to work with regu...
  • Pentaho Data Integration and Infinidb Series: Bulk Upload
    Pentaho Data Integration and InfiniDB Series: Bulk Upload Introduction Prepare Tables Using mainly Kettle steps Check if file exists Setup I...
  • Pentaho Data Integration: Remote execution with Carte
    Pentaho Data Integration: Remote execution with Carte Tutorial Details Software: PDI/Kettle 4.1 (download  here ), installed on your PC and ...
  • How to create a loop in Pentaho Kettle
    I finished my first ever video tutorial! This video will demonstrate you how easy it is to create a loop in Pentaho Kettle. Enjoy!
  • Understanding the Pentaho Kettle Dimension Insert/Update Step Null Value Behaviour
    We will be using a very simple sample transformation to test the null value behaviour: We use the Data Grid step to provide some sample dat...
  • Metadata Driven ETL and Reporting
    Metadata Driven ETL and Reporting with Pentaho Data Integration and Report Designer Tutorial Details  Software : If PDI Kettle 4.2 GA and PR...

Categories

  • "Bulk Loader"
  • "Bulk Loading"
  • "Hadoop"
  • "Kettle"
  • "Pentaho Book"
  • "Pentaho Data Integration"
  • "Pentaho Kettle"
  • "Pentaho Report Designer MDX MQL JDBC Parameters How To"
  • "Pentaho Report Designer MDX Parameters"
  • "Pentaho Report Designer MQL Parameters"
  • "Pentaho Report Designer Parmaters"
  • "Pentaho Report Designer"
  • "Pentaho Reporting 3.5 for Java Developers"
  • "Pentaho Reporting Book"
  • "Routing"
  • "Schema Workbench"
  • "Testing"
  • "Unicode"
  • "Unit testing"
  • "UTF8"
  • Agile development
  • automated testing
  • Big Data
  • Book Review
  • C-Tools
  • CBF
  • Clustered transformation
  • Command Line Arguments
  • Community Build Framework
  • D3JS
  • Dashboarding
  • Data Integration
  • Data Warehouse
  • Database Change Management
  • Database Version Control
  • Date Dimension
  • DBFit
  • ETL
  • ETLFit
  • Federated database
  • Google Charts
  • Google Visualization API
  • Hadoop
  • HTML5
  • iReport
  • JasperReports
  • JasperSoft
  • JasperStudio
  • Kettle
  • Kimball
  • Loop
  • Master data management
  • Metadata
  • Metedata editor
  • Mondrian
  • multidimensional modeling
  • OLAP
  • Open Source
  • Parameter
  • Parameters
  • Pentaho
  • Pentaho BI Server
  • Pentaho Data Integration
  • Pentaho Data Integration 4 Cookbook
  • Pentaho Kettle
  • Pentaho Metadata Editor Tutorial
  • Pentaho Report Designer
  • PostgreSQL
  • PRD
  • Report Layout
  • REST
  • Routing
  • Saiku
  • Scheduling
  • Slowly Changing Dimension
  • Sqitch
  • SVG
  • Talend
  • Talend MDM
  • Talend Open Studio
  • Tutorial
  • Variable
  • Web service
  • Xactions

Blog Archive

  • ►  2013 (24)
    • ►  December (2)
    • ►  November (3)
    • ►  October (2)
    • ►  September (1)
    • ►  August (3)
    • ►  July (2)
    • ►  June (1)
    • ►  May (2)
    • ►  April (1)
    • ►  March (3)
    • ►  February (1)
    • ►  January (3)
  • ►  2012 (20)
    • ►  November (3)
    • ►  October (3)
    • ►  August (1)
    • ►  June (1)
    • ►  April (1)
    • ►  March (3)
    • ►  February (5)
    • ►  January (3)
  • ▼  2011 (19)
    • ►  November (3)
    • ▼  July (2)
      • Pentaho Data Integration 4 Cookbook - Book Review
      • Metadata Driven ETL and Reporting
    • ►  June (1)
    • ►  May (4)
    • ►  April (2)
    • ►  March (1)
    • ►  February (3)
    • ►  January (3)
  • ►  2010 (17)
    • ►  December (1)
    • ►  November (6)
    • ►  September (1)
    • ►  August (1)
    • ►  June (2)
    • ►  May (1)
    • ►  April (3)
    • ►  February (1)
    • ►  January (1)
  • ►  2009 (18)
    • ►  December (3)
    • ►  November (1)
    • ►  October (5)
    • ►  September (7)
    • ►  July (2)
Powered by Blogger.

About Me

Unknown
View my complete profile