In Pentaho Data Integration 6.0, we released a great new capability to collect data lineage for PDI transformations and jobs. Data lineage is an oft-overloaded term, but for the purposes of thi...
http://funpdi.blogspot.com/2015/11/data-lineage-internals-in-pdi-60.html
If you've ever played around with Drag-n-Drop in Spoon, you probably know that you can drag a KTR, KJB, or XML file onto the canvas, and it will open that file (if a legal PDI artifact) in Spoon ...
http://funpdi.blogspot.com/2015/07/drag-and-drop-support-in-spoon.html
The Pentaho Data Integration (PDI) Marketplace is a great place to share your PDI/Kettle contributions with the community at large. To add your plugin, you can pull down the marketplace.xml f...
http://funpdi.blogspot.com/2015/07/bring-your-own-marketplace.html
The PDI Marketplace is a great way to extend the capabilities of your PDI installation, using excellent contributions from the community, and some less-excellent ones from yours truly ;) At pre...
http://funpdi.blogspot.com/2015/04/command-line-utility-for-pdi.html
In a previous blog post, I announced my SuperScript step for PDI, which adds and enhances some capabilities of the built-in Script step. One notable addition is the ability to use AppleScript ...
http://funpdi.blogspot.com/2015/03/using-applescript-with-pdi-superscript.html
For my latest fun side project, I looked at the integration of Pentaho Data Integration (PDI) and Apache Pig. From the website: "Apache Pig is a platform for analyzing large data sets that con...
http://funpdi.blogspot.com/2015/02/apache-pig-udf-call-pdi-transformation.html
As readers of my blog know, I'm a huge fan of scripting languages on the JVM (especially Groovy), and of course I'm a huge fan of Pentaho Data Integration :) While using the (experimental) Scri...
http://funpdi.blogspot.com/2014/12/superscript-pdi-plugin.html
I've spent quite a bit of time looking at Pentaho Data Integration (aka Kettle) and trying to make it do things with external technologies and idioms, anywhere from Groovy, Drill, memcached,...
http://funpdi.blogspot.com/2014/12/how-sorted-or-sordid-is-your-data.html
While working with Apache Drill and PDI (see previous posts), I found myself needing to read and write values to and from Drill's ZooKeeper instance. Since ZooKeeper can be (and is) used for ma...
http://funpdi.blogspot.com/2014/11/zookeeper-input-and-output-steps-in-pdi.html
PDI Extension points are an awesome feature added to PDI 5.0 (and updated throughout 5.x) that allow you to hook into the operational aspects of your ETL processes to provide finer-grained contr...
http://funpdi.blogspot.com/2014/10/scripting-extension-points-in-pdi.html
I've heard a number of comments regarding JSON and PDI, most of them having to do with difficulties parsing nested documents, using JSONPath, etc. Personally, I've had a JSON doc I'd like to fe...
http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html
Here's a quick Groovy script to recursively list ZooKeeper nodes (and optionally, data), also on Gist here. What does this have to do with PDI, you may ask? Stay tuned ;) @Grab('org.apache....
http://funpdi.blogspot.com/2014/10/list-zookeeper-nodes-and-data-with.html
Ok, so this blog is called "Fun with Pentaho Data Integration", but I recently fielded a question about using scriptable data sources in Pentaho Report Designer (PRD), and rather than start a wh...
http://funpdi.blogspot.com/2014/09/groovy-datasources-with-pentaho-report.html
One of the non-Pentaho side projects I've become interested in is Apache Drill. I like all the different aspects of it and hope to contribute in some meaningful way shortly :) As a first step, w...
http://funpdi.blogspot.com/2014/09/using-apache-drill-with-pdi.html
I've written quite a few plugins for Pentaho Data Integration; some are "finished" in terms of being in the PDI Marketplace, and some are still works in progress, Proofs of Concept, etc. The us...
http://funpdi.blogspot.com/2014/09/pdi-plugins-and-dependency-hell.html
I've been trying to figure out ways to make it dead-simple to create new plugins for Kettle / Pentaho Data Integration, and as a result I've got some GitHub projects using various approaches: ...
As my blog followers know, I've been trying to get a Groovy Console into Pentaho Data Integration's (PDI's) Spoon UI for quite a while now. I haven't put it into the Marketplace as we're wrestl...
http://funpdi.blogspot.com/2014/03/gradle-spoon-console-plugin.html
I've definitely been neglecting this blog :) but I've recently put a couple of plugins into the PDI marketplace to read and write from memcached instances. I hope this leads to more key/value sto...
http://funpdi.blogspot.com/2014/03/pdi-memcached-plugins.html
OK, so this post is not PDI-related (yet, stay tuned :) but in my search for easy memcached client UIs I came up fairly shorthanded unless I wanted to buy something, write a Java client, or instal...
http://funpdi.blogspot.com/2014/03/groovy-memcached-client.html
I recently stumbled across the Apache Tika project, which is a content analysis toolkit that offers great capabilities such as extracting metadata from various documents. Depending on the docu...
http://funpdi.blogspot.com/2013/03/content-metadata-udjc-step-using-apache.html
While working with a few new Hadoop-based technologies (blog posts to come later), the need arose to get Pentaho Data Integration (PDI) and its Big Data plugin (source available on GitHub) worki...
http://funpdi.blogspot.com/2013/03/pentaho-data-integration-44-and-hadoop.html
In case you haven't heard, the Kettle project in Subversion has been restructured to be cleaner and to use Apache Ivy for dependency management. This has been a long time coming, and PDI/Kettle...
http://funpdi.blogspot.com/2013/01/new-pdikettle-project-structure.html
I recently gave a presentation of my GroovyConsoleSpoonPlugin (see earlier posts) to the Pentaho crew, and I got a lot of great feedback on it. Specifically, Pentaho Architect Nick Baker suggeste...
http://funpdi.blogspot.com/2012/12/groovyconsolespoonplugin-with-jsr-223.html
The "Verify Transformation" capability of Pentaho Data Integration (aka Kettle) is very handy for spotting issues with your transformations before running them. As a sanity check or as an audit...
http://funpdi.blogspot.com/2012/11/udjc-to-verify-transformations.html
I've been working on the Groovy Console plugin for Spoon, and I seem to have been able to sidestep the PermGen issues (at least the ones I was having during my last post). Also, I added some mor...
http://funpdi.blogspot.com/2012/10/groovy-console-spoon-plugin-update.html