Apache Nifi Release 1.2.0 Highlights
Apache Nifi’s latest release is 1.2.0. I know we skipped the whole 1.0.0 release highglights, so a quick breakdown of overall changes follows for the jump from 0.* to 1.*, since there were quite a few major breaking changes. For 1.2.0, there were 381 issues closed or resolved, with a break down of issues to follow. Apache Nifi 1.2.0 can be downloaded from Apache here, full release notes can be found on their jira, and Highlights of the release on their Confluence page.
- Bug Fixes: 212
- Improvements: 129
- New Features: 24
There were a few other tasks and subtasks that were resolved that make up the other 16 items, but since they were pretty basic - updating Jetty and jQuery, we’ll probably skip going over most of them (except for the official Docker image!)
Bug Fixes
So many bug fixes!! Really too many to even go over, but some big ones:
- Fixed a provenance repository corruption issue
- Fixed multiple NPE in processors (SplitText Processor, Expression language toDate(), InvokeHTTP..and more)
- Documentation updates for the Admin guide
- UI fixes - styling, UX issues, timestamps, and a few others.
This is really just good news stuff. I know I barely named any, but again way too many to go through them! If you are looking for a specific fix, I’m sure you were watching the ticket, but if not the full list of issues for this release can be found here. I might honestly skip bug fixes for future releases except for any major issues.
Improvements
So this minor release is full of improvements too. I’ll try to list off the big ones, but again, go take a look at the full list yourself since there are too many to capture here in a highlights post.
- You can now have multiple versions of the same component!
- Finally, the label for the relationship names within a processor trigger the check box!
- You can now align processors/components more easily! Select multiple components - right click on one and choose “Align Vertically” or “Align Horizontally”
- The Expression Language now supports
ifElse
- Support for the Expression Language has been added to a more properties for quite a few processors
- Clojure is now supported for the ExecuteScript Processor
- A new reporting task allows you to send bulletins over Amazon S2S
- The Provenance repository has been refactored: Now supports Encryption and is much faster!
- ReportingTasks now support scripting!!
- Support has been added for Microsoft’s Azure Blob and Table Storage
- Grok Processor to parse logs
- You can now specify headers for GetHTTP
I might have skipped over a few improvements, but that is because I want to dive into them here!! So first off, there are a few new processors that I’ll get into in the Features section below. Outside of that, the number and quality of the improvements in this release is quite large and warrants a quick upgrade on your cluster!! The provenance repository has been completely redone and now is much faster, there have been library updates, documentation updates, and Component versioning! Let that sink in…want to make a breaking change for your processor, updating it to a new lib but want the old one to still run on the current flow while testing? Sure, we can now do that with this versioning change! Have MyCustomerProcessor
version 1 and MyCustomProcessor
version 1.1 running at the same time. This is pretty nice!
Docker
There is now an official Apache Nifi Docker image on Docker hub. This should be released with 1.2.0, so until that build completes with this release, it will not be available. Some other projects have an official image and now Nifi has one too, which means it is supported and part of the Apache Nifi project, making it a little easier to rely on and get support from the nifi community.
New Features
As processors are the main component in Nifi, I tend to focus on those first here. There are 31 new processors in this release, all listed below.
New Processors:
- AttributeRollingWindow
- CaptureChangeMySQL
- CompareFuzzyHash
- ConsumeEWS
- ConsumeKafkaRecord_0_10
- ConvertExcelToCSVProcessor
- ConvertRecord
- DeleteGCSObject
- EnforceOrder
- ExtractCCDAAttributes
- ExtractGrok
- FetchAzureBlobStorage
- FetchGCSObject
- FetchHBaseRow
- FetchParquet
- FuzzyHashContent
- GetTCP
- ISPEnrichIP
- ListAzureBlobStorage
- ListenBeats
- ListGCSBucket
- Notify
- PublishKafkaRecord_0_10
- PutAzureBlobStorage
- PutDatabaseRecord
- PutGCSObject
- PutParquet
- QueryRecord
- SplitRecord
- UpdateCounter
- Wait
The GC processors are for the Google Cloud platform, and as you can see there are quite a few to deal with Google Cloud Storage (GCS). Azure Blob storage and Table storage processors have been added too. There is a new processor, and controller services, to deal with formatted data such as CSV, AVRO and JSON, that now read in and write out those formats. And there is now a SplitRecord processor, QueryRecord processor, and ConvertRecord processor that support this same convention. There is now a ConsumeEWS, Exchange Web Server, processor also to go along with the PutEmail processor. This allows you to get emails for a user from an Exchange server!
The new Notify and Wait processors introduce a new flow management tool. Wait holds flowfiles until a certain event triggers the Notify processor to put a value into the Distributed Cache and the Wait processor will then release those flowFiles. This could be useful for a number of flows and could clean up some rather hacky script processor controlled flows out there!
A new Excel format to CSV parser can read in Microsoft Excel files and convert them to CSV.
The ListenBeats processor uses the Elasticsearch beats protocol (libbeat- used by filebeat, metricbeat, etc) and writes the incoming payload to the content of the FlowFile. This replaces the ListenLumberjack processor, which has now been deprecated.
A quick description is available on our processor page. For a full description, see the apache nifi docs.
Other new features
Other new features outside of just Processors is a new content viewer for avro FlowFiles and deep links to any component in a flow. The content view change I know I’ll make use of during debugging issues with custom processors that create Avro FlowFiles. And the deep linking directly to a component is another great change that I’ve been waiting on for a little bit!
1.0.0 Changes from 0.* line
As a real quick recap, 1.0.0’s major change was the new UI and the changing in clustering. The UI was updated for HTML5 and looks a lot better, more modern and has a better overall user experience and feel to it. A comparison of the two can be seen in this screenshot.
Nifi 0.. UI | Nifi 1..+ UI |
---|---|
The change in clustering, from master-slave to zero master clustering, is also quite a change. It now allows for, I think, a better cluster environment as the master previously at times did not do that much and now all nodes can act on the flow. So if you haven’t tried a 1.* release, you will have to make some changes in your cluster, but since all nodes have the same flow, it makes the cluster more efficient. The one caveat I found is that the master node could have been the entry point into a cluster, and now that there is no master, you have to decide on a different method for an ingest node or entry point.
1.2.0 release in closing…
Overall, this is probably one of the more feature rich releases of Nifi and should greatly improve both user experience developing flows and the processing of flows! I look forward to building new and updating my current flows to take advantage of this new release!