If you don’t want the details behind Nifi’s clustering, you can skip ahead to running a cluster.
Clustering Apache Nifi introduces some complications, but it also brings multiple benefits, the main one being more throughput and data processing power. If you’ve run a single-machine instance of Nifi before, you know that data-intensive processors can take a toll on all of your resources, depending on the type of tasks you are doing. We’ve maxed out CPU with CSV/JSON parsing, IO by creating hundreds of thousands of records from gigabyte-plus file ingestions, and RAM with large XML tree traversals that never seem avoidable. There are other ways to try to solve some of these resource issues (RAID, spanning repositories across disks, flow optimizations, etc.), but one of the quickest is to scale out horizontally: create a Nifi cluster! With Nifi’s clustering, these tasks become more manageable, and you won’t be staring at your system stats wondering if everything will be OK… and it will be, because we are here to help!
Updates as of January 29th, 2019 - Nifi v1.8.0
Nifi’s official Docker image has come a long way since its first release in 1.2.0. There have been some quality-of-life changes as well as fixes for things missed the first time around. We are updating this post to cover most of those additions, and we have a full post about clustering in Docker. The readme/quickstart on Nifi’s Docker Hub page has been fleshed out and contains quite a bit of documentation, so definitely check there for more features.
To start with, there is now a latest tag, so a quick docker pull apache/nifi now works! Quite a few nifi.properties settings are now exposed as environment variables, and you can run a secured instance, via certificates or LDAP, straight from the official image. This includes clustering settings such as the node address, protocol port, max threads, and ZooKeeper connect string. The Docker image now also includes the nifi-toolkit, which lets you manage a cluster and run various commands from the CLI. To use the toolkit once the container is running, just run a docker exec command such as
docker exec -ti nifi nifi-toolkit-current/bin/cli.sh nifi current-user
to get the current user, which should print out
anonymous. We’ll take a deeper dive into using the official Docker image in our clustering post.
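As a complement to the toolkit, here is a minimal Python sketch of our own (not something shipped in the official image) that asks a node’s REST API whether every node in the cluster is connected. It assumes an unsecured node serving plain HTTP at localhost:8080, and it assumes the /nifi-api/flow/cluster/summary endpoint returns a clusterSummary object with connectedNodeCount, totalNodeCount and clustered fields, as we understand it; double-check those names against the REST API docs for your Nifi version.

```python
import json
import urllib.request

# Assumption: an unsecured Nifi node serving its REST API over plain HTTP on
# port 8080. A secured cluster would also need TLS and an access token.
DEFAULT_BASE_URL = "http://localhost:8080/nifi-api"


def fetch_cluster_summary(base_url=DEFAULT_BASE_URL, timeout=10):
    """GET /flow/cluster/summary and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/flow/cluster/summary"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_cluster(payload):
    """Reduce the response to (connected node count, total node count, clustered)."""
    # Assumed field names, taken from our reading of the cluster summary entity.
    summary = payload.get("clusterSummary") or {}
    connected = summary.get("connectedNodeCount") or 0
    total = summary.get("totalNodeCount") or 0
    clustered = bool(summary.get("clustered"))
    return connected, total, clustered


def all_nodes_connected(payload):
    """True only for a clustered instance in which every known node is connected."""
    connected, total, clustered = summarize_cluster(payload)
    return clustered and total > 0 and connected == total
```

Against a running cluster you would call all_nodes_connected(fetch_cluster_summary()). The parsing is kept separate from the HTTP call so it can be checked against a saved response, or reused by whatever scheduler or alerting you already run.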
Monitoring your Nifi deployment is almost as important as setting it up. Knowing that your flow is performing as expected, and that the data isn’t throwing errors, blocking queues, or causing other problems, requires some ongoing work, be it checking Nifi daily or setting up some form of monitoring. Nifi has some built-in reporting tasks that can be used for this, and one of them is the Datadog Reporting Task. It reports 25 different metrics about Nifi and the JVM, ranging from queued bytes to JVM garbage collection, plus more for each processor running in your flow; overall, the stats are pretty useful. This post walks through setting it up from start to finish.
Reporting tasks are background tasks commonly responsible for communicating the system’s state, flow metrics, and other information typically used for monitoring and alerting. The current list of reporting tasks covers a great deal of information, and they are generally available as Site-to-Site reporting tasks. This generally means sending the data back into a flow to be filtered and transformed before it is sent to its destination. While this works incredibly well, if that flow would require a custom processor, you could develop a custom reporting task in its place. The full source is hosted on GitHub or GitLab.
Apache Nifi released version 1.4.0 in October 2017. I know we are late to talk about the release highlights, but better late than never. This will just be a quick rundown for anyone curious whether they should upgrade their cluster from a previous release to the new and shiny! As usual, the release notes are on Nifi’s Confluence page, the resolved issues are on their Jira, and you can download the newest and previous releases from the Nifi site.
There were 204 issues closed in total.
- Bug Fixes: 97
- Improvements: 87
- New Features: 14
A few other resolved tasks and subtasks make up the remaining 6 items.
The other day I was using an older version of Apache Nifi than the latest release and realized that the only way I could access the processor docs was through my locally running version. You can open the docs for your version from a processor’s right-click menu and even pop them out into a new tab or window. While this is nice, I wanted a quick rundown or searchable page outside of that environment where I could look up processors. That way I wouldn’t be tied to my instance, say if I wanted to look something up on my phone on the go, or whatever the use case may be. This prompted us to do something we had been talking about for quite some time: versioning our Apache Nifi processors page. All processor pages now have a dropdown that lets you select the most recent 0.x release as of this writing, 0.7.4, and the 1.x line from 1.2 on: 1.2, 1.3, and 1.4.
We think this is a pretty handy feature and hope that you all find it useful too! If you have any other thoughts or feedback about useful tools or posts, feel free to let us know at email@example.com.
Apache Nifi’s latest release is 1.2.0. I know we skipped the 1.0.0 release highlights entirely, so a quick breakdown of the overall changes in the jump from 0.x to 1.x follows, since there were quite a few major breaking changes. For 1.2.0, there were 381 issues closed or resolved, with a breakdown to follow. Apache Nifi 1.2.0 can be downloaded from Apache, the full release notes can be found on their Jira, and the release highlights are on their Confluence page.
A few other resolved tasks and subtasks make up the remaining 16 items, but since they were pretty basic, such as updating Jetty and jQuery, we’ll probably skip most of them (except for the official Docker image!)
I came across a question on the Nifi dev mailing list and thought it would make a good example of solving a real-world problem, building on our previous ExecuteScript post. As a side note, since Elasticsearch uses JSON for its documents and the PutElasticsearch processors expect the flow file content to be JSON, you could use the EvaluateJsonPath processor to put the field you want into an attribute.
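For comparison, here is a rough sketch of the ExecuteScript route (Script Engine set to python, which runs Jython). It is our own illustration, not the script from the mailing-list thread. It does roughly what EvaluateJsonPath with its Destination set to flowfile-attribute would do: it reads the flow file content, parses it as a single JSON object, and copies one top-level field into an attribute. The field name id and attribute name es.id are hypothetical placeholders. The script relies on the session, REL_SUCCESS and REL_FAILURE variables that ExecuteScript binds for scripts, and it skips the Nifi-specific part when those are absent, so extract_field can also be exercised with plain Python.

```python
import json

# Hypothetical names for illustration: the top-level JSON field to copy and
# the flow file attribute to copy it into.
FIELD_NAME = "id"
ATTRIBUTE_NAME = "es.id"

# Jython 2.7 (used by ExecuteScript) returns unicode strings from json.loads,
# so accept both str and unicode there; under Python 3 this is just str.
try:
    string_types = basestring  # Jython / Python 2
except NameError:
    string_types = str  # Python 3


def extract_field(text, field=FIELD_NAME):
    """Return a top-level field of a JSON object as a string, or None if absent."""
    doc = json.loads(text)  # raises ValueError on invalid JSON
    if not isinstance(doc, dict) or doc.get(field) is None:
        return None
    value = doc[field]
    # Attributes are strings, so non-string values are re-serialized as JSON.
    return value if isinstance(value, string_types) else json.dumps(value)


# ExecuteScript binds `session`, `REL_SUCCESS` and `REL_FAILURE` for the script.
# Outside Nifi they do not exist, so the processor logic below is skipped.
try:
    session
except NameError:
    session = None

if session is not None:
    from java.nio.charset import StandardCharsets
    from org.apache.commons.io import IOUtils
    from org.apache.nifi.processor.io import InputStreamCallback

    class ReadContent(InputStreamCallback):
        """Buffers the flow file content as a UTF-8 string."""

        def __init__(self):
            self.text = None

        def process(self, inputStream):
            self.text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

    flowFile = session.get()
    if flowFile is not None:
        reader = ReadContent()
        session.read(flowFile, reader)
        try:
            value = extract_field(reader.text)
        except ValueError:
            value = None  # content was not valid JSON
        if value is None:
            session.transfer(flowFile, REL_FAILURE)
        else:
            flowFile = session.putAttribute(flowFile, ATTRIBUTE_NAME, value)
            session.transfer(flowFile, REL_SUCCESS)
```

If all you need is the attribute, EvaluateJsonPath is the simpler choice. A script like this only earns its keep once you need logic that JsonPath can’t express.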
We’ve decided to start a new series on getting familiar with the different Apache Nifi services and processors, and I’m calling it “Getting Familiar”. We’ll try to post fairly often about different processors, using controller services, and configuring various things in Nifi. The first one in the series will be about the ExecuteScript processor.
Apache Nifi’s newest release, 0.7.0, is out. You can grab the binaries from their site as always. So let’s dive in and see what to look for in this release!
As always, there are a bunch of bug fixes, but this time there are also quite a few improvements.