Apache NiFi is an open source data ingestion and flow management system designed to automate the transfer of data between systems. It provides a visual, drag-and-drop interface for building pipelines that can ingest, route, transform, and deliver data across a wide variety of technologies. NiFi can handle structured, semi-structured, or unstructured data, and it provides reliability guarantees, traceability, and real-time, visible control of data flows. This combination makes it suitable for both fast prototyping and production pipelines.

Benefits of using NiFi

Organizations use NiFi because it provides the following:
  • Ease of use: Build data flows visually without writing custom code
  • Broad connectivity: Supports hundreds of processors that interact with files, databases, APIs, cloud services, and messaging systems
  • Flow management and governance: Includes back-pressure control, prioritization, data lineage, and detailed monitoring
  • Scalability: Designed to run on a single node or as a distributed cluster
  • Flexibility: Handles real-time streaming, batch processing, and everything in between
These characteristics make NiFi a general-purpose solution for moving and manipulating data across diverse environments.

Typical use cases

NiFi is commonly used to solve a wide range of data integration and movement challenges, such as:
  • Centralized data ingestion: Collects data from sources such as APIs, filesystems, IoT devices, cloud storage, messaging systems, and enterprise apps.
  • Distributed data routing: Routes data to different destinations based on content, schema, or attributes, enabling conditional workflows.
  • In-transit data transformation: Applies lightweight transformations such as format conversion, enrichment, filtering, attribute updates, and JSON/XML manipulation as data flows.
  • Streaming and event data processing: Publishes and subscribes to Kafka, Message Queuing Telemetry Transport (MQTT), Advanced Message Queuing Protocol (AMQP), and other messaging platforms to support real-time or near-real-time pipelines.
  • Batch to stream bridging: Integrates scheduled data ingestion jobs with continuous data streams for hybrid workflows.
  • System-to-system coordination: Acts as a bridge between different apps, storage systems, operational services, or analytic platforms.

Key concepts

Apache NiFi introduces several key concepts that form the foundation of how data moves through a flow. Understanding these concepts helps you design, manage, and troubleshoot data pipelines effectively.

FlowFile

A FlowFile is the basic unit of data in NiFi. It consists of two parts:
  • Content: The actual data, such as a JSON, CSV, or binary file.
  • Attributes: Key-value metadata associated with the content. Attributes are used to make routing and transformation decisions.
FlowFiles move through processors and queues as they progress through a dataflow.
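To make the two parts concrete, the following Python sketch models a FlowFile as a simple data structure. This is only an illustration of the concept, not NiFi's internal representation:

  from dataclasses import dataclass, field

  @dataclass
  class FlowFile:
      # Content: the raw bytes being moved through the flow
      content: bytes
      # Attributes: key-value metadata used for routing and transformation decisions
      attributes: dict = field(default_factory=dict)

  # A FlowFile carrying a small JSON payload; "filename" and "mime.type"
  # are standard NiFi attribute names
  record = FlowFile(
      content=b'{"sensor": "temp-01", "value": 21.5}',
      attributes={"filename": "reading.json", "mime.type": "application/json"},
  )
  print(record.attributes["mime.type"])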

Processor

A processor is a building block that performs specific operations on data, such as:
  • Reading from a source
  • Transforming content
  • Routing data
  • Writing to a destination
  • Communicating with external systems
NiFi includes many processors, each designed for a particular function or protocol.

Connection

A connection links processors together and acts as a queue for FlowFiles. Connections enable:
  • Buffering when downstream components slow down
  • Back-pressure handling
  • Prioritization and load distribution
They also visually represent the sequence of steps in a flow.

Process group

A process group is a container for organizing processors and sub-flows that helps you do the following:
  • Modularize complex flows
  • Reuse components
  • Improve readability and maintainability
Process groups can contain other process groups, allowing hierarchical flow design.

Controller service

A controller service provides shared, reusable configurations used by multiple processors. Typical examples include:
  • Database connection pools
  • SSL/TLS context configurations
  • Record readers and writers
  • Schema registries
Using controller services ensures consistency and reduces duplicated configuration.

Provenance

Data provenance provides full traceability of how data flows through a system. NiFi automatically tracks the following:
  • Where a FlowFile originated
  • What transformations occurred
  • What route it took through the flow
  • When it was processed and where it was delivered
This information is essential for debugging, auditing, and compliance.

Exploring the NiFi UI

Apache NiFi provides a web-based user interface for designing and managing dataflows. Accessing NiFi typically involves connecting to an instance running locally, on-premises, or in a cloud environment. NiFi is available at the following designated URL:
https://nifi.<client>.nx1cloud.com/nifi
When you purchase NexusOne, you receive a client name. Replace <client> with your assigned client name. The following sections describe the main areas of the NiFi UI.

Authentication

While NiFi supports multiple authentication mechanisms, NexusOne’s NiFi integration uses OpenID Connect (OIDC) for all user authentication. This means you must log in through the configured OIDC provider before accessing the NiFi UI.
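If you need to call the NiFi REST API programmatically, for example from automation scripts, requests must carry a bearer token issued through the same OIDC provider. The following Python sketch assumes you have already obtained a token out of band; the URL and token value are placeholders:

  import requests

  # Placeholders: replace <client> and the token with your own values
  NIFI_API = "https://nifi.<client>.nx1cloud.com/nifi-api"
  TOKEN = "eyJ..."  # bearer token issued by your OIDC provider

  # Query the overall flow status as a simple connectivity check
  resp = requests.get(
      f"{NIFI_API}/flow/status",
      headers={"Authorization": f"Bearer {TOKEN}"},
      timeout=30,
  )
  resp.raise_for_status()
  print(resp.json())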

User interface overview

When you log into NiFi, you should see the NiFi homepage. This is the primary workspace for building and managing flows. The interface includes the following core options:
  • Component toolbar: Options for adding components, starting/stopping flows, and accessing configuration views
  • Component palette: Describes each type of component you can add to a flow
  • Operate palette: Manages permissions, process groups, and more
  • Global menu: Provides access to system diagnostics, controller services, templates, and provenance
  • Status bar: Visual signals that show running components, errors, warnings, or back-pressure conditions
[Image: Apache NiFi homepage layout]

NiFi hands-on examples

NiFi supports a wide range of tasks. This section walks through some of the most common ones using hands-on examples.

Managing data flows

This section outlines the basic workflow for creating, configuring, and managing data flows.

Create a new flow

To create a new flow, add components to the canvas by doing the following:
  1. Open the NiFi UI and navigate to the canvas.
  2. Use the toolbar to select the component you want to add. This can be any of the following:
    • Processor
    • Process group
    • Input/Output port
    • Funnel
  3. Drag the component onto the canvas and position it as needed.
[Image: A sample flow]

Configure processors

Each processor defines a specific action within a flow. To configure a processor, do the following:
  1. Right-click or control-click a previously added processor, and then select Configure.
  2. Set required properties, such as the connection URLs, file paths, or format settings.
  3. Adjust Scheduling options such as the run frequency or concurrent tasks.
  4. Review and apply any relationships that determine FlowFile routing after processing.
NiFi can only start processors that are valid and fully configured.

Connect components

Connections define the path that data follows. To connect two or more components, do the following:
  1. Hover over a processor until the connection arrow appears, and then drag the arrow to another component to create a connection.
  2. Configure the connection to specify the following:
    • Accepted relationships
    • Back-pressure thresholds
    • Prioritization or load-balancing options
[Image: A sample of connected components]
Connections serve as queues, buffering FlowFiles between components.

Using variables

Variables help simplify configuration and improve maintainability. To use a variable, do the following:
  1. Define variables at the process group level.
  2. Reference variables in processor properties using ${variable_name} syntax.
Variables let you manage common values, such as directory paths, hostnames, or credential references, in one place.
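For example, suppose a process group defines a hypothetical variable named input.dir with the value /data/incoming. A GetFile processor can then reference it in its Input Directory property:

  Input Directory: ${input.dir}

If the directory later changes, you update the variable once instead of editing every processor that references it.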

Enable and use controller services

Controller services provide shared configurations used by multiple processors. They also help avoid redundant configuration and ensure consistency. To enable and use controller services, do the following:
  1. Open the controller services tab by right-clicking or control-clicking an empty canvas or a process group and selecting Controller Services.
  2. Add and configure a service, such as a database connection pool, record reader, or SSL context.
  3. Enable the service.
After you complete these steps, processors that depend on the service automatically detect and use it.

Start and stop components

NiFi allows granular control over component execution. This includes the following:
  • Start: Begin execution of a processor or process group.
  • Stop: Halt execution without removing configuration.
  • Enable or Disable: Control whether a component can run.
  • Start or stop process group: Apply actions to all components within the group.
Before starting a flow, ensure all required services are enabled and verify that all components are valid.

Additional options for managing a data flow

Effective flow management helps ensure reliable operation and simplifies debugging and maintenance. Several options help you manage and maintain dataflows. These include:
  • Status bar: Observe the status bar for component states, errors, and back-pressure.
  • Data provenance: Use data provenance to trace the lifecycle of FlowFiles.
  • Bulletin board: Check the bulletin board for warnings and error messages.
  • Templates or versioned flows: Apply templates or versioned flows for reusability and consistent deployment.

Managing processors

Processors are the core building blocks of NiFi dataflows. Each processor performs a specific operation on FlowFiles, such as ingesting data, transforming content, routing, or delivering data to a destination.
[Image: Processors]
While NiFi provides hundreds of processors, some are frequently used across a variety of flows. This section highlights commonly used processors, grouped by their typical functions.

Data ingestion processors

These processors ingest data into NiFi from various sources. One example is the GetFile processor. To configure the GetFile data ingestion processor, do the following:
  1. Drag a processor component onto the canvas.
  2. Search for the GetFile data ingestion processor, and click Add.
  3. Right-click or control-click the processor, and then select Configure.
  4. Set the Input Directory property to the folder containing files to ingest, then click Apply.
  5. Connect it to the next processor in your flow.
  6. Start each processor to begin pulling files into NiFi.
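For reference, a minimal GetFile configuration might look like the following; the directory path and filter value are placeholders:

  Input Directory:        /data/incoming
  File Filter:            .*\.csv
  Keep Source File:       false
  Recurse Subdirectories: true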
Other examples of data ingestion processors include:
  • FetchS3Object/ListS3: Retrieves objects from Amazon S3
  • ConsumeKafka/ConsumeKafkaRecord_2_0: Consumes messages from Kafka topics
  • ListenHTTP/HandleHttpRequest: Accepts data via HTTP requests
  • QueryDatabaseTable/ExecuteSQL: Reads data from relational databases

Data transformation processors

These processors modify or enrich the content or attributes of FlowFiles. One example is the UpdateAttribute processor. To configure the UpdateAttribute data transformation processor, do the following:
  1. Drag a processor component onto the canvas.
  2. Search for the UpdateAttribute data transformation processor, and click Add.
  3. Right-click or control-click the processor, and select Configure.
  4. Add a new attribute, then click Apply.
  5. Connect it to another processor.
  6. Start each processor to modify the FlowFile attributes.
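In step 4, each attribute is added as a dynamic property whose value can use NiFi's Expression Language. For example, the following hypothetical properties tag each FlowFile with a category and an ingest date:

  category:    sensor-data
  ingest.date: ${now():format('yyyy-MM-dd')}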
Other examples of data transformation processors include:
  • ReplaceText/ReplaceTextWithMapping: Performs string replacements or regular expression transformations
  • JoltTransformJSON: Transforms JSON data using Jolt specifications
  • AttributesToJSON/AttributesToXML: Converts FlowFile attributes to structured content
  • ConvertRecord/ConvertAvroToJSON/ConvertJSONToCSV: Converts data between formats

Data routing processors

These processors determine the direction of FlowFiles through a flow. One example is the RouteOnAttribute processor. To configure the RouteOnAttribute data routing processor, do the following:
  1. Drag a processor component onto the canvas.
  2. Select the RouteOnAttribute data routing processor, and click Add.
  3. Right-click or control-click the processor, and then select Configure.
  4. Configure a routing strategy.
  5. Connect it downstream of a processor that produces FlowFiles.
  6. Start each processor to route FlowFiles based on attributes.
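With the common Route to Property name strategy, each dynamic property you add in step 4 defines a named route whose value is an Expression Language condition. For example, this hypothetical property sends FlowFiles larger than 1 MB to a relationship named large:

  large: ${fileSize:gt(1048576)}

FlowFiles that match no condition are sent to the unmatched relationship.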
Other examples of data routing processors include:
  • RouteOnContent: Routes FlowFiles based on content patterns
  • RouteOnRelationship: Routes FlowFiles based on processing outcomes
  • MergeContent/MergeRecord: Combines multiple FlowFiles into one, based on content or schema

Data delivery processors

These processors send data to external systems. One example is the PutFile processor. To configure the PutFile data delivery processor, do the following:
  1. Drag a processor component onto the canvas.
  2. Select the PutFile data delivery processor, and then click Add.
  3. Right-click or control-click the processor, and then select Configure.
  4. Set the Directory property to the location where FlowFiles are written.
  5. Connect it downstream of the processor that produces the FlowFiles to deliver.
  6. Start each processor to deliver FlowFiles to the target directory.
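For reference, a minimal PutFile configuration might look like the following; the directory is a placeholder:

  Directory:                    /data/outgoing
  Conflict Resolution Strategy: replace
  Create Missing Directories:   true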
Other examples of data delivery processors include:
  • PutS3Object: Uploads FlowFiles to Amazon S3
  • PutDatabaseRecord/PutSQL: Writes data into relational databases
  • PublishKafka/PublishKafkaRecord_2_0: Sends messages to Kafka topics
  • PutHDFS/PutHBase: Writes data to Hadoop or HBase

Utility and monitoring processors

These processors provide operational or auxiliary functions. One example is the LogAttribute processor. To configure the LogAttribute utility and monitoring processor, do the following:
  1. Drag a processor component onto the canvas.
  2. Select the LogAttribute utility and monitoring processor, and then click Add.
  3. Right-click or control-click the processor, and then select Configure.
  4. Configure logging options, such as the log level, and then click Apply.
  5. Connect it downstream of the processor whose FlowFiles you want to log.
  6. Start each processor to log FlowFile attributes for monitoring and debugging.
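For step 4, a simple configuration sets the log level and controls whether the FlowFile content is also logged, for example:

  Log Level:   info
  Log Payload: false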
Other examples of utility and monitoring processors include:
  • GenerateFlowFile: Creates test data
  • UpdateRecord: Updates record fields using schema-aware operations
  • ValidateRecord: Validates data against a schema or rules
  • ControlRate: Throttles the FlowFile flow rate

Controller services and shared resources

A controller service is a NiFi component that exposes a common capability or configuration to other processors. Examples include:
  • Database connection pools
  • SSL/TLS context services for secure connections
  • Record readers and writers for structured data such as CSV, JSON, or Avro
  • Schema registries for data validation and transformation
[Image: Controller services]
Processors reference enabled controller services instead of duplicating configuration settings individually.

Add and configure a controller service

You can add and configure a controller service by doing the following:
  1. Right-click or control-click an empty area of the canvas or a process group, and then select Controller Services.
  2. Click the plus sign at the left to add a controller service.
  3. Search for a service type, select it, and click Add.
  4. Click the three dots at the right, and then select Edit to configure the service type properties.
  5. Click Apply to save your edit.
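As an example, a DBCPConnectionPool service (a database connection pool) is typically configured with properties such as the following; the URL and user are placeholders:

  Database Connection URL:    jdbc:postgresql://db.example.com:5432/analytics
  Database Driver Class Name: org.postgresql.Driver
  Database User:              nifi_user
  Password:                   (entered once; NiFi stores it as a sensitive property)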

Manage a controller service state

Controller services have an operational state that controls whether processors can use them. You change this state by doing the following:
  1. Add a controller service as described in the previous section.
  2. Right-click or control-click an empty area of the canvas or a process group, and then select Enable or Disable.
Any processor that references the service automatically detects and uses it.

Monitoring, logging, and troubleshooting

Effective monitoring and troubleshooting are essential to ensure NiFi dataflows operate reliably. NiFi provides a range of tools and features to observe flow activity, diagnose issues, and maintain operational visibility.

Monitor the flow

You can view the status of processors and connections by doing the following:
  1. Observe the status bar on processors and connections.
  2. Identify running, stopped, or invalid components.
  3. Identify back-pressure indicators on connections.
Connections act as queues between processors. You can inspect a queue by doing the following:
  1. Locate a connection between two components.
  2. View the queued FlowFile count and data size displayed on the connection.
  3. Right-click or control-click the connection to view additional queue details.
NiFi generates bulletins for processors reporting warnings or errors. You can view bulletins by doing the following:
  1. At the top right corner of the NiFi page, click the three lines and select Bulletin Board.
  2. Review the generated bulletins.
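You can also retrieve bulletins through the REST API, which is useful for feeding external alerting. The following Python sketch reuses the placeholder URL and token from the authentication example; exact response fields can vary between NiFi versions:

  import requests

  NIFI_API = "https://nifi.<client>.nx1cloud.com/nifi-api"  # placeholder
  HEADERS = {"Authorization": "Bearer eyJ..."}  # placeholder bearer token

  # Fetch the most recent bulletins (warnings and errors from components)
  resp = requests.get(
      f"{NIFI_API}/flow/bulletin-board",
      headers=HEADERS,
      params={"limit": 20},
      timeout=30,
  )
  resp.raise_for_status()
  for entry in resp.json()["bulletinBoard"]["bulletins"]:
      bulletin = entry["bulletin"]
      print(bulletin["level"], bulletin["sourceName"], bulletin["message"])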

View data provenance

NiFi tracks the full lifecycle of each FlowFile so it can provide visibility into how data moves through a flow. You can view data provenance by doing the following:
  1. At the top right corner of the NiFi page, click the three lines and select Data Provenance.
  2. Select a specific event to view its details or see its lineage.
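Provenance can also be queried through the REST API. Queries run asynchronously: you submit a request, poll until it finishes, and then read the events. A simplified Python sketch using the same placeholder URL and token, with error handling omitted:

  import time
  import requests

  NIFI_API = "https://nifi.<client>.nx1cloud.com/nifi-api"  # placeholder
  HEADERS = {"Authorization": "Bearer eyJ..."}  # placeholder bearer token

  # Submit an asynchronous provenance query for the most recent events
  body = {"provenance": {"request": {"maxResults": 100}}}
  prov = requests.post(f"{NIFI_API}/provenance", json=body,
                       headers=HEADERS, timeout=30).json()["provenance"]

  # Poll until NiFi finishes collecting results
  while not prov["finished"]:
      time.sleep(1)
      prov = requests.get(prov["uri"], headers=HEADERS, timeout=30).json()["provenance"]

  for event in prov["results"]["provenanceEvents"]:
      print(event["eventType"], event["componentName"])

  # Remove the completed query from the server
  requests.delete(prov["uri"], headers=HEADERS, timeout=30)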

View logs

NiFi logs events and system activity to files for operational analysis. You can view these logs by doing the following:
  1. Access the NiFi node.
  2. Navigate to the NiFi logs directory.
  3. Open log files to inspect runtime activity.
  4. Review entries categorized by log level, such as DEBUG, INFO, WARN, or ERROR.
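By default, NiFi writes its main application log to nifi-app.log in the logs directory, alongside nifi-user.log and nifi-bootstrap.log. The following Python sketch scans the application log for warnings and errors; the path is a placeholder you should adjust for your installation:

  # Print WARN and ERROR entries from the NiFi application log
  LOG_PATH = "/opt/nifi/logs/nifi-app.log"  # placeholder: adjust to your installation

  with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
      for line in log:
          if " WARN " in line or " ERROR " in line:
              print(line.rstrip())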

Troubleshooting tips

Use the following actions to inspect and resolve issues in a NiFi flow:
  • Check processor status and bulletins: Quickly identify processors that are failing or misconfigured.
  • Inspect logs: Look for errors, exceptions, or connectivity issues in the NiFi app logs.
  • Monitor queue sizes: Large queues may indicate bottlenecks or downstream performance issues.
  • Review data provenance: Trace the path of individual FlowFiles to understand where problems occur.
  • Test with small data samples: Use test FlowFiles to isolate and debug flow logic without processing large volumes of data.
  • Validate configuration: Ensure processors, controller services, and connections are properly configured and enabled.

NiFi best practices

Following best practices ensures that NiFi dataflows are reliable, maintainable, and efficient. This section summarizes key guidelines for building and managing flows effectively.

Flow design

The following best practices focus on structuring flows so they’re easy to maintain as complexity grows:
  • Minimize redundant processing: Avoid duplicating tasks by routing data efficiently and using shared resources.
  • Modularize with process groups: Break complex flows into smaller, logical groups to improve readability and maintainability.
  • Plan for scalability: Consider potential growth in data volume and design flows that can scale horizontally if needed.
  • Use descriptive names: Name processors, connections, and process groups clearly to reflect their purpose.

Data handling

The following best practices address how FlowFiles move through a flow:
  • Avoid blocking queues: Monitor and configure queue thresholds to prevent back-pressure from slowing flows unnecessarily.
  • Manage large files carefully: Use streaming or chunked processing for very large FlowFiles to avoid memory issues.
  • Use attributes wisely: Store metadata in FlowFile attributes to simplify routing and transformation.
  • Validate data: Use schema validation or record processors to catch errors early.

Processor configuration

The following best practices focus on configuring processors and shared services to ensure predictable behavior and efficient resource usage:
  • Check required properties: Ensure all processor properties are properly configured before starting.
  • Enable only what’s needed: Disable unused processors or services to reduce resource consumption.
  • Monitor dependencies: Before disabling or modifying a service, check which processors rely on it to avoid flow disruptions.
  • Test incrementally: Build and test flows in small increments to catch issues early.
  • Use controller services: Centralize shared connections and configuration for consistency and easier maintenance.

Monitoring and troubleshooting

The following best practices help detect, diagnose, and resolve operational issues during flow execution:
  • Document common issues: Maintain a knowledge base of known problems and solutions for your team.
  • Keep an eye on queue sizes: Large queues indicate potential bottlenecks or resource constraints.
  • Monitor bulletins and logs: Regularly review warnings, errors, and system messages to catch issues early.
  • Use data provenance: Trace FlowFiles to understand data paths and transformations.

Performance optimization

The following best practices focus on tuning scheduling, concurrency, and logging to maintain throughput under load:
  • Adjust scheduling and concurrency: Tune processor scheduling intervals and concurrent tasks based on data volume and system capacity.
  • Avoid excessive logging: Logging at the DEBUG level in production can impact performance; use it primarily for troubleshooting.
  • Balance flows: Distribute processing tasks evenly across processors to prevent hotspots.

Security and governance

The following best practices address access control, credential handling, and secure communication within and across NiFi flows:
  • Manage access control: Use NiFi’s policies and, if applicable, OIDC authentication to control who can view or modify flows.
  • Protect sensitive information: Use NiFi’s sensitive property handling to secure passwords, keys, and tokens.
  • Scope appropriately: Use process group-level services for local, isolated configurations. Use global services for common resources.
  • Use secure connections: Enable TLS/SSL for HTTP and data transfers.

Additional resources

For more details about NiFi, refer to the official NiFi documentation.