Debezium Server is a lightweight, standalone runtime for change data capture (CDC). It streams database changes from relational databases, NoSQL stores, and messaging systems directly to a target sink. In this case, the sink is a lakehouse implemented with Apache Iceberg tables. Because Debezium Server bundles all required components into a single unit, you do not need a full Kafka Connect stack. The NexusOne implementation of the server includes a specialized sink, based on the Debezium Iceberg Consumer, that writes CDC events into the Apache Iceberg tables that make up the lakehouse.
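For orientation, a single Debezium Server instance is driven by one application.properties file that defines both the source connector and the Iceberg sink. The following is a minimal sketch for a PostgreSQL source; the hostnames, credentials, table names, warehouse path, and catalog name are illustrative assumptions, and NexusOne manages these settings for you in the environments described below.
    # Source: PostgreSQL connector (values are placeholders)
    debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
    debezium.source.database.hostname=source-db.example.com
    debezium.source.database.port=5432
    debezium.source.database.user=debezium_user
    debezium.source.database.password=password
    debezium.source.database.dbname=inventory
    debezium.source.topic.prefix=nexusone
    debezium.source.table.include.list=schema1.orders,schema1.customers
    debezium.source.snapshot.mode=initial

    # Sink: Debezium Iceberg Consumer writing to the lakehouse
    debezium.sink.type=iceberg
    # Catalog and storage values are assumptions; NexusOne pre-populates the
    # Hive metastore and S3-compatible warehouse location by default.
    debezium.sink.iceberg.catalog-name=lakehouse
    debezium.sink.iceberg.warehouse=s3a://lakehouse-bucket/warehouse
    debezium.sink.iceberg.allow-field-addition=false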

Environments and runtime specifications

This section describes the NexusOne environments in which you can deploy Debezium Server instances, and the runtime specifications of the Debezium Server, such as software versions, supported connectors, and default configurations.

Deployment context

Within NexusOne, you can deploy Debezium Server instances in the following environments:
  • The NexusOne portal: The Ingest feature in the portal includes a data mirroring sub-feature for capturing streams of database changes. Using this sub-feature deploys a Debezium Server instance.
  • Public cloud or on-premises: When you purchase NexusOne to deploy in your own public cloud or on-premises environment, it comes pre-packaged with a Debezium Server.

Current version information

  • Debezium Server: 3.1.1
  • Debezium Iceberg Extension: 0.9.0
  • Apache Iceberg: 1.8.1

Default configurations

By default, within NexusOne, the Debezium Server writes CDC events to an Iceberg-compliant lakehouse composed of a Hive metastore and S3-compatible object storage. This configuration provides the following:
  • ACID transaction guarantees through the Iceberg table format
  • Schema evolution support for handling source schema changes
  • Time travel capabilities for historical data access
  • Automatic metadata management via Hive metastore
  • Scalable storage on S3-compatible infrastructure

Platform-managed default settings

NexusOne manages the following default configurations for the Debezium Server and Iceberg sink:
  • Snapshot mode: initial
    The initial mode performs a full table snapshot followed by streaming changes
  • Batch processing: Optimized batch sizes for lakehouse ingestion
  • Connection pooling: Pre-configured for typical workloads
  • Error handling: Automatic retry mechanisms with exponential backoff

Supported source connectors

NexusOne’s Debezium Server deployment supports the following database connectors:
  • PostgreSQL: Captures row-level changes using logical replication
  • SQL Server: Streams changes via SQL Server CDC or Change Tracking
  • MySQL: Monitors binary logs for data modification events
  • Oracle: Tracks changes through LogMiner or Oracle GoldenGate
Each connector provides full support for INSERT, UPDATE, and DELETE operations with before or after state capture, where applicable.

Source database configuration

Debezium Server relies on a source database to produce change events in a format it can consume. Proper configuration of each source database ensures that every insert, update, and delete is captured reliably and delivered to the lakehouse. Prerequisites and setup procedures vary by database vendor, and they include enabling CDC features, configuring replication slots, and setting appropriate permissions.

Implementation requirements

To ensure that Debezium Server can capture changes reliably, each database vendor requires specific configuration and user permissions. The following outlines the required settings for both aspects.
  • Database configuration: Enable the CDC mechanism specific to your database platform:
    • PostgreSQL: Requires logical replication configuration
    • SQL Server: Needs CDC or Change Tracking enabled
    • MySQL: Requires binary logging with ROW format
    • Oracle: Requires supplemental logging configuration
  • User permissions: Grant the privileges required for CDC operations:
    • Replication permissions for reading change streams
    • Metadata access for schema discovery
    • Connection privileges for establishing monitoring sessions
The following sections provide detailed, step-by-step configuration instructions for each database vendor to implement the previously listed requirements.

PostgreSQL configuration

PostgreSQL uses logical replication to expose change events.
  • Set wal_level to logical in the postgresql.conf file.
  • Create a replication slot for Debezium to consume changes.
  • Use the pgoutput logical decoding plugin, which is built in to PostgreSQL 10 and later and requires no separate installation.
  • Grant the following replication permissions to the Debezium user:
    CREATE ROLE debezium_user WITH REPLICATION LOGIN PASSWORD 'password';
    GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO debezium_user;
    ALTER DEFAULT PRIVILEGES IN SCHEMA schema_name GRANT SELECT ON TABLES TO debezium_user;
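You can verify this configuration and create the replication slot manually with statements such as the following; the slot name debezium_slot is an example, and Debezium can also create the slot itself if the user has the REPLICATION privilege.
    -- Confirm that the write-ahead log level is set to logical
    SHOW wal_level;
    -- Create a replication slot that uses the pgoutput plugin (slot name is an example)
    SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');
    -- Confirm that the slot exists and check whether it is active
    SELECT slot_name, plugin, active FROM pg_replication_slots;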
    

SQL Server configuration

SQL Server requires that you enable CDC at both the database and table levels.
  • Enable CDC at the database level:
    USE DatabaseName;
    EXEC sys.sp_cdc_enable_db;
    
  • Enable CDC for specific tables:
    EXEC sys.sp_cdc_enable_table
      @source_schema = N'dbo',
      @source_name = N'TableName',
      @role_name = NULL;
    
  • Grant appropriate permissions to the Debezium user:
    CREATE LOGIN debezium_user WITH PASSWORD = 'password';
    CREATE USER debezium_user FOR LOGIN debezium_user;
    EXEC sp_addrolemember 'db_owner', 'debezium_user';
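To confirm the configuration, you can query the CDC catalog views; the database name is a placeholder.
    -- Confirm that CDC is enabled for the database
    SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'DatabaseName';
    -- List the tables in the current database that are tracked by CDC
    SELECT name, is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 1;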
    

MySQL configuration

MySQL uses binary logging to track changes.
  • Enable binary logging in my.cnf:
    server-id = 1
    log_bin = mysql-bin
    binlog_format = ROW
    binlog_row_image = FULL
    expire_logs_days = 10
    
  • Grant replication permissions:
    CREATE USER 'debezium_user'@'%' IDENTIFIED BY 'password';
    GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT 
    ON *.* TO 'debezium_user'@'%';
    FLUSH PRIVILEGES;
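After restarting MySQL, you can confirm that the binary log settings and grants are in effect:
    -- Confirm that binary logging is enabled with the expected format
    SHOW VARIABLES LIKE 'log_bin';
    SHOW VARIABLES LIKE 'binlog_format';
    SHOW VARIABLES LIKE 'binlog_row_image';
    -- Review the privileges granted to the Debezium user
    SHOW GRANTS FOR 'debezium_user'@'%';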
    

Oracle configuration

Oracle requires supplemental logging and appropriate permissions.
  • Enable database-level supplemental logging:
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    
  • Enable table-level supplemental logging for specific tables:
    ALTER TABLE schema.table ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    
  • Grant required permissions:
    GRANT CREATE SESSION TO debezium_user;
    GRANT SELECT ON V_$DATABASE TO debezium_user;
    GRANT SELECT_CATALOG_ROLE TO debezium_user;
    GRANT EXECUTE_CATALOG_ROLE TO debezium_user;
    GRANT SELECT ANY TRANSACTION TO debezium_user;
    GRANT LOGMINING TO debezium_user;
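You can verify supplemental logging and the granted privileges with queries such as the following; they assume access to the relevant data dictionary views.
    -- Confirm that supplemental logging is enabled at the database level
    SELECT supplemental_log_data_min, supplemental_log_data_all FROM v$database;
    -- Review the system privileges granted to the Debezium user
    SELECT privilege FROM dba_sys_privs WHERE grantee = 'DEBEZIUM_USER';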
    
For more comprehensive setup instructions, troubleshooting guidance, and platform-specific considerations, refer to the Debezium connector documentation.

Deployment in the NexusOne portal

As previously described, you can deploy and configure Debezium Server instances by using the NexusOne Ingest feature within the portal, which includes a data mirroring sub-feature with customizable connector settings, transformation rules, and sink destinations. The following image shows the user interface of the data mirroring sub-feature in the NexusOne portal.
Image: Data mirroring in the NexusOne portal
You can do the following when interacting with this feature:
  • Configure source connector settings:
    • Select a database type such as PostgreSQL, SQL Server, MySQL, or Oracle.
    • Provide connection details such as the hostname, port, and database name.
    • Configure authentication credentials.
    • Specify schema and table filters.
  • Configure sink destination:
    • Use the default pre-populated Iceberg lakehouse configuration.
    • Customize target schema and table names.
    • Set up a partition strategy for large tables.
  • Configure optional settings:
    • Select a snapshot mode for initial data load.
    • Set a heartbeat interval for connection monitoring.
    • Configure schema evolution handling.
    • Set the error handling and retry policy.
  • Review the configuration summary and deploy.
  • Monitor deployment status and verify a successful connection.

Best practices for CDC pipelines

The following guidelines help you build reliable, performant CDC pipelines through appropriate table filtering, snapshot strategies, schema management, and monitoring.
  • Table filtering: Capture only the tables you need to minimize overhead and improve performance (see the filter sketch after this list). Ensure you apply the following best practices:
    • Use schema and table inclusion patterns during portal configuration.
    • Exclude audit, temporary, or staging tables. For example: table.include.list=schema1.orders,schema1.customers.
    • Avoid overly broad wildcards that capture unnecessary tables.
    • Review and refine table filters as your data model evolves.
  • Source database optimization: Optimize your source database configuration for CDC performance by applying the following best practices:
    • Connection management: Ensure adequate connection pool sizes on the database side.
    • Transaction log retention: Configure appropriate retention periods for CDC logs, such as the write-ahead log (WAL) for PostgreSQL and the binary logs for MySQL.
    • Database performance: Monitor source database load during initial snapshot and adjust maintenance windows accordingly.
    • Network bandwidth: Ensure sufficient bandwidth between the source database and NexusOne environment.
  • Schema change management: Though NexusOne manages schema evolution, you can prepare for schema changes by applying the following best practices:
    • Test schema changes: Always test schema modifications in a non-production environment first.
    • Coordinate changes: Plan schema changes with awareness of the current allow-field-addition=false default.
    • Monitor for incompatible changes: Be aware that certain data type changes may require pipeline reconfiguration. For example, from a STRING to an INTEGER.
  • Initial snapshot considerations: NexusOne uses the initial snapshot mode by default. To ensure smooth initial loads:
    • Schedule during off-peak hours: Run initial deployments when the source database load is low.
    • Large table handling: For very large tables, such as multi-terabyte tables, discuss incremental snapshot options with the NexusOne support team.
    • Monitor progress: Use the NexusOne portal to track snapshot completion status.
    • Network stability: Ensure stable network connectivity during the initial snapshot phase.
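As an example of the table filtering practice above, the portal's schema and table filters map to Debezium include and exclude lists similar to the following sketch, shown here for a PostgreSQL-style source; the schema and table names are placeholders.
    # Capture only the schemas and tables you need (names are examples)
    debezium.source.schema.include.list=schema1
    debezium.source.table.include.list=schema1.orders,schema1.customers
    # Alternative: exclude audit or staging tables instead (do not combine with an include list)
    # debezium.source.table.exclude.list=schema1.audit_log,schema1.staging_orders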

Understanding the CDC event structure

The Debezium Server produces structured change events that contain the before and after states of data, along with metadata about change operations.

Event structure components

Each Debezium CDC event contains components that describe the change and provide context for downstream consumers. These components include the following (an illustrative event appears after this list):
  • Envelope: Wraps the CDC event in a standard structure. The structure comprises the following:
    • before: The row state before the change; this field is null for INSERT operations
    • after: The row state after the change; this field is null for DELETE operations
    • op: The operation type: c indicates create, u indicates update, d indicates delete, and r indicates a read operation for snapshot events.
    • ts_ms: Timestamp of the change in milliseconds
    • source: Metadata about the source system and position
  • Source metadata: Provides traceability for each event. It includes the following:
    • Database name and schema
    • Table name
    • Transaction ID or Log Sequence Number (LSN)
    • Timestamp from source system
    • Connector name and version
  • Schema information: Describes the structure of the before or after values. It includes the following:
    • Field names and data types
    • Optional/required indicators
    • Default values, where applicable
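The following is an illustrative create event (op is c) for a hypothetical schema1.orders table captured from a PostgreSQL source; the field values are examples, and the exact source metadata fields vary by connector.
    {
      "before": null,
      "after": { "id": 1001, "customer_id": 42, "status": "NEW" },
      "source": {
        "connector": "postgresql",
        "db": "inventory",
        "schema": "schema1",
        "table": "orders",
        "lsn": 33107112,
        "ts_ms": 1712345678000
      },
      "op": "c",
      "ts_ms": 1712345678123
    }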

Iceberg sink transformation

When the Debezium Server streams events into Iceberg tables in the lakehouse, the sink transforms each raw event into a structured table format by doing the following (see the query sketch after this list):
  • Using the op field to determine the type of operation, such as an INSERT, UPDATE, or DELETE
  • Flattening the before and after values into columns
  • Optionally including metadata columns for audit and lineage tracking
  • Storing Debezium coordinates for idempotency
  • Extracting partition keys for efficient query performance
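For example, after flattening, changes to an orders table can be queried directly from the corresponding Iceberg table; the table path and the metadata column names (__op, __source_ts_ms) are assumptions and may differ in your deployment.
    -- Query recent updates from a flattened CDC table in the lakehouse
    -- (table path and metadata column names are illustrative)
    SELECT id, customer_id, status, __op, __source_ts_ms
    FROM lakehouse.schema1.orders
    WHERE __op = 'u'
    ORDER BY __source_ts_ms DESC
    LIMIT 100;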

Troubleshoot common issues

This section covers common problems encountered when running the Debezium Server and their resolutions.

Source database connection issues

  • Problem: The connector fails to connect to the source database.
  • User actions:
    • Verify database hostname and port are correct in your portal configuration.
    • Confirm that the source database credentials have appropriate permissions.
    • Ensure the database is properly configured for CDC:
      • PostgreSQL: wal_level=logical, replication slot created
      • SQL Server: CDC enabled at database and table levels
      • MySQL: Binary logging enabled with ROW format
      • Oracle: Supplemental logging enabled
    • Review connector logs available through the NexusOne portal for specific connection errors.
  • Platform support: Contact the NexusOne support team to review network connectivity and connector status if the connection issues persist after verifying source configuration.

Source database permission issues

  • Problem: The connector connects but fails to read change events or access the source tables.
  • User actions:
    • Review the database-specific permission requirements in the source setup section
    • Verify the Debezium user has:
      • Replication permissions for PostgreSQL and MySQL
      • CDC read permissions for SQL Server
      • LogMiner access for Oracle
      • SELECT permissions on captured tables
    • Check that privileges haven't been revoked since the initial setup
    • Test permissions by manually querying CDC structures with the Debezium user
  • Platform support: Contact the NexusOne support team if the permission grants appear correct but the issues persist.

Replication lag

  • Problem: CDC pipeline falls behind source database changes
  • User actions:
    • Review and optimize table filters to capture only necessary tables.
    • Monitor source database transaction log generation rate.
    • Consider scheduling large batch operations during off-peak hours.
    • Reduce transaction log retention if the generated volumes are very high.
  • Platform support: If replication lag persists with optimized filters, do the following:
    • Contact the NexusOne support team to discuss resource allocation adjustments for your instance.
    • For very high-volume sources, discuss scaling options or batch size optimization.

Schema evolution conflicts

  • Problem: Source schema changes cause pipeline failures or data inconsistencies
  • User actions:
    • Understand that debezium.sink.iceberg.allow-field-addition defaults to false.
    • Test all schema changes in a non-production environment first.
    • Plan schema modifications with awareness of current schema evolution settings.
    • Avoid incompatible type changes. For example, changing a string column to an integer.
  • Platform support: If you need to enable schema evolution:
    • Contact the NexusOne support team to request the allow-field-addition=true configuration; the team can also provide guidance on managing schema changes.
    • Discuss your specific schema evolution requirements and use case.
  • Workarounds: For urgent schema changes without schema evolution enabled:
    • Adding new columns on the source is possible, but the new fields might not be captured in the lakehouse until schema evolution is enabled.
    • Avoid renaming or dropping columns while an active CDC is in progress.
    • Consider redeploying the connector after significant schema changes.

Initial snapshot issues

  • Problem: The initial snapshot times out, fails to complete, or impacts the source database’s performance.
  • User actions:
    • Schedule initial deployment during database off-peak hours
    • Monitor source database load during snapshot process
    • Ensure adequate connection pool capacity on the source database
    • Verify network stability between source and NexusOne
  • Platform support: For large tables or persistent snapshot issues, do the following:
    • Contact NexusOne support to discuss incremental snapshot options or other snapshot-related configurations
    • For multi-terabyte tables, discuss alternative snapshot strategies or backfill approaches
    • Platform monitoring can identify if timeouts are occurring at the connector level
The default initial snapshot mode works well for most use cases. Modifications to snapshot behavior require platform configuration changes.

Data inconsistency or missing changes

  • Problem: You made some changes, but they appear to be missing in the Iceberg lakehouse.
  • User actions:
    • Verify table filters include all intended tables.
    • Check that you enabled the source database CDC for all captured tables.
    • Ensure the source database commits transactions. Uncommitted changes aren’t captured.
    • Review the source database CDC logs for any gaps or errors.
  • Platform support: If the source database configuration is correct, but data inconsistencies persist, do the following:
    • Contact the NexusOne support team to review connector offsets and checkpointing. They can also verify Iceberg sink processing and investigate potential issues in the CDC pipeline.
    • Provide specific examples of missing data, including timestamps and table names.

Additional resources