Debezium server is a lightweight, standalone runtime for Change Data Capture (CDC). It streams database changes from relational databases, NoSQL stores, and message systems directly to a target sink. In this case, the sink is a lakehouse implemented using Apache Iceberg tables. Debezium server removes the need for a full Kafka Connect stack because it bundles all components into a single unit. The NexusOne implementation includes a specialized sink, the Debezium Iceberg Consumer, that writes CDC events into the Apache Iceberg tables that make up the lakehouse.

Environments and runtime specifications

This section describes the NexusOne environments in which you can deploy Debezium server instances, and the runtime specifications of the Debezium server, such as the software versions, supported connectors, and default configurations.

Deployment context

Within NexusOne, you can deploy Debezium server instances in the following environments:
  • The NexusOne portal: The Ingest feature within the portal includes a data mirroring sub-feature for capturing streams of database changes. Using this sub-feature deploys a Debezium server.
  • Public cloud or on-premises: When you purchase NexusOne to deploy into your public cloud or on-premises, it comes pre-packaged with a Debezium server.

Current version information

  • Debezium server: 3.1.1
  • Debezium Iceberg Extension: 0.9.0
  • Apache Iceberg: 1.8.1

Default configurations

By default, within NexusOne, the Debezium server writes CDC events to an Iceberg-compliant lakehouse composed of a Hive metastore and S3-compatible object storage. This configuration provides the following:
  • ACID transaction guarantees through the Iceberg table format
  • Schema evolution support for handling source schema changes
  • Time travel capabilities for historical data access
  • Automatic metadata management via Hive metastore
  • Scalable storage on S3-compatible infrastructure

Platform-managed default settings

NexusOne manages the following default configurations for the Debezium Server and Iceberg sink:
  • Snapshot mode: initial
    The initial mode performs a full snapshot of the captured tables and then streams subsequent changes
  • Batch processing: Optimized batch sizes for lakehouse ingestion
  • Connection pooling: Pre-configured for typical workloads
  • Error handling: Automatic retry mechanisms with exponential backoff
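
To illustrate what retrying with exponential backoff means in practice, the following is a minimal sketch of how the wait between attempts could grow. The function name and the base, cap, and attempt values are hypothetical; the actual platform-managed values are not documented here.

```python
def backoff_delays(base_seconds: float, max_seconds: float, attempts: int) -> list[float]:
    """Return the wait before each retry: base * 2**n, capped at max_seconds."""
    return [min(max_seconds, base_seconds * (2 ** n)) for n in range(attempts)]


# Hypothetical values chosen only for illustration.
delays = backoff_delays(base_seconds=0.5, max_seconds=10.0, attempts=6)
print(delays)
```

Each delay doubles until it reaches the cap, which keeps a failing source or sink from being hammered with immediate retries.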

Supported source connectors

NexusOne’s Debezium server deployment supports the following database connectors:
  • PostgreSQL: Captures row-level changes using logical replication
  • SQL Server: Streams changes from the change tables populated by SQL Server CDC
  • MySQL: Monitors binary logs for data modification events
  • Oracle: Tracks changes through LogMiner or the Oracle XStream API, which requires an Oracle GoldenGate license
Each connector provides full support for INSERT, UPDATE, and DELETE operations with before and after state capture, where applicable.

Source database configuration

Debezium server relies on the source database to produce change events in a format it can consume. Proper configuration ensures that every insert, update, and delete is captured reliably and delivered to the lakehouse. Prerequisites and setup procedures vary by database vendor, but each vendor requires both database configuration, such as enabling CDC features or logical replication, and specific user permissions. The following outlines the required settings for both aspects.
  • Database configuration: Enable the CDC mechanism specific to your database platform:
    • PostgreSQL: Requires logical replication configuration
    • SQL Server: Requires CDC enabled at the database and table levels
    • MySQL: Requires binary logging with ROW format
    • Oracle: Requires supplemental logging configuration
  • User permissions: Grant the privileges required for CDC operations:
    • Replication permissions for reading change streams
    • Metadata access for schema discovery
    • Connection privileges for establishing monitoring sessions
The following sections provide detailed, step-by-step configuration instructions for each database vendor to implement the previously listed requirements.

PostgreSQL configuration

PostgreSQL uses logical replication to expose change events.
  • Set wal_level to logical in the postgresql.conf file.
  • Create a replication slot for Debezium to consume changes.
  • Use the pgoutput logical decoding plugin. The plugin is built into PostgreSQL 10 and later, so it needs no separate installation.
  • Grant the following replication permissions to the Debezium user:
    CREATE ROLE debezium_user WITH REPLICATION LOGIN PASSWORD 'password';
    GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO debezium_user;
    ALTER DEFAULT PRIVILEGES IN SCHEMA schema_name GRANT SELECT ON TABLES TO debezium_user;
    

SQL Server configuration

SQL Server requires that you enable CDC at both the database and table levels.
  • Enable CDC at the database level:
    USE DatabaseName;
    EXEC sys.sp_cdc_enable_db;
    
  • Enable CDC for specific tables:
    EXEC sys.sp_cdc_enable_table
      @source_schema = N'dbo',
      @source_name = N'TableName',
      @role_name = NULL;
    
  • Grant appropriate permissions to the Debezium user:
    CREATE LOGIN debezium_user WITH PASSWORD = 'password';
    CREATE USER debezium_user FOR LOGIN debezium_user;
    EXEC sp_addrolemember 'db_owner', 'debezium_user';
    

MySQL configuration

MySQL uses binary logging to track changes.
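  • Enable binary logging in the [mysqld] section of my.cnf:
    server-id = 1
    log_bin = mysql-bin
    binlog_format = ROW
    binlog_row_image = FULL
    # Deprecated in MySQL 8.0 and removed in 8.4; use binlog_expire_logs_seconds = 864000 there
    expire_logs_days = 10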
    
  • Grant replication permissions:
    CREATE USER 'debezium_user'@'%' IDENTIFIED BY 'password';
    GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT 
    ON *.* TO 'debezium_user'@'%';
    FLUSH PRIVILEGES;
    

Oracle configuration

Oracle requires supplemental logging and appropriate permissions.
  • Enable database-level supplemental logging:
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    
  • Enable table-level supplemental logging for specific tables:
    ALTER TABLE schema.table ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    
  • Grant required permissions:
    GRANT CREATE SESSION TO debezium_user;
    GRANT SELECT ON V_$DATABASE TO debezium_user;
    GRANT SELECT_CATALOG_ROLE TO debezium_user;
    GRANT EXECUTE_CATALOG_ROLE TO debezium_user;
    GRANT SELECT ANY TRANSACTION TO debezium_user;
    GRANT LOGMINING TO debezium_user;
    
For more comprehensive setup instructions, troubleshooting guidance, and platform-specific considerations, refer to the Debezium connector documentation.

Deployment in the NexusOne portal

As previously described, you can deploy and configure Debezium Server instances through the data mirroring sub-feature of the NexusOne Ingest feature. The sub-feature provides customizable connector settings, transformation rules, and sink destinations. The following image shows its user interface in the NexusOne portal.
Figure: Data mirroring in the NexusOne portal
You can do the following when interacting with this feature:
  • Configure source connector settings:
    • Select a database type such as PostgreSQL, SQL Server, MySQL, or Oracle.
    • Provide connection details such as the hostname, port, or database name.
    • Configure authentication credentials.
    • Specify schema and table filters.
  • Configure sink destination:
    • Use the default pre-populated Iceberg lakehouse configuration.
    • Customize target schema and table names.
    • Set up a partition strategy for large tables.
  • Configure optional settings:
    • Select a snapshot mode for initial data load.
    • Set a heartbeat interval for connection monitoring.
    • Choose how to handle schema evolution.
    • Set an error handling and retry policy.
  • Review the configuration summary and deploy.
  • Monitor deployment status and verify a successful connection.
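
As a rough illustration of how the source connector settings above relate to Debezium configuration, the following sketch turns form-style inputs into source properties. The connector class names and property keys are standard Debezium options. The function name, its arguments, the sample values, and the idea that the portal performs this exact mapping are assumptions made for illustration. The password and database name are omitted because their handling differs per connector.

```python
# Connector class names are the ones Debezium ships; everything else in this
# sketch (function name, arguments, sample values) is hypothetical.
CONNECTOR_CLASSES = {
    "postgresql": "io.debezium.connector.postgresql.PostgresConnector",
    "sqlserver": "io.debezium.connector.sqlserver.SqlServerConnector",
    "mysql": "io.debezium.connector.mysql.MySqlConnector",
    "oracle": "io.debezium.connector.oracle.OracleConnector",
}


def build_source_properties(db_type, hostname, port, user, tables,
                            topic_prefix, snapshot_mode="initial",
                            heartbeat_ms=0):
    """Translate form-style inputs into Debezium server source properties."""
    settings = {
        "connector.class": CONNECTOR_CLASSES[db_type],
        "database.hostname": hostname,
        "database.port": str(port),
        "database.user": user,
        "topic.prefix": topic_prefix,
        "table.include.list": ",".join(tables),
        "snapshot.mode": snapshot_mode,
        "heartbeat.interval.ms": str(heartbeat_ms),
    }
    # Debezium server reads connector options under the debezium.source. prefix.
    return {f"debezium.source.{key}": value for key, value in settings.items()}


# Hypothetical sample input.
props = build_source_properties(
    db_type="postgresql",
    hostname="db.example.internal",
    port=5432,
    user="debezium_user",
    tables=["public.orders", "public.customers"],
    topic_prefix="inventory",
    heartbeat_ms=10000,
)
for key in sorted(props):
    print(f"{key}={props[key]}")
```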

Understanding the CDC event structure

The Debezium server produces structured change events that contain the before and after states of data, along with metadata about change operations.

Event structure components

Each Debezium CDC event contains components that describe the change and provide context for downstream consumers. These components include the following:
  • Envelope: Wraps the CDC event in a standard structure. The structure comprises the following:
    • before: Row state before the change. For INSERT operations, this field is null
    • after: Row state after the change. For DELETE operations, this field is null
    • op: Operation type, where c indicates create, u indicates update, d indicates delete, and r indicates a read operation for snapshot events
    • ts_ms: Timestamp of the change in milliseconds
    • source: Metadata about the source system and position
  • Source metadata: Provides traceability for each event. It includes the following:
    • Database name and schema
    • Table name
    • Transaction ID or Log Sequence Number (LSN)
    • Timestamp from source system
    • Connector name and version
  • Schema information: Describes the structure of the before or after values. It includes the following:
    • Field names and data types
    • Optional/required indicators
    • Default values where applicable
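
To make the envelope concrete, the following sketch parses a sample update event and summarizes it. The field names follow the envelope described above and the source block of a PostgreSQL connector event. All values, the `describe` helper, and the `OP_NAMES` mapping are hypothetical. The schema portion of the event is omitted for brevity.

```python
import json

# Hypothetical update event, payload only.
RAW_EVENT = """
{
  "before": {"id": 1001, "email": "old@example.com"},
  "after":  {"id": 1001, "email": "new@example.com"},
  "op": "u",
  "ts_ms": 1700000000500,
  "source": {
    "connector": "postgresql",
    "name": "inventory",
    "db": "inventory",
    "schema": "public",
    "table": "customers",
    "txId": 556,
    "lsn": 24023128,
    "ts_ms": 1700000000000
  }
}
"""

OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def describe(event: dict) -> str:
    """Summarize an event as '<operation> on <db>.<schema>.<table> at <ts_ms>'."""
    source = event["source"]
    table = f'{source["db"]}.{source["schema"]}.{source["table"]}'
    return f'{OP_NAMES[event["op"]]} on {table} at {event["ts_ms"]}'


event = json.loads(RAW_EVENT)
summary = describe(event)
print(summary)
```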

Iceberg sink transformation

When the Debezium server streams events into Iceberg tables in a lakehouse, each event undergoes a transformation that converts the raw event into a structured table format by:
  • Using the op field to determine the type of operation, such as an INSERT, UPDATE, or DELETE
  • Flattening the before and after values into columns
  • Optionally including metadata columns for audit and lineage tracking
  • Storing Debezium coordinates for idempotency
  • Extracting partition keys for efficient query performance
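
The transformation steps above can be sketched as a function that flattens one event into one table row. This is an illustration under stated assumptions, not the Debezium Iceberg Consumer's actual code. The function name and the double-underscore metadata column names (`__op`, `__table`, `__db`, `__source_ts_ms`, `__lsn`, `__deleted`, `__event_date`) are hypothetical, as is the choice of an event date as the partition key.

```python
from datetime import datetime, timezone


def flatten_event(event: dict) -> dict:
    """Flatten a Debezium envelope into a single row with metadata columns."""
    op = event["op"]
    source = event["source"]
    # A delete has no "after" state, so keep the last known row from "before".
    row = dict(event["before"] if op == "d" else event["after"])
    # Metadata columns for audit and lineage tracking (hypothetical names).
    row["__op"] = op
    row["__table"] = source["table"]
    row["__db"] = source["db"]
    row["__source_ts_ms"] = source["ts_ms"]
    # Source coordinate kept so a replayed event can be recognized.
    row["__lsn"] = source.get("lsn")
    row["__deleted"] = op == "d"
    # Partition key derived from the source timestamp, in UTC.
    event_time = datetime.fromtimestamp(source["ts_ms"] / 1000, tz=timezone.utc)
    row["__event_date"] = event_time.date().isoformat()
    return row


# Hypothetical delete event.
delete_event = {
    "before": {"id": 1001, "email": "new@example.com"},
    "after": None,
    "op": "d",
    "ts_ms": 1700000000500,
    "source": {"db": "inventory", "table": "customers",
               "lsn": 24023200, "ts_ms": 1700000000000},
}
row = flatten_event(delete_event)
print(row)
```

Keeping the row's last known values on a delete, together with a deleted flag, lets downstream queries either filter out removed rows or audit them.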

Additional resources