Apache Iceberg is an open table format for large-scale analytical datasets in data lakes. It provides ACID transaction guarantees, schema evolution, and time travel capabilities. Traditional table formats, such as the Hive table format, tie a table’s metadata and read/write transactions to a single compute engine, and only that engine can manage the table while maintaining consistency. Iceberg instead separates the metadata from the compute engine, allowing multiple engines, such as Spark, Flink, Hive, or Impala, to read and write data concurrently while ensuring a consistent table state. Because NexusOne requires multi-engine access, strong consistency, and long-term schema evolution in its data lakehouse, the NexusOne team adopted Iceberg as its foundational table format. NexusOne stores Iceberg metadata in a Hive Metastore and data files in Amazon S3.

Key features

  • ACID transactions: Keep data consistent using serializable or snapshot isolation. Iceberg defaults to serializable isolation and lets multiple jobs read from and write to a table at the same time, with conflicting writes detected at commit and retried.
  • Schema evolution: Add, drop, or rename columns without rewriting data, with schema changes tracked in table metadata, so all readers see the updated schema.
  • Partitioning flexibility: Lets you change partitioning schemes, for example, from partitioning by region to partitioning by date, without rewriting existing files. New data uses the new partitioning scheme, old data remains accessible, and queries scan only the relevant partitions.
  • Time travel: Query historical table states using snapshot IDs or timestamps for auditing, debugging, and reproducible analytics.

Iceberg components

Iceberg’s architecture comprises the following components from top to bottom (the example after this list shows how to inspect several of them from Spark):
  1. Catalog: Contains a pointer to an Iceberg table’s metadata file
  2. Metadata file: Contains all information about a table. It’s a metadata.json file, and it comprises the following:
    1. Snapshots: Records the table’s content at specific points in time
    2. Manifest lists: Tracks all manifest files in a snapshot
    3. Manifest files: Tracks data files
  3. Data files: Parquet, ORC, or Avro files storing the table’s data
  4. Storage layer: S3, HDFS, or other object stores holding the metadata and data files
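The following sketch shows one way to inspect several of these layers from Spark using Iceberg’s metadata tables. It assumes a table such as iceberg_db.sales (created later in this guide) already exists and that a Spark session with the Iceberg catalog is available.
# Inspect the metadata layers of an existing table through Iceberg's
# metadata tables (assumes iceberg_db.sales already exists).

# Snapshots: the table's state at specific points in time
spark.sql("SELECT snapshot_id, committed_at, operation FROM iceberg_db.sales.snapshots").show()

# Manifest files tracked by the current snapshot
spark.sql("SELECT path, added_data_files_count FROM iceberg_db.sales.manifests").show()

# Data files referenced by the current snapshot
spark.sql("SELECT file_path, file_format, record_count FROM iceberg_db.sales.files").show()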

Environment configuration

This section describes how the NexusOne team configured Iceberg. The NexusOne environment currently supports the following versions and defaults; a sample Spark session configuration follows the list:
  • Iceberg version: 1.8
  • Table format version: 2
  • File format: Defaults to Parquet, but also supports Avro and ORC
  • Catalog type: Hive Metastore
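The following sketch shows how a Spark session for this environment might be configured. The Metastore URI and warehouse path are placeholders, not NexusOne’s actual endpoints.
# A minimal sketch of a Spark session configured for Iceberg with a Hive
# Metastore catalog and S3 storage. The thrift URI and warehouse path are
# placeholders, not actual NexusOne endpoints.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("nexusone-iceberg")
    # Enable Iceberg SQL extensions (MERGE INTO, CALL procedures, and so on)
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register an Iceberg catalog named "iceberg" backed by the Hive Metastore
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hive")
    .config("spark.sql.catalog.iceberg.uri", "thrift://metastore-host:9083")
    .config("spark.sql.catalog.iceberg.warehouse", "s3://datalake/iceberg/")
    .getOrCreate()
)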

Iceberg best practices

This section provides practical Iceberg examples that adhere to best practices.

Managing tables

You can create, alter, and drop tables with ACID guarantees. You can run these operations through Spark SQL or the DataFrame API, and both use the Hive Metastore for cataloging.

Create Iceberg tables

Creating an Iceberg table lets you store and manage large datasets with ACID guarantees and efficient queries. When creating an Iceberg table, adhere to the following best practices for better results:
  • Table structure: How you structure your table affects data storing and access. Key considerations:
    • Define column names and data types to shape the table schema.
    • Specify the USING iceberg clause to determine the table format.
    • Set the table location in S3 to determine where you store data and metadata.
    • The Hive Metastore then automatically registers the table.
  • Partitioning strategy: How you organize your data impacts query performance. Key considerations:
    • Use the PARTITIONED BY clause with columns of low-to-medium cardinality to improve efficiency.
    • Choose columns such as date, region, or category to help queries skip irrelevant data.
    • Use proper partitioning to allow queries to scan only relevant data.
  • Table location: A clear location structure helps manage both data and metadata files effectively. Key considerations:
    • Specify the S3 path with the LOCATION clause to determine where files reside.
    • Follow a consistent pattern, such as s3://datalake/iceberg/schema/table/ to improve organization.
    • Organize data and metadata files effectively to simplify maintenance.
  • Hive Metastore integration: Proper integration ensures consistent access and metadata management. Keep in mind:
    • The table registers automatically in the catalog
    • Metadata stays synchronized between Iceberg and Hive
    • Fully qualified names ensure consistent table access
The following example shows how to create Iceberg tables with different structures and partitioning strategies.
-- Basic table creation with partitioning
CREATE TABLE iceberg_db.sales (
    id BIGINT,
    amount DOUBLE,
    dt DATE
) USING iceberg
PARTITIONED BY (dt)
LOCATION 's3://datalake/iceberg/sales/';

-- Table with multiple partitions
CREATE TABLE iceberg_db.orders (
    order_id BIGINT,
    customer_id BIGINT,
    product_id STRING,
    quantity INT,
    total_amount DOUBLE,
    order_date DATE,
    region STRING
) USING iceberg
PARTITIONED BY (order_date, region)
LOCATION 's3://datalake/iceberg/orders/';

-- Unpartitioned table for small dimensions
CREATE TABLE iceberg_db.product_categories (
    category_id INT,
    category_name STRING,
    parent_category STRING
) USING iceberg
LOCATION 's3://datalake/iceberg/product_categories/';

Drop Iceberg tables

Dropping an Iceberg table removes the table definition and associated Iceberg metadata while optionally deleting the underlying data. When dropping an Iceberg table, adhere to the following best practices for better results:
  • Dropping a table in the standard way: When you want to remove the table from a catalog but keep the data intact using DROP TABLE, keep in mind that this does the following:
    • Removes the table from the Hive Metastore
    • Keeps data files in S3 by default
    • Retains associated Iceberg metadata in the table location
    • Makes the table schema inaccessible through the catalog
  • Dropping a table with a purge: When you want to permanently delete both a table and its data using a DROP TABLE with PURGE, keep in mind that this does the following:
    • Removes the table from the catalog
    • Deletes all data and associated Iceberg metadata files from S3
    • Permanently removes the table with no recovery option
    • Requires extra caution in production environments
  • Safety considerations: Always take precautions before dropping a table. Consider the following:
    • Verify the table name before dropping it.
    • Check for downstream dependencies.
    • Take a final snapshot before deletion.
    • Coordinate with the team when multiple users share a table.
The following example shows how to drop Iceberg tables using different options.
-- Drop table (keeps data files in S3)
DROP TABLE iceberg_db.sales;

-- Drop table and delete all data files
DROP TABLE iceberg_db.temp_table PURGE;

-- Check if table exists before dropping
DROP TABLE IF EXISTS iceberg_db.sales;

Alter table schema

Altering a table schema lets you add, drop, or change column types without rewriting data files. When altering an Iceberg table, adhere to the following best practices for better results:
  • Adding columns: How you add new columns determines whether existing data changes or downtime occurs. Use the ALTER TABLE ADD COLUMN syntax. This adds the column to the table schema, and old rows read as NULL for the new column.
  • Dropping columns: Remove columns safely without touching existing data files. Use the ALTER TABLE DROP COLUMN syntax.
  • Changing column types: Only perform safe type promotions to avoid data loss. Use the ALTER TABLE ALTER COLUMN syntax.
    • Allowed changes include:
      • INT to BIGINT
      • FLOAT to DOUBLE
    • Disallowed changes include:
      • STRING to INT
      • Other unsafe type conversions that risk data loss
  • Renaming columns: Change column names logically in metadata only. Physical column names in Parquet files remain unchanged, which ensures backward compatibility.
The following example shows practical ways to change an Iceberg table. This includes adding new columns, dropping old ones, renaming columns, and safely updating column types.
-- Add new column
ALTER TABLE iceberg_db.sales ADD COLUMN region STRING;

-- Drop obsolete column
ALTER TABLE iceberg_db.sales DROP COLUMN obsolete_column;

-- Rename column
ALTER TABLE iceberg_db.sales 
RENAME COLUMN customer_id TO account_id;

-- Change column type (safe promotion)
ALTER TABLE iceberg_db.sales 
ALTER COLUMN quantity TYPE BIGINT;

-- Add multiple columns
ALTER TABLE iceberg_db.sales 
ADD COLUMN discount_amount DOUBLE;

ALTER TABLE iceberg_db.sales 
ADD COLUMN promotion_code STRING;

Use a create table as select

Create Table as Select (CTAS) creates a new Iceberg table and populates it with data from a SELECT query in a single operation, using the query results to define the table’s schema automatically. When using CTAS with Iceberg tables, adhere to the following best practices for better results:
  • Partitioning in CTAS: How you define partitions affects query performance. Consider the following:
    • Add the PARTITIONED BY clause before the AS keyword.
    • Ensure partition columns appear in the SELECT list.
    • Iceberg creates partitions based on the actual data values.
  • Why use CTAS: CTAS creates a table and loads data in one step. The advantages include:
    • Iceberg automatically infers the schema from the SELECT query results.
    • It’s a fast way to create summary and derived tables.
    • Schemas don’t need to be explicitly defined.
  • Common use cases: Typical scenarios where CTAS works well include:
    • Creating summary tables from fact tables
    • Materializing complex query results
    • Copying tables with transformations
    • Creating filtered subsets of data
The following example shows practical uses of CTAS in Iceberg. It includes creating summary tables, applying partitions, performing aggregations, and copying filtered subsets of data.
-- Basic CTAS: Create summary table
CREATE TABLE iceberg_db.sales_summary
USING iceberg
AS
SELECT 
    dt, 
    SUM(amount) AS total_amount
FROM iceberg_db.sales
GROUP BY dt;

-- CTAS with partitioning
CREATE TABLE iceberg_db.sales_by_region
USING iceberg
PARTITIONED BY (region)
AS
SELECT 
    id,
    amount,
    dt,
    region
FROM iceberg_db.sales
WHERE dt >= '2025-01-01';

-- CTAS with aggregations
CREATE TABLE iceberg_db.monthly_revenue
USING iceberg
PARTITIONED BY (month)
AS
SELECT 
    DATE_TRUNC('month', dt) as month,
    region,
    COUNT(*) as transaction_count,
    SUM(amount) as total_revenue
FROM iceberg_db.sales
WHERE dt >= '2024-01-01'
GROUP BY DATE_TRUNC('month', dt), region;

-- CTAS copying entire table with filters
CREATE TABLE iceberg_db.sales_2024
USING iceberg
AS
SELECT * 
FROM iceberg_db.sales
WHERE YEAR(dt) = 2024;

Configure partitioning and clustering

When working with Iceberg partitions, adhere to the following best practices for better results:
  • Partitioning strategies: How you partition your data affects query performance. The common strategies include:
    • Identity partitioning: Partition by the exact column value.
    • Date transforms: Extract year, month, day, or hour from date columns.
    • Bucket transforms: Apply hash-based distribution for high-cardinality columns.
    • Truncate transforms: Partition by string prefixes.
  • Hidden partitioning: Iceberg lets you simplify queries while maintaining performance. Key points include:
    • Queries don’t need to reference partitions in the WHERE clause.
    • Iceberg automatically prunes irrelevant partitions.
    • Partition columns don’t appear separately in query results.
  • Choosing partition columns: Selecting the right partition columns is critical. Consider the following:
    • Use columns frequently referenced in WHERE filters.
    • Prefer low-to-medium cardinality columns.
    • Choose columns such as date, region, category, or country.
    • Avoid over-partitioning; aim for at least 1 GB of data per partition.
  • Partition evolution: You can change partitioning without rewriting old data. Consider the following:
    • Modifying the partition spec affects only new data, which uses the new partitions.
    • Old data remains in its original partitions and stays accessible.
    • Queries read both old and new partition layouts seamlessly.
The following example shows different partitioning types as well as how queries automatically use partitions without requiring explicit partition syntax.
-- Identity partitioning (exact values)
CREATE TABLE iceberg_db.events (
    event_id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_date DATE,
    region STRING
) USING iceberg
PARTITIONED BY (event_date, region)
LOCATION 's3://datalake/iceberg/events/';

-- Date transform partitioning (monthly)
CREATE TABLE iceberg_db.logs (
    log_id BIGINT,
    log_timestamp TIMESTAMP,
    message STRING
) USING iceberg
PARTITIONED BY (months(log_timestamp))
LOCATION 's3://datalake/iceberg/logs/';

-- Bucket partitioning for high cardinality
CREATE TABLE iceberg_db.user_events (
    event_id BIGINT,
    user_id BIGINT,
    event_data STRING,
    event_timestamp TIMESTAMP
) USING iceberg
PARTITIONED BY (
    days(event_timestamp),
    bucket(100, user_id)
)
LOCATION 's3://datalake/iceberg/user_events/';

-- Query automatically uses partitions (no partition syntax needed)
SELECT *
FROM iceberg_db.sales
WHERE dt = '2025-01-01'  -- Partition pruning automatic
  AND region = 'US';

Iceberg tables in a Hive Metastore

Iceberg tables integrate with Hive Metastore to leverage Hive’s capabilities while keeping Iceberg’s advanced features like schema evolution and partitioning. Hive Metastore does the following:
  • Syncs table metadata between Iceberg and Hive
  • Manages namespace resolution for Iceberg tables that appear as external tables in Hive
  • Stores partition information in Iceberg metadata files
  • Doesn’t require Hive partition syntax for Iceberg tables
When using a Hive Metastore, do the following:
  • View partition values through the table’s partitions metadata table, as shown in the example below.
  • Ensure a Metastore connection is available for table operations.
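The following sketch shows how partition values and catalog registration can be checked from Spark. It assumes the iceberg_db.sales table used in the earlier examples.
# Partition values come from Iceberg metadata, not Hive partition entries.
spark.sql("SELECT * FROM iceberg_db.sales.partitions").show()

# Confirm the table is registered in the Hive Metastore catalog.
spark.sql("SHOW TABLES IN iceberg_db").show()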

Reading/querying tables

Iceberg tables support standard SQL and DataFrame queries, while automatically optimizing reads with features like time travel, predicate pushdown, and column pruning.

Query Iceberg tables with SQL

You can execute SQL queries against Iceberg tables with automatic partition pruning and column-level filtering for optimized performance. When querying Iceberg tables with SQL, adhere to the following best practices for better results:
  • SQL querying: Consider the following:
    • Use fully qualified table names, such as iceberg_db.sales.
    • Write standard SELECT queries with WHERE, GROUP BY, and ORDER BY.
    • Rely on Iceberg to handle partition pruning transparently.
    • Avoid special partition syntax in queries.
  • Filtering and selection: How you define your query determines what Iceberg reads. Consider the following:
    • Use WHERE clauses to filter rows.
    • Select only the required columns to limit data reads.
    • Join Iceberg tables with other Iceberg or external tables.
    • Apply aggregate functions like SUM, COUNT, and AVG.
  • Automatic optimizations: Iceberg optimizes queries at runtime. These optimizations include:
    • Partition pruning based on WHERE predicates
    • Column pruning to read only selected fields
    • Predicate pushdown using file-level statistics
    • Reduced data scanning and improved performance
The following example shows common query patterns on Iceberg tables, such as filtering rows, selecting specific columns, aggregating data, and joining data with other tables.
-- Basic query with filtering
SELECT * FROM iceberg_db.sales WHERE dt = '2025-01-01';

-- Select specific columns with filtering
SELECT 
    id,
    amount,
    dt
FROM iceberg_db.sales
WHERE dt >= '2025-01-01'
  AND amount > 1000;

-- Aggregation query
SELECT 
    dt,
    COUNT(*) as transaction_count,
    SUM(amount) as total_amount,
    AVG(amount) as avg_amount
FROM iceberg_db.sales
WHERE dt BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY dt
ORDER BY dt;

-- Join with another table
SELECT 
    s.id,
    s.amount,
    s.dt,
    c.customer_name,
    c.region
FROM iceberg_db.sales s
JOIN iceberg_db.customers c ON s.customer_id = c.customer_id
WHERE s.dt = '2025-01-01';

Query Iceberg tables with the DataFrame API

You can use the Spark DataFrame API to read and query Iceberg tables programmatically. This provides flexibility for complex transformations and pipeline integration. When querying Iceberg tables with the DataFrame API, adhere to the following best practices for better results:
  • Reading Iceberg tables: How you load the table determines how you work with the data. Consider the following:
    • Use spark.read.format("iceberg").load().
    • Specify the table name or path when loading.
    • Work with the Spark DataFrame result.
  • DataFrame operations: Use the Spark DataFrame API to shape and analyze data. Consider the following:
    • filter() for row-level filtering
    • select() for column projection
    • join() to combine multiple tables
    • groupBy() and agg() for aggregations
  • Integration with pipelines: Iceberg fits naturally into Spark workflows. Consider the following:
    • Chain multiple DataFrame operations.
    • Combine Iceberg tables with other Spark data sources.
    • Write the results to new tables or output files.
The following example shows how to read Iceberg tables using the Spark DataFrame API and apply everyday operations like filtering, aggregation, and joins.
# Read Iceberg table using the Spark DataFrame API
df = spark.read.format("iceberg").load("iceberg_db.sales")

# Show first few rows
df.show()

# Filter data
df_filtered = df.filter(df.dt == '2025-01-01')
df_filtered.show()

# Select specific columns with filter
df_subset = df.select("id", "amount", "dt") \
    .filter(df.dt >= '2025-01-01') \
    .filter(df.amount > 1000)

# Aggregation
from pyspark.sql import functions as F

df_summary = df.filter(df.dt >= '2025-01-01') \
    .groupBy("dt") \
    .agg(
        F.count("*").alias("transaction_count"),
        F.sum("amount").alias("total_amount"),
        F.avg("amount").alias("avg_amount")
    ) \
    .orderBy("dt")

df_summary.show()

# Join tables
df_customers = spark.read.format("iceberg").load("iceberg_db.customers")

df_joined = df.join(df_customers, "customer_id") \
    .select(
        df.id,
        df.amount,
        df.dt,
        df_customers.customer_name,
        df_customers.region
    ) \
    .filter(df.dt == '2025-01-01')

df_joined.show()

Read historical snapshots

You can query Iceberg tables as they existed at previous points in time using snapshot IDs or timestamps. This enables auditing and reproducible analytics, and is often referred to as time travel. When reading historical snapshots in Iceberg, adhere to the following best practices for better results:
  • Timestamp-based time travel: Query the table as it existed at a specific point in time. Consider the following:
    • Use FOR SYSTEM_TIME AS OF in SQL.
    • Specify the exact timestamp you want to query.
    • Read data as it existed at that moment.
  • Snapshot-based time travel: Query a precise table state using a snapshot ID. Consider the following:
    • Use FOR SYSTEM_VERSION AS OF with a snapshot ID.
    • Gain precise control for reproducible queries.
    • Retrieve snapshot IDs from table metadata.
  • DataFrame API time travel: Use the DataFrame API to programmatically time-travel. Consider the following:
    • Use .option("as-of-timestamp") for timestamp-based reads.
    • Use .option("snapshot-id") for snapshot-based reads.
    • Continue using the same read.format("iceberg") API.
  • Limitations: Be aware of the following constraints when using time travel:
    • Time travel availability depends on snapshot retention policies.
    • Avoid querying expired snapshots.
    • Expect slower queries when accessing archived files.
  • Common use cases: Time travel is useful in scenarios such as:
    • Auditing historical data states
    • Debugging pipeline failures
    • Meeting compliance and reporting requirements
    • Comparing metrics across different table versions
The following example shows how to query historical table states using SQL.
-- Time travel using timestamp
SELECT * FROM iceberg_db.sales 
FOR SYSTEM_TIME AS OF '2025-01-20';

-- Time travel with filtering
SELECT 
    id,
    amount,
    dt
FROM iceberg_db.sales 
FOR SYSTEM_TIME AS OF '2025-01-20 10:30:00'
WHERE dt >= '2025-01-15';

-- Time travel using snapshot ID
SELECT * FROM iceberg_db.sales
FOR SYSTEM_VERSION AS OF 1234567890;
The following example shows how to query historical table states using the Spark DataFrame API.
# DataFrame API: Time travel with timestamp
# The as-of-timestamp option takes milliseconds since the epoch;
# 1737331200000 corresponds to 2025-01-20 00:00:00 UTC.
df_historical = spark.read \
    .option("as-of-timestamp", "1737331200000") \
    .format("iceberg") \
    .load("iceberg_db.sales")

df_historical.filter(df_historical.dt >= '2025-01-15').show()

# DataFrame API: Time travel with snapshot ID
df_snapshot = spark.read \
    .option("snapshot-id", 1234567890) \
    .format("iceberg") \
    .load("iceberg_db.sales")

df_snapshot.show()

Predicate pushdown and column pruning

You can use Iceberg’s metadata to skip unnecessary data files and read only the required columns. This significantly improves query performance. When optimizing Iceberg queries, adhere to the following best practices for better results:
  • Predicate pushdown: When reducing the data scanned by applying filters before reading files, consider the following:
    • Apply WHERE conditions against file-level statistics.
    • Skip files that don’t match the predicates.
    • Use min/max stats in manifest files.
    • Avoid reading unnecessary file contents.
  • Column pruning: When reading only the columns you need to save I/O and memory, consider the following:
    • Use SELECT for specific columns instead of SELECT *.
    • Use Parquet’s columnar format to read only the requested columns.
  • Partition pruning: When skipping entire partitions based on filters, consider the following:
    • Filter on partition columns to skip irrelevant partitions.
    • Place partition filters first in the WHERE clause for maximum pruning.
    • Prioritize pruning large tables because it yields the most benefit.
  • Combined optimization: When applying all techniques together for maximum performance, consider the following:
    • Use EXPLAIN to verify applied optimizations, as shown at the end of this section.
    • Enable metadata caching for tables that are frequently accessed.
The following example shows how to apply Iceberg query optimizations using SQL.
-- Partition Pruning: Skips all partitions except 2025-01-01
SELECT * FROM iceberg_db.sales
WHERE dt = '2025-01-01';

-- Metadata / Manifest Pruning: Uses file stats to skip files
SELECT * FROM iceberg_db.sales
WHERE region = 'EU';

-- Snapshot Pruning for time travel queries
SELECT * FROM iceberg_db.sales
FOR SYSTEM_TIME AS OF '2025-01-20';

-- Combined: Partition + predicate + column pruning
SELECT 
    id,
    amount,
    customer_id
FROM iceberg_db.sales
WHERE dt = '2025-01-01'  -- Partition pruning
  AND amount > 1000      -- Predicate pushdown
  AND region = 'EU';     -- Additional predicate
The following example shows how to apply Iceberg query optimizations using the Spark DataFrame API.
# Metadata Caching in Spark
spark.conf.set("spark.sql.catalog.iceberg.cache-enabled", True)

# DataFrame API with optimizations
from pyspark.sql import functions as F

# Column pruning: Select only needed columns
df = spark.read.format("iceberg").load("iceberg_db.sales") \
    .select("id", "amount", "dt", "region")

# Predicate pushdown: Filter pushed to Iceberg
df_filtered = df.filter(
    (F.col("dt") == "2025-01-01") &  # Partition pruning
    (F.col("amount") > 1000) &       # Predicate pushdown
    (F.col("region") == "EU")        # Predicate pushdown
)

df_filtered.show()
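To confirm that these optimizations are applied, you can inspect the physical plan. The following sketch reuses df_filtered from the example above; the pushed filters and selected columns should appear in the scan node of the plan.
# Inspect the physical plan to confirm filter pushdown and column pruning
# on the DataFrame from the previous example.
df_filtered.explain(True)

# The same check works for SQL queries.
spark.sql("""
    EXPLAIN
    SELECT id, amount
    FROM iceberg_db.sales
    WHERE dt = '2025-01-01' AND amount > 1000
""").show(truncate=False)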

Writing and inserting data

Writing data to Iceberg always creates a new snapshot. Each write provides ACID guarantees, and you can append new data, overwrite existing data, or merge changes without breaking readers.

Append data

You can add new data to Iceberg tables without affecting existing records. This creates new snapshots that maintain a complete table history. When appending data to Iceberg tables, adhere to the following best practices for better results:
  • SQL INSERT INTO: Append data using SQL-based workflows. Key points include:
    • Use the INSERT INTO table SELECT syntax.
    • Append rows from a source query.
    • Create new data files and a new snapshot.
    • Leave existing data unchanged.
  • DataFrame write API: Write data programmatically using Spark. Consider the following:
    • Use df.write.format("iceberg").mode("append") for basic appends.
    • Use the legacy write() API for simple operations.
    • Prefer writeTo() for advanced Iceberg features.
    • For each write, Iceberg creates a new snapshot.
  • Atomic operations: Iceberg guarantees consistency during writes. Keep in mind:
    • Writes are atomic and either fully succeed or fail
    • Partial writes are never visible to readers
    • Concurrent writes are safe; conflicting commits are detected and retried (see the configuration sketch after this list)
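Commit retry behavior is controlled through Iceberg table properties. The following minimal sketch tunes the documented commit.retry.* properties for a table with many concurrent writers; the values shown are illustrative, not recommended defaults.
# Illustrative commit retry tuning for a table with many concurrent writers;
# the values below are examples only (Iceberg retries 4 times by default).
spark.sql("""
    ALTER TABLE iceberg_db.sales SET TBLPROPERTIES (
        'commit.retry.num-retries' = '8',
        'commit.retry.min-wait-ms' = '100',
        'commit.retry.max-wait-ms' = '60000'
    )
""")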
The following example shows how to append data to an Iceberg table using an SQL INSERT INTO statement.
-- SQL: Insert from SELECT
INSERT INTO iceberg_db.sales 
SELECT * FROM staging.sales;

-- SQL: Insert with filtering
INSERT INTO iceberg_db.sales
SELECT 
    id,
    amount,
    dt,
    region
FROM staging.daily_sales
WHERE dt = '2025-01-24';
The following example shows how to append data using the Spark DataFrame API.
# DataFrame API: Append mode
df = spark.read.parquet("s3://landing/sales/2025-01-24/")

df.write.format("iceberg").mode("append").save("iceberg_db.sales")

# Alternative: writeTo() API (recommended for Iceberg)
df.writeTo("iceberg_db.sales").append()

Overwrite data

You can overwrite data to replace table contents entirely or specific partitions. This creates new snapshots while maintaining history through time travel. When overwriting data in Iceberg tables, consider the impact of the following approaches:
  • Full table overwrite: This approach replaces all data in the table. Keep in mind:
    • It marks old data files for deletion
    • A new snapshot points only to the new files
    • This affects all data and requires extra caution
  • Partition overwrite: This approach limits the overwrite to specific partitions. Consider the following:
    • Selected partition gets replaced
    • All other partitions remain unchanged
    • Safer than a full table overwrite
    • Requires a partitioned table
  • Dynamic partition overwrite: This approach overwrites only the partitions present in the incoming data. Key points include:
    • Automatically determines which partitions to replace
    • Only affected partitions are overwritten
    • This is the most common overwrite pattern
    • Safer for partitioned tables
The following example shows how to overwrite data using SQL.
-- SQL: Overwrite entire table
INSERT OVERWRITE TABLE iceberg_db.sales
SELECT * FROM staging.sales_full_refresh;

-- SQL: Overwrite specific partition
INSERT OVERWRITE TABLE iceberg_db.sales
SELECT * 
FROM staging.daily_sales
WHERE dt = '2025-01-24';
The following example shows how to overwrite data using the Spark DataFrame API.
# DataFrame API: Full overwrite
df_full = spark.read.parquet("s3://landing/sales/full/")

df_full.write.format("iceberg") \
    .mode("overwrite") \
    .save("iceberg_db.sales")

# DataFrame API: Dynamic partition overwrite
df_partition = spark.read.parquet("s3://landing/sales/2025-01-24/")

df_partition.writeTo("iceberg_db.sales") \
    .overwritePartitions()
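The behavior of the SQL INSERT OVERWRITE examples above depends on Spark’s partition overwrite mode; in static mode (Spark’s default), the overwrite can replace more than the matching partitions. The following minimal sketch, which assumes a partitioned iceberg_db.sales table, enables dynamic mode before running a partition-level refresh.
# Enable dynamic partition overwrite so INSERT OVERWRITE replaces only the
# partitions produced by the query (Spark defaults to static mode).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

spark.sql("""
    INSERT OVERWRITE iceberg_db.sales
    SELECT * FROM staging.daily_sales
    WHERE dt = '2025-01-24'
""")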

Merge, update, and delete operations

You can perform upserts, updates, and deletes using MERGE INTO statements for complex data modification patterns, such as Change Data Capture (CDC) and slowly changing dimensions. When modifying existing data in Iceberg tables, consider the following operations:
  • SQL MERGE INTO: Use this when you need to update existing rows and insert new ones in a single operation. Key points include:
    • Match source and target rows using a join condition.
    • Update rows that match the condition.
    • Insert rows that don’t match.
    • Execute as a single atomic operation.
    • Use MERGE INTO because it’s more efficient than a DELETE + INSERT operation.
    • Run compaction after heavy MERGE INTO, UPDATE, or DELETE operations to reduce the number of files.
  • SQL UPDATE: Use this to change values in existing rows. Keep in mind:
    • Modify only rows that match the condition.
    • Rewrite the affected data files.
    • Create a new snapshot after the update.
  • SQL DELETE: Use this to remove rows from a table. Consider the following:
    • Remove rows that match the condition.
    • Rewrite affected data files without the deleted rows.
    • Access old data through time travel if needed.
The following example shows how to modify existing Iceberg table data using MERGE INTO, UPDATE, and DELETE operations.
-- MERGE: Basic upsert pattern
MERGE INTO iceberg_db.sales AS target
USING updates AS source
ON target.id = source.id
WHEN MATCHED THEN 
    UPDATE SET target.amount = source.amount
WHEN NOT MATCHED THEN 
    INSERT *;

-- MERGE: Selective column updates
MERGE INTO iceberg_db.customers AS target
USING staging.customer_updates AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN 
    UPDATE SET 
        target.email = source.email,
        target.phone = source.phone,
        target.updated_at = current_timestamp()
WHEN NOT MATCHED THEN 
    INSERT (customer_id, name, email, phone, created_at)
    VALUES (source.customer_id, source.name, source.email, 
            source.phone, current_timestamp());

-- MERGE: With DELETE condition
MERGE INTO iceberg_db.inventory AS target
USING staging.inventory_updates AS source
ON target.product_id = source.product_id
WHEN MATCHED AND source.status = 'DISCONTINUED' THEN 
    DELETE
WHEN MATCHED THEN 
    UPDATE SET target.quantity = source.quantity
WHEN NOT MATCHED THEN 
    INSERT *;

-- UPDATE: Modify existing rows
UPDATE iceberg_db.customers
SET status = 'INACTIVE'
WHERE last_login < '2024-01-01';

-- DELETE: Remove rows
DELETE FROM iceberg_db.sales
WHERE dt < '2024-01-01';

Best practices for S3 storage and partitioning

You can optimize write operations for S3 storage with proper file sizing, compression, and partitioning strategies. When tuning Iceberg write performance, adhere to the following best practices for better results:
  • Target file size: File size directly affects read efficiency and metadata overhead. Consider the following:
    • Aim for data files around the default 512 MB.
    • Adjust file size using table properties when needed.
    • Balance file count against file size; fewer, larger files reduce metadata overhead.
    • Avoid creating small files below 100 MB.
  • Compression: Compression choices affect both storage cost and performance. Key considerations include:
    • Use Snappy compression for fast reads and writes by default.
    • Consider gzip for higher compression ratios.
    • Use zstd for a balance between compression and speed.
    • Tune compression through table properties.
  • Partitioning strategy: Good partitioning improves both write and query performance. Keep in mind:
    • Partition by frequently filtered columns.
    • Aim for partitions that are 1 GB or larger.
    • Avoid over-partitioning, which creates many small files.
    • Prefer date-based partitioning for time-series workloads.
  • Write patterns: How you write data impacts file layout and long-term performance. Consider the following:
    • Favor batch writes over frequent small appends.
    • Use dynamic partition overwrite for incremental loads.
    • Schedule compaction after heavy write activity.
The following example shows how to tune Iceberg write performance using file size, compression, and efficient write patterns.
# Configure write properties as Iceberg table properties
spark.sql("""
    ALTER TABLE iceberg_db.sales SET TBLPROPERTIES (
        'write.target-file-size-bytes' = '536870912',   -- 512MB
        'write.parquet.compression-codec' = 'zstd'
    )
""")

# Write with optimal file size
df.write.format("iceberg") \
    .option("write.target-file-size-bytes", "536870912") \
    .mode("append") \
    .save("iceberg_db.sales")

# Batch write pattern (efficient)
df_daily = spark.read.parquet("s3://landing/sales/2025-01-24/")
df_daily.writeTo("iceberg_db.sales").append()

# Avoid: Many small appends (inefficient)
# for file in small_files:
#     df_small = spark.read.parquet(file)
#     df_small.writeTo("iceberg_db.sales").append()  # Creates many small files

Snapshot management during writes

You can manage snapshots during writes to understand how each operation creates a new snapshot and control snapshot growth for optimal metadata performance. When working with Iceberg snapshots, consider the following:
  • Snapshot creation: Each write operation generates a new snapshot. Keep in mind:
    • Snapshots point to all current data files
    • Previous snapshots remain available for time travel
    • Metadata grows with every write
  • Snapshot accumulation: Frequent writes produce many snapshots. Consider the following:
    • Avoid creating unnecessary snapshots because each one adds to metadata storage and slows query planning.
    • Monitor snapshot count via metadata queries.
    • Periodically expire old snapshots to maintain performance.
  • Snapshot history: Historical snapshots enable auditing and reproducibility. Key considerations include:
    • Balance retention with performance needs.
    • Use a typical retention period of 7-30 days.
    • Configure retention policies based on your requirements (see the sketch after this list).
    • Keep sufficient history for auditing needs.
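Snapshot retention can also be expressed as table properties, which snapshot expiration treats as defaults when explicit arguments aren’t given. The following sketch sets the documented history.expire.* properties; the 7-day and 10-snapshot values are illustrative, not NexusOne-mandated settings.
# Illustrative snapshot retention defaults stored as table properties;
# snapshot expiration uses these when explicit arguments aren't provided.
spark.sql("""
    ALTER TABLE iceberg_db.sales SET TBLPROPERTIES (
        'history.expire.max-snapshot-age-ms' = '604800000',   -- 7 days
        'history.expire.min-snapshots-to-keep' = '10'
    )
""")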
The following example shows how writes create snapshots in Iceberg.
# Normal write creates snapshot
df.writeTo("iceberg_db.sales").append()
# New snapshot created automatically

# Multiple writes create multiple snapshots
for batch in data_batches:
    batch.writeTo("iceberg_db.sales").append()
    # Each append creates new snapshot

# View snapshot history
spark.sql("""
    SELECT 
        committed_at,
        snapshot_id,
        operation
    FROM iceberg_db.sales.snapshots
    ORDER BY committed_at DESC
    LIMIT 10
""").show()

Schema and partition evolution

Iceberg supports non-disruptive changes to a schema and its partitions.

Add or drop columns

You can add new columns or remove obsolete columns from table schemas without modifying existing data files. When adding or dropping columns in an Iceberg table, adhere to the following best practices for better results:
  • Adding columns: Add new fields without rewriting existing data. Consider the following:
    • Use ALTER TABLE ADD COLUMN.
    • Rely on Iceberg to update the schema in metadata immediately.
    • Leave existing Parquet files unchanged.
    • Expect old rows to return NULL for the new column.
    • Expect the operation to complete with zero downtime.
  • Dropping columns: Remove columns safely without touching physical data. Keep in mind:
    • Use ALTER TABLE DROP COLUMN.
    • Rely on Iceberg to remove the column from the schema definition.
    • Expect physical data to remain in Parquet files.
    • Expect reads to automatically skip a dropped column.
    • Run compaction later to reclaim disk space.
  • Multiple operations: Apply several add or drop operations over time. Consider the following:
    • Execute multiple ADD or DROP operations in sequence.
    • Expect each change to create a new schema version.
    • Rely on Iceberg to track all schema versions in metadata.
The following example shows several ways to add or drop Iceberg columns.
-- Add new column
ALTER TABLE iceberg_db.sales ADD COLUMN region STRING;

-- Add column with position
ALTER TABLE iceberg_db.sales 
ADD COLUMN discount_amount DOUBLE AFTER amount;

-- Add multiple columns
ALTER TABLE iceberg_db.sales ADD COLUMN promo_code STRING;
ALTER TABLE iceberg_db.sales ADD COLUMN channel STRING;

-- Drop obsolete column
ALTER TABLE iceberg_db.sales DROP COLUMN obsolete_column;

-- Drop multiple columns
ALTER TABLE iceberg_db.sales DROP COLUMN temp_field;
ALTER TABLE iceberg_db.sales DROP COLUMN legacy_id;

Change column data types

You can change column data types to wider, compatible types without rewriting data, as long as the change follows safe type promotion rules. When changing column data types in Iceberg, adhere to the following best practices for better results:
  • Safe type promotions: Perform only data type changes that don’t risk data loss. These include:
    • Integer widening: INT to BIGINT
    • Float widening: FLOAT to DOUBLE
    • Decimal precision increase: DECIMAL(10,2) to DECIMAL(20,2)
    • No data conversion is required for these promotions
  • Unsupported conversions: Avoid data type changes that require rewriting data. These include:
    • Narrowing types: BIGINT to INT
    • Changing type families: STRING to INT
    • Altering decimal scale
    • Such changes require rewriting existing data files
  • Type promotion rules: Follow these guidelines to maintain compatibility:
    • The new data type must be wider than the old one
    • This ensures old files remain readable with the new schema
    • Data remains consistent across queries
The following example shows how to change a column’s data type in Iceberg safely.
-- Safe type promotions
ALTER TABLE iceberg_db.sales 
ALTER COLUMN quantity TYPE BIGINT;

ALTER TABLE iceberg_db.sales 
ALTER COLUMN price TYPE DOUBLE;

ALTER TABLE iceberg_db.sales 
ALTER COLUMN total_amount TYPE DECIMAL(20,4);

-- These would fail (unsafe conversions)
-- ALTER TABLE iceberg_db.sales ALTER COLUMN amount TYPE INT;  -- Cannot narrow
-- ALTER TABLE iceberg_db.sales ALTER COLUMN id TYPE STRING;   -- Cannot change type family

Partition evolution

You can evolve a table’s partitioning over time by changing its partition specifications without rewriting existing data. When evolving partitions in Iceberg, adhere to the following best practices for better results:
  • Add partition fields: Add new partitions without rewriting existing data. Key points include:
    • Use ALTER TABLE ADD PARTITION FIELD.
    • Expect new writes to use the updated partition spec.
    • Expect old data to remain in original partitions.
    • Expect queries to access both old and new partitions seamlessly.
  • Replace partition fields: Change the partitioning scheme for future data. Consider the following:
    • Use ALTER TABLE REPLACE PARTITION FIELD.
    • Apply the new partition spec for future writes.
    • Expect the historical data to remain unchanged.
  • Drop partition fields: Safely remove unnecessary partition fields. Key points include:
    • Use ALTER TABLE DROP PARTITION FIELD.
    • Iceberg removes the field from the partition spec.
    • Expect the existing partitions to remain queryable.
    • Expect new data writes to use the updated spec.
  • Hidden partitioning: Iceberg simplifies queries regardless of partition changes. Keep in mind:
    • You don’t need to know the partition structure.
    • Iceberg manages partition evolution automatically.
    • You aren’t required to modify queries.
The following example shows how to add, replace, and drop Iceberg partition fields while queries continue to work across both old and new partitions.
-- Add new partition field
ALTER TABLE iceberg_db.events
ADD PARTITION FIELD hours(event_timestamp);

-- Replace partition field (change granularity)
ALTER TABLE iceberg_db.events
REPLACE PARTITION FIELD days(event_timestamp) WITH months(event_timestamp);

-- Drop partition field
ALTER TABLE iceberg_db.sales
DROP PARTITION FIELD region;

-- Query works seamlessly with evolved partitions
SELECT * 
FROM iceberg_db.events
WHERE event_timestamp >= '2025-01-24'  -- Works with both old and new partitions
  AND event_type = 'click';

Snapshot-based versioning

You can track all schema and partition changes through snapshots. This enables version history and rollback capabilities. When managing schema changes in Iceberg, adhere to the following best practices for better results:
  • Schema versioning: Iceberg tracks every schema modification. Keep in mind:
    • Each change creates a new schema version
    • The table metadata records versions
    • Old schema versions remain available for time travel
    • Readers automatically use the correct schema version for the snapshot they query
  • Snapshot linkage: Schema versions link to snapshots. Consider the following:
    • Queries of historical snapshots use the schema at that time
    • Reads remain forward and backward-compatible
  • Rollback support: You can revert schema changes safely. Consider the following:
    • Rollback to a previous snapshot if needed.
    • The table reverts to the old schema automatically.
    • It’s useful for recovering from accidental or bad changes.
The following example shows several ways to manage an Iceberg schema.
-- View table history (each snapshot is tied to a schema version)
SELECT * FROM iceberg_db.sales.history;

-- View current schema
DESCRIBE iceberg_db.sales;

-- Query with old schema via time travel
SELECT * 
FROM iceberg_db.sales
FOR SYSTEM_TIME AS OF '2025-01-20'  -- Uses schema from that time
WHERE dt = '2025-01-19';

-- Rollback table (reverts schema and data)
CALL iceberg.system.rollback_to_snapshot('iceberg_db.sales', 1234567890);
When doing snapshot-based versioning, follow these best practices:
  • Plan schema changes during maintenance windows
  • Test changes in a development environment first
  • Document schema evolution for team awareness
  • Use time travel to verify schema changes
  • Coordinate partition evolution with query patterns

Snapshots and time travel

Snapshots provide immutable versions for auditing and recovery.

Query historical snapshots

You can access table data as it existed at previous points in time using timestamps or snapshot IDs. When querying historical snapshots in Iceberg, adhere to the following best practices for better results:
  • Timestamp-based queries: Access the table as it existed at a specific time. Key points include:
    • Use FOR SYSTEM_TIME AS OF with the desired timestamp.
    • Iceberg finds the nearest snapshot before that timestamp.
    • Iceberg returns data exactly as it existed at that moment.
  • Snapshot ID queries: Access a precise table state using a snapshot ID. Consider the following:
    • Use FOR SYSTEM_VERSION AS OF with the snapshot ID.
    • Iceberg provides exact control over which snapshot to query.
    • Iceberg enables reproducible queries with the exact snapshot.
  • Common use cases: Time travel is helpful in scenarios such as:
    • Auditing: “What was the balance on Dec 31?”
    • Debugging: “Which data caused the error?”
    • Compliance: “Show Q4 report as originally published”
    • Comparison: “How did metrics change?”
The following example shows several ways to query historical snapshots.
-- Query data as of specific timestamp
SELECT * FROM iceberg_db.sales 
FOR SYSTEM_TIME AS OF '2025-01-20';

-- Time travel with filtering and aggregation
SELECT 
    dt,
    SUM(amount) as total_revenue
FROM iceberg_db.sales 
FOR SYSTEM_TIME AS OF '2025-01-20 23:59:59'
WHERE dt >= '2025-01-15'
GROUP BY dt
ORDER BY dt;

-- Query specific snapshot by ID
SELECT * FROM iceberg_db.sales
FOR SYSTEM_VERSION AS OF 1234567890;

-- Compare snapshots
SELECT 
    'Current' as version,
    COUNT(*) as record_count,
    SUM(amount) as total_amount
FROM iceberg_db.sales
UNION ALL
SELECT 
    'Yesterday' as version,
    COUNT(*) as record_count,
    SUM(amount) as total_amount
FROM iceberg_db.sales
FOR SYSTEM_TIME AS OF CURRENT_DATE - INTERVAL 1 DAY;

Restore a previous table version

You can roll back a table to a previous snapshot or version. This reverts both data and schema changes while maintaining complete history. When rolling back an Iceberg table, adhere to the following best practices for better results:
  • Rollback operation: Use rollback to revert the table to a previous version. Key points include:
    • Use ROLLBACK TO SNAPSHOT to create a new snapshot pointing to the old version.
    • A rollback operation only updates metadata; it doesn’t delete any data files.
    • A rollback operation is nearly instant since it’s metadata-only.
  • What gets restored: Rolling back affects the following aspects of the table:
    • Table data via file pointers
    • Table schema
    • Partition specification
    • Table properties
  • Safety considerations: Rollbacks are safe and auditable. Keep in mind:
    • Original snapshots remain intact
    • You can undo a rollback if needed
    • You don’t lose data
    • All operations are fully auditable
The following example shows different ways to perform a rollback.
-- List available snapshots
SELECT * FROM iceberg_db.sales.snapshots;

-- Rollback to specific snapshot ID
CALL iceberg.system.rollback_to_snapshot('iceberg_db.sales', 1234567890);

-- Rollback to timestamp (finds nearest snapshot)
CALL iceberg.system.rollback_to_timestamp(
    'iceberg_db.sales',
    TIMESTAMP '2025-01-20 00:00:00'
);

-- Verify rollback worked
SELECT COUNT(*), MAX(dt) 
FROM iceberg_db.sales;

Work with snapshot IDs and timestamps

You can identify and reference specific snapshots using snapshot IDs or commit timestamps. This is helpful for precise table version control. When working with Iceberg snapshots, adhere to the following best practices for better results:
  • Snapshot ID: Each snapshot has a unique identifier. Key points include:
    • Snapshot ID is a long integer
    • Expired snapshot IDs aren’t reused
    • Snapshot ID enables exact and reproducible queries
  • Commit timestamp: Snapshots also record their creation time. Keep in mind:
    • Stored as a timezone-aware timestamp
    • Used for “as-of” queries to access historical table states
    • The timestamp may not match a commit exactly; Iceberg uses the nearest snapshot before it
  • Listing snapshots: You can inspect table history and operations. Consider the following:
    • Query the snapshots metadata table to list snapshots
    • View snapshot metadata, such as operations and summaries
    • Understand the type of change recorded in each snapshot
The following example shows how to list, query, and find snapshots at specific timestamps.
-- List all snapshots with details
SELECT * FROM iceberg_db.sales.snapshots
ORDER BY committed_at DESC;

/* Output columns:
   committed_at: Timestamp when snapshot created
   snapshot_id: Unique snapshot identifier
   parent_id: Previous snapshot ID
   operation: append, overwrite, delete, etc.
   summary: Statistics about the operation
*/

-- Query specific snapshot
SELECT * FROM iceberg_db.sales
FOR SYSTEM_VERSION AS OF 5234567890123456789;  -- Exact snapshot

-- Find snapshot at specific time
SELECT 
    snapshot_id,
    committed_at,
    operation
FROM iceberg_db.sales.snapshots
WHERE committed_at <= TIMESTAMP '2025-01-20 00:00:00'
ORDER BY committed_at DESC
LIMIT 1;

-- Use that snapshot ID for query
SELECT * FROM iceberg_db.sales
FOR SYSTEM_VERSION AS OF <snapshot_id_from_above>;
When working with snapshot IDs and timestamps, follow these best practices:
  • Document important snapshot IDs
  • Use timestamps for ad-hoc queries
  • Use snapshot IDs for reproducible reports
  • List snapshots before rollback operations
  • Coordinate snapshot retention with audit requirements

Data pruning and maintenance operations

Data pruning and maintenance operations improve query performance by reducing the amount of data scanned. They also keep table files and metadata optimized for fast and consistent queries.

Compaction

You can consolidate small data files into larger files to improve query performance and reduce metadata overhead. Compaction is important because small files can affect query performance in the following ways:
  • Many small data files increase metadata operations and API calls
  • More data files make query planning slower
  • Opening many small files is less efficient than reading fewer large files
  • Each data file adds to metadata, creating bloat in the manifest file due to file tracking
Consider running compaction in the following scenarios:
  • After high-volume inserts with small batches
  • Following CDC loads that create many small files
  • When streaming ingestion creates hourly micro-batches
  • As part of a scheduled maintenance, either daily or weekly
  • When query planning time exceeds acceptable thresholds
There are two types of compaction to consider:
  • Data file compaction: Merges small Parquet files into larger ones
  • Manifest compaction: Consolidates manifest metadata files
When running both, perform data file compaction first, then manifest optimization. The following example shows different ways to run data file compaction in Iceberg.
-- Compact all data files in a table
CALL iceberg.system.rewrite_data_files('iceberg_db.sales');

-- Compact specific partition only
CALL iceberg.system.rewrite_data_files(
    table => 'iceberg_db.sales',
    where => 'dt = ''2025-01-24'''
);

-- Compact with target file size
CALL iceberg.system.rewrite_data_files(
    table => 'iceberg_db.sales',
    options => map('target-file-size-bytes', '536870912')  -- 512MB
);

-- Compact files smaller than threshold
CALL iceberg.system.rewrite_data_files(
    table => 'iceberg_db.sales',
    options => map('min-file-size-bytes', '104857600')  -- Only files < 100MB
);

Manifest optimization

You can optimize query planning by managing how Iceberg stores and maintains manifest metadata. Manifest files accumulate naturally as the table evolves, which results in the following:
  • Each write operation creates or updates manifest files
  • Manifest files track data files and their statistics
  • Many small manifests build up over time
  • Each additional manifest adds to the query planning overhead
As the manifest count grows, query planning becomes slower, which results in the following:
  • The query engine must read every manifest file during planning
  • More manifests mean more S3 read operations
  • Planning time increases before query execution starts
  • Metadata storage costs rise with the manifest count
Rewriting manifests helps reduce this overhead and improve performance in the following ways:
  • Consolidates many small manifest files into fewer ones
  • Reduces the total manifest count significantly
  • Speeds up query planning
  • Works efficiently when run periodically as part of maintenance
The following example shows different ways to optimize manifest metadata by rewriting manifest files.
-- Rewrite manifest files for table
CALL iceberg.system.rewrite_manifests('iceberg_db.sales');

-- Rewrite manifests without Spark caching (useful for very large tables)
CALL iceberg.system.rewrite_manifests(
    table => 'iceberg_db.sales',
    use_caching => false
);

-- Check manifest count before/after
SELECT COUNT(*) as manifest_count
FROM iceberg_db.sales.manifests;

Expire old snapshots

You can control storage growth and metadata size by expiring snapshots that are no longer needed. Snapshots naturally accumulate in the following ways as the table changes:
  • Each write operation creates a new snapshot
  • Older snapshots continue to reference files that are no longer part of the table
  • Snapshot metadata grows over time
  • Unreferenced files increase S3 storage costs
As snapshots accumulate, it becomes necessary to remove older ones. Consider expiring old snapshots when any of the following apply:
  • Storage costs have grown noticeably
  • Time-travel history exceeds auditing or debugging needs
  • Enough time has passed to meet retention requirements. This is commonly 7-90 days.
  • Snapshot expiration runs as part of scheduled maintenance
When expiring snapshots, the following retention strategies control what Iceberg removes or keeps:
  • Keep the most recent N snapshots to retain the most recent table changes.
  • Expire snapshots older than a timestamp to enforce time-based retention.
  • Combine both strategies to keep the most recent N snapshots plus anything newer than a given timestamp.
The following example shows how to expire old Iceberg snapshots using different retention strategies.
-- Expire snapshots older than a specific date
CALL iceberg.system.expire_snapshots(
    'iceberg_db.sales',
    TIMESTAMP '2025-01-01 00:00:00'
);

-- Keep only the last 10 snapshots
CALL iceberg.system.expire_snapshots(
    table => 'iceberg_db.sales',
    retain_last => 10
);

-- Combined: Keep the last 5 OR newer than 30 days
CALL iceberg.system.expire_snapshots(
    table => 'iceberg_db.sales',
    older_than => TIMESTAMP '2024-12-25 00:00:00',
    retain_last => 5
);

-- Aggressive cleanup for space reclamation
CALL iceberg.system.expire_snapshots(
    table => 'iceberg_db.sales',
    older_than => CURRENT_TIMESTAMP - INTERVAL 7 DAY,
    retain_last => 3
);

Remove orphan files

You can reclaim storage and keep Iceberg tables clean by removing orphan files that are no longer tracked in metadata. Orphan files appear in a table for several reasons:
  • Write operations that fail mid-execution
  • Aborted transactions that don’t commit
  • Manual file operations ran directly in S3
  • Partially completed maintenance tasks, such as failed compactions
These orphan files can create the following problems over time:
  • They consume S3 storage unnecessarily
  • They aren’t tracked in table metadata
  • Normal Iceberg operations can’t remove them
  • They accumulate if not cleaned up regularly
The following example shows how to safely remove orphan files in Iceberg, with options for retention, dry runs, and targeting specific locations.
-- Remove orphan files (safe with 3-day retention)
CALL iceberg.system.remove_orphan_files('iceberg_db.sales');

-- Custom retention period (7 days)
CALL iceberg.system.remove_orphan_files(
    table => 'iceberg_db.sales',
    older_than => TIMESTAMP '2025-01-17 00:00:00'
);

-- Dry run to see what would be deleted
CALL iceberg.system.remove_orphan_files(
    table => 'iceberg_db.sales',
    dry_run => true
);

-- Remove from specific location
CALL iceberg.system.remove_orphan_files(
    table => 'iceberg_db.sales',
    location => 's3://datalake/iceberg/sales/data/'
);

Table layout optimization

Optimizing a table layout helps organize how Iceberg stores data within table files. You can optimize the table layout by setting a proper sort order in your Iceberg tables. This involves the following steps:
  1. Defining the sort columns so related rows stay together within data files.
  2. Co-locating related data so file-level min/max statistics become more selective.
  3. Applying sorting during writes and compaction to maintain this layout.
Setting a proper sort order provides several performance benefits:
  • Enables more effective metadata pruning using min/max statistics
  • Speeds up scans because related data is stored together
  • Makes future compaction and maintenance more efficient
  • Allows queries to skip large portions of files
The following example shows how to set a sort order in Iceberg tables to improve data layout and query performance.
-- Set sort order on existing table
ALTER TABLE iceberg_db.sales
WRITE ORDERED BY (dt, customer_id);

-- Create table with sort order
CREATE TABLE iceberg_db.orders (
    order_id BIGINT,
    customer_id BIGINT,
    order_date DATE,
    amount DOUBLE
) USING iceberg
PARTITIONED BY (order_date)
TBLPROPERTIES (
    'write.format.default'='parquet',
    'write.parquet.compression-codec'='zstd'
)
LOCATION 's3://datalake/iceberg/orders/';

ALTER TABLE iceberg_db.orders
WRITE ORDERED BY (order_date, customer_id);

-- Multiple sort columns
ALTER TABLE iceberg_db.events
WRITE ORDERED BY (event_date, event_type, user_id);
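The WRITE ORDERED BY setting applies to new writes. To bring existing files in line with the sort order, you can run a sort-based compaction; the following sketch uses the rewrite_data_files procedure’s strategy and sort_order options, and the sort columns shown are illustrative.
# Rewrite existing data files with a sort-based compaction so files written
# before the sort order was set also follow the new layout.
spark.sql("""
    CALL iceberg.system.rewrite_data_files(
        table => 'iceberg_db.sales',
        strategy => 'sort',
        sort_order => 'dt ASC, customer_id ASC'
    )
""")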

Running maintenance operations

You can keep Iceberg tables healthy by running regular maintenance operations. Recommended schedules vary based on write volume:
  • Daily maintenance (high-write tables):
    • Compact data files after overnight ingestion
    • Check for orphan files from failed jobs
  • Weekly maintenance (medium-write tables):
    • Run full compaction cycles
    • Optimize manifests
    • Expire snapshots older than 30 days
  • Monthly maintenance (low-write tables):
    • Perform comprehensive cleanups
    • Remove orphan files
    • Expire very old snapshots
Automation helps keep this maintenance consistent. For example, an Airflow DAG can handle compaction and snapshot expiration automatically:
from airflow.decorators import dag, task
from datetime import datetime

@dag(schedule_interval="0 2 * * 0", start_date=datetime(2025, 1, 1))
def iceberg_maintenance():
    
    @task.pyspark
    def compact_tables(spark, sc):
        spark.sql("CALL iceberg.system.rewrite_data_files('iceberg_db.sales')")
        spark.sql("CALL iceberg.system.rewrite_manifests('iceberg_db.sales')")
    
    @task.pyspark
    def expire_snapshots(spark, sc):
        spark.sql("""
            CALL iceberg.system.expire_snapshots(
                'iceberg_db.sales',
                TIMESTAMP '2024-12-01 00:00:00',
                retain_last => 10
            )
        """)
    
    compact_tables() >> expire_snapshots()

dag = iceberg_maintenance()
If you aren’t sure whether a table needs maintenance, monitor the following indicators (the sketch after this list shows how to check a couple of them):
  • Too many small files: Query planning exceeds 30 seconds
  • Large manifest lists: More than 100 manifests per snapshot
  • Slow planning times: Increased query startup time
  • High metadata storage: Manifest files exceeding 1 GB
  • Storage growth: Faster than data ingestion rate
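Before running the full workflow, the following minimal sketch checks two of these indicators that the workflow below doesn’t cover directly: manifest size and retained snapshot count. It assumes the same iceberg_db.sales table used throughout this guide, and the thresholds in the comments mirror the indicators above as rough guidelines only.
# Minimal sketch: check manifest size and snapshot count for iceberg_db.sales.
spark.sql("""
    SELECT COUNT(*) AS manifest_count,                       -- indicator: > 100 manifests
           MAX(length) / 1024 / 1024 AS largest_manifest_mb, -- indicator: manifest files approaching 1 GB
           SUM(length) / 1024 / 1024 AS total_manifest_mb
    FROM iceberg_db.sales.manifests
""").show()

spark.sql("""
    SELECT COUNT(*) AS snapshot_count  -- many retained snapshots keep old files alive and drive storage growth
    FROM iceberg_db.sales.snapshots
""").show()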
The following example shows a typical maintenance workflow.
-- Step 1: Check table health
SELECT 
    COUNT(*) as file_count,
    AVG(file_size_in_bytes) / 1024 / 1024 as avg_file_mb
FROM iceberg_db.sales.files;

SELECT COUNT(*) as manifest_count
FROM iceberg_db.sales.manifests;

-- Step 2: Compact if many small files
CALL iceberg.system.rewrite_data_files('iceberg_db.sales');

-- Step 3: Optimize manifests
CALL iceberg.system.rewrite_manifests('iceberg_db.sales');

-- Step 4: Expire old snapshots
CALL iceberg.system.expire_snapshots(
    table => 'iceberg_db.sales',
    older_than => CURRENT_TIMESTAMP - INTERVAL 30 DAY,
    retain_last => 10
);

-- Step 5: Remove orphan files
CALL iceberg.system.remove_orphan_files('iceberg_db.sales');

-- Step 6: Verify improvements
SELECT 
    COUNT(*) as file_count,
    AVG(file_size_in_bytes) / 1024 / 1024 as avg_file_mb
FROM iceberg_db.sales.files;

Logging and monitoring

Iceberg provides tools to monitor table operations and performance. You can also debug issues using table metadata, snapshots, and Spark logs.

Monitor Iceberg metadata

You can monitor Iceberg tables by tracking snapshots, schema changes, and write operations. When monitoring Iceberg tables, adhere to the following best practices for better results:
  • Snapshot monitoring: Snapshots let you see how a table evolves. To monitor a snapshot, do the following:
    • Query the snapshots system table.
    • View the operation history.
    • Examine the data and file statistics.
  • Schema evolution monitoring: Monitor changes to a table schema to understand its history. Consider the following when monitoring:
    • Review the schema history.
    • Identify when columns were added or removed.
    • Monitor changes to the partition specification.
  • Operational monitoring: Monitor table activity to maintain performance and reliability. This includes:
    • Monitoring write operations
    • Monitoring compaction activities
    • Reviewing maintenance procedures
The following example shows how to monitor an Iceberg table’s snapshots, schema changes, operations, and file statistics.
-- List all snapshots for a table
SELECT * FROM iceberg_db.sales.snapshots;

-- View detailed snapshot history
SELECT 
    committed_at,
    snapshot_id,
    parent_id,
    operation,
    summary
FROM iceberg_db.sales.snapshots
ORDER BY committed_at DESC
LIMIT 20;

-- Track recent write operations
SELECT 
    committed_at,
    operation,
    summary['added-records'] as records_added,
    summary['added-data-files'] as files_added
FROM iceberg_db.sales.snapshots
WHERE committed_at >= CURRENT_TIMESTAMP - INTERVAL 7 DAY
  AND operation IN ('append', 'overwrite')
ORDER BY committed_at DESC;

-- View metadata change history (schema changes create new metadata files)
SELECT * FROM iceberg_db.sales.metadata_log_entries;

-- Check table properties
SHOW TBLPROPERTIES iceberg_db.sales;

-- Monitor file statistics
SELECT 
    COUNT(*) as total_files,
    SUM(file_size_in_bytes) / 1024 / 1024 / 1024 as total_size_gb,
    AVG(file_size_in_bytes) / 1024 / 1024 as avg_file_mb
FROM iceberg_db.sales.files;

Debug failed writes and rollback

You can investigate write failures, identify root causes, and recover from failed operations using snapshots and logs. When debugging Iceberg operations, adhere to the following best practices for better results:
  • Common failure scenarios: Typical issues you may encounter include:
    • Permission errors due to Ranger policies
    • Connectivity problems to S3
    • Metastore unavailability
    • Conflicts from concurrent writes
  • Investigation steps: To diagnose problems, the following steps should help:
    • Check Spark or Kyuubi logs for errors.
    • Review snapshot history to identify anomalies.
    • Identify orphan files left by failed operations.
    • Verify table and storage permissions.
  • Recovery actions: After identifying the cause, consider these fixes:
    • Roll back to the last good snapshot.
    • Remove any orphan files.
    • Retry writes after resolving the underlying issue.
The following SQL example shows how to check snapshots, roll back, and clean up orphan files in Iceberg.
-- Check recent snapshot history for anomalies
SELECT 
    committed_at,
    snapshot_id,
    operation,
    summary
FROM iceberg_db.sales.snapshots
WHERE committed_at >= CURRENT_TIMESTAMP - INTERVAL 1 HOUR
ORDER BY committed_at DESC;

-- List snapshots to identify the last good snapshot ID before rolling back
SELECT snapshot_id, committed_at, operation
FROM iceberg_db.sales.snapshots
ORDER BY committed_at DESC;

-- Rollback to last good snapshot
CALL iceberg.system.rollback_to_snapshot('iceberg_db.sales', 1234567890);

-- Check for orphan files from failed write
CALL iceberg.system.remove_orphan_files(
    table => 'iceberg_db.sales',
    dry_run => true,
    older_than => CURRENT_TIMESTAMP - INTERVAL 1 HOUR
);

-- Clean up orphans if found (the default 3-day retention applies unless older_than is set)
CALL iceberg.system.remove_orphan_files('iceberg_db.sales');
The following Python example shows how to append data to an Iceberg table in Spark and inspect logs when the write fails.
# Inspect failed write logs in Spark
df = spark.read.parquet("s3://landing/sales/2025-01-24/")

try:
    df.write.format("iceberg").mode("append").save("iceberg_db.sales")
except Exception as e:
    print(f"Write failed: {str(e)}")
    # Check Spark driver/executor logs for detailed errors
    # Navigate to the Spark History Server for full stack trace

Access Kyuubi and Spark logs

You can access execution logs from Kyuubi and Spark for detailed error investigation and performance debugging. The Spark History Server lets you inspect application-level execution details. Key steps to inspect execution logs include:
  1. Navigate to https://spark-history.<client>.nx1cloud.com/.
    When you purchase NexusOne, you receive a client name. Replace <client> with your assigned client name.
  2. Find your app by its timestamp or name.
  3. View driver and executor logs.
  4. Review task-level failures for troubleshooting.
Knowing where Kyuubi and Spark store different logs helps pinpoint issues. The main locations include:
  • Driver stdout/stderr: App-level errors
  • Executor stderr: Task-level failures
  • Event logs: Detailed execution timeline
When an error occurs, Iceberg operations surface different exception types in the logs. Pay attention to the following error types, and see the sketch after this list for one way to act on them:
  • org.apache.iceberg.exceptions.CommitFailedException: Occurs with concurrent writes
  • java.nio.file.AccessDeniedException: Indicates permission issues
  • org.apache.hadoop.fs.s3a.AWSClientIOException: S3 connectivity problems
  • java.lang.OutOfMemoryError: Insufficient memory for execution
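The following minimal sketch shows one way to map these exception types to follow-up hints when a Spark write fails. The KNOWN_ERRORS dictionary and the hint text are illustrative assumptions, not NexusOne-provided tooling; the table name and landing path match the earlier examples.
# Minimal sketch: map common Iceberg failure signatures to follow-up hints.
# KNOWN_ERRORS and the hint text are illustrative, not NexusOne tooling.
KNOWN_ERRORS = {
    "CommitFailedException": "Concurrent write conflict - retry after checking snapshot history",
    "AccessDeniedException": "Permission issue - verify Ranger policies and S3 access",
    "AWSClientIOException": "S3 connectivity problem - check network and endpoint configuration",
    "OutOfMemoryError": "Insufficient memory - review executor and driver memory settings",
}

df = spark.read.parquet("s3://landing/sales/2025-01-24/")

try:
    df.write.format("iceberg").mode("append").save("iceberg_db.sales")
except Exception as e:
    message = str(e)
    # Match the exception text against known failure signatures
    hint = next(
        (advice for name, advice in KNOWN_ERRORS.items() if name in message),
        "Unknown error - check driver/executor logs in the Spark History Server",
    )
    print(f"Write failed: {hint}")
    raise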

Security and access control

Iceberg security relies on controlling access at the catalog and storage layers to protect table metadata and data files.

Access control via a Hive Metastore

Within NexusOne, Iceberg enforces table-level access control through Hive Metastore integration to control who can read, write, or modify Iceberg tables. When managing access to Iceberg tables, adhere to the following best practices for better results:
  • Permission types: Different permissions control how users interact with tables. Common permissions include:
    • SELECT to read table data
    • INSERT to write new data
    • UPDATE and DELETE to modify existing data
    • ALTER to change the table schema
    • DROP to remove the table entirely
  • Grant patterns: How you grant permissions affects long-term manageability. Recommended patterns include:
    • Granting permissions to specific users when needed
    • Granting permissions to roles instead of individuals
    • Using Role-Based Access Control (RBAC) for scalability
  • Metastore integration: Access control relies on metastore enforcement. Key points include:
    • Permissions live in the Hive Metastore
    • Permissions are enforced during query execution and across all Iceberg operations
The following example shows how to grant, revoke, and inspect permissions on Iceberg tables.
-- Grant SELECT permission to role
GRANT SELECT ON TABLE iceberg_db.sales TO ROLE analyst;

-- Grant multiple permissions
GRANT SELECT, INSERT ON TABLE iceberg_db.sales TO ROLE data_engineer;

-- Grant to specific user
GRANT SELECT ON TABLE iceberg_db.customers TO USER john.doe;

-- Grant ALL privileges to admin role
GRANT ALL ON TABLE iceberg_db.sales TO ROLE admin;

-- Revoke permissions
REVOKE INSERT ON TABLE iceberg_db.sales FROM ROLE analyst;

-- View current permissions
SHOW GRANT ON TABLE iceberg_db.sales;

S3 bucket permissions and Ranger policies

You can control data access at the storage layer through S3 bucket policies and Apache Ranger for fine-grained authorization. Iceberg relies on S3 permissions to control access to table data and metadata. Iceberg requires the following permissions for common operations:
  • Read access to query Iceberg tables
  • Write access for data ingestion and write operations
  • List access to discover data and metadata files
  • Delete access for maintenance tasks such as compaction and cleanup
For centralized security management, Apache Ranger integrates with Iceberg to manage access control. Key capabilities include:
  • Policy-based access control to define who can access which resources
  • Fine-grained permissions at the database, table, and column levels
  • Audit logging to track access attempts and policy evaluations
  • Dynamic policies that update permissions without service restarts
In practice, permissions are often assigned based on user roles. Common scenarios include:
  • Read-only analysts:
    • SELECT access to specific tables
    • Read-only access to S3
    • No ALTER or INSERT permissions
  • Data engineers:
    • SELECT, INSERT, UPDATE, and DELETE on tables
    • Read and write access to S3
    • ALTER permissions for schema changes
  • Table administrators:
    • Full permissions on tables
    • Complete S3 access, including delete
    • Ability to drop tables and manage policies
Contact the Platform team to:
  • Configure Ranger policies for new tables or databases
  • Request permission changes
  • Troubleshoot access denied errors
  • Review audit logs for security investigations

Integration with Spark

Iceberg is natively supported in Spark for seamless lakehouse operations.

Access Iceberg tables using Spark SQL

You can query Iceberg tables using the Spark SQL engine with automatic catalog resolution through the Hive Metastore. When querying Iceberg tables, adhere to the following best practices for better results:
  • Catalog resolution: The catalog resolves Iceberg tables by mapping table names to their metadata. This means:
    • You access the Iceberg tables via the Hive Metastore.
    • Use the database.table naming convention to reference tables.
    • Spark automatically recognizes the Iceberg table format.
  • SQL operations: You interact with Iceberg using standard SQL. This includes:
    • Regular SQL syntax for querying data
    • Data Definition Language (DDL) operations such as CREATE, ALTER, and DROP
    • Data Manipulation Language (DML) operations such as INSERT, UPDATE, DELETE, and MERGE
  • S3 storage paths: File locations stay abstracted away from queries. Key points include:
    • Data files live in S3-backed storage
    • Spark reads file paths from Iceberg metadata
    • Queries never need to reference storage paths directly
The following example shows how to access Iceberg tables using Spark SQL.
-- Query Iceberg table
SELECT id, amount FROM iceberg_db.sales WHERE dt = '2025-01-01';

-- Aggregation query
SELECT 
    dt,
    COUNT(*) as txn_count,
    SUM(amount) as total_amount
FROM iceberg_db.sales
WHERE dt >= '2025-01-01'
GROUP BY dt
ORDER BY dt;

-- Join tables
SELECT 
    s.id,
    s.amount,
    c.customer_name
FROM iceberg_db.sales s
JOIN iceberg_db.customers c ON s.customer_id = c.customer_id
WHERE s.dt = '2025-01-01';

-- DDL operations
CREATE TABLE iceberg_db.new_table (
    col1 STRING,
    col2 INT
) USING iceberg;

ALTER TABLE iceberg_db.sales ADD COLUMN new_col STRING;

-- DML operations
INSERT INTO iceberg_db.sales SELECT * FROM staging.sales;

UPDATE iceberg_db.sales SET amount = amount * 1.1 WHERE dt = '2025-01-01';

DELETE FROM iceberg_db.sales WHERE dt < '2024-01-01';

Read and write in Spark apps

You can use the Spark DataFrame API to read from and write to Iceberg tables programmatically in apps and notebooks. When working with Iceberg tables in Spark, adhere to the following best practices for better results:
  • Reading tables: Reading data is straightforward. Consider the following:
    • Use spark.read.format("iceberg").load() to load a table. This returns a DataFrame ready for transformations.
    • Iceberg supports all standard Spark DataFrame operations.
  • Writing tables: Writing data offers flexibility depending on your needs. Key considerations include:
    • Use df.write.format("iceberg") for basic writes.
    • Use df.writeTo() to leverage advanced Iceberg features.
    • Iceberg supports append and overwrite modes.
  • Application patterns: Iceberg fits well in various Spark workflows, including:
    • Read-transform-write pipelines
    • Complex multi-step transformations
    • Integration with other data sources
The following example shows how to read, transform, aggregate, and write to Iceberg tables using Spark.
# Read Iceberg table
df = spark.read.format("iceberg").load("iceberg_db.sales")
df.show()

# Filter and transform
df_filtered = df.filter(df.dt == '2025-01-01') \
    .select("id", "amount", "customer_id")

# Aggregation
from pyspark.sql import functions as F

df_summary = df.groupBy("dt") \
    .agg(
        F.count("*").alias("txn_count"),
        F.sum("amount").alias("total_amount")
    )

# Write to Iceberg table
df_filtered.write.format("iceberg").mode("append").save("iceberg_db.sales")

# Advanced write with writeTo
df_filtered.writeTo("iceberg_db.sales").append()

# Read from one table, write to another
df_source = spark.read.format("iceberg").load("iceberg_db.raw_sales")

df_transformed = df_source.filter(df_source.amount > 0) \
    .withColumn("processed_date", F.current_date())

df_transformed.writeTo("iceberg_db.processed_sales").append()

SQL vs. DataFrame API usage

When working with Iceberg in Spark, you can interact with tables using either Spark SQL or the Spark DataFrame API. The Spark SQL approach includes the following characteristics:
  • Uses declarative SQL syntax
  • Easier for analysts familiar with SQL
  • Good for ad-hoc queries
  • Offers limited programmatic control
The Spark DataFrame API approach includes the following characteristics:
  • Provides full programmatic control
  • Better for complex transformations
  • Integrates smoothly with Python or Scala code
  • Reusable within apps
Choosing the right approach depends on your use case. Use Spark SQL when the goal includes:
  • Writing ad-hoc analytical queries
  • Managing tables with DDL
  • Performing simple filtering and aggregations
  • Working interactively in notebooks
Use the DataFrame API when the goal includes:
  • Building production Spark pipelines
  • Performing complex, multi-step transformations
  • Needing programmatic control flow
  • Integrating with other systems
The following example shows how to use a Spark SQL query to analyze an Iceberg table quickly.
-- SQL: Good for quick analysis
SELECT 
    dt,
    region,
    SUM(amount) as total
FROM iceberg_db.sales
WHERE dt >= '2025-01-01'
GROUP BY dt, region
ORDER BY dt, total DESC;
The following example shows how to use the Spark DataFrame API to build a multi-step transformation pipeline and write the results back to an Iceberg table.
# DataFrame API: Good for pipelines
from pyspark.sql import functions as F

df = spark.read.format("iceberg").load("iceberg_db.sales")

# Complex transformation pipeline
df_result = df \
    .filter(F.col("dt") >= "2025-01-01") \
    .filter(F.col("amount") > 0) \
    .withColumn("amount_category", 
                F.when(F.col("amount") > 1000, "high")
                 .when(F.col("amount") > 100, "medium")
                 .otherwise("low")) \
    .groupBy("dt", "region", "amount_category") \
    .agg(
        F.count("*").alias("txn_count"),
        F.sum("amount").alias("total_amount")
    ) \
    .orderBy("dt", F.desc("total_amount"))

# Write results
df_result.writeTo("iceberg_db.sales_analysis").overwritePartitions()

Catalog resolution and storage paths

In NexusOne, Spark resolves Iceberg table locations through the Hive Metastore and accesses data files in S3. The process includes the following steps:
  1. Hive Metastore catalog: The Hive Metastore acts as the Iceberg catalog and does the following:
    • Stores the pointer to each table’s current metadata file
    • Returns the table’s S3 location and Iceberg format information on lookup
  2. Table lookup: For example, when running SELECT * FROM iceberg_db.sales, Spark does the following:
    • Queries the metastore for table metadata
    • Retrieves the S3 location: s3://datalake/iceberg/sales/
    • Reads Iceberg metadata files from that location
  3. Data file discovery: Iceberg metadata tells Spark which files to read. Keep in mind:
    • Iceberg metadata contains data file paths
    • Spark reads only the required files
    • Partition pruning applies automatically
Iceberg organizes each table’s data and metadata under a common S3 prefix. Key paths include:
  • Table location: Base S3 path for the table
  • Data directory: s3://datalake/iceberg/sales/data/
  • Metadata directory: s3://datalake/iceberg/sales/metadata/
  • Partition paths: Organized by partition values for Iceberg internal use
The following example shows how Spark resolves an Iceberg table location and reads data files from S3.
# Spark automatically resolves table location
df = spark.read.format("iceberg").load("iceberg_db.sales")

# Behind the scenes:
# 1. Spark queries Hive Metastore for "iceberg_db.sales"
# 2. Metastore returns: 
#    - Location: s3://datalake/iceberg/sales/
#    - Format: iceberg
# 3. Spark reads metadata from: s3://datalake/iceberg/sales/metadata/
# 4. Metadata contains list of data files in: s3://datalake/iceberg/sales/data/
# 5. Spark reads only necessary data files based on query

# View table location
spark.sql("DESCRIBE EXTENDED iceberg_db.sales").filter("col_name = 'Location'").show()

# Manual path access (not recommended - use catalog instead)
# df = spark.read.format("iceberg").load("s3://datalake/iceberg/sales/")
The following configuration shows how Spark connects to the Iceberg catalog.
# Spark configuration for Iceberg catalog
spark.conf.get("spark.sql.catalog.iceberg.type")  # Returns: hive
spark.conf.get("spark.sql.catalog.iceberg.uri")   # Returns: thrift://metastore:9083
spark.conf.get("spark.sql.catalog.iceberg.warehouse")  # Returns: s3://datalake/warehouse/
When working with Iceberg tables in Spark, follow these best practices:
  • Always access tables via the catalog using the database.table format
  • Avoid direct access to S3 paths
  • Let Spark handle catalog resolution automatically
  • Verify Metastore connectivity before running operations
  • Use fully qualified table names for clarity

Performance and optimization

To improve query performance in Iceberg, you can take advantage of metadata pruning and caching.

Query-time partition pruning

You can automatically skip reading entire partitions that don’t match query predicates using query-time partition pruning. This dramatically reduces the data scanned. The following describes how query-time partition pruning works:
  1. The query planner identifies filters on partition columns.
  2. Iceberg checks partition values in metadata.
  3. Iceberg scans only matching partitions.
  4. Iceberg skips irrelevant partitions entirely.
Partition pruning works effectively with the following requirements:
  • The table is already partitioned
  • Filter includes partition columns
  • Predicates use simple conditions such as equality or range
Partition pruning significantly impacts performance in the following ways:
  • Reduces data scanned by 90% or more for time-series workloads
  • Is often the most effective optimization available
  • Applies automatically, without additional configuration
The following example shows how using partition pruning can make queries much faster.
-- Excellent: Partition pruning on date column
SELECT * FROM iceberg_db.sales
WHERE dt = '2025-01-01';  -- Scans only 1 day's partition

-- Good: Range query with partition pruning
SELECT * FROM iceberg_db.sales
WHERE dt BETWEEN '2025-01-01' AND '2025-01-07';  -- Scans 7 days

-- Poor: No partition filter
SELECT * FROM iceberg_db.sales
WHERE amount > 1000;  -- Must scan all partitions

Metadata and manifest pruning

You can use file-level min/max statistics to skip data files that don’t contain matching values without reading file contents. The following describes how metadata and manifest pruning work using Iceberg statistics:
  1. Each manifest file stores min/max values for each column of every data file it tracks
  2. The query compares its predicates against these statistics
  3. Iceberg skips files that fall outside the predicate range
  4. Only relevant files are actually read, reducing unnecessary file opens and I/O
Manifest pruning works best under the following conditions:
  • Sorted or clustered data
  • Range queries on non-partition columns
  • Well-maintained tables compacted regularly
Keep the following limitations in mind:
  • Works only on simple, top-level predicates
  • Less effective on randomly distributed data
  • Requires accurate column statistics
The following example shows how Iceberg uses manifest and partition pruning to limit data scanned and improve query performance.
-- Manifest pruning on non-partition column
SELECT * FROM iceberg_db.sales
WHERE region = 'EU';  -- Uses manifest stats to skip files

-- Combined: Partition + manifest pruning
SELECT * FROM iceberg_db.sales
WHERE dt = '2025-01-01'  -- Partition pruning
  AND amount > 10000;    -- Manifest pruning on amount column

-- Check per-file statistics
SELECT 
    file_path,
    partition,
    record_count,
    readable_metrics.amount.lower_bound as min_amount,
    readable_metrics.amount.upper_bound as max_amount
FROM iceberg_db.sales.files
WHERE partition.dt = '2025-01-01';

Snapshot pruning for queries

You can optimize time travel queries by efficiently locating and reading historical snapshots without scanning intermediate states. The following describes how snapshot pruning works for queries in Iceberg:
  1. Each snapshot recorded in the table metadata points to its own manifest list
  2. Time-travel targets a specific snapshot directly
  3. Iceberg avoids replaying the entire write history
  4. Historical queries run quickly
Snapshot pruning works best in the following scenarios:
  • Auditing: Checking the table’s state at a specific point in time
  • Debugging: Investigating issues using historical data
  • Compliance: Accessing data as of regulatory or reporting dates
Snapshot pruning impacts performance in the following ways:
  • Historical queries run nearly as fast as current queries
  • Query speed depends on snapshot retention
  • Queries may be slower for archived files in a different storage tier
The following example shows how snapshot pruning lets you query historical table states and compare them with current data efficiently.
-- Time travel with snapshot pruning
SELECT * FROM iceberg_db.sales
FOR SYSTEM_TIME AS OF '2025-01-20';  -- Fast snapshot lookup

-- Compare current vs historical efficiently
WITH current AS (
    SELECT dt, SUM(amount) as total
    FROM iceberg_db.sales
    GROUP BY dt
),
historical AS (
    SELECT dt, SUM(amount) as total
    FROM iceberg_db.sales
    FOR SYSTEM_TIME AS OF '2025-01-15'
    GROUP BY dt
)
SELECT 
    c.dt,
    c.total as current_total,
    h.total as historical_total,
    c.total - h.total as difference
FROM current c
JOIN historical h ON c.dt = h.dt;

Metadata caching

You can cache table metadata in memory to speed up query planning and reduce metastore load. The following metadata gets cached:
  • Table schemas
  • Partition specifications
  • Manifest lists
  • File metadata
Metadata caching works effectively when you configure the following:
  • Enable caching per catalog.
  • Set a Time To Live (TTL) for cache entries.
  • Configure the cache size.
Metadata caching provides the following performance benefits:
  • Faster query planning from seconds to milliseconds
  • Reduced metastore load
  • Better concurrency for read-heavy workloads
The following example shows how to enable and configure metadata caching in Spark for faster Iceberg queries.
# Enable metadata caching in Spark
spark.conf.set("spark.sql.catalog.iceberg.cache-enabled", True)

# Configure cache TTL (time-to-live)
spark.conf.set("spark.sql.catalog.iceberg.cache.expiration-interval-ms", "300000")  # 5 minutes

# Set cache size
spark.conf.set("spark.sql.catalog.iceberg.cache.max-total-entries", "10000")

# Query benefits from cached metadata
df = spark.read.format("iceberg").load("iceberg_db.sales")
df.filter("dt = '2025-01-01'").show()  # Fast planning due to cached manifests
When using metadata caching, follow these best practices (see the sketch after this list for a per-catalog configuration example):
  • Enable caching for read-heavy tables.
  • Set appropriate TTL based on write frequency.
  • Monitor cache hit rates.
  • Disable for frequently written tables.
  • Use separate caches for different catalogs.
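The following minimal sketch shows one way to apply these practices with separate cache settings per catalog. It assumes a second catalog named iceberg_stream exists for write-heavy streaming tables; that name is illustrative and not part of the NexusOne configuration.
# Minimal sketch: per-catalog cache settings (the iceberg_stream catalog name is illustrative).
# Read-heavy batch catalog: keep caching enabled with a longer TTL.
spark.conf.set("spark.sql.catalog.iceberg.cache-enabled", "true")
spark.conf.set("spark.sql.catalog.iceberg.cache.expiration-interval-ms", "600000")  # 10 minutes

# Write-heavy streaming catalog: disable caching so readers always see the
# latest table metadata immediately after each commit.
spark.conf.set("spark.sql.catalog.iceberg_stream.cache-enabled", "false")
Catalog properties are typically read when a catalog is first initialized, so set these values at session startup, for example in spark-defaults or when building the SparkSession.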

Additional resources

  • For more details about Apache Iceberg, refer to the official Apache Iceberg documentation.
  • For more details about how NexusOne integrates Apache Spark, refer to the Apache Spark page.