Key features
- ACID transactions: Keep data consistent using serializable or snapshot isolation. Iceberg defaults to serializable isolation and ensures that multiple jobs can read from and write to a table at the same time, with conflicting writes detected at commit and retried.
- Schema evolution: Add, drop, or rename columns without rewriting data, with schema changes tracked in table metadata, so all readers see the updated schema.
- Partitioning flexibility: Allows you to change partitioning schemes, for example, by date instead of by region. You can do this without touching existing files. New data uses the new partitioning scheme, old data remains accessible, and queries scan only relevant partitions.
- Time travel: Query historical table states using snapshot IDs or timestamps for auditing, debugging, and reproducible analytics.
Iceberg components
Iceberg’s architecture comprises the following components from top to bottom:
- Catalog: Contains a pointer to an Iceberg table’s metadata file
- Metadata file: Contains all information about a table. It’s a metadata.json file, and it comprises the following:
  - Snapshots: Records the table’s content at specific points in time
  - Manifest lists: Tracks all manifest files in a snapshot
  - Manifest files: Tracks data files
- Data files: Parquet, ORC, or Avro files storing the table’s data
- Storage layer: S3, HDFS, or other object stores holding the metadata and data files
Environment configuration
This section describes how the NexusOne team configured Iceberg. The NexusOne environment currently supports the following versions:
- Iceberg version: 1.8
- Table format version: 2
- File format: Defaults to Parquet, but also supports Avro and ORC
- Catalog type: Hive Metastore
Iceberg best practices
This section provides practical Iceberg examples that adhere to best practices.
Managing tables
You can create, alter, and drop tables with ACID guarantees. You can run these operations with Spark SQL or the DataFrame API, both of which use the Hive Metastore for cataloging.
Create Iceberg tables
Creating an Iceberg table lets you store and manage large datasets with ACID guarantees and efficient queries. When creating an Iceberg table, adhere to the following best practices for better results:
- Table structure: How you structure your table affects how data is stored and accessed.
Key considerations:
- Define column names and data types to shape the table schema.
- Specify the USING iceberg clause to determine the table format.
- Set the table location in S3 to determine where you store data and metadata.
- The Hive Metastore then automatically registers the table.
- Partitioning strategy: How you organize your data impacts query performance.
Key considerations:
- Use the PARTITIONED BY clause with columns of low-to-medium cardinality to improve efficiency.
- Choose columns such as date, region, or category to help queries skip irrelevant data.
- Use proper partitioning to allow queries to scan only relevant data.
- Table location: A clear location structure helps manage both data and metadata files effectively.
Key considerations:
- Specify the S3 path with the LOCATION clause to determine where files reside.
- Follow a consistent pattern, such as s3://datalake/iceberg/schema/table/, to improve organization.
- Organize data and metadata files effectively to simplify maintenance.
- Hive Metastore integration: Proper integration ensures consistent access and metadata management. Key actions include:
- The table registers automatically in the catalog
- Metadata stays synchronized between Iceberg and Hive
- Using fully qualified names ensures consistent table access
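The following is a minimal PySpark sketch of such a CREATE TABLE statement, assuming an active SparkSession named spark with the Hive Metastore catalog configured; the iceberg_db.sales table, its columns, and the S3 path are hypothetical:

```python
# Create an Iceberg table registered in the Hive Metastore and stored in S3.
# PARTITIONED BY combines a date transform with a low-cardinality column.
spark.sql("""
    CREATE TABLE iceberg_db.sales (
        sale_id     BIGINT,
        customer_id INT,
        sale_ts     TIMESTAMP,
        region      STRING,
        amount      DECIMAL(10,2)
    )
    USING iceberg
    PARTITIONED BY (days(sale_ts), region)
    LOCATION 's3://datalake/iceberg/sales/'
""")
```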
Drop Iceberg tables
Dropping an Iceberg table removes the table definition and associated Iceberg metadata while optionally deleting the underlying data. When dropping an Iceberg table, adhere to the following best practices for better results:
- Dropping a table in the standard way: When you want to remove the table from a catalog but keep the data intact, use DROP TABLE. Keep in mind that this does the following:
- Removes the table from the Hive Metastore
- Keeps data files in S3 by default
- Retains associated Iceberg metadata in the table location
- Makes the table schema inaccessible through the catalog
- Dropping a table with a purge: When you want to permanently delete both a table and its data, use DROP TABLE with PURGE. Keep in mind that this does the following:
- Removes the table from the catalog
- Deletes all data and associated Iceberg metadata files from S3
- Permanently removes the table with no recovery option
- Requires extra caution in production environments
- Safety considerations: Always take precautions before dropping a table.
Consider the following:
- Verify the table name before dropping it.
- Check for downstream dependencies.
- Take a final snapshot before deletion.
- Coordinate with the team when you all share a table.
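A minimal sketch of both drop variants, using the hypothetical iceberg_db.sales table:

```python
# Standard drop: removes the table from the Hive Metastore but leaves
# data files and Iceberg metadata in S3.
spark.sql("DROP TABLE iceberg_db.sales")

# Drop with purge: also deletes data and metadata files from S3.
# There is no recovery option, so use extra caution in production.
spark.sql("DROP TABLE iceberg_db.sales PURGE")
```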
Alter table schema
Altering a table schema lets you add, drop, or change column types without rewriting data files. When altering an Iceberg table, adhere to the following best practices for better results:
- Adding columns: How you add new columns determines whether existing data changes or downtime occurs. Use the ALTER TABLE ADD COLUMN syntax. This adds the column to the table schema, and old rows read as NULL for the new column.
- Dropping columns: Remove columns safely without touching existing data files. Use the ALTER TABLE DROP COLUMN syntax.
- Changing column types: Only perform safe type promotions to avoid data loss. Use the ALTER TABLE ALTER COLUMN syntax.
  - Allowed changes include:
    - INT to BIGINT
    - FLOAT to DOUBLE
  - Disallowed changes include:
    - STRING to INT
    - Other unsafe type conversions that risk data loss
- Renaming columns: Change column names logically in metadata only. Physical column names in Parquet files remain unchanged, which ensures backward compatibility.
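A minimal sketch of these schema changes on the hypothetical iceberg_db.sales table; the column names are illustrative:

```python
# Add a column: metadata-only change, old rows read as NULL for it.
spark.sql("ALTER TABLE iceberg_db.sales ADD COLUMN discount DECIMAL(10,2)")

# Drop a column: existing data files are not rewritten.
spark.sql("ALTER TABLE iceberg_db.sales DROP COLUMN discount")

# Safe type promotion: widen INT to BIGINT.
spark.sql("ALTER TABLE iceberg_db.sales ALTER COLUMN customer_id TYPE BIGINT")

# Rename a column: logical change only, Parquet files stay untouched.
spark.sql("ALTER TABLE iceberg_db.sales RENAME COLUMN amount TO sale_amount")
```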
Use CREATE TABLE AS SELECT
Create Table as Select (CTAS) creates a new Iceberg table and populates it with data from a SELECT query in a single operation. It then uses the query results to automatically define the table’s schema.
When using CTAS with an Iceberg table, adhere to the following best practices for better results:
- Partitioning in CTAS: How you define partitions affects query performance.
Consider the following:
- Add the PARTITIONED BY clause before the AS keyword.
- Ensure partition columns appear in the SELECT list.
- Iceberg creates partitions based on the actual data values.
- Why use CTAS: CTAS creates a table and loads data in one step. The advantages include:
- Iceberg automatically infers the schema from the SELECT query results.
- Iceberg creates tables and summaries quickly.
- Schemas don’t need to be explicitly defined.
- Common use cases: Typical scenarios where CTAS works well include:
- Creating summary tables from fact tables
- Materializing complex query results
- Copying tables with transformations
- Creating filtered subsets of data
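A minimal CTAS sketch, assuming the hypothetical iceberg_db.sales table exists; the summary table name is illustrative:

```python
# Create and populate a partitioned summary table in one step.
# The schema is inferred from the SELECT list; PARTITIONED BY comes before AS.
spark.sql("""
    CREATE TABLE iceberg_db.sales_by_region
    USING iceberg
    PARTITIONED BY (region)
    AS SELECT region, SUM(amount) AS total_amount, COUNT(*) AS sale_count
       FROM iceberg_db.sales
       GROUP BY region
""")
```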
Configure partitioning and clustering
When working with Iceberg partitions, adhere to the following best practices for better results:
- Partitioning strategies: How you partition your data affects query performance.
The common strategies include:
- Identity partitioning: Partition by the exact column value.
- Date transforms: Extract year, month, day, or hour from date columns.
- Bucket transforms: Apply hash-based distribution for high-cardinality columns.
- Truncate transforms: Partition by string prefixes.
- Hidden partitioning: Iceberg lets you simplify queries while maintaining performance.
Key points include:
- Queries don’t need to reference partitions in the WHERE clause.
- Iceberg automatically prunes irrelevant partitions.
- Partition columns don’t appear separately in query results.
- Choosing partition columns: Selecting the right partition columns is critical.
Consider the following:
- Use columns frequently referenced in WHERE filters.
- Prefer low-to-medium cardinality columns.
- Choose commonly filtered columns such as date, region, category, or country.
- Avoid over-partitioning; aim for at least 1 GB per partition.
- Partition evolution: You can change partitioning without rewriting old data.
Consider the following:
- Modifying the partition spec for new data causes the new data to use the new partitions.
- Because old data remains in the original partitions, it continues to be accessible.
- As a result, both old and new data remain accessible seamlessly.
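A minimal sketch of the common partition transforms, using hypothetical tables and columns:

```python
# Date transform plus identity partitioning on a low-cardinality column.
spark.sql("""
    CREATE TABLE iceberg_db.events (
        event_id BIGINT, user_id BIGINT, event_ts TIMESTAMP, category STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts), category)
""")

# Bucket transform: hash-based distribution for a high-cardinality column.
spark.sql("""
    CREATE TABLE iceberg_db.user_profiles (
        user_id BIGINT, country STRING, profile STRING
    )
    USING iceberg
    PARTITIONED BY (bucket(16, user_id))
""")
```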
Iceberg tables in a Hive Metastore
Iceberg tables integrate with Hive Metastore to leverage Hive’s capabilities while keeping Iceberg’s advanced features like schema evolution and partitioning. Hive Metastore does the following:
- Syncs table metadata between Iceberg and Hive
- Manages namespace resolution for Iceberg tables that appear as external tables in Hive
- Stores partition information in Iceberg metadata files
- Doesn’t require Hive partition syntax for Iceberg tables
- Use SHOW PARTITIONS to view partition values.
- Ensure a Metastore connection is available for table operations.
Reading/querying tables
Iceberg tables support standard SQL and DataFrame queries, while automatically optimizing reads with features like time travel, predicate pushdown, and column pruning.
Query Iceberg tables with SQL
You can execute SQL queries against Iceberg tables with automatic partition pruning and column-level filtering for optimized performance. When querying Iceberg tables with SQL, adhere to the following best practices for better results:
- SQL querying: Consider the following:
- Use fully qualified table names such as catalog_db.table.
- Write standard SELECT queries with WHERE, GROUP BY, and ORDER BY.
- Rely on Iceberg to handle partition pruning transparently.
- Avoid special partition syntax in queries.
- Filtering and selection: How you define your query determines what Iceberg reads.
Consider the following:
- Use WHERE clauses to filter rows.
- Select only the required columns to limit data reads.
- Join Iceberg tables with other Iceberg or external tables.
- Apply aggregate functions like SUM, COUNT, and AVG.
- Automatic optimizations: Iceberg optimizes queries at runtime.
These optimizations include:
- Partition pruning based on WHERE predicates
- Column pruning to read only selected fields
- Predicate pushdown using file-level statistics
- Reduced data scanning and improved performance
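A minimal query sketch against the hypothetical iceberg_db.sales table; partition pruning on region and sale_ts happens transparently:

```python
# Standard SQL with a fully qualified table name and no special partition syntax.
spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM iceberg_db.sales
    WHERE sale_ts >= TIMESTAMP '2024-01-01 00:00:00'
      AND region = 'EMEA'
    GROUP BY region
    ORDER BY total_amount DESC
""").show()
```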
Query Iceberg tables with the DataFrame API
You can use the Spark DataFrame API to read and query Iceberg tables programmatically. This provides flexibility for complex transformations and pipeline integration. When querying Iceberg tables with the DataFrame API, adhere to the following best practices for better results:
- Reading Iceberg tables: How you load the table determines how you work with the data.
Consider the following:
- Use spark.read.format("iceberg").load().
- Specify the table name or path when loading.
- Work with the Spark DataFrame result.
- DataFrame operations: Use the Spark DataFrame API to shape and analyze data.
Consider the following:
- filter() for row-level filtering
- select() for column projection
- join() to combine multiple tables
- groupBy() and agg() for aggregations
- Integration with pipelines: Iceberg fits naturally into Spark workflows.
Consider the following:
- Chain multiple DataFrame operations.
- Combine Iceberg tables with other Spark data sources.
- Write the results to new tables or output files.
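A minimal DataFrame sketch for the same hypothetical table:

```python
from pyspark.sql import functions as F

# Load the Iceberg table as a DataFrame.
sales = spark.read.format("iceberg").load("iceberg_db.sales")

# Filter, project, and aggregate with standard DataFrame operations.
summary = (
    sales.filter(F.col("region") == "EMEA")
         .select("region", "amount")
         .groupBy("region")
         .agg(F.sum("amount").alias("total_amount"))
)
summary.show()
```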
Read historical snapshots
You can query Iceberg tables as they existed at previous points in time using snapshot IDs or timestamps. This enables auditing and reproducible analytics. Reading historical snapshots is sometimes referred to as time travel. When reading historical snapshots in Iceberg, adhere to the following best practices for better results:
- Timestamp-based time travel: Query the table as it existed at a specific point in time. Consider the following:
- Use FOR SYSTEM_TIME AS OF in SQL.
- Specify the exact timestamp you want to query.
- Read data as it existed at that moment.
- Snapshot-based time travel: Query a precise table state using a snapshot ID.
Consider the following:
- Use FOR SYSTEM_VERSION AS OF with a snapshot ID.
- Gain precise control for reproducible queries.
- Retrieve snapshot IDs from table metadata.
- DataFrame API time travel: Use the DataFrame API to programmatically time-travel.
Consider the following:
- Use .option("as-of-timestamp") for timestamp-based reads.
- Use .option("snapshot-id") for snapshot-based reads.
- Continue using the same read.format("iceberg") API.
- Limitations: Be aware of the following constraints when using time travel:
- Availability depends on snapshot retention policies.
- Avoid querying expired snapshots.
- Expect slower queries when accessing archived files.
- Common use cases: Time travel is useful in scenarios such as:
- Auditing historical data states
- Debugging pipeline failures
- Meeting compliance and reporting requirements
- Comparing metrics across different table versions
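A minimal time-travel sketch on the hypothetical table; the timestamp and snapshot ID values are placeholders:

```python
# SQL: table state as of a timestamp (the nearest earlier snapshot is used).
spark.sql("""
    SELECT * FROM iceberg_db.sales
    FOR SYSTEM_TIME AS OF '2024-01-31 00:00:00'
""").show()

# SQL: exact snapshot by ID for reproducible results.
spark.sql("""
    SELECT * FROM iceberg_db.sales
    FOR SYSTEM_VERSION AS OF 1234567890123456789
""").show()

# DataFrame API: as-of-timestamp takes milliseconds since the epoch.
df_ts = (spark.read.format("iceberg")
              .option("as-of-timestamp", "1706659200000")
              .load("iceberg_db.sales"))

# DataFrame API: read a specific snapshot by ID.
df_snap = (spark.read.format("iceberg")
                .option("snapshot-id", "1234567890123456789")
                .load("iceberg_db.sales"))
```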
Predicate pushdown and column pruning
You can use Iceberg’s metadata to skip unnecessary data files and read only the required columns. This significantly improves query performance. When optimizing Iceberg queries, adhere to the following best practices for better results:
- Predicate pushdown: When reducing the data scanned by applying filters before reading files, consider the following:
- Apply WHERE conditions against file-level statistics.
- Skip files that don’t match the predicates.
- Use min/max stats in manifest files.
- Avoid reading unnecessary file contents.
- Column pruning: When reading only the columns you need to save I/O and memory, consider the following:
- Use SELECT for specific columns instead of SELECT *.
- Use Parquet’s columnar format to read only the requested columns.
- Partition pruning: When skipping entire partitions based on filters, consider the following:
- Filter on partition columns to skip irrelevant partitions.
- Place partition filters first in the WHERE clause for maximum pruning.
- Prioritize pruning large tables because it yields the most benefit.
- Combined optimization: When applying all techniques together for maximum performance, consider the following:
- Use EXPLAIN to verify applied optimizations.
- Enable metadata caching for tables that are frequently accessed.
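A minimal sketch of checking these optimizations with EXPLAIN; the exact plan output depends on the Spark version:

```python
# The physical plan shows pushed filters and the selected columns,
# which confirms predicate pushdown and column pruning.
spark.sql("""
    EXPLAIN
    SELECT sale_id, amount
    FROM iceberg_db.sales
    WHERE region = 'EMEA' AND sale_ts >= TIMESTAMP '2024-01-01 00:00:00'
""").show(truncate=False)
```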
Writing and inserting data
Writing data to Iceberg always creates a new snapshot. Each write provides ACID guarantees, and you can append new data, overwrite existing data, or merge changes without breaking readers.
Append data
You can add new data to Iceberg tables without affecting existing records. This creates new snapshots that maintain a complete table history. When appending data to Iceberg tables, adhere to the following best practices for better results:
- SQL INSERT INTO: Append data using SQL-based workflows. Key points include:
- Use the INSERT INTO table SELECT syntax.
- Append rows from a source query.
- Create new data files and a new snapshot.
- Leave existing data unchanged.
- DataFrame write API: Write data programmatically using Spark. Consider the following:
- Use df.write.format("iceberg").mode("append") for basic appends.
- Use the legacy write() API for simple operations.
- Prefer writeTo() for advanced Iceberg features.
- For each write, Iceberg creates a new snapshot.
- Atomic operations: Iceberg guarantees consistency during writes. Keep in mind:
- Writes are atomic and either fully succeed or fail
- Partial writes are never visible to readers
- Concurrent writes are safe
The following is a minimal sketch of an INSERT INTO statement and a DataFrame append, using the hypothetical iceberg_db.sales table and an illustrative staging source:
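```python
# SQL append: rows come from an illustrative staging table; existing data
# is untouched and the write commits atomically as a new snapshot.
spark.sql("""
    INSERT INTO iceberg_db.sales
    SELECT sale_id, customer_id, sale_ts, region, amount
    FROM iceberg_db.sales_staging
""")

# DataFrame append: new_rows_df is a placeholder DataFrame with the same schema.
new_rows_df.writeTo("iceberg_db.sales").append()
```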
Overwrite data
You can overwrite data to replace table contents entirely or specific partitions. This creates new snapshots while maintaining history through time travel. When overwriting data in Iceberg tables, consider the impact of the following approaches:
- Full table overwrite: This approach replaces all data in the table. Keep in mind:
- It marks old data files for deletion
- A new snapshot points only to the new files
- This affects all data and requires extra caution
- Partition overwrite: This approach limits the overwrite to specific partitions.
Consider the following:
- Only the selected partitions get replaced
- All other partitions remain unchanged
- Safer than a full table overwrite
- Requires a partitioned table
- Dynamic partition overwrite: This approach overwrites only the partitions present
in the incoming data. Key points include:
- Automatically determines which partitions to replace
- Only affected partitions are overwritten
- This is the most common overwrite pattern
- Safer for partitioned tables
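A minimal sketch of the overwrite patterns, using the hypothetical partitioned iceberg_db.sales table and a placeholder DataFrame named updated_df:

```python
# Dynamic partition overwrite with the DataFrameWriterV2 API: only the
# partitions present in updated_df are replaced; all others stay untouched.
updated_df.writeTo("iceberg_db.sales").overwritePartitions()

# Full table overwrite in SQL: with no PARTITION clause (and the default
# static overwrite mode), the query results replace the table contents.
spark.sql("""
    INSERT OVERWRITE iceberg_db.sales
    SELECT * FROM iceberg_db.sales_staging
""")
```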
Merge, update, and delete operations
You can perform upserts, updates, and deletes using MERGE INTO, UPDATE, and DELETE statements for complex data modification patterns, such as Change Data Capture (CDC) and slowly changing dimensions.
When modifying existing data in Iceberg tables, consider the following operations:
- SQL MERGE INTO: Use this when you need to update existing rows and insert new ones in a single operation. Key points include:
- Match source and target rows using a join condition.
- Update rows that match the condition.
- Insert rows that don’t match.
- Execute as a single atomic operation.
- Use MERGE INTO because it’s more efficient than a DELETE + INSERT operation.
- Run compaction after heavy MERGE INTO, UPDATE, or DELETE operations to reduce the number of files.
- SQL UPDATE: Use this to change values in existing rows. Keep in mind:
- Modify only rows that match the condition.
- Rewrite the affected data files.
- Create a new snapshot after the update.
- SQL DELETE: Use this to remove rows from a table. Consider the following:
- Remove rows that match the condition.
- Rewrite affected data files without the deleted rows.
- Access old data through time travel if needed.
The following is a minimal sketch of MERGE INTO, UPDATE, and DELETE operations on the hypothetical iceberg_db.sales table, with an illustrative change-data source:
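```python
# Upsert: update matching rows and insert new rows in one atomic operation.
spark.sql("""
    MERGE INTO iceberg_db.sales AS t
    USING iceberg_db.sales_changes AS s
    ON t.sale_id = s.sale_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Update values in rows that match a condition.
spark.sql("UPDATE iceberg_db.sales SET region = 'EMEA' WHERE region = 'EU'")

# Delete rows that match a condition; old data stays reachable via time travel.
spark.sql("""
    DELETE FROM iceberg_db.sales
    WHERE sale_ts < TIMESTAMP '2020-01-01 00:00:00'
""")
```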
Best practices for S3 storage and partitioning
You can optimize write operations for S3 storage with proper file sizing, compression, and partitioning strategies. When tuning Iceberg write performance, adhere to the following best practices for better results:
- Target file size: File size directly affects read efficiency and metadata overhead.
Consider the following:
- Aim for data files around the default 512 MB.
- Adjust file size using table properties when needed.
- Balance the benefits of fewer large files against the parallelism of smaller files.
- Avoid creating small files below 100 MB.
- Compression: Compression choices affect both storage cost and performance.
Key considerations include:
- Use Snappy compression for fast reads and writes by default.
- Consider gzip for higher compression ratios.
- Use zstd for a balance between compression and speed.
- Tune compression through table properties.
- Partitioning strategy: Good partitioning improves both write and query performance.
Keep in mind:
- Partition by frequently filtered columns.
- Aim for partitions that are 1 GB or larger.
- Avoid over-partitioning, which creates many small files.
- Prefer date-based partitioning for time-series workloads.
- Write patterns: How you write data impacts file layout and long-term performance.
Consider the following:
- Favor batch writes over frequent small appends.
- Use dynamic partition overwrite for incremental loads.
- Schedule compaction after heavy write activity.
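A minimal sketch of tuning file size and compression through standard Iceberg table properties; the values shown are illustrative:

```python
# Target ~512 MB data files and switch Parquet output to zstd compression.
spark.sql("""
    ALTER TABLE iceberg_db.sales SET TBLPROPERTIES (
        'write.target-file-size-bytes' = '536870912',
        'write.parquet.compression-codec' = 'zstd'
    )
""")
```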
Snapshot management during writes
Each write operation creates a new snapshot. Managing snapshots during writes helps you control snapshot growth and keep metadata performance optimal. When working with Iceberg snapshots, consider the following:
- Snapshot creation: Each write operation generates a new snapshot.
Keep in mind:
- Snapshots point to all current data files
- Previous snapshots remain available for time travel
- Metadata grows with every write
- Snapshot accumulation: Frequent writes produce many snapshots. Consider the following:
- Avoid creating unnecessary snapshots because each one adds to metadata storage and slows query planning.
- Monitor snapshot count via metadata queries.
- Periodically expire old snapshots to maintain performance.
- Snapshot history: Historical snapshots enable auditing and reproducibility. Key considerations include:
- Balance retention with performance needs.
- Use a typical retention period of 7-30 days.
- Configure retention policies based on your requirements.
- Keep sufficient history for auditing needs.
Schema and partition evolution
Iceberg supports non-disruptive changes to a schema and its partitions.
Add or drop columns
You can add new columns or remove obsolete columns from table schemas without modifying existing data files. When adding or dropping columns in an Iceberg table, adhere to the following best practices for better results:
- Adding columns: Add new fields without rewriting existing data. Consider the following:
- Use ALTER TABLE ADD COLUMN.
- Rely on Iceberg to update the schema in metadata immediately.
- Leave existing Parquet files unchanged.
- Expect old rows to return NULL for the new column.
- Expect the operation to complete with zero downtime.
- Dropping columns: Remove columns safely without touching physical data. Keep in mind:
- Use ALTER TABLE DROP COLUMN.
- Rely on Iceberg to remove the column from the schema definition.
- Expect physical data to remain in Parquet files.
- Expect reads to automatically skip a dropped column.
- Run compaction later to reclaim disk space.
- Multiple operations: Apply several add or drop operations over time. Consider the following:
- Execute multiple ADD or DROP operations in sequence.
- Expect each change to create a new schema version.
- Rely on Iceberg to track all schema versions in metadata.
Change column data types
You can change column data types to wider, compatible types without rewriting data, as long as the change follows safe type promotion rules. When changing column data types in Iceberg, adhere to the following best practices for better results:
- Safe type promotions: Perform only data type changes that don’t risk data loss. These include:
- Integer widening: INT to BIGINT
- Float widening: FLOAT to DOUBLE
- Decimal precision increase: DECIMAL(10,2) to DECIMAL(20,2)
- No data conversion is required for these changes.
- Unsupported conversions: Avoid data type changes that require rewriting data. These include:
- Narrowing types: BIGINT to INT
- Changing type families: STRING to INT
- Altering decimal scale
- Such changes require rewriting existing data files
- Type promotion rules: Follow these guidelines to maintain compatibility:
- The new data type must be wider than the old one
- This ensures old files remain readable with the new schema
- Maintain data consistency across queries
Partition evolution
You can evolve a table’s partitioning over time by changing its partition specifications without rewriting existing data. When evolving partitions in Iceberg, adhere to the following best practices for better results:
- Add partition fields: Add new partitions without rewriting existing data. Key points include:
- Use ALTER TABLE ADD PARTITION FIELD.
- Expect new writes to use the updated partition spec.
- Expect old data to remain in original partitions.
- Expect queries to access both old and new partitions seamlessly.
- Replace partition fields: Change the partitioning scheme for future data. Consider the following:
- Use ALTER TABLE REPLACE PARTITION FIELD.
- Apply the new partition spec for future writes.
- Expect the historical data to remain unchanged.
- Drop partition fields: Safely remove unnecessary partition fields. Key points include:
- Use ALTER TABLE DROP PARTITION FIELD.
- Iceberg removes the field from the partition spec.
- Expect the existing partitions to remain queryable.
- Expect new data writes to use the updated spec.
- Hidden partitioning: Iceberg simplifies queries regardless of partition changes. Keep in mind:
- You don’t need to know the partition structure.
- Iceberg manages partition evolution automatically.
- You aren’t required to modify queries.
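A minimal sketch of partition evolution DDL, assuming the Iceberg Spark SQL extensions are enabled; the fields are illustrative, and an existing partition field can also be referenced by its name:

```python
# Add a partition field: only data written after this change uses it.
spark.sql("ALTER TABLE iceberg_db.sales ADD PARTITION FIELD days(sale_ts)")

# Replace a partition field for future writes; historical data is unchanged.
spark.sql("""
    ALTER TABLE iceberg_db.sales
    REPLACE PARTITION FIELD days(sale_ts) WITH months(sale_ts)
""")

# Drop a partition field; existing partitions remain queryable.
spark.sql("ALTER TABLE iceberg_db.sales DROP PARTITION FIELD region")
```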
Snapshot-based versioning
You can track all schema and partition changes through snapshots. This enables version history and rollback capabilities. When managing schema changes in Iceberg, adhere to the following best practices for better results:
- Schema versioning: Iceberg tracks every schema modification. Keep in mind:
- Each change creates a new schema version
- The table metadata records versions
- Old schema versions remain available for time travel
- Readers automatically use the correct schema version for the snapshot they query
- Snapshot linkage: Schema versions link to snapshots. Consider the following:
- Queries of historical snapshots use the schema at that time
- Reads remain forward and backward-compatible
- Rollback support: You can revert schema changes safely. Consider the following:
- Rollback to a previous snapshot if needed.
- The table reverts to the old schema automatically.
- It’s useful for recovering from accidental or bad changes.
General best practices for schema and partition evolution include:
- Plan schema changes during maintenance windows
- Test changes in a development environment first
- Document schema evolution for team awareness
- Use time travel to verify schema changes
- Coordinate partition evolution with query patterns
Snapshots and time travel
Snapshots provide immutable versions for auditing and recovery.
Query historical snapshots
You can access table data as it existed at previous points in time using timestamps or snapshot IDs. When querying historical snapshots in Iceberg, adhere to the following best practices for better results:
- Timestamp-based queries: Access the table as it existed at a specific time. Key points include:
- Use FOR SYSTEM_TIME AS OF with the desired timestamp.
- Iceberg finds the nearest snapshot before that timestamp.
- Iceberg returns data exactly as it existed at that moment.
- Snapshot ID queries: Access a precise table state using a snapshot ID. Consider the following:
- Use FOR SYSTEM_VERSION AS OF with the snapshot ID.
- Iceberg provides exact control over which snapshot to query.
- Iceberg enables reproducible queries with the exact snapshot.
- Common use cases: Time travel is helpful in scenarios such as:
- Auditing: “What was the balance on Dec 31?”
- Debugging: “Which data caused the error?”
- Compliance: “Show Q4 report as originally published”
- Comparison: “How did metrics change?”
Restore a previous table version
You can roll back a table to a previous snapshot or version. This reverts both the data and schema changes while maintaining complete history. When rolling back an Iceberg table, adhere to the following best practices for better results:
- Rollback operation: Use rollback to revert the table to a previous version.
Key points include:
- Use ROLLBACK TO SNAPSHOT to create a new snapshot pointing to the old version.
- A rollback operation only updates metadata; it doesn’t delete any data files.
- A rollback operation is nearly instant since it’s metadata-only.
- Use
- What gets restored: Rolling back affects the following aspects of the table:
- Table data via file pointers
- Table schema
- Partition specification
- Table properties
- Safety considerations: Rollbacks are safe and auditable. Keep in mind:
- Original snapshots remain intact
- You can undo a rollback if needed
- You don’t lose data
- All operations are fully auditable
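In Spark, the rollback is exposed as an Iceberg procedure. A minimal sketch, assuming the session catalog (spark_catalog) is configured as an Iceberg catalog with the SQL extensions enabled; the snapshot ID is a placeholder:

```python
# Roll the table back to a previous snapshot; this is metadata-only,
# near-instant, and creates a new snapshot pointing at the old state.
spark.sql("""
    CALL spark_catalog.system.rollback_to_snapshot(
        table => 'iceberg_db.sales',
        snapshot_id => 1234567890123456789
    )
""")
```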
Work with snapshot IDs and timestamps
You can identify and reference specific snapshots using snapshot IDs or commit timestamps. This is helpful for precise table version control. When working with Iceberg snapshots, adhere to the following best practices for better results:
- Snapshot ID: Each snapshot has a unique identifier. Key points include:
- Snapshot ID is a long integer
- Expired snapshot IDs aren’t reused
- Snapshot ID enables exact and reproducible queries
- Commit timestamp: Snapshots also record their creation time. Additional points include:
- Stored as a timezone-aware timestamp
- Used for “as-of” queries to access historical table states
- Timestamps may not match exactly; Iceberg picks the nearest snapshot
- Listing snapshots: You can inspect table history and operations. The additional points include:
- Ability to query system procedures to list snapshots
- View snapshot metadata, such as operations and summaries
- Understand the type of change recorded in each snapshot
Best practices for working with snapshots include:
- Document important snapshot IDs
- Use timestamps for ad-hoc queries
- Use snapshot IDs for reproducible reports
- List snapshots before rollback operations
- Coordinate snapshot retention with audit requirements
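A minimal sketch of inspecting snapshots through the Iceberg metadata tables; depending on the catalog configuration, you may need to prefix the catalog name:

```python
# List snapshots with their IDs, commit timestamps, and operations.
spark.sql("""
    SELECT snapshot_id, committed_at, operation
    FROM iceberg_db.sales.snapshots
    ORDER BY committed_at DESC
""").show(truncate=False)

# The history metadata table shows which snapshot was current over time.
spark.sql("SELECT * FROM iceberg_db.sales.history").show(truncate=False)
```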
Data pruning and maintenance operations
Data pruning and maintenance operations improve query performance by reducing the amount of data scanned. They also keep table files and metadata optimized for fast and consistent queries.
Compaction
You can consolidate small data files into larger files to improve query performance and reduce metadata overhead. Compaction is important because small files can affect query performance in the following ways:
- Many small data files increase metadata operations and API calls
- More data files make query planning slower
- Opening many small files is less efficient than reading fewer large files
- Each data file adds to metadata, creating bloat in the manifest file due to file tracking
Run compaction in the following situations:
- After high-volume inserts with small batches
- Following CDC loads that create many small files
- When streaming ingestion creates hourly micro-batches
- As part of a scheduled maintenance, either daily or weekly
- When query planning time exceeds an acceptable threshold
Compaction operates at two levels:
- Data file compaction: Merges small Parquet files into larger ones
- Manifest compaction: Consolidates manifest metadata files
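A minimal compaction sketch using the rewrite_data_files procedure, assuming the spark_catalog Iceberg catalog from earlier:

```python
# Merge small data files into larger ones; binpack is the default strategy.
spark.sql("""
    CALL spark_catalog.system.rewrite_data_files(
        table => 'iceberg_db.sales',
        strategy => 'binpack'
    )
""")
```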
Manifest optimization
You can optimize query planning by managing how Iceberg stores and maintains manifest metadata. Manifest files accumulate naturally as the table evolves, which results in the following:
- Each write operation creates or updates manifest files
- Manifest files track data files and their statistics
- Many small manifests build up over time
- Each additional manifest adds to the query planning overhead
Manifest buildup slows query planning in the following ways:
- The query engine must read every manifest file during planning
- More manifests mean more S3 read operations
- Planning time increases before query execution starts
- Metadata storage costs rise with the manifest count
Manifest optimization does the following:
- Consolidates many small manifest files into fewer ones
- Reduces the total manifest count significantly
- Speeds up query planning
- Works efficiently when run periodically as part of maintenance
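A minimal manifest optimization sketch using the rewrite_manifests procedure, under the same catalog assumption:

```python
# Consolidate many small manifest files to speed up query planning.
spark.sql("CALL spark_catalog.system.rewrite_manifests(table => 'iceberg_db.sales')")
```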
Expire old snapshots
You can control storage growth and metadata size by expiring snapshots that are no longer needed. Snapshots naturally accumulate in the following ways as the table changes:
- Each write operation creates a new snapshot
- Older snapshots continue to reference files that are no longer part of the table
- Snapshot metadata grows over time
- Unreferenced files increase S3 storage costs
Expire snapshots when any of the following applies:
- Storage costs have grown noticeably
- Time-travel history exceeds auditing or debugging needs
- Enough time has passed to meet retention requirements. This is commonly 7-90 days.
- Snapshot expiration runs as part of scheduled maintenance
Common retention strategies include the following:
- Keep the most recent N snapshots to retain the most recent table changes.
- Expire snapshots older than a timestamp to enforce time-based retention.
- Combine the previous two strategies to keep the most recent N snapshots plus anything newer than a given timestamp.
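A minimal snapshot expiration sketch; the timestamp and retention count are illustrative:

```python
# Expire snapshots committed before the given timestamp, but always
# retain the 10 most recent snapshots.
spark.sql("""
    CALL spark_catalog.system.expire_snapshots(
        table => 'iceberg_db.sales',
        older_than => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 10
    )
""")
```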
Remove orphan files
You can reclaim storage and keep Iceberg tables clean by removing orphan files that are no longer tracked in metadata. Orphan files appear in a table for several reasons:
- Write operations that fail mid-execution
- Aborted transactions that don’t commit
- Manual file operations run directly in S3
- Partially completed maintenance tasks, such as failed compactions
Orphan files are a problem for the following reasons:
- They consume S3 storage unnecessarily
- They aren’t tracked in table metadata
- Normal Iceberg operations can’t remove them
- They accumulate if not cleaned up regularly
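A minimal orphan file cleanup sketch; the older_than value is illustrative and helps avoid deleting files from in-flight writes:

```python
# Remove files under the table location that no snapshot references.
spark.sql("""
    CALL spark_catalog.system.remove_orphan_files(
        table => 'iceberg_db.sales',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```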
Table layout optimization
Optimizing a table layout helps organize how Iceberg stores data within table files. You can optimize the table layout by setting a proper sort order in your Iceberg tables. This involves the following steps:
- Defining the column order within data files so related data stays together.
- Co-locating related data to improve the accuracy of file statistics.
- Applying sorting during writes and compaction to maintain this layout.
A well-chosen sort order provides the following benefits:
- Enables more effective metadata pruning using min/max statistics
- Speeds up scans since related data is stored together
- Makes future compaction and maintenance more efficient
- Allows queries to skip large portions of files
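A minimal sort order sketch, assuming the Iceberg Spark SQL extensions are enabled; the columns are illustrative:

```python
# Define a write sort order so related rows are co-located within data files;
# writes and compaction apply this order going forward.
spark.sql("ALTER TABLE iceberg_db.sales WRITE ORDERED BY region, sale_ts")
```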
Running maintenance operations
You can keep Iceberg tables healthy by running regular maintenance operations. Recommended schedules vary based on write volume. Some of these schedules include the following:
- Daily maintenance: This is for high-write tables and requires the following:
- Compact data files after overnight ingestion
- Check for orphan files from failed jobs
- Weekly maintenance: This is for medium-write tables and requires the following:
- Run full compaction cycles
- Optimize manifests
- Expire snapshots older than 30 days
- Monthly maintenance: This is for low-write tables and requires the following:
- Perform comprehensive cleanups
- Remove orphan files
- Expire very old snapshots
Watch for the following signs that maintenance is needed:
- Too many small files: Query planning exceeds 30 seconds
- Large manifest lists: More than 100 manifests per snapshot
- Slow planning times: Increased query startup time
- High metadata storage: Manifest files exceeding 1 GB
- Storage growth: Faster than data ingestion rate
Logging and monitoring
Iceberg provides tools to monitor table operations and performance. You can also debug issues using table metadata, snapshots, and Spark logs.
Monitor Iceberg metadata
You can monitor Iceberg table activity, including snapshots, schema changes, and table operations. When monitoring Iceberg tables, adhere to the following best practices for better results:
- Snapshot monitoring: Snapshots let you see how a table evolves. To monitor a snapshot,
do the following:
- Query the snapshots system table.
- View the operation history.
- Examine the data and file statistics.
- Schema evolution monitoring: Monitor changes to a table schema to understand its history.
Consider the following when monitoring:
- Review the schema history.
- Identify when columns were added or removed.
- Monitor changes to the partition specification.
- Operational monitoring: Monitor table activity to maintain performance and reliability.
This includes:
- Monitoring write operations
- Monitoring compaction activities
- Reviewing maintenance procedures
Debug failed writes and rollback
You can investigate write failures, identify root causes, and recover from failed operations using snapshots and logs. When debugging Iceberg operations, adhere to the following best practices for better results:
- Common failure scenarios: Typical issues you may encounter include:
- Permission errors due to Ranger policies
- Connectivity problems to S3
- Metastore unavailability
- Conflicts from concurrent writes
- Investigation steps: To diagnose problems, the following steps should help:
- Check Spark or Kyuubi logs for errors.
- Review snapshot history to identify anomalies.
- Identify orphan files left by failed operations.
- Verify table and storage permissions.
- Recovery actions: After identifying the cause, consider these fixes:
- Roll back to the last good snapshot.
- Remove any orphan files.
- Retry writes after resolving the underlying issue.
Access Kyuubi and Spark logs
You can access execution logs from Kyuubi and Spark for detailed error investigation and performance debugging. The Spark History Server helps you inspect app-level executions. Key steps to inspect execution logs include:
- Navigate to https://spark-history.<client>.nx1cloud.com/. When you purchase NexusOne, you receive a client name. Replace client with your assigned client name.
- Find your app by its timestamp or name.
- View driver and executor logs.
- Review task-level failures for troubleshooting.
Key log locations include:
- Driver stdout/stderr: App-level errors
- Executor stderr: Task-level failures
- Event logs: Detailed execution timeline
Common exceptions include:
- org.apache.iceberg.exceptions.CommitFailedException: Occurs with concurrent writes
- java.nio.file.AccessDeniedException: Indicates permission issues
- org.apache.hadoop.fs.s3a.AWSClientIOException: S3 connectivity problems
- java.lang.OutOfMemoryError: Insufficient memory for execution
Security and access control
Iceberg security relies on controlling access at the catalog and storage layer to protect table metadata and data files.
Access control via a Hive Metastore
Within NexusOne, Iceberg enforces table-level access control via Hive Metastore integration to control who can read, write, or modify Iceberg tables. When managing access to Iceberg tables, adhere to the following best practices for better results:
- Permission types: Different permissions control how users interact with tables.
Common permissions include:
- SELECT to read table data
- INSERT to write new data
- UPDATE and DELETE to modify existing data
- ALTER to change the table schema
- DROP to remove the table entirely
- Grant patterns: How you grant permissions affects long-term manageability.
Recommended patterns include:
- Granting permissions to specific users when needed
- Granting permissions to roles instead of individuals
- Using Role-Based Access Control (RBAC) for scalability
- Metastore integration: Access control relies on metastore enforcement.
Key points include:
- Permissions live in the Hive Metastore
- Permissions are enforced during query execution and across all Iceberg operations
S3 bucket permissions and Ranger policies
You can control data access at the storage layer through S3 bucket policies and Apache Ranger for fine-grained authorization. Iceberg relies on S3 permissions to control access to table data and metadata. Iceberg requires the following permissions for common operations:
- Read access to query Iceberg tables
- Write access for data ingestion and write operations
- List access to discover data and metadata files
- Delete access for maintenance tasks such as compaction and cleanup
Apache Ranger adds the following capabilities:
- Policy-based access control to define who can access which resources
- Fine-grained permissions at the database, table, and column levels
- Audit logging to track access attempts and policy evaluations
- Dynamic policies that update permissions without service restarts
Typical role-based permission patterns include the following:
- Read-only analysts:
- SELECT access to specific tables
- Read-only access to S3
- No ALTER or INSERT permissions
- Data engineers:
- SELECT, INSERT, UPDATE, and DELETE on tables
- Read and write access to S3
- ALTER permissions for schema changes
- Table administrators:
- Full permissions on tables
- Complete S3 access, including delete
- Ability to drop tables and manage policies
Contact the Platform team to:
- Configure Ranger policies for new tables or databases
- Request permission changes
- Troubleshoot access denied errors
- Review audit logs for security investigations
Integration with Spark
Iceberg is natively supported in Spark for seamless lakehouse operations.
Access Iceberg tables using Spark SQL
You can query Iceberg tables using the Spark SQL engine with automatic catalog resolution through the Hive Metastore. When executing queries to access Iceberg tables, adhere to the following best practices for better results:
- Catalog resolution: The catalog resolves Iceberg tables by mapping table names to their metadata. This means:
- You access the Iceberg tables via the Hive Metastore.
- Use the database.table naming convention to reference tables.
- Spark automatically recognizes the Iceberg table format.
- SQL operations: You interact with Iceberg using standard SQL. This includes:
- Regular SQL syntax for querying data
- Data Definition Language (DDL) operations such as CREATE, ALTER, and DROP
- Data Manipulation Language (DML) operations such as INSERT, UPDATE, DELETE, and MERGE
- S3 storage paths: File locations stay abstracted away from queries. Key points include:
- Data files live in S3-backed storage
- Spark reads file paths from Iceberg metadata
- Queries never need to reference storage paths directly
Read and write in Spark apps
You can use the Spark DataFrame API to read from and write to Iceberg tables programmatically in apps and notebooks. When working with Iceberg tables in Spark, adhere to the following best practices for better results:
- Reading tables: Reading data is straightforward. Consider the following:
- Use spark.read.format("iceberg").load() to load a table. This returns a DataFrame ready for transformations.
- Iceberg supports all standard Spark DataFrame operations.
- Writing tables: Writing data offers flexibility depending on your needs.
Key considerations include:
- Use df.write.format("iceberg") for basic writes.
- Use df.writeTo() to leverage advanced Iceberg features.
- Iceberg supports append and overwrite modes.
- Application patterns: Iceberg fits well in various Spark workflows, including:
- Read-transform-write pipelines
- Complex multi-step transformations
- Integration with other data sources
SQL vs. DataFrame API usage
When working with Iceberg in Spark, you can interact with tables using either Spark SQL or the Spark DataFrame API. The Spark SQL approach includes the following characteristics:
- Uses declarative SQL syntax
- Easier for analysts familiar with SQL
- Good for ad-hoc queries
- Offers limited programmatic control
The DataFrame API approach includes the following characteristics:
- Provides full programmatic control
- Better for complex transformations
- Integrates smoothly with Python or Scala code
- Reusable within apps
Spark SQL works well for the following:
- Writing ad-hoc analytical queries
- Managing tables with DDL
- Performing simple filtering and aggregations
- Working interactively in notebooks
The DataFrame API works well for the following:
- Building production Spark pipelines
- Performing complex, multi-step transformations
- Needing programmatic control flow
- Integrating with other systems
Catalog resolution and storage paths
In NexusOne, Spark resolves Iceberg table locations through the Hive Metastore and accesses data files in S3. The process includes the following steps:
- Hive Metastore catalog: When Spark uses the Hive Metastore, it does the following:
- Queries the metastore for table metadata
- Returns the table’s S3 location and Iceberg format information
- Table lookup: For example, when running SELECT * FROM iceberg_db.sales, Spark does the following:
- Queries the metastore for table metadata
- Retrieves the S3 location: s3://datalake/iceberg/sales/
- Reads Iceberg metadata files from that location
- Data file discovery: Iceberg metadata tells Spark which files to read. Keep in mind:
- Iceberg metadata contains data file paths
- Spark reads only the required files
- Partition pruning applies automatically
A typical table layout includes the following paths:
- Table location: Base S3 path for the table
- Data directory: s3://datalake/iceberg/sales/data/
- Metadata directory: s3://datalake/iceberg/sales/metadata/
- Partition paths: Organized by partition values for Iceberg internal use
Best practices include the following:
- Always access tables via the catalog using the database.table format
- Avoid direct access to S3 paths
- Let Spark handle catalog resolution automatically
- Verify Metastore connectivity before running operations
- Use fully qualified table names for clarity
Performance and optimization
To improve query performance in Iceberg, you can take advantage of metadata pruning and caching.
Query-time partition pruning
You can automatically skip reading entire partitions that don’t match query predicates using query-time partition pruning. This dramatically reduces the data scanned. The following describes how query-time partition pruning works:
- The query parser identifies partition column filters.
- Iceberg checks partition values in metadata.
- Iceberg scans only matching partitions.
- Iceberg skips irrelevant partitions entirely.
Partition pruning applies when the following conditions hold:
- The table is already partitioned
- The filter includes partition columns
- Predicates use simple conditions such as equality or range
Partition pruning provides the following benefits:
- Reduces data scanned by 90% or more for time-series workloads
- Applies the fastest optimization automatically
- Operates without additional configuration
Metadata and manifest pruning
You can use file-level min/max statistics to skip data files that don’t contain matching values without reading file contents. The following describes how metadata and manifest pruning work using Iceberg statistics:
- Each manifest file stores min/max values for each column
- The query compares its predicates against these statistics
- Iceberg skips files that fall outside the predicate range
- Only relevant files are actually read, reducing unnecessary file opens and I/O
Metadata and manifest pruning work best with the following:
- Sorted or clustered data
- Range queries on non-partition columns
- Well-maintained tables compacted regularly
Limitations include the following:
- Works only on first-level predicates
- Less effective on randomly distributed data
- Requires accurate column statistics
Snapshot pruning for queries
You can optimize time travel queries by efficiently locating and reading historical snapshots without scanning intermediate states. The following describes how snapshot pruning works for queries in Iceberg:
- Snapshot metadata lives in manifest lists
- Time-travel targets a specific snapshot directly
- Iceberg avoids replaying the entire write history
- Historical queries run quickly
Common use cases include the following:
- Auditing: Checking the table’s state at a specific point in time
- Debugging: Investigating issues using historical data
- Compliance: Accessing data as of regulatory or reporting dates
Performance considerations include the following:
- Historical queries run nearly as fast as current queries
- Query speed depends on snapshot retention
- Queries may be slower for archived files in a different storage tier
Metadata caching
You can cache table metadata in memory to speed up query planning and reduce metastore load. The following metadata gets cached:
- Table schemas
- Partition specifications
- Manifest lists
- File metadata
You can configure caching as follows:
- Enable caching per catalog.
- Set a Time To Live (TTL) for cache entries.
- Configure the cache size.
Benefits include the following:
- Faster query planning from seconds to milliseconds
- Reduced metastore load
- Better concurrency for read-heavy workloads
Best practices include the following:
- Enable caching for read-heavy tables.
- Set appropriate TTL based on write frequency.
- Monitor cache hit rates.
- Disable for frequently written tables.
- Use separate caches for different catalogs.
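A minimal sketch of enabling catalog-level metadata caching when building a Spark session in code, assuming spark_catalog is already configured as an Iceberg catalog; the TTL value is illustrative:

```python
from pyspark.sql import SparkSession

# Enable Iceberg catalog metadata caching with a 5-minute expiration.
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.spark_catalog.cache-enabled", "true")
    .config("spark.sql.catalog.spark_catalog.cache.expiration-interval-ms", "300000")
    .getOrCreate()
)
```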
Additional resources
- For more details about Apache Iceberg, refer to the Apache Iceberg official documentation.
- For more details about how NexusOne integrates Apache Spark, refer to the Apache Spark documentation.