Iceberg Procedures
StarRocks Iceberg Catalog supports a variety of procedures for managing Iceberg tables, including snapshot management, branch management, data maintenance, metadata management, and table management.
You must have the appropriate privileges to execute procedures. For more information about privileges, see Privileges.
Snapshot management
Rollback to snapshot
Rolls back the table to a specific snapshot. This operation sets the table's current snapshot to the specified snapshot ID.
rollback_to_snapshot Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE rollback_to_snapshot(<snapshot_id>)
Parameters
snapshot_id: ID of the snapshot to which you want to roll back the table.
Example
Roll back the table to snapshot with ID 98765:
ALTER TABLE iceberg.sales.order
EXECUTE rollback_to_snapshot(98765);
Cherry pick a snapshot
Cherry-picks a specific snapshot and applies it to the current state of the table. This operation creates a new snapshot based on an existing snapshot, while the original snapshot remains unchanged.
cherrypick_snapshot Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE cherrypick_snapshot(<snapshot_id>)
Parameters
snapshot_id: ID of the snapshot that you want to cherry-pick.
Example
Cherry-pick snapshot 54321 and apply it to the table's current state:
ALTER TABLE iceberg.sales.order
EXECUTE cherrypick_snapshot(54321);
Branch management
Fast forward a branch to another
Fast-forwards one branch to another branch's latest snapshot. This operation updates the source branch's snapshot to match the target branch's snapshot.
fast_forward Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE fast_forward('<from_branch>', '<to_branch>')
Parameters
from_branch: The branch that you want to fast forward. Wrap the branch name in quotes.
to_branch: The branch to which you want to fast forward the from_branch. Wrap the branch name in quotes.
Example
Fast forward the main branch to the branch test-branch:
ALTER TABLE iceberg.sales.order
EXECUTE fast_forward('main', 'test-branch');
Data maintenance
Rewrite data files
Rewrites data files to optimize file layout. This procedure merges small files to improve query performance and reduce metadata overhead.
rewrite_data_files Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_data_files
("key"=value [,"key"=value, ...])
[WHERE <predicate>]
Parameters
rewrite_data_files properties
"key"=value pairs that specify the compaction behavior. Note that each key must be wrapped in double quotes.
min_file_size_bytes
- Description: The file size threshold for the compaction. Data files smaller than this value are merged during the compaction.
- Unit: Byte
- Type: Int
- Default: 268,435,456 (256 MB)
batch_size
- Description: The maximum size of data that can be processed in each batch.
- Unit: Byte
- Type: Int
- Default: 10,737,418,240 (10 GB)
rewrite_all
- Description: Whether to rewrite all data files during the compaction, ignoring any parameters that filter which data files are rewritten.
- Unit: -
- Type: Boolean
- Default: false
batch_parallelism
- Description: The number of parallel batches to process during the compaction.
- Unit: -
- Type: Int
- Default: 1
WHERE clause
- Description: The filter predicate that specifies the partition(s) to be involved in the compaction.
Example
The following example performs manual compaction on specific partitions of the Iceberg table t1. The partitions are selected by the clause WHERE part_col = 'p1'. Within these partitions, data files smaller than 134,217,728 bytes (128 MB) are merged during the compaction.
ALTER TABLE t1 EXECUTE rewrite_data_files("min_file_size_bytes"= 134217728) WHERE part_col = 'p1';
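If the goal is to compact the entire table regardless of file size, the rewrite_all property described above can be combined with batch_parallelism. The following is a sketch rather than confirmed syntax: the table name t1 is illustrative, and the unquoted boolean literal is an assumption.

```sql
-- Rewrite all data files in the table, ignoring the file-size filter,
-- and process two batches in parallel.
ALTER TABLE t1
EXECUTE rewrite_data_files("rewrite_all" = true, "batch_parallelism" = 2);
```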
Metadata management
Expire snapshots
Expires snapshots older than a specific timestamp. This operation deletes the data files that belong only to the expired snapshots, helping to manage storage usage.
expire_snapshots Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE expire_snapshots(
[ [older_than =] '<datetime>' ] [, [retain_last =] <int> ]
)
Parameters
older_than
- Description: The timestamp before which snapshots will be removed. If not specified, snapshots older than 5 days (from the current time) will be removed by default. Format: 'YYYY-MM-DD HH:MM:SS'.
- Type: DATETIME
- Required: No
retain_last
- Description: The maximum number of most recent snapshots to retain. Less recent snapshots are removed once this threshold is exceeded. If not specified, only one snapshot will be retained by default.
- Type: Integer
- Required: No
Example
Expire snapshots before '2023-12-17 00:14:38' and retain two snapshots:
-- With the parameter key specified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots(older_than = '2023-12-17 00:14:38', retain_last = 2);
-- With the parameter key unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots('2023-12-17 00:14:38', 2);
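Because both parameters are optional, snapshots can also be expired by count alone. This sketch assumes the procedure accepts a single named retain_last argument:

```sql
-- Retain the five most recent snapshots; the default 5-day
-- older_than cutoff applies to the timestamp condition.
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots(retain_last = 5);
```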
Remove orphan files
Removes orphan files from the table that are not referenced by any valid snapshot and are older than a specified timestamp. This operation helps clean up unused files and reclaim storage space.
remove_orphan_files Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE remove_orphan_files(
[ [older_than =] '<datetime>' ] [, [location =] '<string>' ]
)
Parameters
older_than
- Description: The timestamp before which orphan files will be removed. If not specified, files older than 7 days (from the current time) will be removed by default. Format: 'YYYY-MM-DD HH:MM:SS'.
- Type: DATETIME
- Required: No
location
- Description: The directory from which you want to remove orphan files. It must be a sub-directory of the table location. If not specified, the table location will be used by default.
- Type: STRING
- Required: No
Example
Remove orphan files older than '2024-01-01 00:00:00' from the sub-directory sub_dir of the table location:
-- With the parameter key specified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files(older_than = '2024-01-01 00:00:00', location = 's3://iceberg-bucket/iceberg_db/iceberg_table/sub_dir');
-- With the parameter key unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files('2024-01-01 00:00:00', 's3://bucket-test/iceberg_db/iceberg_table/sub_dir');
Rewrite manifests
Rewrites data manifest files and merges them by partition to avoid the performance degradation caused by an excessive number of small manifest files.
This operation rewrites the data manifests of the current snapshot only.
rewrite_manifests Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_manifests()
Example
Rewrite the data manifests of the current snapshot:
ALTER TABLE iceberg.sales.order
EXECUTE rewrite_manifests();
Table management
Add files
Adds data files to an Iceberg table from either a source table or a specified location. This procedure supports the Parquet and ORC file formats.
add_files Syntax
ALTER TABLE [catalog.][database.]table_name
EXECUTE add_files(
{ source_table = '<source_table>' | location = '<location>', file_format = '<format>' }
[, recursive = <boolean>]
)
Parameters
Either source_table or location must be provided, but not both.
source_table
- Description: The source table from which to add files. Format: 'catalog.database.table'.
- Type: String
- Required: No
location
- Description: The directory path or file path from which to add files.
- Type: String
- Required: No
file_format
- Description: The format of the data files. Supported values: 'parquet', 'orc'.
- Type: String
- Required: No (required when using location)
recursive
- Description: Whether to recursively scan subdirectories when adding files from a location.
- Type: Boolean
- Default: true
- Required: No
Example
Add files from a source table:
ALTER TABLE iceberg.sales.order
EXECUTE add_files(source_table = 'hive_catalog.sales.source_order');
Add files from a specific location with Parquet format:
ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/', file_format = 'parquet', recursive = true);
Add files from a single file:
ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/data.parquet', file_format = 'parquet');
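To restrict the scan to the top-level directory, recursive can be set to false explicitly, overriding the default of true noted above. The paths below reuse the illustrative ones from the earlier examples:

```sql
-- Add only the Parquet files directly under the directory;
-- subdirectories are not scanned.
ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/', file_format = 'parquet', recursive = false);
```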
Register table
Registers an Iceberg table using a metadata file. This procedure allows you to add an existing Iceberg table to the catalog without migrating data.
register_table Syntax
CALL [catalog.]system.register_table(
database_name = '<database_name>',
table_name = '<table_name>',
metadata_file = '<metadata_file_path>'
)
Parameters
database_name
- Description: The name of the database in which to register the table.
- Type: String
- Required: Yes
table_name
- Description: The name of the table to register.
- Type: String
- Required: Yes
metadata_file
- Description: The path to the Iceberg table metadata file (e.g., metadata.json).
- Type: String
- Required: Yes
Example
Register a table using a metadata file:
CALL iceberg_catalog.system.register_table(
database_name = 'sales',
table_name = 'order',
metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);
Or use the current catalog:
CALL system.register_table(
database_name = 'sales',
table_name = 'order',
metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);