Version: Candidate-4.1

Iceberg Procedures

StarRocks Iceberg Catalog supports a variety of procedures for managing Iceberg tables, including snapshot management, branch management, data maintenance, metadata management, and table management.

You must have the appropriate privileges to execute procedures. For more information about privileges, see Privileges.

Snapshot management

Rollback to snapshot

Rolls back the table to a specific snapshot. This operation sets the table's current snapshot to the specified snapshot ID.

`rollback_to_snapshot` Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE rollback_to_snapshot(<snapshot_id>)

Parameters

snapshot_id: ID of the snapshot to which you want to roll back the table.

Example

Roll back the table to snapshot with ID 98765:

ALTER TABLE iceberg.sales.order
EXECUTE rollback_to_snapshot(98765);

Cherry pick a snapshot

Cherry picks a specific snapshot and applies it to the current state of the table. This operation creates a new snapshot based on an existing snapshot, while the original snapshot remains unchanged.

`cherrypick_snapshot` Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE cherrypick_snapshot(<snapshot_id>)

Parameters

snapshot_id: ID of the snapshot which you want to cherry pick.

Example

ALTER TABLE iceberg.sales.order
EXECUTE cherrypick_snapshot(54321);

Branch management

Fast forward a branch to another

Fast-forwards one branch to another branch's latest snapshot. This operation updates the source branch's snapshot to match the target branch's snapshot.

`fast_forward` Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE fast_forward('<from_branch>', '<to_branch>')

Parameters

from_branch: The branch you want to fast forward. Wrap the branch name in quotes.
to_branch: The branch to which you want to fast forward the from_branch. Wrap the branch name in quotes.

Example

Fast forward the main branch to the branch test-branch:

ALTER TABLE iceberg.sales.order
EXECUTE fast_forward('main', 'test-branch');

Data maintenance

Rewrite data files

Rewrites data files to optimize file layout. This procedure merges small files to improve query performance and reduce metadata overhead.

`rewrite_data_files` Syntax

ALTER TABLE [catalog.][database.]table_name 
EXECUTE rewrite_data_files
("key"=value [,"key"=value, ...]) 
[WHERE <predicate>]

Parameters

`rewrite_data_files` properties

"key"=value pairs that declare the manual compaction behaviors. Note that you need to wrap the key in double quotes.

`min_file_size_bytes`

Description: The upper limit of a small data file. Data files whose size is less than this value will be merged during the compaction.
Unit: Byte
Type: Int
Default: 268,435,456 (256 MB)

`batch_size`

Description: The maximum size of data that can be processed in each batch.
Unit: Byte
Type: Int
Default: 10,737,418,240 (10 GB)

`rewrite_all`

Description: Whether to rewrite all data files during the compaction, ignoring the parameters that filter data files with specific requirements.
Unit: -
Type: Boolean
Default: false

`batch_parallelism`

Description: The number of parallel batches to process during the compaction.
Unit: -
Type: Int
Default: 1

`WHERE` clause

Description: The filter predicate used to specify the partition(s) to be involved in the compaction.

Example

The following example performs manual Compaction on specific partitions in the Iceberg table t1. The partitions are represented by the clause WHERE part_col = 'p1'. In these partitions, data files that are smaller than 134,217,728 bytes (128 MB) will be merged during the Compaction.

ALTER TABLE t1 EXECUTE rewrite_data_files("min_file_size_bytes"= 134217728) WHERE part_col = 'p1';

Metadata management

Expire snapshots

Expires snapshots older than a specific timestamp. This operation deletes the data files of the expired snapshots, helping to manage storage usage.

`expire_snapshots` Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE expire_snapshots(
    [ [older_than =] '<datetime>' ] [,  [retain_last =] <int> ]
)

Parameters

`older_than`

Description: The timestamp before which snapshots will be removed. If not specified, files older than 5 days (from the current time) will be removed by default. Format: 'YYYY-MM-DD HH:MM:SS'.
Type: DATETIME
Required: No

`retain_last`

Description: The maximum number of most recent snapshots to retain. The less recent snapshots will be removed when this threshold is reached. If not specified, only one snapshot will be retained by default.
Type: Integer
Required: No

Example

Expire snapshots before '2023-12-17 00:14:38' and retain two snapshots:

-- With the parameter key specified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots(older_than = '2023-12-17 00:14:38', retain_last = 2);

-- With the parameter key unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots('2023-12-17 00:14:38', 2);

Remove orphan files

Removes orphan files from the table that are not referenced by any valid snapshot and are older than a specified timestamp. This operation helps clean up unused files and reclaim storage space.

`remove_orphan_files` Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE remove_orphan_files(
    [ [older_than =] '<datetime>' ] [,  [location =] '<string>' ]
)

Parameters

`older_than`

Description: The timestamp before which orphan files will be removed. If not specified, files older than 7 days (from the current time) will be removed by default. Format: 'YYYY-MM-DD HH:MM:SS'.
Type: DATETIME
Required: No

`location`

Description: The directory from which you want to remove orphan files. It must be a sub-directory of the table location. If not specified, the table location will be used by default.
Type: STRING
Required: No

Example

Remove orphan files older than '2024-01-01 00:00:00' from the sub-directory sub_dir of the table location:

-- With the parameter key specified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files(older_than = '2024-01-01 00:00:00', location = 's3://iceberg-bucket/iceberg_db/iceberg_table/sub_dir');

-- With the parameter key unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files('2024-01-01 00:00:00', 's3://bucket-test/iceberg_db/iceberg_table/sub_dir');

Rewrite manifests

Rewrites data manifest files and merges them by partition to avoid performance degradation caused by excessive small manifests.

note

This operation rewrites the data manifests of the current snapshot only.

`rewrite_manifests` Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_manifests()

Example

Rewrite the data manifests of the current snapshot:

ALTER TABLE iceberg.sales.order
EXECUTE rewrite_manifests()

Table management

Add files

Adds data files to an Iceberg table from either a source table or a specific location. This procedure supports Parquet and ORC file formats.

`add_files` Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE add_files(
    [source_table = '<source_table>' | location = '<location>', file_format = '<format>']
    [, recursive = <boolean>]
)

Parameters

Either source_table or location must be provided, but not both.

`source_table`

Description: The source table from which to add files. Format: 'catalog.database.table'.
Type: String
Required: No

`location`

Description: The directory path or file path from which to add files.
Type: String
Required: No

`file_format`

Description: The format of the data files. Supported values: 'parquet', 'orc'.
Type: String
Required: No (required when using location)

`recursive`

Description: Whether to recursively scan subdirectories when adding files from a location.
Type: Boolean
Default: true
Required: No

Example

Add files from a source table:

ALTER TABLE iceberg.sales.order
EXECUTE add_files(source_table = 'hive_catalog.sales.source_order');

Add files from a specific location with Parquet format:

ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/', file_format = 'parquet', recursive = true);

Add files from a single file:

ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/data.parquet', file_format = 'parquet');

Register table

Registers an Iceberg table using a metadata file. This procedure allows you to add an existing Iceberg table to the catalog without migrating data.

`register_table` Syntax

CALL [catalog.]system.register_table(
    database_name = '<database_name>',
    table_name = '<table_name>',
    metadata_file = '<metadata_file_path>'
)

Parameters

`database_name`

Description: The name of the database in which to register the table.
Type: String
Required: Yes

`table_name`

Description: The name of the table to register.
Type: String
Required: Yes

`metadata_file`

Description: The path to the Iceberg table metadata file (e.g., metadata.json).
Type: String
Required: Yes

Example

CALL iceberg_catalog.system.register_table(
    database_name = 'sales',
    table_name = 'order',
    metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);

Or use the current catalog:

CALL system.register_table(
    database_name = 'sales',
    table_name = 'order',
    metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);

Snapshot management​

Rollback to snapshot​

rollback_to_snapshot Syntax​

Parameters​

Example​

Cherry pick a snapshot​

cherrypick_snapshot Syntax​

Parameters​

Example​

Branch management​

Fast forward a branch to another​

fast_forward Syntax​

Parameters​

Example​

Data maintenance​

Rewrite data files​

rewrite_data_files Syntax​

Parameters​

rewrite_data_files properties​

min_file_size_bytes​

batch_size​

rewrite_all​

batch_parallelism​

WHERE clause​

Example​

Metadata management​

Expire snapshots​

expire_snapshots Syntax​

Parameters​

older_than​

retain_last​

Example​

Remove orphan files​

remove_orphan_files Syntax​

Parameters​

older_than​

location​

Example​

Rewrite manifests​

rewrite_manifests Syntax​

Example​

Table management​

Add files​

add_files Syntax​

Parameters​

source_table​

location​

file_format​

recursive​

Example​

Register table​

register_table Syntax​

Parameters​

database_name​

table_name​

metadata_file​

Example​

What did you think of this doc?

Snapshot management

Rollback to snapshot

`rollback_to_snapshot` Syntax

Parameters

Example

Cherry pick a snapshot

`cherrypick_snapshot` Syntax

Parameters

Example

Branch management

Fast forward a branch to another

`fast_forward` Syntax

Parameters

Example

Data maintenance

Rewrite data files

`rewrite_data_files` Syntax

Parameters

`rewrite_data_files` properties

`min_file_size_bytes`

`batch_size`

`rewrite_all`

`batch_parallelism`

`WHERE` clause

Example

Metadata management

Expire snapshots

`expire_snapshots` Syntax

Parameters

`older_than`

`retain_last`

Example

Remove orphan files

`remove_orphan_files` Syntax

Parameters

`older_than`

`location`

Example

Rewrite manifests

`rewrite_manifests` Syntax

Example

Table management

Add files

`add_files` Syntax

Parameters

`source_table`

`location`

`file_format`

`recursive`

Example

Register table

`register_table` Syntax

Parameters

`database_name`

`table_name`

`metadata_file`

Example