Iceberg DML operations
StarRocks Iceberg Catalog supports a variety of Data Manipulation Language (DML) operations, including inserting data into Iceberg tables.
You must have the appropriate privileges to perform DML operations. For more information about privileges, see Privileges.
INSERTβ
Inserts data into an Iceberg table. This feature is supported from v3.1 onwards.
Similar to loading data into StarRocks native tables, if you have the INSERT privilege on an Iceberg table, you can use the INSERT statement to sink the data to the Iceberg table. Currently, only Parquet-formatted Iceberg tables are supported.
Syntaxβ
INSERT {INTO | OVERWRITE} <table_name>
[ (column_name [, ...]) ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
-- If you want to sink data to specified partitions, use the following syntax:
INSERT {INTO | OVERWRITE} <table_name>
PARTITION (par_col1=<value> [, par_col2=<value>...])
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
NULL values are not allowed in partition columns. Therefore, you must make sure that no empty values are loaded into the partition columns of the Iceberg table.
Parametersβ
INTOβ
Appends the data to the Iceberg table.
OVERWRITEβ
Overwrites the existing data of the Iceberg table.
column_nameβ
The name of the destination column to which you want to load data. You can specify one or more columns. Multiple columns are separated with commas (,).
- You can only specify columns that actually exist in the Iceberg table.
- The destination columns must include the partition columns of the Iceberg table.
- The destination columns are mapped one on one in sequence to the columns in the SELECT statement (source columns), regardless of what the destination column names are.
- If no destination columns are specified, the data is loaded into all columns of the Iceberg table.
- If a non-partition source column cannot be mapped to any destination column, StarRocks writes the default value
NULLto the destination column. - If the data types of the source and destination columns mismatch, StarRocks performs an implicit conversion on the mismatched columns. If the conversion fails, a syntax parsing error will be returned.
You cannot specify the column_name property if you have specified the PARTITION clause.
expressionβ
Expression that assigns values to the destination column.
DEFAULTβ
Assigns a default value to the destination column.
queryβ
Query statement whose result will be loaded into the Iceberg table. It can be any SQL statement supported by StarRocks.
PARTITIONβ
The partitions into which you want to load data. You must specify all partition columns of the Iceberg table in this property. The partition columns that you specify in this property can be in a different sequence than the partition columns that you have defined in the table creation statement.
You cannot specify the column_name property if you have specified the PARTITION clause.
Examplesβ
-
Insert three data rows into the
partition_tbl_1table:INSERT INTO partition_tbl_1
VALUES
("buy", 1, "2023-09-01"),
("sell", 2, "2023-09-02"),
("buy", 3, "2023-09-03"); -
Insert the result of a SELECT query, which contains simple computations, into the
partition_tbl_1table:INSERT INTO partition_tbl_1 (id, action, dt) SELECT 1+1, 'buy', '2023-09-03'; -
Insert the result of a SELECT query, which reads data from the
partition_tbl_1table, into the same table:INSERT INTO partition_tbl_1 SELECT 'buy', 1, date_add(dt, INTERVAL 2 DAY)
FROM partition_tbl_1
WHERE id=1; -
Insert the result of a SELECT query into the partitions that meet two conditions,
dt='2023-09-01'andid=1, of thepartition_tbl_2table:INSERT INTO partition_tbl_2 SELECT 'order', 1, '2023-09-01';Or
INSERT INTO partition_tbl_2 partition(dt='2023-09-01',id=1) SELECT 'order'; -
Overwrite all
actioncolumn values in the partitions that meet two conditions,dt='2023-09-01'andid=1, of thepartition_tbl_1table withclose:INSERT OVERWRITE partition_tbl_1 SELECT 'close', 1, '2023-09-01';Or
INSERT OVERWRITE partition_tbl_1 partition(dt='2023-09-01',id=1) SELECT 'close';
DELETEβ
You can use the DELETE statement to delete data from Iceberg tables based on specified conditions. This feature is supported from v4.1 and later.
Syntaxβ
DELETE FROM <table_name> WHERE <condition>
Parametersβ
-
table_name: The name of the Iceberg table you want to delete data from. You can use:- Fully qualified name:
catalog_name.database_name.table_name - Database-qualified name (after setting catalog):
database_name.table_name - Table name only (after setting catalog and database):
table_name
- Fully qualified name:
-
condition: The condition to identify which rows to delete. It can include:- Comparison operators:
=,!=,>,<,>=,<=,<> - Logical operators:
AND,OR,NOT INandNOT INclausesBETWEENandLIKEoperatorsIS NULLandIS NOT NULL- Sub-queries with
INorEXISTS
- Comparison operators:
Examplesβ
Basic DELETE operationsβ
Delete rows matching a simple condition:
DELETE FROM iceberg_catalog.db.table1 WHERE id = 3;
DELETE with IN and NOT INβ
Delete multiple rows using IN clause:
DELETE FROM iceberg_catalog.db.table1 WHERE id IN (18, 20, 22);
DELETE FROM iceberg_catalog.db.table1 WHERE id NOT IN (100, 101, 102);
DELETE with logical operatorsβ
Combine multiple conditions:
DELETE FROM iceberg_catalog.db.table1 WHERE age > 30 AND salary < 70000;
DELETE FROM iceberg_catalog.db.table1 WHERE status = 'inactive' OR last_login < '2023-01-01';
DELETE with pattern matchingβ
Use LIKE for pattern-based deletion:
DELETE FROM iceberg_catalog.db.table1 WHERE name LIKE 'A%';
DELETE FROM iceberg_catalog.db.table1 WHERE email LIKE '%@example.com';
DELETE with range conditionsβ
Use BETWEEN for range-based deletion:
DELETE FROM iceberg_catalog.db.table1 WHERE age BETWEEN 30 AND 40;
DELETE FROM iceberg_catalog.db.table1 WHERE created_date BETWEEN '2023-01-01' AND '2023-12-31';
DELETE with NULL checksβ
Delete rows with or without NULL values:
DELETE FROM iceberg_catalog.db.table1 WHERE name IS NULL;
DELETE FROM iceberg_catalog.db.table1 WHERE email IS NULL AND phone IS NULL;
DELETE FROM iceberg_catalog.db.table1 WHERE age IS NOT NULL;
DELETE with sub-queriesβ
Use sub-queries to identify rows to delete:
-- DELETE with IN sub-query
DELETE FROM iceberg_catalog.db.table1 WHERE id IN (SELECT id FROM temp_table WHERE expired = true);
-- DELETE with EXISTS sub-query
DELETE FROM iceberg_catalog.db.table1 t1 WHERE EXISTS (SELECT user_id FROM inactive_users t2 WHERE t2.user_id = t1.user_id);
TRUNCATEβ
You can use the TRUNCATE TABLE statement to quickly delete all data from Iceberg tables.
Syntaxβ
TRUNCATE TABLE <table_name>
Parametersβ
table_name: The name of the Iceberg table that you want to truncate data from. You can use:- Fully qualified name:
catalog_name.database_name.table_name - Database-qualified name (after setting catalog):
database_name.table_name - Table name only (after setting catalog and database):
table_name
- Fully qualified name:
Examplesβ
Example 1: Truncate a table using fully qualified nameβ
TRUNCATE TABLE iceberg_catalog.my_db.my_table;
Example 2: Truncate a table after setting catalogβ
SET CATALOG iceberg_catalog;
TRUNCATE TABLE my_db.my_table;
Example 3: Truncate a table after setting catalog and databaseβ
SET CATALOG iceberg_catalog;
USE my_db;
TRUNCATE TABLE my_table;