Iceberg catalog
This topic describes how to create an Iceberg catalog and how to configure your StarRocks cluster for querying data from Apache Iceberg.
An Iceberg catalog is an external catalog supported in StarRocks 2.4 and later versions. It enables you to query data from Iceberg without loading data into StarRocks or creating external tables.
Usage notes
- StarRocks supports querying data files of Iceberg in the following formats: Parquet and ORC.
- StarRocks supports querying compressed data files of Iceberg in the following formats: gzip, Zstd, LZ4, and Snappy.
- StarRocks supports querying Iceberg data of the following types: BOOLEAN, INTEGER, LONG, FLOAT, DOUBLE, DECIMAL(P, S), DATE, TIME, TIMESTAMP, STRING, UUID, LIST, FIXED(L), and BINARY. Note that an error occurs when you query Iceberg data of unsupported data types, including TIMESTAMPTZ, STRUCT, and MAP.
- StarRocks supports querying version 1 tables (Analytic Data Tables). Version 2 tables (Row-level Deletes) are not supported. For the differences between these two types of tables, see Iceberg Table Spec.
- You can use the DESC statement to view the schema of an Iceberg table in StarRocks 2.4 and later versions.
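For example, the following DESC statement shows the schema of a table through an Iceberg catalog. The names iceberg_catalog, iceberg_db, and iceberg_table are placeholders for illustration; this sketch assumes the fully qualified catalog.database.table form:

DESC iceberg_catalog.iceberg_db.iceberg_table;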
Before you begin
Before you create an Iceberg catalog, configure your StarRocks cluster so that StarRocks can access the data storage system and metadata service of your Iceberg cluster. StarRocks supports two data storage systems for Iceberg: HDFS and Amazon S3. StarRocks supports two metadata services for Iceberg: Hive metastore and custom metadata service. The configurations required for an Iceberg catalog are the same as those required for a Hive catalog. Therefore, see Hive catalog for more information about the configurations.
Create an Iceberg catalog
After you complete the preceding configurations, you can create an Iceberg catalog.
Syntax
CREATE EXTERNAL CATALOG catalog_name
PROPERTIES ("key"="value", ...);
Note
Before querying Iceberg data, you must add the mapping between the domain name and the IP address of the Hive metastore node to the /etc/hosts file. Otherwise, StarRocks may fail to access the Hive metastore when you start a query.
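For example, assuming the Hive metastore runs on a host named hive-metastore-host with the IP address 192.168.0.81 (both values are placeholders), the /etc/hosts entry would look like this:

192.168.0.81 hive-metastore-host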
Parameters
catalog_name
: the name of the Iceberg catalog. This parameter is required.
The naming conventions are as follows:
- The name can contain letters, digits (0-9), and underscores (_). It must start with a letter.
- The name cannot exceed 64 characters in length.
PROPERTIES
: the properties of the Iceberg catalog. This parameter is required. You need to configure this parameter based on the metadata service used by your Iceberg cluster. In Iceberg, a component called catalog stores the mappings between Iceberg tables and the paths where those tables are stored. If you use different metadata services, you need to configure different types of catalogs for your Iceberg cluster:
- Hive metastore: If you use Hive metastore, configure HiveCatalog for your Iceberg cluster.
- Custom metadata service: If you use a custom metadata service, configure a custom catalog for your Iceberg cluster.
Hive metastore
If you use Hive metastore for your Iceberg cluster, configure the following properties for the Iceberg catalog.
Property | Required | Description |
---|---|---|
type | Yes | The type of the data source. Set the value to iceberg. |
iceberg.catalog.type | Yes | The type of catalog configured for your Iceberg cluster. Set the value to HIVE. |
iceberg.catalog.hive.metastore.uris | Yes | The URI of the Hive metastore, in the format thrift://<IP address of Hive metastore>:<port number>. The port number defaults to 9083. |
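For example, the following statement creates an Iceberg catalog that uses a Hive metastore. The catalog name and the metastore address are placeholders; replace them with your own values:

CREATE EXTERNAL CATALOG iceberg_catalog_hms
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "HIVE",
    "iceberg.catalog.hive.metastore.uris" = "thrift://192.168.0.81:9083"
);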
Custom metadata service
If you use a custom metadata service for your Iceberg cluster, you need to create a custom catalog class (its name must not conflict with the name of any class that already exists in StarRocks) and implement the related interface in StarRocks so that StarRocks can access the custom metadata service. The custom catalog class must inherit the abstract class BaseMetastoreCatalog. For information about how to create a custom catalog in StarRocks, see IcebergHiveCatalog. After the custom catalog is created, package the catalog and its related files, place them in the fe/lib path of each FE, and then restart each FE.
After you complete the preceding operations, you can create an Iceberg catalog and configure its properties.
Property | Required | Description |
---|---|---|
type | Yes | The type of the data source. Set the value to iceberg. |
iceberg.catalog.type | Yes | The type of catalog configured for your Iceberg cluster. Set the value to CUSTOM. |
iceberg.catalog-impl | Yes | The fully qualified class name of the custom catalog. FEs search for the catalog based on this name. If the custom catalog contains custom configuration items, you must add them to the PROPERTIES parameter as key-value pairs when you create an Iceberg catalog. |
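For example, the following statement creates an Iceberg catalog that uses a hypothetical custom catalog class named com.example.MyCustomIcebergCatalog (both the catalog name and the class name are placeholders for illustration):

CREATE EXTERNAL CATALOG iceberg_catalog_custom
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "CUSTOM",
    "iceberg.catalog-impl" = "com.example.MyCustomIcebergCatalog"
);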
Caching strategy of Iceberg metadata
StarRocks does not cache Iceberg metadata. By default, each query fetches the latest data directly from Iceberg.
Use the Iceberg catalog to query Iceberg data
After you complete all the preceding operations, you can use the Iceberg catalog to query Iceberg data. For more information, see Query external data.
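For example, the following statement queries a table through an Iceberg catalog by its fully qualified name. The names iceberg_catalog, iceberg_db, and iceberg_table are placeholders; replace them with your own catalog, database, and table names:

SELECT * FROM iceberg_catalog.iceberg_db.iceberg_table LIMIT 10;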
References
- To view examples of creating an external catalog, see CREATE EXTERNAL CATALOG.
- To view all catalogs in the current StarRocks cluster, see SHOW CATALOGS.
- To delete an external catalog, see DROP CATALOG.