StarRocks version 3.0
3.0.1
Release date: June 1, 2023
New Features
- [Preview] Supports spilling intermediate computation results of large operators to disks to reduce the memory consumption of large operators. For more information, see Spill to disk.
- Routine Load supports loading Avro data.
- Supports Microsoft Azure Storage (including Azure Blob Storage and Azure Data Lake Storage).
Improvements
- Shared-data clusters support using StarRocks external tables to synchronize data with another StarRocks cluster.
- Added load_tracking_logs to Information Schema to record recent loading errors.
- Ignores special characters in CREATE TABLE statements. #23885
Bug Fixes
Fixed the following issues:
- Information returned by SHOW CREATE TABLE is incorrect for Primary Key tables. #24237
- BEs may crash during a Routine Load job. #20677
- Null pointer exception (NPE) occurs if you specify unsupported properties when creating a partitioned table. #21374
- Information returned by SHOW TABLE STATUS is incomplete. #24279
3.0.0
Release date: April 28, 2023
New Features
System architecture
- Decouple storage and compute. StarRocks now supports data persistence into S3-compatible object storage, enhancing resource isolation, reducing storage costs, and making compute resources more scalable. Local disks are used as hot data cache for boosting query performance. The query performance of the new shared-data architecture is comparable to the classic architecture (shared-nothing) when local cache is hit. For more information, see Deploy and use shared-data StarRocks.
Storage engine and data ingestion
- The AUTO_INCREMENT attribute is supported to provide globally unique IDs, which simplifies data management.
- Automatic partitioning and partitioning expressions are supported, which makes partition creation easier to use and more flexible.
- Primary Key tables support more complete UPDATE and DELETE syntax, including the use of CTEs and references to multiple tables.
- Added Load Profile for Broker Load and INSERT INTO jobs. You can view the details of a load job by querying the load profile. The usage is the same as Analyze query profile.
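As a minimal sketch of the AUTO_INCREMENT attribute, the statements below create a Primary Key table whose ID column is filled in by StarRocks. The table name orders_demo, its columns, and the bucket count are hypothetical choices for illustration only.

```sql
-- Hypothetical table: the id column is generated by StarRocks.
CREATE TABLE orders_demo (
    id BIGINT NOT NULL AUTO_INCREMENT,
    customer VARCHAR(64) NOT NULL
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH (id) BUCKETS 4;

-- The AUTO_INCREMENT column is omitted from the column list,
-- so each row receives a generated, globally unique id.
INSERT INTO orders_demo (customer) VALUES ('alice'), ('bob');

SELECT id, customer FROM orders_demo;
```

The generated IDs are unique, but they should not be assumed to be consecutive.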
Data Lake Analytics
- [Preview] Supports Presto/Trino compatible dialect. Presto/Trino's SQL can be automatically rewritten into StarRocks' SQL pattern. For more information, see the system variable sql_dialect.
- [Preview] Supports JDBC catalogs.
- Supports using SET CATALOG to manually switch between catalogs in the current session.
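A short sketch of how these session-level features might be combined. The catalog name hive_catalog is hypothetical and assumes an external catalog has already been created. The value 'trino' for sql_dialect is taken from the system variable documentation, not from this page.

```sql
-- Ask StarRocks to interpret subsequent statements as Trino/Presto SQL (preview feature).
SET sql_dialect = 'trino';

-- Switch the current session to an external catalog (hypothetical name).
SET CATALOG hive_catalog;
SHOW DATABASES;

-- Switch back to the internal catalog.
SET CATALOG default_catalog;
```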
Privileges and security
- Provides a new privilege system with full RBAC functionalities, supporting role inheritance and default roles. For more information, see Overview of privileges.
- Provides more privilege management objects and more fine-grained privileges. For more information, see Privileges supported by StarRocks.
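The following sketch illustrates role inheritance and default roles under the new RBAC system. The role, database, and user names are hypothetical, and the user 'jack'@'%' is assumed to exist already. Check the exact GRANT syntax against the privilege documentation for your version.

```sql
-- Hypothetical roles: senior_analyst inherits everything granted to analyst.
CREATE ROLE analyst;
CREATE ROLE senior_analyst;

GRANT SELECT ON ALL TABLES IN DATABASE sales_db TO ROLE analyst;
GRANT analyst TO ROLE senior_analyst;

-- Assign the role to an existing user and activate it by default at login.
GRANT senior_analyst TO USER 'jack'@'%';
SET DEFAULT ROLE senior_analyst TO 'jack'@'%';

SHOW ROLES;
```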
Query engine
- Allows more queries on joined tables to benefit from the query cache. For example, the query cache now supports Broadcast Join and Bucket Shuffle Join.
- Supports Global UDFs.
- Dynamic adaptive parallelism: StarRocks can automatically adjust the pipeline_dop parameter for query concurrency.
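A hedged sketch of the related session settings. It assumes, per the system variable documentation and not this page, that a pipeline_dop value of 0 lets StarRocks choose the parallelism adaptively, and that the query cache is toggled with enable_query_cache.

```sql
-- Assumption: 0 means the parallelism is adjusted automatically.
SET pipeline_dop = 0;
SHOW VARIABLES LIKE 'pipeline_dop';

-- Assumption: the query cache is enabled per session with this variable.
SET enable_query_cache = true;
```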
SQL reference
- Added the following privilege-related SQL statements: SET DEFAULT ROLE, SET ROLE, SHOW ROLES, and SHOW USERS.
- Added the following semi-structured data analysis functions: map_apply, map_from_arrays, map_filter, transform_keys, and transform_values.
- array_agg supports ORDER BY.
- Window functions lead and lag support IGNORE NULLS.
- Added string functions replace, hex_decode_binary, and hex_decode_string.
- Added encryption functions base64_decode_binary and base64_decode_string.
- Added math functions sinh, cosh, and tanh.
- Added utility function current_role.
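A few of the additions above can be sketched as follows. The table events and its columns user_id, event_time, and page are hypothetical. The placement of IGNORE NULLS inside the function call follows the window function documentation and should be verified for your version.

```sql
-- array_agg with ORDER BY: collect each user's pages in visit order.
SELECT user_id, array_agg(page ORDER BY event_time) AS pages
FROM events
GROUP BY user_id;

-- lag with IGNORE NULLS: previous non-NULL page for each row.
SELECT user_id,
       event_time,
       lag(page IGNORE NULLS, 1) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_page
FROM events;

-- New math, utility, and map functions.
SELECT sinh(0), cosh(0), tanh(0), current_role();
SELECT map_from_arrays([1, 2], ['a', 'b']);
```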
Improvements
Deployment
- Updated Docker image and the related Docker deployment document for version 3.0. #20623 #21021
Storage engine and data ingestion
- Supports more CSV parameters for data ingestion, including SKIP_HEADER, TRIM_SPACE, ENCLOSE, and ESCAPE. See STREAM LOAD, BROKER LOAD, and ROUTINE LOAD.
- The primary key and sort key are decoupled in Primary Key tables. The sort key can be separately specified in ORDER BY when you create a table.
- Optimized the memory usage of data ingestion into Primary Key tables in scenarios such as large-volume ingestion, partial updates, and persistent primary indexes.
- Supports creating asynchronous INSERT tasks. For more information, see INSERT and SUBMIT TASK. #20609
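As an illustration of the decoupled sort key and asynchronous INSERT tasks, the sketch below declares a Primary Key table with a separate ORDER BY clause and then loads it through SUBMIT TASK. The table names user_profile_demo and user_profile_staging, the columns, and the bucket count are hypothetical.

```sql
-- The primary key is user_id, while rows are sorted by (city, last_active).
CREATE TABLE user_profile_demo (
    user_id BIGINT NOT NULL,
    city VARCHAR(32),
    last_active DATETIME
)
PRIMARY KEY (user_id)
DISTRIBUTED BY HASH (user_id) BUCKETS 4
ORDER BY (city, last_active);

-- Run the INSERT asynchronously as a task instead of blocking the session.
SUBMIT TASK AS
INSERT INTO user_profile_demo
SELECT user_id, city, last_active FROM user_profile_staging;
```

Choosing a sort key that matches common filter columns, rather than the primary key, is the main reason to specify ORDER BY separately.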
Materialized view
- Optimized the rewriting capabilities of materialized views, including:
- Supports rewrite of View Delta Join, Outer Join, and Cross Join.
- Optimized SQL rewrite of Union with partition.
- Improved materialized view building capabilities: supporting CTE, select *, and Union.
- Optimized the information returned by SHOW MATERIALIZED VIEWS.
- Supports adding MV partitions in batches, which improves the efficiency of partition addition during materialized view building. #21167
Query engine
- All operators are supported in the pipeline engine. Non-pipeline code will be removed in later versions.
- Improved Big Query Positioning and added big query log. SHOW PROCESSLIST supports viewing CPU and memory information.
- Optimized Outer Join Reorder.
- Optimized error messages in the SQL parsing stage, providing more accurate error positioning and clearer error messages.
Data Lake Analytics
- Optimized metadata statistics collection.
- Supports using SHOW CREATE TABLE to view the creation statements of the tables that are managed by an external catalog and are stored in Apache Hive™, Apache Iceberg, Apache Hudi, or Delta Lake.
Bug Fixes
- Some URLs in the license header of StarRocks' source file cannot be accessed. #2224
- An unknown error is returned during SELECT queries. #19731
- Supports SHOW/SET CHARACTER. #17480
- When the loaded data exceeds the field length supported by StarRocks, the error message returned is not correct. #14
- Supports show full fields from 'table'. #17233
- Partition pruning causes MV rewrites to fail. #14641
- MV rewrite fails when the CREATE MATERIALIZED VIEW statement contains count(distinct) and count(distinct) is applied to the DISTRIBUTED BY column. #16558
- FEs fail to start when a VARCHAR column is used as the partitioning column of a materialized view. #19366
- Window functions LEAD and LAG incorrectly handle IGNORE NULLS. #21001
- Adding temporary partitions conflicts with automatic partition creation. #21222
Behavior Change
- The new role-based access control (RBAC) system supports the previous privileges and roles. However, the syntax of related statements such as GRANT and REVOKE is changed.
- Renamed SHOW MATERIALIZED VIEW as SHOW MATERIALIZED VIEWS.
- Added the following reserved keywords: AUTO_INCREMENT, CURRENT_ROLE, DEFERRED, ENCLOSE, ESCAPE, IMMEDIATE, PRIVILEGES, SKIP_HEADER, TRIM_SPACE, VARBINARY.
Upgrade Notes
You can upgrade from v2.5 to v3.0 or downgrade from v3.0 to v2.5.
In theory, an upgrade from a version earlier than v2.5 is also supported. To ensure system availability, we recommend that you first upgrade your cluster to v2.5 and then to v3.0.
Take note of the following points when you perform a downgrade from v3.0 to v2.5.
BDBJE
StarRocks upgrades the BDB library in v3.0. However, BDBJE cannot be rolled back. You must use the BDB library of v3.0 after a downgrade. Perform the following steps:
- After you replace the FE package with a v2.5 package, copy fe/lib/starrocks-bdb-je-18.3.13.jar of v3.0 to the fe/lib directory of v2.5.
- Delete fe/lib/je-7.*.jar.
Privilege system
The new RBAC privilege system is used by default after you upgrade to v3.0. You can only downgrade to v2.5.
After a downgrade, run ALTER SYSTEM CREATE IMAGE to create a new image and wait for the new image to be synchronized to all follower FEs. If you do not run this command, some of the downgrade operations may fail. This command is supported in v2.5.3 and later.
For details about the differences between the privilege system of v2.5 and v3.0, see "Upgrade notes" in Privileges supported by StarRocks.