- StarRocks
- Introduction to StarRocks
- Quick Start
- Table Design
- Data Loading
- Concepts
- Overview of data loading
- Load data from a local file system or a streaming data source using HTTP PUT
- Load data from HDFS or cloud storage
- Continuously load data from Apache KafkaĀ®
- Bulk load using Apache Sparkā¢
- Load data using INSERT
- Load data using Stream Load transaction interface
- Synchronize data from MySQL in real time
- Continuously load data from Apache FlinkĀ®
- Change data through loading
- Transform data at loading
- Data Unloading
- Query Data Sources
- Query Acceleration
- Administration
- Deployment
- Management
- Data recovery
- User Privilege and Authentication
- Performance Tuning
- Reference
- SQL Reference
- User Account Management
- Cluster Management
- ADD SQLBLACKLIST
- ADMIN CANCEL REPAIR TABLE
- ADMIN CHECK TABLET
- ADMIN REPAIR TABLE
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER RESOURCE GROUP
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- CREATE RESOURCE GROUP
- DELETE SQLBLACKLIST
- DROP FILE
- DROP RESOURCE GROUP
- EXPLAIN
- INSTALL PLUGIN
- KILL
- SET
- SHOW BACKENDS
- SHOW BROKER
- SHOW COMPUTE NODES
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW PROC
- SHOW PROCESSLIST
- SHOW RESOURCE GROUP
- SHOW SQLBLACKLIST
- SHOW TABLE STATUS
- SHOW VARIABLES
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER MATERIALIZED VIEW
- ALTER TABLE
- ALTER VIEW
- ALTER RESOURCE
- ANALYZE TABLE
- BACKUP
- CANCEL ALTER TABLE
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE ANALYZE
- CREATE EXTERNAL CATALOG
- CREATE DATABASE
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP ANALYZE
- DROP STATS
- DROP CATALOG
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- HLL
- KILL ANALYZE
- RECOVER
- REFRESH EXTERNAL TABLE
- RESTORE
- SHOW ANALYZE JOB
- SHOW ANALYZE STATUS
- SHOW META
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- USE
- DML
- ALTER LOAD
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- CANCEL EXPORT
- CANCEL REFRESH MATERIALIZED VIEW
- CREATE ROUTINE LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- REFRESH MATERIALIZED VIEW
- RESUME ROUTINE LOAD
- SELECT
- SHOW ALTER TABLE
- SHOW ALTER MATERIALIZED VIEW
- SHOW BACKUP
- SHOW CATALOGS
- SHOW CREATE MATERIALIZED VIEW
- SHOW CREATE TABLE
- SHOW CREATE VIEW
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW MATERIALIZED VIEW
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- Auxiliary Commands
- Data Types
- Keywords
- Function Reference
- Java UDFs
- Window functions
- Lambda expression
- Aggregate Functions
- Array Functions
- array_agg
- array_append
- array_avg
- array_concat
- array_contains
- array_contains_all
- array_cum_sum
- array_difference
- array_distinct
- array_filter
- array_intersect
- array_join
- array_length
- array_map
- array_max
- array_min
- array_position
- array_remove
- array_slice
- array_sort
- array_sortby
- array_sum
- arrays_overlap
- array_to_bitmap
- reverse
- unnest
- Bit Functions
- Bitmap Functions
- base64_to_bitmap
- bitmap_and
- bitmap_andnot
- bitmap_contains
- bitmap_count
- bitmap_from_string
- bitmap_empty
- bitmap_has_any
- bitmap_hash
- bitmap_intersect
- bitmap_max
- bitmap_min
- bitmap_or
- bitmap_remove
- bitmap_to_array
- bitmap_to_base64
- bitmap_to_string
- bitmap_union
- bitmap_union_count
- bitmap_union_int
- bitmap_xor
- intersect_count
- sub_bitmap
- to_bitmap
- Conditional Functions
- Cryptographic Functions
- Date Functions
- add_months
- adddate
- convert_tz
- current_date
- current_time
- current_timestamp
- date
- date_add
- date_format
- date_slice
- date_sub, subdate
- date_trunc
- datediff
- day
- dayname
- dayofmonth
- dayofweek
- dayofyear
- days_add
- days_diff
- days_sub
- from_days
- from_unixtime
- hour
- hours_sub
- microseconds_sub
- minute
- minutes_add
- minutes_diff
- minutes_sub
- month
- monthname
- months_add
- months_diff
- now
- quarter
- second
- seconds_add
- seconds_diff
- seconds_sub
- str_to_date
- str2date
- time_slice
- time_to_sec
- timediff
- timestamp
- timestampadd
- timestampdiff
- to_date
- to_days
- unix_timestamp
- utc_timestamp
- week
- weekofyear
- weeks_add
- weeks_diff
- weeks_sub
- year
- years_add
- years_diff
- years_sub
- Geographic Functions
- JSON Functions
- Overview of JSON functions and operators
- JSON operators
- JSON constructor functions
- JSON query and processing functions
- Map Functions
- Math Functions
- String Functions
- Pattern Matching Functions
- Percentile Functions
- Scalar Functions
- Utility Functions
- cast function
- hash function
- System variables
- User-defined variables
- Error code
- System limits
- SQL Reference
- FAQ
- Deploy
- Data Migration
- SQL
- Query Dump
- Other FAQs
- Benchmark
- Developers
- Contribute to StarRocks
- Code Style Guides
- Use the debuginfo file for debugging
- Development Environment
- Trace Tools
- Integration
Edit
StarRocks version 2.5
2.5.3
Release date: March 10, 2023
Improvements
- Optimized query rewrite for materialized views (MVs).
- Supports rewriting queries with Outer Join and Cross Join. #18629
- Optimized the data scan logic for MVs, further accelerating the rewritten queries. #18629
- Enhanced rewrite capabilities for single-table aggregate queries. #18629
- Enhanced rewrite capabilities in View Delta scenarios, which is when the queried tables are a subset of the MV's base tables. #18800
- Optimized the performance and memory usage when the window function RANK() is used as a filter or a sort key. #17553
Bug Fixes
The following bugs are fixed:
- Errors caused by null literals
[]
in ARRAY data. #18563 - Misuse of the low-cardinality optimization dictionary in some complex query scenarios. The dictionary mapping check is now added before applying the dictionary. #17318
- In a single BE environment, Local Shuffle causes GROUP BY to produce duplicate results. #17845
- Misuses of partition-related PROPERTIES for a non-partitioned MV may cause the MV refresh to fail. The partition PROPERTIES check is now performed when users create an MV. #18741
- Errors in parsing Parquet Repetition columns. #17626 #17788 #18051
- The obtained column's nullable information is incorrect. Solution: When CTAS is used to create a Primary Key table, only the primary key columns are non-nullable; non-primary key columns are nullable. #16431
- Some issues caused by deleting data from Primary Key tables. #18768
2.5.2
Release date: February 21, 2023
New Features
- Supports using the Instance Profile and Assumed Role-based credential methods to access AWS S3 and AWS Glue. #15958
- Supports the following bit functions: bit_shift_left, bit_shift_right, and bit_shift_right_logical. #14151
Improvements
- Optimized the memory release logic, which significantly reduces peak memory usage when a query contains a large number of aggregate queries. #16913
- Reduced the memory usage of sorting. The memory consumption is halved when a query involves window functions or sorting. #16937 #17362 #17408
Bug Fixes
The following bugs are fixed:
- Apache Hive external tables that contain MAP and ARRAY data cannot be refreshed. #17548
- Superset cannot identify column types of materialized views. #17686
- BI connectivity fails because SET GLOBAL/SESSION TRANSACTION cannot be parsed. #17295
- The bucket number of dynamic partitioned tables in a Colocate Group cannot be modified and an error message is returned. #17418
- Potential issues caused by a failure in the Prepare stage. #17323
Behavior Change
- Added CHARACTER to the reserved keyword list. #17488
2.5.1
Release date: February 5, 2023
Improvements
- Asynchronous materialized views created based on external catalogs support query rewrite. #11116 #15791
- Allows users to specify a collection period for automatic CBO statistics collection, which prevents cluster performance jitter caused by automatic full collection. #14996
- Added Thrift server queue. Requests that cannot be processed immediately during INSERT INTO SELECT can be pending in the Thrift server queue, preventing requests from being rejected. #14571
- Deprecated the FE parameter
default_storage_medium
. Ifstorage_medium
is not explicitly specified when users create a table, the system automatically infers the storage medium of the table based on BE disk type. For more information, see description ofstorage_medium
in CREATE TABLE. #14394
Bug Fixes
The following bugs are fixed:
- Null pointer exception (NPE) caused by SET PASSWORD. #15247
- JSON data with empty keys cannot be parsed. #16852
- Data of invalid types can be successfully converted into ARRAY data. #16866
- Nested Loop Join cannot be interrupted when an exception occurs. #16875
Behavior Change
- Deprecated the FE parameter
default_storage_medium
. The storage medium of a table is automatically inferred by the system. #14394
2.5.0
Release date: January 22, 2023
New Features
- Supports querying Merge On Read tables using Hudi catalogs and Hudi external tables. #6780
- Supports querying STRUCT and MAP data using Hive catalogs, Hudi catalogs, and Iceberg catalogs. #10677
- Provides Local Cache to improve access performance of hot data stored in external storage systems, such as HDFS. #11597
- Supports creating Delta Lake catalogs, which allow direct queries on data from Delta Lake. #11972
- Hive, Hudi, and Iceberg catalogs are compatible with AWS Glue. #12249
- Supports creating file external tables, which allow direct queries on Parquet and ORC files from HDFS and object stores. #13064
- Supports creating materialized views based on Hive, Hudi, Iceberg catalogs, and materialized views. For more information, see Materialized view. #11116 #11873
- Supports conditional updates for tables that use the Primary Key model. For more information, see Change data through loading. #12159
- Supports Query Cache, which stores intermediate computation results of queries, improving the QPS and reduces the average latency of highly-concurrent, simple queries. #9194
- Supports specifying the priority of Broker Load jobs. For more information, see BROKER LOAD #11029
- Supports specifying the number of replicas for data loading for StarRocks native tables. For more information, see CREATE TABLE. #11253
- Supports query queues. #12594
- Supports isolating compute resources occupied by data loading, thereby limiting the resource consumption of data loading tasks. For more information, see Resource group. #12606
- Supports specifying the following data compression algorithms for StarRocks native tables: LZ4, Zstd, Snappy, and Zlib. For more information, see Data compression. #10097 #12020
- Supports user-defined variables. #10011
- Supports lambda expression and the following higher-order functions: array_map, array_filter, array_sum, and array_sortby. #9461 #9806 #10323 #14034
- Provides the QUALIFY clause that filters the results of window functions. #13239
- Supports using the result returned by the uuid() and uuid_numeric() functions as the default value of a column when you create a table. For more information, see CREATE TABLE. #11155
- Supports the following functions: map_size, map_keys, map_values, max_by, sub_bitmap, bitmap_to_base64, host_name, and date_slice. #11299 #11323 #12243 #11776 #12634 #14225
Improvements
- Optimized the metadata access performance when you query external data using Hive catalogs, Hudi catalogs, and Iceberg catalogs. #11349
- Supports querying ARRAY data using Elasticsearch external tables. #9693
- Optimized the following aspects of materialized views:
- Asynchronous materialized views support automatic and transparent query rewrite based on the SPJG-type materialized views. For more information, see Materialized view. #13193
- Asynchronous materialized views support multiple async refresh mechanisms. For more information, see Materialized view. #12712 #13171 #13229 #12926
- The efficiency of refreshing materialized views is improved. #13167
- StarRocks automatically sets an appropriate number of tablets when you create a table, eliminating the need for manual operations. For more information, see CREATE TABLE. #10614
- Optimized the following aspects of data loading:
- Optimized loading performance in multi-replica scenarios by supporting the "single leader replication" mode. Data loading gains a one-fold performance lift. For more information about "single leader replication", see
replicated_storage
in CREATE TABLE. #10138 - Broker Load and Spark Load no longer need to depend on brokers for data loading when only one HDFS cluster or one Kerberos user is configured. However, if you have multiple HDFS clusters or multiple Kerberos users, you still need to deploy a broker. For more information, see Load data from HDFS or cloud storage and Bulk load using Apache Sparkā¢. #9049 #9228
- Optimized the performance of Broker Load when a large number of small ORC files are loaded. #11380
- Reduced the memory usage when you load data into tables of the Primary Key Model.
- Optimized loading performance in multi-replica scenarios by supporting the "single leader replication" mode. Data loading gains a one-fold performance lift. For more information about "single leader replication", see
- Optimized the
information_schema
database and thetables
andcolumns
tables within. Adds a new tabletable_config
. For more information, see Information Schema. #10033 - Optimized data backup and restore:
- Supports backing up and restoring data from multiple tables in a database at a time. For more information, see Backup and restore data. #11619
- Supports backing up and restoring data from Primary Key tables. For more information, see Backup and restore. #11885
- Optimized the following functions:
- Added an optional parameter for the time_slice function, which is used to determine whether the beginning or end of the time interval is returned. #11216
- Added a new mode
INCREASE
for the window_funnel function to avoid computing duplicate timestamps. #10134 - Supports specifying multiple arguments in the unnest function. #12484
- lead() and lag() functions support querying HLL and BITMAP data. For more information, see Window function. #12108
- The following ARRAY functions support querying JSON data: array_agg, array_sort, array_concat, array_slice, and reverse. #13155
- Optimized the use of some functions. The
current_date
,current_timestamp
,current_time
,localtimestamp
, andlocaltime
functions can be executed without using()
, for example, you can directly runselect current_date;
. # 14319
- Removed some redundant information from FE logs. # 15374
Bug Fixes
The following bugs are fixed:
- The append_trailing_char_if_absent() function may return an incorrect result when the first argument is empty. #13762
- After a table is restored using the RECOVER statement, the table does not exist. #13921
- The result returned by the SHOW CREATE MATERIALIZED VIEW statement does not contain the database and catalog specified in the query statement when the materialized view was created. #12833
- Schema change jobs in the
waiting_stable
state cannot be canceled. #12530 - Running the
SHOW PROC '/statistic';
command on a Leader FE and non-Leader FE returns different results. #12491 - The position of the ORDER BY clause is incorrect in the result returned by SHOW CREATE TABLE. # 13809
- When users use Hive Catalog to query Hive data, if the execution plan generated by FE does not contain partition IDs, BEs fail to query Hive partition data. # 15486.
Behavior Change
- Changed the default value of the
AWS_EC2_METADATA_DISABLED
parameter toFalse
, which means that the metadata of Amazon EC2 is obtained to access AWS resources. - Renamed session variable
is_report_success
toenable_profile
, which can be queried using the SHOW VARIABLES statement. - Added four reserved keywords:
CURRENT_DATE
,CURRENT_TIME
,LOCALTIME
, andLOCALTIMESTAMP
. # 14319 - The maximum length of table and database names can be up to 1023 characters. # 14929 # 15020
- BE configuration items
enable_event_based_compaction_framework
andenable_size_tiered_compaction_strategy
are set totrue
by default, which significantly reduces compaction overheads when there are a large number of tablets or a single tablet has large data volume.
Upgrade Notes
- You can upgrade your cluster to 2.5.0 from 2.0.x, 2.1.x, 2.2.x, 2.3.x, or 2.4.x. However, if you need to perform a rollback, we recommend that you roll back only to 2.4.x.