StarRocks version 2.2
Release date: April 6, 2023
Improvements
- Optimized the bitmap_contains() function to reduce its memory consumption and improve its performance in some scenarios. #20616
- Optimized the Compaction framework to reduce its CPU resource consumption. #11746
Bug Fixes
The following bugs are fixed:
- If the requested URL in a Stream Load job is not correct, the responsible FE hangs and is unable to handle the HTTP request. #18468
- When the responsible FE collects statistics, it may consume an abnormally large amount of memory, which causes OOM. #16331
- BEs crash if memory release is not properly handled in some queries. #11395
- After the command TRUNCATE TABLE is executed, a NullPointerException may occur and the responsible FE fails to restart. #16773
2.2.10
Release date: December 2, 2022
Improvements
Optimized the error message returned for Routine Load jobs. #12203
Supports the logical operator `&&`. #11819
Queries are canceled immediately when a BE crashes, which prevents the system from getting stuck on expired queries. #12954
Optimized the FE start script. Java version is now checked during FE start. #14094
Supports deleting large volumes of data from Primary Key tables. #4772
Bug Fixes
The following bugs are fixed:
When users create a view from multiple tables (UNION), BEs crash if the leftmost child of UNION operations uses NULL constants. #13792
BEs crash if the Parquet file to query has inconsistent column types with Hive table schema. #8848
When a query contains a large number of OR operators, the planner needs to perform excessive recursive calculations, which causes the query to time out. #12788
The query result is incorrect when the subquery contains a LIMIT clause. #12466
The CREATE VIEW statement fails when double quotation marks in the SELECT clause are mixed with single quotation marks. #13102
2.2.9
Release date: November 15, 2022
Improvements
Added the session variable `hive_partition_stats_sample_size` to control the number of Hive partitions from which to collect statistics. An excessive number of partitions will cause errors in obtaining Hive metadata. #12700
Elasticsearch external tables support custom time zones. #12662
Bug Fixes
The following bugs are fixed:
The DECOMMISSION operation is stuck if an error occurs during metadata synchronization for external tables. #12369
Compaction crashes if a column that is newly added is deleted. #12907
SHOW CREATE VIEW does not display the comments that were added when creating the view. #4163
Memory leak in Java UDF may cause OOM. #12418
The node alive status stored in Follower FEs is not accurate in some scenarios because the status depends on `heartbeatRetryTimes`. To fix this issue, a property `aliveStatus` is added to `HeartbeatResponse` to indicate the node alive status. #12481
Behavior Change
Extended the length of Hive STRING columns that can be queried by StarRocks from 64 KB to 1 MB. If a STRING column exceeds 1 MB, it will be processed as a null column during queries. #12986
2.2.8
Release date: October 17, 2022
Bug Fixes
The following bugs are fixed:
BEs may crash if an expression encounters an error in the initialization stage. #11395
BEs may crash if invalid JSON data is loaded. #10804
Parallel writing encounters an error when the pipeline engine is enabled. #11451
BEs crash when the ORDER BY NULL LIMIT clause is used. #11648
BEs crash if the Parquet file to query has inconsistent column type with Hive table schema. #11839
2.2.7
Release date: September 23, 2022
Bug Fixes
The following bugs are fixed:
Data may be lost when users load JSON data into StarRocks. #11054
The output from SHOW FULL TABLES is incorrect. #11126
In previous versions, to access data in a view, users must have permissions on both the base tables and the view. In the current version, users are only required to have permissions on the view. #11290
The result from a complex query that is nested with EXISTS or IN is incorrect. #11415
REFRESH EXTERNAL TABLE fails if the schema of the corresponding Hive table is changed. #11406
An error may occur when a non-leader FE replays the bitmap index creation operation. #11261
2.2.6
Release date: September 14, 2022
Bug Fixes
The following bugs are fixed:
The result of `ORDER BY ... LIMIT ... OFFSET` is incorrect when the subquery contains LIMIT. #9698
The BE crashes if partial update is performed on a table with large data volume. #9809
Compaction causes BEs to crash if the size of BITMAP data to compact exceeds 2 GB. #11159
The like() and regexp() functions do not work if the pattern length exceeds 16 KB. #10364
Behavior Change
The format used to represent JSON values in an array in the output is modified. Escape characters are no longer used in the returned JSON values. #10790
2.2.5
Release date: August 18, 2022
Improvements
Improved the system performance when the pipeline engine is enabled. #9580
Improved the accuracy of memory statistics for index metadata. #9837
Bug Fixes
The following bugs are fixed:
BEs may be stuck in querying Kafka partition offsets (`get_partition_offset`) during Routine Load. #9937
An error occurs when multiple Broker Load threads attempt to load the same HDFS file. #9507
2.2.4
Release date: August 3, 2022
Improvements
Supports synchronizing schema changes on Hive table to the corresponding external table. #9010
Supports loading ARRAY data in Parquet files via Broker Load. #9131
Bug Fixes
The following bugs are fixed:
Broker Load cannot handle Kerberos logins with multiple keytab files. #8820 #8837
Supervisor may fail to restart services if stop_be.sh exits immediately after it is executed. #9175
Incorrect Join Reorder precedence causes error "Column cannot be resolved". #9063 #9487
2.2.3
Release date: July 24, 2022
Bug Fixes
The following bugs are fixed:
An error occurs when users delete a resource group. #8036
Thrift server exits when the number of threads is insufficient. #7974
In some scenarios, join reorder in CBO returns no results. #7099 #7831 #6866
2.2.2
Release date: June 29, 2022
Improvements
Optimized concurrency control for internal processing such as schema change. This reduces pressure on FE metadata management. It also makes load jobs less likely to pile up or slow down when a huge volume of data is loaded at high concurrency. #6838
Bug Fixes
The following bugs are fixed:
The number of replicas (`replication_num`) created by using CTAS is incorrect. #7036
Metadata may be lost after ALTER ROUTINE LOAD is performed. #7068
Pipeline issues that may cause memory leaks. #7295
Deadlock may occur when a Routine Load job is aborted. #6849
Some profile statistics information is inaccurate. #7074 #6789
The get_json_string function incorrectly processes JSON arrays. #7671
2.2.1
Release date: June 2, 2022
Improvements
Optimized the data loading performance and reduced long tail latency by reconstructing part of the hotspot code and reducing lock granularity. #6641
Added the CPU and memory usage information of the machines on which BEs are deployed for each query to the FE audit log. #6208 #6209
Supported JSON data types in the Primary Key tables and Unique Key tables. #6544
Reduced FEs load by reducing lock granularity and deduplicating BE report requests. Optimized the report performance when a large number of BEs are deployed, and solved the issue of Routine Load tasks getting stuck in a large cluster. #6293
Bug Fixes
The following bugs are fixed:
An error occurs when StarRocks parses the escape characters specified in the `SHOW FULL TABLES FROM DatabaseName` statement. #6559
FE disk space usage rises sharply. This bug is fixed by rolling back the BDBJE version. #6708
BEs become faulty because relevant fields cannot be found in the data returned after columnar scanning is enabled (`enable_docvalue_scan=true`). #6600
2.2.0
Release date: May 22, 2022
New Features
[Preview] The resource group management feature is released. This feature allows StarRocks to isolate and efficiently use CPU and memory resources when StarRocks processes both complex queries and simple queries from different tenants in the same cluster.
[Preview] A Java-based user-defined function (UDF) framework is implemented. This framework supports UDFs that are compiled in compliance with the syntax of Java to extend the capabilities of StarRocks.
[Preview] The Primary Key table supports updates only to specific columns when data is loaded to the Primary Key table in real-time data update scenarios such as order updates and multi-stream joins.
[Preview] JSON data types and JSON functions are supported.
External tables can be used to query data from Apache Hudi. This further improves users' data lake analytics experience with StarRocks. For more information, see External tables.
The following functions are added:
- ARRAY functions: array_agg, array_sort, array_distinct, array_join, reverse, array_slice, array_concat, array_difference, arrays_overlap, and array_intersect
- BITMAP functions: bitmap_max and bitmap_min
- Other functions: retention and square
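As a rough illustration of what some of the array functions listed above compute, the following Python sketch mirrors three of them on plain lists. It is not StarRocks code: NULL handling is omitted, the result order of `array_intersect` is a choice made by this sketch, and the behavior shown (for example, a leading `0` in `array_difference`) is an assumption that should be checked against the function reference.

```python
from typing import List, Sequence


def array_difference(values: Sequence[int]) -> List[int]:
    """Difference between each element and its predecessor; the first slot is 0."""
    return [0 if i == 0 else values[i] - values[i - 1] for i in range(len(values))]


def arrays_overlap(left: Sequence[int], right: Sequence[int]) -> bool:
    """True when the two arrays share at least one element."""
    return not set(left).isdisjoint(right)


def array_intersect(left: Sequence[int], right: Sequence[int]) -> List[int]:
    """Distinct elements found in both arrays, kept in first-array order (sketch choice)."""
    right_set = set(right)
    result: List[int] = []
    for value in left:
        if value in right_set and value not in result:
            result.append(value)
    return result


print(array_difference([342, 32423, 213, 23432]))
print(arrays_overlap([1, 2, 3], [3, 4]))
print(array_intersect([1, 2, 3, 2], [2, 3, 5]))
```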
Improvements
The parser and analyzer of the cost-based optimizer (CBO) are restructured, the code structure is optimized, and syntaxes such as INSERT with Common Table Expression (CTE) are supported. These improvements are made to increase the performance of complex queries, such as queries that involve the reuse of CTEs.
The performance of queries on Apache Hive™ external tables that are stored in cloud object storage services such as AWS Simple Storage Service (S3), Alibaba Cloud Object Storage Service (OSS), and Tencent Cloud Object Storage (COS) is optimized. After the optimization, the performance of object storage-based queries is comparable to that of HDFS-based queries. Additionally, late materialization of ORC files is supported, and queries on small files are accelerated. For more information, see Apache Hive™ external table.
When queries from Apache Hive™ are run by using external tables, StarRocks automatically performs incremental updates to the cached metadata by consuming Hive metastore events such as data changes and partition changes. StarRocks also supports queries on data of the DECIMAL and ARRAY types from Apache Hive™. For more information, see Apache Hive™ external table.
The UNION ALL operator is optimized to run 2 to 25 times faster than before.
A pipeline engine that supports adaptive parallelism and provides optimized profiles is released to improve the performance of simple queries in high concurrency scenarios.
Multiple characters can be combined and used as a single row delimiter for CSV files that are to be imported.
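To make the multi-character row delimiter concrete, here is a minimal Python sketch that splits a CSV payload into rows on such a delimiter. The delimiter `|#|`, the sample data, and the `split_rows` helper are hypothetical and only illustrate the idea; they do not show how StarRocks parses files internally.

```python
from typing import List


def split_rows(payload: str, row_delimiter: str) -> List[str]:
    """Split CSV text into rows using a delimiter of one or more characters."""
    if not row_delimiter:
        raise ValueError("row_delimiter must not be empty")
    rows = payload.split(row_delimiter)
    # A trailing delimiter leaves one empty string at the end; drop it.
    if rows and rows[-1] == "":
        rows.pop()
    return rows


# Hypothetical payload: three rows separated by the three-character delimiter "|#|".
sample = "1,Alice|#|2,Bob|#|3,Carol|#|"
rows = split_rows(sample, "|#|")
fields = [row.split(",") for row in rows]
print(fields)
```

Because the whole character sequence is treated as one delimiter, a lone `|` or `#` inside a field would not end the row.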
Bug Fixes
Deadlocks occur if data is loaded or changes are committed into tables that are based on the Primary Key table. #4998
Frontends (FEs), including FEs that run Oracle Berkeley DB Java Edition (BDB JE), are unstable. #4428, #4666, #2
The result that is returned by the SUM function encounters an arithmetic overflow if the function is invoked on a large amount of data. #3944
The precision of the results that are returned by the ROUND and TRUNCATE functions is unsatisfactory. #4256
A few bugs are detected by Synthesized Query Lancer (SQLancer). For more information, see SQLancer-related issues.
Others
Flink-connector-starrocks supports Apache Flink® v1.14.
Upgrade notes
If you use a StarRocks version later than 2.0.4 or a StarRocks version 2.1.x later than 2.1.6, you can disable the tablet clone feature before the upgrade (`ADMIN SET FRONTEND CONFIG ("max_scheduling_tablets" = "0");` and `ADMIN SET FRONTEND CONFIG ("max_balancing_tablets" = "0");`). After the upgrade, you can enable this feature (`ADMIN SET FRONTEND CONFIG ("max_scheduling_tablets" = "2000");` and `ADMIN SET FRONTEND CONFIG ("max_balancing_tablets" = "100");`).
To roll back to the previous version that was used before the upgrade, add the `ignore_unknown_log_id` parameter to the fe.conf file of each FE and set the parameter to `true`. The parameter is required because new types of logs are added in StarRocks v2.2.0. If you do not add the parameter, you cannot roll back to the previous version. We recommend that you set the `ignore_unknown_log_id` parameter to `false` in the fe.conf file of each FE after checkpoints are created. Then, restart the FEs to restore the FEs to the previous configurations.
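The tablet clone steps in the upgrade notes can be scripted. The sketch below only builds the statements as strings, using the parameter names and values quoted in the notes; the helper name `admin_set_statements` is hypothetical, and running the statements against a cluster (for example through a MySQL-compatible client) is left out.

```python
from typing import Dict, List

# Values taken from the upgrade notes: 0 disables tablet clone before the upgrade,
# and 2000 / 100 re-enable it afterwards.
DISABLE_CLONE: Dict[str, str] = {
    "max_scheduling_tablets": "0",
    "max_balancing_tablets": "0",
}
ENABLE_CLONE: Dict[str, str] = {
    "max_scheduling_tablets": "2000",
    "max_balancing_tablets": "100",
}


def admin_set_statements(config: Dict[str, str]) -> List[str]:
    """Render one ADMIN SET FRONTEND CONFIG statement per configuration item."""
    return [
        f'ADMIN SET FRONTEND CONFIG ("{name}" = "{value}");'
        for name, value in config.items()
    ]


before_upgrade = admin_set_statements(DISABLE_CLONE)
after_upgrade = admin_set_statements(ENABLE_CLONE)

print("-- run before the upgrade")
print("\n".join(before_upgrade))
print("-- run after the upgrade")
print("\n".join(after_upgrade))
```

Keeping the two sets of values side by side makes it harder to forget re-enabling tablet clone once the upgrade is finished.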