
StarRocks version 3.0

3.0.9

Release date: January 2, 2024

New features

  • Added the percentile_disc function. #36352
  • Added a new metric max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539

Improvements

  • A new value option GROUP_CONCAT_LEGACY is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
  • When using JDK, the default GC algorithm is G1. #37386
  • The be_tablets view in the information_schema database provides a new field INDEX_DISK, which records the disk usage (measured in bytes) of persistent indexes. #35615
  • Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. #35917
  • Supports updating the specified partitions of an automatically partitioned table. If the specified partitions do not exist, an error is returned. #34777
  • The Primary Key table size returned by the SHOW DATA statement includes the sizes of .cols files (these are files related to partial column updates and generated columns) and persistent index files. #34898
  • Optimized the performance of persistent index update when compaction is performed on all rowsets of a Primary Key table, which reduces disk read I/O. #36819
  • When the string on the right side of the LIKE operator within the WHERE clause does not include % or _, the LIKE operator is converted into the = operator. #37515
  • Optimized the logic used to compute compaction scores for Primary Key tables, so that the compaction scores of Primary Key tables fall within a range more consistent with those of the other three table types. #36534
  • The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of the messages consumed from each partition. #36222
  • Optimized the performance of some Bitmap-related operations, including:
    • Optimized nested loop joins. #340804 #35003
    • Optimized the bitmap_xor function. #34069
    • Supports Copy on Write to optimize Bitmap performance and reduce memory consumption. #34047
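The LIKE-to-= conversion listed above is safe because a pattern without wildcard characters can only match the literal string itself. The following Python sketch illustrates the decision rule on plain strings. `rewrite_like` is a hypothetical helper, not StarRocks' optimizer code, and it conservatively keeps LIKE whenever the pattern contains %, _, or a backslash (assumed here to act as the escape character).

```python
def rewrite_like(column: str, pattern: str) -> str:
    """Hypothetical illustration: turn `column LIKE 'pattern'` into
    `column = 'pattern'` when the pattern has no wildcard characters.

    Assumption: a backslash may be an escape character, so any pattern
    containing one is left as LIKE to stay on the safe side.
    Quoting of the pattern is not handled; only the decision rule is shown.
    """
    special_chars = ("%", "_", "\\")
    if any(ch in pattern for ch in special_chars):
        # Wildcards (or possible escapes) present: keep the LIKE predicate.
        return f"{column} LIKE '{pattern}'"
    # No wildcards: LIKE can only match the literal string, so use equality.
    return f"{column} = '{pattern}'"


print(rewrite_like("city", "Berlin"))  # city = 'Berlin'
print(rewrite_like("city", "Ber%"))    # city LIKE 'Ber%'
print(rewrite_like("code", "A_1"))     # code LIKE 'A_1'
```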

Behavior Change

  • Added the session variable enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
  • Changed the FE configuration item enable_new_publish_mechanism from a dynamic parameter to a static one. You must restart the FE after you modify the parameter settings. #35338
  • Changed the default retention period of trash files from 3 days to 1 day. #37113

Parameter Change

Session variables

  • Added session variable cbo_decimal_cast_string_strict, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to true, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to false, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is true. #34208
  • Added the session variables transaction_read_only and tx_read_only, which specify the transaction access mode and are compatible with MySQL 5.7.20 and later. #37249
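To make the two cbo_decimal_cast_string_strict behaviors concrete, here is a Python sketch using the standard decimal module. The helper `decimal_to_string` is hypothetical, and reading "processes all valid digits" as "drop trailing zeros in the fractional part" is an interpretation on our part, not a statement of StarRocks' exact output.

```python
from decimal import Decimal, ROUND_DOWN


def decimal_to_string(value: Decimal, scale: int, strict: bool) -> str:
    """Sketch of the two DECIMAL-to-STRING behaviors (assumed semantics).

    strict=True : truncate to `scale` fractional digits and pad with 0s.
    strict=False: keep only the significant digits (no trailing zeros).
    """
    if strict:
        # Decimal(1).scaleb(-scale) is 10**-scale, e.g. 0.001 for scale 3.
        quantum = Decimal(1).scaleb(-scale)
        return format(value.quantize(quantum, rounding=ROUND_DOWN), "f")
    # normalize() strips trailing zeros; "f" avoids exponent notation (1E+2).
    return format(value.normalize(), "f")


# A DECIMAL(10, 3) column holding 1.5:
print(decimal_to_string(Decimal("1.5"), 3, strict=True))   # 1.500
print(decimal_to_string(Decimal("1.5"), 3, strict=False))  # 1.5
```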

FE Parameters

  • Added the FE configuration item routine_load_unstable_threshold_second. #36222
  • Added the FE configuration item http_worker_threads_num, which specifies the number of threads that the HTTP server uses to handle HTTP requests. The default value is 0. If this parameter is set to 0 or a negative value, the actual number of threads is twice the number of CPU cores. #37530
  • Added the FE configuration item default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
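The fallback rule for http_worker_threads_num reduces to a one-line calculation. A minimal Python sketch follows; the function name is hypothetical, and os.cpu_count() stands in for however the FE actually detects CPU cores.

```python
import os
from typing import Optional


def effective_http_worker_threads(configured: int, cpu_cores: Optional[int] = None) -> int:
    """Return the number of HTTP worker threads implied by http_worker_threads_num."""
    if cpu_cores is None:
        # Assumption: use the host's logical CPU count; fall back to 1 if unknown.
        cpu_cores = os.cpu_count() or 1
    if configured <= 0:
        # 0 (the default) or a negative value: twice the number of CPU cores.
        return 2 * cpu_cores
    return configured


print(effective_http_worker_threads(0, cpu_cores=8))   # 16
print(effective_http_worker_threads(-1, cpu_cores=4))  # 8
print(effective_http_worker_threads(32, cpu_cores=8))  # 32
```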

BE Parameters

  • Added the BE configuration item enable_stream_load_verbose_log. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
  • Added the BE configuration item pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks caused by compaction, which can result in excessively high I/O on certain disks. The default value is 1. #37694
  • Added BE configuration items to specify the timeout duration for connecting to object storage:
    • object_storage_connect_timeout_ms: Timeout duration to establish socket connections with object storage. The default value is -1, which means to use the default timeout duration of the SDK configurations.
    • object_storage_request_timeout_ms: Timeout duration to establish HTTP connections with object storage. The default value is -1, which means to use the default timeout duration of the SDK configurations.
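pindex_major_compaction_limit_per_disk caps how many compaction tasks can run on the same disk at once. The Python sketch below shows the general idea of a per-disk concurrency cap with a default of 1; the class and method names are hypothetical and do not describe how the BE actually schedules persistent index compaction.

```python
import threading
from collections import defaultdict


class PerDiskLimiter:
    """Hypothetical per-disk concurrency cap, in the spirit of
    pindex_major_compaction_limit_per_disk (default 1)."""

    def __init__(self, limit_per_disk: int = 1) -> None:
        self._limit = limit_per_disk
        self._running = defaultdict(int)  # disk path -> tasks currently running
        self._lock = threading.Lock()

    def try_start(self, disk: str) -> bool:
        """Reserve a slot on `disk`; return False if the disk is at its limit."""
        with self._lock:
            if self._running[disk] >= self._limit:
                return False
            self._running[disk] += 1
            return True

    def finish(self, disk: str) -> None:
        """Release a slot previously reserved with try_start()."""
        with self._lock:
            if self._running[disk] == 0:
                raise RuntimeError(f"no running task on {disk}")
            self._running[disk] -= 1


limiter = PerDiskLimiter()              # limit 1, like the default value
print(limiter.try_start("/data1"))      # True: first task on /data1
print(limiter.try_start("/data1"))      # False: /data1 is already busy
print(limiter.try_start("/data2"))      # True: other disks are unaffected
```

Because the cap applies to each disk separately, concurrent compaction tasks cannot pile up on a single disk and overload its I/O.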

Bug Fixes

Fixed the following issues:

  • In some cases, BEs may crash when a Catalog is used to read ORC external tables. #27971
  • The BEs crash if users create persistent indexes in the event of data corruption. #30841
  • BEs occasionally crash after a Bitmap index is added. #26463
  • Failures in replaying replica operations may cause FEs to crash. #32295
  • Setting the FE parameter recover_with_empty_tablet to true may cause FEs to crash. #33071
  • Queries fail during hash joins, causing BEs to crash. #32219
  • In a StarRocks shared-nothing cluster, queries against Iceberg or Hive tables may cause BEs to crash. #34682
  • The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. #33246
  • Running show proc '/statistic' may cause a deadlock. #34237
  • The FE performance plunges after the FE configuration item enable_collect_query_detail_info is set to true. #35945
  • Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34352
  • After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. #34618
  • If INFORMATION_SCHEMA is queried by using the database driver MariaDB ODBC, the CATALOG_NAME column returned in the schemata view holds only null values. #34627
  • FEs crash because of abnormal loaded data and cannot be restarted. #34590
  • If schema changes are being executed while a Stream Load job is in the PREPARED state, a portion of the source data to be loaded by the job is lost. #34381
  • Including two or more slashes (/) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
  • The partition_live_number property added by using the ALTER TABLE statement does not take effect. #34842
  • The array_distinct function occasionally causes the BEs to crash. #36377
  • Deadlocks may occur when users refresh materialized views. #35736
  • Global Runtime Filter may cause BEs to crash in certain scenarios. #35776
  • In some cases, bitmap_to_string may return incorrect results due to data type overflow. #37405

3.0.8

Release date: November 17, 2023

Improvements

  • The COLUMNS view in the system database INFORMATION_SCHEMA can display ARRAY, MAP, and STRUCT columns. #33431

Bug Fixes

Fixed the following issues:

  • If a query starts while show proc '/current_queries'; is being executed, BEs may crash. #34316
  • When data is continuously loaded at a high frequency into a Primary Key table that has a sort key specified, compaction failures may occur. #26486
  • If a filtering condition is specified in a Broker Load job, BEs may crash during the data loading in certain circumstances. #29832
  • An unknown error is reported when SHOW GRANTS is executed. #30100
  • BEs may crash for specific data types if the target data type specified in the cast() function is the same as the original data type. #31465
  • DATA_TYPE and COLUMN_TYPE for BINARY or VARBINARY data types are displayed as unknown in the information_schema.columns view. #32678
  • Frequent data loading over a long period into a Primary Key table with persistent index enabled may cause BEs to crash. #33220
  • The query result is incorrect when Query Cache is enabled. #32778
  • After a cluster is restarted, the data in a restored table may be inconsistent with the data in that table before being backed up. #33567
  • If compaction takes place while RESTORE is being executed, BEs may crash. #32902

3.0.7

Release date: October 18, 2023

Improvements

  • Window functions COVAR_SAMP, COVAR_POP, CORR, VARIANCE, VAR_SAMP, STD, and STDDEV_SAMP now support the ORDER BY clause and Window clause. #30786
  • The Publish phase of a load job that writes data into a Primary Key table is changed from asynchronous mode to synchronous mode. As such, the data loaded can be queried immediately after the load job finishes. #27055
  • An error instead of NULL is returned if a decimal overflow occurs during queries on the DECIMAL type data. #30419
  • Executing SQL commands with invalid comments now returns results consistent with MySQL. #30210
  • For a StarRocks table that uses RANGE partitioning with only one partitioning column or expression partitioning, SQL predicates containing partition column expressions can also be used for partition pruning. #30421

Bug Fixes

Fixed the following issues:

  • Concurrently creating and deleting databases and tables can, in certain cases, result in the table not being found, which further leads to the failure of data loading into that table. #28985
  • Using UDFs may lead to memory leaks in certain cases. #29467 #29465
  • If the ORDER BY clause contains aggregate functions, an error "java.lang.IllegalStateException: null" is returned. #30108
  • If users run queries against data stored in Tencent COS by using their Hive catalog which consists of multiple levels, the query results will be incorrect. #30363
  • If some subfields of the STRUCT in ARRAY<STRUCT> type data are missing, the data length is incorrect when default values are filled in the missing subfields during queries, which causes BEs to crash.
  • The version of Berkeley DB Java Edition is upgraded to avoid security vulnerabilities. #30029
  • If users load data into a Primary Key table on which truncate operations and queries are concurrently performed, an error "java.lang.NullPointerException" is thrown in certain cases. #30573
  • If the Schema Change execution time is too long, it may fail because the tablet of the specified version is garbage-collected. #31376
  • If users use CloudCanal to load data into table columns that are set to NOT NULL but have no default value specified, an error "Unsupported dataFormat value is : \N" is thrown. #30799
  • In StarRocks shared-data clusters, the information of table keys is not recorded in information_schema.COLUMNS. As a result, DELETE operations cannot be performed when data is loaded by using Flink Connector. #31458
  • During the upgrade, if the types of certain columns are also upgraded (for example, from Decimal type to Decimal v3 type), compaction on certain tables with specific characteristics may cause BEs to crash. #31626
  • When data is loaded by using Flink Connector, the load job is suspended unexpectedly if there are highly concurrent load jobs and both the number of HTTP threads and the number of Scan threads have reached their upper limits. #32251
  • BEs crash when libcurl is invoked. #31667
  • An error occurs when a column of BITMAP type is added to a Primary Key table. #31763

3.0.6

Release date: September 12, 2023

Behavior Change

  • When using the group_concat function, you must use the SEPARATOR keyword to declare the separator.

New Features

Improvements

  • Supports implicit conversions for all compound predicates and for all expressions in the WHERE clause. You can enable or disable implicit conversions by using the session variable enable_strict_type. The default value of this session variable is false. #21870
  • Unifies the logic between FEs and BEs in converting strings to integers. #29969

Bug Fixes

  • If enable_orc_late_materialization is set to true, an unexpected result is returned when a Hive catalog is used to query STRUCT-type data in ORC files. #27971
  • During data queries through Hive Catalog, if a partitioning column and an OR operator are specified in the WHERE clause, the query result is incorrect. #28876
  • The values returned by the RESTful API action show_data for cloud-native tables are incorrect. #29473
  • If the shared-data cluster stores data in Azure Blob Storage and a table is created, the FE fails to start after the cluster is rolled back to version 3.0. #29433
  • Users are denied permission when querying a table in an Iceberg catalog even if they have been granted permission on that table. #29173
  • The Default field values returned by the SHOW FULL COLUMNS statement for columns of the BITMAP or HLL data type are incorrect. #29510
  • Modifying the FE dynamic parameter max_broker_load_job_concurrency using the ADMIN SET FRONTEND CONFIG command does not take effect.
  • The FE may fail to start when a materialized view is being refreshed while its refresh strategy is being modified. #29964 #29720
  • The error "unknown error" is returned when select count(distinct(int+double)) from table_name is executed. #29691
  • If a BE is restarted after a Primary Key table is restored, metadata errors occur and cause metadata inconsistencies. #30135

3.0.5

Release date: August 16, 2023

New Features

Improvements

Bug Fixes

Fixed the following issues:

  • Incorrect table-level scan statistics in FE cause inaccurate metrics for table queries and loading. #27779
  • The query result is not stable if the sort key is modified for a partitioned table. #27850
  • The version number for a tablet is inconsistent between the BE and FE after data is restored. #26518
  • If the bucket number is not specified when users create a Colocation table, the number will be inferred as 0, which causes failures in adding new partitions. #27086
  • When the SELECT result set of INSERT INTO SELECT is empty, the load job status returned by SHOW LOAD is CANCELED. #26913
  • BEs may crash when the input values of the sub_bitmap function are not of the BITMAP type. #27982
  • BEs may crash when the AUTO_INCREMENT column is being updated. #27199
  • Outer join and Anti join rewrite errors for materialized views. #28028
  • Inaccurate estimation of average row size causes Primary Key partial updates to occupy excessively large memory. #27485
  • Activating an inactive materialized view may cause a FE to crash. #27959
  • Queries cannot be rewritten to materialized views created based on external tables in a Hudi catalog. #28023
  • The data of a Hive table can still be queried even after the table is dropped and the metadata cache is manually updated. #28223
  • Manually refreshing an asynchronous materialized view via a synchronous call results in multiple INSERT OVERWRITE records in the information_schema.task_runs table. #28060
  • FE memory leak caused by blocked LabelCleaner threads. #28311

3.0.4

Release date: July 18, 2023

New Feature

Queries can be rewritten even when the queries contain a different type of join than the materialized view. #25099

Improvements

  • Optimized the manual refreshing of asynchronous materialized views. Supports using the REFRESH MATERIALIZED VIEW WITH SYNC MODE syntax to synchronously invoke materialized view refresh tasks. #25910
  • If the queried fields are not included in the output columns of a materialized view but are included in the predicate of the materialized view, the query can still be rewritten to benefit from the materialized view. #23028
  • When the SQL dialect (sql_dialect) is set to trino, table aliases are not case-sensitive. #26094 #25282
  • Added a new field table_id to the table Information_schema.tables_config. You can join the table tables_config with the table be_tablets on the column table_id in the database Information_schema to query the names of the database and table to which a tablet belongs. #24061

Bug Fixes

Fixed the following issues:

  • If a query that contains the sum aggregate function is rewritten to directly obtain query results from a single-table materialized view, the values in the sum() field may be incorrect due to type inference issues. #25512
  • An error occurs when SHOW PROC is used to view information about tablets in a StarRocks shared-data cluster.
  • The INSERT operation hangs when the length of CHAR data in a STRUCT to be inserted exceeds the maximum length. #25942
  • For INSERT INTO SELECT with FULL JOIN, some of the queried data rows are not returned. #26603
  • The error "ERROR xxx: Unknown table property xxx" occurs when the ALTER TABLE statement is used to modify the table property default.storage_medium. #25870
  • An error occurs when Broker Load is used to load empty files. #26212
  • Decommissioning a BE sometimes hangs. #26509

3.0.3

Release date: June 28, 2023

Improvements

  • Metadata synchronization of StarRocks external tables has been changed to occur during data loading. #24739
  • Users can specify partitions when they run INSERT OVERWRITE on tables whose partitions are automatically created. For more information, see Automatic partitioning. #25005
  • Optimized the error message reported when partitions are added to a non-partitioned table. #25266

Bug Fixes

Fixed the following issues:

  • The min/max filter gets the wrong Parquet field when the Parquet file contains complex data types. #23976
  • Load tasks are still queuing even when the related database or table has been dropped. #24801
  • There is a low probability that an FE restart may cause BEs to crash. #25037
  • Load and query jobs occasionally freeze when the variable enable_profile is set to true. #25060
  • An inaccurate error message is displayed when INSERT OVERWRITE is executed on a cluster with fewer than three alive BEs. #25314

3.0.2

Release date: June 13, 2023

Improvements

  • Predicates in a UNION query can be pushed down after the query is rewritten by an asynchronous materialized view. #23312
  • Optimized the auto tablet distribution policy for tables. #24543
  • Removed the dependency of NetworkTime on system clocks, which fixes incorrect NetworkTime caused by inconsistent system clocks across servers. #24858

Bug Fixes

Fixed the following issues:

  • A schema change may sometimes hang if data loading occurs at the same time. #23456
  • Queries encounter an error when the session variable pipeline_profile_level is set to 0. #23873
  • CREATE TABLE encounters an error when cloud_native_storage_type is set to S3.
  • LDAP authentication succeeds even when no password is used. #24862
  • CANCEL LOAD fails if the table involved in the load job does not exist. #24922

Upgrade Notes

If your system has a database named starrocks, change it to another name using ALTER DATABASE RENAME before the upgrade. This is because starrocks is the name of a default system database that stores privilege information.

3.0.1

Release date: June 1, 2023

New Features

  • [Preview] Supports spilling intermediate computation results of large operators to disks to reduce the memory consumption of large operators. For more information, see Spill to disk.
  • Routine Load supports loading Avro data.
  • Supports Microsoft Azure Storage (including Azure Blob Storage and Azure Data Lake Storage).

Improvements

  • Shared-data clusters support using StarRocks external tables to synchronize data with another StarRocks cluster.
  • Added load_tracking_logs to Information Schema to record recent loading errors.
  • Ignores special characters in CREATE TABLE statements. #23885

Bug Fixes

Fixed the following issues:

  • Information returned by SHOW CREATE TABLE is incorrect for Primary Key tables. #24237
  • BEs may crash during a Routine Load job. #20677
  • Null pointer exception (NPE) occurs if you specify unsupported properties when creating a partitioned table. #21374
  • Information returned by SHOW TABLE STATUS is incomplete. #24279

Upgrade Notes

If your system has a database named starrocks, change it to another name using ALTER DATABASE RENAME before the upgrade. This is because starrocks is the name of a default system database that stores privilege information.

3.0.0

Release date: April 28, 2023

New Features

System architecture

  • Decoupled storage and compute. StarRocks now supports persisting data into S3-compatible object storage, which enhances resource isolation, reduces storage costs, and makes compute resources more scalable. Local disks are used as a hot data cache to boost query performance. When the local disk cache is hit, the query performance of the new shared-data architecture is comparable to that of the classic shared-nothing architecture. For more information, see Deploy and use shared-data StarRocks.

Storage engine and data ingestion

  • The AUTO_INCREMENT attribute is supported to provide globally unique IDs, which simplifies data management.
  • Automatic partitioning and partitioning expressions are supported, which makes partition creation easier to use and more flexible.
  • Primary Key tables support more complete UPDATE and DELETE syntax, including the use of CTEs and references to multiple tables.
  • Added Load Profile for Broker Load and INSERT INTO jobs. You can view the details of a load job by querying the load profile. The usage is the same as Analyze query profile.

Data Lake Analytics

  • [Preview] Supports Presto/Trino compatible dialect. Presto/Trino's SQL can be automatically rewritten into StarRocks' SQL pattern. For more information, see the system variable sql_dialect.
  • [Preview] Supports JDBC catalogs.
  • Supports using SET CATALOG to manually switch between catalogs in the current session.

Privileges and security

  • Provides a new privilege system with full RBAC functionalities, supporting role inheritance and default roles. For more information, see Overview of privileges.
  • Provides more privilege management objects and more fine-grained privileges. For more information, see Privileges supported by StarRocks.

Query engine

  • Allows more queries on joined tables to benefit from the query cache. For example, the query cache now supports Broadcast Join and Bucket Shuffle Join.
  • Supports Global UDFs.
  • Dynamic adaptive parallelism: StarRocks can automatically adjust the pipeline_dop parameter for query concurrency.

SQL reference

Improvements

Deployment

Storage engine and data ingestion

  • Supports more CSV parameters for data ingestion, including SKIP_HEADER, TRIM_SPACE, ENCLOSE, and ESCAPE. See STREAM LOAD, BROKER LOAD, and ROUTINE LOAD.
  • The primary key and sort key are decoupled in Primary Key tables. The sort key can be separately specified in ORDER BY when you create a table.
  • Optimized the memory usage of data ingestion into Primary Key tables in scenarios such as large-volume ingestion, partial updates, and persistent primary indexes.
  • Supports creating asynchronous INSERT tasks. For more information, see INSERT and SUBMIT TASK. #20609

Materialized view

  • Optimized the rewriting capabilities of materialized views, including:
    • Supports rewrite of View Delta Join, Outer Join, and Cross Join.
    • Optimized SQL rewrite of Union with partition.
  • Improved materialized view building capabilities: supporting CTE, select *, and Union.
  • Optimized the information returned by SHOW MATERIALIZED VIEWS.
  • Supports adding MV partitions in batches, which improves the efficiency of partition addition during materialized view building. #21167

Query engine

  • All operators are supported in the pipeline engine. Non-pipeline code will be removed in later versions.
  • Improved big query positioning and added the big query log. SHOW PROCESSLIST supports viewing CPU and memory information.
  • Optimized Outer Join Reorder.
  • Optimized error messages in the SQL parsing stage, providing more accurate error positioning and clearer error messages.

Data Lake Analytics

  • Optimized metadata statistics collection.
  • Supports using SHOW CREATE TABLE to view the creation statements of the tables that are managed by an external catalog and are stored in Apache Hive™, Apache Iceberg, Apache Hudi, or Delta Lake.

Bug Fixes

  • Some URLs in the license header of StarRocks' source file cannot be accessed. #2224
  • An unknown error is returned during SELECT queries. #19731
  • Supports SHOW/SET CHARACTER. #17480
  • When the loaded data exceeds the field length supported by StarRocks, the error message returned is not correct. #14
  • Supports show full fields from 'table'. #17233
  • Partition pruning causes MV rewrites to fail. #14641
  • MV rewrite fails when the CREATE MATERIALIZED VIEW statement contains count(distinct) and count(distinct) is applied to the DISTRIBUTED BY column. #16558
  • FEs fail to start when a VARCHAR column is used as the partitioning column of a materialized view. #19366
  • Window functions LEAD and LAG incorrectly handle IGNORE NULLS. #21001
  • Adding temporary partitions conflicts with automatic partition creation. #21222

Behavior Change

  • The new role-based access control (RBAC) system supports the previous privileges and roles. However, the syntax of related statements such as GRANT and REVOKE is changed.
  • Renamed SHOW MATERIALIZED VIEW as SHOW MATERIALIZED VIEWS.
  • Added the following reserved keywords: AUTO_INCREMENT, CURRENT_ROLE, DEFERRED, ENCLOSE, ESCAPE, IMMEDIATE, PRIVILEGES, SKIP_HEADER, TRIM_SPACE, VARBINARY.

Upgrade Notes

You can upgrade from v2.5 to v3.0 or downgrade from v3.0 to v2.5.

In theory, an upgrade from a version earlier than v2.5 is also supported. To ensure system availability, we recommend that you first upgrade your cluster to v2.5 and then to v3.0.

Take note of the following points when you perform a downgrade from v3.0 to v2.5.

BDBJE

StarRocks upgrades the BDB library in v3.0. However, BDBJE cannot be rolled back, so you must keep using the BDB library of v3.0 after a downgrade. Perform the following steps:

  1. After you replace the FE package with a v2.5 package, copy fe/lib/starrocks-bdb-je-18.3.13.jar of v3.0 to the fe/lib directory of v2.5.

  2. Delete fe/lib/je-7.*.jar.
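If you script the two steps above, copy the v3.0 jar before deleting anything, so that a failed copy does not leave the v2.5 FE without a BDB JE library. A minimal Python sketch follows; the directory arguments are placeholders for your own v3.0 and v2.5 fe/lib paths, and the demo runs against a throwaway directory with a hypothetical je-7.3.7.jar file.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

BDB_JAR = "starrocks-bdb-je-18.3.13.jar"  # file name given in the upgrade notes


def swap_bdb_jar(v30_fe_lib: Path, v25_fe_lib: Path) -> List[str]:
    """Step 1: copy the v3.0 BDB JE jar into the v2.5 fe/lib directory.
    Step 2: delete je-7.*.jar from the v2.5 fe/lib directory.
    Returns the names of the deleted jars."""
    # copy2 raises FileNotFoundError if the source jar is missing,
    # so nothing is deleted unless the copy succeeded.
    shutil.copy2(v30_fe_lib / BDB_JAR, v25_fe_lib / BDB_JAR)
    removed = []
    for old_jar in sorted(v25_fe_lib.glob("je-7.*.jar")):
        old_jar.unlink()
        removed.append(old_jar.name)
    return removed


def demo() -> List[str]:
    """Run swap_bdb_jar against a throwaway layout (hypothetical file names)."""
    with tempfile.TemporaryDirectory() as tmp:
        v30 = Path(tmp, "v3.0", "fe", "lib")
        v25 = Path(tmp, "v2.5", "fe", "lib")
        v30.mkdir(parents=True)
        v25.mkdir(parents=True)
        (v30 / BDB_JAR).write_bytes(b"placeholder")
        (v25 / "je-7.3.7.jar").write_bytes(b"placeholder")
        removed = swap_bdb_jar(v30, v25)
        assert (v25 / BDB_JAR).exists()
        return removed


print(demo())  # ['je-7.3.7.jar']
```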

Privilege system

The new RBAC privilege system is used by default after you upgrade to v3.0. You can only downgrade to v2.5.

After a downgrade, run ALTER SYSTEM CREATE IMAGE to create a new image and wait for the new image to be synchronized to all follower FEs. If you do not run this command, some of the downgrade operations may fail. This command is supported in v2.5.3 and later.

For details about the differences between the privilege system of v2.5 and v3.0, see "Upgrade notes" in Privileges supported by StarRocks.