StarRocks version 2.3
2.3.3
Release date: September 27, 2022
Bug Fixes
The following bugs are fixed:
Query results may be inaccurate when you query a Hive external table stored as a text file. #11546
Nested arrays are not supported when you query Parquet files. #10983
Queries may time out when concurrent queries that read data from StarRocks and from external data sources are routed to the same resource group, or when a single query reads data from both StarRocks and external data sources. #10983
When the pipeline execution engine is enabled by default, the parameter parallel_fragment_exec_instance_num is changed to 1, which slows down data loading with INSERT INTO. #11462
BE may crash if an error occurs when an expression is initialized. #11396
A heap-buffer-overflow error may occur if you execute ORDER BY LIMIT. #11185
A schema change fails if you restart the Leader FE while the schema change is in progress. #11561
2.3.2
Release date: September 7, 2022
New Features
Late materialization is supported to accelerate range filter-based queries on external tables in Parquet format. #9738
The SHOW AUTHENTICATION statement is added to display user authentication-related information. #9996
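A minimal sketch of how the new statement might be invoked from a MySQL-compatible client. The ALL and FOR forms are assumptions based on the documented SHOW AUTHENTICATION syntax, the user name jack is hypothetical, and the output columns may differ between versions.

```sql
-- Show authentication information for the current user.
SHOW AUTHENTICATION;

-- Show authentication information for all users
-- (assumed to require administrative privileges).
SHOW ALL AUTHENTICATION;

-- Show authentication information for one user; jack is a hypothetical user name.
SHOW AUTHENTICATION FOR jack;
```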
Improvements
A configuration item is provided to control whether StarRocks recursively traverses all data files for the bucketed Hive table from which StarRocks queries data. #10239
The resource group type realtime is renamed as short_query. #10247
StarRocks no longer distinguishes between uppercase letters and lowercase letters in Hive external tables by default. #10187
Bug Fixes
The following bugs are fixed:
Queries on an Elasticsearch external table may unexpectedly exit when the table is divided into multiple shards. #10369
StarRocks throws errors when sub-queries are rewritten as common table expressions (CTEs). #10397
StarRocks throws errors when a large amount of data is loaded. #10370 #10380
When the same Thrift service IP address is configured for multiple catalogs, deleting one catalog invalidates the incremental metadata updates in the other catalogs. #10511
The statistics of memory consumption from BEs are inaccurate. #9837
StarRocks throws errors for queries on Primary Key tables. #10811
Queries on logical views are not allowed even when you have SELECT permissions on these views. #10563
StarRocks does not impose limits on the naming of logical views. Now logical views need to follow the same naming conventions as tables. #10558
2.3.1
Release date: August 22, 2022
Improvements
- Broker Load supports transforming the List type in Parquet files into non-nested ARRAY data type. #9150
- Optimized the performance of JSON-related functions (json_query, get_json_string, and get_json_int). #9623
- Optimized the error message: During a query on Hive, Iceberg, or Hudi, if the data type of the column to query is not supported by StarRocks, the system throws an exception on the column. #10139
- Reduced the scheduling latency of resource groups to optimize resource isolation performance. #10122
Bug Fixes
The following bugs are fixed:
- Wrong result is returned from the query on Elasticsearch external tables due to incorrect pushdown of the limit operator. #9952
- Query on Oracle external tables fails when the limit operator is used. #9542
- BE is blocked when all Kafka Brokers are stopped during a Routine Load. #9935
- BE crashes during a query on a Parquet file whose data type mismatches that of the corresponding external table. #10107
- Query times out because the scan range of external tables is empty. #10091
- The system throws an exception when the ORDER BY clause is included in a sub-query. #10180
- Hive Metastore hangs when Hive metadata is reloaded asynchronously. #10132
2.3.0
New Features
The Primary Key model supports complete DELETE WHERE syntax. For more information, see DELETE.
The Primary Key model supports persistent primary key indexes. You can choose to persist the primary key index on disk rather than in memory, significantly reducing memory usage. For more information, see Primary Key model.
The global dictionary can be updated during real-time data ingestion, optimizing query performance and delivering 2X query performance for string data.
The CREATE TABLE AS SELECT statement can be executed asynchronously. For more information, see CREATE TABLE AS SELECT.
The following resource group-related features are supported:
- Monitor resource groups: You can view the resource group of the query in the audit log and obtain the metrics of the resource group by calling APIs. For more information, see Monitor and Alerting.
- Limit the consumption of large queries on CPU, memory, and I/O resources: You can route queries to specific resource groups based on the classifiers or by configuring session variables. For more information, see Resource group.
JDBC external tables can be used to conveniently query data in Oracle, PostgreSQL, MySQL, SQLServer, ClickHouse, and other databases. StarRocks also supports predicate pushdown, improving query performance. For more information, see External table for a JDBC-compatible database.
[Preview] A new Data Source Connector framework is released to support external catalogs. You can use external catalogs to directly access and query Hive data without creating external tables. For more information, see Use catalogs to manage internal and external data.
Added the following functions:
Improvements
The compaction mechanism can merge large volumes of metadata more quickly. This prevents metadata squeezing and excessive disk usage that can occur shortly after frequent data updates.
Optimized the performance of loading Parquet files and compressed files.
Optimized the mechanism of creating materialized views. After the optimization, materialized views can be created at a speed up to 10 times faster than before.
Optimized the performance of the following operators:
- TopN and sort operators
- Equivalence comparison operators that contain functions can use Zone Map indexes when these operators are pushed down to scan operators.
Optimized Apache Hive™ external tables.
- When Apache Hive™ tables are stored in Parquet, ORC, or CSV format, schema changes caused by ADD COLUMN or REPLACE COLUMN on Hive can be synchronized to StarRocks when you execute the REFRESH statement on the corresponding Hive external table. For more information, see Hive external table.
- hive.metastore.uris can be modified for Hive resources. For more information, see ALTER RESOURCE.
Optimized the performance of Apache Iceberg external tables. A custom catalog can be used to create an Iceberg resource. For more information, see Apache Iceberg external table.
Optimized the performance of Elasticsearch external tables. Sniffing the addresses of the data nodes in an Elasticsearch cluster can be disabled. For more information, see Elasticsearch external table.
When the sum() function accepts a numeric string, it implicitly converts the numeric string.
The year(), month(), and day() functions support the DATE data type.
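The two function changes above can be tried with a short sketch like the following. The literal values and the inline derived table t are illustrative assumptions, not taken from the release notes.

```sql
-- year(), month(), and day() applied to a DATE value.
SELECT year(CAST('2022-09-27' AS DATE))  AS y,
       month(CAST('2022-09-27' AS DATE)) AS m,
       day(CAST('2022-09-27' AS DATE))   AS d;

-- sum() over numeric strings; the strings are implicitly converted before aggregation.
SELECT sum(v) AS total
FROM (
    SELECT '1.5' AS v
    UNION ALL
    SELECT '2.5' AS v
) t;
```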
Bug Fixes
Fixed the following bugs:
CPU utilization surges due to an excessive number of tablets.
Issues that cause "fail to prepare tablet reader" to occur.
The CTAS statement cannot be run successfully when the statement includes a JSON function. #6498
Others
StarGo, a cluster management tool, can deploy, start, upgrade, and roll back clusters and manage multiple clusters. For more information, see Deploy StarRocks with StarGo.
The pipeline engine is enabled by default when you upgrade StarRocks to version 2.3 or deploy StarRocks. The pipeline engine can improve the performance of simple queries in high concurrency scenarios and complex queries. If you detect significant performance regressions when using StarRocks 2.3, you can disable the pipeline engine by executing the SET GLOBAL statement to set enable_pipeline_engine to false.
The SHOW GRANTS statement is compatible with the MySQL syntax and displays the privileges assigned to a user in the form of GRANT statements.
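As a sketch of the pipeline engine fallback described above, the variable can be switched off with SET GLOBAL. The SHOW VARIABLES check is an added convenience that relies on MySQL-compatible syntax and is not part of the release note itself.

```sql
-- Disable the pipeline engine globally.
SET GLOBAL enable_pipeline_engine = false;

-- Check the current value of the variable.
SHOW VARIABLES LIKE 'enable_pipeline_engine';
```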
It is recommended that the BE configuration item memory_limitation_per_thread_for_schema_change use the default value of 2 GB. Data is written to disk when the data volume exceeds this limit. Therefore, if you previously set this parameter to a larger value, set it back to 2 GB; otherwise, a schema change task may take up a large amount of memory.