- Introduction to StarRocks
- Quick Start
- Table Design
- Data Loading
- Overview of data loading
- Load data from a local file system or a streaming data source using HTTP PUT
- Load data from HDFS or cloud storage
- Continuously load data from Apache Kafka®
- Bulk load using Apache Spark™
- Load data using INSERT
- Realtime synchronization from MySQL
- Continuously load data from Apache Flink®
- Change data through loading
- Transform data at loading
- Data Unloading
- Using StarRocks
- Data Recovery
- User Privilege and Authentication
- Performance Tuning
- SQL Reference
- User Account Management
- Cluster Management
- ADMIN CANCEL REPAIR
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- DROP FILE
- INSTALL PLUGIN
- SHOW BACKENDS
- SHOW BROKER
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW TABLE STATUS
- SHOW FILE
- UNINSTALL PLUGIN
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- ALTER RESOURCE
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE DATABASE
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- GROUP BY
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- ROUTINE LOAD
- SHOW ALTER
- SHOW BACKUP
- SHOW CREATE TABLE
- SHOW CREATE VIEW
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- Data Types
- Auxiliary Commands
- Function Reference
- Date Functions
- Aggregate Functions
- Geographic Functions
- String Functions
- JSON Functions
- Bitmap Functions
- Array Functions
- Cast Functions
- Hash Functions
- Cryptographic Functions
- Math Functions
- Utility Functions
- System variables
- Error code
- System limits
- SQL Reference
- Duplicate model: The data model stores detailed data in StarRocks. The model can be specified during table build, and the data in the metric columns will not be aggregated.
- Base table: The table created by
CREATE TABLEin StarRocks.
- Materialized Views table: Pre-calculated data set that contains results of a query. Abbreviated as MV.
In a practical business scenario, there are usually two coexisting use cases: 1) aggregation analysis of fixed dimensions and 2) analysis of arbitrary dimensions of the original detailed data.
For example, consider an E-commerce business, the order data contains the following dimensions:
The two mentioned use cases are both valid since the users need to:
- get the sales number of a certain item on a certain day, so we need to aggregate price on the
- analyze the transaction details of a certain item by a certain person on a certain day.
In the existing StarRocks data model, if you create only one table with aggregation model, for example a table with
sum(price), you are not able to analyze detailed data since the aggregation drops some information of the data. If only a duplicate model is built, you can run queries on all the dimension, but the queries will not be accelerated because of lack of Rollup support. If you build a table with the aggregation model and a table with the duplicate model at the same time, you can get both the performance and can run queries on all the dimensions, but the two tables are not related to each other, so you need to choose the analysis table manually. It doesn’t offer good flexibility and usability.
Queries that use aggregation functions such as
count can be executed more efficiently in tables that already contain aggregated data. You would want the improved efficiency when querying large amounts of data. The data in the table is materialized in a storage node and can be kept consistent with the base table in incremental updates. After a user creates a MVs table, the query optimizer will select and query the most efficient MVs table instead of using the base table. Since the data in MVs tables is typically much smaller than the base table, queries are more efficient.
CREATE MATERIALIZED VIEW materialized_view_name AS SELECT id, SUM(clicks) AS sum_clicks FROM database_name.base_table GROUP BY id ORDER BY id’
The creation of a materialized view is currently an asynchronous operation. The command for materialized view creation returns immediately, but the creation may still be running. You can use the
DESC "base_table_name" ALL command to see the current materialized view of the base table. You can use the
SHOW ALTER TABLE MATERIALIZED VIEW FROM "database_name" command to check the status of the current and the historical materialized views.
partition columnin the base table must be one of the
group bycolumns of the created materialized view
- Currently, only single-table materialized views are supported. No multi-table joins.
- Aggregation is not supported for key columns, but only for value columns.The type of aggregation operator cannot be changed.
- The materialized view must contain at least one key column.
- Expression calculation is not supported.
- Queries a specified materialized view is not supported.
DROP MATERIALIZED VIEW [IF EXISTS] [db_name]. <mv_name>
- View all materialized views under this database
SHOW MATERIALIZED VIEW [IN|FROM db_name]
- View the table structure of the specified materialized view
DESC table_name all
DESC/DESCRIBE mv_name is no longer supported)
- View the processing progress of a materialized view
SHOW ALTER MATERIALIZED VIEW FROM db_name
- Cancel the materialized view being created
CANCEL ALTER MATERIALIZED VIEW FROM db_name.table_name
- Confirm which materialized views were hit by the query
Users do not need to know the existence of the materialized view when running the queries, which means the name of the materialized view does not have to be explicitly specified. The query optimizer can automatically determine whether the query can be routed to the appropriate materialized view. Whether the query plan is rewritten or not, it can be seen with
explain SQL, which can be executed on the MySQL client:
Explain SQL: 0:OlapScanNode TABLE: lineorder4 PREAGGREGATION: ON partitions=1/1 RollUptable: lineorder4 tabletRatio=1/1 tabletList=15184 cardinality=119994608 avgRowSize=26.375498 numNodes=1 tuple ids: 0
RollUptable field indicates which materialized view was hit. If the
PREAGGREGATION field is
On, the query will be faster. If the
PREAGGREGATION field is
Off, the reason will be shown. For example:
PREAGGREGATION: OFF. Reason: The aggregate operator does not match.
This indicates the physical view cannot be used in the StarRocks storage engine because the aggregation function of the query does not match the one defined in the physical view, and requires on-site aggregation.
Incremental imports to the base table are applied to all associated MVs tables. Data cannot be queried until the import to the base table and all MVs tables are all complete. StarRocks ensures that the data is consistent between the base and MVs tables. There is no data difference between querying the base table and the MVs table.
The materialized view must be an aggregation of a single table. Only the following aggregation functions are supported currently.
- Performs HLL aggregation on duplicate data and uses HLL functions to analyze querieddata. (This is mainly used for quick and non-exact de-duplication calculations. To use
HLL_UNION aggregationfor duplicate data, you need to call the
hll_hashfunction first to transform the original data.)
create materialized view dt_uv as select dt, page_id, HLL_UNION(hll_hash(user_id)) from user_view group by dt, page_id; select ndv(user_id) from user_view; #The query can hit the materialized view
HLL_UNIONaggregation operator is not supported for
PERCENTILEtype columns currently.
Users can perform BITMAP aggregation on duplicate data and use the BITMAP function to analyze the data. It is mainly used to quickly calculate the exact de-duplication of
count(distinct). To use
BITMAP_UNIONaggregation on the duplicate data, you need to call the
to_bitmap functionfirst to convert the original data.
create materialized view dt_uv as select dt, page_id, bitmap_union(to_bitmap(user_id)) from user_view group by dt, page_id; select count(distinct user_id) from user_view; #Query can hit the materialized view
- Currently, only
BITINTtypes are supported, and the stored content must be a positive integer (including 0).
There is no need to explicitly specify a MV’s name during the query, StarRocks can intelligently route to the best MV based on the query SQL. The rules for MV selection are as follows.
- Select the MV that contains all query columns
- Select the most matching MV by the column defined in the query’s sorting and filtering condition.
- Select the most matching MV by the column defined in the query’s joining condition.
- Select the MV with the smallest number of rows
- Select the MV with the smallest number of columns
- The RollUp table model must accommodate the type of the base table (Use aggregate model for aggregate tables and duplicate model for duplicate tables.).
Deleteoperation is not allowed if a key in the
wherecondition does not exist in a RollUp table. In this case, users can delete the materialized view first, then perform the
deleteoperation, and finally re-add the materialized view.
- If the materialized view contains columns of the
REPLACEaggregation, it must contain all key columns.