- Introduction
- Quick Start
- Table Design
- Data Loading
- Data Export
- Using StarRocks
- Administration
- FAQs
- Deploy
- Data Migration
- Developers
- Contribute to StarRocks
- Code Style Guides
- Use the debuginfo file for debugging
- Development Environment
- Trace Tools
- Integration
Materialized View
Termnology
- Duplicate model: The data model stores detailed data in StarRocks. The model can be specified during table build, and the data in the metric columns will not be aggregated.
- Base table: The table created by
CREATE TABLE
in StarRocks. - Materialized Views table: Pre-calculated data set that contains results of a query. Abbreviated as MV.
Usage Scenarios
In a practical business scenario, there are usually two coexisting use cases: 1) aggregation analysis of fixed dimensions and 2) analysis of arbitrary dimensions of the original detailed data.
For example, consider an E-commerce business, the order data contains the following dimensions:
item_id
sold_time
customer_id
price
The two mentioned use cases are both valid since the users need to:
- get the sales number of a certain item on a certain day, so we need to aggregate price on the
item_id
andsold_time
dimensions; - analyze the transaction details of a certain item by a certain person on a certain day.
In the existing StarRocks data model, if you create only one table with aggregation model, for example a table with item_id
, sold_time
, customer_id
, sum(price)
, you are not able to analyze detailed data since the aggregation drops some information of the data. If only a duplicate model is built, you can run queries on all the dimension, but the queries will not be accelerated because of lack of Rollup support. If you build a table with the aggregation model and a table with the duplicate model at the same time, you can get both the performance and can run queries on all the dimensions, but the two tables are not related to each other, so you need to choose the analysis table manually. It doesn’t offer good flexibility and usability.
How to use
Queries that use aggregation functions such as sum
and count
can be executed more efficiently in tables that already contain aggregated data. You would want the improved efficiency when querying large amounts of data. The data in the table is materialized in a storage node and can be kept consistent with the base table in incremental updates. After a user creates a MVs table, the query optimizer will select and query the most efficient MVs table instead of using the base table. Since the data in MVs tables is typically much smaller than the base table, queries are more efficient.
Build a materialized view:
CREATE MATERIALIZED VIEW materialized_view_name
AS SELECT id, SUM(clicks) AS sum_clicks
FROM database_name.base_table
GROUP BY id ORDER BY id’
The creation of a materialized view is currently an asynchronous operation. The command for materialized view creation returns immediately, but the creation may still be running. You can use the DESC "base_table_name" ALL
command to see the current materialized view of the base table. You can use the SHOW ALTER TABLE MATERIALIZED VIEW FROM "database_name"
command to check the status of the current and the historical materialized views.
Restrictions:
- The
partition column
in the base table must be one of thegroup by
columns of the created materialized view - Currently, only single-table materialized views are supported. No multi-table joins.
- Aggregation is not supported for key columns, but only for value columns.The type of aggregation operator cannot be changed.
- The materialized view must contain at least one key column.
- Expression calculation is not supported.
- Queries a specified materialized view is not supported.
- The
DROP MATERIALIZED VIEW
DROP MATERIALIZED VIEW [IF EXISTS] [db_name]. <mv_name>
View materialized views:
- View all materialized views under this database
SHOW MATERIALIZED VIEW [IN|FROM db_name]
- View the table structure of the specified materialized view
DESC table_name all
( DESC/DESCRIBUE mv_name
is no longer supported)
- View the processing progress of a materialized view
SHOW ALTER MATERIALIZED VIEW FROM db_name
- Cancel the materialized view being created
CANCEL ALTER MATERIALIZED VIEW FROM db_name.table_name
- Confirm which materialized views were hit by the query
Users do not need to know the existence of the materialized view when running the queries, which means the name of the materialized view does not have to be explicitly specified. The query optimizer can automatically determine whether the query can be routed to the appropriate materialized view. Whether the query plan is rewritten or not, it can be seen with explain SQL
, which can be executed on the MySQL client:
Explain SQL:
0:OlapScanNode
TABLE: lineorder4
PREAGGREGATION: ON
partitions=1/1
RollUptable: lineorder4
tabletRatio=1/1
tabletList=15184
cardinality=119994608
avgRowSize=26.375498
numNodes=1
tuple ids: 0
The RollUptable
field indicates which materialized view was hit. If the PREAGGREGATION
field is On
, the query will be faster. If the PREAGGREGATION
field is Off
, the reason will be shown. For example:
PREAGGREGATION: OFF. Reason: The aggregate operator does not match.
This indicates the physical view cannot be used in the StarRocks storage engine because the aggregation function of the query does not match the one defined in the physical view, and requires on-site aggregation.
Importing Data
Incremental imports to the base table are applied to all associated MVs tables. Data cannot be queried until the import to the base table and all MVs tables are all complete. StarRocks ensures that the data is consistent between the base and MVs tables. There is no data difference between querying the base table and the MVs table.
Notes
Materialized view function support
The materialized view must be an aggregation of a single table. Only the following aggregation functions are supported currently.
- COUNT
- MAX
- MIN
- SUM
- PERCENTILE_APPROX
- HLL_UNION
- Performs HLL aggregation on duplicate data and uses HLL functions to analyze querieddata. (This is mainly used for quick and non-exact de-duplication calculations. To use
HLL_UNION aggregation
for duplicate data, you need to call thehll_hash
function first to transform the original data.)
create materialized view dt_uv as
select dt, page_id, HLL_UNION(hll_hash(user_id))
from user_view
group by dt, page_id;
select ndv(user_id) from user_view; #The query can hit the materialized view
HLL_UNION
aggregation operator is not supported forDECIMAL
/BITMAP
/HLL
/PERCENTILE
type columns currently.BITMAP_UNION
Users can perform BITMAP aggregation on duplicate data and use the BITMAP function to analyze the data. It is mainly used to quickly calculate the exact de-duplication of
count(distinct)
. To useBITMAP_UNION
aggregation on the duplicate data, you need to call theto_bitmap function
first to convert the original data.
create materialized view dt_uv as
select dt, page_id, bitmap_union(to_bitmap(user_id))
from user_view
group by dt, page_id;
select count(distinct user_id) from user_view; #Query can hit the materialized view
- Currently, only
TINYINT
/SMALLINT
/INT
/BITINT
types are supported, and the stored content must be a positive integer (including 0).
Intelligent routing of materialized views
There is no need to explicitly specify a MV’s name during the query, StarRocks can intelligently route to the best MV based on the query SQL. The rules for MV selection are as follows.
- Select the MV that contains all query columns
- Select the most matching MV by the column defined in the query’s sorting and filtering condition.
- Select the most matching MV by the column defined in the query’s joining condition.
- Select the MV with the smallest number of rows
- Select the MV with the smallest number of columns
Other restrictions
- The RollUp table model must accommodate the type of the base table (Use aggregate model for aggregate tables and duplicate model for duplicate tables.).
- The
Delete
operation is not allowed if a key in thewhere
condition does not exist in a RollUp table. In this case, users can delete the materialized view first, then perform thedelete
operation, and finally re-add the materialized view. - If the materialized view contains columns of the
REPLACE
aggregation, it must contain all key columns.