- StarRocks
- Introduction to StarRocks
- Quick Start
- Table Design
- Data Loading
- Concepts
- Overview of data loading
- Load data from a local file system or a streaming data source using HTTP PUT
- Load data from HDFS or cloud storage
- Continuously load data from Apache Kafka®
- Bulk load using Apache Sparkâ„¢
- Load data using INSERT
- Load data using Stream Load transaction interface
- Synchronize data from MySQL in real time
- Continuously load data from Apache Flink®
- Change data through loading
- Transform data at loading
- Data Unloading
- Query Data Sources
- Query Acceleration
- Gather CBO statistics
- Materialized view
- Colocate Join
- Lateral Join
- Index
- Computing the Number of Distinct Values
- Administration
- Deployment
- Management
- Data Recovery
- User Privilege and Authentication
- Performance Tuning
- Reference
- SQL Reference
- User Account Management
- Cluster Management
- ADD SQLBLACKLIST
- ADMIN CANCEL REPAIR TABLE
- ADMIN CHECK TABLET
- ADMIN REPAIR TABLE
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER RESOURCE GROUP
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- CREATE RESOURCE GROUP
- DELETE SQLBLACKLIST
- DROP FILE
- DROP RESOURCE GROUP
- EXPLAIN
- INSTALL PLUGIN
- KILL
- SET
- SHOW BACKENDS
- SHOW BROKER
- SHOW COMPUTE NODES
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW PROC
- SHOW PROCESSLIST
- SHOW RESOURCE GROUP
- SHOW SQLBLACKLIST
- SHOW TABLE STATUS
- SHOW VARIABLES
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER MATERIALIZED VIEW
- ALTER TABLE
- ALTER VIEW
- ALTER RESOURCE
- ANALYZE TABLE
- BACKUP
- CANCEL ALTER TABLE
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE ANALYZE
- CREATE EXTERNAL CATALOG
- CREATE DATABASE
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP ANALYZE
- DROP STATS
- DROP CATALOG
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- HLL
- KILL ANALYZE
- RECOVER
- REFRESH EXTERNAL TABLE
- RESTORE
- SHOW ANALYZE JOB
- SHOW ANALYZE STATUS
- SHOW META
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- USE
- DML
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- CANCEL EXPORT
- CANCEL REFRESH MATERIALIZED VIEW
- CREATE ROUTINE LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- REFRESH MATERIALIZED VIEW
- RESUME ROUTINE LOAD
- SELECT
- SHOW ALTER TABLE
- SHOW ALTER MATERIALIZED VIEW
- SHOW BACKUP
- SHOW CATALOGS
- SHOW CREATE MATERIALIZED VIEW
- SHOW CREATE TABLE
- SHOW CREATE VIEW
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW MATERIALIZED VIEW
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- Auxiliary Commands
- Data Types
- Keywords
- Function Reference
- Java UDFs
- Window functions
- Aggregate Functions
- Array Functions
- Bit Functions
- Bitmap Functions
- Conditional Functions
- Cryptographic Functions
- Date Functions
- add_months
- adddate
- convert_tz
- current_date
- current_time
- current_timestamp
- date
- date_add
- date_format
- date_sub, subdate
- date_trunc
- datediff
- day
- dayname
- dayofmonth
- dayofweek
- dayofyear
- days_add
- days_diff
- days_sub
- from_days
- from_unixtime
- hour
- hours_add
- hours_diff
- hours_sub
- microseconds_add
- microseconds_sub
- minute
- minutes_add
- minutes_diff
- minutes_sub
- month
- monthname
- months_add
- months_diff
- now
- quarter
- second
- seconds_add
- seconds_diff
- seconds_sub
- str_to_date
- str2date
- time_slice
- time_to_sec
- timediff
- timestamp
- timestampadd
- timestampdiff
- to_date
- to_days
- unix_timestamp
- utc_timestamp
- week
- weekofyear
- weeks_add
- weeks_diff
- weeks_sub
- year
- years_add
- years_diff
- years_sub
- Geographic Functions
- JSON Functions
- Overview of JSON functions and operators
- JSON operators
- JSON constructor functions
- JSON query and processing functions
- Math Functions
- String Functions
- Pattern Matching Functions
- Percentile Functions
- Scalar Functions
- Utility Functions
- cast function
- hash function
- System variables
- Error code
- System limits
- SQL Reference
- FAQ
- Deploy
- Data Migration
- SQL
- Other FAQs
- Benchmark
- Developers
- Contribute to StarRocks
- Code Style Guides
- Use the debuginfo file for debugging
- Development Environment
- Trace Tools
- Integration
Materialized view
This topic describes how to create, use, and manage a materialized view.
Overview
A materialized view in StarRocks is a special physical table that holds pre-computed query results from a base table. On the one hand, when you perform complex queries on the base table, the relevant pre-computed results can be used directly in the query execution to avoid repeated calculations and improve query efficiency. On the other hand, you can build models based on your data warehouse through materialized views to provide a unified data specification to upper-layer applications, cover the underlying implementation, or protect the raw data security of the base table.
Basic concepts
Materialized view
You can understand materialized views from two perspectives: materialization and view. Materialization means storing and reusing the pre-computed results for query acceleration. A view is essentially a table built on the basis of other tables. It is usually used to build mathematical models.
Base table
Base table is the driving table of its materialized view.
Query rewrite
Query rewrite means that when executing a query on a base table with materialized views built on, the system automatically judges whether or not the pre-computed results can be reused in processing the query. If they can be reused, the system will load the data directly from the relevant materialized view to avoid the time- and resource-consuming computation or join operation.
Refresh
Refresh refers to the data synchronization of the materialized view when the data in the base tables changes. There are two generic refresh strategies: ON DEMAND refresh and ON COMMIT refresh. ON DEMAND refresh is triggered manually or regularly. ON COMMIT refresh is triggered each time when data in the base tables changes.
Scenario
Materialized views are useful in the following situations:
Query acceleration
Materialized views well serve the needs for accelerating predictable queries and queries that use same sub-query repetitively. Through materialized views, the system can directly use the pre-computed intermediate query result set to process such queries. It significantly reduces the load pressure caused by a large number of complex queries, and also greatly shortens the query processing time. StarRocks implements transparent acceleration based on materialized views, and ensures that, when querying the source table directly, the result must be based on the latest data.
Data warehouse modeling
Through materialized views, you can build new tables based on one or more base tables to achieve the following:
Structured SQL statements, unified semantics
You can offer a unified SQL statement structure and data format to the upper-layer application to avoid repetitive development and computation.
Simple interface
You can cover the underlying implementation and ensure the simplicity of the interface for the upper-layer application.
Data security
You can shield the raw data of the base table through the materialized view to ensure the security of the sensitive data.
Use cases
Accelerating queries with repetitive aggregate functions
Suppose that most queries in your data warehouse include the same sub-query with an aggregate function, and these queries have consumed a huge proportion of your computing resources. Based on this sub-query, you can create a materialized view, which will compute and store all results of the sub-query. After the materialized view is built, the system will rewrite all queries that contain the sub-query, load the intermediate result stored in the materialized view, and thus accelerate these queries.
Regular JOIN of multiple tables
Suppose that you need to regularly join multiple tables in your data warehouse to make a new wide table. You can build a materialized view for these tables, and set an async refresh mechanism that triggers the building regularly, so that you do not have to bother with doing it yourself. After the materialized view is built, query results are returned directly from the materialized view, and thus the latency caused by JOIN operations is avoided.
Data warehouse layering
Suppose that your data warehouse contains a mass of raw data, and queries in it require a complex set of ETL operations. You can build multiple layers of materialized views to stratify the data in your data warehouse, and thus decompose the query into a series of simple sub-queries. It will significantly reduce repetitive computation, and, more importantly, help the DBA identify the problem with ease and efficiency. Beyond that, data warehouse layering helps decouple the raw data and statistic data, protecting the security of the sensitive raw data.
Accelerate queries with a materialized view
If there are a considerable number of predictable queries or queries that use a same set of sub-query results repetitively, you can build materialized views to accelerate these queries.
Preparation
Before creating the materialized view, check if your data warehouse is eligible for query acceleration through materialized views. For example, check if the queries reuse certain subquery statements.
The following example is based on the table sales_records
, which contains the transaction ID record_id
, salesperson ID seller_id
, store ID store_id
, date sale_date
, and sales amount sale_amt
for each transaction. Follow these steps to create the table and insert data into it:
CREATE TABLE sales_records(
record_id INT,
seller_id INT,
store_id INT,
sale_date DATE,
sale_amt BIGINT
) DISTRIBUTED BY HASH(record_id);
INSERT INTO sales_records
VALUES
(001,01,1,"2022-03-13",8573),
(002,02,2,"2022-03-14",6948),
(003,01,1,"2022-03-14",4319),
(004,03,3,"2022-03-15",8734),
(005,03,3,"2022-03-16",4212),
(006,02,2,"2022-03-17",9515);
The business scenario of this example demands frequent analyses on the sales amounts of different stores. As a result, the sum function is used on each query, consuming a massive amount of computing resources. You can run the query to record its time, and view its query profile by using EXPLAIN command.
MySQL > SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+----------+-----------------+
| store_id | sum(`sale_amt`) |
+----------+-----------------+
| 2 | 16463 |
| 3 | 12946 |
| 1 | 12892 |
+----------+-----------------+
3 rows in set (0.02 sec)
MySQL > EXPLAIN SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:3: store_id | 6: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 3: store_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 3:AGGREGATE (merge finalize) |
| | output: sum(6: sum) |
| | group by: 3: store_id |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 3: store_id |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(5: sale_amt) |
| | group by: 3: store_id |
| | |
| 0:OlapScanNode |
| TABLE: sales_records |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: sales_records |
| tabletRatio=10/10 |
| tabletList=12049,12053,12057,12061,12065,12069,12073,12077,12081,12085 |
| cardinality=1 |
| avgRowSize=2.0 |
| numNodes=0 |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec)
It can be observed that the query takes about 0.02 seconds, and no materialized view is used to accelerate the query because the output of rollup
section in the query profile is sales_records
, which is the base table.
Create a materialized view
You can create a materialized view based on a specific query statement by using the following SQL command.
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [database.]mv_name
AS (query);
For detailed instructions and parameter references, see SQL Reference - CREATE MATERIALIZED VIEW.
Based on the table sales_records
and the query statement mentioned above, the following example creates the materialized view store_amt
to analyze the sum of sales amount in each store.
CREATE MATERIALIZED VIEW store_amt AS
SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
Query with the materialized view
The materialized view you created contains the complete set of pre-computed results in accordance with the query statement. Subsequent queries will use the data within it. You can run the same query to test the query time as you did in the preparation.
MySQL > SELECT store_id, SUM(sale_amt)
FROM sales_records
GROUP BY store_id;
+----------+-----------------+
| store_id | sum(`sale_amt`) |
+----------+-----------------+
| 2 | 16463 |
| 3 | 12946 |
| 1 | 12892 |
+----------+-----------------+
3 rows in set (0.01 sec)
It can be observed that the query time is reduced to 0.01 seconds.
Check if a query hits the materialized view
Run EXPLAIN command again to check if the query hits the materialized view.
MySQL > EXPLAIN SELECT store_id, SUM(sale_amt) FROM sales_records GROUP BY store_id;
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:3: store_id | 6: sum |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: 3: store_id |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| 3:AGGREGATE (merge finalize) |
| | output: sum(6: sum) |
| | group by: 3: store_id |
| | |
| 2:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: 3: store_id |
| |
| 1:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(5: sale_amt) |
| | group by: 3: store_id |
| | |
| 0:OlapScanNode |
| TABLE: sales_records |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: store_amt |
| tabletRatio=10/10 |
| tabletList=12092,12096,12100,12104,12108,12112,12116,12120,12124,12128 |
| cardinality=6 |
| avgRowSize=2.0 |
| numNodes=0 |
+-----------------------------------------------------------------------------+
45 rows in set (0.00 sec)
As you can see, now the output of rollup
section in the query profile is store_amt
, which is the materialized view you have built. That means this query has hit the materialized view.
Check the building status of a materialized view
Creating a single-table materialized view is an asynchronous operation. Running CREATE MATERIALIZED VIEW command successfully indicates that the task of creating the materialized view is submitted successfully. You can view the building status of the materialized view in a database via SHOW ALTER MATERIALIZED VIEW command.
MySQL > SHOW ALTER MATERIALIZED VIEW\G
*************************** 1. row ***************************
JobId: 12090
TableName: sales_records
CreateTime: 2022-08-25 19:41:10
FinishedTime: 2022-08-25 19:41:39
BaseIndexName: sales_records
RollupIndexName: store_amt
RollupId: 12091
TransactionId: 10
State: FINISHED
Msg:
Progress: NULL
Timeout: 86400
1 row in set (0.00 sec)
The RollupIndexName
section indicates the name of the materialized view, and State
section indicates if the building is completed.
Check the schema of a materialized view
You can use DESC tbl_name ALL command to check the schema of a table and its subordinate materialized views.
MySQL > DESC sales_records ALL;
+---------------+---------------+-----------+--------+------+-------+---------+-------+
| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |
+---------------+---------------+-----------+--------+------+-------+---------+-------+
| sales_records | DUP_KEYS | record_id | INT | Yes | true | NULL | |
| | | seller_id | INT | Yes | true | NULL | |
| | | store_id | INT | Yes | true | NULL | |
| | | sale_date | DATE | Yes | false | NULL | NONE |
| | | sale_amt | BIGINT | Yes | false | NULL | NONE |
| | | | | | | | |
| store_amt | AGG_KEYS | store_id | INT | Yes | true | NULL | |
| | | sale_amt | BIGINT | Yes | false | NULL | SUM |
+---------------+---------------+-----------+--------+------+-------+---------+-------+
8 rows in set (0.00 sec)
Drop a materialized view
Under the following circumstances, you need to drop a materialized view:
You have created a wrong materialized view and you need to drop it before the building completed.
You have created too many materialized views, which results in a huge drop in load performance, and some of the materialized views are duplicate.
The frequency of the involved queries is low, and you can tolerate a relatively high query latency.
Drop an unfinished materialized view
You can drop a materialized view that is being created by canceling the in-progress creation task. First, you need to get the job ID JobID
of the materialized view creation task by checking the building status of the materialized view. After getting the job ID, you need to cancel the creation task with the CANCEL ALTER command.
CANCEL ALTER TABLE ROLLUP FROM sales_records (12090);
Drop an existing materialized view
You can drop an existing materialized view with the DROP MATERIALIZED VIEW command.
DROP MATERIALIZED VIEW store_amt;
Best practices
Precise de-duplication
The following example is based on an advertisement business analysis table advertiser_view_record
, which records the date that the ad is viewed click_time
, the name of the ad advertiser
, the channel of the ad channel
, and the ID of the user who viewed the ID user_id
.
CREATE TABLE advertiser_view_record(
click_time DATE,
advertiser VARCHAR(10),
channel VARCHAR(10),
user_id INT
) distributed BY hash(click_time);
Analysis is mainly focused on the UV of the ads.
SELECT advertiser, channel, count(distinct user_id)
FROM advertiser_view_record
GROUP BY advertiser, channel;
To accelerate the precise de-duplication query, you can create a materialized view based on this table and use the bitmap_union function to pre-aggregate the data.
CREATE MATERIALIZED VIEW advertiser_uv AS
SELECT advertiser, channel, bitmap_union(to_bitmap(user_id))
FROM advertiser_view_record
GROUP BY advertiser, channel;
After the materialized view is created, the sub-query count(distinct user_id)
in the subsequent queries will be automatically rewritten as bitmap_union_count (to_bitmap(user_id))
so that they can hit the materialized view.
Approximate de-duplication
Use the table advertiser_view_record
above as an example again. To accelerate approximate count distinct, you can create a materialized view based on this table and use the hll_union() function to pre-aggregate the data.
CREATE MATERIALIZED VIEW advertiser_uv2 AS
SELECT advertiser, channel, hll_union(hll_hash(user_id))
FROM advertiser_view_record
GROUP BY advertiser, channel;
Set extra sort keys
Suppose that the base table tableA
contains columns k1
, k2
and k3
, where only k1
and k2
are sort keys. If the query including the sub-query where k3=x
must be accelerated, you can create a materialized view with k3
as the first column.
CREATE MATERIALIZED VIEW k3_as_key AS
SELECT k3, k2, k1
FROM tableA
Correspondence of aggregate functions
When a query is executed with a materialized view, the original query statement will be automatically rewritten and used to query the intermediate results stored in the materialized view. The following table shows the correspondence between the aggregate function in the original query and the aggregate function used to construct the materialized view. You can select the corresponding aggregate function to build a materialized view according to your business scenario.
aggregate function in the original query | aggregate function of the materialized view |
---|---|
sum | sum |
min | min |
max | max |
count | count |
bitmap_union, bitmap_union_count, count(distinct) | bitmap_union |
hll_raw_agg, hll_union_agg, ndv, approx_count_distinct | hll_union |
Caution
Sync materialized views only support aggregate functions on a single column. Query statements in the form of
sum(a+b)
are not supported.Clauses such as JOIN, and WHERE are not supported in the sync materialized view creation statements.
The current version of StarRocks does not support creating multiple materialized views at the same time. A new materialized view can only be created when the one before is completed.
A materialized view supports only one aggregate function for each column of the base table. Query statements such as
select sum(a), min(a) from table
are not supported.When using ALTER TABLE DROP COLUMN to drop a specific column in a base table, you need to ensure that all materialized views of the base table contain the dropped column, otherwise the drop operation cannot be performed. If you have to drop the column, you need to first drop all materialized views that do not contain the column, and then drop the column.
Creating too many materialized views for a table will affect the data load efficiency. When data is being loaded to the base table, the data in materialized view and base table will be updated synchronously. If a base table contains
n
materialized views, the efficiency of loading data into the base table is about the same as the efficiency of loading data inton
tables.You must use the GROUP BY clause when using aggregate functions and specify the GROUP BY column in your SELECT list.
Model data warehouse with materialized view
CAUTION
StarRocks prior to 2.4 does not support the functions demonstrated below.
StarRocks 2.4 supports creating asynchronous materialized views for multiple base tables to allow modeling data warehouse. Asynchronous materialized views support all Data Models.
As for the current version, multi-table materialized views support two refresh strategies:
Async refresh
Async refresh strategy allows materialized views to refresh through asynchronous refresh tasks, and does not guarantee strict consistency between the base table and its subordinate materialized views.
Manual refresh
You can manually trigger a refresh task for an async materialized view. It does not guarantee strict consistency between the base table and its subordinate materialized views.
Preparation
Enable async materialized view
To use the async materialized view feature, you need to set the configuration item enable_experimental_mv
as true
using the following statement:
ADMIN SET FRONTEND CONFIG ("enable_experimental_mv"="true");
Create base tables
The following examples involve two base tables:
Table
goods
records the item IDitem_id1
, item nameitem_name
, and item priceprice
.Table
order_list
records the order IDorder_id
, client IDclient_id
, item IDitem_id2
, and order dateorder_date
.
Column item_id1
is equivalent to column item_id2
.
Follow these steps to create the tables and insert data into them:
CREATE TABLE goods(
item_id1 INT,
item_name STRING,
price FLOAT
) DISTRIBUTED BY HASH(item_id1);
INSERT INTO goods
VALUES
(1001,"apple",6.5),
(1002,"pear",8.0),
(1003,"potato",2.2);
CREATE TABLE order_list(
order_id INT,
client_id INT,
item_id2 INT,
order_date DATE
) DISTRIBUTED BY HASH(order_id);
INSERT INTO order_list
VALUES
(10001,101,1001,"2022-03-13"),
(10001,101,1002,"2022-03-13"),
(10002,103,1002,"2022-03-13"),
(10002,103,1003,"2022-03-14"),
(10003,102,1003,"2022-03-14"),
(10003,102,1001,"2022-03-14");
The business scenario of this example demands frequent analyses on the total of each order. Because each query requires JOIN operation on the two base tables, two base tables should be joined a wide table. Besides, the business scenario demands the data refresh at an interval of one day.
The query statement is as follows:
SELECT
order_id,
sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;
Create a materialized view
You can create a materialized view based on a specific query statement by using the following SQL command.
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [database.]mv_name
[distribution_desc]
[REFRESH refresh_scheme_desc]
[primary_expression]
[COMMENT ""]
[PROPERTIES ("key"="value", ...)]
AS (query);
For detailed instructions and parameter references, see SQL Reference - CREATE MATERIALIZED VIEW.
Based on the table goods
, order_list
and the query statement mentioned above, the following example creates the materialized view order_mv
to analyze the total of each order. The materialized view is set to refresh asynchronously at an interval of one day.
CREATE MATERIALIZED VIEW order_mv
DISTRIBUTED BY HASH(`order_id`) BUCKETS 12
REFRESH ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)
AS SELECT
order_list.order_id,
sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;
Query with the materialized view
The materialized view you created contains the complete set of pre-computed results in accordance with the query statement. You can directly query an async materialized view.
MySQL > SELECT * FROM order_mv;
+----------+--------------------+
| order_id | total |
+----------+--------------------+
| 10001 | 14.5 |
| 10002 | 10.200000047683716 |
| 10003 | 8.700000047683716 |
+----------+--------------------+
3 rows in set (0.01 sec)
Rename a materialized view
You can rename a materialized view via ALTER MATERIALIZED VIEW command.
ALTER MATERIALIZED VIEW order_mv RENAME order_total;
Alter the refresh strategy of a materialized view
You can also alter the refresh strategy of a materialized view via ALTER MATERIALIZED VIEW command.
ALTER MATERIALIZED VIEW order_mv REFRESH ASYNC EVERY(INTERVAL 2 DAY);
Check materialized views
You can check materialized views in your database in the following ways:
- Check all materialized views in your database.
SHOW MATERIALIZED VIEW;
- Check a specific materialized view.
SHOW MATERIALIZED VIEW WHERE NAME = order_mv;
- Check specific materialized views by matching the name.
SHOW MATERIALIZED VIEW WHERE NAME LIKE "order%";
- Check all materialized views via
information_schema
.
SELECT * FROM information_schema.materialized_views;
Check the definition of a materialized view
You can check the SQL statement used to create a materialized view via SHOW CREATE MATERIALIZED VIEW command.
SHOW CREATE MATERIALIZED VIEW order_mv;
Check the refresh tasks of single-table sync materialized views
You can check the refresh tasks of all single-table sync materialized views in the database.
SHOW ALTER MATERIALIZED VIEW;
Manually refresh an async materialized view
You can manually refresh an async materialized view via REFRESH MATERIALIZED VIEW command.
REFRESH MATERIALIZED VIEW order_mv;
CAUTION
You can refresh a materialized view with async or manual refresh strategy via this command. However, you cannot refresh a single-table sync refresh materialized view via this command.
You can cancel a refresh task by using the CANCEL REFRESH MATERIALIZED VIEW statement.
Check the execution status of a multi-table materialized view
You can check the execution status of a multi-table materialized view via the following ways.
SELECT * FROM INFORMATION_SCHEMA.tasks;
SELECT * FROM INFORMATION_SCHEMA.task_runs;
NOTE
Async refresh materialized views rely on the Task framework to refresh data, so you can check refresh tasks by querying the
tasks
andtask_runs
metadata tables provided by the Task framework.
Drop a materialized view
You can drop a materialized view via DROP MATERIALIZED VIEW command.
DROP MATERIALIZED VIEW order_mv;
Caution
Async refresh materialized views have the following features:
- You can directly query a async refresh materialized view, but the result may be inconsistent with that from the base tables.
- You can set different partitioning and bucketing strategies for a async refresh materialized view from that of the base tables.
- Async refresh materialized views support dynamic partitioning strategy in a longer span. For example, if the base table is partitioned at an interval of one day, you can set the materialized view to be partitioned at an interval of one month.
You can build a multi-table materialized view under async or manual refresh strategies.
Partition keys and bucket keys of the async or manual refresh materialized view must be in the query statement.
The query statement does not support random functions, including rand((), random(), uuid()), and sleep().
- Materialized view
- Overview
- Accelerate queries with a materialized view
- Model data warehouse with materialized view
- Preparation
- Create a materialized view
- Query with the materialized view
- Rename a materialized view
- Alter the refresh strategy of a materialized view
- Check materialized views
- Check the definition of a materialized view
- Check the refresh tasks of single-table sync materialized views
- Manually refresh an async materialized view
- Check the execution status of a multi-table materialized view
- Drop a materialized view
- Caution