Skip to main content
Version: 3.0

Automatic partitioning

This topic describes how to create a table that supports automatic partitioning. This topic also describes the usage notes and limits of automatic partitioning.

Introduction

To make partitions creation more easy to use and flexible, StarRocks supports the partitioning expression and automatic partitioning since version 3.0. You only need to specify a partition column of the DATE or DATETIME data type and a partition granularity (year, month, day, or hour) in the partition expression, which includes a time function. With this implicit partitioning method implemented by using a expression, you do not need to create a large number of partitions in advance. StarRocks automatically creates partitions when new data is written. It is recommended that you prioritize using automatic partitioning.

Enable automatic partitioning

Syntax

The PARTITION BY clause contains a function expression that specifies the partition granularity and partition column for automatic partitioning.

PARTITION BY date_trunc(<time_unit>,<partition_column_name>)

Or

PARTITION BY time_slice(<partition_column_name>,INTERVAL N <time_unit>[, boundary]))

Parameters

  • Functions: Currently, only the date_trunc and time_slice functions are supported. If you use the function time_slice, you do not need to pass the boundary parameter. It is because in this scenario, the default and valid value for this parameter is floor, and the value can not be ceil.
  • time_unit: the partition granularity, which can be hour, day, month or year. The week partition granularity is not supported. If the partition granularity is hour, the partition column must be of the DATETIME data type and cannot be of the DATE data type.
  • partition_column_name: the name of the partition column. The partition type is RANGE, and therefore the partition column can only be of the DATE or DATETIME data type. Currently, you can specify only one partition column and multiple partition columns are not supported.
    • If the date_trunc function is used, the partition column can be of the DATE or DATETIME data type. If the time_slice function is used, the partition column must be of the DATETIME data type.
    • The partition column allows NULL values. If the partition column is of the DATE data type, the supported range is [0000-01-01 ~ 9999-12-31]. If the partition column is of the DATETIME data type, the supported range is [0000-01-01 01:01:01 ~ 9999-12-31 23:59:59].

Examples

Example 1: Use the date_trunc function to create a table that supports automatic partitioning. Set the partition granularity to day and the partition column to event_day.

CREATE TABLE site_access (
event_day DATETIME NOT NULL,
site_id INT DEFAULT '10',
city_code VARCHAR(100),
user_name VARCHAR(32) DEFAULT '',
pv BIGINT DEFAULT '0'
)
DUPLICATE KEY(event_day, site_id, city_code, user_name)
PARTITION BY date_trunc('day', event_day)
DISTRIBUTED BY HASH(event_day, site_id);
note

If a StarRocks cluster in a staging environment contains only one BE, the number of replicas can be set to 1 in the PROPERTIES clause, such as PROPERTIES( "replication_num" = "1" ). The default number of replicas is 3, which is also the number recommended for production StarRocks clusters. If you want to use the default number, you do not need to configure the replication_num parameter.

When the following two data rows are inserted, StarRocks automatically creates two partitions, p20230226 and p20230227, whose ranges are [2023-02-26 00:00:00, 2023-02-27 00:00:00) and [2023-02-27 00:00:00, 2023-02-28 00:00:00) respectively.

-- insert two data rows
INSERT INTO site_access VALUES
("2023-02-26 20:12:04",002,"New York","Sam Smith",1),
("2023-02-27 21:06:54",001,"Los Angeles","Taylor Swift",1);

-- view partitions
SHOW PARTITIONS FROM site_access\G
*************************** 1. row ***************************
PartitionId: 135846228
PartitionName: p20230226
VisibleVersion: 2
VisibleVersionTime: 2023-03-22 14:50:17
VisibleVersionHash: 0
State: NORMAL
PartitionKey: event_day
Range: [types: [DATETIME]; keys: [2023-02-26 00:00:00]; ..types: [DATETIME]; keys: [2023-02-27 00:00:00]; )
DistributionKey: event_day, site_id
Buckets: 6
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
LastConsistencyCheckTime: NULL
DataSize: 0B
IsInMemory: false
RowCount: 0
*************************** 2. row ***************************
PartitionId: 135846215
PartitionName: p20230227
VisibleVersion: 2
VisibleVersionTime: 2023-03-22 14:50:17
VisibleVersionHash: 0
State: NORMAL
PartitionKey: event_day
Range: [types: [DATETIME]; keys: [2023-02-27 00:00:00]; ..types: [DATETIME]; keys: [2023-02-28 00:00:00]; )
DistributionKey: event_day, site_id
Buckets: 6
ReplicationNum: 3
StorageMedium: HDD
CooldownTime: 9999-12-31 23:59:59
LastConsistencyCheckTime: NULL
DataSize: 0B
IsInMemory: false
RowCount: 0
2 rows in set (0.00 sec)

Example 3: Use the time_slice function to create a table that supports automatic partitioning. Set the partition granularity to 7 days and the partition column to event_day.

CREATE TABLE site_access(
event_day DATETIME NOT NULL,
site_id INT DEFAULT '10',
city_code VARCHAR(100),
user_name VARCHAR(32) DEFAULT '',
pv BIGINT DEFAULT '0'
)
DUPLICATE KEY(event_day, site_id, city_code, user_name)
PARTITION BY time_slice(event_day, INTERVAL 7 day)
DISTRIBUTED BY HASH(event_day, site_id);
note

If a StarRocks cluster in a staging environment contains only one BE, the number of replicas can be set to 1 in the PROPERTIES clause, such as PROPERTIES( "replication_num" = "1" ). The default number of replicas is 3, which is also the number recommended for production StarRocks clusters. If you want to use the default number, you do not need to configure the replication_num parameter.

Usage notes

  • StarRocks automatically creates partitions and sets the start time and end time of the partitions based on the loaded data and the automatic partitioning rule configured at table creation. For example, if the value of the partition column for the data row is 2015-06-05, and the partition granularity is month, then a partition named p201506 is created with a range of [2015-06-01, 2015-07-01) rather than [2015-06-05, 2015-07-05).
  • For tables that support automatic partitioning, StarRocks sets the default maximum number of automatically created partitions to 4096, which can be configured by the FE parameter max_automatic_partition_number. This parameter can prevent you from accidentally creating too many partitions, such as when specifying a partition column of DATETIME type with a too-fine partition granularity like hour, which can generate a large number of partitions.
  • During data loading, StarRocks automatically creates some partitions based on the loaded data, but if the load job fails for some reason, the partitions that are automatically created by StarRocks cannot be automatically deleted.
  • Note that the PARTITION BY clause is only used to calculate the partition range for the loaded data and does not change the values of the data. For example, if the original data is 2023-02-27 21:06:54, the function expression in PARTITION BY date_trunc('day', event_day) computes it as 2023-02-27 00:00:00 and infers that it belongs to the partition range [2023-02-27 00:00:00, 2023-02-28 00:00:00), but the data is still written as 2023-02-27 21:06:54. If you want the written data's value to be the same as the start time of the partition range, you need to use the function specified in the PARTITION BY clause, such as date_trunc, on the event_day column when creating a load job.
  • The naming rules for automatic partitioning are consistent with the naming rules for dynamic partitioning.

Limits

  • Only the range partitioning type is supported, whereas the list partitioning type is not supported.
  • When a table that supports automatic partitioning is created, it is generally not recommended to create partitions in advance. If you need to create partitions in advance, you can create multiple partitions all at a time, as shown in the preceding Example 2. The statement in Example 2 has the following limits:
    • The granularity of the partitions created in advance must be consistent with that of the automatically created partitions.
    • When you configure automatic partitioning, you can only use the function date_trunc rather than time_slice.
    • The syntax for creating multiple partitions all at a time only supports an interval of 1.
    • After a table that supports automatic partitioning is created, you can use ALTER TABLE ADD PARTITION to add partitions for that table. And the statement ALTER TABLE ADD PARTITION also has the above limits.
  • Currently, StarRocks's shared-data mode does not support this feature.
  • Currently, using CTAS to create a table that supports automatic partitioning is not supported.
  • Currently, using Spark Load to load data to tables that support automatic partitioning is not supported.
  • If you use automatic partitioning, you can only roll back your StarRocks cluster to version 2.5.4 or later.
  • To view the specific information of automatically created partitions, use the SHOW PARTITIONS FROM statement, rather than the SHOW CREATE TABLE statement.