Skip to main content
Version: Latest-3.2

Duplicate Key table

The Duplicate Key table is the default model in StarRocks. If you did not specify a model when you create a table, a Duplicate Key table is created by default.

When you create a Duplicate Key table, you can define a sort key for that table. If the filter conditions contain the sort key columns, StarRocks can quickly filter data from the table to accelerate queries. The Duplicate Key table allows you to append new data to the table. However, it does not allow you to modify existing data in the table.

Scenarios

The Duplicate Key table is suitable for the following scenarios:

  • Analyze raw data, such as raw logs and raw operation records.
  • Query data by using a variety of methods without being limited by the pre-aggregation method.
  • Load log data or time-series data. New data is written in append-only mode, and existing data is not updated.

Create a table

Suppose that you want to analyze the event data over a specific time range. In this example, create a table named detail and define event_time and event_type as sort key columns.

Statement for creating the table:

CREATE TABLE IF NOT EXISTS detail (
event_time DATETIME NOT NULL COMMENT "datetime of event",
event_type INT NOT NULL COMMENT "type of event",
user_id INT COMMENT "id of user",
device_code INT COMMENT "device code",
channel INT COMMENT ""
)
DUPLICATE KEY(event_time, event_type)
DISTRIBUTED BY HASH(user_id);

NOTICE

  • When you create a table, you must specify the bucketing column by using the DISTRIBUTED BY HASH clause. For detailed information, see bucketing.
  • Since v2.5.7, StarRocks can automatically set the number of buckets (BUCKETS) when you create a table or add a partition. You no longer need to manually set the number of buckets. For detailed information, see set the number of buckets.

Usage notes

  • Take note of the following points about the sort key of a table:

    • You can use the DUPLICATE KEY keyword to explicitly define the columns that are used in the sort key.

      Note: By default, if you do not specify sort key columns, StarRocks uses the first three columns as sort key columns.

    • In the Duplicate Key table, the sort key can consist of some or all of the dimension columns.

  • You can create indexes such as BITMAP indexes and Bloomfilter indexes at table creation.

  • If two identical records are loaded, the Duplicate Key table retains them as two records, rather than one.

What to do next

After a table is created, you can use various data ingestion methods to load data into StarRocks. For information about the data ingestion methods that are supported by StarRocks, see Overview of data loading.

Note: When you load data into a table that uses the Duplicate Key table, you can only append data to the table. You cannot modify the existing data in the table.