- Introduction
- Quick Start
- Table Design
- Data Loading
- Data Export
- Using StarRocks
- Reference
- SQL Reference
- User Account Management
- Cluster Management
- ADMIN CANCEL REPAIR
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- DROP FILE
- INSTALL PLUGIN
- SHOW BACKENDS
- SHOW BROKER
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW TABLE STATUS
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- BACKUP
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE DATABASE
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- HLL
- RECOVER
- RESTORE
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- DML
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- ROUTINE LOAD
- SELECT
- SHOW ALTER
- SHOW BACKUP
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- Data Type
- Auxiliary Commands
- Function Reference
- Date Functions
- Geographic Functions
- String Functions
- Aggregation Functions
- Bitmap Functions
- Array Functions
- cast function
- hash function
- Crytographic Functions
- Math Functions
- Utility Functions
- System variables
- Error code
- System limits
- SQL Reference
- Administration
- FAQ
- Deployment
- Data Migration
- SQL
- Others FAQs
- Benchmark
- Developers
- Contribute to StarRocks
- Code Style Guides
- Use the debuginfo file for debugging
- Development Environment
- Trace Tools
Insert Into Loading
The Insert Into statement is used similarly to the Insert Into statement in databases (e.g. MySQL). However, in StarRocks, all data writes are a separate import job, so Insert Into is also introduced as an import method.
Usage Scenarios
- Users can use the Insert Into command if they want the results to be returned simultaneously during the import.
- Users can use INSERT INTO VALUES to import test data to verify the functionality of the StarRocks system.
- Users can use INSERT INTO SELECT to ETL data that is already in the StarRocks table and import it into a new StarRocks table.
- Users can create an external table (e.g. a MySQL external table that maps a table in a MySQL system). The data from the external table can be imported into the StarRocks table for storage via the
INSERT INTO SELECT
syntax.
Syntax
INSERT INTO table_name
[ PARTITION (p1, ...) ]
[ WITH LABEL label]
[ (column [, ...]) ]
[ [ hint [, ...] ] ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
Parameters
- tablet_name: The destination table to which the data will be imported, for example
db_name.table_name
. - partitions: Specify the partitions to be imported. Partitions must exist in
table_name
. Multiple partition names are separated by commas. If you specify a target partition, only data that matches the target partition will be imported. If not specified, the default value is all partitions of the table. - label: Specify a Label for the insert job.The Label is the identifier. Each import job has a Label that is unique within a single database.
- Note: It is recommended to specify a Label instead of having it automatically assigned by the system. If a label is assigned automatically by the system, it may be impossible to know whether Insert Into is successful or not when disconnected happens during execution due to network errors.
- column_name: The specified target column, which must exist in
table_name
. The target columns of the imported table can exist in any order. If no destination column is specified, then the default value is all columns of the imported table. If a column in the target table is not in the imported column, then this column needs to have a default value, otherwise Insert Into will fail. If the type of the result column does not match the type of the target column, then an implicit type conversion is invoked. If the conversion is not possible, then the Insert Into statement reports a syntax parsing error. - expression: The expression assigned to a column.
- default: The default value of the corresponding column.
- query: A general query whose results are written to the target. The query statement supports any SQL query syntax supported by StarRocks.
- values: Users can insert one or more pieces of data via the VALUES syntax.
Note: The VALUES method is only suitable for importing a few pieces of data as a DEMO and is not applicable to any test or production environment at all. The StarRocks system itself is also not suitable for single data import scenarios. It is recommended to use
INSERT INTO SELECT
for bulk import.
Import Results
Insert Into itself is a SQL command, and its return results will be divided into the following depending on the execution result.
Executed successfully
mysql> insert into tbl1 select * from empty_tbl;
Query OK, 0 rows affected (0.02 sec)
mysql> insert into tbl1 select * from tbl2;
Query OK, 4 rows affected (0.38 sec)
{'label':'insert_8510c568-9eda-4173-9e36-6adc7d35291c', 'status':'visible', 'txnId':'4005'}
mysql> insert into tbl1 with label my_label1 select * from tbl2;
Query OK, 4 rows affected (0.38 sec)
{'label':'my_label1', 'status':'visible', 'txnId':'4005'}
mysql> insert into tbl1 select * from tbl2;
Query OK, 2 rows affected, 2 warnings (0.31 sec)
{'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'visible', 'txnId':'4005'}
mysql> insert into tbl1 select * from tbl2;
Query OK, 2 rows affected, 2 warnings (0.31 sec)
{'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'}
rows affected
indicates the total number of rows being imported. warnings
indicates the number of rows being filtered.
label
is a user-specified label or an automatically generated label. label
is the identifier of the Insert Into import job. Each import job has a Label that is unique within a single database.
status
indicates whether the imported data is visible. If it is visible, it shows visible
, otherwise it shows committed
.
txnId
is the id of the import transaction corresponding to this insert.
The err
field displays unintended errors. To see the filtered rows, use the following statement. Use the URL in the return result to query the wrong data.
SHOW LOAD WHERE label="xxx";
Execution Failure
An execution failure indicates that no data was successfully imported and returns the following.
mysql> insert into tbl1 select * from tbl2 where k1 = "a";
ERROR 1064 (HY000): all partitions have no load data. url: [http://10.74.167.16:8042/api/_load_error_log?file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0bf8a2](http://10.74.167.16:8042/api/_load_error_log?file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0bf8a2)
Where ERROR 1064 (HY000): all partitions have no load data
shows the reason for the failure. Use the followed URL to query the wrong data.
Related configurations
FE configuration
- timeout: The timeout of the import job (in seconds). If the import job is not completed within the timeout time, it will be CANCELLED by the system. At present, Insert Into does not support custom timeout value. The timeout value is unified by 1 hour. If the import job cannot be completed within the specified time, adjust the
insert_load_default_timeout_second
parameter of the FE.
Session parameters
- enable_insert_strict: Insert Into import itself does not control the tolerable error rate. The tolerable error rate can only be controlled by the
enable_insert_strict
parameter. When this parameter is set to false, it means that at least one piece of data has been imported correctly. If there is failed data, a Label will be returned. When this parameter is set to true, it means that the import will fail even if there is only one data error. This parameter is default to true and can be set bySET enable_insert_strict = false
;. - query_timeout: Insert Into is a SQL command, so the statement is limited by the
query_timeout
session parameter . You can increase the timeout in "seconds" bySET query_timeout = xxx;
.
Import Example
Create Database and Data Table
mysql> CREATE DATABASE IF NOT EXISTS load_test;
mysql> USE load_test;
mysql> CREATE TABLE insert_wiki_edit
(
event_time DATETIME,
channel VARCHAR(32) DEFAULT '',
user VARCHAR(128) DEFAULT '',
is_anonymous TINYINT DEFAULT '0',
is_minor TINYINT DEFAULT '0',
is_new TINYINT DEFAULT '0',
is_robot TINYINT DEFAULT '0',
is_unpatrolled TINYINT DEFAULT '0',
delta INT SUM DEFAULT '0',
added INT SUM DEFAULT '0',
deleted INT SUM DEFAULT '0'
)
AGGREGATE KEY(event_time, channel, user, is_anonymous, is_minor, is_new, is_robot, is_unpatrolled)
PARTITION BY RANGE(event_time)
(
PARTITION p06 VALUES LESS THAN ('2015-09-12 06:00:00'),
PARTITION p12 VALUES LESS THAN ('2015-09-12 12:00:00'),
PARTITION p18 VALUES LESS THAN ('2015-09-12 18:00:00'),
PARTITION p24 VALUES LESS THAN ('2015-09-13 00:00:00')
)
DISTRIBUTED BY HASH(user) BUCKETS 10
PROPERTIES("replication_num" = "1");
Import Data Using values
mysql> INSERT INTO insert_wiki_edit VALUES("2015-09-12 00:00:00","#en.wikipedia","GELongstreet",0,0,0,0,0,36,36,0),("2015-09-12 00:00:00","#ca.wikipedia","PereBot",0,1,0,1,0,17,17,0);
Query OK, 2 rows affected (0.29 sec)
{'label':'insert_1f12c916-5ff8-4ba9-8452-6fc37fac2e75', 'status':'VISIBLE', 'txnId':'601'}
Import Data Using select
mysql> INSERT INTO insert_wiki_edit WITH LABEL insert_load_wikipedia SELECT * FROM routine_wiki_edit;
Query OK, 18203 rows affected (0.40 sec)
{'label':'insert_load_wikipedia', 'status':'VISIBLE', 'txnId':'618'}
Note
- When the
INSERT
statement is executed, the default behavior is to filter out data that does not match the target table format, such as a string that is too long. However, for scenarios that require data not to be filtered, you can setenable_insert_strict
to true to ensure that INSERT will not execute successfully when there is data being filtered out. - StarRocks' insert has the same logic as import. Similar to import, each insert statement will generate a new version of the data. Frequent small batch import operations will generate too many versions of data, which will affect query performance. Therefore it is not recommended to use insert to import data frequently or as a daily routine import job. If there is a need for streaming import or small batch import, it is better to use Stream Load or Routine Load.