# Strict mode
Strict mode is an optional property that you can configure for data loads. It affects the loading behavior and the final loaded data.
This topic introduces what strict mode is and how to set strict mode.
## Understand strict mode
If the original data type of a source column, or the data type derived from function computation on a source column, differs from the data type of the matching destination column, StarRocks converts the source column values to the destination data type during loading. Source column values that fail the conversion are processed into `NULL` values, which are called "error data." Rows that contain such error data are called "error rows."
Strict mode works as follows:
- If strict mode is enabled, StarRocks loads only qualified rows. It filters out error rows and returns details about the error rows.
- If strict mode is disabled, StarRocks loads qualified rows together with error rows.
For example, suppose you want to load four rows from a CSV-formatted data file into a StarRocks table. The rows hold the values `\N` (`\N` denotes a `NULL` value), `abc`, `2000`, and `1` respectively in a column, and the data type of the matching destination column is TINYINT, whose value range is [-128, 127]. The source column values are processed as follows:

- The source column value `\N` is processed into `NULL` upon conversion to TINYINT.

  > **NOTE**
  >
  > `\N` is always processed into `NULL` upon conversion, regardless of the destination data type.

- The source column value `abc` is processed into `NULL`, because its data type is not TINYINT and the conversion fails.
- The source column value `2000` is processed into `NULL`, because it is beyond the range supported by TINYINT and the conversion fails.
- The source column value `1` can be properly converted to a TINYINT-type value `1`.
If strict mode is disabled, StarRocks loads all four rows.

If strict mode is enabled, StarRocks loads only the rows that hold `\N` or `1` and filters out the rows that hold `abc` or `2000`. The filtered-out rows are counted against the maximum percentage of rows that can be filtered out due to inadequate data quality, as specified by the `max_filter_ratio` parameter.
**Final loaded data with strict mode disabled**
| Source column value | Column value upon conversion to TINYINT | Load result when destination column allows NULL values | Load result when destination column does not allow NULL values |
| ------------------- | ---------------------------------------- | ------------------------------------------------------- | --------------------------------------------------------------- |
| `\N`                | NULL                                      | The value `NULL` is loaded.                               | An error is reported.                                             |
| `abc`               | NULL                                      | The value `NULL` is loaded.                               | An error is reported.                                             |
| `2000`              | NULL                                      | The value `NULL` is loaded.                               | An error is reported.                                             |
| `1`                 | `1`                                       | The value `1` is loaded.                                  | The value `1` is loaded.                                          |
**Final loaded data with strict mode enabled**
| Source column value | Column value upon conversion to TINYINT | Load result when destination column allows NULL values | Load result when destination column does not allow NULL values |
| ------------------- | ---------------------------------------- | ------------------------------------------------------- | --------------------------------------------------------------- |
| `\N`                | NULL                                      | The value `NULL` is loaded.                               | An error is reported.                                             |
| `abc`               | NULL                                      | The value `NULL` is not allowed and therefore is filtered out. | An error is reported.                                        |
| `2000`              | NULL                                      | The value `NULL` is not allowed and therefore is filtered out. | An error is reported.                                        |
| `1`                 | `1`                                       | The value `1` is loaded.                                  | The value `1` is loaded.                                          |
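To make the scenario concrete, the following is a minimal sketch of a destination table that matches the example above. The database name `test_db`, table name `tbl_tinyint`, and column names are hypothetical and are not part of the original example.

```sql
-- Hypothetical destination table for the four-row example above.
-- The id column uses TINYINT and allows NULL values, matching the
-- "destination column allows NULL values" case in the tables.
CREATE TABLE test_db.tbl_tinyint
(
    row_key INT,
    id TINYINT NULL
)
DUPLICATE KEY(row_key)
DISTRIBUTED BY HASH(row_key) BUCKETS 1;
```

With strict mode enabled, loading the four-row file into this table keeps the rows holding `\N` and `1` and filters out the rows holding `abc` and `2000`; whether the load job then succeeds or fails depends on the `max_filter_ratio` you set.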
## Set strict mode
If you run a Stream Load, Broker Load, Routine Load, or Spark Load job to load data, use the `strict_mode` parameter to set strict mode for the load job. Valid values are `true` and `false`. The default value is `false`. The value `true` enables strict mode, and the value `false` disables strict mode.
If you execute INSERT to load data, use the `enable_insert_strict` session variable to set strict mode. Valid values are `true` and `false`. The default value is `true`. The value `true` enables strict mode, and the value `false` disables strict mode.
Examples are as follows:
### Stream Load
```bash
curl --location-trusted -u <username>:<password> \
    -H "strict_mode: {true | false}" \
    -T <file_name> -XPUT \
    http://<fe_host>:<fe_http_port>/api/<database_name>/<table_name>/_stream_load
```
For detailed syntax and parameters about Stream Load, see STREAM LOAD.
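For instance, a concrete invocation might look as follows. The database `test_db`, table `tbl_tinyint`, file `data.csv`, user, host, and port are hypothetical placeholders; the `max_filter_ratio` header is optional and is shown only because strict mode and the filter ratio are typically tuned together.

```bash
# Hypothetical example: enable strict mode and tolerate up to
# 50% of the rows being filtered out before the job fails.
curl --location-trusted -u root: \
    -H "strict_mode: true" \
    -H "max_filter_ratio: 0.5" \
    -T data.csv -XPUT \
    http://127.0.0.1:8030/api/test_db/tbl_tinyint/_stream_load
```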
### Broker Load
```sql
LOAD LABEL [<database_name>.]<label_name>
(
    DATA INFILE ("<file_path>"[, "<file_path>" ...])
    INTO TABLE <table_name>
)
WITH BROKER
(
    "username" = "<hdfs_username>",
    "password" = "<hdfs_password>"
)
PROPERTIES
(
    "strict_mode" = "{true | false}"
);
```
The preceding code snippet uses HDFS as an example. For detailed syntax and parameters about Broker Load, see BROKER LOAD.
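Broker Load is asynchronous, so the filtered-row statistics are not returned immediately when you submit the job. One way to check them, assuming the label used above, is:

```sql
-- Check the state of the load job and the number of unqualified rows
-- (reported in the job's ETL statistics) for a given label.
SHOW LOAD WHERE LABEL = "<label_name>";
```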
### Routine Load
```sql
CREATE ROUTINE LOAD [<database_name>.]<job_name> ON <table_name>
PROPERTIES
(
    "strict_mode" = "{true | false}"
)
FROM KAFKA
(
    "kafka_broker_list" = "<kafka_broker1_ip>:<kafka_broker1_port>[,<kafka_broker2_ip>:<kafka_broker2_port>...]",
    "kafka_topic" = "<topic_name>"
);
```
The preceding code snippet uses Apache Kafka® as an example. For detailed syntax and parameters about Routine Load, see CREATE ROUTINE LOAD.
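Because a Routine Load job runs continuously, you can monitor how many rows strict mode has filtered out over time. A brief sketch, assuming the job name used above:

```sql
-- Inspect the job's statistics; the error row count reflects rows
-- that were filtered out due to inadequate data quality.
SHOW ROUTINE LOAD FOR [<database_name>.]<job_name>;
```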
### Spark Load
```sql
LOAD LABEL [<database_name>.]<label_name>
(
    DATA INFILE ("<file_path>"[, "<file_path>" ...])
    INTO TABLE <table_name>
)
WITH RESOURCE <resource_name>
(
    "spark.executor.memory" = "3g",
    "broker.username" = "<hdfs_username>",
    "broker.password" = "<hdfs_password>"
)
PROPERTIES
(
    "strict_mode" = "{true | false}"
);
```
The preceding code snippet uses HDFS as an example. For detailed syntax and parameters about Spark Load, see SPARK LOAD.
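The `WITH RESOURCE <resource_name>` clause assumes that a Spark resource has already been created. The following is a minimal sketch of such a resource; the resource name, broker name, and all hosts and ports are placeholders that depend on your Hadoop and Spark deployment.

```sql
-- Hypothetical Spark resource for Spark Load; adjust the properties
-- to match your YARN cluster, HDFS namenode, and broker.
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES
(
    "type" = "spark",
    "spark.master" = "yarn",
    "spark.submit.deployMode" = "cluster",
    "spark.hadoop.yarn.resourcemanager.address" = "<rm_host>:<rm_port>",
    "spark.hadoop.fs.defaultFS" = "hdfs://<namenode_host>:<namenode_port>",
    "working_dir" = "hdfs://<namenode_host>:<namenode_port>/tmp/starrocks",
    "broker" = "broker0"
);
```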
### INSERT
```sql
SET enable_insert_strict = {true | false};
INSERT INTO <table_name> ...
```
For detailed syntax and parameters about INSERT, see INSERT.
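Because `enable_insert_strict` is a session variable, you can verify its current value at any time:

```sql
-- Check whether strict mode is enabled for INSERT in this session.
SHOW VARIABLES LIKE 'enable_insert_strict';
```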