Version: Candidate-3.4

Feature Support: Data Loading and Unloading

This document outlines the features of various data loading and unloading methods supported by StarRocks.

File format

Loading file formats

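| Loading method | Data source | CSV | JSON [3] | Parquet | ORC | Avro | ProtoBuf | Thrift |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Stream Load | Local file systems, applications, connectors | Yes | Yes | To be supported | To be supported | To be supported | To be supported | To be supported |
| INSERT from FILES | HDFS, S3, OSS, Azure, GCS, NFS(NAS) [5] | Yes (v3.3+) | To be supported | Yes (v3.1+) | Yes (v3.1+) | To be supported | To be supported | To be supported |
| Broker Load | HDFS, S3, OSS, Azure, GCS, NFS(NAS) [5] | Yes | Yes (v3.2.3+) | Yes | Yes | To be supported | To be supported | To be supported |
| Routine Load | Kafka | Yes | Yes | To be supported | To be supported | Yes (v3.0+) [1] | To be supported | To be supported |
| Spark Load | | Yes | To be supported | Yes | Yes | To be supported | To be supported | To be supported |
| Connectors | Flink, Spark | Yes | Yes | To be supported | To be supported | To be supported | To be supported | To be supported |
| Kafka Connector [2] | Kafka | Yes (v3.0+) | Yes (v3.0+) | To be supported | To be supported | Yes (v3.0+) | To be supported | To be supported |

PIPE [4]: Consistent with INSERT from FILES.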
note

[1], [2]: Schema Registry is required.

[3]: JSON supports a variety of CDC formats. For details about the JSON CDC formats supported by StarRocks, see JSON CDC format.

[4]: Currently, only INSERT from FILES is supported for loading with PIPE.

[5]: To access files in NFS via the file:// protocol, you need to mount the NAS device as NFS under the same directory on each BE or CN node.
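
The version annotations used in the tables on this page, such as Yes (v3.3+) or Yes (v3.1.10+/v3.2.1+), can be checked against a cluster version mechanically. The following Python sketch is not part of StarRocks; the function names are hypothetical, and the handling of cells with two thresholds assumes that each threshold applies to its own minor release line and that the highest threshold also covers all later lines.

```python
import re


def parse_version(text):
    """Turn 'v3.2.3' or '3.2.3' into a tuple of ints, e.g. (3, 2, 3)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def min_versions(cell):
    """Extract the minimum versions named in a cell such as 'Yes (v3.1.10+/v3.2.1+)'."""
    return [parse_version(v) for v in re.findall(r"v(\d+(?:\.\d+)*)\+", cell)]


def is_supported(cell, version):
    """Return True if `cell` marks the feature as available in `version`.

    Assumption: with several thresholds, each one applies to its own minor
    release line, and the highest one also covers every later release line.
    Cells worded differently (for example 'versions earlier than v2.5')
    are not interpreted and are treated as a plain 'Yes'.
    """
    if not cell.startswith("Yes"):
        return False  # 'To be supported' and 'N/A'
    thresholds = sorted(min_versions(cell))
    if not thresholds:
        return True  # plain 'Yes': no minimum version stated
    current = parse_version(version)
    for threshold in thresholds:
        if current[:2] == threshold[:2]:
            # Same minor release line: compare against that line's threshold.
            return current >= threshold
    # No threshold for this release line: require a version past the newest one.
    return current > thresholds[-1]


print(is_supported("Yes (v3.3+)", "3.2.9"))
print(is_supported("Yes (v3.1.10+/v3.2.1+)", "3.1.12"))
```

Under these assumptions the first call reports that the feature is unavailable and the second reports that it is available.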

JSON CDC formats

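| CDC format | Stream Load | Routine Load | Broker Load | INSERT from FILES | Kafka Connector [1] |
| --- | --- | --- | --- | --- | --- |
| Debezium | To be supported | To be supported | To be supported | To be supported | Yes (v3.0+) |
| Canal | To be supported | To be supported | To be supported | To be supported | To be supported |
| Maxwell | To be supported | To be supported | To be supported | To be supported | To be supported |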
note

[1]: You must configure the transforms parameter while loading Debezium CDC format data into Primary Key tables in StarRocks.

Unloading file formats

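| Unloading method | Target table format | Target remote storage | CSV | JSON | Parquet | ORC |
| --- | --- | --- | --- | --- | --- | --- |
| INSERT INTO FILES | N/A | HDFS, S3, OSS, Azure, GCS, NFS(NAS) [3] | Yes (v3.3+) | To be supported | Yes (v3.2+) | Yes (v3.3+) |
| INSERT INTO Catalog | Hive | HDFS, S3, OSS, Azure, GCS | Yes (v3.3+) | To be supported | Yes (v3.2+) | Yes (v3.3+) |
| INSERT INTO Catalog | Iceberg | HDFS, S3, OSS, Azure, GCS | To be supported | To be supported | Yes (v3.2+) | To be supported |
| INSERT INTO Catalog | Hudi/Delta | To be supported | To be supported | To be supported | To be supported | To be supported |
| EXPORT | N/A | HDFS, S3, OSS, Azure, GCS | Yes [1] | To be supported | To be supported | To be supported |
| PIPE | To be supported [2] | To be supported [2] | To be supported [2] | To be supported [2] | To be supported [2] | To be supported [2] |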
note

[1]: Configuring a Broker process is supported.

[2]: Currently, unloading data using PIPE is not supported.

[3]: To access files in NFS via the file:// protocol, you need to mount the NAS device as NFS under the same directory on each BE or CN node.

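| File format | Parameter | Stream Load | INSERT from FILES | Broker Load | Routine Load | Spark Load |
| --- | --- | --- | --- | --- | --- | --- |
| CSV | column_separator | Yes | Yes (v3.3+) | Yes [1] | Yes [1] | Yes [1] |
| CSV | row_delimiter | Yes | Yes (v3.3+) | Yes [2] (v3.1+) | Yes [3] (v2.2+) | To be supported |
| CSV | enclose | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | Yes (v3.0+) | To be supported |
| CSV | escape | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | Yes (v3.0+) | To be supported |
| CSV | skip_header | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | To be supported | To be supported |
| CSV | trim_space | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | Yes (v3.0+) | To be supported |
| JSON | jsonpaths | Yes | To be supported | Yes (v3.2.3+) | Yes | To be supported |
| JSON | strip_outer_array | Yes | To be supported | Yes (v3.2.3+) | Yes | To be supported |
| JSON | json_root | Yes | To be supported | Yes (v3.2.3+) | Yes | To be supported |
| JSON | ignore_json_size | Yes | To be supported | Yes (v3.2.3+) | To be supported | To be supported |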
note

[1]: The corresponding parameter is COLUMNS TERMINATED BY.

[2]: The corresponding parameter is ROWS TERMINATED BY.

[3]: The corresponding parameter is ROWS TERMINATED BY.
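
As the notes above state, the same CSV settings go by different names depending on the loading method: column_separator and row_delimiter on one side, COLUMNS TERMINATED BY and ROWS TERMINATED BY on the other. The following Python sketch shows one way to translate between the two. The helper name and the double-quoted, backslash-escaped rendering of the values are assumptions made for illustration, not StarRocks code.

```python
# Name mapping taken from notes [1] to [3]: parameter name -> SQL clause keyword.
CLAUSE_FOR_PARAMETER = {
    "column_separator": "COLUMNS TERMINATED BY",
    "row_delimiter": "ROWS TERMINATED BY",
}


def render_clauses(params):
    """Render the clause form of the given CSV parameters.

    `params` maps parameter names to raw delimiter strings.
    Assumption: values are written as double-quoted literals with
    backslash escapes for backslash, newline, tab, and the quote itself.
    """
    clauses = []
    for name, keyword in CLAUSE_FOR_PARAMETER.items():
        if name not in params:
            continue
        value = (
            params[name]
            .replace("\\", "\\\\")
            .replace("\n", "\\n")
            .replace("\t", "\\t")
            .replace('"', '\\"')
        )
        clauses.append(f'{keyword} "{value}"')
    return clauses


for clause in render_clauses({"column_separator": ",", "row_delimiter": "\n"}):
    print(clause)
```

Parameters that have no clause counterpart in the notes, such as enclose or trim_space, are ignored by this helper.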

| File format | Parameter | INSERT INTO FILES | EXPORT |
| --- | --- | --- | --- |
| CSV | column_separator | Yes (v3.3+) | Yes |
| CSV | line_delimiter [1] | Yes (v3.3+) | Yes |
note

[1]: The corresponding parameter in data loading is row_delimiter.

Compression formats

Loading compression formats

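| File format | Compression format | Stream Load | Broker Load | INSERT from FILES | Routine Load | Spark Load |
| --- | --- | --- | --- | --- | --- | --- |
| CSV | deflate, bzip2, gzip, lz4_frame, zstd | Yes [1] | Yes [2] | To be supported | To be supported | To be supported |
| JSON | deflate, bzip2, gzip, lz4_frame, zstd | Yes (v3.2.7+) [3] | To be supported | N/A | To be supported | N/A |
| Parquet | gzip, lz4, snappy, zlib, zstd | N/A | Yes [4] | Yes [4] | To be supported | Yes [4] |
| ORC | gzip, lz4, snappy, zlib, zstd | N/A | Yes [4] | Yes [4] | To be supported | Yes [4] |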
note

[1]: Currently, you can specify the compression format by parameter only when loading CSV files with Stream Load. For example, format=gzip indicates gzip-compressed CSV files. The deflate and bzip2 formats are also supported.

[2]: Broker Load does not support specifying the compression format of CSV files by using the parameter format. Instead, Broker Load identifies the compression format from the suffix of the file: .gz for gzip-compressed files and .zst for zstd-compressed files. In addition, other format-related parameters, such as trim_space and enclose, are not supported either.

[3]: You can specify the compression format by using compression = gzip.

[4]: Supported by the Arrow library. You do not need to configure the compression parameter.
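
Note [2] describes suffix-based detection of the compression format. A minimal Python sketch of that idea follows. It is an illustration only, not the Broker Load implementation: the helper name is hypothetical, only the two suffixes named in the note are mapped, and matching is case-sensitive by assumption.

```python
from pathlib import PurePosixPath

# Only the two suffixes named in note [2]; other formats are deliberately left out.
SUFFIX_TO_COMPRESSION = {
    ".gz": "gzip",
    ".zst": "zstd",
}


def compression_from_suffix(path):
    """Return the compression format implied by the last suffix of `path`,
    or None when the suffix is not one of the known ones."""
    suffix = PurePosixPath(path).suffix
    return SUFFIX_TO_COMPRESSION.get(suffix)


for name in ("s3://bucket/events.csv.gz", "hdfs://nn/data/part-0.csv.zst", "plain.csv"):
    print(name, "->", compression_from_suffix(name))
```

Because only the final suffix is inspected, a file named events.csv.gz is treated as gzip-compressed, while plain.csv yields None and would be read as uncompressed.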

Unloading compression formats

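| File format | Compression format | INSERT INTO FILES | INSERT INTO Catalog (Hive) | INSERT INTO Catalog (Iceberg) | INSERT INTO Catalog (Hudi/Delta) | EXPORT |
| --- | --- | --- | --- | --- | --- | --- |
| CSV | deflate, bzip2, gzip, lz4_frame, zstd | To be supported | To be supported | To be supported | To be supported | To be supported |
| JSON | N/A | N/A | N/A | N/A | N/A | N/A |
| Parquet | gzip, lz4, snappy, zstd | Yes (v3.2+) | Yes (v3.2+) | Yes (v3.2+) | To be supported | N/A |
| ORC | gzip, lz4, snappy, zstd | Yes (v3.2+) | Yes (v3.2+) | Yes (v3.2+) | To be supported | N/A |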

Credentials

Loading - Authentication

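| Authentication | Stream Load | INSERT from FILES | Broker Load | Routine Load | External Catalog |
| --- | --- | --- | --- | --- | --- |
| Single Kerberos | N/A | Yes (v3.1+) | Yes [1] (versions earlier than v2.5) | Yes [2] (v3.1.4+) | Yes |
| Kerberos Ticket Granting Ticket (TGT) | N/A | To be supported | To be supported | To be supported | Yes (v3.1.10+/v3.2.1+) |
| Single KDC Multiple Kerberos | N/A | To be supported | To be supported | To be supported | Yes (v3.1.10+/v3.2.1+) |
| Basic access authentications (Access Key pair, IAM Role) | N/A | Yes (HDFS and S3-compatible object storage) | Yes (HDFS and S3-compatible object storage) | Yes [3] | Yes |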
note

[1]: For HDFS, StarRocks supports both simple authentication and Kerberos authentication.

[2]: When the security protocol is set to sasl_plaintext or sasl_ssl, both SASL and GSSAPI (Kerberos) authentications are supported.

[3]: When the security protocol is set to sasl_plaintext or sasl_ssl, both SASL and PLAIN authentications are supported.

Unloading - Authentication

| Authentication | INSERT INTO FILES | EXPORT |
| --- | --- | --- |
| Single Kerberos | To be supported | To be supported |

Loading - Other parameters and features

| Parameter and feature | Stream Load | INSERT from FILES | INSERT from SELECT/VALUES | Broker Load | PIPE | Routine Load | Spark Load |
| --- | --- | --- | --- | --- | --- | --- | --- |
| partial_update | Yes (v3.0+) | Yes [1] (v3.3+) | Yes [1] (v3.3+) | Yes (v3.0+) | N/A | Yes (v3.0+) | To be supported |
| partial_update_mode | Yes (v3.1+) | To be supported | To be supported | Yes (v3.1+) | N/A | To be supported | To be supported |
| COLUMNS FROM PATH | N/A | Yes (v3.2+) | N/A | Yes | N/A | N/A | Yes |
| timezone or session variable time_zone [2] | Yes [3] | Yes [4] | Yes [4] | Yes [4] | To be supported | Yes [4] | To be supported |
| Time accuracy - Microsecond | Yes | Yes | Yes | Yes (v3.1.11+/v3.2.6+) | To be supported | Yes | Yes |
note

[1]: From v3.3 onwards, StarRocks supports Partial Updates in Row mode for INSERT INTO by specifying the column list.

[2]: Setting the time zone by the parameter or the session variable will affect the results returned by functions such as strftime(), alignment_timestamp(), and from_unixtime().

[3]: Only the parameter timezone is supported.

[4]: Only the session variable time_zone is supported.
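
The effect described in note [2] can be reproduced outside StarRocks: the same Unix timestamp is rendered as a different wall-clock time depending on the time zone in effect. The following Python sketch is an analogy to from_unixtime(), not StarRocks code. The helper name is hypothetical, and fixed UTC offsets are used instead of named time zones to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone


def format_epoch(seconds, utc_offset_hours):
    """Render a Unix timestamp as local wall-clock time for a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


# The same instant (epoch second 0) under three different offsets.
print(format_epoch(0, 0))   # 1970-01-01 00:00:00
print(format_epoch(0, 8))   # 1970-01-01 08:00:00
print(format_epoch(0, -5))  # 1969-12-31 19:00:00
```

A load job that converts timestamps in this way stores different values depending on which time zone is in effect, which is why the table distinguishes between the timezone parameter and the time_zone session variable.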

Unloading - Other parameters and features

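| Parameter and feature | INSERT INTO FILES | EXPORT |
| --- | --- | --- |
| target_max_file_size | Yes (v3.2+) | To be supported |
| single | Yes (v3.2+) | To be supported |
| Partitioned_by | Yes (v3.2+) | To be supported |
| Session variable time_zone | To be supported | To be supported |
| Time accuracy - Microsecond | To be supported | To be supported |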