1. Does Broker Load support re-running load jobs that have been run successfully and are in the FINISHED state?
Broker Load does not support re-running load jobs that have been run successfully and are in the FINISHED state. Also, to prevent data loss and duplication, Broker Load does not allow reusing the labels of successfully run load jobs. You can use SHOW LOAD to view the history of load jobs and find the one that you want to re-run. Then, you can copy that job's information and reuse it, except for the label, to create another load job.
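As a sketch of that workflow (the label, database, table, file path, and broker name below are all hypothetical), you look up the finished job and resubmit the same configuration under a new, unused label:

```sql
-- Find the finished job whose configuration you want to reuse.
SHOW LOAD WHERE LABEL = "label_20230101";

-- Resubmit the same configuration under a new label.
LOAD LABEL example_db.label_20230102
(
    DATA INFILE("hdfs://hdfs_host:9000/user/data/file.csv")
    INTO TABLE example_table
)
WITH BROKER "my_broker"
PROPERTIES ("timeout" = "3600");
```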
2. When I load data from HDFS by using Broker Load, what do I do if the date and time values loaded into the destination StarRocks table are 8 hours later than the date and time values from the source data file?
Both the destination StarRocks table and the Broker Load job are compiled at creation to use a China Standard Time (CST) time zone (specified by using the timezone parameter). However, the server is set to run based on a Coordinated Universal Time (UTC) time zone. As a result, 8 extra hours are added to the date and time values from the source data file during data loading. To prevent this issue, do not specify the timezone parameter when you create the destination StarRocks table.
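As a non-authoritative sketch, you can inspect the effective time zone and align the session with the server before creating the table or the load job (the zone name below is only an example):

```sql
-- Check the current session and global time zone settings.
SHOW VARIABLES LIKE "%time_zone%";

-- Align the session time zone with the server if needed.
SET time_zone = "UTC";
```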
3. When I load ORC-formatted data by using Broker Load, what do I do if the "ErrorMsg: type:ETL_RUN_FAIL; msg:Cannot cast '<slot 6>' from VARCHAR to ARRAY<VARCHAR(30)>" error occurs?
The source data file has different column names than the destination StarRocks table. In this situation, you must use the SET clause in the load statement to specify the column mapping between the file and the table. When executing the SET clause, StarRocks needs to perform a type inference, but it fails in invoking the cast function to transform the source data to the destination data types. To resolve this issue, make sure that the source data file has the same column names as the destination StarRocks table. As such, the SET clause is not needed, and therefore StarRocks does not need to invoke the cast function to perform data type conversions. Then the Broker Load job can be run successfully.
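As a hypothetical sketch (the file path, broker name, and column names are assumptions), when the file's column names match the table's, the column list alone is sufficient and no SET clause or cast is involved:

```sql
LOAD LABEL example_db.label_orc_job
(
    DATA INFILE("hdfs://hdfs_host:9000/user/data/file.orc")
    INTO TABLE example_table
    FORMAT AS "orc"
    (id, name)   -- column names identical to the table's; no SET clause needed
)
WITH BROKER "my_broker";
```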
4. The Broker Load job does not report errors, but why am I unable to query the loaded data?
Broker Load is an asynchronous loading method. The load job may still fail even if the load statement does not return errors. After you run a Broker Load job, you can use SHOW LOAD to view the result and errmsg of the load job. Then, you can modify the job configuration and retry.
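A minimal sketch of that check (the label and database are hypothetical); the State column shows whether the asynchronous job reached FINISHED or CANCELLED, and ErrorMsg explains any failure:

```sql
SHOW LOAD FROM example_db WHERE LABEL = "label_20230101";
```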
5. What do I do if the "failed to send batch" or "TabletWriter add batch with unknown id" error occurs?
The amount of time taken to write the data exceeds the upper limit, causing a timeout error. To resolve this issue, modify the settings of the session variable query_timeout and the BE configuration item streaming_load_rpc_max_alive_time_sec based on your business requirements.
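As a sketch (the values below are examples, not recommendations), the session variable is changed in SQL, while the BE item is set in the BE configuration file:

```sql
-- Raise the query timeout for the current session (in seconds).
SET query_timeout = 3600;

-- streaming_load_rpc_max_alive_time_sec is a BE configuration item;
-- set it in be.conf, for example: streaming_load_rpc_max_alive_time_sec = 1200
```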
6. What do I do if the "LOAD-RUN-FAIL; msg:OrcScannerAdapter::init_include_columns. col name = xxx not found" error occurs?
If you are loading Parquet- or ORC-formatted data, check whether the column names held in the first row of the source data file are the same as the column names of the destination StarRocks table.
```sql
(tmp_c1, tmp_c2)
SET
(
    id = tmp_c2,
    name = tmp_c1
)
```
The preceding example maps the tmp_c1 and tmp_c2 columns of the source data file onto the name and id columns of the destination StarRocks table, respectively. If you do not specify the SET clause, the column names specified in the column_list parameter are used to declare the column mapping. For more information, see BROKER LOAD.
If the source data file is an ORC-formatted file generated by Apache Hive™ and the first row of the file holds (_col0, _col1, _col2, ...), the "Invalid Column Name" error may occur. If this error occurs, you need to use the SET clause to specify the column mapping.
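For the Hive-generated (_col0, _col1, ...) case, a hypothetical mapping (the id and name columns are assumptions about the destination table) could look like:

```sql
(_col0, _col1)
SET
(
    id = _col0,
    name = _col1
)
```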
7. How do I handle errors such as a Broker Load job that runs for an excessively long period of time?
View the FE log file fe.log and search for the ID of the load job based on the job label. Then, view the BE log file be.INFO and retrieve the log records of the load job based on the job ID to locate the root cause of the error.
8. How do I configure an Apache HDFS cluster that runs in HA mode?
If an HDFS cluster runs in high availability (HA) mode, configure it as follows:
dfs.nameservices: the name of the HDFS cluster, for example, "dfs.nameservices" = "my_ha".
dfs.ha.namenodes.xxx: the names of the NameNodes in the HDFS cluster. If you specify multiple NameNode names, separate them with commas (,). xxx is the HDFS cluster name that you have specified in dfs.nameservices, for example, "dfs.ha.namenodes.my_ha" = "my_nn".
dfs.namenode.rpc-address.xxx.nn: the RPC address of the NameNode in the HDFS cluster. nn is the NameNode name that you have specified in dfs.ha.namenodes.xxx, for example, "dfs.namenode.rpc-address.my_ha.my_nn" = "host:port".
dfs.client.failover.proxy.provider: the provider of the NameNode to which the client will connect. Default value: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider. For example:
```
(
    "dfs.nameservices" = "my-ha",
    "dfs.ha.namenodes.my-ha" = "my-namenode1, my-namenode2",
    "dfs.namenode.rpc-address.my-ha.my-namenode1" = "nn1-host:rpc_port",
    "dfs.namenode.rpc-address.my-ha.my-namenode2" = "nn2-host:rpc_port",
    "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
```
The HA mode can be used with simple authentication or Kerberos authentication. For example, to use simple authentication to access an HDFS cluster that runs in HA mode, you need to specify the following configurations:
```
(
    "username" = "user",
    "password" = "passwd",
    "dfs.nameservices" = "my-ha",
    "dfs.ha.namenodes.my-ha" = "my_namenode1, my_namenode2",
    "dfs.namenode.rpc-address.my-ha.my_namenode1" = "nn1-host:rpc_port",
    "dfs.namenode.rpc-address.my-ha.my_namenode2" = "nn2-host:rpc_port",
    "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
```
You can add the configurations of the HDFS cluster to the hdfs-site.xml file. This way, you only need to specify the file path and authentication information when you use brokers to load data from the HDFS cluster.
9. How do I configure Hadoop ViewFS Federation?
Copy the ViewFs-related configuration file hdfs-site.xml to the broker/conf directory.
If you have a custom file system, you also need to copy the file system-related .jar files to the broker/lib directory.
10. When I access an HDFS cluster that requires Kerberos authentication, what do I do if the "Can't get Kerberos realm" error occurs?
Check that the /etc/krb5.conf file is configured on all hosts on which brokers are deployed.
If the error persists, add -Djava.security.krb5.conf=/etc/krb5.conf to the end of the JAVA_OPTS variable in the broker startup script.
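A minimal sketch of that change (the exact startup script path varies by deployment and is an assumption; adjust it to yours):

```shell
# In the broker startup script, append the Kerberos configuration
# location to the broker's JVM options before the JVM is launched.
export JAVA_OPTS="$JAVA_OPTS -Djava.security.krb5.conf=/etc/krb5.conf"
```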
- Broker Load
- 1. Does Broker Load support re-running load jobs that have been run successfully and are in the FINISHED state?
- 2. When I load data from HDFS by using Broker Load, what do I do if the date and time values loaded into the destination StarRocks table are 8 hours later than the date and time values from the source data file?
- 3. When I load ORC-formatted data by using Broker Load, what do I do if the "ErrorMsg: type:ETL_RUN_FAIL; msg:Cannot cast '<slot 6>' from VARCHAR to ARRAY<VARCHAR(30)>" error occurs?
- 4. The Broker Load job does not report errors, but why am I unable to query the loaded data?
- 5. What do I do if the "failed to send batch" or "TabletWriter add batch with unknown id" error occurs?
- 6. What do I do if the "LOAD-RUN-FAIL; msg:OrcScannerAdapter::init_include_columns. col name = xxx not found" error occurs?
- 7. How do I handle errors such as a Broker Load job that runs for an excessively long period of time?
- 8. How do I configure an Apache HDFS cluster that runs in HA mode?
- 9. How do I configure Hadoop ViewFS Federation?
- 10. When I access an HDFS cluster that requires Kerberos authentication, what do I do if the "Can't get Kerberos realm" error occurs?