Deploy StarRocks manually
This topic describes how to manually deploy StarRocks. For other modes of installation, see Deployment Overview.
Step 1: Start the FE service
The following procedures are performed on the FE instances.
Create a dedicated directory for metadata storage. We recommend storing metadata in a separate directory from the FE deployment files. Make sure that this directory exists and that you have write access to it.
# Replace <meta_dir> with the metadata directory you want to create.
mkdir -p <meta_dir>
Navigate to the directory that stores the StarRocks FE deployment files you prepared earlier, and modify the FE configuration file fe/conf/fe.conf.
a. Specify the metadata directory in the configuration item `meta_dir`.

# Replace <meta_dir> with the metadata directory you have created.
meta_dir = <meta_dir>
b. If any of the FE ports mentioned in the Environment Configuration Checklist are occupied, you must assign valid alternatives in the FE configuration file.
http_port = aaaa         # Default: 8030
rpc_port = bbbb          # Default: 9020
query_port = cccc        # Default: 9030
edit_log_port = dddd     # Default: 9010
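Before editing the ports, you can check whether the defaults are already occupied. The following is only an illustrative sketch; it assumes the `ss` utility (from iproute2) is available on the instance.

```bash
# check_ports: for each given port, report whether something is already
# listening on it. If ss is unavailable, every port is reported as free,
# so treat the result as a hint rather than a guarantee.
check_ports() {
  for port in "$@"; do
    if ss -ltn 2>/dev/null | grep -q ":${port} "; then
      echo "port ${port} is in use"
    else
      echo "port ${port} is free"
    fi
  done
}

# Default FE ports from fe.conf:
# check_ports 8030 9020 9030 9010
```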
CAUTION
If you want to deploy multiple FE nodes in a cluster, you must assign the same `http_port` to each FE node.

c. If you want to enable IP address access for your cluster, you must add the configuration item `priority_networks` in the configuration file and assign a dedicated IP address (in the CIDR format) to the FE node. You can ignore this configuration item if you want to enable FQDN access for your cluster.

priority_networks = x.x.x.x/x
NOTE
You can run `ifconfig` in your terminal to view the IP address(es) owned by the instance.

d. If you have multiple JDKs installed on the instance, and you want to use a specific JDK that is different from the one specified in the environment variable `JAVA_HOME`, you must specify the path where the chosen JDK is installed by adding the configuration item `JAVA_HOME` in the configuration file.

# Replace <path_to_JDK> with the path where the chosen JDK is installed.
JAVA_HOME = <path_to_JDK>

e. For information about advanced configuration items, see Parameter Configuration - FE configuration items.
Start the FE node.
To enable IP address access for your cluster, run the following command to start the FE node:
./fe/bin/start_fe.sh --daemon
To enable FQDN access for your cluster, run the following command to start the FE node:
./fe/bin/start_fe.sh --host_type FQDN --daemon
Note that you only need to specify the parameter `--host_type` ONCE, when you start the node for the first time.

CAUTION
Before starting the FE node with FQDN access enabled, make sure you have assigned hostnames for all instances in /etc/hosts. See Environment Configuration Checklist - Hostnames for more information.
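As an illustration only, the /etc/hosts entries for a small cluster might look like the following. All hostnames and IP addresses below are placeholders; use the values from your own Environment Configuration Checklist.

```
# /etc/hosts (example entries; hostnames and addresses are placeholders)
192.168.0.1  starrocks-fe01
192.168.0.2  starrocks-fe02
192.168.0.3  starrocks-fe03
192.168.0.4  starrocks-be01
```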
Check the FE logs to verify whether the FE node has started successfully.
cat fe/log/fe.log | grep thrift
A log record like "2022-08-10 16:12:29,911 INFO (UNKNOWN x.x.x.x_9010_1660119137253(-1)|1) [FeServer.start():52] thrift server started with port 9020." suggests that the FE node has started properly.
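Because the startup message may take a moment to appear, a small polling helper can save repeated manual checks. This is a sketch, not part of StarRocks; the log path and marker string follow the step above.

```bash
# wait_for_log: poll a log file until it contains a marker line, or give up
# after a timeout (in seconds). Prints "found" or "timeout".
wait_for_log() {
  local file="$1" marker="$2" timeout="${3:-60}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -q "$marker" "$file" 2>/dev/null; then
      echo "found"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timeout"
  return 1
}

# Example usage (path is an assumption about your deployment layout):
# wait_for_log fe/log/fe.log "thrift server started" 120
```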
Step 2: (Optional) Deploy high-availability FE cluster
A high-availability FE cluster requires at least THREE Follower FE nodes in the StarRocks cluster. After the first FE node is started successfully, you can then start two new FE nodes to deploy a high-availability FE cluster.
Basically, you can repeat the procedures in Step 1 except for the command used to start the FE node. When adding an extra FE node to a cluster, you must assign a helper node (essentially an existing FE node) to the new FE node to synchronize the metadata of all other FE nodes.
To start a new FE node with IP address access, run the following command to start the FE node:
# Replace <helper_fe_ip> with the IP address (priority_networks)
# of the helper node, and replace <helper_edit_log_port> with
# the helper node's edit_log_port.
./fe/bin/start_fe.sh --helper <helper_fe_ip>:<helper_edit_log_port> --daemon
Note that you only need to specify the parameter `--helper` ONCE, when you start the node for the first time.

To start a new FE node with FQDN access, run the following command to start the FE node:
# Replace <helper_fqdn> with the FQDN of the helper node,
# and replace <helper_edit_log_port> with the helper node's edit_log_port.
./fe/bin/start_fe.sh --helper <helper_fqdn>:<helper_edit_log_port> \
    --host_type FQDN --daemon
Note that you only need to specify the parameters `--helper` and `--host_type` ONCE, when you start the node for the first time.
Step 3: Start the BE service
The following procedures are performed on the BE instances.
Create a dedicated directory for data storage. We recommend storing data in a separate directory from the BE deployment directory. Make sure that this directory exists and you have write access to it.
# Replace <storage_root_path> with the data storage directory you want to create.
mkdir -p <storage_root_path>
Navigate to the directory that stores the StarRocks BE deployment files you prepared earlier, and modify the BE configuration file be/conf/be.conf.
a. Specify the data directory in the configuration item `storage_root_path`.

# Replace <storage_root_path> with the data directory you have created.
storage_root_path = <storage_root_path>
b. If any of the BE ports mentioned in the Environment Configuration Checklist are occupied, you must assign valid alternatives in the BE configuration file.
be_port = vvvv                 # Default: 9060
be_http_port = xxxx            # Default: 8040
heartbeat_service_port = yyyy  # Default: 9050
brpc_port = zzzz               # Default: 8060
c. If you want to enable IP address access for your cluster, you must add the configuration item `priority_networks` in the configuration file and assign a dedicated IP address (in the CIDR format) to the BE node. You can ignore this configuration item if you want to enable FQDN access for your cluster.

priority_networks = x.x.x.x/x

NOTE

You can run `ifconfig` in your terminal to view the IP address(es) owned by the instance.

d. If you have multiple JDKs installed on the instance, and you want to use a specific JDK that is different from the one specified in the environment variable `JAVA_HOME`, you must specify the path where the chosen JDK is installed by adding the configuration item `JAVA_HOME` in the configuration file.

# Replace <path_to_JDK> with the path where the chosen JDK is installed.
JAVA_HOME = <path_to_JDK>
For information about advanced configuration items, see Parameter Configuration - BE configuration items.
Start the BE node.
./be/bin/start_be.sh --daemon
CAUTION
- Before starting the BE node with FQDN access enabled, make sure you have assigned hostnames for all instances in /etc/hosts. See Environment Configuration Checklist - Hostnames for more information.
- You do not need to specify the parameter `--host_type` when you start BE nodes.
Check the BE logs to verify whether the BE node has started successfully.
cat be/log/be.INFO | grep heartbeat
A log record like "I0810 16:18:44.487284 3310141 task_worker_pool.cpp:1387] Waiting to receive first heartbeat from frontend" suggests that the BE node has started properly.
You can start new BE nodes by repeating the above procedures on other BE instances.
NOTE
A high-availability cluster of BEs is automatically formed when at least three BE nodes are deployed and added to a StarRocks cluster.
Step 4: (Optional) Start the CN service
A Compute Node (CN) is a stateless computing service that does not maintain data itself. You can optionally add CN nodes to your cluster to provide extra computing resources for queries. You can deploy CN nodes with the BE deployment files. Compute Nodes are supported since v2.4.
Navigate to the directory that stores the StarRocks BE deployment files you prepared earlier, and modify the CN configuration file be/conf/cn.conf.
a. If any of the CN ports mentioned in the Environment Configuration Checklist are occupied, you must assign valid alternatives in the CN configuration file.
thrift_port = vvvv             # Default: 9060
be_http_port = xxxx            # Default: 8040
heartbeat_service_port = yyyy  # Default: 9050
brpc_port = zzzz               # Default: 8060
b. If you want to enable IP address access for your cluster, you must add the configuration item `priority_networks` in the configuration file and assign a dedicated IP address (in the CIDR format) to the CN node. You can ignore this configuration item if you want to enable FQDN access for your cluster.

priority_networks = x.x.x.x/x

NOTE

You can run `ifconfig` in your terminal to view the IP address(es) owned by the instance.

c. If you have multiple JDKs installed on the instance, and you want to use a specific JDK that is different from the one specified in the environment variable `JAVA_HOME`, you must specify the path where the chosen JDK is installed by adding the configuration item `JAVA_HOME` in the configuration file.

# Replace <path_to_JDK> with the path where the chosen JDK is installed.
JAVA_HOME = <path_to_JDK>
For information about advanced configuration items, see Parameter Configuration - BE configuration items, because most CN parameters are inherited from BE.
Start the CN node.
./be/bin/start_cn.sh --daemon
CAUTION
- Before starting the CN node with FQDN access enabled, make sure you have assigned hostnames for all instances in /etc/hosts. See Environment Configuration Checklist - Hostnames for more information.
- You do not need to specify the parameter `--host_type` when you start CN nodes.
Check the CN logs to verify whether the CN node has started successfully.
cat be/log/cn.INFO | grep heartbeat
A log record like "I0313 15:03:45.820030 412450 thrift_server.cpp:375] heartbeat has started listening port on 9050" suggests that the CN node has started properly.
You can start new CN nodes by repeating the above procedures on other instances.
Step 5: Set up the cluster
After all the FE, BE, and CN nodes have started properly, you can set up the StarRocks cluster.
The following procedures are performed on a MySQL client. You must have MySQL client 5.5.0 or later installed.
Connect to StarRocks via your MySQL client. You need to log in with the initial account `root`; the password is empty by default.

# Replace <fe_address> with the IP address (priority_networks) or FQDN
# of the FE node you connect to, and replace <query_port> (Default: 9030)
# with the query_port you specified in fe.conf.
mysql -h <fe_address> -P<query_port> -uroot
(Optional) If you have deployed multiple Follower FE nodes, you can add the other two Follower FE nodes to the cluster by executing the following SQL.
-- Replace <fe_address> with the IP address (priority_networks)
-- or FQDN of each FE node, and replace <edit_log_port>
-- with the edit_log_port (Default: 9010) you specified in fe.conf.
ALTER SYSTEM ADD FOLLOWER "<fe2_address>:<edit_log_port>", "<fe3_address>:<edit_log_port>";
NOTE
- You can use the preceding command to add multiple Follower FE nodes at a time. Each `<fe_address>:<edit_log_port>` pair represents one FE node.
- If you want to add more Observer FE nodes, execute `ALTER SYSTEM ADD OBSERVER "<fe2_address>:<edit_log_port>"[, ...]`. For detailed instructions, see ALTER SYSTEM - FE.
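When adding several nodes at once, assembling the statement by hand is error-prone. The following is a hypothetical helper (not part of StarRocks) that builds an ALTER SYSTEM ADD statement from a list of address:port pairs; all addresses below are placeholders.

```bash
# build_add_nodes: assemble an ALTER SYSTEM ADD <role> statement for a list
# of <address>:<port> pairs, so several nodes can be added in one statement.
build_add_nodes() {
  local role="$1"; shift
  local sql="ALTER SYSTEM ADD ${role} " sep=""
  for pair in "$@"; do
    sql="${sql}${sep}\"${pair}\""
    sep=", "
  done
  printf '%s;\n' "$sql"
}

# Examples (addresses are placeholders):
# build_add_nodes FOLLOWER 192.168.0.2:9010 192.168.0.3:9010
# build_add_nodes BACKEND 192.168.0.4:9050
```

The result can be piped straight to the MySQL client, for example `build_add_nodes BACKEND 192.168.0.4:9050 | mysql -h <fe_address> -P9030 -uroot`.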
Check the status of the FE nodes by executing the following SQL.
SHOW PROC '/frontends'\G
Example:
MySQL [(none)]> SHOW PROC '/frontends'\G
*************************** 1. row ***************************
Name: x.x.x.x_9010_1660119137253
IP: x.x.x.x
EditLogPort: 9010
HttpPort: 8030
QueryPort: 9030
RpcPort: 9020
Role: FOLLOWER
IsMaster: true
ClusterId: 58958864
Join: true
Alive: true
ReplayedJournalId: 30602
LastHeartbeat: 2022-08-11 20:34:26
IsHelper: true
ErrMsg:
StartTime: 2022-08-10 16:12:29
Version: 2.3.0-a9bdb09
1 row in set (0.01 sec)
- If the field `Alive` is `true`, this FE node is properly started and added to the cluster.
- If the field `Role` is `FOLLOWER`, this FE node is eligible to be elected as the Leader FE node.
- If the field `Role` is `LEADER`, this FE node is the Leader FE node.
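As a quick sanity check (a sketch, not part of StarRocks), you can count how many nodes report `Alive: true` in the captured output of a SHOW PROC statement:

```bash
# count_alive: count the lines reporting "Alive: true" in SHOW PROC output
# piped from the MySQL client.
count_alive() {
  grep -c 'Alive: true'
}

# Example (address and port are placeholders):
# mysql -h <fe_address> -P9030 -uroot -e "SHOW PROC '/frontends'\G" | count_alive
```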
Add a BE node to the cluster.
-- Replace <be_address> with the IP address (priority_networks)
-- or FQDN of the BE node, and replace <heartbeat_service_port>
-- with the heartbeat_service_port (Default: 9050) you specified in be.conf.
ALTER SYSTEM ADD BACKEND "<be_address>:<heartbeat_service_port>";
NOTE
You can use the preceding command to add multiple BE nodes at a time. Each `<be_address>:<heartbeat_service_port>` pair represents one BE node.

Check the status of the BE nodes by executing the following SQL.
SHOW PROC '/backends'\G
Example:
MySQL [(none)]> SHOW PROC '/backends'\G
*************************** 1. row ***************************
BackendId: 10036
Cluster: default_cluster
IP: x.x.x.x
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2022-08-10 17:39:01
LastHeartbeat: 2022-08-11 20:34:31
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 0
DataUsedCapacity: .000
AvailCapacity: 1.000 B
TotalCapacity: .000
UsedPct: 0.00 %
MaxDiskUsedPct: 0.00 %
ErrMsg:
Version: 2.3.0-a9bdb09
Status: {"lastSuccessReportTabletsTime":"N/A"}
DataTotalCapacity: .000
DataUsedPct: 0.00 %
CpuCores: 16
If the field `Alive` is `true`, this BE node is properly started and added to the cluster.

(Optional) Add a CN node to the cluster.
-- Replace <cn_address> with the IP address (priority_networks)
-- or FQDN of the CN node, and replace <heartbeat_service_port>
-- with the heartbeat_service_port (Default: 9050) you specified in cn.conf.
ALTER SYSTEM ADD COMPUTE NODE "<cn_address>:<heartbeat_service_port>";
NOTE
You can add multiple CN nodes with one SQL statement. Each `<cn_address>:<heartbeat_service_port>` pair represents one CN node.

(Optional) Check the status of the CN nodes by executing the following SQL.
SHOW PROC '/compute_nodes'\G
Example:
MySQL [(none)]> SHOW PROC '/compute_nodes'\G
*************************** 1. row ***************************
ComputeNodeId: 10003
IP: x.x.x.x
HeartbeatPort: 9550
BePort: 9060
HttpPort: 8540
BrpcPort: 8560
LastStartTime: 2023-03-13 15:11:13
LastHeartbeat: 2023-03-13 15:11:13
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
ErrMsg:
Version: 2.5.2-c3772fb
1 row in set (0.00 sec)
If the field `Alive` is `true`, this CN node is properly started and added to the cluster.
Stop the StarRocks cluster
You can stop the StarRocks cluster by running the following commands on the corresponding instances.
Stop an FE node.
./fe/bin/stop_fe.sh --daemon
Stop a BE node.
./be/bin/stop_be.sh --daemon
Stop a CN node.
./be/bin/stop_cn.sh --daemon
Troubleshooting
Try the following steps to identify the errors that occur when you start the FE, BE, or CN nodes:
If an FE node is not started properly, you can identify the problem by checking its log in fe/log/fe.warn.log.
cat fe/log/fe.warn.log
Having identified and resolved the problem, you must first terminate the current FE process, delete the existing meta directory, create a new metadata storage directory, and then restart the FE node with the correct configuration.
If a BE node is not started properly, you can identify the problem by checking its log in be/log/be.WARNING.
cat be/log/be.WARNING
Having identified and resolved the problem, you must first terminate the existing BE process, delete the existing storage directory, create a new data storage directory, and then restart the BE node with the correct configuration.
If a CN node is not started properly, you can identify the problem by checking its log in be/log/cn.WARNING.
cat be/log/cn.WARNING
Having identified and resolved the problem, you must first terminate the existing CN process, and then restart the CN node with the correct configuration.
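The three checks above can be gathered into one small helper. This is a sketch under the assumption that your FE, BE, and CN deployments share a common root directory and use the log paths shown in this guide.

```bash
# show_warnings: print the tail of each service warning log that exists
# under the given deployment root; paths follow the layout used in this guide.
show_warnings() {
  local root="${1:-.}"
  for log in fe/log/fe.warn.log be/log/be.WARNING be/log/cn.WARNING; do
    if [ -f "$root/$log" ]; then
      echo "==== $log ===="
      tail -n 20 "$root/$log"
    fi
  done
}

# Example usage (the deployment root is an assumption):
# show_warnings /opt/starrocks
```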
What to do next
Having deployed your StarRocks cluster, you can move on to Post-deployment Setup for instructions on initial management measures.