Check environment configurations
This topic lists all environment and system configuration items that you must check and set before deploying StarRocks. Setting these configuration items properly allows your StarRocks cluster to work with high availability and performance.
Ports
StarRocks uses specific ports for different services. Check whether these ports are occupied on each instance if you have deployed other services on these instances.
FE ports
On the instances used for the FE deployment, you need to check the following ports:
- 8030: FE HTTP server port (http_port)
- 9020: FE Thrift server port (rpc_port)
- 9030: FE MySQL server port (query_port)
- 9010: FE internal communication port (edit_log_port)
Run the following commands on the FE instances to check whether these ports are occupied:
netstat -tunlp | grep 8030
netstat -tunlp | grep 9020
netstat -tunlp | grep 9030
netstat -tunlp | grep 9010
If any of the above ports are occupied, you must find alternatives and specify them later when you deploy FE nodes. For detailed instructions, see Deploy StarRocks - Start the FE service.
BE ports
On the instances used for the BE deployment, you need to check the following ports:
- 9060: BE Thrift server port (be_port)
- 8040: BE HTTP server port (be_http_port)
- 9050: BE heartbeat service port (heartbeat_service_port)
- 8060: BE BRPC port (brpc_port)
Run the following commands on the BE instances to check whether these ports are occupied:
netstat -tunlp | grep 9060
netstat -tunlp | grep 8040
netstat -tunlp | grep 9050
netstat -tunlp | grep 8060
If any of the above ports are occupied, you must find alternatives and specify them later when you deploy BE nodes. For detailed instructions, see Deploy StarRocks - Start the BE service.
CN ports
On the instances used for the CN deployment, you need to check the following ports:
- 9060: CN Thrift server port (thrift_port)
- 8040: CN HTTP server port (be_http_port)
- 9050: CN heartbeat service port (heartbeat_service_port)
- 8060: CN BRPC port (brpc_port)
Run the following commands on the CN instances to check whether these ports are occupied:
netstat -tunlp | grep 9060
netstat -tunlp | grep 8040
netstat -tunlp | grep 9050
netstat -tunlp | grep 8060
If any of the above ports are occupied, you must find alternatives and specify them later when you deploy CN nodes. For detailed instructions, see Deploy StarRocks - Start the CN service.
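The per-port checks above can be combined into one pass. The following is a minimal sketch, not part of StarRocks: the function name occupied_ports is hypothetical, and it assumes netstat -tunl style output in which the fourth column is the local address. It matches the port only at the end of that column, so a listener on 18030 is not reported as 8030.

```shell
# Print every port from the argument list that shows up as a local port
# in `netstat -tunl`-style output read from stdin.
# occupied_ports is a hypothetical helper name, not a StarRocks tool.
occupied_ports() {
  local listing port
  listing=$(cat)
  for port in "$@"; do
    # The 4th column is the local address, for example 0.0.0.0:8030 or :::8030.
    if printf '%s\n' "$listing" |
      awk -v p="$port" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'; then
      echo "$port"
    fi
  done
}
```

For example, on an FE instance you might run netstat -tunl | occupied_ports 8030 9020 9030 9010, and on a BE or CN instance netstat -tunl | occupied_ports 9060 8040 9050 8060. No output means none of the listed ports is in use.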
Hostnames
If you want to enable FQDN access for your StarRocks cluster, you must assign a hostname to each instance.
In the file /etc/hosts on each instance, you must specify the IP addresses and corresponding hostnames of all the other instances in the cluster.
CAUTION
All IP addresses in the file /etc/hosts must be unique.
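One way to check the uniqueness requirement is to count how often each IP address appears in the first column of the file. This is a small sketch under that assumption; duplicate_ips is a hypothetical helper name, and the file argument exists only so that the check can be pointed at a copy of the file.

```shell
# Print each IP address that appears on more than one line of a hosts file.
# duplicate_ips is a hypothetical helper name; the path defaults to /etc/hosts.
duplicate_ips() {
  local hosts_file="${1:-/etc/hosts}"
  awk '
    /^[ \t]*(#|$)/ { next }          # skip comment lines and blank lines
    { count[$1]++ }                  # the first column is the IP address
    END { for (ip in count) if (count[ip] > 1) print ip }
  ' "$hosts_file" | sort
}
```

Running duplicate_ips with no argument inspects /etc/hosts. An empty result means every IP address is listed once.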
JDK configuration
StarRocks relies on the environment variable JAVA_HOME to locate the Java dependency on the instance.
Run the following command to check the environment variable JAVA_HOME:
echo $JAVA_HOME
Follow these steps to set JAVA_HOME:
Set JAVA_HOME in the file /etc/profile:
sudo vi /etc/profile
# Replace <path_to_JDK> with the path where JDK is installed.
export JAVA_HOME=<path_to_JDK>
export PATH=$PATH:$JAVA_HOME/bin
Bring the change into effect:
source /etc/profile
Run the following command to verify the change:
java -version
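Because java -version succeeds for any java binary on PATH, it can be useful to confirm that JAVA_HOME itself points at a JDK. The sketch below only checks that an executable bin/java exists under the given directory; java_home_ok is a hypothetical helper name.

```shell
# Succeed only if the given directory (default: $JAVA_HOME) contains an
# executable bin/java. java_home_ok is a hypothetical helper name.
java_home_ok() {
  local home="${1:-${JAVA_HOME:-}}"
  [ -n "$home" ] && [ -x "$home/bin/java" ]
}
```

For example: java_home_ok || echo "JAVA_HOME is unset or does not contain bin/java".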
CPU scaling governor
This configuration item is optional. You can skip it if your CPU does not support the scaling governor.
The CPU scaling governor controls the CPU power mode. If your CPU supports it, we recommend you set it to performance for better CPU performance:
echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Memory configurations
Memory Overcommit
Memory Overcommit allows the operating system to overcommit memory resources to processes. We recommend you enable Memory Overcommit.
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
Transparent Huge Pages
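Transparent Huge Pages are enabled by default. We recommend you restrict this feature because it can interfere with the memory allocator and degrade performance. The following command sets it to madvise, so that huge pages are used only for memory regions in which an application explicitly requests them.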
echo 'madvise' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
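To confirm the change, note that the kernel lists all available modes in this file and marks the active one with square brackets, for example always [madvise] never. The following sketch extracts the bracketed word; active_thp_mode is a hypothetical helper name.

```shell
# Print the active Transparent Huge Pages mode, which the kernel marks with
# square brackets. active_thp_mode is a hypothetical helper name.
active_thp_mode() {
  local thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$thp_file"
}
```

After you run the command above, active_thp_mode should print madvise.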
Swap Space
We recommend you disable Swap Space.
Follow these steps to check and disable Swap Space:
Disable Swap Space.
swapoff /<path_to_swap_space>
Delete the Swap Space entry from the configuration file /etc/fstab. The entry resembles the following line:
/<path_to_swap_space> swap swap defaults 0 0
Verify that Swap Space is disabled.
free -m
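In the output of free -m, Swap Space is disabled when the total column of the Swap row is 0. The sketch below automates that reading; swap_is_disabled is a hypothetical helper name, and it assumes the row label is Swap:, as it is in the default English output.

```shell
# Read `free -m` output on stdin and succeed only if the Swap total is 0.
# swap_is_disabled is a hypothetical helper name.
swap_is_disabled() {
  awk '$1 == "Swap:" { total = $2; seen = 1 }
       END { exit !(seen && total == 0) }'
}
```

For example: free -m | swap_is_disabled && echo "Swap Space is disabled".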
Swappiness
We recommend you disable swappiness to eliminate its impact on performance.
echo 0 | sudo tee /proc/sys/vm/swappiness
Storage configurations
We recommend you choose the scheduler algorithm that suits the storage medium you use.
You can run the following command to check the scheduler algorithm that you are using:
cat /sys/block/${disk}/queue/scheduler
# For example, run cat /sys/block/vdb/queue/scheduler
We recommend you use the mq-deadline scheduler for SATA disks and the kyber scheduler algorithm for SSD and NVMe disks.
SATA
The mq-deadline scheduler algorithm suits SATA disks.
Modify this item temporarily:
echo mq-deadline | sudo tee /sys/block/${disk}/queue/scheduler
To make the change persist across reboots, add the command above to /etc/rc.d/rc.local, and then run the following command to make that file executable:
chmod +x /etc/rc.d/rc.local
SSD and NVMe
The kyber scheduler algorithm suits NVMe or SSD disks.
Modify this item temporarily:
echo kyber | sudo tee /sys/block/${disk}/queue/scheduler
If your system does not support the kyber scheduler for SSD and NVMe, we recommend you use the none (or noop) scheduler.
echo none | sudo tee /sys/block/${disk}/queue/scheduler
To make the change persist across reboots, add the scheduler command you used to /etc/rc.d/rc.local, and then run the following command to make that file executable:
chmod +x /etc/rc.d/rc.local
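The fallback order described above for SSD and NVMe disks (kyber first, then none, then noop) can be expressed as a small selection function. This is a sketch only; pick_ssd_scheduler is a hypothetical helper name, and its argument is the content of the scheduler file, in which the kernel marks the active scheduler with square brackets.

```shell
# Given the content of /sys/block/<disk>/queue/scheduler, print the first of
# kyber, none, noop that the kernel offers; fail if none of them is available.
# pick_ssd_scheduler is a hypothetical helper name.
pick_ssd_scheduler() {
  local available candidate
  # Drop the brackets around the active scheduler and pad with spaces so that
  # whole words can be matched.
  available=" $(printf '%s' "$1" | tr -d '[]') "
  for candidate in kyber none noop; do
    case "$available" in
      *" $candidate "*) echo "$candidate"; return 0 ;;
    esac
  done
  return 1
}
```

For example, pick_ssd_scheduler "$(cat /sys/block/${disk}/queue/scheduler)" prints the name that you can then write to the same file with the commands above.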
SELinux
We recommend you disable SELinux.
sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
sed -i 's/SELINUXTYPE/#SELINUXTYPE/' /etc/selinux/config
setenforce 0
Firewall
If your firewall is enabled, open the internal ports for FE nodes, BE nodes, and Broker. Alternatively, run the following commands to stop and disable the firewall:
systemctl stop firewalld.service
systemctl disable firewalld.service
LANG variable
Run the following commands to set the LANG variable manually:
echo "export LANG=en_US.UTF8" >> /etc/profile
source /etc/profile
Time zone
Set this item in accordance with your actual time zone.
The following example sets the time zone to Asia/Shanghai.
cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
hwclock
ulimit configurations
Problems can occur with StarRocks if the values of max file descriptors and max user processes are abnormally small.
Max file descriptors
You can set the maximum number of file descriptors by running the following command:
ulimit -n 65535
Max user processes
You can set the maximum number of user processes by running the following command:
ulimit -u 40960
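The ulimit command changes the limits of the current shell session only, so it can help to check the values in the session that starts StarRocks. The sketch below treats the values used above (65535 and 40960) as minimums, which is an assumption of this example; limit_ok is a hypothetical helper name.

```shell
# Succeed if a limit reported by ulimit is "unlimited" or at least the
# required minimum. limit_ok is a hypothetical helper name.
limit_ok() {
  local current="$1" minimum="$2"
  [ "$current" = "unlimited" ] || [ "$current" -ge "$minimum" ]
}
```

For example: limit_ok "$(ulimit -n)" 65535 || echo "max file descriptors is too small", and likewise limit_ok "$(ulimit -u)" 40960 for max user processes.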
File system configuration
We recommend you use the ext4 or xfs journaling file system. You can run the following command to check the mount type:
df -Th
Network configuration
tcp_abort_on_overflow
Allow the system to reset new connections when it is overflowed with connection attempts that the daemons cannot handle:
echo 1 | sudo tee /proc/sys/net/ipv4/tcp_abort_on_overflow
somaxconn
Set the maximum number of connection requests queued for any listening socket to 1024:
echo 1024 | sudo tee /proc/sys/net/core/somaxconn
NTP configuration
You must configure time synchronization between nodes within your StarRocks cluster to ensure linear consistency of transactions. You can either use the internet time service provided by pool.ntp.org, or use the NTP service built in an offline environment. For example, you can use the NTP service provided by your cloud service provider.
Check if the NTP time server exists.
rpm -qa | grep ntp
Install the NTP service if there is not one.
sudo yum install ntp ntpdate && \
sudo systemctl start ntpd.service && \
sudo systemctl enable ntpd.service
Check the NTP service.
systemctl list-unit-files | grep ntp
Check the connectivity and monitoring status of the NTP service.
netstat -tlunp | grep ntp
Check if your application is synchronized with the NTP server.
ntpstat
Check the state of all the configured NTP servers in your network.
ntpq -p
High concurrency configurations
If your StarRocks cluster has a high load concurrency, we recommend you set the following configurations:
echo 120000 | sudo tee /proc/sys/kernel/threads-max
echo 262144 | sudo tee /proc/sys/vm/max_map_count
echo 200000 | sudo tee /proc/sys/kernel/pid_max
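Values written under /proc/sys can be read back to verify them. The following sketch compares the kernel parameters mentioned in this topic with the values used in its commands. The function name check_kernel_settings is hypothetical, the comparison is a strict equality check (so a larger value is also reported as a mismatch), and the optional argument exists only so that the check can be pointed at a directory other than /proc.

```shell
# Compare kernel parameters with the values used in this topic.
# check_kernel_settings is a hypothetical helper name; proc_root defaults to /proc.
check_kernel_settings() {
  local proc_root="${1:-/proc}" status=0 rel expected actual
  while read -r rel expected; do
    actual=$(cat "$proc_root/$rel" 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
      echo "OK $rel = $actual"
    else
      echo "MISMATCH $rel = ${actual:-<unreadable>} (expected $expected)"
      status=1
    fi
  done <<'EOF'
sys/vm/overcommit_memory 1
sys/vm/swappiness 0
sys/net/ipv4/tcp_abort_on_overflow 1
sys/net/core/somaxconn 1024
sys/kernel/threads-max 120000
sys/vm/max_map_count 262144
sys/kernel/pid_max 200000
EOF
  return "$status"
}
```

Running check_kernel_settings with no argument prints one line per parameter and returns a non-zero status if any value differs.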