Deploy StarRocks with Operator
This topic introduces how to use the StarRocks Operator to automate the deployment and management of a StarRocks cluster on a Kubernetes cluster.
How it works
Before you begin
Create Kubernetes cluster
You can use a cloud-managed Kubernetes service, such as an Amazon Elastic Kubernetes Service (EKS) or Google Kubernetes Engine (GKE) cluster, or a self-managed Kubernetes cluster.
Create an Amazon EKS cluster
- Check that the following command-line tools are installed in your environment:
  - Install and configure the AWS command-line tool AWS CLI.
  - Install the EKS cluster command-line tool eksctl.
  - Install the Kubernetes cluster command-line tool kubectl.
- Use one of the following methods to create an EKS cluster:
Create a GKE cluster
Before you start to create a GKE cluster, make sure that you complete all the prerequisites. Then follow the instructions provided in Create a GKE cluster to create a GKE cluster.
Create a self-managed Kubernetes cluster
Follow the instructions provided in Bootstrapping clusters with kubeadm to create a self-managed Kubernetes cluster. Alternatively, you can use Minikube or Docker Desktop to create a single-node private Kubernetes cluster with minimal steps.
Deploy StarRocks Operator
Add the custom resource StarRocksCluster.
kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/starrocks.com_starrocksclusters.yaml
Deploy the StarRocks Operator. You can choose to deploy the StarRocks Operator by using a default configuration file or a custom configuration file.
Deploy the StarRocks Operator by using a default configuration file.
kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml
The StarRocks Operator is deployed to the namespace `starrocks` and manages all StarRocks clusters under all namespaces.

Deploy the StarRocks Operator by using a custom configuration file.
Download the configuration file operator.yaml, which is used to deploy the StarRocks Operator.
curl -O https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml
Modify the configuration file operator.yaml to suit your needs.
Deploy the StarRocks Operator.
kubectl apply -f operator.yaml
Check the running status of the StarRocks Operator. If the pod is in the `Running` state and all containers inside the pod are `READY`, the StarRocks Operator is running as expected.

$ kubectl -n starrocks get pods
NAME                                  READY   STATUS    RESTARTS   AGE
starrocks-controller-65bb8679-jkbtg   1/1     Running   0          5m6s
NOTE

If you customize the namespace in which the StarRocks Operator is located, you need to replace `starrocks` with the name of your customized namespace.
Deploy StarRocks Cluster
You can directly use the sample configuration files provided by StarRocks to deploy a StarRocks cluster (an object instantiated by using the custom resource StarRocksCluster). For example, you can use starrocks-fe-and-be.yaml to deploy a StarRocks cluster that contains three FE nodes and three BE nodes.
kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/examples/starrocks/starrocks-fe-and-be.yaml
The following table describes a few important fields in the starrocks-fe-and-be.yaml file.
Field | Description
---|---
`kind` | The resource type of the object. The value must be `StarRocksCluster`.
`metadata` | Metadata of the object, in which sub-fields such as `name` (the name of the object) and `namespace` are nested.
`spec` | The expected status of the object. Valid sub-fields are `starRocksFeSpec`, `starRocksBeSpec`, and `starRocksCnSpec`.
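For orientation, a minimal StarRocksCluster manifest wiring these fields together might look like the following sketch. Note that the `apiVersion`, object names, and image tags here are illustrative assumptions, not values from this topic; check them against the CRD you applied and api.md before use.

```yaml
apiVersion: starrocks.com/v1alpha1   # illustrative; confirm against the applied CRD
kind: StarRocksCluster
metadata:
  name: starrockscluster-sample
  namespace: starrocks
spec:
  starRocksFeSpec:
    image: starrocks/alpine-fe:2.4.1   # illustrative image tag
    replicas: 3
  starRocksBeSpec:
    image: starrocks/alpine-be:2.4.1   # illustrative image tag
    replicas: 3
```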
You can also deploy the StarRocks cluster by using a modified configuration file. For supported fields and detailed descriptions, see api.md.
Deploying the StarRocks cluster takes a while. During this period, you can use the command `kubectl -n starrocks get pods` to check the starting status of the StarRocks cluster. If all the pods are in the `Running` state and all containers inside the pods are `READY`, the StarRocks cluster is running as expected.
NOTE

If you customize the namespace in which the StarRocks cluster is located, you need to replace `starrocks` with the name of your customized namespace.
$ kubectl -n starrocks get pods
NAME READY STATUS RESTARTS AGE
starrocks-controller-65bb8679-jkbtg 1/1 Running 0 22h
starrockscluster-sample-be-0 1/1 Running 0 23h
starrockscluster-sample-be-1 1/1 Running 0 23h
starrockscluster-sample-be-2 1/1 Running 0 22h
starrockscluster-sample-fe-0 1/1 Running 0 21h
starrockscluster-sample-fe-1 1/1 Running 0 21h
starrockscluster-sample-fe-2 1/1 Running 0 22h
Note

If some pods cannot start after a long period of time, you can use `kubectl logs -n starrocks <pod_name>` to view the log information or use `kubectl -n starrocks describe pod <pod_name>` to view the event information to locate the problem.
Manage StarRocks Cluster
Access StarRocks Cluster
The components of the StarRocks cluster can be accessed through their associated Services, such as the FE Service. For detailed descriptions of Services and their access addresses, see api.md and Services.
NOTE

- Only the FE Service is deployed by default. If you need to deploy the BE Service and the CN Service, you need to configure `starRocksBeSpec` and `starRocksCnSpec` in the StarRocks cluster configuration file.
- The name of a Service is `<cluster name>-<component name>-service` by default, for example, `starrockscluster-sample-fe-service`. You can also specify the Service name in the spec of each component.
Access StarRocks Cluster from within Kubernetes Cluster
From within the Kubernetes cluster, the StarRocks cluster can be accessed through the FE Service's ClusterIP.
Obtain the internal virtual IP address `CLUSTER-IP` and port `PORT(S)` of the FE Service.

$ kubectl -n starrocks get svc
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
be-domain-search                     ClusterIP   None             <none>        9050/TCP                              23m
fe-domain-search                     ClusterIP   None             <none>        9030/TCP                              25m
starrockscluster-sample-fe-service   ClusterIP   10.100.162.xxx   <none>        8030/TCP,9020/TCP,9030/TCP,9010/TCP   25m
Access the StarRocks cluster by using the MySQL client from within the Kubernetes cluster.
mysql -h 10.100.162.xxx -P 9030 -uroot
Access StarRocks Cluster from outside Kubernetes Cluster
From outside the Kubernetes cluster, you can access the StarRocks cluster through the FE Service's LoadBalancer or NodePort. This topic uses LoadBalancer as an example:
Run the command `kubectl -n starrocks edit src starrockscluster-sample` to update the StarRocks cluster configuration file, and change the Service type of `starRocksFeSpec` to `LoadBalancer`.

starRocksFeSpec:
  image: starrocks/alpine-fe:2.4.1
  replicas: 3
  requests:
    cpu: 4
    memory: 16Gi
  service:
    type: LoadBalancer # specified as LoadBalancer
Obtain the IP address `EXTERNAL-IP` and port `PORT(S)` that the FE Service exposes to the outside.

$ kubectl -n starrocks get svc
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                                                       AGE
be-domain-search                     ClusterIP      None             <none>                                                                   9050/TCP                                                      127m
fe-domain-search                     ClusterIP      None             <none>                                                                   9030/TCP                                                      129m
starrockscluster-sample-fe-service   LoadBalancer   10.100.162.xxx   a7509284bf3784983a596c6eec7fc212-618xxxxxx.us-west-2.elb.amazonaws.com   8030:30629/TCP,9020:32544/TCP,9030:32244/TCP,9010:32024/TCP   129m
Log in to your host machine and access the StarRocks cluster by using the MySQL client.

mysql -h a7509284bf3784983a596c6eec7fc212-618xxxxxx.us-west-2.elb.amazonaws.com -P 9030 -uroot
Upgrade StarRocks Cluster
Upgrade BE nodes
Run the following command to specify a new BE image file, such as `starrocks/be-ubuntu:2.5.0-fix-uid`:
kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksBeSpec":{"image":"starrocks/be-ubuntu:2.5.0-fix-uid"}}}'
Upgrade FE nodes
Run the following command to specify a new FE image file, such as `starrocks/fe-ubuntu:2.5.0-fix-uid`:
kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksFeSpec":{"image":"starrocks/fe-ubuntu:2.5.0-fix-uid"}}}'
The upgrade process lasts for a while. You can run the command `kubectl -n starrocks get pods` to view the upgrade progress.
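If you want to review an upgrade patch before applying it, the merge patch used above can also be kept in a file and validated locally first. This is a sketch, not part of the official procedure; the file name is illustrative, and the image tag is the example one from this section.

```shell
# Write the merge patch for the BE image upgrade to a file (file name is illustrative).
cat > be-upgrade-patch.json <<'EOF'
{"spec": {"starRocksBeSpec": {"image": "starrocks/be-ubuntu:2.5.0-fix-uid"}}}
EOF

# Sanity-check that the patch is valid JSON before applying it.
python3 -m json.tool be-upgrade-patch.json

# Apply it to the cluster (requires kubectl access to the cluster; shown commented out):
# kubectl -n starrocks patch starrockscluster starrockscluster-sample \
#   --type='merge' --patch-file be-upgrade-patch.json
```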
Scale StarRocks cluster
This section uses scaling out the BE and FE clusters as examples.
Scale out BE cluster
Run the following command to scale out the BE cluster to 9 nodes:
kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksBeSpec":{"replicas":9}}}'
Scale out FE cluster
Run the following command to scale out the FE cluster to 4 nodes:
kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksFeSpec":{"replicas":4}}}'
The scaling process lasts for a while. You can use the command `kubectl -n starrocks get pods` to view the scaling progress.
Automatic scaling for CN cluster
Run the command `kubectl -n starrocks edit src starrockscluster-sample` to configure the automatic scaling policy for the CN cluster. You can specify resource metrics for CNs, such as average CPU utilization and average memory usage, together with the elastic scaling threshold and the upper and lower elastic scaling limits. The upper and lower elastic scaling limits specify the maximum and minimum numbers of CNs allowed during elastic scaling.
NOTE

If the automatic scaling policy for the CN cluster is configured, delete the `replicas` field from `starRocksCnSpec` in the StarRocks cluster configuration file.
Kubernetes also supports using `behavior` to customize scaling behavior according to business scenarios, helping you achieve rapid or slow scaling or disable scaling. For more information about automatic scaling policies, see Horizontal Pod Autoscaling.
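As a rough sketch of how the HPA decides on a replica count: the real controller also applies tolerances and stabilization windows, so the simplified rule below is for intuition only, not an exact reproduction of Kubernetes behavior.

```shell
# Approximate the core HPA rule: desired = ceil(current * utilization / target),
# clamped to [minReplicas, maxReplicas].
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  local desired=$(( (current * util + target - 1) / target ))  # integer ceiling
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}

# With the CPU metric from the template that follows: target 60%, limits 1..10 CNs.
desired_replicas 3 90 60 1 10   # average utilization 90% -> scale out to 5
desired_replicas 3 30 60 1 10   # average utilization 30% -> scale in to 2
```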
The following is a template provided by StarRocks to help you configure automatic scaling policies:
starRocksCnSpec:
  image: starrocks/centos-cn:2.4.1
  requests:
    cpu: 4
    memory: 4Gi
  autoScalingPolicy: # Automatic scaling policy of the CN cluster.
    maxReplicas: 10 # The maximum number of CNs is set to 10.
    minReplicas: 1 # The minimum number of CNs is set to 1.
    hpaPolicy:
      metrics: # Resource metrics.
        - type: Resource
          resource:
            name: memory # The average memory usage of CNs is specified as a resource metric.
            target:
              # The elastic scaling threshold is 30%.
              # When the average memory utilization of CNs exceeds 30%, the number of CNs increases for scale-out.
              # When the average memory utilization of CNs is below 30%, the number of CNs decreases for scale-in.
              averageUtilization: 30
              type: Utilization
        - type: Resource
          resource:
            name: cpu # The average CPU utilization of CNs is specified as a resource metric.
            target:
              # The elastic scaling threshold is 60%.
              # When the average CPU utilization of CNs exceeds 60%, the number of CNs increases for scale-out.
              # When the average CPU utilization of CNs is below 60%, the number of CNs decreases for scale-in.
              averageUtilization: 60
              type: Utilization
      behavior: # The scaling behavior is customized according to business scenarios, helping you achieve rapid or slow scaling or disable scaling.
        scaleUp:
          policies:
            - type: Pods
              value: 1
              periodSeconds: 10
        scaleDown:
          selectPolicy: Disabled
The following list describes a few important fields:

- The upper and lower elastic scaling limits:

  maxReplicas: 10 # The maximum number of CNs is set to 10.
  minReplicas: 1 # The minimum number of CNs is set to 1.

- The elastic scaling threshold:

  # For example, the average CPU utilization of CNs is specified as a resource metric.
  # The elastic scaling threshold is 60%.
  # When the average CPU utilization of CNs exceeds 60%, the number of CNs increases for scale-out.
  # When the average CPU utilization of CNs is below 60%, the number of CNs decreases for scale-in.
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 60