Deploy and manage CNs on Kubernetes with StarRocks Operator [Preview]
From 2.4 onwards, StarRocks provides a new type of node, the stateless compute node (CN for short), in addition to FEs and BEs. CNs provide compute services and are responsible for completing part of the execution plan. They don't store or manage data. Multiple CNs constitute a CN cluster, which can be containerized, deployed, and maintained on Kubernetes to achieve auto-scaling. As a result, the StarRocks cluster as a whole can better support data analytics workloads that consume a lot of compute resources, such as data lake analytics.
This topic describes how to use StarRocks Operator to deploy a CN cluster on Kubernetes to achieve auto-scaling.
Concepts
Kubernetes
Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.
A container bundles and runs a lightweight and portable executable image that contains software and all of its dependencies.
Operator
Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. StarRocks Operator can deploy the StarRocks compute service in the cloud. It simplifies operation and maintenance and enables auto-scaling of compute resources.
Node
A Node is the actual resource provider in a Kubernetes cluster and is where Pods are scheduled to run. In a production environment, a Kubernetes cluster usually consists of multiple Nodes.
Pod
A Pod is the smallest deployable unit of computing that you can create and manage in Kubernetes. A Pod (as in a pod of whales or pea pod) is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers.
When StarRocks Operator is at work, FEs detect compute services through Direct Connect. Because StarRocks does not allow nodes in one cluster to share the same IP address, StarRocks Operator only supports a Pod that contains one CN and is scheduled to run on one Node.
CN
A CN is a stateless compute node that is responsible for carrying out part of the execution plan. It doesn't store data. Multiple CNs constitute a CN cluster, which can be containerized, deployed, and maintained on Kubernetes. StarRocks Operator senses a CN cluster's resource load and deploys more or fewer CNs accordingly to achieve auto-scaling.
Principle
Interaction between StarRocks Operator, CN cluster, and StarRocks
StarRocks Operator connects to an FE by using the FE's IP address and query port, and adds CNs to the StarRocks cluster. If you query data stored in BEs, the FE assigns the execution plan to CNs and BEs according to the data distribution and the operator types in the execution plan. CNs receive data after BEs shuffle it, perform some operators (e.g. JOIN), and return the computation results to the FE.
If you query data from a data lake, such as HDFS or AWS S3, the FE assigns the execution plan to CNs. The CNs directly access the external data source, perform all the operators, and finally return the computation results to the FE.
Scaling policy of StarRocks Operator
StarRocks Operator senses the resource load of the CN cluster in a K8s cluster, and automatically deploys more or fewer CNs according to the configured scaling policy to achieve auto-scaling.
Environment preparation
Note:
If you want to quickly get started with this feature, you can create a single-node Kubernetes cluster with Minikube.
Install the Go language development environment on the machine that deploys StarRocks Operator. Go is needed to compile the StarRocks Operator code when the StarRocks Operator image is created.
Prepare Docker images
You need to build Docker images, including StarRocks Operator, CN, and CN Group images, and push the images to a remote Docker repository. When you deploy StarRocks Operator, CN, and CN Group, you will pull images from the remote repository.
Prepare StarRocks Operator image
Download the StarRocks Operator code and save it to the directory $your_code_path/starrocks-kubernetes-operator.
cd $your_code_path
git clone https://github.com/StarRocks/starrocks-kubernetes-operator
Enter the directory $your_code_path/starrocks-kubernetes-operator.
cd starrocks-kubernetes-operator
Create a StarRocks Operator image.
make docker IMG="starrocks-kubernetes-operator:v1.0"
Execute docker login and enter your account and password to log in to the remote Docker repository, Docker Hub.
Note: You need to register an account and create a Docker repository in Docker Hub in advance.
Tag the StarRocks Operator image and push it to the remote Docker repository.
docker tag $operator_image_id $account/repo:tag
docker push $account/repo:tag
NOTE
- $operator_image_id: StarRocks Operator image ID. To view image IDs, execute the docker images command.
- $account/repo:tag: StarRocks Operator image tag, e.g. starrocks/sr-cn-test:operator. account is your Docker Hub account, repo is the Docker repository in Docker Hub, and tag is the tag you choose for the StarRocks Operator image.
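The account/repo:tag naming described above can be sketched as a small shell helper. This is only an illustration: the function names image_ref and print_tag_and_push are hypothetical, and the second function prints the docker commands instead of running them so that you can review them first.

```shell
# Compose "<account>/<repo>:<tag>" from its three parts.
# Hypothetical helper; the values passed in are up to you.
image_ref() {
  account=$1; repo=$2; tag=$3
  if [ -z "$account" ] || [ -z "$repo" ] || [ -z "$tag" ]; then
    echo "usage: image_ref <account> <repo> <tag>" >&2
    return 1
  fi
  printf '%s/%s:%s\n' "$account" "$repo" "$tag"
}

# Print (do not run) the docker commands that tag and push a local image.
print_tag_and_push() {
  image_id=$1
  ref=$(image_ref "$2" "$3" "$4") || return 1
  printf 'docker tag %s %s\n' "$image_id" "$ref"
  printf 'docker push %s\n' "$ref"
}
```

For example, print_tag_and_push "$operator_image_id" starrocks sr-cn-test operator prints the two commands for the example tag starrocks/sr-cn-test:operator.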
Prepare a CN image
Download the StarRocks code from the GitHub repository.
git clone https://github.com/StarRocks/starrocks
Enter the directory docker.
cd $your_path/starrocks/docker
Compile StarRocks and create a CN image.
./build.sh -b branch-2.4
Tag the CN image and push it to the remote Docker repository.
docker tag $cn_Image_Id $account/repo:tag
docker push $account/repo:tag
Prepare a CN Group image
Enter the directory starrocks-kubernetes-operator/components.
cd $your_code_path/starrocks-kubernetes-operator/components
Create a CN Group image and push it to the remote Docker repository.
# Build an image.
make docker IMG="computenodegroup:v1.0"
# Push the image to Docker Hub.
make push IMG="computenodegroup:v1.0"
Deploy StarRocks Operator
Enter the directory starrocks-kubernetes-operator/deploy.
cd $your_code_path/starrocks-kubernetes-operator/deploy
Configure manager.yaml. Configure the StarRocks Operator image tag, for example, starrocks/sr-cn-test:operator.
Execute the following command to deploy StarRocks Operator.
kubectl apply -f starrocks.com_computenodegroups.yaml
kubectl apply -f namespace.yaml
kubectl apply -f leader_election_role.yaml
kubectl apply -f role.yaml
kubectl apply -f role_binding.yaml
kubectl apply -f leader_election_role_binding.yaml
kubectl apply -f service_account.yaml
kubectl apply -f manager.yaml
Check the pod status by executing kubectl get pod -n starrocks. Check whether STATUS is Running.
kubectl get pod -n starrocks
NAMESPACE   NAME                                     READY   STATUS    RESTARTS   AGE
starrocks   cn-controller-manager-69598d4b48-6qj2p   1/1     Running   0          13h
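The status check above can also be scripted. The following is a minimal sketch, assuming the tabular output format shown above; the function name all_pods_running is hypothetical. It reads the table on standard input, locates the STATUS column by its header, and succeeds only if at least one pod is listed and every pod is Running.

```shell
# Read `kubectl get pod` table output on stdin and succeed only if every
# listed pod has STATUS "Running". The STATUS column is located by its
# header name, so the check works with or without a NAMESPACE column.
all_pods_running() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i
      next
    }
    col { pods++; if ($col != "Running") bad++ }
    END {
      if (!col) exit 2                       # no header or no STATUS column
      exit ((pods > 0 && bad == 0) ? 0 : 1)  # 0 only if all pods are Running
    }
  '
}
```

A possible use is kubectl get pod -n starrocks | all_pods_running && echo "all pods are running".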
Deploy a CN cluster
Enter the directory starrocks-kubernetes-operator/examples/cn.
cd $your_code_path/starrocks-kubernetes-operator/examples/cn
Configure cn.yaml.
- cnImage: CN image tag in the remote repository. For example, starrocks/sr-cn-test:v3.
- componentsImage: CN Group image tag in the remote repository. For example, starrocks/computenodegroup:v1.0.
- <fe_ip>:<fe_query_port>: any FE's IP address and query_port (defaults to 9030).
feInfo:
  accountSecret: test-secret # Secret of the FE account
  addresses: # FE addresses
  - <fe_ip>:<fe_query_port>
Add the command configuration: specify the absolute path of start_cn.sh in the CN Group image.
Deploy a CN cluster.
cd examples/cn
kubectl apply -f fe-account.yaml
kubectl apply -f cn-config.yaml
kubectl apply -f cn.yaml
Check the CN cluster status.
$ kubectl get pod -n starrocks # NAMESPACE defaults to starrocks
NAMESPACE   NAME                                                 READY   STATUS    RESTARTS   AGE
starrocks   cn-controller-manager-69598d4b48-6qj2p               1/1     Running   0          13h
starrocks   computenodegroup-sample-8-4-21-45-5dcb56ff5-4l522    2/2     Running   0          12m
starrocks   computenodegroup-sample-8-4-21-45-5dcb56ff5-cfvmj    2/2     Running   0          4m29s
starrocks   computenodegroup-sample-8-4-21-45-5dcb56ff5-lz5s2    2/2     Running   0          4m23s
starrocks   computenodegroup-sample-8-4-21-45-5dcb56ff5-s7dwz    2/2     Running   0          12m
After the CN cluster is running successfully, StarRocks Operator uses the FE IP address and query port number configured in cn.yaml to add CNs to the StarRocks cluster.
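To confirm that the CNs were added, you can check the FE address format and then run the SHOW COMPUTE NODES statement against the FE. The following is a rough sketch: the function names are hypothetical, it requires the mysql client, and connecting as root with an empty password is an assumption made only for illustration.

```shell
# Succeed if the argument looks like "<host>:<port>" with a port in 1-65535.
fe_addr_valid() {
  host=${1%:*}
  port=${1##*:}
  [ "$host" != "$1" ] && [ -n "$host" ] || return 1   # needs a colon and a host
  case $port in
    ''|*[!0-9]*) return 1 ;;                          # port must be all digits
  esac
  [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# List the compute nodes known to the FE (requires the mysql client).
# ASSUMPTION: user root with an empty password; adjust to your FE account.
show_compute_nodes() {
  fe_addr_valid "$1" || { echo "bad FE address: $1" >&2; return 1; }
  mysql -h "${1%:*}" -P "${1##*:}" -u root -e 'SHOW COMPUTE NODES;'
}
```

Call it with the same value that you configured under addresses in cn.yaml, in the form <fe_ip>:<fe_query_port>.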
Configure policy for horizontal pod autoscaling
If you want to configure a policy for horizontal pod autoscaling, you can edit the cn.yaml file $your_code_path/starrocks-kubernetes-operator/examples/cn/cn.yaml.
autoScalingPolicy: # Auto-scaling policy of CN cluster
  maxReplicas: 10 # Maximum number of CNs
  minReplicas: 1 # Minimum number of CNs
  hpaPolicy:
    metrics: # The metrics and scaling threshold. For more information, see metricspec.
    - type: Resource
      resource:
        name: memory
        target:
          averageUtilization: 30
          type: Utilization
    - type: Resource
      resource:
        name: cpu
        target:
          averageUtilization: 30
          type: Utilization
    behavior: # Scaling behavior
      scaleUp:
        policies:
        - type: Pods
          value: 1
          periodSeconds: 10
      scaleDown:
        selectPolicy: Disabled
The description for some parameters is as follows:
For more parameters and detailed description, see Horizontal Pod Autoscaling.
Set the maximum and minimum numbers of CNs for horizontal scaling.
# The maximum number of CNs. The upper limit is 10.
maxReplicas: 10
# The minimum number of CNs. The lower limit is 1.
minReplicas: 1
Set the CPU utilization threshold for horizontal scaling.
- type: Resource
  resource:
    name: cpu
    target:
      averageUtilization: 30
Validate the auto-scaling policy.
kubectl apply -f cn/cn.yaml
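To get a feel for how averageUtilization, minReplicas, and maxReplicas interact, the sketch below applies the basic formula from the Kubernetes Horizontal Pod Autoscaling documentation, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), and clamps the result to the configured limits. It is a simplification: the function name desired_replicas is hypothetical, and it ignores the HPA tolerance, stabilization behavior, and the scaleUp and scaleDown policies (with scaleDown disabled as in the example above, a lower result would not actually remove CNs). When several metrics are configured, the HPA uses the largest result.

```shell
# desired = ceil(current * utilization / target), clamped to [min, max].
# Arguments: current replicas, current average utilization (%),
# target averageUtilization (%), minReplicas, maxReplicas.
desired_replicas() {
  current=$1; utilization=$2; target=$3; min=$4; max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * utilization + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 4 60 30 1 10   # 4 CNs at 60% with a 30% target: prints 8
desired_replicas 8 90 30 1 10   # would be 24, capped by maxReplicas: prints 10
```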
FAQ
Unable to Deploy CN
Execute kubectl get po -A to check the status of pods.
- Problem description: If the returned result shows that reason is unhealthy, the HTTP health check has failed.
- Solution: Refer to Deploy a CN cluster and check FE's IP address and the query port number in cn.yaml.
- Problem description: If Message shows exec: "be/bin/start_cn.sh": stat be/bin/start_cn.sh: no such file or directory, the startup script could not be found.
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m53s                  default-scheduler  Successfully assigned starrocks/computenodegroup-sample-5979687fd-qw28w to ip-172-31-44-58
  Normal   Pulling    5m51s                  kubelet            Pulling image "adzfolc/computenodegroup:v1.1"
  Normal   Started    5m48s                  kubelet            Started container register
  Normal   Pulled     5m48s                  kubelet            Successfully pulled image "adzfolc/computenodegroup:v1.1" in 2.865218014s
  Normal   Created    5m48s                  kubelet            Created container register
  Normal   Created    5m3s (x4 over 5m51s)   kubelet            Created container cn-container
  Warning  Failed     5m3s (x4 over 5m51s)   kubelet            Error: failed to start container "cn-container": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "be/bin/start_cn.sh": stat be/bin/start_cn.sh: no such file or directory: unknown
  Normal   Pulled     4m15s (x5 over 5m51s)  kubelet            Container image "adzfolc/sr-cn-test:v3" already present on machine
  Warning  BackOff    41s (x27 over 5m46s)   kubelet            Back-off restarting failed container
- Solution: Refer to Deploy a CN cluster and check the path of start_cn.sh in cn.yaml.
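When the event list is long, the missing path can be pulled out of the error message with a small filter. This is a sketch that assumes the "stat <path>: no such file or directory" wording shown in the events above; the function name missing_exec_paths is hypothetical.

```shell
# Read pod event text on stdin and print each distinct path that the
# container runtime reported as missing.
missing_exec_paths() {
  sed -n 's/.*stat \([^:]*\): no such file or directory.*/\1/p' | sort -u
}
```

For example, piping the output of kubectl describe pod for the failing pod into missing_exec_paths prints be/bin/start_cn.sh for the events shown above, which is the path to compare against the command configuration in cn.yaml.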