- StarRocks
- Introduction to StarRocks
- Quick Start
- Deployment
- Deployment overview
- Prepare
- Deploy
- Deploy classic StarRocks
- Deploy and use shared-data StarRocks
- Manage
- Table Design
- Data Loading
- Concepts
- Overview of data loading
- Load data from a local file system or a streaming data source using HTTP PUT
- Load data from HDFS or cloud storage
- Continuously load data from Apache Kafka®
- Bulk load using Apache Spark™
- Load data using INSERT
- Load data using Stream Load transaction interface
- Realtime synchronization from MySQL
- Continuously load data from Apache Flink®
- Change data through loading
- Transform data at loading
- Data Unloading
- Query Data Sources
- Query Acceleration
- Gather CBO statistics
- Synchronous materialized view
- Asynchronous materialized view
- Colocate Join
- Lateral Join
- Query Cache
- Index
- Computing the Number of Distinct Values
- Sorted streaming aggregate
- Administration
- Management
- Data recovery
- User Privilege and Authentication
- Performance Tuning
- Reference
- SQL Reference
- User Account Management
- Cluster Management
- ADD SQLBLACKLIST
- ADMIN CANCEL REPAIR TABLE
- ADMIN CHECK TABLET
- ADMIN REPAIR TABLE
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER RESOURCE GROUP
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- CREATE RESOURCE GROUP
- DELETE SQLBLACKLIST
- DROP FILE
- DROP RESOURCE GROUP
- EXPLAIN
- INSTALL PLUGIN
- KILL
- SET
- SHOW BACKENDS
- SHOW BROKER
- SHOW COMPUTE NODES
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW PROC
- SHOW PROCESSLIST
- SHOW RESOURCE GROUP
- SHOW SQLBLACKLIST
- SHOW TABLE STATUS
- SHOW VARIABLES
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER MATERIALIZED VIEW
- ALTER TABLE
- ALTER VIEW
- ALTER RESOURCE
- ANALYZE TABLE
- BACKUP
- CANCEL ALTER TABLE
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE ANALYZE
- CREATE DATABASE
- CREATE EXTERNAL CATALOG
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP ANALYZE
- DROP STATS
- DROP CATALOG
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- HLL
- KILL ANALYZE
- RECOVER
- REFRESH EXTERNAL TABLE
- RESTORE
- SET CATALOG
- SHOW ANALYZE JOB
- SHOW ANALYZE STATUS
- SHOW META
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- USE
- DML
- ALTER LOAD
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- CANCEL EXPORT
- CANCEL REFRESH MATERIALIZED VIEW
- CREATE ROUTINE LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- REFRESH MATERIALIZED VIEW
- RESUME ROUTINE LOAD
- SELECT
- SHOW ALTER TABLE
- SHOW ALTER MATERIALIZED VIEW
- SHOW BACKUP
- SHOW CATALOGS
- SHOW CREATE CATALOG
- SHOW CREATE MATERIALIZED VIEW
- SHOW CREATE TABLE
- SHOW CREATE VIEW
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW MATERIALIZED VIEWS
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- SUBMIT TASK
- UPDATE
- Auxiliary Commands
- Data Types
- Keywords
- AUTO_INCREMENT
- Function Reference
- Java UDFs
- Window functions
- Lambda expression
- Aggregate Functions
- array_agg
- avg
- any_value
- approx_count_distinct
- bitmap
- bitmap_agg
- count
- grouping
- grouping_id
- hll_empty
- hll_hash
- hll_raw_agg
- hll_union
- hll_union_agg
- max
- max_by
- min
- multi_distinct_sum
- multi_distinct_count
- percentile_approx
- percentile_cont
- percentile_disc
- retention
- stddev
- stddev_samp
- sum
- variance, variance_pop, var_pop
- var_samp
- window_funnel
- Array Functions
- array_agg
- array_append
- array_avg
- array_concat
- array_contains
- array_contains_all
- array_cum_sum
- array_difference
- array_distinct
- array_filter
- array_intersect
- array_join
- array_length
- array_map
- array_max
- array_min
- array_position
- array_remove
- array_slice
- array_sort
- array_sortby
- array_sum
- arrays_overlap
- array_to_bitmap
- cardinality
- element_at
- reverse
- unnest
- Bit Functions
- Bitmap Functions
- base64_to_bitmap
- bitmap_agg
- bitmap_and
- bitmap_andnot
- bitmap_contains
- bitmap_count
- bitmap_from_string
- bitmap_empty
- bitmap_has_any
- bitmap_hash
- bitmap_intersect
- bitmap_max
- bitmap_min
- bitmap_or
- bitmap_remove
- bitmap_to_array
- bitmap_to_base64
- bitmap_to_string
- bitmap_union
- bitmap_union_count
- bitmap_union_int
- bitmap_xor
- intersect_count
- sub_bitmap
- to_bitmap
- JSON Functions
- Overview of JSON functions and operators
- JSON operators
- JSON constructor functions
- JSON query and processing functions
- Map Functions
- Binary Functions
- Conditional Functions
- Cryptographic Functions
- Date Functions
- add_months
- adddate
- convert_tz
- current_date
- current_time
- current_timestamp
- date
- date_add
- date_format
- date_slice
- date_sub, subdate
- date_trunc
- datediff
- day
- dayname
- dayofmonth
- dayofweek
- dayofyear
- days_add
- days_diff
- days_sub
- from_days
- from_unixtime
- hour
- hours_add
- hours_diff
- hours_sub
- microseconds_add
- microseconds_sub
- minute
- minutes_add
- minutes_diff
- minutes_sub
- month
- monthname
- months_add
- months_diff
- months_sub
- now
- quarter
- second
- seconds_add
- seconds_diff
- seconds_sub
- str_to_date
- str2date
- time_slice
- time_to_sec
- timediff
- timestamp
- timestampadd
- timestampdiff
- to_date
- to_days
- unix_timestamp
- utc_timestamp
- week
- week_iso
- weekofyear
- weeks_add
- weeks_diff
- weeks_sub
- year
- years_add
- years_diff
- years_sub
- Geographic Functions
- Math Functions
- String Functions
- append_trailing_char_if_absent
- ascii
- char
- char_length
- character_length
- concat
- concat_ws
- ends_with
- find_in_set
- group_concat
- hex
- hex_decode_binary
- hex_decode_string
- instr
- lcase
- left
- length
- locate
- lower
- lpad
- ltrim
- money_format
- null_or_empty
- parse_url
- repeat
- replace
- reverse
- right
- rpad
- rtrim
- space
- split
- split_part
- starts_with
- strleft
- strright
- substring
- trim
- ucase
- unhex
- upper
- Pattern Matching Functions
- Percentile Functions
- Scalar Functions
- Utility Functions
- cast function
- hash function
- System variables
- User-defined variables
- Error code
- System limits
- SQL Reference
- FAQ
- Benchmark
- Developers
- Contribute to StarRocks
- Code Style Guides
- Use the debuginfo file for debugging
- Development Environment
- Trace Tools
- Integration
SQL
This topic provides answers to some frequently asked questions about SQL.
This error "fail to allocate memory." when I build a materialized view
To solve this problem, increase the value of the memory_limitation_per_thread_for_schema_change
parameter in the be.conf file. This parameter refers to the maximum storage that can be allocated for a single task to change the scheme. The default value of the maximum storage is 2 GB.
Does StarRocks support caching query results?
StarRocks does not directly cache final query results. From v2.5 onwards, StarRocks uses the Query Cache feature to save the intermediate results of first-stage aggregation in the cache. New queries that are semantically equivalent to previous queries can reuse the cached computation results to accelerate computations. Query cache uses BE memory. For more information, see Query cache.
When a Null
is included in the calculation, the calculation results of functions are false except for the ISNULL() function
In standard SQL, every calculation that includes an operand with a NULL
value returns a NULL
.
Why is the query result incorrect after I enclose quotation marks around a value of the BIGINT data type for an equivalence query?
Problem description
See the following examples:
select cust_id,idno
from llyt_dev.dwd_mbr_custinfo_dd
where Pt= ‘2021-06-30’
and cust_id = ‘20210129005809043707’
limit 10 offset 0;
+---------------------+-----------------------------------------+
| cust_id | idno |
+---------------------+-----------------------------------------+
| 20210129005809436 | yjdgjwsnfmdhjw294F93kmHCNMX39dw= |
| 20210129005809436 | sdhnswjwijeifme3kmHCNMX39gfgrdw= |
| 20210129005809436 | Tjoedk3js82nswndrf43X39hbggggbw= |
| 20210129005809436 | denuwjaxh73e39592jwshbnjdi22ogw= |
| 20210129005809436 | ckxwmsd2mei3nrunjrihj93dm3ijin2= |
| 20210129005809436 | djm2emdi3mfi3mfu4jro2ji2ndimi3n= |
+---------------------+-----------------------------------------+
select cust_id,idno
from llyt_dev.dwd_mbr_custinfo_dd
where Pt= ‘2021-06-30’
and cust_id = 20210129005809043707
limit 10 offset 0;
+---------------------+-----------------------------------------+
| cust_id | idno |
+---------------------+-----------------------------------------+
| 20210189979989976 | xuywehuhfuhruehfurhghcfCNMX39dw= |
+---------------------+-----------------------------------------+
Solution
When you compare the STRING data type and the INTEGER data type, the fields of these two types are cast to the DOUBLE data type. Therefore, quotation marks cannot be added. Otherwise, the condition defined in the WHERE clause cannot be indexed.
Does StarRocks support the DECODE function?
StarRocks does not support the DECODE function of the Oracle database. StarRocks is compatible with MySQL, so you can use the CASE WHEN statement.
Can the latest data be queried immediately after data is loaded into the Primary Key table of StarRocks?
Yes. StarRocks merges data in a way that references Google Mesa. In StarRocks, a BE triggers the data merge and it has two kinds of compaction to merge data. If the data merge is not completed, it is finished during your query. Therefore, you can read the latest data after data loading.
Do the utf8mb4 characters stored in StarRocks get truncated or appear garbled?
No.
This error "table's state is not normal" occurs when I run the alter table
command
This error occurs because the previous alteration has not been completed. You can run the following code to check the status of the previous alteration:
show tablet from lineitem where State="ALTER";
The time spent on the alteration operation relates to the data volume. In general, the alteration can be completed in minutes. We recommend that you stop loading data into StarRocks while you are altering tables because data loading lowers the speed at which alteration completes.
This error "get partition detail failed: org.apache.doris.common.DdlException: get hive partition meta data failed: java.net.UnknownHostException:hadooptest" occurs when I query the external tables of Apache Hive
This error occurs when the metadata of Apache Hive partitions cannot be obtained. To solve this problem, copy core-sit.xml and hdfs-site.xml to the fe.conf file and the be.conf file.
This error "planner use long time 3000 remaining task num 1" occurs when I query data
This error occurs usually due to a full garbage collection (full GC), which can be checked by using backend monitoring and the fe.gc log. To solve this problem, perform one of the following operations:
- Allows SQL's client to access multiple frontends (FEs) simultaneously to spread the load.
- Change the heap size of Java Virtual Machine (JVM) from 8 GB to 16 GB in the fe.conf file to increase memory and reduce the impact of full GC.
When cardinality of column A is small, the query results of select B from tbl order by A limit 10
vary each time
SQL can only guarantee that column A is ordered, and it cannot guarantee that the order of column B is the same for each query. MySQL can guarantee the order of column A and column B because it is a standalone database.
StarRocks is a distributed database, of which data stored in the underlying table is in a sharding pattern. The data of column A is distributed across multiple machines, so the order of column B returned by multiple machines may be different for each query, resulting in inconsistent order of B each time. To solve this problem, change select B from tbl order by A limit 10
to select B from tbl order by A,B limit 10
.
Why is there a large gap in column efficiency between SELECT * and SELECT?
To solve this problem, check the profile and see MERGE details:
Check whether the aggregation on the storage layer takes up too much time.
Check whether there are too many indicator columns. If so, aggregate hundreds of columns of millions of rows.
MERGE:
- aggr: 26s270ms
- sort: 15s551ms
Does DELETE support nested functions?
Nested functions are not supported, such as to_days(now())
in DELETE from test_new WHERE to_days(now())-to_days(publish_time) >7;
.
How to improve the usage efficiency of a database when there are hundreds of tables in it?
To improve efficiency, add the -A
parameter when you connect to MySQL's client server: mysql -uroot -h127.0.0.1 -P8867 -A
. MySQL's client server does not pre-read database information.
How to reduce the disk space occupied by the BE log and the FE log?
Adjust the log level and corresponding parameters. For more information, see Parameter Configuration.
This error "table *** is colocate table, cannot change replicationNum" occurs when I modify the replication number
When you create colocated tables, you need to set the group
property. Therefore, you cannot modify the replication number for a single table. You can perform the following steps to modify the replication number for all tables in a group:
- Set
group_with
toempty
for all tables in a group. - Set a proper
replication_num
for all tables in a group. - Set
group_with
back to its original value.
Does setting VARCHAR to the maximum value affect storage?
VARCHAR is a variable-length data type, which has a specified length that can be changed based on the actual data length. Specifying a different varchar length when you create a table has little impact on the query performance on the same data.
This error "create partititon timeout" occurs when I truncate a table
To truncate a table, you need to create the corresponding partitions and then swap them. If there are a larger number of partitions that need to be created, this error occurs. In addition, if there are many data load tasks, the lock will be held for a long time during the compaction process. Therefore, the lock cannot be acquired when you create tables. If there are too many data load tasks, set tablet_map_shard_size
to 512
in the be.conf file to reduce the lock contention.
This error "Failed to specify server's Kerberos principal name" occurs when I access external tables of Apache Hive
Add the following information to hdfs-site.xml in the fe.conf file and the be.conf file:
<property>
<name>dfs.namenode.kerberos.principal.pattern</name>
<value>*</value>
</property>
Is "2021-10" a date format in StarRocks?
No.
Can "2021-10" be used as a partition field?
No, use functions to change "2021-10" to "2021-10-01" and then use "2021-10-01" as a partition field.
Where can I query the size of a StarRocks database or table?
You can use the SHOW DATA command.
SHOW DATA;
displays the data size and replicas of all tables in the current database.
SHOW DATA FROM <db_name>.<table_name>;
displays the data size, number of replicas, and number of rows in a specified table of a specified database.
- SQL
- This error "fail to allocate memory." when I build a materialized view
- Does StarRocks support caching query results?
- When a Null is included in the calculation, the calculation results of functions are false except for the ISNULL() function
- Why is the query result incorrect after I enclose quotation marks around a value of the BIGINT data type for an equivalence query?
- Does StarRocks support the DECODE function?
- Can the latest data be queried immediately after data is loaded into the Primary Key table of StarRocks?
- Do the utf8mb4 characters stored in StarRocks get truncated or appear garbled?
- This error "table's state is not normal" occurs when I run the alter table command
- This error "get partition detail failed: org.apache.doris.common.DdlException: get hive partition meta data failed: java.net.UnknownHostException:hadooptest" occurs when I query the external tables of Apache Hive
- This error "planner use long time 3000 remaining task num 1" occurs when I query data
- When cardinality of column A is small, the query results of select B from tbl order by A limit 10 vary each time
- Why is there a large gap in column efficiency between SELECT * and SELECT?
- Does DELETE support nested functions?
- How to improve the usage efficiency of a database when there are hundreds of tables in it?
- How to reduce the disk space occupied by the BE log and the FE log?
- This error "table *** is colocate table, cannot change replicationNum" occurs when I modify the replication number
- Does setting VARCHAR to the maximum value affect storage?
- This error "create partititon timeout" occurs when I truncate a table
- This error "Failed to specify server's Kerberos principal name" occurs when I access external tables of Apache Hive
- Is "2021-10" a date format in StarRocks?
- Can "2021-10" be used as a partition field?
- Where can I query the size of a StarRocks database or table?