Edit

System variables

Variable Settings and Viewing

This section describes the variables supported by StarRocks. You can view these variables by using the SHOW VARIABLES command. These variables can take effect globally on the entire system or only on the current session.

The variables in StarRocks refer to the variable sets in MySQL, but some variables are only compatible with the MySQL client protocol and do not function on the MySQL database.

View variables

You can view all or some variables by using SHOW VARIABLES [LIKE 'xxx']. Example:

SHOW VARIABLES;

SHOW VARIABLES LIKE '%time_zone%';

Set variables

Variables can generally be set to take effect globally or only on the current session. When set to global, a new value will be used in subsequent new sessions without affecting the current session. When set to "current session only", the variable will only take effect on the current session.

A variable set by SET var_name=xxx; only takes effect for the current session. For example:

SET query_mem_limit = 137438953472;

SET forward_to_master = true;

SET time_zone = "Asia/Shanghai";

A variable set by the SET GLOBAL var_name=xxx; statement takes effect globally. For example:

SET GLOBAL query_mem_limit = 137438953472;

Note: Only ADMIN users can set variables to be globally effective. Globally effective variables do not affect the current session, only subsequent new sessions.

Variables that can be set both globally or partially effective include:

  • batch_size
  • disable_streaming_preaggregations
  • query_mem_limit
  • force_streaming_aggregate
  • enable_profile
  • hash_join_push_down_right_table
  • parallel_fragment_exec_instance_num
  • parallel_exchange_instance_num
  • prefer_compute_node
  • query_timeout
  • sql_mode
  • time_zone
  • use_compute_nodes
  • vectorized_engine_enable (deprecated from v2.4 onwards)
  • wait_timeout
  • sql_dialect

Variables that can only be set globally effective include:

  • default_rowset_type

In addition, variable settings also support constant expressions, such as:

SET query_mem_limit = 10 * 1024 * 1024 * 1024;
SET forward_to_master = concat('tr', 'u', 'e');

Set variables in a single query statement

In some scenarios, you may need to set variables specifically for certain queries. By using the SET_VAR hint, you can set session variables that will take effect only within a single statement. Example:

SELECT /*+ SET_VAR(query_mem_limit = 8589934592) */ name FROM people ORDER BY name;

SELECT /*+ SET_VAR(query_timeout = 1) */ sleep(3);

NOTE

  • SET_VAR is supported only in MySQL 8.0 and later.
  • It can only be placed after the SELECT keyword and enclosed in /*+...*/.

You can also set multiple variables in a single statement. Example:

SELECT /*+ SET_VAR
  (
  exec_mem_limit = 515396075520,
  query_timeout=10000000,
  batch_size=4096,
  parallel_fragment_exec_instance_num=32
  )
  */ * FROM TABLE;

Descriptions of variables

  • SQL_AUTO_IS_NULL

    Used for compatibility with the JDBC connection pool C3P0. No practical usage.

  • auto_increment_increment

    Used for MySQL client compatibility. No practical usage.

  • autocommit

    Used for MySQL client compatibility. No practical usage.

  • batch_size

    Used to specify the number of rows of a single packet transmitted by each node during query execution. The default is 1024, i.e., every 1024 rows of data generated by the source node is packaged and sent to the destination node. A larger number of rows will improve the query throughput in large data volume scenarios, but may increase the query latency in small data volume scenarios. Also, it may increase the memory overhead of the query. We recommend to set batch_size between 1024 to 4096.

  • default_rowset_type

    Global variable. Used to set the default storage format used by the storage engine of the computing node. The currently supported storage formats are alpha and beta.

  • default_table_compression

    Set the default compression algorithm for table storage, supported compression algorithms are: snappy, lz4, zlib, zstd.

  • disable_colocate_join

    Used to control whether the Colocation Join is enabled. The default value is false, meaning the feature is enabled. When this feature is disabled, query planning will not attempt to execute Colocation Join.

  • streaming_preaggregation_mode

    Used to specify the preaggregation mode for the first phase of GROUP BY. If the preaggregation effect in the first phase is not satisfactory, you can use the streaming mode, which performs simple data serialization before streaming data to the destination. Valid values:

    • auto: The system first tries local preaggregation. If the effect is not satisfactory, it switches to the streaming mode. This is the default value.
    • force_preaggregation: The system directly performs local preaggregation.
    • force_streaming: The system directly performs streaming.
  • disable_streaming_preaggregations

    Used to enable the streaming pre-aggregations. The default value is false, meaning it is enabled.

  • div_precision_increment

    Used for MySQL client compatibility. No practical usage.

  • enable_insert_strict

    Used to enable the strict mode when loading data using the INSERT statement. The default value is true, indicating the strict mode is enabled by default. For more information, see Strict mode.

  • enable_spilling

    Used to enable large data volume drop sorting. The default value is false, meaning it is not enabled. It is enabled when the user does not specify a LIMIT condition in the ORDER BY clause and sets enable_spilling to true. When enabled, the BE data directory starrocks-scratch/ is used to store temporary spilling data that will be cleared after the query is completed.

    This function is mainly used for sorting operations with large data volume using limited memory.

  • event_scheduler

    Used for MySQL client compatibility. No practical usage.

  • force_streaming_aggregate

    Used to control whether the aggregation node enables streaming aggregation for computing. The default value is false, meaning the feature is not enabled.

  • forward_to_master

    Used to specify whether some commands will be forwarded to the leader FE for execution. The default value is false, meaning not forwarding to the leader FE. There are multiple FEs in a StarRocks cluster, one of which is the leader FE. Normally, users can connect to any FE for full-featured operations. However, some information is only available on the leader FE.

    For example, if the SHOW BACKENDS command is not forwarded to the leader FE, only basic information (for example, whether the node is alive) can be viewed. Forwarding to the leader FE can get more detailed information including the node start time and last heartbeat time.

    The commands affected by this parameter are as follows:

    • SHOW FRONTENDS: Forwarding to the leader FE allows users to view the last heartbeat message.

    • SHOW BACKENDS: Forwarding to the leader FE allows users to view the boot time, last heartbeat information, and disk capacity information.

    • SHOW BROKER: Forwarding to the leader FE allows users to view the boot time and last heartbeat information.

    • SHOW TABLET

    • ADMIN SHOW REPLICA DISTRIBUTION

    • ADMIN SHOW REPLICA STATUS: Forwarding to the leader FE allows users to view the tablet information stored in the metadata of the leader FE. Normally, the tablet information should be the same in the metadata of different FEs. If an error occurs, you can use this method to compare the metadata of the current FE and the leader FE.

    • Show PROC: Forwarding to the leader FE allows users to view the PROC information stored in the metadata. This is mainly used for metadata comparison.

  • hash_join_push_down_right_table

    Used to control whether the data of the left table can be filtered by using the filter condition against the right table in the Join query. If so, it can reduce the amount of data that needs to be processed during the query.

    true indicates the operation is allowed and the system decides whether the left table can be filtered. false indicates the operation is disabled. The default value is true.

  • init_connect

    Used for MySQL client compatibility. No practical usage.

  • interactive_timeout

    Used for MySQL client compatibility. No practical usage.

  • enable_materialized_view_union_rewrite (2.5 and later)

    Boolean value to control whether to enable materialized view Union query rewrite. Default: true.

  • enable_rule_based_materialized_view_rewrite (2.5 and later)

    Boolean value to control whether to enable rule-based materialized view query rewrite. This variable is mainly used in single-table query rewrite. Default: true.

  • enable_profile

    Specifies whether to send the profile of a query for analysis. The default value is false, which means no profile is required.

    By default, a profile is sent to the FE only when a query error occurs in the BE. Profile sending causes network overhead and therefore affects high concurrency.

    If you need to analyze the profile of a query, you can set this variable to true. After the query is completed, the profile can be viewed on the web page of the currently connected FE (address: fe_host:fe_http_port/query). This page displays the profiles of the latest 100 queries with enable_profile turned on.

  • enable_query_queue_load

    Boolean value to enable query queues for loading tasks. Default: false.

  • enable_query_queue_select

    Boolean value to enable query queues for SELECT queries. Default: false.

  • enable_query_queue_statistic

    Boolean value to enable query queues for statistics queries.

  • enable_scan_block_cache (2.5 and later)

    Specifies whether to enable the Data Cache feature. After this feature is enabled, StarRocks caches hot data read from external storage systems into blocks, which accelerates queries and analysis. For more information, see Data Cache.

  • enable_populate_block_cache (2.5 and later)

    Specifies whether to cache data blocks read from external storage systems in StarRocks. If you do not want to cache data blocks read from external storage systems, set this variable to false. Default value: true. This variable is supported from 2.5.

  • enable_query_cache (2.5 and later)

    Specifies whether to enable the Query Cache feature. Valid values: true and false. true specifies to enable this feature, and false specifies to disable this feature. When this feature is enabled, it works only for queries that meet the conditions specified in the application scenarios of Query Cache.

  • query_cache_entry_max_bytes (2.5 and later)

    The threshold for triggering the Passthrough mode. Valid values: 0 to 9223372036854775807. When the number of bytes or rows from the computation results of a specific tablet accessed by a query exceeds the threshold specified by the query_cache_entry_max_bytes or query_cache_entry_max_rows parameter, the query is switched to Passthrough mode.

  • group_concat_max_len

    Used for compatibility with MySQL. No practical usage. Default value: 65535.

  • query_cache_entry_max_rows (2.5 and later)

    The upper limit of rows that can be cached. See the description in query_cache_entry_max_bytes. Default value: 409600.

  • query_cache_agg_cardinality_limit (2.5 and later)

    The upper limit of cardinality for GROUP BY in Query Cache. Query Cache is not enabled if the rows generated by GROUP BY exceeds this value. Default value: 5000000. If the query_cache_entry_max_bytes or query_cache_entry_max_rows parameter is set to 0, the Passthrough mode is used even when no computation results are generated from the involved tablets.

  • enable_adaptive_sink_dop (2.5 and later)

    Specifies whether to enable adaptive parallelism for data loading. After this feature is enabled, the system automatically sets load parallelism for INSERT INTO and Broker Load jobs, which is equivalent to the mechanism of pipeline_dop. For a newly deployed v2.5 StarRocks cluster, the value is true by default. For a v2.5 cluster upgraded from v2.4, the value is false.

  • enable_pipeline_engine

    Specifies whether to enable the pipeline execution engine. true indicates enabled and false indicates the opposite. Default value: true.

  • pipeline_dop

    The parallelism of a pipeline instance, which is used to adjust the query concurrency. Default value: 0, indicating the system automatically adjusts the parallelism of each pipeline instance. You can also set this parameter to a value greater than 0. Generally, set the value to half the number of physical CPU cores.

    From v3.0 onwards, StarRocks adaptively adjusts this parameter based on query parallelism.

  • enable_sort_aggregate (2.5 and later)

    Specifies whether to enable sorted streaming. true indicates sorted streaming is enabled to sort data in data streams.

  • language

    Used for MySQL client compatibility. No practical usage.

  • license

    Displays the license of StarRocks.

  • load_mem_limit

    Specifies the memory limit for the import operation. The default value is 0, meaning that this variable is not used and query_mem_limit is used instead.

    This variable is only used for the INSERT operation which involves both query and import. If the user does not set this variable, the memory limit for both query and import will be set as exec_mem_limit. Otherwise, the memory limit for query will be set as exec_mem_limit and the memory limit for import will be as load_mem_limit.

    Other import methods such as BROKER LOAD, STREAM LOAD still use exec_mem_limit for memory limit.

  • lower_case_table_names

    Used for MySQL client compatibility. No practical usage. Table names in StarRocks are case-sensitive.

  • max_allowed_packet

    Used for compatibility with the JDBC connection pool C3P0. This parameter specifies the maximum size of packets that can be transmitted between the client and server. Default value: 32 MB. You can raise this value if the client reports "PacketTooBigException".

  • max_pushdown_conditions_per_column

    This variable is set to -1 by default, indicating that the value configured in be.conf is used. If this variable is set to be greater than 0, queries in the current session will use this value and ignore the configured value in be.conf.

  • nested_mv_rewrite_max_level

    The maximum levels of nested materialized views that can be used for query rewrite. Type: INT. Range: [1, +∞). The value of 1 indicates that only materialized views created on base tables can be used for query rewrite. Default: 3.

  • net_buffer_length

    Used for MySQL client compatibility. No practical usage.

  • net_read_timeout

    Used for MySQL client compatibility. No practical usage.

  • net_write_timeout

    Used for MySQL client compatibility. No practical usage.

  • parallel_exchange_instance_num

    Used to set the number of exchange nodes that an upper-level node uses to receive data from a lower-level node in the execution plan. The default value is -1, meaning the number of exchange nodes is equal to the number of execution instances of the lower-level node. When this variable is set to be greater than 0 but smaller than the number of execution instances of the lower-level node, the number of exchange nodes equals the set value.

    In a distributed query execution plan, the upper-level node usually has one or more exchange nodes to receive data from the execution instances of the lower-level node on different BEs. Usually the number of exchange nodes is equal to the number of execution instances of the lower-level node.

    In some aggregation query scenarios where the amount of data decreases drastically after aggregation, you can try to modify this variable to a smaller value to reduce the resource overhead. An example would be running aggregation queries using the Duplicate Key table.

  • parallel_fragment_exec_instance_num

    Used to set the number of instances used to scan nodes on each BE. The default value is 1.

    A query plan typically produces a set of scan ranges. This data is distributed across multiple BE nodes. A BE node will have one or more scan ranges, and by default, each BE node's set of scan ranges is processed by only one execution instance. When machine resources suffice, you can increase this variable to allow more execution instances to process a scan range simultaneously for efficiency purposes.

    The number of scan instances determines the number of other execution nodes in the upper level, such as aggregation nodes and join nodes. Therefore, it increases the concurrency of the entire query plan execution. Modifying this parameter will help improve efficiency, but larger values will consume more machine resources, such as CPU, memory, and disk IO.

  • performance_schema

    Used for compatibility with MySQL JDBC versions 8.0.16 and above. No practical usage.

  • prefer_compute_node

    Specifies whether the FEs distribute query execution plans to CN nodes. Valid values:

    • true: indicates that the FEs distribute query execution plans to CN nodes.
    • false: indicates that the FEs do not distribute query execution plans to CN nodes.
  • query_cache_size

    Used for MySQL client compatibility. No practical use.

  • query_cache_type

    Used for compatibility with JDBC connection pool C3P0. No practical use.

  • query_mem_limit

    Used to set the memory limit of a query on each backend node. The default value is 0, which means no limit for it. Units including B/K/KB/M/MB/G/GB/T/TB/P/PB are supported.

    When the Memory Exceed Limit error happens, you could try to increase this parameter.

  • query_queue_concurrency_limit

    The upper limit of concurrent queries on a BE. It takes effect only after being set greater than 0. Default: 0.

  • query_queue_cpu_used_permille_limit

    The upper limit of CPU usage permille (CPU usage * 1000) on a BE. It takes effect only after being set greater than 0. Default: 0. Range: [0, 1000]

  • query_queue_max_queued_queries

    The upper limit of queries in a queue. When this threshold is reached, incoming queries are rejected. It takes effect only after being set greater than 0. Default: 0.

  • query_queue_mem_used_pct_limit

    The upper limit of memory usage percentage on a BE. It takes effect only after being set greater than 0. Default: 0. Range: [0, 1]

  • query_queue_pending_timeout_second

    The maximum timeout of a pending query in a queue. When this threshold is reached, the corresponding query is rejected. Unit: second. Default: 300.

  • query_timeout

    Used to set the query timeout in "seconds". This variable will act on all query statements in the current connection, as well as INSERT statements. The default value is 300 seconds. Value range: [1, 259200].

  • rewrite_count_distinct_to_bitmap_hll

    Used to decide whether to rewrite count distinct queries to bitmap_union_count and hll_union_agg.

  • sql_mode

    Used to specify the SQL mode to accommodate certain SQL dialects.

  • sql_safe_updates

    Used for MySQL client compatibility. No practical usage.

  • sql_select_limit

    Used for MySQL client compatibility. No practical usage.

  • storage_engine

    The types of engines supported by StarRocks:

    • olap: StarRocks system-owned engine.
    • mysql: Use MySQL external tables.
    • broker: Access external tables through a broker program.
    • elasticsearch or es: Use Elasticsearch external tables.
    • hive: Use Hive external tables.
  • system_time_zone

    Used to display the time zone of the current system. Cannot be changed.

  • time_zone

    Used to set the time zone of the current session. The time zone can affect the results of certain time functions.

  • tx_isolation

    Used for MySQL client compatibility. No practical usage.

  • use_compute_nodes

    The maximum number of CN nodes that can be used. This parameter is valid when prefer_compute_node=true. Valid values:

    • -1: indicates that all CN nodes are used.
    • 0: indicates that no CN nodes are used.
  • use_v2_rollup

    Used to control the query to fetch data using the rollup index of the segment v2 storage format. This variable is used for validation when going online with segment v2. It is not recommended for other cases.

  • vectorized_engine_enable (deprecated from v2.4 onwards)

    Used to control whether the vectorized engine is used to execute queries. A value of true indicates that the vectorized engine is used, otherwise the non-vectorized engine is used. The default is true. This feature is enabled by default from v2.4 onwards and therefore, is deprecated.

  • version

    Used for MySQL client compatibility. No practical usage.

  • version_comment

    Used to display the version of StarRocks. Cannot be changed.

  • wait_timeout

    Used to set the connection timeout for idle connections. When an idle connection does not interact with StarRocks for that length of time, StarRocks will actively disconnect the link. The default value is 8 hours, in seconds.

  • enable_global_runtime_filter

    Whether to enable global runtime filter (RF for short). RF filters data at runtime. Data filtering often occurs in the Join stage. During multi-table joins, optimizations such as predicate pushdown are used to filter data, in order to reduce the number of scanned rows for Join and the I/O in the Shuffle stage, thereby speeding up the query.

    StarRocks offers two types of RF: Local RF and Global RF. Local RF is suitable for Broadcast Hash Join and Global RF is suitable for Shuffle Join.

    Default value: true, which means global RF is enabled. If this feature is disabled, global RF does not take effect. Local RF can still work.

  • enable_multicolumn_global_runtime_filter

    Whether to enable multi-column global runtime filter. Default value: false, which means multi-column global RF is disabled.

    If a Join (other than Broadcast Join and Replicated Join) has multiple equi-join conditions:

    • If this feature is disabled, only Local RF works.
    • If this feature is enabled, multi-column Global RF takes effect and carries multi-column in the partition by clause.
  • runtime_filter_on_exchange_node

    Whether to place GRF on Exchange Node after GRF is pushed down across the Exchange operator to a lower-level operator. The default value is false, which means GRF will not be placed on Exchange Node after it is pushed down across the Exchange operator to a lower-level operator. This prevents repetitive use of GRF and reduces the computation time.

    However, GRF delivery is a "try-best" process. If the lower-level operator fails to receive the GRF but the GRF is not placed on Exchange Node, data cannot be filtered, which compromises filter performance. true means GRF will still be placed on Exchange Node even after it is pushed down across the Exchange operator to a lower-level operator.

  • runtime_join_filter_push_down_limit

    The maximum number of rows allowed for the Hash table based on which Bloom filter Local RF is generated. Local RF will not be generated if this value is exceeded. This variable prevents the generation of an excessively long Local RF.

    The value is an integer. Default value: 1024000.

  • sql_dialect (v3.0 and later)

    The SQL dialect that is used. For example, you can run the set sql_dialect = 'trino'; command to set the SQL dialect to Trino, so you can use Trino-specific SQL syntax and functions in your queries.

    NOTICE

    After you configure StarRocks to use the Trino dialect, identifiers in queries are not case-sensitive by default. Therefore, you must specify names in lowercase for your databases and tables at database and table creation. If you specify database and table names in uppercase, queries against these databases and tables will fail.

  • io_tasks_per_scan_operator (v3.0 and later)

    The number of concurrent I/O tasks that can be issued by a scan operator. Increase this value if you want to access remote storage systems such as HDFS or S3 but the latency is high. However, a larger value causes more memory consumption.

    The value is an integer. Default value: 4.

  • range_pruner_max_predicate (v3.0 and later)

    The maximum number of IN predicates that can be used for Range partition pruning. Default value: 100. A value larger than 100 may cause the system to scan all tablets, which compromises the query performance.