Skip to main content
Version: Latest-3.4

SQL Digest

This topic introduces the SQL Digest feature of StarRocks. This feature is supported from v3.3.6 onwards.

Overview

SQL Digest is a fingerprint generated by historical SQL statements with parameters removed. It helps cluster SQL statements with the same structure but different parameters.

Common use cases of SQL Digest include:

  • Finding other SQL statements with the same structure but different parameters in query history
  • Tracking execution frequency, cumulative time, and other statistics of SQL with the same structure
  • Identifying the most time-consuming SQL patterns in the system

In StarRocks, SQL Digests are mainly recorded through audit logs fe.audit.log. For example, execute the following two SQL statements:

SELECT count(*) FROM lineorder WHERE lo_orderdate > '19920101';
SELECT count(*) FROM lineorder WHERE lo_orderdate > '19920202';

Two same Digest will be generated in fe.audit.log:

Digest=f58bb71850d112014f773717830e7f77
Digest=f58bb71850d112014f773717830e7f77

Usage

Prerequisites

To enable this feature, you must set the FE configuration item enable_sql_digest to true.

Execute the following statement to enable it dynamically:

ADMIN SET FRONTEND CONFIG ('enable_sql_digest'='true');

To enable it permanently, you must add enable_sql_digest = true to the FE configuration file fe.conf and restart FE.

After enabling this feature, you can install the AuditLoader plugin to perform statistical analysis on SQL statements.

Find similar SQL

SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__ 
WHERE digest = '<Digest>'
LIMIT 1;

Track daily execution count and time of similar SQL

SELECT 
date_trunc('day', `timestamp`) query_date,
count(*),
sum(queryTime),
sum(scanRows),
sum(cpuCostNs),
sum(memCostBytes)
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE digest = '<Digest>'
GROUP BY query_date
ORDER BY query_date
DESC LIMIT 30;

Calculate average execution time of similar SQL

SELECT avg(queryTime), min(queryTime), max(queryTime), stddev(queryTime)
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE digest = '<Digest>';

Aggregate similar SQL to analyze the most time-consuming pattern

WITH top_sql AS (
SELECT digest, sum(queryTime)
FROM starrocks_audit_db__.starrocks_audit_tbl__
GROUP BY digest
ORDER BY sum(queryTime)
DESC LIMIT 10
)
SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE digest IN (SELECT digest FROM top_sql);

Parameter normalization rules

  • Constant values in SQL will be normalized. For example, similar SQL statements with WHERE a = 1 and WHERE a = 2 will have the same Digest.
  • IN predicates will be normalized. For example, similar SQL statements with IN (1,2,3) and IN (1,2) will have the same Digest.
  • LIMIT N clauses will be normalized. For example, similar SQL statements with LIMIT 10 and LIMIT 30 will have the same Digest.