Skip to main content
Version: Latest-3.2

Bitmap indexes

This topic describes how to create and manage a bitmap index, along with usage cases.

A bitmap index is a special database index that uses bitmaps, which are an array of bits. A bit is always in one of two values: 0 and 1. Each bit in the bitmap corresponds to a single row in the table. The value of each bit depends upon the value of the corresponding row.

A bitmap index can help improve the query performance on a given column. If a query hits a sort key column, StarRocks efficiently returns the query result by using the prefix index. However, the prefix index entry for a data block cannot exceed 36 bytes in length. If you want to improve the query performance on a column, which is not used as a sort key, you can create a bitmap index for the column.

Bitmap indexes are generally suitable for columns with high cardinality. Bitmap indexes are a good choice when Bitmap indexes can exhibit high selectivity, and its filtering effect (number of data rows filtered by the Bitmap index/total number of data rows) is lower than one in ten thousand.

To evaluate the performance improvement of Bitmap indexes in StarRocks, queries are performed on a 100 GB SSB dataset in StarRocks. The test results are as follows:

  • Query performance can be significantly improved only when bitmap indexes are created on high-cardinality columns (in this test dataset, a noticeable performance improvement is observed when the cardinality reaches the order of 100,000).
  • The column or columns with high-cardinality can either be a single column with high cardinality or a columns combination with high cardinality.
  • Query performance is essentially not improved and may even decrease when bitmap indexes are created on low-cardinality columns.

Benefits

You can benefit from bitmap indexes in the following aspects:

  • Use less storage space compared to other indexing techniques. Bitmap indexes typically take up only a fraction of the size of the indexed data in a table.
  • Combine multiple bitmap indexes together to fire queries on multiple columns. For more information, see Query multiple columns.

Usage notes

  • You can create a bitmap index for a column that can be filtered by using the equal (=) or [NOT] IN operator.

  • You can create bitmap indexes for all columns of a Primary Key table or Duplicate Key table. For an Aggregate table or Unique Key table, you can only create bitmap indexes for key columns.

  • Bitmap indexes can be created for columns of the following data types:

    • Numeric types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, and BOOLEAN.
    • String types: CHAR, STRING, and VARCHAR.
    • Date types: DATE and DATETIME.
    • Other types: HLL.
  • You can check whether a query uses bitmap indexes by viewing the BitmapIndexFilterRows field of the query's profile.

Create a bitmap index

There are two ways to create a bitmap index for a column.

  • Create a bitmap index for a column when you create a table. Example:

    CREATE TABLE d0.table_hash
    (
    k1 TINYINT,
    k2 DECIMAL(10, 2) DEFAULT "10.5",
    v1 CHAR(10) REPLACE,
    v2 INT SUM,
    INDEX index_name (column_name [, ...]) [USING BITMAP] [COMMENT '']
    )
    ENGINE = olap
    AGGREGATE KEY(k1, k2)
    DISTRIBUTED BY HASH(k1)
    PROPERTIES ("storage_type" = "column");

    The following table describes the parameters related to the bitmap index.

    ParameterRequiredDescription
    index_nameYesThe name of the bitmap index. For the naming conventions, see System limits. The name of bitmap index must be unique in a table.
    column_nameYesThe name of the column on which a bitmap index is created. You can specify multiple column names to create bitmap indexes for multiple columns at a time. Separate multiple columns with commas (,).
    COMMENTNoThe comment of the bitmap index.

    You can create bitmap indexes for multiple columns at a time by specifying multiple INDEX index_name (column_name [, ...]) [USING BITMAP] [COMMENT ''] commands. These commands need to be separated with commas (,). For other parameter descriptions of the CREATE TABLE statement, see CREATE TABLE.

  • Create a bitmap index for a column of a table using the CREATE INDEX statement. For parameter descriptions and examples, see CREATE INDEX.

    CREATE INDEX index_name ON table_name (column_name) [USING BITMAP] [COMMENT ''];

Display bitmap indexes

You can view all bitmap indexes created in a table using the SHOW INDEX statement. For parameter descriptions and examples, see SHOW INDEX.

SHOW { INDEX[ES] | KEY[S] } FROM [db_name.]table_name [FROM db_name];

Note

Creating indexes is an asynchronous process. Therefore, you can only see the indexes that have completed the creation process.

Delete a bitmap index

You can delete a bitmap index from a table using the DROP INDEX statement. For parameter descriptions and examples, see DROP INDEX.

DROP INDEX index_name ON [db_name.]table_name;

Usage cases

For example, the following table employee shows a portion of a company's employee information.

IDGenderPositionIncome_level
01femaleDeveloperlevel_1
02femaleAnalystlevel_2
03femaleSalesmanlevel_1
04maleAccountantlevel_3

Query a single column

For example, if you want to improve the query performance on the Gender column, you can create a bitmap index for the column by using the following statement.

CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';

After you execute the preceding statement, the bitmap index is generated as shown in the following figure.

figure

  1. Build a dictionary: StarRocks builds a dictionary for the Gender column and maps female and male to coded values of the INT type: 0 and 1.
  2. Generate bitmaps: StarRocks generates bitmaps for female and male based on the coded values. The bitmap of female is 1110 because female displays in the first three rows. The bitmap of male is 0001 because male only displays in the fourth row.

If you want to find out the male employee in the company, you can send a query as follows.

SELECT xxx FROM employee WHERE Gender = male;

After the query is sent, StarRocks searches for the dictionary to get the coded value of male, which is 1and then gets the bitmap of male, which is 0001. This means that only the fourth row matches the query condition. Then StarRocks will skip the first three rows and read only the fourth row.

Query multiple columns

For example, if you want to improve the query performance on the Gender and Income_levelcolumn, you can create bitmap indexes for these two columns by using the following statements.

  • Gender

    CREATE INDEX index1 ON employee (Gender) USING BITMAP COMMENT 'index1';
  • Income_level

    CREATE INDEX index2 ON employee (Income_level) USING BITMAP COMMENT 'index2';

After you execute the preceding two statements, the bitmap indexes are generated as shown in the following figure.

figure

StarRocks respectively builds a dictionary for the Gender and Income_level columns and then generates bitmaps for the distinct values in these two columns.

  • Gender: The bitmap of female is 1110 and the bitmap of male is 0001.
  • Producer: The bitmap of level_1 is 1010, the bitmap of level_2 is 0100, and the bitmap of level_3 is 0001.

If you want to find out the female employees whose salary is in the level_1, you can send a query as follows.

 SELECT xxx FROM employee 
WHERE Gender = female AND Income_level = level_1;

After the query is sent, StarRocks search for the dictionaries of Gender and Income_level at the same time to get the following information:

  • The coded value of female is 0 and the bitmap of female is 1110.
  • The coded value of level_1 is 0 and the bitmap of level_1 is 1010.

StarRocks performs a bitwise logical operation 1110 & 1010 based on the AND operator to get the result 1010. According to the result, StarRocks only reads the first row and the third row.