- Introduction
- Quick Start
- Table Design
- Data Loading
- Data Export
- Using StarRocks
- Reference
- SQL Reference
- User Account Management
- Cluster Management
- ADMIN CANCEL REPAIR
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE FILE
- DROP FILE
- INSTALL PLUGIN
- SHOW BACKENDS
- SHOW BROKER
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW PLUGINS
- SHOW TABLE STATUS
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- BACKUP
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE DATABASE
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- HLL
- RECOVER
- RESTORE
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- DML
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- ROUTINE LOAD
- SELECT
- SHOW ALTER
- SHOW BACKUP
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- Data Type
- Auxiliary Commands
- Function Reference
- Date Functions
- Geographic Functions
- String Functions
- Aggregation Functions
- Bitmap Functions
- Array Functions
- cast function
- hash function
- Crytographic Functions
- Math Functions
- Utility Functions
- System variables
- Error code
- System limits
- SQL Reference
- Administration
- FAQ
- Deployment
- Data Migration
- SQL
- Others FAQs
- Benchmark
- Developers
- Contribute to StarRocks
- Code Style Guides
- Use the debuginfo file for debugging
- Development Environment
- Trace Tools
- Integration
BITMAP
Here is a simple example to illustrate the usage of several aggregate functions in Bitmap. For detailed function definitions or more Bitmap functions, see bitmap-functions.
Create table
The aggregation model is needed when creating table. The data type is bitmap and the aggregation function is bitmap_union.
CREATE TABLE `pv_bitmap` (
`dt` int(11) NULL COMMENT "",
`page` varchar(10) NULL COMMENT "",
`user_id` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `page`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 2;
Note: With an amount of data, you'd better create a rollup table corresponding to high-frequent bitmap_union.
ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);
Data Load
TO_BITMAP (expr)
: Convert 0 ~ 18446744073709551615 unsigned bigint to bitmap
BITMAP_EMPTY ()
: Generate empty bitmap columns, used for the default value to be filled in when inserting or inputting
BITMAP_HASH (expr)
: Convert columns of any type to a bitmap by hashing
Stream Load
When inputting data using Stream Load, you can convert the data to a BItmap field as follows:
cat data | curl --location-trusted -u user:passwd -T - \
-H "columns: dt,page,user_id, user_id=to_bitmap(user_id)" \
http://host:8410/api/test/testDb/_stream_load
cat data | curl --location-trusted -u user:passwd -T - \
-H "columns: dt,page,user_id, user_id=bitmap_hash(user_id)" \
http://host:8410/api/test/testDb/_stream_load
cat data | curl --location-trusted -u user:passwd -T - \
-H "columns: dt,page,user_id, user_id=bitmap_empty()" \
http://host:8410/api/test/testDb/_stream_load
Insert Into
When inputting data using Insert Into, you need to select the corresponding mode based on the type of columns in the source table.
- id2's column type in source table is bitmap
insert into bitmap_table1
select id, id2 from bitmap_table2;
- id2's column type in target table is bitmap
insert into bitmap_table1 (id, id2)
values (1001, to_bitmap(1000))
, (1001, to_bitmap(2000));
- id2's column type in source table is bitmap, and is the result of aggregation using bit_map_union().
insert into bitmap_table1
select id, bitmap_union(id2) from bitmap_table2 group by id;
- id2's column type in source table is int, and the bitmap type is generated by to_bitmap().
insert into bitmap_table1
select id, to_bitmap(id2) from table;
- id2's column type in source table is String, and the bitmap type is generated by bitmap_hash().
insert into bitmap_table1
select id, bitmap_hash(id2) from table;
Data Query
Syntax
`BITMAP_UNION (expr)
: Calculate the union of the input Bitmaps, and returns the new Bitmap.
BITMAP_UNION_COUNT (expr)
: Calculate the union of the input Bitmaps, and returns its cardinality, equivalent to BITMAP_COUNT (BITMAP_UNION (expr)). It is recommended to use the BITMAP_UNION_COUNT function first, for its performance is better than BITMAP_COUNT (BITMAP_UNION (expr)).
BITMAP_UNION_INT (expr)
: Calculate the number of different values in columns of type TINYINT, SMALLINT and INT, return the value same as COUNT (DISTINCT expr).
INTERSECT_COUNT (bitmap_column_to_count, filter_column, filter_values ...)
: Calculate the cardinality of the intersection of multiple bitmaps that satisfy filter_column condition. bitmap_column_to_count is a column of type bitmap, filter_column is a column of varying dimensions, and filter_values is a list of dimension values.
BITMAP_INTERSECT(expr)
: Calculate the intersection of this group of bitmap values and returns a new bitmap.
Example
The following SQL uses the pv_bitmap table above as an example:
Calculate the deduplicated value for user_id:
select bitmap_union_count(user_id)
from pv_bitmap;
select bitmap_count(bitmap_union(user_id))
from pv_bitmap;
Calculate the deduplicated value of id:
select bitmap_union_int(id)
from pv_bitmap;
Calculate the retention of user_id:
select intersect_count(user_id, page, 'meituan') as meituan_uv,
intersect_count(user_id, page, 'waimai') as waimai_uv,
intersect_count(user_id, page, 'meituan', 'waimai') as retention -- 在 'meituan' 和 'waimai' 两个页面都出现的用户数
from pv_bitmap
where page in ('meituan', 'waimai');
keyword
BITMAP,BITMAP_COUNT,BITMAP_EMPTY,BITMAP_UNION,BITMAP_UNION_INT,TO_BITMAP,BITMAP_UNION_COUNT,INTERSECT_COUNT,BITMAP_INTERSECT