# Synchronize data from MySQL
This topic describes how to synchronize data from MySQL to StarRocks within seconds by using Flink CDC Connector, flink-connector-starrocks, and the StarRocks Migration Tool (SMT).
## Before you begin
Before you start data synchronization, enable binary logging in MySQL and download the following software packages:

- Modify the /etc/my.cnf file to enable binary logging in MySQL, and then restart MySQL Server (mysqld). You can execute the `SHOW VARIABLES LIKE 'log_bin';` statement to check whether binary logging is enabled.

```
# Enable binlog
log-bin=/var/lib/mysql/mysql-bin
#log_bin=ON
## Base name for binlog files
#log_bin_basename=/var/lib/mysql/mysql-bin
## Index file to manage all binlog files
#log_bin_index=/var/lib/mysql/mysql-bin.index
# Configure server-id
server-id=1
binlog_format = row
```
- Apache Flink. Only Apache Flink 1.11 and later are supported. We recommend that you download Apache Flink 1.13.
- Flink CDC Connector. To synchronize data from MySQL to StarRocks, download the Flink-MySQL-CDC package that matches the version of Apache Flink you downloaded.
- Flink-connector-starrocks. Apache Flink 1.13 uses a different version of flink-connector-starrocks than Apache Flink 1.11 and Apache Flink 1.12.
- SMT
## How it works
The workflow of data synchronization is as follows:
- SMT generates the statements that are used to create the source table and the sink table based on the cluster information and the table schema of MySQL and StarRocks.
- Flink CDC Connector acquires binary log data from MySQL.
- Flink-connector-starrocks loads the data into StarRocks.
## Procedure
Perform the following steps to synchronize data:

Copy flink-sql-connector-mysql-cdc-xxx.jar and flink-connector-starrocks-xxx.jar to the `flink-xxx/lib/` directory.

Decompress SMT and modify the configuration file of SMT:
- `[db]`: modify the value of this parameter to the connection information of MySQL.
- `be_num`: the number of BE nodes in your StarRocks cluster. This parameter helps you set the number of buckets more reasonably.
- `[table-rule.1]`: the rule based on which you want to match data. You can match databases and tables by using regular expressions and thus generate the SQL statements that are used to create tables. You can configure multiple rules.
- `flink.starrocks.*`: the configuration information of your StarRocks cluster. For more information, see Load data by using flink-connector-starrocks.

```
[db]
host = 192.168.1.1
port = 3306
user = root
password =
[other]
# number of backends in StarRocks
be_num = 3
# `decimal_v3` is supported since StarRocks-1.18.1
use_decimal_v3 = false
# file to save the converted DDL SQL
output_dir = ./result
[table-rule.1]
# pattern to match databases for setting properties
database = ^console_19321.*$
# pattern to match tables for setting properties
table = ^.*$
############################################
### flink sink configurations
### DO NOT set `connector`, `table-name`, `database-name`, they are auto-generated
############################################
flink.starrocks.jdbc-url=jdbc:mysql://192.168.1.1:9030
flink.starrocks.load-url= 192.168.1.1:8030
flink.starrocks.username=root
flink.starrocks.password=
flink.starrocks.sink.properties.column_separator=\x01
flink.starrocks.sink.properties.row_delimiter=\x02
flink.starrocks.sink.buffer-flush.interval-ms=15000
```
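The `database` and `table` values in a `[table-rule.N]` section are regular expressions matched against MySQL database and table names. As a minimal illustration of the regex semantics only (Python here; how SMT applies these patterns internally may differ in details), the example values above behave like this:

```python
import re

# Patterns taken from the example [table-rule.1] section above.
DATABASE_PATTERN = r"^console_19321.*$"
TABLE_PATTERN = r"^.*$"

def rule_matches(database: str, table: str) -> bool:
    """Return True if the database/table pair is selected by the rule."""
    return bool(re.match(DATABASE_PATTERN, database)) and bool(
        re.match(TABLE_PATTERN, table)
    )

print(rule_matches("console_19321_orders", "orders"))  # True
print(rule_matches("analytics", "orders"))             # False
```

Because the patterns are anchored with `^` and `$`, a database name must start with `console_19321` to be picked up, while `^.*$` matches every table in those databases.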
Use the statements that are generated by SMT to create StarRocks tables and Apache Flink tables. All generated statements are saved in the result directory.
```
$ ./starrocks-migrate-tool
$ ls result
flink-create.1.sql  smt.tar.gz  starrocks-create.all.sql
flink-create.all.sql  starrocks-create.1.sql
```
Create a StarRocks table.
```
mysql -hxx.xx.xx.x -P9030 -uroot -p < starrocks-create.1.sql
```
Create an Apache Flink table and then start a data synchronization task. Once the synchronization task begins, all existing data and subsequent modifications to data are synchronized to StarRocks.
> Note: When you perform this step, ensure that the Apache Flink cluster has been started. If it has not, run `flink/bin/start-cluster.sh` to start it.

```
bin/sql-client.sh -f flink-create.1.sql
```
Check the status of the synchronization task. If an error occurs, you can view the task log for the error message and then adjust the memory and the slot in the system configuration of conf/flink-conf.yaml. For more information, see Configuration in Apache Flink.
```
bin/flink list
```
## Usage notes
If you configure multiple match rules, you need to configure the database pattern, the table pattern, and the flink-connector-starrocks settings for each rule.

```
[table-rule.1]
# pattern to match databases for setting properties
database = ^console_19321.*$
# pattern to match tables for setting properties
table = ^.*$
############################################
### flink sink configurations
### DO NOT set `connector`, `table-name`, `database-name`, they are auto-generated
############################################
flink.starrocks.jdbc-url=jdbc:mysql://192.168.1.1:9030
flink.starrocks.load-url= 192.168.1.1:8030
flink.starrocks.username=root
flink.starrocks.password=
flink.starrocks.sink.properties.column_separator=\x01
flink.starrocks.sink.properties.row_delimiter=\x02
flink.starrocks.sink.buffer-flush.interval-ms=15000
[table-rule.2]
# pattern to match databases for setting properties
database = ^database2.*$
# pattern to match tables for setting properties
table = ^.*$
############################################
### flink sink configurations
### DO NOT set `connector`, `table-name`, `database-name`, they are auto-generated
############################################
flink.starrocks.jdbc-url=jdbc:mysql://192.168.1.1:9030
flink.starrocks.load-url= 192.168.1.1:8030
flink.starrocks.username=root
flink.starrocks.password=
# If you cannot select a suitable separator for the loaded data, you can use JSON format by replacing flink.starrocks.sink.properties.column_separator and flink.starrocks.sink.properties.row_delimiter with the following parameters. Note: By doing so, the import performance will be impacted.
flink.starrocks.sink.properties.strip_outer_array=true
flink.starrocks.sink.properties.format=json
```
> Note: If you want to configure more parameters, such as the frequency to load data, see [Load data by using Flink-connector-starrocks](../loading/Flink-connector-starrocks.md) for more information about `sink`.
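To see why the JSON format sidesteps the separator problem, consider data whose values could collide with any single-character separator. A small standard-library Python sketch (the payload shape assumes `format=json` with `strip_outer_array=true`, where a batch is sent as one JSON array and each element is loaded as a row):

```python
import json

# Rows whose values could collide with single-character separators.
rows = [
    {"id": 1, "name": "plain"},
    {"id": 2, "name": "contains \x01 and \x02 bytes"},
]

# With format=json and strip_outer_array=true, a batch is one JSON array;
# the outer array is stripped and each element becomes a row.
payload = json.dumps(rows)
decoded = json.loads(payload)
print(len(decoded))  # 2 rows survive the round trip regardless of content
```

JSON escapes control characters instead of using them as framing, which is why no separator can be corrupted by the data itself; the trade-off, as noted above, is lower import performance.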
You can configure an individual match rule for a large table that is sharded. For example, you have two databases, `edu_db_1` and `edu_db_2`, and each database contains two tables, `course_1` and `course_2`. All of these tables use the same schema. You can use the following configuration to load the preceding four tables into StarRocks:

```
[table-rule.3]
# pattern to match databases for setting properties
database = ^edu_db_[0-9]*$
# pattern to match tables for setting properties
table = ^course_[0-9]*$
############################################
### flink sink configurations
### DO NOT set `connector`, `table-name`, `database-name`, they are auto-generated
############################################
flink.starrocks.jdbc-url=jdbc:mysql://192.168.1.1:9030
flink.starrocks.load-url= 192.168.1.1:8030
flink.starrocks.username=root
flink.starrocks.password=
flink.starrocks.sink.properties.column_separator=\x01
flink.starrocks.sink.properties.row_delimiter=\x02
flink.starrocks.sink.buffer-flush.interval-ms=5000
```
After the tables are loaded, StarRocks generates a new table named `course_auto_shared`. You can modify the name of the table in the configuration file that is automatically generated.
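As a quick check that rule 3 indeed covers all four shards, the two patterns can be evaluated against the example database/table names (an illustrative Python sketch, not SMT code):

```python
import re

DB_PATTERN = r"^edu_db_[0-9]*$"
TABLE_PATTERN = r"^course_[0-9]*$"

# The four sharded source tables from the example above.
shards = [
    ("edu_db_1", "course_1"), ("edu_db_1", "course_2"),
    ("edu_db_2", "course_1"), ("edu_db_2", "course_2"),
]

matched = [
    (db, tbl) for db, tbl in shards
    if re.match(DB_PATTERN, db) and re.match(TABLE_PATTERN, tbl)
]
print(len(matched))  # all 4 shards fall under the single match rule
```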
If you want to create tables and synchronize data by using the command line of the SQL Client, you need to escape the `\` (backslash) character:

```
'sink.properties.column_separator' = '\\x01'
'sink.properties.row_delimiter' = '\\x02'
```
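The effect of the doubled backslash is easy to see with string literals in any language. A Python sketch (Python's escape handling is used here only as an analogy for how the SQL Client parses the literal):

```python
# "\\x01" in source code is four characters (backslash, x, 0, 1):
# the escape sequence survives parsing and is interpreted downstream.
escaped = "\\x01"

# "\x01" is parsed immediately into the single control byte 0x01.
raw = "\x01"

print(len(escaped), len(raw))  # 4 1
```

Writing `'\\x01'` in the SQL Client therefore delivers the literal text `\x01` to the connector, which then resolves it to the intended one-byte separator.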