Schema Management and Migration with SQLAlchemy and Alembic
This guide introduces how to manage StarRocks schemas using the Python ecosystem (including SQLAlchemy, Alembic, and sqlacodegen) through the starrocks SQLAlchemy dialect. It is designed to help you understand why schema migration is useful and how to use it effectively with StarRocks.
Overview
Many users manage StarRocks tables, views, and materialized views using SQL DDL directly. However, as projects grow, manually maintaining ALTER TABLE statements becomes error-prone and hard to track.
The StarRocks SQLAlchemy dialect (starrocks) provides:
- A full SQLAlchemy model layer for StarRocks tables, views, and materialized views
- Declarative definitions for table schema and table properties (including views and materialized views)
- Integration with Alembic to allow schema changes to be detected and generated automatically
- Compatibility with tools like sqlacodegen for reverse-generating models
This allows Python users to maintain StarRocks schemas in a declarative, version-controlled, and automated way.
Key benefits
Although schema migration is traditionally associated with OLTP databases, it is also valuable in data warehousing systems such as StarRocks. Teams use Alembic together with the StarRocks dialect because of the benefits listed below.
Declarative schema definition
Once you define your schema in Python ORM models or SQLAlchemy Core style, you no longer need to write ALTER TABLE statements by hand.
Automatic diffing and autogeneration
Alembic compares the current StarRocks schema with your SQLAlchemy models and generates migration scripts automatically (CREATE/DROP/ALTER).
Reviewable, version-controlled migrations
Each schema change becomes a migration file (Python), so users can track changes and roll back if needed.
Consistent workflow across environments
Schema changes can be applied to development, staging, and production with the same process.
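One common way to keep that process identical across environments is to read the connection URL from an environment variable in env.py instead of hard-coding it. This is standard Alembic configuration rather than anything StarRocks-specific, and the variable name STARROCKS_URL below is only an illustration:
import os
from alembic import context

config = context.config
# Point Alembic at the cluster for the current environment (dev / staging / prod).
config.set_main_option("sqlalchemy.url", os.environ["STARROCKS_URL"])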
Installation and Connection
Prerequisites
- StarRocks Python client: 1.3.2 or later
- SQLAlchemy: 1.4 or later (SQLAlchemy 2.0 is recommended and is required to use sqlacodegen)
- Alembic: 1.16 or later
Installing StarRocks Python client
Run the following command to install the StarRocks Python client.
pip install starrocks
Connecting to StarRocks
Connect to your StarRocks cluster using the following URL.
starrocks://<user>:<password>@<FE_host>:<query_port>/[<catalog>.]<database>
- user: Username used to connect to the cluster.
- password: User password.
- FE_host: FE IP address.
- query_port: FE query port (Default: 9030).
- catalog: The name of the catalog where your database is located.
- database: The name of the database you want to connect to.
After installation, you can quickly validate connectivity using the following code example:
from sqlalchemy import create_engine, text
# you need to create `mydatabase` first
engine = create_engine("starrocks://root@localhost:9030/mydatabase")
with engine.connect() as conn:
    conn.execute(text("SELECT 1")).fetchall()
    print("Connection successful!")
Defining StarRocks Models (Declarative ORM)
The StarRocks dialect supports:
- Tables
- Views
- Materialized Views
It also supports StarRocks-specific table attributes such as:
- ENGINE (OLAP)
- Key models (DUPLICATE KEY, PRIMARY KEY, UNIQUE KEY, AGGREGATE KEY)
- PARTITION BY variants (RANGE / LIST / expression partitioning)
- DISTRIBUTED BY variants (HASH / RANDOM)
- ORDER BY
- Table properties (e.g., replication_num, storage_medium)
- StarRocks dialect options are passed as keyword arguments prefixed with starrocks_.
- The starrocks_ prefix must be lowercase. The suffix is accepted in either case (for example, PRIMARY_KEY and primary_key).
- If you specify a table key (e.g. starrocks_primary_key="id"), the involved columns must also be marked with primary_key=True in Column(...), so that SQLAlchemy metadata and Alembic autogenerate behave correctly.
Examples below reflect the real public API and parameter names.
Table Example
StarRocks table options can be specified in both ORM (via __table_args__) and Core (via Table(..., starrocks_...=...)) styles.
ORM (Declarative) style
from sqlalchemy import create_engine
from sqlalchemy.orm import Mapped, declarative_base, mapped_column
from starrocks import INTEGER, STRING
# with the same engine as the quick test
engine = create_engine("starrocks://root@localhost:9030/mydatabase")
Base = declarative_base()
class MyTable(Base):
    __tablename__ = 'my_orm_table'

    id: Mapped[int] = mapped_column(INTEGER, primary_key=True)
    name: Mapped[str] = mapped_column(STRING)

    __table_args__ = {
        'comment': 'table comment',
        'starrocks_primary_key': 'id',
        'starrocks_distributed_by': 'HASH(id) BUCKETS 10',
        'starrocks_properties': {'replication_num': '1'}
    }
# Create the table in the database
Base.metadata.create_all(engine)
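If you want to see the CREATE TABLE statement the dialect will emit before (or instead of) calling create_all(), you can compile the SQLAlchemy CreateTable construct against the engine. This is plain SQLAlchemy usage, sketched here against the model above:
from sqlalchemy.schema import CreateTable

# Print the StarRocks DDL that create_all() would execute for this model.
print(CreateTable(MyTable.__table__).compile(engine))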
Core style
from sqlalchemy import Column, MetaData, Table, create_engine
from starrocks import INTEGER, VARCHAR
# with the same engine as the quick test
engine = create_engine("starrocks://root@localhost:9030/mydatabase")
metadata = MetaData()
my_core_table = Table(
    'my_core_table',
    metadata,
    Column('id', INTEGER, primary_key=True),
    Column('name', VARCHAR(50)),
    # StarRocks-specific arguments
    starrocks_primary_key='id',
    starrocks_distributed_by='HASH(id) BUCKETS 10',
    starrocks_properties={"replication_num": "1"}
)
# Create the table in the database
metadata.create_all(engine)
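The examples above only show keys, distribution, and properties. Partitioning and sort keys follow the same starrocks_-prefixed pattern; the sketch below assumes option names such as starrocks_duplicate_key, starrocks_partition_by, and starrocks_order_by and uses expression partitioning, so check Reference [4] for the exact parameter names and accepted values before relying on it:
from sqlalchemy import Column, DateTime, MetaData, Table
from starrocks import INTEGER, VARCHAR

metadata = MetaData()

# Hypothetical duplicate-key table partitioned by day and sorted by event_time.
events = Table(
    'events',
    metadata,
    Column('event_id', INTEGER),
    Column('event_time', DateTime),
    Column('payload', VARCHAR(255)),
    # Assumed option names; verify against Reference [4].
    starrocks_duplicate_key='event_id',
    starrocks_partition_by="date_trunc('day', event_time)",
    starrocks_distributed_by='HASH(event_id)',
    starrocks_order_by='event_time',
    starrocks_properties={'replication_num': '1'},
)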
For a comprehensive reference of table attributes and data types, see Reference [4].
View Example
Below is the recommended view definition style, using columns as a list of dicts (name/comment). This example is based on an existing table my_core_table.
from starrocks.schema import View
# Reuse the metadata from the Core table example above
metadata = my_core_table.metadata
user_view = View(
    "user_view",
    metadata,
    definition="SELECT id, name FROM my_core_table WHERE name IS NOT NULL",
    columns=[
        {"name": "id", "comment": "ID"},
        {"name": "name", "comment": "Name"},
    ],
    comment="Active users",
)
For more View options and limitations, see Reference [5].
Materialized View Example
Materialized views are defined similarly. The starrocks_refresh property is a syntax string that indicates the refresh strategy.
from starrocks.schema import MaterializedView
# Reuse the metadata from the Core table example above
metadata = my_core_table.metadata
# Create a simple Materialized View (asynchronous refresh)
user_stats_mv = MaterializedView(
    'user_stats_mv',
    metadata,
    definition='SELECT id, COUNT(*) AS cnt FROM my_core_table GROUP BY id',
    starrocks_refresh='ASYNC'
)
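The refresh string is passed through as written, so scheduled refreshes can be expressed the same way. A small sketch, assuming the clause string is forwarded verbatim and follows StarRocks REFRESH ASYNC syntax (adjust the interval to your needs):
# A scheduled asynchronous refresh, reusing the metadata and imports above.
hourly_stats_mv = MaterializedView(
    'hourly_stats_mv',
    metadata,
    definition='SELECT id, COUNT(*) AS cnt FROM my_core_table GROUP BY id',
    starrocks_refresh='ASYNC EVERY (INTERVAL 1 HOUR)'
)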
For more options and ALTER limitations, see Reference [6].
Alembic Integration
The StarRocks SQLAlchemy dialect provides full support for:
- Create / Drop table
- Create / Drop view
- Create / Drop materialized view
- Detecting supported changes on StarRocks-specific attributes (for example, table properties and distribution)
This enables Alembic's autogenerate to work properly.
Initializing Alembic
1. Initialize Alembic:

   alembic init migrations

2. Configure your database URL in alembic.ini:

   # alembic.ini
   sqlalchemy.url = starrocks://<user>:<password>@<FE_host>:<query_port>/[<catalog>.]<database>

3. Enable StarRocks dialect logging (optional): you can enable the starrocks logger in alembic.ini to observe the detected changes of a table via logs (a minimal alembic.ini sketch follows this list). For details, see Reference [2].

4. Edit env.py (configure both offline and online paths):

   from alembic import context
   from starrocks.alembic import render_column_type, include_object_for_view_mv
   from starrocks.alembic.starrocks import StarRocksImpl  # noqa: F401 (ensure impl registered)
   from myapp.models import Base  # adjust to your project

   target_metadata = Base.metadata

   def run_migrations_offline() -> None:
       url = context.config.get_main_option("sqlalchemy.url")
       context.configure(
           url=url,
           target_metadata=target_metadata,
           render_item=render_column_type,
           include_object=include_object_for_view_mv
       )
       with context.begin_transaction():
           context.run_migrations()

   def run_migrations_online() -> None:
       # ... create engine and connect as in alembic default env.py ...
       with connectable.connect() as connection:
           context.configure(
               connection=connection,
               target_metadata=target_metadata,
               render_item=render_column_type,
               include_object=include_object_for_view_mv
           )
           with context.begin_transaction():
               context.run_migrations()
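As mentioned in step 3, here is a minimal sketch of the optional logging change in alembic.ini. It assumes your alembic.ini still contains the default logging sections generated by alembic init and that the logger name is starrocks; see Reference [2] for the exact configuration:
[loggers]
keys = root,sqlalchemy,alembic,starrocks

# ... keep the default logger_root, logger_sqlalchemy, and logger_alembic sections ...

[logger_starrocks]
# INFO is usually enough to see detected changes; use DEBUG for more detail.
level = INFO
handlers =
qualname = starrocks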
Generating migrations automatically
alembic revision --autogenerate -m "initial schema"
Alembic will compare SQLAlchemy models with the actual StarRocks schema, and output the correct DDL.
Applying migrations
alembic upgrade head
Downgrade is also supported (where reversible).
StarRocks DDL is not transactional across multiple statements. If an upgrade fails midway, you may need to inspect what has already been applied and perform manual remediation (for example, write a compensating migration or run manual DDL) before re-running.
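Standard Alembic commands can help you work out where a failed upgrade stopped before you remediate (these are plain Alembic CLI commands, not StarRocks-specific):
alembic current            # revision currently recorded in the version table
alembic history --verbose  # full list of revisions, to see what should have run
alembic stamp <revision>   # after manual cleanup, mark the database as being at a given revision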
Supported Schema Change Operations
The dialect supports Alembic autogenerate for:
- Tables: create / drop, and diffing of StarRocks-specific attributes declared via starrocks_* (within StarRocks ALTER support)
- Views: create / drop / alter (mainly definition-related changes; some attributes are immutable)
- Materialized Views: create / drop / alter (limited to mutable clauses such as refresh strategies and properties)
Some StarRocks DDL changes are not reversible or not alterable. You can only make these changes by dropping and recreating the table/view/materialized view. If you specify these changes in the dialect, autogenerate will warn or raise.
End-to-End Example (Recommended Reading for Beginners)
This section shows a runnable end-to-end workflow, including where to pause and review generated files.
Step 1. Create a project directory and initialize Alembic
mkdir my_sr_alembic_project
cd my_sr_alembic_project
alembic init alembic
Step 2. Configure alembic.ini
Edit the URL in alembic.ini:
sqlalchemy.url = starrocks://root@localhost:9030/mydatabase
Step 3. Define your models
Create a package for your models:
mkdir -p myapp
touch myapp/__init__.py
Create myapp/models.py and put your table/view/materialized view definitions in the package:
When using Alembic migrations, do not call metadata.create_all(engine) in your models module.
from sqlalchemy import Column, Table
from sqlalchemy.orm import Mapped, declarative_base, mapped_column
from starrocks import INTEGER, STRING, VARCHAR
from starrocks.schema import MaterializedView, View
Base = declarative_base()
# --- ORM table ---
class MyOrmTable(Base):
    __tablename__ = "my_orm_table"

    id: Mapped[int] = mapped_column(INTEGER, primary_key=True)
    name: Mapped[str] = mapped_column(STRING)

    __table_args__ = {
        "comment": "table comment",
        "starrocks_primary_key": "id",
        "starrocks_distributed_by": "HASH(id) BUCKETS 10",
        "starrocks_properties": {"replication_num": "1"},
    }
# --- Core table on the same metadata (important for Alembic target_metadata) ---
my_core_table = Table(
    "my_core_table",
    Base.metadata,
    Column("id", INTEGER, primary_key=True),
    Column("name", VARCHAR(50)),
    comment="core table comment",
    starrocks_primary_key="id",
    starrocks_distributed_by="HASH(id) BUCKETS 10",
    starrocks_properties={"replication_num": "1"},
)
# --- View ---
user_view = View(
    "user_view",
    Base.metadata,
    definition="SELECT id, name FROM my_core_table WHERE name IS NOT NULL",
    columns=[
        {"name": "id", "comment": "ID"},
        {"name": "name", "comment": "Name"},
    ],
    comment="Active users",
)
# --- Materialized View ---
user_stats_mv = MaterializedView(
    "user_stats_mv",
    Base.metadata,
    definition="SELECT id, COUNT(*) AS cnt FROM my_core_table GROUP BY id",
    starrocks_refresh="ASYNC",
)
Step 4. Configure env.py for autogenerate
Edit alembic/env.py:
- Import myapp.models to set the target_metadata.
- Import render_column_type and include_object_for_view_mv, and set them in both run_migrations_offline() and run_migrations_online() to properly handle views and MVs, and to properly render StarRocks column types.
You need to add or modify these lines in env.py, rather than replace the generated env.py file.
from alembic import context
from starrocks.alembic import render_column_type, include_object_for_view_mv
from starrocks.alembic.starrocks import StarRocksImpl # noqa: F401
from myapp.models import Base
target_metadata = Base.metadata
# Optional: set version table replication for single-BE dev clusters
version_table_kwargs = {"starrocks_properties": {"replication_num": "1"}}
# In both run_migrations_offline() and run_migrations_online(), ensure:
def run_migrations_offline() -> None:
    url = context.config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        render_item=render_column_type,
        include_object=include_object_for_view_mv,
        version_table_kwargs=version_table_kwargs,
    )
    # ... keep the rest of the generated function (begin_transaction / run_migrations) ...

def run_migrations_online() -> None:
    # ... create engine and connect as in alembic default env.py ...
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            render_item=render_column_type,
            include_object=include_object_for_view_mv,
            version_table_kwargs=version_table_kwargs,
        )
        # ... keep the rest of the generated function (begin_transaction / run_migrations) ...
Step 5. Autogenerate the first revision
alembic revision --autogenerate -m "create initial schema"
Pause and review:
- Check the generated migration file under alembic/versions/.
- Ensure it contains the expected operations (for example, create_table, create_view, create_materialized_view); a rough sketch of such a file follows this list.
- Make sure it does not contain unexpected drops or alters.
Step 6. Preview SQL and apply
Preview SQL:
alembic upgrade head --sql
Pause and review:
- Confirm the DDL is in the order you expect.
- Identify any potentially heavy operations and consider splitting migrations if needed.
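If you do decide to split work across revisions, Alembic's offline mode can also render the SQL for a specific range of revisions, which helps when reviewing one heavy migration at a time:
alembic upgrade <from_revision>:<to_revision> --sql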
Apply:
alembic upgrade head
StarRocks DDL is not transactional across multiple statements. If an upgrade fails midway, you may need to inspect what has already been applied and perform manual remediation before re-running.
Step 7. Make a change and autogenerate again
Update myapp/models.py to:
- Modify an existing table (my_core_table): add a column, update the table comment, or change a table property.
- Add a new table (my_new_table).
Adding a column can be a time-consuming schema change. StarRocks allows only one running schema change job per table at a time. In practice, it is recommended to keep "add/drop/modify columns" changes separate from other heavy changes (for example, additional add/drop columns or mass property changes), and split them into multiple Alembic revisions if needed.
from sqlalchemy import Column, Table
from starrocks import INTEGER, VARCHAR
# Modify an existing table (add a column)
# (Update the existing my_core_table definition in-place.)
my_core_table = Table(
    "my_core_table",
    Base.metadata,
    Column("id", INTEGER, primary_key=True),
    Column("name", VARCHAR(50)),
    Column("age", INTEGER),  # added column only
    starrocks_primary_key='id',
    starrocks_distributed_by='HASH(id) BUCKETS 10',
    starrocks_properties={"replication_num": "1"},
)
my_new_table = Table(
    "my_new_table",
    Base.metadata,
    Column("id", INTEGER, primary_key=True),
    Column("name", VARCHAR(50)),
    starrocks_primary_key="id",
    starrocks_distributed_by="HASH(id) BUCKETS 10",
    starrocks_properties={"replication_num": "1"},
)
alembic revision --autogenerate -m "add a new table, change an old table"
Pause and review:
Make sure the new migration contains:
- a create_table(...) for my_new_table, and
- expected operations for the my_core_table changes (for example, add column / set comment / set properties).
Preview SQL and apply:
alembic upgrade head --sql
alembic upgrade head
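Because the add-column change in this step runs as an asynchronous schema change job (see the note above), you may want to confirm the job has finished before applying further changes. A minimal sketch using the same engine as earlier; SHOW ALTER TABLE COLUMN lists the schema change jobs in the current database:
from sqlalchemy import create_engine, text

engine = create_engine("starrocks://root@localhost:9030/mydatabase")
with engine.connect() as conn:
    # Each row describes a schema change job; check the State column (e.g. RUNNING or FINISHED).
    for row in conn.execute(text("SHOW ALTER TABLE COLUMN")).fetchall():
        print(row)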
Using sqlacodegen
sqlacodegen can reverse-generate SQLAlchemy models directly from StarRocks:
sqlacodegen --options include_dialect_options,keep_dialect_types \
--generator tables \
starrocks://<user>:<password>@<FE_host>:<query_port>/[catalog.]<database> > models.py
Supported objects:
- Tables
- Views
- Materialized views
- Partitioning, distribution, and order-by clauses, and properties
This is useful when onboarding an existing StarRocks schema into Alembic.
You can use the above command directly to generate the Python models for the tables/views/materialized views defined in the End-to-End Example section.
- It is recommended to add --generator tables when generating Core-style models (ORM generators may reorder columns according to the NOT NULL/NULL attribute).
- Key columns may be generated as NOT NULL. If you want them nullable, adjust the generated model manually.
Limitations and Best Practices
- Some StarRocks DDL operations require dropping and recreating the table; autogenerate will warn or raise rather than silently producing SQL that cannot be applied.
- Key model changes (for example, changing DUPLICATE KEY to PRIMARY KEY) are not supported via ALTER TABLE; use an explicit plan (usually dropping and recreating with backfill).
- StarRocks does not provide transactional DDL across multiple statements; review generated migrations and apply them operationally. If a migration fails midway, you may need to handle rollback manually.
- For distribution, if you omit the BUCKETS clause, StarRocks may auto-assign the bucket count; the dialect is designed to avoid noisy diffs in that case.
Summary
With the StarRocks SQLAlchemy dialect and Alembic integration, you can:
- Use declarative models to define StarRocks schemas
- Automatically detect and generate schema migration scripts
- Use version control for schema evolution
- Manage views and materialized views declaratively
- Reverse-engineer existing schemas using sqlacodegen
This brings StarRocks schema management into the modern Python data engineering ecosystem and significantly simplifies cross-environment schema consistency.
References
[1]: starrocks-python-client README
[2]: Alembic Integration
[3]: SQLAlchemy details
[4]: Table Support
[5]: View Support