Version: Candidate-3.3

Authenticate to Microsoft Azure Storage

From v3.0 onwards, StarRocks can integrate with Microsoft Azure Storage (Azure Blob Storage or Azure Data Lake Storage) in the following scenarios:

  • Batch load data from Azure Storage.
  • Back up data from and restore data to Azure Storage.
  • Query Parquet and ORC files in Azure Storage.
  • Query Hive, Iceberg, Hudi, and Delta Lake tables in Azure Storage.

StarRocks supports the following types of Azure Storage accounts:

  • Azure Blob Storage
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2

This topic uses Hive catalog, file external table, and Broker Load as examples to show how StarRocks integrates with each type of Azure Storage account. For information about the parameters in the examples, see Hive catalog, File external table, and Broker Load.

Blob Storage

StarRocks supports using one of the following authentication methods to access Blob Storage:

  • Shared Key
  • SAS Token

NOTE

When you load data or directly query files from Blob Storage, you must use the wasb or wasbs protocol to access your data:

  • If your storage account allows access over HTTP, use the wasb protocol and write the file path as wasb://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>.
  • If your storage account allows access over HTTPS, use the wasbs protocol and write the file path as wasbs://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>.
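The two path formats above differ only in the scheme. As a minimal sketch (the helper name and signature are mine, not part of StarRocks), the URI can be assembled from its parts in Python:

```python
def blob_path(container: str, account: str, path: str, file_name: str,
              https: bool = True) -> str:
    """Build a Blob Storage file path in the format StarRocks expects.

    Uses wasbs:// when the storage account is accessed over HTTPS,
    wasb:// when it is accessed over HTTP.
    """
    scheme = "wasbs" if https else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path}/{file_name}"


# Example: an HTTPS path for a Parquet file
print(blob_path("mycontainer", "mystore", "sales/2023", "data.parquet"))
# → wasbs://mycontainer@mystore.blob.core.windows.net/sales/2023/data.parquet
```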

Shared Key

External catalog

Configure azure.blob.storage_account and azure.blob.shared_key as follows in the CREATE EXTERNAL CATALOG statement:

CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.shared_key" = "<blob_storage_account_shared_key>"
);

File external table

Configure azure.blob.storage_account, azure.blob.shared_key, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.shared_key" = "<blob_storage_account_shared_key>"
);

Broker Load

Configure azure.blob.storage_account, azure.blob.shared_key, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:

LOAD LABEL test_db.label000
(
DATA INFILE("wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>")
INTO TABLE test_ingestion_2
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.shared_key" = "<blob_storage_account_shared_key>"
);

SAS Token

External catalog

Configure azure.blob.storage_account, azure.blob.container, and azure.blob.sas_token as follows in the CREATE EXTERNAL CATALOG statement:

CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.container" = "<blob_container_name>",
"azure.blob.sas_token" = "<blob_storage_account_SAS_token>"
);

File external table

Configure azure.blob.storage_account, azure.blob.container, azure.blob.sas_token, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.container" = "<blob_container_name>",
"azure.blob.sas_token" = "<blob_storage_account_SAS_token>"
);

Broker Load

Configure azure.blob.storage_account, azure.blob.container, azure.blob.sas_token, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:

LOAD LABEL test_db.label000
(
DATA INFILE("wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.container" = "<blob_container_name>",
"azure.blob.sas_token" = "<blob_storage_account_SAS_token>"
);

Data Lake Storage Gen1

StarRocks supports using one of the following authentication methods to access Data Lake Storage Gen1:

  • Managed Service Identity
  • Service Principal

NOTE

When you load data or query files from Data Lake Storage Gen1, you must use the adl protocol to access your data and write the file path as adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>.
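Unlike Blob Storage, there is only one scheme here. A minimal Python sketch of the path format (the helper is hypothetical, shown only to make the URI structure explicit):

```python
def adls1_path(account: str, path: str, file_name: str) -> str:
    # Data Lake Storage Gen1 is accessed only through the adl:// scheme.
    return f"adl://{account}.azuredatalakestore.net/{path}/{file_name}"
```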

Managed Service Identity

External catalog

Configure azure.adls1.use_managed_service_identity as follows in the CREATE EXTERNAL CATALOG statement:

CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls1.use_managed_service_identity" = "true"
);

File external table

Configure azure.adls1.use_managed_service_identity and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls1.use_managed_service_identity" = "true"
);

Broker Load

Configure azure.adls1.use_managed_service_identity and the file path (DATA INFILE) as follows in the LOAD LABEL statement:

LOAD LABEL test_db.label000
(
DATA INFILE("adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls1.use_managed_service_identity" = "true"
);

Service Principal

External catalog

Configure azure.adls1.oauth2_client_id, azure.adls1.oauth2_credential, and azure.adls1.oauth2_endpoint as follows in the CREATE EXTERNAL CATALOG statement:

CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls1.oauth2_client_id" = "<application_client_id>",
"azure.adls1.oauth2_credential" = "<application_client_credential>",
"azure.adls1.oauth2_endpoint" = "<OAuth_2.0_authorization_endpoint_v2>"
);

File external table

Configure azure.adls1.oauth2_client_id, azure.adls1.oauth2_credential, azure.adls1.oauth2_endpoint, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls1.oauth2_client_id" = "<application_client_id>",
"azure.adls1.oauth2_credential" = "<application_client_credential>",
"azure.adls1.oauth2_endpoint" = "<OAuth_2.0_authorization_endpoint_v2>"
);

Broker Load

Configure azure.adls1.oauth2_client_id, azure.adls1.oauth2_credential, azure.adls1.oauth2_endpoint, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:

LOAD LABEL test_db.label000
(
DATA INFILE("adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls1.oauth2_client_id" = "<application_client_id>",
"azure.adls1.oauth2_credential" = "<application_client_credential>",
"azure.adls1.oauth2_endpoint" = "<OAuth_2.0_authorization_endpoint_v2>"
);

Data Lake Storage Gen2

StarRocks supports using one of the following authentication methods to access Data Lake Storage Gen2:

  • Managed Identity
  • Shared Key
  • Service Principal

NOTE

When you load data or query files from Data Lake Storage Gen2, you must use the abfs or abfss protocol to access your data:

  • If your storage account allows access over HTTP, use the abfs protocol and write the file path as abfs://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>.
  • If your storage account allows access over HTTPS, use the abfss protocol and write the file path as abfss://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>.
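Note that Data Lake Storage Gen2 paths use the dfs endpoint, not the blob endpoint. As a sketch under that assumption (helper name is mine, not StarRocks'):

```python
def adls2_path(container: str, account: str, path: str, file_name: str,
               https: bool = True) -> str:
    # abfss:// over HTTPS, abfs:// over HTTP; the host uses the
    # dfs.core.windows.net endpoint, not blob.core.windows.net.
    scheme = "abfss" if https else "abfs"
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{path}/{file_name}"
```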

Managed Identity

Before you start, make the following preparations:

  • Edit the virtual machines (VMs) on which your StarRocks cluster is deployed and add the managed identities to them.
  • Make sure that the managed identities are associated with a role (such as Storage Blob Data Reader) that authorizes read access to the data in your storage account.
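The role assignment in the last step is scoped to the storage account's Azure resource ID. A hedged sketch of how that scope string is composed (the subscription, resource group, and account names are placeholders, and the helper itself is illustrative, not part of StarRocks or the Azure SDK):

```python
def storage_account_scope(subscription_id: str, resource_group: str,
                          account: str) -> str:
    # Azure Resource Manager resource ID of a storage account, used as the
    # scope when assigning a role such as Storage Blob Data Reader.
    return (f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Storage/storageAccounts/{account}")
```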

External catalog

Configure azure.adls2.oauth2_use_managed_identity, azure.adls2.oauth2_tenant_id, and azure.adls2.oauth2_client_id as follows in the CREATE EXTERNAL CATALOG statement:

CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls2.oauth2_use_managed_identity" = "true",
"azure.adls2.oauth2_tenant_id" = "<service_principal_tenant_id>",
"azure.adls2.oauth2_client_id" = "<service_client_id>"
);

File external table

Configure azure.adls2.oauth2_use_managed_identity, azure.adls2.oauth2_tenant_id, azure.adls2.oauth2_client_id, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls2.oauth2_use_managed_identity" = "true",
"azure.adls2.oauth2_tenant_id" = "<service_principal_tenant_id>",
"azure.adls2.oauth2_client_id" = "<service_client_id>"
);

Broker Load

Configure azure.adls2.oauth2_use_managed_identity, azure.adls2.oauth2_tenant_id, azure.adls2.oauth2_client_id, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:

LOAD LABEL test_db.label000
(
DATA INFILE("abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls2.oauth2_use_managed_identity" = "true",
"azure.adls2.oauth2_tenant_id" = "<service_principal_tenant_id>",
"azure.adls2.oauth2_client_id" = "<service_client_id>"
);

Shared Key

External catalog

Configure azure.adls2.storage_account and azure.adls2.shared_key as follows in the CREATE EXTERNAL CATALOG statement:

CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls2.storage_account" = "<storage_account_name>",
"azure.adls2.shared_key" = "<shared_key>"
);

File external table

Configure azure.adls2.storage_account, azure.adls2.shared_key, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls2.storage_account" = "<storage_account_name>",
"azure.adls2.shared_key" = "<shared_key>"
);

Broker Load

Configure azure.adls2.storage_account, azure.adls2.shared_key, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:

LOAD LABEL test_db.label000
(
DATA INFILE("abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls2.storage_account" = "<storage_account_name>",
"azure.adls2.shared_key" = "<shared_key>"
);

Service Principal

Before you start, create a service principal, create a role assignment to grant the service principal a role, and then add the role assignment to your storage account. This ensures that the service principal can access the data in your storage account.
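The <service_principal_client_endpoint> used below is the OAuth 2.0 token endpoint of your Azure AD tenant. A hedged Python sketch of the commonly used endpoint format (this exact format is an assumption; verify your tenant's endpoint in the Azure portal under App registrations > Endpoints):

```python
def oauth2_client_endpoint(tenant_id: str) -> str:
    # Assumed Azure AD OAuth 2.0 token endpoint format; confirm the exact
    # endpoint for your tenant in the Azure portal.
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
```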

External catalog

Configure azure.adls2.oauth2_client_id, azure.adls2.oauth2_client_secret, and azure.adls2.oauth2_client_endpoint as follows in the CREATE EXTERNAL CATALOG statement:

CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls2.oauth2_client_id" = "<service_client_id>",
"azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
"azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);

File external table

Configure azure.adls2.oauth2_client_id, azure.adls2.oauth2_client_secret, azure.adls2.oauth2_client_endpoint, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls2.oauth2_client_id" = "<service_client_id>",
"azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
"azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);

Broker Load

Configure azure.adls2.oauth2_client_id, azure.adls2.oauth2_client_secret, azure.adls2.oauth2_client_endpoint, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:

LOAD LABEL test_db.label000
(
DATA INFILE("abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls2.oauth2_client_id" = "<service_client_id>",
"azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
"azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);