Authenticate to AWS resources
Understand credential methods used by StarRocks to authenticate AWS resources
Instance profile-based credential
The instance profile-based credential method allows you to grant privileges on AWS resources directly to your StarRocks cluster. In theory, any cluster user who can log in to the cluster can perform the permitted actions on your AWS resources, subject to the AWS IAM policies you have configured. The typical scenario for this method is that you do not need any AWS resource access control between the users of the cluster, because no isolation is required within the same cluster.
However, this method can still be seen as a cluster-level access control solution, because the cluster administrator controls who can log in to the cluster.
Assumed role-based credential
Unlike the instance profile-based credential method, which is a cluster-level access control solution, the assumed role-based credential method is a catalog-level data source access control solution that works by assuming a role in AWS IAM. For more information, see Assuming a role.
Specifically, you can create different catalogs that access different AWS S3 resources, such as the S3 buckets S3-BUCKET-1 and S3-BUCKET-2. This means you can access a different data source by changing the catalog of the current SQL session.
Further, if the cluster administrator grants different users privileges on different catalogs, different users within the same cluster can access different data sources.
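For illustration, here is a minimal query sketch under that setup. The catalog names (catalog_bucket_1 and catalog_bucket_2) and the database and table names are hypothetical; the sketch assumes each catalog has been created with a different assumed role that can read only its own bucket.
-- Query data stored in S3-BUCKET-1 through the first catalog.
SELECT * FROM catalog_bucket_1.db1.orders LIMIT 10;
-- Query data stored in S3-BUCKET-2 through the second catalog.
SELECT * FROM catalog_bucket_2.db2.events LIMIT 10;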
IAM user-based credential
The IAM user-based credential method is also a catalog-level data source access control solution, which works based on IAM users. You can configure different catalogs to use the credentials of different IAM users.
Prepare credentials in AWS IAM
If you choose instance profile-based credential
You need to create an IAM policy like the one below to grant access to your AWS resources, and then attach the policy to the IAM role associated with the EC2 instance on which your StarRocks cluster runs.
Access AWS S3
If you choose AWS S3 as storage, StarRocks accesses your S3 bucket based on the following IAM policy:
NOTICE
Remember to replace <bucket_name> with the name of your S3 bucket.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "s3",
"Effect": "Allow",
"Action": ["s3:GetObject"],
"Resource": ["arn:aws:s3:::<bucket_name>/*"]
},
{
"Sid": "s3list",
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::<bucket_name>"]
}
]
}
Access AWS Glue
If you choose AWS Glue as metastore, StarRocks accesses your AWS Glue Data Catalog based on the following IAM policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetPartition",
"glue:GetPartitions",
"glue:GetTable",
"glue:GetTableVersions",
"glue:GetTables",
"glue:GetConnection",
"glue:GetConnections",
"glue:GetDevEndpoint",
"glue:GetDevEndpoints",
"glue:BatchGetPartition"
],
"Resource": [
"*"
]
}
]
}
If you choose assumed role-based credential
You need to create an assumed role (for example, named s3_role_test) and attach the policy provided in the "Access AWS S3" section of this topic to it. This role will then be assumed by the IAM role associated with the EC2 instance on which your StarRocks cluster runs.
Also, if you choose AWS Glue as metastore, you can create another assumed role (for example, named glue_role_test) and attach the policy provided in the "Access AWS Glue" section of this topic to it. Otherwise, skip creating a separate assumed role for AWS Glue and directly attach the policy to the assumed role you have already created (for example, s3_role_test).
After you finish this, you need to configure a trust relationship between the assumed role and the IAM role associated with the EC2 instance on which your StarRocks cluster runs.
Configure a trust relationship
First, attach a trust policy like the one below to your assumed role s3_role_test:
NOTICE
Remember to replace <cluster_EC2_iam_role_ARN> with the ARN of the IAM role associated with the EC2 instance on which your StarRocks cluster runs.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "<cluster_EC2_iam_role_ARN>"
},
"Action": "sts:AssumeRole"
}
]
}
Then, create and attach a new IAM policy as shown below to the IAM role associated with the EC2 instance on which your StarRocks cluster runs:
NOTICE
Remember to replace <ARN of s3_role_test> with the ARN of your assumed role s3_role_test. You need to replace <ARN of glue_role_test> with the ARN of your assumed role glue_role_test only when you choose AWS Glue as metastore and have created another assumed role named glue_role_test for AWS Glue.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["sts:AssumeRole"],
"Resource": [
"<ARN of s3_role_test>",
"<ARN of glue_role_test>"
]
}
]
}
If you choose IAM user-based credential
You need to create an IAM user and attach the policy provided in the "Access AWS S3" or "Access AWS Glue" section of this topic to it.
Comparison between credential methods
The following figure provides a high-level explanation of the differences in mechanism between instance profile-based credential, assumed role-based credential, and IAM user-based credential in StarRocks.
Build connections with AWS resources
Authentication parameters for accessing AWS S3
In various scenarios in which StarRocks needs to integrate with AWS S3, for example, when you create external catalogs or file external tables, or when you ingest, back up, or restore data from AWS S3, configure the authentication parameters for accessing AWS S3 as follows:
- If you use the instance profile-based credential method to access AWS S3, set aws.s3.use_instance_profile to true.
- If you use the assumed role-based credential method to access AWS S3, set aws.s3.use_instance_profile to true and configure aws.s3.iam_role_arn as the ARN of the assumed role that you use to access AWS S3.
- If you use the IAM user-based credential method to access AWS S3, set aws.s3.use_instance_profile to false and configure aws.s3.access_key and aws.s3.secret_key as the access key and secret key of your AWS IAM user.
Authentication parameters for accessing AWS Glue
In various scenarios in which StarRocks needs to integrate with AWS Glue, for example, when you create external catalogs, configure the authentication parameters for accessing AWS Glue as follows:
- If you use the instance profile-based credential method to access AWS Glue, set aws.glue.use_instance_profile to true.
- If you use the assumed role-based credential method to access AWS Glue, set aws.glue.use_instance_profile to true and configure aws.glue.iam_role_arn as the ARN of the assumed role that you use to access AWS Glue.
- If you use the IAM user-based credential method to access AWS Glue, set aws.glue.use_instance_profile to false and configure aws.glue.access_key and aws.glue.secret_key as the access key and secret key of your AWS IAM user.
Integration examples
External catalog
Creating an external catalog in your StarRocks cluster means building integration with the target data lake system, which is composed of two key components:
- File storage like AWS S3 to store table files
- Metastore like Hive metastore or AWS Glue to store the metadata and locations of table files
StarRocks supports multiple types of external catalogs, such as Hive, Iceberg, and Hudi catalogs.
The following examples create a Hive catalog named hive_catalog_hms or hive_catalog_glue, depending on the type of metastore you use, to query data from your Hive cluster. For detailed syntax and parameters, see Hive catalog.
If you choose instance profile-based credential
If you use Hive metastore in your Hive cluster, run a command like below:
CREATE EXTERNAL CATALOG hive_catalog_hms
PROPERTIES
(
    "type" = "hive",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.region" = "us-west-2",
    "hive.metastore.uris" = "thrift://xx.xx.xx:9083"
);
If you use AWS Glue in your Amazon EMR Hive cluster, run a command like below:
CREATE EXTERNAL CATALOG hive_catalog_glue
PROPERTIES
(
    "type" = "hive",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.region" = "us-west-2",
    "hive.metastore.type" = "glue",
    "aws.glue.use_instance_profile" = "true",
    "aws.glue.region" = "us-west-2"
);
If you choose assumed role-based credential
If you use Hive metastore in your Hive cluster, run a command like below:
CREATE EXTERNAL CATALOG hive_catalog_hms
PROPERTIES
(
    "type" = "hive",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.iam_role_arn" = "arn:aws:iam::081976408565:role/test_s3_role",
    "aws.s3.region" = "us-west-2",
    "hive.metastore.uris" = "thrift://xx.xx.xx:9083"
);
If you use AWS Glue in your Amazon EMR Hive cluster, run a command like below:
CREATE EXTERNAL CATALOG hive_catalog_glue
PROPERTIES
(
    "type" = "hive",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.iam_role_arn" = "arn:aws:iam::081976408565:role/test_s3_role",
    "aws.s3.region" = "us-west-2",
    "hive.metastore.type" = "glue",
    "aws.glue.use_instance_profile" = "true",
    "aws.glue.iam_role_arn" = "arn:aws:iam::081976408565:role/test_glue_role",
    "aws.glue.region" = "us-west-2"
);
If you choose IAM user-based credential
If you use Hive metastore in your Hive cluster, run a command like below:
CREATE EXTERNAL CATALOG hive_catalog_hms
PROPERTIES
(
    "type" = "hive",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "<iam_user_access_key>",
    "aws.s3.secret_key" = "<iam_user_secret_key>",
    "aws.s3.region" = "us-west-2",
    "hive.metastore.uris" = "thrift://xx.xx.xx:9083"
);
If you use AWS Glue in your Amazon EMR Hive cluster, run a command like below:
CREATE EXTERNAL CATALOG hive_catalog_glue
PROPERTIES
(
    "type" = "hive",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "<iam_user_access_key>",
    "aws.s3.secret_key" = "<iam_user_secret_key>",
    "aws.s3.region" = "us-west-2",
    "hive.metastore.type" = "glue",
    "aws.glue.use_instance_profile" = "false",
    "aws.glue.access_key" = "<iam_user_access_key>",
    "aws.glue.secret_key" = "<iam_user_secret_key>",
    "aws.glue.region" = "us-west-2"
);
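After a catalog is created, you can browse and query it directly. A minimal usage sketch, assuming your Hive cluster contains a hypothetical database hive_db with a table hive_table:
-- List the databases visible through the new catalog.
SHOW DATABASES FROM hive_catalog_hms;
-- Query a Hive table by its fully qualified name.
SELECT * FROM hive_catalog_hms.hive_db.hive_table LIMIT 10;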
File external table
File external tables must be created in your internal catalog named default_catalog.
The following examples create a file external table named file_table in an existing database named test_s3_db. For detailed syntax and parameters, see File external table.
If you choose instance profile-based credential
Run a command like below:
CREATE EXTERNAL TABLE test_s3_db.file_table
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "s3://starrocks-test/",
"format" = "ORC",
"aws.s3.use_instance_profile"="true",
"aws.s3.region"="us-west-2"
);
If you choose assumed role-based credential
Run a command like below:
CREATE EXTERNAL TABLE test_s3_db.file_table
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "s3://starrocks-test/",
"format" = "ORC",
"aws.s3.use_instance_profile"="true",
"aws.s3.iam_role_arn"="arn:aws:iam::081976408565:role/test_s3_role",
"aws.s3.region"="us-west-2"
);
If you choose IAM user-based credential
Run a command like below:
CREATE EXTERNAL TABLE test_s3_db.file_table
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "s3://starrocks-test/",
"format" = "ORC",
"aws.s3.use_instance_profile" = "false",
"aws.s3.access_key" = "<iam_user_access_key>",
"aws.s3.secret_key" = "<iam_user_secret_key>",
"aws.s3.region"="us-west-2"
);
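Once created, the file external table can be queried like any other table in default_catalog. A minimal sketch, reusing the columns defined in the examples above:
-- Read directly from the ORC files under the configured S3 path.
SELECT id, attributes FROM test_s3_db.file_table LIMIT 10;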
Ingestion
You can use LOAD LABEL to load data from AWS S3.
The following examples load the data from all Parquet data files stored in the s3a://test-bucket/test_brokerload_ingestion path into the test_ingestion_2 table in an existing database named test_s3_db. For detailed syntax and parameters, see BROKER LOAD.
If you choose instance profile-based credential
Run a command like below:
LOAD LABEL test_s3_db.test_credential_instanceprofile_7
(
DATA INFILE("s3a://test-bucket/test_brokerload_ingestion/*")
INTO TABLE test_ingestion_2
FORMAT AS "parquet"
)
WITH BROKER
(
"aws.s3.use_instance_profile"= "true",
"aws.s3.region"="us-west-1"
)
PROPERTIES
(
"timeout"="1200"
);
If you choose assumed role-based credential
Run a command like below:
LOAD LABEL test_s3_db.test_credential_assumedrole_7
(
DATA INFILE("s3a://test-bucket/test_brokerload_ingestion/*")
INTO TABLE test_ingestion_2
FORMAT AS "parquet"
)
WITH BROKER
(
"aws.s3.use_instance_profile"= "true",
"aws.s3.iam_role_arn" = "arn:aws:iam::081976408565:role/test_s3_role",
"aws.s3.region"="us-west-1"
)
PROPERTIES
(
"timeout"="1200"
);
If you choose IAM user-based credential
Run a command like below:
LOAD LABEL test_s3_db.test_credential_iamuser_7
(
DATA INFILE("s3a://test-bucket/test_brokerload_ingestion/*")
INTO TABLE test_ingestion_2
FORMAT AS "parquet"
)
WITH BROKER
(
"aws.s3.use_instance_profile" = "false",
"aws.s3.access_key" = "<iam_user_access_key>",
"aws.s3.secret_key" = "<iam_user_secret_key>",
"aws.s3.region"="us-west-1"
)
PROPERTIES
(
"timeout"="1200"
);
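Broker Load jobs run asynchronously, so the LOAD LABEL statement returns before the data is visible. A minimal sketch for checking a job's progress by its label, using the label from the first example above:
-- Inspect the load job; the State column progresses from PENDING through LOADING to FINISHED.
SHOW LOAD FROM test_s3_db WHERE LABEL = "test_credential_instanceprofile_7";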