Custom Object Storage
Columnar data for the instance is persisted in object storage. By default, the Deployer will automatically create an object storage bucket and configure the instance to use it. However, you can opt out of this behaviour by deselecting the Create initial external storage option.
If you opt out, you need to use the CREATE EXTERNAL STORAGE and CREATE EXTERNAL LOCATION commands to configure the object storage location.
You will need to create an object storage bucket if you do not already have one, and note some of its properties and access details to pass into the creation commands. To avoid extra cost and minimise latency, it is strongly recommended to set up this storage in the same cloud provider and region as the one in which the instance has been deployed.
Note that in general, storage capacity is unlimited. If necessary, you can apply disk quotas to limit the size of individual databases, schemas, and tables.
Using an AWS S3 Bucket
Step 1: Create a Bucket and IAM User
The following examples are based on a bucket named my-yellowbrick-storage-bucket.
Create a general-purpose S3 bucket in the same region as your deployment, either by using the AWS Management Console or the AWS CLI.
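As a sketch of the CLI route, the bucket can be created with a single command; the bucket name and region below are examples to substitute with your own:

```shell
# Create the bucket in the same region as the instance
# (bucket name and region are examples).
aws s3 mb s3://my-yellowbrick-storage-bucket --region us-west-2
```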
Create a dedicated IAM user for the bucket and assign it the following two policies. The first policy grants all S3 actions on the objects in the bucket. Here is an example of such a policy definition in JSON:
{
  "PolicyName": "S3BucketObjectsPolicy",
  "PolicyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:*"
        ],
        "Resource": [
          "arn:aws:s3:::my-yellowbrick-storage-bucket/*"
        ]
      }
    ]
  }
}
The second policy grants the s3:GetBucketLocation and s3:ListBucket actions on the bucket itself. Here is an example of such a definition in JSON:
{
  "PolicyName": "S3BucketPolicy",
  "PolicyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:GetBucketLocation",
          "s3:ListBucket"
        ],
        "Resource": [
          "arn:aws:s3:::my-yellowbrick-storage-bucket"
        ]
      }
    ]
  }
}
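The user and both policies can also be created with the AWS CLI, as sketched below; the user name and file names are examples, and each file contains only the PolicyDocument portion of the JSON shown above:

```shell
# Create the dedicated IAM user (name is an example).
aws iam create-user --user-name yb-storage-user

# Attach the two policies shown above as inline policies.
# Each file holds only the "PolicyDocument" object.
aws iam put-user-policy --user-name yb-storage-user \
  --policy-name S3BucketObjectsPolicy \
  --policy-document file://objects-policy.json
aws iam put-user-policy --user-name yb-storage-user \
  --policy-name S3BucketPolicy \
  --policy-document file://bucket-policy.json
```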
Step 2: Create an Access Key and Configure the Instance
Once the user has been created, you will need to create an IAM access key associated with it; when prompted for a use case, select Application running on an AWS compute service.
During key creation you will be provided with an Access key and a Secret access key. Store these two values securely, as they will be required to configure the external storage.
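With the AWS CLI, the key can be generated as follows (the user name is an example); the response contains the AccessKeyId and SecretAccessKey values:

```shell
# Generate an access key pair for the dedicated user.
# Record AccessKeyId and SecretAccessKey from the output.
aws iam create-access-key --user-name yb-storage-user
```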
Once the key details are known, you can set up the storage in the Yellowbrick Database using SQL:
CREATE EXTERNAL STORAGE inst-storage TYPE s3 ENDPOINT 'https://s3.<region>.amazonaws.com' REGION '<region>' IDENTITY '<access key>' CREDENTIAL '<secret access key>';
CREATE EXTERNAL LOCATION storage-location path '<bucket name>' EXTERNAL STORAGE inst-storage USAGE PRIMARY DEFAULT;
Using an Azure Storage Account
Step 1: Create a Storage Account and Container
You must create a Storage Account for Blob Storage, referred to as Storage V2. Use Locally Redundant Storage (LRS) and set the performance tier to Standard.
Within this Storage Account, create a Storage Container. It is recommended to keep the container private and never enable public access.
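The same setup can be sketched with the Azure CLI; the account, container, and resource-group names below are examples:

```shell
# Standard-performance, LRS, general-purpose v2 account
# (names and region are examples).
az storage account create \
  --name ybstorageacct --resource-group yb-rg \
  --location westus2 --sku Standard_LRS --kind StorageV2

# Private container: public access stays disabled.
az storage container create \
  --name yb-container --account-name ybstorageacct \
  --public-access off

# Retrieve the first of the two access keys.
az storage account keys list \
  --account-name ybstorageacct --resource-group yb-rg \
  --query "[0].value" --output tsv
```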
Step 2: Retrieve Access Keys and Configure the Instance
The Storage Account provides two access keys; either one may be used to configure the storage using SQL:
CREATE EXTERNAL STORAGE inst-storage TYPE azure ENDPOINT 'https://<storage account name>.blob.core.windows.net' REGION '<region>' IDENTITY '<storage account name>' CREDENTIAL '<access key>';
CREATE EXTERNAL LOCATION storage-location path '<storage container name>' EXTERNAL STORAGE inst-storage USAGE PRIMARY DEFAULT;
Using a Google Cloud Storage Bucket
Step 1: Create a Storage Bucket and IAM Service Account
You must create a Google Cloud Storage bucket, then an IAM service account with the necessary permissions to access it. The service account presents itself as an email address, for example <account name>@<project id>.iam.gserviceaccount.com.
Grant the service account at least the following permissions on the bucket:
storage.buckets.delete
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.list
storage.objects.update
A role called YBWorkerCloudStorageAccess, which possesses the necessary permissions, is created during deployment. Otherwise, you will need to create a custom role, or rely on a predefined role that grants more permissions than necessary.
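Assuming the YBWorkerCloudStorageAccess role exists, the bucket, service account, and role binding can be sketched with the gcloud CLI; the bucket, account, and project names are examples:

```shell
# Create the bucket (name and region are examples).
gcloud storage buckets create gs://my-yellowbrick-storage-bucket \
  --location=us-west1

# Create the dedicated service account.
gcloud iam service-accounts create yb-storage \
  --display-name="Yellowbrick storage"

# Bind the deployment-created custom role on the bucket.
gcloud storage buckets add-iam-policy-binding \
  gs://my-yellowbrick-storage-bucket \
  --member="serviceAccount:yb-storage@my-project.iam.gserviceaccount.com" \
  --role="projects/my-project/roles/YBWorkerCloudStorageAccess"
```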
Step 2: Create an Access Key and Configure the Instance
Create an access key for the newly created service account. The key is provided as a JSON document containing various parameters. The private_key field must be extracted and encoded in base64 to be used as the credential in the storage configuration:
CREATE EXTERNAL STORAGE inst-storage TYPE gs ENDPOINT 'https://storage.googleapis.com' REGION '<region>' IDENTITY '<service account email>' CREDENTIAL '<base64 encoded private key>';
CREATE EXTERNAL LOCATION storage-location path '<bucket name>' EXTERNAL STORAGE inst-storage USAGE PRIMARY DEFAULT;
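For example, assuming the key was downloaded as service-account-key.json (a file name chosen here for illustration), the private_key field can be extracted and encoded in one step:

```shell
# Pull the private_key field out of the downloaded key file and
# base64-encode it on a single line for use as the CREDENTIAL.
python3 -c 'import json, base64, sys; print(base64.b64encode(json.load(open(sys.argv[1]))["private_key"].encode()).decode())' service-account-key.json
```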