Azure Blob Storage Examples
The following ybload examples show how to load Yellowbrick tables from Azure Blob storage.
Load a table from a single CSV file in an Azure container:
Here is an example of a load from Azure Blob storage. The Azure storage account is ybbobr, and the source file match.csv was uploaded to a container called premdb. The --object-store-identity and --object-store-credential options must be specified; the --object-store-endpoint option is optional.
$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
  --object-store-identity 'ybbobr' \
  --object-store-credential '****************************************' \
  --object-store-endpoint "https://ybbobr.blob.core.windows.net/" \
  azure://premdb/match.csv
Password for user bobr:
18:49:56.744 [ INFO] ABOUT CLIENT:
app.cli_args = "-d" "premdb" "--username" "<USERNAME>" "-W" "-t" "match" "--format" "csv" "--delimiter" "," "--bad-row-file" "<FILENAME>" "--object-store-identity" "********" "--object-store-credential" "********" "--object-store-endpoint" "<ENDPOINT>" " azure://premdb/match.csv"
app.name_and_version = "ybload version 4.0.1-22337"
java.version = "1.8.0_101"
jvm.memory = "981.50 MB (max=14.22 GB)"
jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
jvm.options = "-Xmx16g, -Xms1g, -Dapp.name=ybload, -Dapp.pid=23460, -Dapp.repo=/opt/ybtools/lib, -Dapp.home=/opt/ybtools, -Dbasedir=/opt/ybtools"
jvm.vendor = "Oracle Corporation"
os.name_and_version = "Linux 4.4.0-31-generic (amd64)"
18:49:56.821 [ INFO] Gathering metadata on input files
18:49:56.831 [ INFO] Loaded file handler for azure://
18:49:56.863 [ INFO]
Configuration (Azure client):
endpoint : https://ybbobr.blob.core.windows.net/
identity/credential: ********
18:49:56.918 [ INFO] Assuming source encoding matches database server encoding: LATIN9
18:49:58.878 [ INFO] Starting source azure://premdb/match.csv
18:49:59.026 [ INFO] Using database locale: C
18:49:59.032 [ INFO] Auto-detected line separator = '\n'
18:49:59.154 [ INFO]
Configuration (pipeline):
numSources : 1
numReaders : 4
numParsersPerReader: 2
Configuration (record/field separation):
--format : CSV
--delimiter : ,
--linesep : \n
--quote-char : "
--escape-char : "
--no-trim-white
--skip-blank-lines
--on-missing-field : ERROR
--on-extra-field : ERROR
--on-unescaped-embedded-quote: ERROR
Configuration (pre-parsing):
--on-zero-char : ERROR
--on-string-too-long : ERROR
--on-invalid-char : REMOVE
--no-convert-ascii-control
--no-convert-c-escape
Configuration (session):
tableName : "premdb"."public"."match"
keepAliveSeconds: 60
maxBadRows : Unlimited
sessionKey : DKkIwVzD4vaaAyCKu43oNuucwSpn-avzIRYo9xfdY6hmWUNdCIkiZ0awIOzj5S__
Configuration (transaction):
transactionType : BySize
rowsPerTransaction : Unlimited
bytesPerTransaction: 1.0TB
18:49:59.618 [ INFO] Bad rows will be written to /home/brumsby/newazurebad
18:50:00.138 [ INFO] Opening transaction #1 for match ...
18:50:00.223 [ INFO] Opened transaction #1 for match
18:50:00.231 [ INFO] Flushing last 8606 rows (of 8606 total) in transaction #1 for match
READ:305.8KB(88.83KB/s). ROWS G/B: 8606/0( 2.44K/s). WRITE:285.7KB(83.01KB/s). TIME E/R: 0:00:03/ --:--:--18:50:00.256 [ INFO] Committing 8606 rows into transaction #1 for match ...
18:50:01.446 [ INFO] Committed transaction #1 after a total of 292604 bytes and 8606 good rows for match
18:50:01.464 [ INFO] READ:305.8KB(66.00KB/s). ROWS G/B: 8606/0( 1.81K/s). WRITE:285.7KB(61.67KB/s). TIME E/R: 0:00:04/ --:--:--
18:50:01.465 [ INFO] SUCCESSFUL BULK LOAD: Loaded 8606 good rows in 0:00:04 (READ: 66.00KB/s WRITE: 61.67KB/s)
Load a table from multiple CSV files in an Azure container:
In this example, the URI for the load is azure://premdb/match0. Five files are found in the premdb container with the prefix match0:
$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
  --object-store-credential '****************************************' \
  --object-store-identity 'ybbobr' azure://premdb/match0
Password for user bobr:
20:40:03.409 [ INFO] ABOUT CLIENT:
...
20:40:03.479 [ INFO] Gathering metadata on input files
20:40:03.489 [ INFO] Loaded file handler for azure://
20:40:03.522 [ INFO]
Configuration (Azure client):
endpoint : null
identity/credential: ********
20:40:03.572 [ INFO] Assuming source encoding matches database server encoding: LATIN9
20:40:05.474 [ INFO] Expanded azure://premdb/match0 into 5 sources
20:40:05.486 [ INFO] Choosing to read sources concurrently because most sources are likely to be slow
20:40:05.829 [ INFO] Starting source azure://premdb/match04.csv
20:40:05.946 [ INFO] Starting source azure://premdb/match03.csv
20:40:05.947 [ INFO] Starting source azure://premdb/match05.csv
20:40:05.949 [ INFO] Starting source azure://premdb/match01.csv
20:40:05.950 [ INFO] Starting source azure://premdb/match02.csv
20:40:05.999 [ INFO] Using database locale: C
20:40:06.004 [ INFO] Auto-detected line separator = '\n'
20:40:06.153 [ INFO]
Configuration (pipeline):
numSources : 5
readSourcesConcurrently: 5
numReaders : 4
numParsersPerReader : 2
Configuration (record/field separation):
...
The same load could be run by specifying the five sources explicitly at the end of the command:
azure://premdb/match01.csv
azure://premdb/match02.csv
azure://premdb/match03.csv
azure://premdb/match04.csv
azure://premdb/match05.csv
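When the explicit source list is long, it can be convenient to build it in a small shell script rather than typing each URI. This is a minimal sketch, not part of ybload itself; the container name (premdb) and file names match the example above:

```shell
# Build the explicit list of Azure source URIs that would be appended
# to the end of the ybload command line.
SOURCES=""
for n in 01 02 03 04 05; do
    SOURCES="$SOURCES azure://premdb/match${n}.csv"
done

# Show the argument list (it could then be passed as: ybload ... $SOURCES).
echo "$SOURCES"
```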
Load a table from files in separate Azure containers:
The following command loads a table from two files, one in the premdb container and one in the premdbnew container:
$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
  --object-store-credential '****************************************' \
  --object-store-identity 'ybbobr' \
  azure://premdb/match05.csv \
  azure://premdbnew/match01.csv
Use a properties file to load a table from an Azure container:
First create a properties file on the client system. For example:
$ more ybbobr.properties
yb.file.endpoint = https://ybbobr.blob.core.windows.net
yb.file.identity = ybbobr
yb.file.credential = ****************************************
Now run the ybload command and name the ybbobr.properties file in the --object-store-provider-config option:
$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
  --object-store-provider-config ybbobr.properties \
  azure://premdb/match05.csv
Password for user bobr:
21:51:15.868 [ INFO] ABOUT CLIENT:
app.cli_args = "-d" "premdb" "--username" "<USERNAME>" "-W" "-t" "match" "--format" "csv" "--delimiter" "," "--bad-row-file" "<FILENAME>" "--object-store-provider-config" "<FILENAME>.properties" "azure://<URL>.csv"
app.name_and_version = "ybload version 4.1.0-22656"
java.version = "1.8.0_101"
jvm.memory = "981.50 MB (max=14.22 GB)"
jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
jvm.options = "-Xmx16g, -Xms1g, -Dapp.name=ybload, -Dapp.pid=6255, -Dapp.repo=/opt/ybtools/lib, -Dapp.home=/opt/ybtools, -Dbasedir=/opt/ybtools"
jvm.vendor = "Oracle Corporation"
os.name_and_version = "Linux 4.4.0-31-generic (amd64)"
21:51:15.950 [ INFO] Gathering metadata on input files
21:51:15.971 [ INFO] Loaded file handler for azure://
21:51:16.009 [ INFO]
Configuration (Azure client):
endpoint : https://ybbobr.blob.core.windows.net
identity/credential: ********
...
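Because the properties file contains a storage credential in plain text, it is worth creating it with restrictive permissions. A minimal sketch, using the endpoint and identity from the example above (the credential value here is a placeholder, not a real key):

```shell
# Create the properties file consumed by --object-store-provider-config.
cat > ybbobr.properties <<'EOF'
yb.file.endpoint = https://ybbobr.blob.core.windows.net
yb.file.identity = ybbobr
yb.file.credential = REPLACE_WITH_STORAGE_ACCOUNT_KEY
EOF

# Restrict access: the file holds a secret.
chmod 600 ybbobr.properties
```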
Load a compressed file from an Azure container:
This example shows that ybload can load compressed .gz files from an object store. No compression options need to be specified; the file is recognized by its extension.
$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
  --object-store-provider-config ybbobr.properties \
  azure://premdbnew/match01.gz
...
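To try this locally, a CSV file can be compressed with gzip before it is uploaded to the container. A minimal sketch; the file name and two sample rows are illustrative, not taken from the premdb data set:

```shell
# Write a small sample CSV file (contents are illustrative).
printf '1,home,away\n2,home,away\n' > match01.csv

# Compress it; ybload recognizes the resulting .gz extension automatically.
gzip -f match01.csv          # produces match01.csv.gz

# Verify the archive is intact before uploading it to the container.
gzip -t match01.csv.gz && echo "archive OK"
```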
Use az login with an authentication code:
The az login command in this example prompts for authentication with a code (OIDC token) on the Microsoft web portal. After this is done, the ybload command can be run without the --object-store-identity and --object-store-credential options.
$ az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FJCDS23HP to authenticate.
[
{
"cloudName": "AzureCloud",
"homeTenantId": "****************************************",
"id": "****************************************",
"isDefault": true,
"managedByTenants": [],
"name": "JBond Engineering",
"state": "Enabled",
"tenantId": "****************************************",
"user": {
"name": "james.bond@agent007.com",
"type": "user"
}
}
]
$ ybload -d premdb --username jbond -W -t match --format csv --delimiter ',' --bad-row-file '/home/jbond/newazurebad' \
  --object-store-endpoint "https://ybjbond.blob.core.windows.net/" \
  azure://premdb/match.csv
Password for user jbond:
...
Use the Azure CLI to create a service principal for authentication:
This example shows how to:
- Create a service principal with the reader role (this can be done via the CLI or the web portal)
- Set the following Azure environment variables: AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID
- Assign the Storage Blob Data Reader role to AZURE_CLIENT_ID (again, this can be done via the CLI or the web portal)
- Use the service principal (and associated environment variables) to authenticate with the az login command
$ az ad sp create-for-rbac -n "https://ybbobr" --role reader --scope /subscriptions/b0093a34-8f38-4752-8f4b-5b9e924ccc16/resourceGroups/yb-cloud-stack
$ export AZURE_CLIENT_ID="****************************************"
$ export AZURE_CLIENT_SECRET="****************************************"
$ export AZURE_TENANT_ID="****************************************"
$ az role assignment create --role "Storage Blob Data Reader" --assignee $AZURE_CLIENT_ID --scope /subscriptions/b0093a34-8f38-4752-8f4b-5b9e924ccc16/resourceGroups/yb-cloud-stack/providers/Microsoft.Storage/storageAccounts/ybbobr
$ az login --service-principal --username $AZURE_CLIENT_ID --password $AZURE_CLIENT_SECRET --tenant $AZURE_TENANT_ID
...
At this point, you can run a ybload command using implicit authentication via the established service principal identity and credential.
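Before running az login with a service principal, it can help to confirm that all three environment variables are actually set. A minimal sketch; the check_azure_env function name is hypothetical, not part of the Azure CLI:

```shell
# check_azure_env (hypothetical helper): return nonzero and name the first
# missing service-principal variable, if any.
check_azure_env() {
    for var in AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_TENANT_ID; do
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "ERROR: $var is not set" >&2
            return 1
        fi
    done
    return 0
}

# Typical use, guarding the login:
#   check_azure_env && az login --service-principal \
#       --username "$AZURE_CLIENT_ID" --password "$AZURE_CLIENT_SECRET" \
#       --tenant "$AZURE_TENANT_ID"
```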
Parent topic: Loading from Azure Blob Storage