
Azure Blob Storage Examples

The following ybload examples show how to load Yellowbrick tables from Azure Blob storage.

Load a table from a single CSV file in an Azure container:

In this example, the Azure storage account is ybbobr, and the source file match.csv was uploaded to a container called premdb. The --object-store-identity and --object-store-credential options are required here; the --object-store-endpoint option is optional.

$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
--object-store-identity 'ybbobr' \
--object-store-credential '****************************************' \
--object-store-endpoint "https://ybbobr.blob.core.windows.net/" \
azure://premdb/match.csv
Password for user bobr:
18:49:56.744 [ INFO] ABOUT CLIENT:
   app.cli_args         = "-d" "premdb" "--username" "<USERNAME>" "-W" "-t" "match" "--format" "csv" "--delimiter" "," "--bad-row-file" "<FILENAME>" "--object-store-identity" "********" "--object-store-credential" "********" "--object-store-endpoint" "<ENDPOINT>" " azure://premdb/match.csv"
   app.name_and_version = "ybload version 4.0.1-22337"
   java.version         = "1.8.0_101"
   jvm.memory           = "981.50 MB (max=14.22 GB)"
   jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
   jvm.options          = "-Xmx16g, -Xms1g, -Dapp.name=ybload, -Dapp.pid=23460, -Dapp.repo=/opt/ybtools/lib, -Dapp.home=/opt/ybtools, -Dbasedir=/opt/ybtools"
   jvm.vendor           = "Oracle Corporation"
   os.name_and_version  = "Linux 4.4.0-31-generic (amd64)"
 
18:49:56.821 [ INFO] Gathering metadata on input files
18:49:56.831 [ INFO] Loaded file handler for azure://
18:49:56.863 [ INFO]
Configuration (Azure client):
   endpoint           : https://ybbobr.blob.core.windows.net/
   identity/credential: ********
18:49:56.918 [ INFO] Assuming source encoding matches database server encoding: LATIN9
18:49:58.878 [ INFO] Starting source azure://premdb/match.csv
18:49:59.026 [ INFO] Using database locale: C
18:49:59.032 [ INFO] Auto-detected line separator = '\n'
18:49:59.154 [ INFO]
Configuration (pipeline):
   numSources         : 1
   numReaders         : 4
   numParsersPerReader: 2
Configuration (record/field separation):
   --format                     : CSV
   --delimiter                  : ,
   --linesep                    : \n
   --quote-char                 : "
   --escape-char                : "
   --no-trim-white             
   --skip-blank-lines          
   --on-missing-field           : ERROR
   --on-extra-field             : ERROR
   --on-unescaped-embedded-quote: ERROR
Configuration (pre-parsing):
   --on-zero-char            : ERROR
   --on-string-too-long      : ERROR
   --on-invalid-char         : REMOVE
   --no-convert-ascii-control
   --no-convert-c-escape    
Configuration (session):
   tableName       : "premdb"."public"."match"
   keepAliveSeconds: 60
   maxBadRows      : Unlimited
   sessionKey      : DKkIwVzD4vaaAyCKu43oNuucwSpn-avzIRYo9xfdY6hmWUNdCIkiZ0awIOzj5S__
Configuration (transaction):
   transactionType    : BySize
   rowsPerTransaction : Unlimited
   bytesPerTransaction: 1.0TB
18:49:59.618 [ INFO] Bad rows will be written to /home/brumsby/newazurebad
18:50:00.138 [ INFO] Opening transaction #1 for match ...
18:50:00.223 [ INFO] Opened transaction #1 for match
18:50:00.231 [ INFO] Flushing last 8606 rows (of 8606 total) in transaction #1 for match
READ:305.8KB(88.83KB/s). ROWS G/B: 8606/0( 2.44K/s). WRITE:285.7KB(83.01KB/s).  TIME E/R:   0:00:03/ --:--:--
18:50:00.256 [ INFO] Committing 8606 rows into transaction #1 for match ...
18:50:01.446 [ INFO] Committed transaction #1 after a total of 292604 bytes and 8606 good rows for match
18:50:01.464 [ INFO] READ:305.8KB(66.00KB/s). ROWS G/B: 8606/0( 1.81K/s). WRITE:285.7KB(61.67KB/s).  TIME E/R:   0:00:04/ --:--:--
18:50:01.465 [ INFO] SUCCESSFUL BULK LOAD: Loaded 8606 good rows in   0:00:04 (READ: 66.00KB/s WRITE: 61.67KB/s)
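
In a scripted load, it can be useful to confirm that the output ends with the SUCCESSFUL BULK LOAD line. The following is a minimal sketch, assuming the ybload output was captured to a file and that the summary line has the form shown in the log above. The function name loaded_rows and the idea of a captured log file are illustrative assumptions, not part of ybload.

```shell
# Print the number of good rows reported in a captured ybload log.
# Fails (non-zero exit) if the log has no "SUCCESSFUL BULK LOAD" line.
# Assumes the summary line format shown in the example output above.
loaded_rows() {
  sed -n 's/.*SUCCESSFUL BULK LOAD: Loaded \([0-9][0-9]*\) good rows.*/\1/p' "$1" | grep .
}
```

For example, loaded_rows ybload.log (where ybload.log is a hypothetical capture file) would print the row count from the summary line, or return a non-zero status if the load did not report success.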

Load a table from multiple CSV files in an Azure container:

In this example, the URI for the load is azure://premdb/match0. Five files are found in the premdb container with the prefix match0:

$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
--object-store-credential '****************************************' \
--object-store-identity 'ybbobr' azure://premdb/match0
Password for user bobr: 
20:40:03.409 [ INFO] ABOUT CLIENT:
...
20:40:03.479 [ INFO] Gathering metadata on input files
20:40:03.489 [ INFO] Loaded file handler for azure://
20:40:03.522 [ INFO] 
Configuration (Azure client):
   endpoint           : null
   identity/credential: ********
20:40:03.572 [ INFO] Assuming source encoding matches database server encoding: LATIN9
20:40:05.474 [ INFO] Expanded azure://premdb/match0 into 5 sources
20:40:05.486 [ INFO] Choosing to read sources concurrently because most sources are likely to be slow
20:40:05.829 [ INFO] Starting source azure://premdb/match04.csv
20:40:05.946 [ INFO] Starting source azure://premdb/match03.csv
20:40:05.947 [ INFO] Starting source azure://premdb/match05.csv
20:40:05.949 [ INFO] Starting source azure://premdb/match01.csv
20:40:05.950 [ INFO] Starting source azure://premdb/match02.csv
20:40:05.999 [ INFO] Using database locale: C
20:40:06.004 [ INFO] Auto-detected line separator = '\n'
20:40:06.153 [ INFO] 
Configuration (pipeline):
   numSources             : 5
   readSourcesConcurrently: 5
   numReaders             : 4
   numParsersPerReader    : 2
Configuration (record/field separation):
...

The same load could be run by specifying the five sources explicitly at the end of the command:

azure://premdb/match01.csv
azure://premdb/match02.csv
azure://premdb/match03.csv
azure://premdb/match04.csv
azure://premdb/match05.csv
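
When the file names follow a numbered pattern, the explicit list can be generated rather than typed. The following is a small sketch in POSIX shell; the function name build_sources is hypothetical, and the match%02d.csv pattern is taken from the file names in this example.

```shell
# Print one azure:// URI per numbered file, e.g. match01.csv .. match05.csv.
# Arguments: container name, first number, last number.
build_sources() {
  container=$1
  i=$2
  while [ "$i" -le "$3" ]; do
    printf 'azure://%s/match%02d.csv\n' "$container" "$i"
    i=$((i + 1))
  done
}

build_sources premdb 1 5
```

The printed URIs could then be placed at the end of the ybload command in place of the azure://premdb/match0 prefix.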

Load a table from files in separate Azure containers:

The following command loads a table from two files, one in the premdb container and one in the premdbnew container:

$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
--object-store-credential '****************************************' \
--object-store-identity 'ybbobr' \
azure://premdb/match05.csv \
azure://premdbnew/match01.csv

Use a properties file to load a table from an Azure container:

First create a properties file on the client system. For example:

$ more ybbobr.properties

yb.file.endpoint = https://ybbobr.blob.core.windows.net
yb.file.identity = ybbobr
yb.file.credential = ****************************************
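
Because the properties file holds the storage credential, you may want to create it so that only the current user can read it. The following is a sketch of one way to do that. The function name write_props is hypothetical, the credential value is elided as in the listing above, and the restrictive permissions are a general precaution rather than a documented ybload requirement.

```shell
# Write the three yb.file.* properties to the given path with
# owner-only permissions. The credential value is a placeholder.
write_props() {
  props_file=$1
  (
    umask 077   # applies only inside this subshell, and only to new files
    cat > "$props_file" <<'EOF'
yb.file.endpoint = https://ybbobr.blob.core.windows.net
yb.file.identity = ybbobr
yb.file.credential = ****************************************
EOF
  )
  chmod 600 "$props_file"   # also covers the case where the file already existed
}
```

Calling write_props ybbobr.properties would produce a file with the same contents as the listing above.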

Now run the ybload command and name the ybbobr.properties file in the --object-store-provider-config option:

$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
--object-store-provider-config ybbobr.properties \
azure://premdb/match05.csv
Password for user bobr: 
21:51:15.868 [ INFO] ABOUT CLIENT:
   app.cli_args         = "-d" "premdb" "--username" "<USERNAME>" "-W" "-t" "match" "--format" "csv" "--delimiter" "," "--bad-row-file" "<FILENAME>" "--object-store-provider-config" "<FILENAME>.properties" "azure://<URL>.csv"
   app.name_and_version = "ybload version 4.1.0-22656"
   java.version         = "1.8.0_101"
   jvm.memory           = "981.50 MB (max=14.22 GB)"
   jvm.name_and_version = "Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)"
   jvm.options          = "-Xmx16g, -Xms1g, -Dapp.name=ybload, -Dapp.pid=6255, -Dapp.repo=/opt/ybtools/lib, -Dapp.home=/opt/ybtools, -Dbasedir=/opt/ybtools"
   jvm.vendor           = "Oracle Corporation"
   os.name_and_version  = "Linux 4.4.0-31-generic (amd64)"

21:51:15.950 [ INFO] Gathering metadata on input files
21:51:15.971 [ INFO] Loaded file handler for azure://
21:51:16.009 [ INFO] 
Configuration (Azure client):
   endpoint           : https://ybbobr.blob.core.windows.net
   identity/credential: ********
...

Load a compressed file from an Azure container:

This example shows that ybload can load compressed .gz files from an object store. You do not need to specify any compression options; the file is recognized by its extension.

$ ybload -d premdb --username bobr -W -t match --format csv --delimiter ',' --bad-row-file '/home/brumsby/newazurebad' \
--object-store-provider-config ybbobr.properties \
azure://premdbnew/match01.gz
...
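
Since the file is recognized by its extension, a local CSV file needs a .gz name when it is compressed before upload. The following sketch assumes a standard gzip utility on the client system; the function name compress_for_load is hypothetical, and uploading the result to the container is outside its scope.

```shell
# Compress <name>.csv to <name>.gz, verify the archive, and print its path.
compress_for_load() {
  src=$1
  out="${src%.csv}.gz"
  gzip -c "$src" > "$out" && gzip -t "$out" && printf '%s\n' "$out"
}
```

The original CSV file is left in place; only the .gz file needs to be uploaded.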

Use az login with an authentication code:

The az login command in this example prompts you to authenticate with a code (OIDC token) on the Microsoft web portal. After you authenticate, you can run the ybload command without the --object-store-identity and --object-store-credential options.

$ az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FJCDS23HP to authenticate.
[
  {
   "cloudName": "AzureCloud",
   "homeTenantId": "****************************************",
   "id": "****************************************",
   "isDefault": true,
   "managedByTenants": [],
   "name": "JBond Engineering",
   "state": "Enabled",
   "tenantId": "****************************************",
   "user": {
     "name": "james.bond@agent007.com",
     "type": "user"
   }
  }
]
$ ybload -d premdb --username jbond -W -t match --format csv --delimiter ',' --bad-row-file '/home/jbond/newazurebad' \
--object-store-endpoint "https://ybjbond.blob.core.windows.net/" \
azure://premdb/match.csv
Password for user jbond:
...

Use the Azure CLI to create a service principal for authentication:

This example shows how to:

  • Create a service principal with the reader role (this can be done via the CLI or the web portal)
  • Set the following Azure environment variables:
      • AZURE_CLIENT_ID
      • AZURE_CLIENT_SECRET
      • AZURE_TENANT_ID
  • Assign the Storage Blob Data Reader role to AZURE_CLIENT_ID (again, this can be done via the CLI or the web portal)
  • Use the service principal (and associated environment variables) to authenticate with the az login command
$ az ad sp create-for-rbac -n "https://ybbobr" --role reader --scope /subscriptions/b0093a34-8f38-4752-8f4b-5b9e924ccc16/resourceGroups/yb-cloud-stack
$ export AZURE_CLIENT_ID="****************************************"
$ export AZURE_CLIENT_SECRET="****************************************"
$ export AZURE_TENANT_ID="****************************************"
$ az role assignment create --role "Storage Blob Data Reader" --assignee "$AZURE_CLIENT_ID" --scope /subscriptions/b0093a34-8f38-4752-8f4b-5b9e924ccc16/resourceGroups/yb-cloud-stack/providers/Microsoft.Storage/storageAccounts/ybbobr
$ az login --service-principal --username "$AZURE_CLIENT_ID" --password "$AZURE_CLIENT_SECRET" --tenant "$AZURE_TENANT_ID"
...

At this point, you can run a ybload command using implicit authentication via the established service principal identity and credential.
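
Because the steps above depend on all three environment variables, a script may want to check them before calling az login. The following is a minimal sketch in POSIX shell; the function name check_azure_env is hypothetical and is not part of the Azure CLI or ybload.

```shell
# Return non-zero, naming the first missing variable on stderr, unless all
# three service-principal variables are set and non-empty.
check_azure_env() {
  for name in AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_TENANT_ID; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named in $name
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}
```

A script could run check_azure_env first and proceed to the az login --service-principal command only if it succeeds.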
