Bulk Load Insert Examples
The following examples show how to use various ybload options in the context of a default INSERT load. See also Parquet Load Examples and Deletes, Updates, and Upserts.
Load the match table with the -d option from a single CSV file
$ ybload -d premdb --username bobr -t match -W --format csv /home/bobr/match.csv
Password for user bobr:
18:18:32.650 [ INFO] ABOUT CLIENT:
app.cli_args = -d premdb --username bobr -t match -W --format csv /home/bobr/match.csv
app.name_and_version = ybload version 2.0.0-10711
java.home = /usr/lib/jvm/java-8-oracle/jre
java.version = 1.8.0_101
jvm.memory = 981.50 MB (max=14.22 GB)
jvm.name_and_version = Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
jvm.options = -Xmx16g, -Xms1g, -ea, -Dapp.name=ybload, -Dapp.pid=19168, -Dapp.repo=/usr/lib/ybtools/lib, -Dapp.home=/usr/lib/ybtools, -Dbasedir=/usr/lib/ybtools
jvm.vendor = Oracle Corporation
os.name_and_version = Linux 4.4.0-31-generic (amd64)
18:18:32.697 [ INFO] Gathering metadata on input files
18:18:32.914 [ INFO] Assuming source encoding matches database server encoding: LATIN9
18:18:32.946 [ INFO] Starting 1 source PreReaders
18:18:32.967 [ INFO] Auto-detected line separator = '\n'
18:18:32.977 [ INFO] Configuration (record/field separation):
--format : CSV
--delimiter : ,
--linesep : \n
--quote-char : "
--escape-char : "
--trim-white : false
--skip-blank-lines : true
--on-missing-field : ERROR
--on-extra-field : ERROR
--on-unescaped-embedded-quote : ERROR
Internal scanner : RecordScannerQuote_1_1
18:18:33.004 [ INFO] Using database locale: C
18:18:33.846 [ INFO] Bad rows will be written to /home/bobr/match.csv.20190129181833.bad
18:18:33.854 [ INFO] Starting import of 1 files
18:18:34.127 [ INFO] Starting 4 segment readers
18:18:34.128 [ INFO] Opening transaction #1...
18:18:34.183 [ INFO] Opened transaction #1
18:18:34.234 [ INFO] Flushing last of 8606 rows for transaction #1
18:18:34.238 [ INFO] Committing 8606 rows into transaction #1...
18:18:34.256 [ INFO] Committed transaction #1 after a total of 292604 bytes and 8606 good rows
18:18:34.275 [ INFO] READ:305.8KB(193.0KB/s). ROWS G/B: 8606/0( 5.30K/s). WRITE:285.7KB(180.3KB/s). TIME E/R: 0:00:01/ --:--:--
18:18:34.276 [ INFO] SUCCESSFUL BULK LOAD: Loaded 8606 good rows in 0:00:01 (READ: 193.0KB/s WRITE: 180.3KB/s)
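To verify the load, query the target table. The count below assumes the match table was empty before the load, so it matches the 8606 good rows reported by ybload:
premdb=# select count(*) from match;
 count
-------
  8606
(1 row)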
Load the same table from a CSV file with a single header line
$ ybload -d premdb --username bobr -t match -W --format csv --num-header-lines 1 /home/bobr/match_with_header.csv
Password for user bobr:
...
18:39:24.888 [ INFO] Gathering metadata on input files
18:39:25.163 [ INFO] Assuming source encoding matches database server encoding: LATIN9
18:39:25.233 [ INFO] Starting 1 source PreReaders
18:39:25.256 [ INFO] Auto-detected line separator = '\n'
18:39:25.263 [ INFO] Configuration (record/field separation):
--format : CSV
--delimiter : ,
--linesep : \n
--quote-char : "
--escape-char : "
--trim-white : false
--skip-blank-lines : true
--on-missing-field : ERROR
--on-extra-field : ERROR
--on-unescaped-embedded-quote : ERROR
Internal scanner : RecordScannerQuote_1_1
18:39:25.290 [ INFO] Using database locale: C
18:39:26.047 [ INFO] Bad rows will be written to /home/bobr/match_with_header.csv.20190129183926.bad
18:39:26.055 [ INFO] Starting import of 1 files
18:39:26.326 [ INFO] Starting 4 segment readers
18:39:26.327 [ INFO] Opening transaction #1...
18:39:26.374 [ INFO] Opened transaction #1
18:39:26.411 [ INFO] Flushing last of 8606 rows for transaction #1
18:39:26.414 [ INFO] Committing 8606 rows into transaction #1...
18:39:26.433 [ INFO] Committed transaction #1 after a total of 292604 bytes and 8606 good rows
18:39:26.452 [ INFO] READ:305.8KB(194.7KB/s). ROWS G/B: 8606/0( 5.35K/s). WRITE:285.7KB(181.9KB/s). TIME E/R: 0:00:01/ --:--:--
18:39:26.453 [ INFO] SUCCESSFUL BULK LOAD: Loaded 8606 good rows in 0:00:01 (READ: 194.7KB/s WRITE: 181.9KB/s)
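The --num-header-lines 1 option tells ybload to skip the first line of the file. For reference, the first two lines of such a file might look as follows (the column names and values here are hypothetical, not taken from the actual match data):
$ head -2 /home/bobr/match_with_header.csv
SEASONID,MATCHDAY,HTID,ATID,FTSCORE,HTSCORE
1,1992-08-15,53,64,1-1,0-0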
Load the same table from multiple CSV files
$ ybload -d premdb --username bobr -t match -W --format csv /home/bobr/premdata/matchdata/*
Password for user bobr:
...
18:50:39.278 [ INFO] Auto-detected line separator = '\n'
18:50:39.288 [ INFO] Configuration (record/field separation):
--format : CSV
--delimiter : ,
--linesep : \n
--quote-char : "
--escape-char : "
--trim-white : false
--skip-blank-lines : true
--on-missing-field : ERROR
--on-extra-field : ERROR
--on-unescaped-embedded-quote : ERROR
Internal scanner : RecordScannerQuote_1_1
18:50:39.320 [ INFO] Using database locale: C
18:50:40.119 [ INFO] Bad rows will be written to /home/bobr/ybload.20190129185040.bad
18:50:40.127 [ INFO] Starting import of 3 files
18:50:40.416 [ INFO] Starting 4 segment readers
18:50:40.417 [ INFO] Opening transaction #1...
18:50:40.461 [ INFO] Opened transaction #1
18:50:40.557 [ INFO] Flushing last of 25818 rows for transaction #1
18:50:40.561 [ INFO] Committing 25818 rows into transaction #1...
18:50:40.586 [ INFO] Committed transaction #1 after a total of 877812 bytes and 25818 good rows
18:50:40.607 [ INFO] READ:917.4KB(570.9KB/s). ROWS G/B: 25818/0(15.69K/s). WRITE:857.2KB(533.5KB/s). TIME E/R: 0:00:01/ --:--:--
18:50:40.607 [ INFO] SUCCESSFUL BULK LOAD: Loaded 25818 good rows in 0:00:01 (READ: 570.9KB/s WRITE: 533.5KB/s)
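The trailing wildcard expands to every file in the matchdata directory, so ybload reports Starting import of 3 files. The file names below are hypothetical:
$ ls /home/bobr/premdata/matchdata/
match1.csv  match2.csv  match3.csv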
Load a table with console logging turned off
This example turns off console logging, sends the output to a file, and sets the log level for the output that goes to that file.
$ ybload -d premdb --username bobr -t match -W --format csv --quiet --logfile load_match_log --logfile-log-level INFO /home/bobr/match.csv
Password for user bobr:
$ more load_match_log
2019-01-30 16:00:21.830 [ INFO] <main> ABOUT CLIENT:
app.cli_args = -d premdb --username bobr -t match -W --format csv --quiet --logfile load_match_log --logfile-log-level INFO /home/bobr/match.csv
app.name_and_version = ybload version 2.0.0-10711
java.home = /usr/lib/jvm/java-8-oracle/jre
java.version = 1.8.0_101
jvm.memory = 981.50 MB (max=14.22 GB)
jvm.name_and_version = Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
jvm.options = -Xmx16g, -Xms1g, -ea, -Dapp.name=ybload, -Dapp.pid=12595, -Dapp.repo=/usr/lib/ybtools/lib, -Dapp.home=/usr/lib/ybtools, -Dbasedir=/usr/lib/ybtools
jvm.vendor = Oracle Corporation
os.name_and_version = Linux 4.4.0-31-generic (amd64)
2019-01-30 16:00:21.831 [ INFO] <main> Logfile written to load_match_log
...
2019-01-30 16:00:23.580 [ INFO] <main> SUCCESSFUL BULK LOAD: Loaded 8606 good rows in 0:00:01 (READ: 179.2KB/s WRITE: 167.5KB/s)
Stop and roll back a load if >3 bad rows are found and log the errors
$ ybload -d premdb --username bobr -t team --max-bad-rows 3 -W --quiet --logfile load_team_log --logfile-log-level error /home/bobr/badteam.csv
Password for user bobr:
$ more load_team_log
2019-01-30 16:10:46.559 [ERROR] <main> Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "4,53,Barnsley,Tykes,Barnsley,Oakwell Stadium,23009,0"
2019-01-30 16:10:46.561 [ERROR] <main> Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "18,67,Everton,Toffees,Liverpool,Goodison Park,40221,38.124"
2019-01-30 16:10:46.561 [ERROR] <main> Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "31,80,Oldham Athletic,Latics,Oldham,Boundary Park,13309,0"
2019-01-30 16:10:46.562 [ERROR] <main> Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "32,81,Portsmouth,Pompey,Portsmouth,Fratton Park,21100,0"
2019-01-30 16:10:46.562 [FATAL] <main> Number of bad rows exceeded maximum (3)
2019-01-30 16:10:46.620 [FATAL] <main> FAILED BULK LOAD: Nothing committed
Load a table from a text file with a special escape character
$ ybload -d premdb --username bobr -W -t matchstats --format text --escape-char '/' /home/bobr/premdata/matchstats.txt
Password for user bobr:
...
Load a table from a pipe-delimited file with literal "null" values
$ ybload -d premdb --username bobr -W -t team --format text --delimiter '|' --nullmarker 'null' /home/bobr/premdata/team2019.txt
Password for user bobr:
...
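With --nullmarker 'null', any field containing exactly the string null is loaded as a NULL value. A line in the source file might look like this (the team columns and values shown are hypothetical):
$ head -1 /home/bobr/premdata/team2019.txt
1|50|Arsenal|Gunners|London|Emirates Stadium|60704|59.387|null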
Load a table from a CSV file that has an extra field
This example loads a table from a source file that has an extra field at the end of each line; the extra field is removed (not loaded). To achieve this result, the ybload command specifies the --parse-header-line and --on-extra-field remove options.
The season table has four columns, but the source file in this example has five comma-separated values per line:
$ more season_five_cols.csv
SEASONID,SEASON_NAME,NUMTEAMS,WINNERS
25,2016-2017,20,Chelsea,380
26,2017-2018,20,Manchester City,380
27,2018-2019,20,Manchester City,380
28,2019-2020,20,Liverpool,380
29,2020-2021,20,Manchester City,380
30,2021-2022,20,Manchester City,380
The following load detects the mismatch between the number of table columns and the number of source fields, but loads the table successfully by removing the last field from each row:
$ ybload -d premdb -t season -U yellowbrick -W --delimiter ',' --format csv --parse-header-line --on-extra-field remove season_five_cols.csv
Password for user yellowbrick:
...
Configuration (record/field separation):
format : CSV
delimiter : ,
linesep : \n
quoteChar : "
escapeChar : "
trimWhite : false
trimTrailingWhite : false
skipBlankLines : true
onMissingField : ERROR
onExtraField : REMOVE
onUnescapedEmbeddedQuote: ERROR
...
The resulting rows look like this:
premdb=# select * from season where seasonid between 25 and 30;
seasonid | season_name | numteams | winners
----------+-------------+----------+-----------------
25 | 2016-2017 | 20 | Chelsea
26 | 2017-2018 | 20 | Manchester City
27 | 2018-2019 | 20 | Manchester City
28 | 2019-2020 | 20 | Liverpool
29 | 2020-2021 | 20 | Manchester City
30 | 2021-2022 | 20 | Manchester City
(6 rows)
Load a table from a CSV file with a missing field
This example loads a table from a source file that has a missing field at the end of each line; the missing field is filled in with NULL values. The season table has four columns, but the source file in this example has only three comma-separated values per line. The winners column is defined in the header, but no winners values are present in the data lines:
$ more season_three_cols.csv
SEASONID,SEASON_NAME,NUMTEAMS,WINNERS
25,2016-2017,20
26,2017-2018,20
27,2018-2019,20
28,2019-2020,20
29,2020-2021,20
30,2021-2022,20
To load data from this file, the ybload command specifies the --parse-header-line and --on-missing-field supplynull options. The table is loaded with NULL values for the missing column.
$ ybload -d premdb -t season -U yellowbrick -W --delimiter ',' --format csv --parse-header-line --on-missing-field supplynull season_three_cols.csv
Password for user yellowbrick:
...
Configuration (record/field separation):
format : CSV
delimiter : ,
linesep : \n
quoteChar : "
escapeChar : "
trimWhite : false
trimTrailingWhite : false
skipBlankLines : true
onMissingField : SUPPLYNULL
onExtraField : ERROR
onUnescapedEmbeddedQuote: ERROR
...
The resulting rows look like this:
premdb=# select * from season where seasonid between 25 and 30;
seasonid | season_name | numteams | winners
----------+-------------+----------+---------
25 | 2016-2017 | 20 | [NULL]
26 | 2017-2018 | 20 | [NULL]
27 | 2018-2019 | 20 | [NULL]
28 | 2019-2020 | 20 | [NULL]
29 | 2020-2021 | 20 | [NULL]
30 | 2021-2022 | 20 | [NULL]
(6 rows)
Load a calendar table with a specific date format
The --date-style
option is set to MDY
, which means that ybload
can parse and load incoming date values such as 08-13-2018
.
$ ybload -d premdb --username bobr -t season_date -W --format text --delimiter '|' --date-style MDY /home/ybdata/season_dates.txt
Password for user bobr:
...
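A line in the source file might look like this, with dates written in month-day-year order (the season_date columns and values shown are hypothetical):
$ head -1 /home/ybdata/season_dates.txt
27|08-13-2018|05-12-2019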
Load a table with a case-sensitive name
Note the -t
syntax required to load a table that was created with a quoted identifier:
$ ybload -d premdb --username bobr -W -t \"AwayTeam\" --delimiter ',' /home/premdata/awayteam.csv
...
12:19:02.492 [ INFO] Bulk Loading /home/premdata/awayteam.csv into TABLE "public"."AwayTeam" at 2018-10-27T12:19:01.757-07:00[America/Los_Angeles]
...
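Quoted identifiers preserve case, so the table name must be quoted both when it is created and when it is referenced. As a sketch, the table might have been created like this (the column definitions are hypothetical):
premdb=# create table "AwayTeam" (atid smallint, name varchar(30));
CREATE TABLE
Without the escaped quotes on the -t option, the name would fold to lowercase and ybload would look for a table named awayteam.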