Bulk Load Insert Examples

The following examples show how to use various ybload options in the context of a default INSERT load. See also Parquet Load Examples and Deletes, Updates, and Upserts.

Load the match table with the -d option from a single CSV file

$ ybload -d premdb --username bobr -t match -W --format csv /home/bobr/match.csv 
Password for user bobr: 
18:18:32.650 [ INFO] ABOUT CLIENT:
   app.cli_args         = -d premdb --username bobr -t match -W --format csv /home/bobr/match.csv 
   app.name_and_version = ybload version 2.0.0-10711
   java.home            = /usr/lib/jvm/java-8-oracle/jre
   java.version         = 1.8.0_101
   jvm.memory           = 981.50 MB (max=14.22 GB)
   jvm.name_and_version = Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
   jvm.options          = -Xmx16g, -Xms1g, -ea, -Dapp.name=ybload, -Dapp.pid=19168, -Dapp.repo=/usr/lib/ybtools/lib, -Dapp.home=/usr/lib/ybtools, -Dbasedir=/usr/lib/ybtools
   jvm.vendor           = Oracle Corporation
   os.name_and_version  = Linux 4.4.0-31-generic (amd64)

18:18:32.697 [ INFO] Gathering metadata on input files
18:18:32.914 [ INFO] Assuming source encoding matches database server encoding: LATIN9
18:18:32.946 [ INFO] Starting 1 source PreReaders
18:18:32.967 [ INFO] Auto-detected line separator = '\n'
18:18:32.977 [ INFO] Configuration (record/field separation):
   --format                           : CSV
   --delimiter                        : ,
   --linesep                          : \n
   --quote-char                       : "
   --escape-char                      : "
   --trim-white                       : false
   --skip-blank-lines                 : true
   --on-missing-field                 : ERROR
   --on-extra-field                   : ERROR
   --on-unescaped-embedded-quote      : ERROR
   Internal scanner                   : RecordScannerQuote_1_1
18:18:33.004 [ INFO] Using database locale: C
18:18:33.846 [ INFO] Bad rows will be written to /home/bobr/match.csv.20190129181833.bad
18:18:33.854 [ INFO] Starting import of 1 files
18:18:34.127 [ INFO] Starting 4 segment readers
18:18:34.128 [ INFO] Opening transaction #1...
18:18:34.183 [ INFO] Opened transaction #1
18:18:34.234 [ INFO] Flushing last of 8606 rows for transaction #1
18:18:34.238 [ INFO] Committing 8606 rows into transaction #1...
18:18:34.256 [ INFO] Committed transaction #1 after a total of 292604 bytes and 8606 good rows
18:18:34.275 [ INFO] READ:305.8KB(193.0KB/s). ROWS G/B: 8606/0( 5.30K/s). WRITE:285.7KB(180.3KB/s).  TIME E/R:   0:00:01/ --:--:--
18:18:34.276 [ INFO] SUCCESSFUL BULK LOAD: Loaded 8606 good rows in   0:00:01 (READ: 193.0KB/s WRITE: 180.3KB/s)
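To spot-check the result, count the loaded rows in SQL. The output below assumes the match table was empty before the load, so the count equals the 8606 good rows that ybload reported:

premdb=# select count(*) from match;
 count 
-------
  8606
(1 row)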

Load the same table from a CSV file with a single header line

$ ybload -d premdb --username bobr -t match -W --format csv --num-header-lines 1 /home/bobr/match_with_header.csv 
Password for user bobr: 
...
18:39:24.888 [ INFO] Gathering metadata on input files
18:39:25.163 [ INFO] Assuming source encoding matches database server encoding: LATIN9
18:39:25.233 [ INFO] Starting 1 source PreReaders
18:39:25.256 [ INFO] Auto-detected line separator = '\n'
18:39:25.263 [ INFO] Configuration (record/field separation):
   --format                           : CSV
   --delimiter                        : ,
   --linesep                          : \n
   --quote-char                       : "
   --escape-char                      : "
   --trim-white                       : false
   --skip-blank-lines                 : true
   --on-missing-field                 : ERROR
   --on-extra-field                   : ERROR
   --on-unescaped-embedded-quote      : ERROR
   Internal scanner                   : RecordScannerQuote_1_1
18:39:25.290 [ INFO] Using database locale: C
18:39:26.047 [ INFO] Bad rows will be written to /home/bobr/match_with_header.csv.20190129183926.bad
18:39:26.055 [ INFO] Starting import of 1 files
18:39:26.326 [ INFO] Starting 4 segment readers
18:39:26.327 [ INFO] Opening transaction #1...
18:39:26.374 [ INFO] Opened transaction #1
18:39:26.411 [ INFO] Flushing last of 8606 rows for transaction #1
18:39:26.414 [ INFO] Committing 8606 rows into transaction #1...
18:39:26.433 [ INFO] Committed transaction #1 after a total of 292604 bytes and 8606 good rows
18:39:26.452 [ INFO] READ:305.8KB(194.7KB/s). ROWS G/B: 8606/0( 5.35K/s). WRITE:285.7KB(181.9KB/s).  TIME E/R:   0:00:01/ --:--:--
18:39:26.453 [ INFO] SUCCESSFUL BULK LOAD: Loaded 8606 good rows in   0:00:01 (READ: 194.7KB/s WRITE: 181.9KB/s)
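For reference, match_with_header.csv contains the same data as match.csv with one header line prepended, which --num-header-lines 1 tells ybload to skip. The column names shown here are hypothetical:

$ head -1 /home/bobr/match_with_header.csv
seasonid,matchday,htid,atid,ftscore,htscore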

Load the same table from multiple CSV files

$ ybload -d premdb --username bobr -t match -W --format csv /home/bobr/premdata/matchdata/*
Password for user bobr: 
...
18:50:39.278 [ INFO] Auto-detected line separator = '\n'
18:50:39.288 [ INFO] Configuration (record/field separation):
   --format                           : CSV
   --delimiter                        : ,
   --linesep                          : \n
   --quote-char                       : "
   --escape-char                      : "
   --trim-white                       : false
   --skip-blank-lines                 : true
   --on-missing-field                 : ERROR
   --on-extra-field                   : ERROR
   --on-unescaped-embedded-quote      : ERROR
   Internal scanner                   : RecordScannerQuote_1_1
18:50:39.320 [ INFO] Using database locale: C
18:50:40.119 [ INFO] Bad rows will be written to /home/bobr/ybload.20190129185040.bad
18:50:40.127 [ INFO] Starting import of 3 files
18:50:40.416 [ INFO] Starting 4 segment readers
18:50:40.417 [ INFO] Opening transaction #1...
18:50:40.461 [ INFO] Opened transaction #1
18:50:40.557 [ INFO] Flushing last of 25818 rows for transaction #1
18:50:40.561 [ INFO] Committing 25818 rows into transaction #1...
18:50:40.586 [ INFO] Committed transaction #1 after a total of 877812 bytes and 25818 good rows
18:50:40.607 [ INFO] READ:917.4KB(570.9KB/s). ROWS G/B: 25818/0(15.69K/s). WRITE:857.2KB(533.5KB/s).  TIME E/R:   0:00:01/ --:--:--
18:50:40.607 [ INFO] SUCCESSFUL BULK LOAD: Loaded 25818 good rows in   0:00:01 (READ: 570.9KB/s WRITE: 533.5KB/s)

Load a table with console logging turned off

This example turns off console logging, sends the output to a file, and sets the log level for the output that goes to that file.

$ ybload -d premdb --username bobr -t match -W --format csv --quiet --logfile load_match_log --logfile-log-level INFO /home/bobr/match.csv 
Password for user bobr: 

$ more load_match_log
2019-01-30 16:00:21.830 [ INFO] <main>  ABOUT CLIENT:
   app.cli_args         = -d premdb --username bobr -t match -W --format csv --quiet --logfile load_match_log --logfile-log-level INFO /home/bobr/match.csv 
   app.name_and_version = ybload version 2.0.0-10711
   java.home            = /usr/lib/jvm/java-8-oracle/jre
   java.version         = 1.8.0_101
   jvm.memory           = 981.50 MB (max=14.22 GB)
   jvm.name_and_version = Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
   jvm.options          = -Xmx16g, -Xms1g, -ea, -Dapp.name=ybload, -Dapp.pid=12595, -Dapp.repo=/usr/lib/ybtools/lib, -Dapp.home=/usr/lib/ybtools, -Dbasedir=/usr/lib/ybtools
   jvm.vendor           = Oracle Corporation
   os.name_and_version  = Linux 4.4.0-31-generic (amd64)

2019-01-30 16:00:21.831 [ INFO] <main>  Logfile written to load_match_log
...
2019-01-30 16:00:23.580 [ INFO] <main>  SUCCESSFUL BULK LOAD: Loaded 8606 good rows in   0:00:01 (READ: 179.2KB/s WRITE: 167.5KB/s)
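Because --quiet suppresses console output, scripts typically branch on the command's exit status instead. A minimal sketch, assuming ybload follows the usual convention of exiting 0 on success and nonzero on failure:

$ ybload -d premdb --username bobr -t match -W --format csv --quiet --logfile load_match_log /home/bobr/match.csv \
    && echo "load succeeded" || echo "load failed; see load_match_log"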

Stop and roll back a load if >3 bad rows are found and log the errors

$ ybload -d premdb --username bobr -t team --max-bad-rows 3 -W --quiet --logfile load_team_log --logfile-log-level error /home/bobr/badteam.csv 
Password for user bobr: 

$ more load_team_log
2019-01-30 16:10:46.559 [ERROR] <main>  Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "4,53,Barnsley,Tykes,Barnsley,Oakwell Stadium,23009,0"
2019-01-30 16:10:46.561 [ERROR] <main>  Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "18,67,Everton,Toffees,Liverpool,Goodison Park,40221,38.124"
2019-01-30 16:10:46.561 [ERROR] <main>  Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "31,80,Oldham Athletic,Latics,Oldham,Boundary Park,13309,0"
2019-01-30 16:10:46.562 [ERROR] <main>  Unable to separate row into fields for parsing: Too few fields (8 < 9)
Bad row = "32,81,Portsmouth,Pompey,Portsmouth,Fratton Park,21100,0"
2019-01-30 16:10:46.562 [FATAL] <main>  Number of bad rows exceeded maximum (3)
2019-01-30 16:10:46.620 [FATAL] <main>  FAILED BULK LOAD: Nothing committed
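The rejected rows themselves are also written to a .bad file next to the source file, following the naming pattern shown in the earlier examples, assuming the load progressed far enough to write them (the timestamp here is hypothetical):

$ more /home/bobr/badteam.csv.20190130161046.bad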

Load a table from a text file with a special escape character

$ ybload -d premdb --username bobr -W -t matchstats --format text --escape-char '/' /home/bobr/premdata/matchstats.txt 
Password for user bobr:
...
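To illustrate what the escape character does, here is a hypothetical input line (assuming --delimiter '|' were also given; all field values are invented). The '/' before the embedded '|' marks it as data rather than a field separator:

101|Smith/|Jones|4|2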

Load a table from a pipe-delimited file with literal "null" values

$ ybload -d premdb --username bobr -W -t team --format text --delimiter '|' --nullmarker 'null' /home/bobr/premdata/team2019.txt
Password for user bobr: 
...
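For context, a hypothetical line from team2019.txt (all field values are invented): with --nullmarker 'null', each bare null token is loaded as a SQL NULL rather than as the four-character string 'null':

50|99|Brentford|Bees|London|Brentford Community Stadium|17250|null|null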

Load a table from a CSV file that has an extra field

This example loads a table from a source file that has an extra field at the end of each line; the extra field is removed (not loaded). To achieve this result, the ybload command specifies the --parse-header-line and --on-extra-field remove options.

The season table has four columns, but the source file in this example has five values per line (comma-separated):

$ more season_five_cols.csv
SEASONID,SEASON_NAME,NUMTEAMS,WINNERS
25,2016-2017,20,Chelsea,380
26,2017-2018,20,Manchester City,380
27,2018-2019,20,Manchester City,380
28,2019-2020,20,Liverpool,380
29,2020-2021,20,Manchester City,380
30,2021-2022,20,Manchester City,380

The following load detects the mismatch between the number of table columns and the number of source fields, but succeeds in loading the table by removing the last field from each row:

$ ybload -d premdb -t season -U yellowbrick -W --delimiter ',' --format csv --parse-header-line --on-extra-field remove season_five_cols.csv
Password for user yellowbrick: 
...
Configuration (record/field separation):
   format                  : CSV
   delimiter               : ,
   linesep                 : \n
   quoteChar               : "
   escapeChar              : "
   trimWhite               : false
   trimTrailingWhite       : false
   skipBlankLines          : true
   onMissingField          : ERROR
   onExtraField            : REMOVE
   onUnescapedEmbeddedQuote: ERROR
...

The resulting rows look like this:

premdb=# select * from season where seasonid between 25 and 30;
 seasonid | season_name | numteams |     winners     
----------+-------------+----------+-----------------
       25 | 2016-2017   |       20 | Chelsea
       26 | 2017-2018   |       20 | Manchester City
       27 | 2018-2019   |       20 | Manchester City
       28 | 2019-2020   |       20 | Liverpool
       29 | 2020-2021   |       20 | Manchester City
       30 | 2021-2022   |       20 | Manchester City
(6 rows)

Load a table from a CSV file with a missing field

This example loads a table from a source file that has a missing field at the end of each line; the missing field is filled in with NULL values. The season table has four columns, but the source file in this example has only three values per line (comma-separated). The winners column is defined in the header, but no winners values are present in the data lines:

$ more season_three_cols.csv
SEASONID,SEASON_NAME,NUMTEAMS,WINNERS
25,2016-2017,20
26,2017-2018,20
27,2018-2019,20
28,2019-2020,20
29,2020-2021,20
30,2021-2022,20

To load data from this file, the ybload command specifies the --parse-header-line and --on-missing-field supplynull options. The table is loaded with NULL values for the missing column.

$ ybload -d premdb -t season -U yellowbrick -W --delimiter ',' --format csv --parse-header-line --on-missing-field supplynull season_three_cols.csv
Password for user yellowbrick: 
...
Configuration (record/field separation):
   format                  : CSV
   delimiter               : ,
   linesep                 : \n
   quoteChar               : "
   escapeChar              : "
   trimWhite               : false
   trimTrailingWhite       : false
   skipBlankLines          : true
   onMissingField          : SUPPLYNULL
   onExtraField            : ERROR
   onUnescapedEmbeddedQuote: ERROR
...

The resulting rows look like this:

premdb=# select * from season where seasonid between 25 and 30;
 seasonid | season_name | numteams | winners 
----------+-------------+----------+---------
       25 | 2016-2017   |       20 | [NULL]
       26 | 2017-2018   |       20 | [NULL]
       27 | 2018-2019   |       20 | [NULL]
       28 | 2019-2020   |       20 | [NULL]
       29 | 2020-2021   |       20 | [NULL]
       30 | 2021-2022   |       20 | [NULL]
(6 rows)

Load a calendar table with a specific date format

The --date-style option is set to MDY, which means that ybload can parse and load incoming date values such as 08-13-2018.

$ ybload -d premdb --username bobr -t season_date -W --format text --delimiter '|' --date-style MDY /home/ybdata/season_dates.txt
Database login password:
...
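A hypothetical line from season_dates.txt (the values are invented), showing the MM-DD-YYYY values that --date-style MDY allows ybload to parse:

25|2016-2017|08-13-2016|05-21-2017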

Load a table with a case-sensitive name

Note the -t syntax required to load a table that was created with a quoted identifier:

$ ybload -d premdb --username bobr -W -t \"AwayTeam\" --delimiter ',' /home/premdata/awayteam.csv
...
12:19:02.492 [ INFO] Bulk Loading /home/premdata/awayteam.csv into TABLE "public"."AwayTeam" at 2018-10-27T12:19:01.757-07:00[America/Los_Angeles]
...
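The backslashes prevent the shell from stripping the double quotes, so ybload receives "AwayTeam" as a quoted, case-sensitive identifier. In most shells, wrapping the identifier in single quotes is equivalent:

$ ybload -d premdb --username bobr -W -t '"AwayTeam"' --delimiter ',' /home/premdata/awayteam.csv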