Skip to content

LOAD TABLE

Bulk load a table from external storage by using a SQL command (instead of the ybload client).

LOAD TABLE table_name
[ SOURCE FIELDS (col1 type [, col2 type ] [, ...]) ]
[ FROM ('string1' [,'string2'] [, ...]) ]
EXTERNAL LOCATION location_name 
[ EXTERNAL FORMAT { format_name | 
  (TYPE file_type WITH (option value [, option value ] [, ...])) } ]
[ WITH (option value [, option value ] [, ...]) ]
table_name

Name of a regular user table that is the target for the load. The source files will be read from external storage and written to the table. To run LOAD TABLE commands, you must have BULK LOAD privilege on the database and INSERT privilege on the target table.

SOURCE FIELDS

Specify all of the fields found in the source files and their data types. These fields map to the columns in the target table that you want to load. You must specify all of the fields that appear in the source files, in order, but you can load a table that only contains a subset of those fields. For example, you can load a table with four columns from a source file that has five or more fields. You do not have to specify the same order of columns in the CREATE TABLE statement. The load operation can detect which columns correspond to which source fields by name and data type.

In this example, the source file contains five fields:

...
source fields(seasonid smallint,
  matchday timestamp,
  htid smallint,
  atid smallint,
  moment varchar(5))
...

You can load a table that has one or more columns that correspond to these fields. See also LOAD TABLE Examples.

Conversely, you can also load a table that has columns that do not have corresponding fields in the source file if DEFAULT column constraints are specified in the CREATE TABLE statement. For example, you can load a two-column table from a source file that has only one field. In this case, SOURCE FIELDS identifies the one field that exists in the source file; the second column in the table is loaded with the specified default value. For example:

create table t1 (c1 int, c2 int not null default 1);
...
load table t1 
source fields (c1 int)
...

Note: The SOURCE FIELDS option is not available in the Load Data Assistant.

FROM('string1','string2', ...)

Specify one or more character strings that represent a list of the S3 source files or a "prefix" that identifies those files. These files will be loaded from the specified external location, which is a bucket name.

If you do not specify a FROM list, the load operation will scan the entire bucket and load any files that have a schema that matches the columns in the target table. Therefore, it is recommended that you specify a FROM list or prefix.

If a string exactly matches the key of a specific file stored in the bucket, only that S3 file will be loaded. Otherwise, the string is checked for matches against the beginning part of the key of any objects in the bucket. All of the files that match will be loaded.

For example, the following commands all produce the same results, loading the same two files:

premdb=# load table match from ('/premdb/match.csv','/premdb/matchcopy.csv') external location premdbs3data;
INSERT 0 17212

premdb=# load table match from ('/premdb/match') external location premdbs3data;
INSERT 0 17212

Note: Wildcard patterns are not supported for S3 source files. For example, you cannot use the following FROM clause to load files that have a prefix of match:

from ('/premdb/mat*.csv')

Instead you could use:

from ('/premdb/mat')
EXTERNAL LOCATION

Name of an external location object. See CREATE EXTERNAL LOCATION.

EXTERNAL FORMAT

Name of an external format object. See CREATE EXTERNAL FORMAT. Alternatively, you can define the format in the TYPE clause. You can also omit the EXTERNAL FORMAT clause completely and use the default format settings for the table.

TYPE...WITH

Specify the source file type: CSV, TEXT, PARQUET, or BCP. You must enclose the TYPE clause inside parentheses, and the following WITH clause inside its own set of parentheses:

(TYPE format_type WITH(...))

The WITH clause that follows TYPE defines formatting load options, such as the line separator, field delimiter, and field definitions for different data types. Each name-value pair must consist of a valid option (in this context) and a valid setting for that option. For example:

(type csv with (skip_blank_lines true, delimiter '|'))

See External Format Load Options.

WITH

The WITH clause at the end of the LOAD TABLE statement defines general load-processing options. Each name-value pair must consist of a valid option (in this context) and a valid setting for that option. These are separate and distinct from the formatting load options defined by the external format object or in the TYPE WITH() list.

For example:

with (source_compression 'gz')

See LOAD TABLE Options.

LOAD TABLE Examples

Load a table from a single CSV file:

load table match 
from ('/premdb/match.csv') 
external location premdbs3data;
INSERT 0 8606

Load a table from files that match the prefix /premdb/newmat:

load table newmatchstats 
from ('/premdb/newmat') 
external location premdbs3data;
INSERT 0 24785280

Load a table from a single GZ-compressed file:

load table newmatchstats 
from ('/premdb/newmatchstats25mil.gz') 
external location premdbs3data 
with(source_compression 'gz');
INSERT 0 24785280

Attempt to load some data that is not in the specified file format (given the specified prefix):

load table newmatchstats 
from ('/premdb/newmatchstats') 
external location premdbs3data 
with(source_compression 'gz');
ERROR:  [bulksvc-load] Unable to open channel for s3://ybpremdb/premdb/newmatchstats25mil.csv CAUSED BY: Input is not in the .gz format

Load a table and reference an existing external format object:

load table newmatchstats 
from ('/premdb/newmatchstats25mil.csv') 
external location premdbs3data 
external format premdbs3format;

Load a table and define the external format within the statement, instead of referencing an existing external format object:

load table newmatchstats 
from ('/premdb/newmatchstats25mil.csv') 
external location premdbs3data 
external format (type csv with(skip_blank_lines true, delimiter ','));

Create a new table by selecting four columns from an existing five-column table. Then delete all of its rows. Reload the table by using the same source file that loaded the original five-column table. Specify all five source fields in the LOAD TABLE statement.

create table matchday as 
select seasonid, matchday, htid, atid from newmatchstats;
SELECT  24785280

delete from matchday;
DELETE  24785280

load table matchday
source fields(seasonid smallint,
  matchday timestamp,
  htid smallint,
  atid smallint,
  moment varchar)
from('/premdb/newmatch')
external location premdbs3data;
INSERT  24785280

Create a table with two columns, c1 and c2:

create table t1(c1 int, c2 int);

Load the table from a source file that contains three fields that are in a different order from the corresponding columns in the CREATE TABLE statement:

load table t1 
source fields(c3 int, c2 int, c1 int)
...

LOAD TABLE Options

The WITH clause in the LOAD TABLE statement supports some load-processing options and source options. These options are listed alphabetically.

log_level INFO | OFF | WARN | ERROR | DEBUG | TRACE

Specify the logging level for the load output. The default level is INFO. Other options provide more verbose output.

max_bad_rows NUMBER

Set the maximum number of rejected rows that a load will tolerate before aborting and starting to roll back the transaction. The default is 0 (no bad rows); make sure that the source data you are about to load is clean and consistent if you use the default.

Note: max_bad_rows 100 means 100 bad rows are allowed; the load will fail on the 101st bad row.

read_sources_concurrently ALWAYS | NEVER | ALLOW | <number>

Define the behavior for reading source files in parallel: ALWAYS, NEVER, ALLOW (the default), or a specific number of source files. See "ybload Advanced Processing Options" in the main Yellowbrick documentation.

send_compression ALWAYS | NEVER | AUTO

Define the compression policy for data buffers before they are sent from the client to the worker nodes. This option is equivalent to the --compression-policy option in ybload. See "ybload Advanced Processing Options" in the main Yellowbrick documentation.

source_compression GZ | BZIP2 | XZ | PACK200 | LZ4

This option explicitly defines the type of compression used by source data. This option applies to loads from all supported source types and overrides other compression detection methods.

Order of Precedence for Format Specifications

There are three different ways to specify the format for a LOAD TABLE operation. The following order of precedence is enforced:

  1. In-line format options, as specified with EXTERNAL FORMAT (TYPE...WITH(…))
  2. An in-line format name, as specified with EXTERNAL FORMAT format_name
  3. A default format, if one exists, for the external location object specified in the LOAD TABLE command

If no format options are specified at all, the behavior is similar to that of ybload operations, which means that some values are automatically detected and the load is attempted with those settings. (The load may fail in this case.)

Parent topic:SQL Commands