LOAD TABLE
Bulk load a table from external storage by using a SQL command (instead of the ybload
client).
LOAD TABLE table_name
[ SOURCE FIELDS (col1 type [, col2 type ] [, ...]) ]
[ FROM ('string1' [,'string2'] [, ...]) ]
EXTERNAL LOCATION location_name
[ EXTERNAL FORMAT { format_name |
(TYPE file_type WITH (option value [, option value ] [, ...])) } ]
[ WITH (option value [, option value ] [, ...]) ]
- table_name
Name of a regular user table that is the target for the load. The source files will be read from external storage and written to the table. To run
LOAD TABLE
commands, you must haveBULK LOAD
privilege on the database andINSERT
privilege on the target table.- SOURCE FIELDS
Specify all of the fields found in the source files and their data types. These fields map to the columns in the target table that you want to load. You must specify all of the fields that appear in the source files, in order, but you can load a table that only contains a subset of those fields. For example, you can load a table with four columns from a source file that has five or more fields. You do not have to specify the same order of columns in the
CREATE TABLE
statement. The load operation can detect which columns correspond to which source fields by name and data type.In this example, the source file contains five fields:
... source fields(seasonid smallint, matchday timestamp, htid smallint, atid smallint, moment varchar(5)) ...
You can load a table that has one or more columns that correspond to these fields. See also LOAD TABLE Examples.
Conversely, you can also load a table that has columns that do not have corresponding fields in the source file if
DEFAULT
column constraints are specified in theCREATE TABLE
statement. For example, you can load a two-column table from a source file that has only one field. In this case,SOURCE FIELDS
identifies the one field that exists in the source file; the second column in the table is loaded with the specified default value. For example:create table t1 (c1 int, c2 int not null default 1); ... load table t1 source fields (c1 int) ...
Note: The
SOURCE FIELDS
option is not available in the Load Data Assistant.- FROM('string1','string2', ...)
Specify one or more character strings that represent a list of the S3 source files or a "prefix" that identifies those files. These files will be loaded from the specified external location, which is a bucket name.
If you do not specify a
FROM
list, the load operation will scan the entire bucket and load any files that have a schema that matches the columns in the target table. Therefore, it is recommended that you specify aFROM
list or prefix.If a string exactly matches the key of a specific file stored in the bucket, only that S3 file will be loaded. Otherwise, the string is checked for matches against the beginning part of the key of any objects in the bucket. All of the files that match will be loaded.
For example, the following commands all produce the same results, loading the same two files:
premdb=# load table match from ('/premdb/match.csv','/premdb/matchcopy.csv') external location premdbs3data; INSERT 0 17212 premdb=# load table match from ('/premdb/match') external location premdbs3data; INSERT 0 17212
Note: Wildcard patterns are not supported for S3 source files. For example, you cannot use the following
FROM
clause to load files that have a prefix ofmatch
:from ('/premdb/mat*.csv')
Instead you could use:
from ('/premdb/mat')
- EXTERNAL LOCATION
Name of an external location object. See CREATE EXTERNAL LOCATION.
- EXTERNAL FORMAT
Name of an external format object. See CREATE EXTERNAL FORMAT. Alternatively, you can define the format in the
TYPE
clause. You can also omit theEXTERNAL FORMAT
clause completely and use the default format settings for the table.- TYPE...WITH
Specify the source file type:
CSV
,TEXT
,PARQUET
, orBCP
. You must enclose theTYPE
clause inside parentheses, and the followingWITH
clause inside its own set of parentheses:(TYPE format_type WITH(...))
The
WITH
clause that followsTYPE
defines formatting load options, such as the line separator, field delimiter, and field definitions for different data types. Each name-value pair must consist of a valid option (in this context) and a valid setting for that option. For example:(type csv with (skip_blank_lines true, delimiter '|'))
- WITH
The
WITH
clause at the end of theLOAD TABLE
statement defines general load-processing options. Each name-value pair must consist of a valid option (in this context) and a valid setting for that option. These are separate and distinct from the formatting load options defined by the external format object or in theTYPE WITH()
list.For example:
with (source_compression 'gz')
See LOAD TABLE Options.
LOAD TABLE Examples
Load a table from a single CSV file:
load table match
from ('/premdb/match.csv')
external location premdbs3data;
INSERT 0 8606
Load a table from files that match the prefix /premdb/newmat
:
load table newmatchstats
from ('/premdb/newmat')
external location premdbs3data;
INSERT 0 24785280
Load a table from a single GZ-compressed file:
load table newmatchstats
from ('/premdb/newmatchstats25mil.gz')
external location premdbs3data
with(source_compression 'gz');
INSERT 0 24785280
Attempt to load some data that is not in the specified file format (given the specified prefix):
load table newmatchstats
from ('/premdb/newmatchstats')
external location premdbs3data
with(source_compression 'gz');
ERROR: [bulksvc-load] Unable to open channel for s3://ybpremdb/premdb/newmatchstats25mil.csv CAUSED BY: Input is not in the .gz format
Load a table and reference an existing external format object:
load table newmatchstats
from ('/premdb/newmatchstats25mil.csv')
external location premdbs3data
external format premdbs3format;
Load a table and define the external format within the statement, instead of referencing an existing external format object:
load table newmatchstats
from ('/premdb/newmatchstats25mil.csv')
external location premdbs3data
external format (type csv with(skip_blank_lines true, delimiter ','));
Create a new table by selecting four columns from an existing five-column table. Then delete all of its rows. Reload the table by using the same source file that loaded the original five-column table. Specify all five source fields in the LOAD TABLE
statement.
create table matchday as
select seasonid, matchday, htid, atid from newmatchstats;
SELECT 24785280
delete from matchday;
DELETE 24785280
load table matchday
source fields(seasonid smallint,
matchday timestamp,
htid smallint,
atid smallint,
moment varchar)
from('/premdb/newmatch')
external location premdbs3data;
INSERT 24785280
Create a table with two columns, c1
and c2
:
create table t1(c1 int, c2 int);
Load the table from a source file that contains three fields that are in a different order from the corresponding columns in the CREATE TABLE
statement:
load table t1
source fields(c3 int, c2 int, c1 int)
...
LOAD TABLE Options
The WITH
clause in the LOAD TABLE
statement supports some load-processing options and source options. These options are listed alphabetically.
- log_level INFO | OFF | WARN | ERROR | DEBUG | TRACE
Specify the logging level for the load output. The default level is
INFO
. Other options provide more verbose output.- max_bad_rows NUMBER
Set the maximum number of rejected rows that a load will tolerate before aborting and starting to roll back the transaction. The default is
0
(no bad rows); make sure that the source data you are about to load is clean and consistent if you use the default.Note:
max_bad_rows 100
means 100 bad rows are allowed; the load will fail on the 101st bad row.- read_sources_concurrently ALWAYS | NEVER | ALLOW | <number>
Define the behavior for reading source files in parallel:
ALWAYS
,NEVER
,ALLOW
(the default), or a specific number of source files. See "ybload Advanced Processing Options" in the main Yellowbrick documentation.- send_compression ALWAYS | NEVER | AUTO
Define the compression policy for data buffers before they are sent from the client to the worker nodes. This option is equivalent to the
--compression-policy
option inybload
. See "ybload Advanced Processing Options" in the main Yellowbrick documentation.- source_compression GZ | BZIP2 | XZ | PACK200 | LZ4
This option explicitly defines the type of compression used by source data. This option applies to loads from all supported source types and overrides other compression detection methods.
Order of Precedence for Format Specifications
There are three different ways to specify the format for a LOAD TABLE
operation. The following order of precedence is enforced:
- In-line format options, as specified with
EXTERNAL FORMAT (TYPE...WITH(…))
- An in-line format name, as specified with
EXTERNAL FORMAT format_name
- A default format, if one exists, for the external location object specified in the
LOAD TABLE
command
If no format options are specified at all, the behavior is similar to that of ybload
operations, which means that some values are automatically detected and the load is attempted with those settings. (The load may fail in this case.)
Parent topic:SQL Commands