Appearance
External Format Load Options
This section contains a list of the bulk load options that are supported in the WITH
clause for the CREATE EXTERNAL FORMAT
command:
See also LOAD TABLE Options and WITH ADVANCED
in CREATE EXTERNAL STORAGE.
These options are a subset of the options supported by the ybload
client tool and in general have the same function. See "ybload Field Options" in the main Yellowbrick documentation.
Note: Unlike their ybload
client counterparts, these options do not have leading --
characters, and they use underscores instead of hyphens (-
) in their names. For example, trim_white
is equivalent to --trim-white
.
Conventions
Values for load options specified in SQL statements follow these conventions:
- Numeric values, including year values, require quotes. For example:
y2base '1990'
- Boolean values do not require quotes:
trim_white true
- Strings require quotes:
delimiter '|'
- JSON formatting: use single quotes (if embedded single quotes appear inside the JSON, escape them):
'{json text}'
CREATE EXTERNAL FORMAT Options
The WITH
clause in the CREATE EXTERNAL FORMAT
statement supports flat file record format options and field-parsing options. These options are listed alphabetically.
- comment_char ASCII_CHARACTER
Define the comment character that is used in source files. The default value is the pound sign (#). The value must be a single ASCII character or a valid escape sequence. If
skip_comment_lines
is set, commented rows in the source file are skipped, not rejected. The comment character may be a single-byte or multi-byte character.- convert_ascii_control true | false
Allow the caret control ASCII character
^@
to be parsed as a single-byte representation ofnull
within a character string. The default isfalse
.- convert_c_escape true | false
Convert (or do not convert) C-style escape sequences when they appear in
CHAR
andVARCHAR
fields. For example, the two-character sequence\t
can be converted into a single tab character (0x09
) or it can be loaded unchanged.This option does not apply to fields with data types other than
CHAR
andVARCHAR
. For example, in anINTEGER
field, the value4\x35
, where\x3
is the C-style escape sequence for the number 5, returns an error:'\' is invalid digit
. (However, theybsql \copy
command will convert C-style escape sequences in all fields.)If
type text
is used,convert_c_escape true
is the default behavior. Iftype csv
ortype bcp
is used,convert_c_escape false
is the default.- date_style YMD | DMY | MDY | MONDY | DMONY | Y2MD | DMY2 | MDY2 | MONDY2 | DMONY2
Define the date format in terms of the order and style of the date parts, to avoid any ambiguity in parsing dates. For example,
date_style 'MDY'
means accept date values such as08-13-2016
.If you specify one of the
Y2
values, such asDMY2
, you must also specify ay2base
value. For example:y2base 1990, date_style 'DMY2'
You can only specify the
date_style
option once in a single command.If
date_style
is not specified, the defaults derive fromdate_field_options
.- default_field_options {JSON formatted <FieldOptions>}
Specify the field options to use for fields that do not have their own per-field options or per-type options. See the main Yellowbrick documentation for the
ybload
client.- delimiter SPECIAL_CHARACTER
Define the special character that the source file uses as its field delimiter. All of the following are supported:
- A single Unicode character (may be multi-byte)
- A hex value that corresponds to any ASCII control code (such as
0x1f
) - A valid character escape sequence
If you do not specify a field delimiter,
ybload
auto-detects it from among the following characters:,
|
\t
\us
\uFFFA
- emptymarker STRING_WITH_ESCAPES
Define a marker that matches the character used to represent an empty string in your source file (an empty
CHAR
orVARCHAR
field). The value may be a string, a character, or a valid escape sequence.For example:
emptymarker '\t'
.Note: To use
"
as theemptymarker
character, you have to quote strings with a different character ("
is the default). For example:emptymarker '\"', quote-char '|'
If null bytes exist within strings, use
on_zero_char remove
.- encoding ENCODING_NAME
Specify the source file's encoding (character set:
UTF8
,LATIN9
,ISO-8859-1
,UTF16
). If no encoding is specified, the encoding of the source file is assumed to match the database encoding. If the encoding of the source file does not match the encoding of the destination database,ybload
will use Java transcoding to "translate" the source data to the encoding of the destination database.Check the documentation for the version of Java you are using on the
ybload
client system. For example, if you are using the Oracle JVM, check this list of supported character sets. Names from either of the first two columns can be used with the--encoding
option.For the fastest load performance under
ybload
, export the data from the source system in the character set of the destination database. For example, load exportedUTF8
data into aUTF8
database. Any required transcoding is likely to have an impact on load performance.Note: To load UTF-16 data, create a
UTF8
database and set theencoding
option toUTF16
.- escape_char SPECIAL_CHARACTER
Specify an escape character: any single Unicode character (may be multi-byte). The behavior of this option depends on the
TYPE
:CSV
: used to escape embedded quote characters inside quoted fields. The default escape character is"
.TEXT
: used to escape embedded field delimiters and line separators. The default escape character is\
.BCP
: disallowed withBCP
format, which does not use escape characters.
- *_field_options
Options specific to individual data types. See "ybload Field Options" in the main Yellowbrick documentation for the
ybload
client.- ignore_emptymarker_case true | false
The first option supports case-insensitive empty-marker comparisons. If you use
--ignore-emptymarker-case
, values ofEMPTY
,Empty
, andempty
are all recognized as empty values when--emptymarker
is set toNONE
. These options apply globally; you cannot specify them per data type or per field. They are applied globally regardless of how the--emptymarker
option was specified.- ignore_nullmarker_case true | false
Set to
true
, this option supports case-insensitive null-marker comparisons. Values ofNULL
,Null
, andnull
are all recognized as null values whennullmarker
is set toNULL
. This option applies globally; you cannot specify it per data type or per field. It is applied globally regardless of thenullmarker
setting.- linesep LINE_SEPARATOR
Define the line separator (or row separator) that is used in source files: any single Unicode character (may be multi-byte) or the
\r\n
escape sequence.If you do not specify a line separator,
ybload
auto-detects it from among the following characters:\n
\r\n
\rs
\uFFFB
- locale LOCALE_NAME
The name of the locale to use for parsing dates, timestamps, and so on. If the locale is not specified, the database locale is assumed to be
C
.Locale names must be of the following form:
<language code>[_<country code>[_<variant code>]]
For example:
locale en locale en_US locale zh_CN
Variant codes are rarely used; for details, see the Java documentation.
- nullmarker STRING
Define a string that matches the string used to represent
null
in your source file. If this option is unspecified or set to an empty string, adjacent delimiters without text between them are parsed asNULL
values. This option supports valid escape sequences.You cannot load data in which more than one value in the same column is intended to be parsed as
NULL
. However, you can load data in which different columns have different values forNULL
.- num_header_lines [ NUMBER ]
Ignore one or more header lines at the top of the source file. (The first or only header line in a file is typically a list of field names.) The default is
0
. The maximum number is5
. If you specify multiple source files, the first line is skipped in all of them.- on_extra_field [ remove | error ]
Specify
remove
to allow rows to be loaded when the source file contains more fields than the table has columns. The removed fields must be at the end of the line in the source file. Specifyerror
to reject rows with extra fields.error
is the default.- on_invalid_char [ replace | error ]
Specify the action to take when the source file contains characters that cannot be represented in the database (or characters that are invalid in the source file itself):
- The replacement character for a LATIN9 database is the question mark:
0x3F
(?
). - The replacement character for a UTF8 database is the Unicode replacement character:
U+FFFD
(a question mark in a diamond).
The default is
replace
.- The replacement character for a LATIN9 database is the question mark:
- on_missing_field [ supplynull | error ]
Allow rows to be loaded when the source file contains fewer fields than the table has columns. Specify
supplynull
to load the missing columns at the end of the row withnull
values. Specifyerror
to reject rows with missing fields.error
is the default.- on_string_too_long truncate | error
Truncate character strings that are longer than the specified column width (or return an error). The default is
ERROR
.- on_unescaped_embedded_quote [ preserve | error ]
Preserve or return an error when unescaped quotes are found inside quoted strings. The default is
error
.- on_zero_char [ remove | error ]
Remove null bytes (
0x00
characters) that appear within strings inCHAR
andVARCHAR
fields (or return an error). The default isERROR
.- per_field_options {JSON Object}
Specify parsing options for individual fields. See "ybload Field Options" in the main Yellowbrick documentation for the
ybload
client.- quote_char SPECIAL_CHARACTER
Define the character that is used in source files to quote field values that contain embedded delimiters. Specify any single Unicode character (may be multi-byte). The default is
"
. This option only applies when you are loading fromTYPE CSV
source files.- skip_blank_lines true | false
Skip blank lines in the source file (
true
) or detect blank rows as bad rows (false
). The default value istrue
.- skip_comment_lines true | false
Skip lines that are commented out in the source file or detect them as bad rows. The default value is
false
(do not skip).Note: If you use
skip_comment_lines
, make sure the data does not contain any lines that begin with the comment character (which defaults to#
but may bet set to a different character with thecomment_char
option).- trim_white true | false
Trim or retain leading and trailing whitespace characters in each field. With
type csv
, whitespace characters inside of quotes are always preserved. The default value isfalse
(preserved).- TYPE
The file type must be specified in the
TYPE
clause of the CREATE EXTERNAL FORMAT statement, not inside theWITH
clause.- y2base YEAR
Define the pivot year (such as
1970
) for two-digit year values (such as97
and16
). For example, if--y2base
is set to1970
, two-digit years of70
and later are assumed to be in the 1900s. Values of69
and earlier are assumed to be in the 2000s.If you want to specify one of the
Y2
values fordate_style
, such asDMY2
, you must also specify ay2base
value.