Appearance
External Format Load Options
This section contains a list of the bulk load options that are supported in the WITH clause for the CREATE EXTERNAL FORMAT command:
See also LOAD TABLE Options and WITH ADVANCED in CREATE EXTERNAL STORAGE.
These options are a subset of the options supported by the ybload client tool and in general have the same function. See "ybload Field Options" in the main Yellowbrick documentation.
Note: Unlike their ybload client counterparts, these options do not have leading -- characters, and they use underscores instead of hyphens (-) in their names. For example, trim_white is equivalent to --trim-white.
Conventions
Values for load options specified in SQL statements follow these conventions:
- Numeric values, including year values, require quotes. For example:
y2base '1990'- Boolean values do not require quotes:
trim_white true- Strings require quotes:
delimiter '|'- JSON formatting: use single quotes (if embedded single quotes appear inside the JSON, escape them):
'{json text}'CREATE EXTERNAL FORMAT Options
The WITH clause in the CREATE EXTERNAL FORMAT statement supports flat file record format options and field-parsing options. These options are listed alphabetically.
- comment_char ASCII_CHARACTER
Define the comment character that is used in source files. The default value is the pound sign (#). The value must be a single ASCII character or a valid escape sequence. If
skip_comment_linesis set, commented rows in the source file are skipped, not rejected. The comment character may be a single-byte or multi-byte character.- convert_ascii_control true | false
Allow the caret control ASCII character
^@to be parsed as a single-byte representation ofnullwithin a character string. The default isfalse.- convert_c_escape true | false
Convert (or do not convert) C-style escape sequences when they appear in
CHARandVARCHARfields. For example, the two-character sequence\tcan be converted into a single tab character (0x09) or it can be loaded unchanged.This option does not apply to fields with data types other than
CHARandVARCHAR. For example, in anINTEGERfield, the value4\x35, where\x3is the C-style escape sequence for the number 5, returns an error:'\' is invalid digit. (However, theybsql \copycommand will convert C-style escape sequences in all fields.)If
type textis used,convert_c_escape trueis the default behavior. Iftype csvortype bcpis used,convert_c_escape falseis the default.- date_style YMD | DMY | MDY | MONDY | DMONY | Y2MD | DMY2 | MDY2 | MONDY2 | DMONY2
Define the date format in terms of the order and style of the date parts, to avoid any ambiguity in parsing dates. For example,
date_style 'MDY'means accept date values such as08-13-2016.If you specify one of the
Y2values, such asDMY2, you must also specify ay2basevalue. For example:y2base 1990, date_style 'DMY2'You can only specify the
date_styleoption once in a single command.If
date_styleis not specified, the defaults derive fromdate_field_options.- default_field_options {JSON formatted <FieldOptions>}
Specify the field options to use for fields that do not have their own per-field options or per-type options. See the main Yellowbrick documentation for the
ybloadclient.- delimiter SPECIAL_CHARACTER
Define the special character that the source file uses as its field delimiter. All of the following are supported:
- A single Unicode character (may be multi-byte)
- A hex value that corresponds to any ASCII control code (such as
0x1f) - A valid character escape sequence
If you do not specify a field delimiter,
ybloadauto-detects it from among the following characters:,|\t\us\uFFFA
- emptymarker STRING_WITH_ESCAPES
Define a marker that matches the character used to represent an empty string in your source file (an empty
CHARorVARCHARfield). The value may be a string, a character, or a valid escape sequence.For example:
emptymarker '\t'.Note: To use
"as theemptymarkercharacter, you have to quote strings with a different character ("is the default). For example:emptymarker '\"', quote-char '|'If null bytes exist within strings, use
on_zero_char remove.- encoding ENCODING_NAME
Specify the source file's encoding (character set:
UTF8,LATIN9,ISO-8859-1,UTF16). If no encoding is specified, the encoding of the source file is assumed to match the database encoding. If the encoding of the source file does not match the encoding of the destination database,ybloadwill use Java transcoding to "translate" the source data to the encoding of the destination database.Check the documentation for the version of Java you are using on the
ybloadclient system. For example, if you are using the Oracle JVM, check this list of supported character sets. Names from either of the first two columns can be used with the--encodingoption.For the fastest load performance under
ybload, export the data from the source system in the character set of the destination database. For example, load exportedUTF8data into aUTF8database. Any required transcoding is likely to have an impact on load performance.Note: To load UTF-16 data, create a
UTF8database and set theencodingoption toUTF16.- escape_char SPECIAL_CHARACTER
Specify an escape character: any single Unicode character (may be multi-byte). The behavior of this option depends on the
TYPE:CSV: used to escape embedded quote characters inside quoted fields. The default escape character is".TEXT: used to escape embedded field delimiters and line separators. The default escape character is\.BCP: disallowed withBCPformat, which does not use escape characters.
- *_field_options
Options specific to individual data types. See "ybload Field Options" in the main Yellowbrick documentation for the
ybloadclient.- ignore_emptymarker_case true | false
The first option supports case-insensitive empty-marker comparisons. If you use
--ignore-emptymarker-case, values ofEMPTY,Empty, andemptyare all recognized as empty values when--emptymarkeris set toNONE. These options apply globally; you cannot specify them per data type or per field. They are applied globally regardless of how the--emptymarkeroption was specified.- ignore_nullmarker_case true | false
Set to
true, this option supports case-insensitive null-marker comparisons. Values ofNULL,Null, andnullare all recognized as null values whennullmarkeris set toNULL. This option applies globally; you cannot specify it per data type or per field. It is applied globally regardless of thenullmarkersetting.- linesep LINE_SEPARATOR
Define the line separator (or row separator) that is used in source files: any single Unicode character (may be multi-byte) or the
\r\nescape sequence.If you do not specify a line separator,
ybloadauto-detects it from among the following characters:\n\r\n\rs\uFFFB
- locale LOCALE_NAME
The name of the locale to use for parsing dates, timestamps, and so on. If the locale is not specified, the database locale is assumed to be
C.Locale names must be of the following form:
<language code>[_<country code>[_<variant code>]]For example:
locale en locale en_US locale zh_CNVariant codes are rarely used; for details, see the Java documentation.
- nullmarker STRING
Define a string that matches the string used to represent
nullin your source file. If this option is unspecified or set to an empty string, adjacent delimiters without text between them are parsed asNULLvalues. This option supports valid escape sequences.You cannot load data in which more than one value in the same column is intended to be parsed as
NULL. However, you can load data in which different columns have different values forNULL.- num_header_lines [ NUMBER ]
Ignore one or more header lines at the top of the source file. (The first or only header line in a file is typically a list of field names.) The default is
0. The maximum number is5. If you specify multiple source files, the first line is skipped in all of them.- on_extra_field [ remove | error ]
Specify
removeto allow rows to be loaded when the source file contains more fields than the table has columns. The removed fields must be at the end of the line in the source file. Specifyerrorto reject rows with extra fields.erroris the default.- on_invalid_char [ replace | error ]
Specify the action to take when the source file contains characters that cannot be represented in the database (or characters that are invalid in the source file itself):
- The replacement character for a LATIN9 database is the question mark:
0x3F(?). - The replacement character for a UTF8 database is the Unicode replacement character:
U+FFFD(a question mark in a diamond).
The default is
replace.- The replacement character for a LATIN9 database is the question mark:
- on_missing_field [ supplynull | error ]
Allow rows to be loaded when the source file contains fewer fields than the table has columns. Specify
supplynullto load the missing columns at the end of the row withnullvalues. Specifyerrorto reject rows with missing fields.erroris the default.- on_string_too_long truncate | error
Truncate character strings that are longer than the specified column width (or return an error). The default is
ERROR.- on_unescaped_embedded_quote [ preserve | error ]
Preserve or return an error when unescaped quotes are found inside quoted strings. The default is
error.- on_zero_char [ remove | error ]
Remove null bytes (
0x00characters) that appear within strings inCHARandVARCHARfields (or return an error). The default isERROR.- per_field_options {JSON Object}
Specify parsing options for individual fields. See "ybload Field Options" in the main Yellowbrick documentation for the
ybloadclient.- quote_char SPECIAL_CHARACTER
Define the character that is used in source files to quote field values that contain embedded delimiters. Specify any single Unicode character (may be multi-byte). The default is
". This option only applies when you are loading fromTYPE CSVsource files.- skip_blank_lines true | false
Skip blank lines in the source file (
true) or detect blank rows as bad rows (false). The default value istrue.- skip_comment_lines true | false
Skip lines that are commented out in the source file or detect them as bad rows. The default value is
false(do not skip).Note: If you use
skip_comment_lines, make sure the data does not contain any lines that begin with the comment character (which defaults to#but may bet set to a different character with thecomment_charoption).- trim_white true | false
Trim or retain leading and trailing whitespace characters in each field. With
type csv, whitespace characters inside of quotes are always preserved. The default value isfalse(preserved).- TYPE
The file type must be specified in the
TYPEclause of the CREATE EXTERNAL FORMAT statement, not inside theWITHclause.- y2base YEAR
Define the pivot year (such as
1970) for two-digit year values (such as97and16). For example, if--y2baseis set to1970, two-digit years of70and later are assumed to be in the 1900s. Values of69and earlier are assumed to be in the 2000s.If you want to specify one of the
Y2values fordate_style, such asDMY2, you must also specify ay2basevalue.