Skip to content

External Format Load Options

This section contains a list of the bulk load options that are supported in the WITH clause for the CREATE EXTERNAL FORMAT command:

See also LOAD TABLE Options and WITH ADVANCED in CREATE EXTERNAL STORAGE.

These options are a subset of the options supported by the ybload client tool and in general have the same function. See "ybload Field Options" in the main Yellowbrick documentation.

Note: Unlike their ybload client counterparts, these options do not have leading -- characters, and they use underscores instead of hyphens (-) in their names. For example, trim_white is equivalent to --trim-white.

Conventions

Values for load options specified in SQL statements follow these conventions:

  • Numeric values, including year values, require quotes. For example:
y2base '1990'
  • Boolean values do not require quotes:
trim_white true
  • Strings require quotes:
delimiter '|'
  • JSON formatting: use single quotes (if embedded single quotes appear inside the JSON, escape them):
'{json text}'

CREATE EXTERNAL FORMAT Options

The WITH clause in the CREATE EXTERNAL FORMAT statement supports flat file record format options and field-parsing options. These options are listed alphabetically.

comment_char ASCII_CHARACTER

Define the comment character that is used in source files. The default value is the pound sign (#). The value must be a single ASCII character or a valid escape sequence. If skip_comment_lines is set, commented rows in the source file are skipped, not rejected. The comment character may be a single-byte or multi-byte character.

convert_ascii_control true | false

Allow the caret control ASCII character ^@ to be parsed as a single-byte representation of null within a character string. The default is false.

convert_c_escape true | false

Convert (or do not convert) C-style escape sequences when they appear in CHAR and VARCHAR fields. For example, the two-character sequence \t can be converted into a single tab character (0x09) or it can be loaded unchanged.

This option does not apply to fields with data types other than CHAR and VARCHAR. For example, in an INTEGER field, the value 4\x35, where \x3 is the C-style escape sequence for the number 5, returns an error: '\' is invalid digit. (However, the ybsql \copy command will convert C-style escape sequences in all fields.)

If type text is used, convert_c_escape true is the default behavior. If type csv or type bcp is used, convert_c_escape false is the default.

date_style YMD | DMY | MDY | MONDY | DMONY | Y2MD | DMY2 | MDY2 | MONDY2 | DMONY2

Define the date format in terms of the order and style of the date parts, to avoid any ambiguity in parsing dates. For example, date_style 'MDY' means accept date values such as 08-13-2016.

If you specify one of the Y2 values, such as DMY2, you must also specify a y2base value. For example:

y2base 1990, date_style 'DMY2'

You can only specify the date_style option once in a single command.

If date_style is not specified, the defaults derive from date_field_options.

default_field_options {JSON formatted <FieldOptions>}

Specify the field options to use for fields that do not have their own per-field options or per-type options. See the main Yellowbrick documentation for the ybload client.

delimiter SPECIAL_CHARACTER

Define the special character that the source file uses as its field delimiter. All of the following are supported:

  • A single Unicode character (may be multi-byte)
  • A hex value that corresponds to any ASCII control code (such as 0x1f)
  • A valid character escape sequence

If you do not specify a field delimiter, ybload auto-detects it from among the following characters:

  • ,
  • |
  • \t
  • \us
  • \uFFFA
emptymarker STRING_WITH_ESCAPES

Define a marker that matches the character used to represent an empty string in your source file (an empty CHAR or VARCHAR field). The value may be a string, a character, or a valid escape sequence.

For example: emptymarker '\t'.

Note: To use " as the emptymarker character, you have to quote strings with a different character (" is the default). For example:

emptymarker '\"', quote-char '|'

If null bytes exist within strings, use on_zero_char remove.

encoding ENCODING_NAME

Specify the source file's encoding (character set: UTF8, LATIN9, ISO-8859-1, UTF16). If no encoding is specified, the encoding of the source file is assumed to match the database encoding. If the encoding of the source file does not match the encoding of the destination database, ybload will use Java transcoding to "translate" the source data to the encoding of the destination database.

Check the documentation for the version of Java you are using on the ybload client system. For example, if you are using the Oracle JVM, check this list of supported character sets. Names from either of the first two columns can be used with the --encoding option.

For the fastest load performance under ybload, export the data from the source system in the character set of the destination database. For example, load exported UTF8 data into a UTF8 database. Any required transcoding is likely to have an impact on load performance.

Note: To load UTF-16 data, create a UTF8 database and set the encoding option to UTF16.

escape_char SPECIAL_CHARACTER

Specify an escape character: any single Unicode character (may be multi-byte). The behavior of this option depends on the TYPE:

  • CSV: used to escape embedded quote characters inside quoted fields. The default escape character is ".
  • TEXT: used to escape embedded field delimiters and line separators. The default escape character is \.
  • BCP: disallowed with BCP format, which does not use escape characters.
*_field_options

Options specific to individual data types. See "ybload Field Options" in the main Yellowbrick documentation for the ybload client.

ignore_emptymarker_case true | false

The first option supports case-insensitive empty-marker comparisons. If you use --ignore-emptymarker-case, values of EMPTY, Empty, and empty are all recognized as empty values when --emptymarker is set to NONE. These options apply globally; you cannot specify them per data type or per field. They are applied globally regardless of how the --emptymarker option was specified.

ignore_nullmarker_case true | false

Set to true, this option supports case-insensitive null-marker comparisons. Values of NULL, Null, and null are all recognized as null values when nullmarker is set to NULL. This option applies globally; you cannot specify it per data type or per field. It is applied globally regardless of the nullmarker setting.

linesep LINE_SEPARATOR

Define the line separator (or row separator) that is used in source files: any single Unicode character (may be multi-byte) or the \r\n escape sequence.

If you do not specify a line separator, ybload auto-detects it from among the following characters:

  • \n
  • \r\n
  • \rs
  • \uFFFB
locale LOCALE_NAME

The name of the locale to use for parsing dates, timestamps, and so on. If the locale is not specified, the database locale is assumed to be C.

Locale names must be of the following form:

<language code>[_<country code>[_<variant code>]]

For example:

locale en
locale en_US
locale zh_CN

Variant codes are rarely used; for details, see the Java documentation.

nullmarker STRING

Define a string that matches the string used to represent null in your source file. If this option is unspecified or set to an empty string, adjacent delimiters without text between them are parsed as NULL values. This option supports valid escape sequences.

You cannot load data in which more than one value in the same column is intended to be parsed as NULL. However, you can load data in which different columns have different values for NULL.

num_header_lines [ NUMBER ]

Ignore one or more header lines at the top of the source file. (The first or only header line in a file is typically a list of field names.) The default is 0. The maximum number is 5. If you specify multiple source files, the first line is skipped in all of them.

on_extra_field [ remove | error ]

Specify remove to allow rows to be loaded when the source file contains more fields than the table has columns. The removed fields must be at the end of the line in the source file. Specify error to reject rows with extra fields. error is the default.

on_invalid_char [ replace | error ]

Specify the action to take when the source file contains characters that cannot be represented in the database (or characters that are invalid in the source file itself):

  • The replacement character for a LATIN9 database is the question mark: 0x3F (?).
  • The replacement character for a UTF8 database is the Unicode replacement character: U+FFFD (a question mark in a diamond).

The default is replace.

on_missing_field [ supplynull | error ]

Allow rows to be loaded when the source file contains fewer fields than the table has columns. Specify supplynull to load the missing columns at the end of the row with null values. Specify error to reject rows with missing fields. error is the default.

on_string_too_long truncate | error

Truncate character strings that are longer than the specified column width (or return an error). The default is ERROR.

on_unescaped_embedded_quote [ preserve | error ]

Preserve or return an error when unescaped quotes are found inside quoted strings. The default is error.

on_zero_char [ remove | error ]

Remove null bytes (0x00 characters) that appear within strings in CHAR and VARCHAR fields (or return an error). The default is ERROR.

per_field_options {JSON Object}

Specify parsing options for individual fields. See "ybload Field Options" in the main Yellowbrick documentation for the ybload client.

quote_char SPECIAL_CHARACTER

Define the character that is used in source files to quote field values that contain embedded delimiters. Specify any single Unicode character (may be multi-byte). The default is ". This option only applies when you are loading from TYPE CSV source files.

skip_blank_lines true | false

Skip blank lines in the source file (true) or detect blank rows as bad rows (false). The default value is true.

skip_comment_lines true | false

Skip lines that are commented out in the source file or detect them as bad rows. The default value is false (do not skip).

Note: If you use skip_comment_lines, make sure the data does not contain any lines that begin with the comment character (which defaults to # but may bet set to a different character with the comment_char option).

trim_white true | false

Trim or retain leading and trailing whitespace characters in each field. With type csv, whitespace characters inside of quotes are always preserved. The default value is false (preserved).

TYPE

The file type must be specified in the TYPE clause of the CREATE EXTERNAL FORMAT statement, not inside the WITH clause.

y2base YEAR

Define the pivot year (such as 1970) for two-digit year values (such as 97 and 16). For example, if --y2base is set to 1970, two-digit years of 70 and later are assumed to be in the 1900s. Values of 69 and earlier are assumed to be in the 2000s.

If you want to specify one of the Y2 values for date_style, such as DMY2, you must also specify a y2base value.