ybload Options
This section contains detailed descriptions of the bulk load options. Note the following points about the format of these options and their values:
Options are listed in alphabetical order for quick reference.
Option names are shown in lowercase; they are case-sensitive.
Specific valid option values (such as true and false) are shown in lowercase. Variables for option values, such as STRING, are shown in uppercase. Option values are not case-sensitive.
The requirements for quoting option strings vary by client platform. Values are shown without quotes, but quotes are sometimes required. For example, if you specify the # character in a Linux shell, it must be enclosed by single or double quotes. If you are using a Windows client, see also Escaping Quotes in Windows Clients.
- (stdin)
Load from stdin (standard input). To load from stdin instead of named source files, enter the single-dash character (-) at the end of the ybload command. This must be the last character on the command line, and it must be prefixed with -- to indicate that option parsing is complete. For example:
./ybload -d premdb --username bobr -t match -- -
- @file
Specify a file that includes a set of options and values to use for the load. See Saving Load Options to a File.
- --bad-row-file STRING
Define the location and name of a file where rejected rows will be logged. If you do not specify this option, the file defaults to the name SOURCE_FILENAME.TIMESTAMP.bad and is written to a location that is reported early in the console or log file output. If the file already exists, it is truncated.
Note: When object storage is used for loading data, bad rows must be written to the local file system. Specifying a bad row file in an object storage location, such as an S3 bucket, is not supported.
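For example, a load that captures rejected rows in a predictable local path might look like the following sketch (the database, table, and file names are hypothetical):

```shell
# Write any rejected rows to a known local file instead of the
# auto-generated SOURCE_FILENAME.TIMESTAMP.bad file.
./ybload -d premdb --username bobr -t match \
    --bad-row-file /tmp/match_load.bad match.csv
```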
- --bigint-field-options
See ybload Field Options.
- --boolean-field-options
See ybload Field Options.
- --bytes-per-transaction
Set the number of bytes to load per commit. The default is 1TB (1099511627776 bytes). You can set this option to modify the frequency of commits when bulk loads are running. This option works in conjunction with --rows-per-transaction. The threshold that is met first is applied.
- --cacert STRING
Customize trust with secured communication; use this option in combination with the --secured option. Enter the file name of a custom PEM-encoded certificate, or the file name and password for a Java KeyStore (JKS).
For PEM format, the file must be named with a .pem, .cert, .cer, .crt, or .key extension. For example:
--cacert cacert.pem
For JKS format, files are always password-protected. Use the following format:
--cacert yellowbrick.jks:changeit
where the : character separates the file name from the password.
- --char-field-options
See ybload Field Options.
- --comment-char ASCII_CHARACTER
Define the comment character that is used in source files. The default value is the pound sign (#). The value must be a single ASCII character or a valid escape sequence. If --skip-comment-lines is set, commented rows in the source file are skipped, not rejected, and do not appear in the bad rows file.
When the --format option is set, the comment character may be a single-byte or multi-byte character.
- --compression-policy
Define the compression policy for data buffers before they are sent from the client to the worker nodes. See ybload Advanced Processing Options.
- --convert-ascii-control, --no-convert-ascii-control
Allow the ASCII control character ^@ (caret notation for null) to be parsed as a single-byte representation of null within a character string. The default is --no-convert-ascii-control. See also ybload Field Options.
- --convert-c-escape, --no-convert-c-escape
Convert (or do not convert) C-style escape sequences when they appear in CHAR and VARCHAR fields. For example, the two-character sequence \t can be converted into a single tab character (0x09), or it can be loaded unchanged.
This option does not apply to fields with data types other than CHAR and VARCHAR. For example, in an INTEGER field, the value 4\x35, where \x35 is the C-style escape sequence for the number 5, returns an error: '\' is invalid digit. (However, the ybsql \copy command will convert C-style escape sequences in all fields.)
This option is only supported when the --format option is specified. If --format text is used, --convert-c-escape is the default behavior. If --format csv or --format bcp is used, --no-convert-c-escape is the default.
- --date-field-options
See ybload Date Formats.
- --date-style YMD | DMY | MDY | MONDY | DMONY | Y2MD | DMY2 | MDY2 | MONDY2 | DMONY2
Define the date format in terms of the order and style of the date parts, to avoid any ambiguity in parsing dates. For example, --date-style MDY means accept date values such as 08-13-2016.
If you specify one of the Y2 values, such as DMY2, you must also specify a --y2base value.
You can specify the --date-style option more than once in a single ybload command. For example:
--y2base 1990 --date-style MDY --date-style MONDY --date-style DMY2
Note that you can use JSON-style formatting to abbreviate this syntax.
If --date-style is not specified, the defaults derive from --date-field-options.
- --dbname, -d
Name of the destination database. See Setting up a Database Connection.
- --decimal-field-options
See ybload Field Options.
- --default-field-options {JSON formatted <FieldOptions>}
Specify the field options to use for fields that do not have their own per-field options or per-type options. See ybload Field Options. The default value is {}.
- --delimiter SPECIAL_CHARACTER
Define the special character that the source file uses as its field delimiter. All of the following are supported:
- A single Unicode character
- A hex value that corresponds to any ASCII control code (such as 0x1f)
- A valid escape sequence
When the --format option is set, the delimiter may be a multi-byte character.
If you do not specify a field delimiter, ybload auto-detects it from among the following characters: , | \t \us \uFFFA
See also Setting the --format Option.
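For example, a hypothetical load of a pipe-delimited file might set the delimiter explicitly rather than relying on auto-detection:

```shell
# Quote the delimiter so the shell does not interpret the | character.
./ybload -d premdb --username bobr -t match --delimiter '|' match.txt
```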
- --disable-trust, -k
Disable SSL/TLS trust when using secured communications. Trust is enabled by default. See also Enabling and Verifying SSL/TLS Encryption.
Important: This option is not supported for use on production systems and is only recommended for testing purposes. It may be useful to disable trust during testing, then enable it when a formal signed certificate is installed on the appliance.
- --double-field-options
See ybload Field Options.
- --dryrun
Do a dry run of the load operation without committing any data to the table. See Using the Dry Run Option.
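For example, a dry run can validate the source data and option settings end to end before anything is committed (the names below are hypothetical):

```shell
# Parses and loads match.csv, but rolls back instead of committing.
./ybload -d premdb --username bobr -t match --dryrun match.csv
```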
- --duplicate-handler NONE | RANDOM | ORDER BY <sql clause>
Specify how to handle duplicate source rows when the --write-op option is set to insert, update, or upsert. (--duplicate-handler is ignored for --write-op delete loads.)
- NONE (the default): source rows are assumed to be unique. The ybload behavior is undefined if duplicate source rows exist; UPDATE and UPSERT operations will sometimes fail.
- RANDOM: a random matching source row is used to update each row in the target table.
- "ORDER BY <sql clause>": source rows are sorted, and the first matching row that is found is used to update each row in the target table. For example: "ORDER BY order_date DESC, order_time DESC". Enclose the ORDER BY clause in double quotes.
See ORDER BY Clause for the complete syntax that is supported, with the exception that the --duplicate-handler option does not support ordinal numbers to identify columns.
You cannot use a declared key column in the --duplicate-handler ORDER BY clause.
See also Handling Duplicate Rows and Examples with Duplicate Rows.
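As a sketch (the table and column names are hypothetical), an upsert that resolves duplicate source rows by keeping the most recent one might look like:

```shell
# Sort duplicate source rows so the newest row wins; note the double
# quotes around the ORDER BY clause.
./ybload -d salesdb --username bobr -t orders --write-op upsert \
    --duplicate-handler "ORDER BY order_date DESC, order_time DESC" orders.csv
```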
- --emptymarker STRING_WITH_ESCAPES
Define a marker that matches the character used to represent an empty string in your source file (an empty CHAR or VARCHAR field). The value may be a string, a character, or a valid escape sequence.
You may need to quote the emptymarker character. For example, you can use --emptymarker '\t' on the command line, but not --emptymarker \t.
Note: To use " as the emptymarker character, you have to quote strings with a different character (" is the default). For example:
--emptymarker '\"' --quote-char '|'
If null bytes exist within strings, use the --on-zero-char REMOVE option.
See also NULL and Empty Markers.
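For example, a hypothetical source file that writes NULL for null values and EMPTY for empty strings could be loaded with both markers set:

```shell
# Distinguish empty strings from nulls in the source data.
./ybload -d premdb --username bobr -t match \
    --nullmarker NULL --emptymarker EMPTY match.csv
```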
- --encoding ENCODING_NAME | -E ENCODING_NAME
Specify the source file's encoding (character set): UTF8, LATIN9, ISO-8859-1, or UTF16. If no encoding is specified, the encoding of the source file is assumed to match the database encoding. If the encoding of the source file does not match the encoding of the destination database, ybload will use Java transcoding to "translate" the source data to the encoding of the destination database.
Check the documentation for the version of Java you are using on the ybload client system. For example, if you are using the Oracle JVM, check this list of supported character sets. Names from either of the first two columns can be used with the --encoding option. See also the discussion of encodings in Creating Databases.
For the fastest load performance under ybload, export the data from the source system in the character set of the destination database. For example, load exported UTF8 data into a UTF8 database. Any required transcoding is likely to have an impact on load performance.
Note: To load UTF-16 data, create a UTF8 database and set the --encoding option to UTF16.
- --escape-char SPECIAL_CHARACTER
Specify an escape character: any single Unicode character. The behavior of this option depends on the --format choice:
- --format CSV: used to escape embedded quote characters inside quoted fields. The default escape character is ".
- --format TEXT: used to escape embedded field delimiters and line separators. The default escape character is \.
- --format BCP: disallowed with BCP format, which does not use escape characters.
When the --format option is set, the escape character may be a multi-byte character.
- --field-defs SQL_NAME | SQL_COLUMN_DEF,...
Define a comma-separated list of source field names or field definitions. A field name simply represents the name of the corresponding column in the table. A field definition also includes the data type and any constraints on the column. Column definitions are used to create temporary table columns for source-only fields, as needed by the --duplicate-handler option for UPDATE and UPSERT loads.
You do not need to specify a list if the source fields match the order, number, and data type of the destination table columns. The --parse-header-line option, which only supports field names, overrides the --field-defs option.
Field names are case-insensitive unless they are quoted (same behavior as standard SQL).
For example:
--field-defs id,name,"Balance",date
--field-defs id BIGINT PRIMARY KEY,update_order INT NOT NULL,balance,date
If the target table has default values assigned for certain columns, you can specify a partial column list with this option, and the default values will be loaded for the other columns. For example, if a table t1 has three columns, c1, c2, and c3, and c1 was created with a DEFAULT constraint, the --field-defs list could be c2, c3.
See also Loading Generated Key Values.
- --format CSV | TEXT | BCP
Specify the formatting style of the incoming data (how the source files were formatted by the export or unload tool that produced them). See also Setting the --format Option. In particular, this option refers to how field delimiters are protected in the data:
- CSV: Delimiters in field values were protected by wrapping the field values in quotes. For example: "2012, Mini Cooper S, ALL4"
- TEXT: Delimiters in field values were protected by preceding them with a backslash escape character. For example: 2012\, Mini Cooper S\, ALL4
- BCP: Delimiters in fields were not protected (for compatibility with the Microsoft SQL Server bcp tool). For example: 2012, Mini Cooper S, ALL4
- --help
Return basic usage information for the ybload command and its options.
- --help-advanced
Return more advanced usage information for the ybload command and its options.
- --host
Host name. See Setting up a Database Connection.
- --ignore-emptymarker-case, --no-ignore-emptymarker-case
The first option supports case-insensitive empty-marker comparisons. If you use --ignore-emptymarker-case, values of EMPTY, Empty, and empty are all recognized as empty values when --emptymarker is set to EMPTY. These options apply globally, regardless of how the --emptymarker option was specified; you cannot specify them per data type or per field.
- --ignore-nullmarker-case, --no-ignore-nullmarker-case
The first option supports case-insensitive null-marker comparisons. If you use --ignore-nullmarker-case, values of NULL, Null, and null are all recognized as null values when --nullmarker is set to NULL. These options apply globally, regardless of how the --nullmarker option was specified; you cannot specify them per data type or per field.
- --initial-connection-timeout NUMBER
Number of seconds to wait for initial connections to the database. The default is 120. This timeout ensures that ybload does not wait too long when there is a basic problem with incorrect connection parameters, or when a firewall is preventing connection errors from reaching the client. To turn off this timeout and allow an unlimited wait time, set this option to 0.
- --integer-field-options
See ybload Field Options.
- --ip-field-options
See ybload Field Options.
- --java-version
Return the Java version that is running on the client system. The client tools require the 64-bit version of Java 8 (also known as Java 1.8). Java 9 and 10 are not supported.
- --key-field-names SQL_NAME,...
Comma-separated list of source field names to be used as key fields for --write-op operations. Key field names must be declared as NOT NULL in the CREATE TABLE statement.
If this option is not specified, primary keys either specified with the --field-defs option or declared by the target table are used. If no key fields are specified and no primary key exists, an attempt to load a table with --write-op delete, update, or upsert will result in an error.
Field names are case-insensitive unless they are quoted (same behavior as standard SQL). For example:
--key-field-names user_name,"deptId"
The --parse-header-line option cannot be used in conjunction with the --key-field-names option.
- --linesep LINE_SEPARATOR
Define the line separator (or row separator) that is used in source files: any single Unicode character or the \r\n escape sequence.
If you do not specify a line separator, ybload auto-detects it from among the following characters: \n \r\n \rs \uFFFB
When the --format option is set, the line separator may be a multi-byte character.
- --locale LOCALE_NAME | -L LOCALE_NAME
The name of the locale to use for parsing dates, timestamps, and so on. If the locale is not specified, the database locale is assumed to be C.
Locale names must be of the following form:
<language code>[_<country code>[_<variant code>]]
For example:
--locale en --locale en_US --locale zh_CN
Variant codes are rarely used; for details, see the Java documentation.
- --logfile STRING
Specify the name and location of a log file for the load operation. If the specified file already exists, it is truncated. If this option is not specified, no log file is written. When you specify the --logfile option, also specify a --logfile-log-level value other than OFF.
Note: When object storage is used for loading or unloading data, logs must be written to the local file system. Specifying a log file in an object storage location, such as an S3 bucket, is not supported.
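For example, an unattended (cron-style) invocation might suppress console output and write a detailed log file instead; the paths shown are hypothetical:

```shell
# --quiet requires --logfile, and --logfile requires a
# --logfile-log-level other than OFF.
./ybload -d premdb --username bobr -t match --quiet \
    --logfile /var/log/ybload/match_load.log \
    --logfile-log-level DEBUG match.csv
```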
- --log-level OFF | ERROR | WARN | INFO | DEBUG | TRACE
Specify the logging level for the default console output. The default level is INFO. (Use the --logfile-log-level option to specify the logging level for a named log file.)
- --logfile-log-level OFF | ERROR | WARN | INFO | DEBUG | TRACE
Specify the logging level for a given log file (as defined with the --logfile option). If the level is not specified, it defaults to the --log-level value. You must specify a --logfile-log-level value other than OFF when you specify the --logfile option.
- --mac-field-options
Specify field options for MACADDR and MACADDR8 fields. See ybload Field Options.
- --max-bad-rows NUMBER
Set the maximum number of rejected rows that ybload will tolerate before aborting and starting to roll back the transaction. (Additional bad rows may be reported before the transaction has finished aborting.) The default is -1, which means do not abort and roll back, regardless of the number of bad rows.
Note: --max-bad-rows 32 means 32 bad rows are allowed; the load will fail on the 33rd bad row.
Rejected rows are written to the location specified with the --bad-row-file option.
- --nullmarker STRING
Define a string that matches the string used to represent null in your source file. If this option is unspecified or set to an empty string, adjacent delimiters without text between them are parsed as NULL values. This option supports valid escape sequences.
You cannot load data in which more than one value in the same column is intended to be parsed as NULL. However, you can load data in which different columns have different values for NULL. See Specifying NULL Behavior for Different Columns and NULL and Empty Markers.
- --num-cores NUMBER_MIN_1
Set the number of CPU cores that ybload will attempt to use to 1 or greater. By default, ybload tries to saturate all of the CPU cores on the host computer. For example, the default value for --num-readers and the number of actual concurrent readers used by --read-sources-concurrently ALWAYS are both based on the number of cores on the host computer. The primary purpose of this setting is to restrict ybload to the use of fewer resources when it shares the host computer with other programs.
- --num-header-lines [ NUMBER ]
Ignore one or more header lines at the top of the source file. (The first or only header line in a file is typically a list of field names.) The default is 0, or 1 if --parse-header-line is specified. The maximum number is 5. If you specify multiple source files, the first line is skipped in all of them.
- --num-readers
Define the behavior for reading input files in parallel. See ybload Advanced Processing Options.
Note: When you are loading from named pipes, setting this option equal to the number of pipes is recommended. Not loading from pipes concurrently may cause a deadlock with the program that the pipes communicate with.
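As a sketch of the named-pipe recommendation above (the pipe paths and table are hypothetical), two pipes fed by another program would be read with two readers, concurrently:

```shell
# One reader per pipe, reading concurrently, so the writing program
# is never blocked waiting for an unread pipe.
mkfifo /tmp/load1 /tmp/load2
./ybload -d premdb --username bobr -t match \
    --num-readers 2 --read-sources-concurrently ALWAYS /tmp/load1 /tmp/load2
```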
- --num-parsers-per-reader
Define multiple parsers per reader. See ybload Advanced Processing Options.
- --object-storage-*
See ybload Object Storage Options. These options apply to loads from Azure, AWS S3, and S3-compatible systems.
- --on-extra-field [ REMOVE | ERROR ]
Specify REMOVE to allow rows to be loaded when the source file (CSV, TEXT, or BCP) contains either more fields than the columns defined in the header (as detected when the --parse-header-line option is used) or more fields than the columns defined with the --field-defs option.
If neither --parse-header-line nor --field-defs is specified, --on-extra-field has no effect; ybload expects the source file to have the same number of fields as the number of columns in the target table. The --parse-header-line option, if specified, overrides the --field-defs option.
This option takes effect only when the extra fields are at the end of the line in the source file. For example:
colA, colB, colC             <- header
data1, data2, data3, data4   <- on-extra-field
The default behavior (ERROR) is to reject rows with extra fields at the end of the line.
- --on-invalid-char [ REPLACE | ERROR ]
Specify the action to take when the source file contains characters that cannot be represented in the database (or characters that are invalid in the source file itself):
- The replacement character for a LATIN9 database is the question mark: 0x3F (?).
- The replacement character for a UTF8 database is the Unicode replacement character: U+FFFD (a question mark in a diamond).
The default is REPLACE.
- --on-missing-field [ SUPPLYNULL | ERROR ]
Specify SUPPLYNULL to allow rows to be loaded when the source file (CSV, TEXT, or BCP) contains either fewer fields than the columns defined in the header (as detected when the --parse-header-line option is used) or fewer fields than the columns defined with the --field-defs option.
If neither --parse-header-line nor --field-defs is specified, --on-missing-field has no effect; ybload expects the source file to have the same number of fields as the number of columns in the target table. The --parse-header-line option, if specified, overrides the --field-defs option.
This option takes effect only when the missing fields are at the end of the line in the source file. For example:
colA, colB, colC   <- header
data1, data2       <- on-missing-field
The default behavior (ERROR) is to reject rows with missing fields at the end of the line.
- --on-string-too-long TRUNCATE | ERROR
Truncate character strings that are longer than the specified column width (or return an error). The default is ERROR.
- --on-unescaped-embedded-quote [ PRESERVE | ERROR ]
Preserve or return an error when unescaped quotes are found inside quoted strings. The default is ERROR.
- --on-zero-char [ REMOVE | ERROR ]
Remove null bytes (0x00 characters) that appear within strings in CHAR and VARCHAR fields (or return an error). The default is ERROR.
- --parse-header-line
Use the header line at the top of the source file (a list of field names) to determine the column names in the target table. This option overrides the --field-defs option and can be used in combination with the --on-extra-field and --on-missing-field options.
- --password
Prompt for the ybload user's password. See Setting up a Database Connection. The user who runs the load must have INSERT permissions on the table (but does not have to own the table).
- --per-field-options {JSON Object}
Specify parsing options for individual fields. See ybload Field Options.
- --port
Port number. See Setting up a Database Connection.
- --quiet
Do not write any output to the console. This option is suitable for cron invocations of the loader. If --quiet is specified, you must also specify --logfile.
- --quote-char SPECIAL_CHARACTER
Define the character that is used in source files to quote field values that contain embedded delimiters. Specify any single Unicode character. The default is ". This option only applies when you are using --format CSV.
When --format CSV is set, the quote character may be a multi-byte character.
- --read-sources-concurrently ALWAYS | NEVER | ALLOW | <NUMBER>
Define the behavior for reading source files in parallel: ALWAYS, NEVER, ALLOW (the default), or a specific number of source files. See ybload Advanced Processing Options.
Note: If you are loading from multiple pipes and this option is not set to ALWAYS, ybload sets it to ALWAYS and returns an INFO message stating that change. (Not reading from pipes concurrently could cause a deadlock with the program that they communicate with.)
- --real-field-options
See ybload Field Options.
- --resume-partial-load-from-offset NUMBER
Skip the specified number of bytes in the source file in order to resume a failed bulk load. The default value is 0. See Resuming a Partial Load.
- --rows-per-transaction
Set the number of rows to load per commit. The default is the maximum number of rows that can be loaded (2^63 - 1). You can reduce this number to increase the frequency of commits when bulk loads are running. This option works in conjunction with --bytes-per-transaction. The threshold that is met first is applied.
- --secured
Use SSL/TLS to secure all communications. The default is not secured. See also Enabling and Verifying SSL/TLS Encryption.
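Putting the TLS options together, a hypothetical secured load that trusts a custom CA certificate might look like:

```shell
# --cacert supplies the trusted certificate; -k/--disable-trust is an
# alternative for testing only, not for production systems.
./ybload -d premdb --username bobr -t match \
    --secured --cacert cacert.pem match.csv
```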
- --skip-blank-lines, --no-skip-blank-lines
Skip blank lines in the source file (true) or detect blank rows as bad rows (false). The default value is true.
- --skip-comment-lines, --no-skip-comment-lines
Skip lines that are commented out in the source file or detect them as bad rows. The default value is false (do not skip).
Note: If you use --skip-comment-lines, make sure the data does not contain any lines that begin with the comment character (which defaults to # but may be set to a different character with the --comment-char option).
- --smallint-field-options
See ybload Field Options.
- --source-compression GZ | BZIP2 | XZ | PACK200 | LZ4
This option explicitly defines the type of compression used by source data and is primarily intended for data sources that do not have file names (such as STDIN). This option applies to loads from all supported source types and overrides other compression detection methods.
- --table (or -t)
Name the target table to load and optionally its schema: schema_name.table_name. If you do not specify the schema name, the table is assumed to be in the public schema. The schema of the target table is not based on the user's search_path, regardless of how it is set. The table must exist in the database that you are connecting to for the load. (Do not try to specify the database name as part of the table name. Use the YBDATABASE environment variable, -d, or --dbname.)
Note: If you used a quoted case-sensitive identifier to create the target table, you must quote the table name and escape the quotes in the ybload command line. For example, if your table is named PremLeagueStats:
-t \"PremLeagueStats\"
- --time-field-options
See ybload Field Options.
- --timestamp-field-options
See ybload Field Options.
- --timestamptz-field-options
See ybload Field Options.
- --trim-white, --no-trim-white
Trim or retain leading and trailing whitespace characters in each field. With --format csv, whitespace characters inside of quotes are always preserved. The default value is false (whitespace is preserved).
- --truncate-before-insert, --no-truncate-before-insert
Truncate the target table before inserting new rows. Use this option if you want to ensure that the load runs against an empty table. The TRUNCATE statement is executed in the first transaction of the bulk load session. If the bulk load fails before that transaction is committed, the TRUNCATE is rolled back.
To use this option, you must have DELETE, TRUNCATE, and INSERT privileges on the target table (in addition to the BULK LOAD privilege on the database).
- --username
Database username. See Setting up a Database Connection.
- --uuid-field-options
See ybload Field Options.
- --version
Display the version of ybload that you are running (as part of ybtools). This option is not intended to be combined with other options. For example:
$ ybload --version
ybload version 1.2.2-5563
- --write-op INSERT | UPDATE | DELETE | UPSERT
- --write-op INSERT | UPDATE | DELETE | UPSERT
How the source rows should be written to the target table:
- INSERT (the default): insert source rows as new rows (append them to the table).
- UPDATE: update rows that match source rows.
- DELETE: delete rows that match source rows.
- UPSERT: update rows that match source rows, and insert new rows.
Updates, deletes, and upserts require primary keys or declared key fields to match against the target table. See --key-field-names.
By default, source rows are inserted (appended to the table) and duplicate rows are not discarded; incoming rows are assumed to be unique. Updates and upserts may require duplicate handling. See the --duplicate-handler option.
Note: Make sure the user running ybload has the BULK LOAD privilege on the database and appropriate additional privileges on the target table:
- INSERT for default loads
- UPDATE and SELECT for --write-op update and --write-op upsert loads
- DELETE and SELECT for --write-op delete loads
See also Deletes, Updates, and Upserts and Delete, Update, and Upsert Examples.
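As an illustrative sketch (the table and key names are hypothetical), a delete load that matches target rows on an explicit key field might look like:

```shell
# Delete target rows whose key matches a row in the source file; the
# user needs DELETE and SELECT privileges on the table.
./ybload -d salesdb --username bobr -t orders --write-op delete \
    --key-field-names order_id deleted_orders.csv
```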
- --y2base YEAR
Define the pivot year (such as 1970) for two-digit year values (such as 97 and 16). For example, if --y2base is set to 1970, two-digit years of 70 and later are assumed to be in the 1900s. Values of 69 and earlier are assumed to be in the 2000s.
If you want to specify one of the Y2 values for --date-style, such as DMY2, you must also specify a --y2base value.
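For example (the data and names are hypothetical), with the following settings a date such as 05-07-97 parses as May 7, 1997, while 05-07-16 parses as May 7, 2016:

```shell
# With --y2base 1970: two-digit years 70-99 map to 1970-1999,
# and 00-69 map to 2000-2069.
./ybload -d premdb --username bobr -t match \
    --y2base 1970 --date-style MDY2 match.csv
```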
Parent topic: ybload Command