ybload Options
This section contains detailed descriptions of the bulk load options. Note the following points about the format of these options and their values:
- Options are listed in alphabetical order for quick reference.
- Option names are shown in lowercase; they are case-sensitive.
- Specific valid option values (such as true and false) are shown in lowercase. Variables for option values, such as STRING, are shown in uppercase. Option values are not case-sensitive.
- The requirements for quoting option strings vary by client platform. Values are shown without quotes, but quotes are sometimes required. For example, if you specify the # character in a Linux shell, it must be enclosed by single or double quotes. If you are using a Windows client, see also Escaping Quotes in Windows Clients.

See also Common Options in ybtools.
- (stdin)
Load from stdin (standard input). To load from stdin instead of named source files, enter the single-dash character (-) at the end of the ybload command. This must be the last character on the command line, and it must be prefixed with -- to indicate that option parsing is complete. For example:
./ybload -d premdb --username bobr -t match -- -
- @file
Specify a file that includes a set of options and values to use for the operation. See Saving Load Options to a File and Saving Unload Options to a File.
You can use this option and the --select-file option in the same ybunload command.
- --bad-row-file STRING
Define the location and name of a file where rejected rows will be logged. If you do not specify this option, the file defaults to the name SOURCE_FILENAME.TIMESTAMP.bad and is written to a location that is reported early in the console or log file output. If the file already exists, it is truncated.
Note: When object storage is used for loading data, bad rows must be written to the local file system. Specifying a bad row file in an object storage location, such as an S3 bucket, is not supported.
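For example, a minimal sketch that writes rejected rows to a specific local file, reusing the premdb database and match table from the stdin example (the source file name and path are hypothetical):
./ybload -d premdb --username bobr -t match --bad-row-file /tmp/match.bad match_data.csv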
- --bigint-field-options
See ybload Field Options.
- --boolean-field-options
See ybload Field Options.
- --bytes-per-transaction
Set the number of bytes to load per commit. The default is 1TB (1099511627776 bytes). You can set this option to modify the frequency of commits when bulk loads are running. This option works in conjunction with --rows-per-transaction. The threshold that is met first is applied.
- --char-field-options
See ybload Field Options.
- --comment-char ASCII_CHARACTER
Define the comment character that is used in source files. The default value is the pound sign (#). The value must be a single ASCII character or a valid escape sequence. If --skip-comment-lines is set, commented rows in the source file are skipped, not rejected, and do not appear in the bad rows file.
When the --format option is set, the comment character may be a single-byte or multi-byte character.
- --compression-policy
Define the compression policy for data buffers before they are sent from the client to the compute nodes. See ybload Advanced Processing Options.
- --convert-ascii-control, --no-convert-ascii-control
Allow the caret control ASCII character ^@ to be parsed as a single-byte representation of null within a character string. The default is --no-convert-ascii-control. See also ybload Field Options.
- --convert-c-escape, --no-convert-c-escape
Convert (or do not convert) C-style escape sequences when they appear in CHAR and VARCHAR fields. For example, the two-character sequence \t can be converted into a single tab character (0x09) or it can be loaded unchanged.
This option does not apply to fields with data types other than CHAR and VARCHAR. For example, in an INTEGER field, the value 4\x35, where \x35 is the C-style escape sequence for the number 5, returns an error: '\' is invalid digit. (However, the ybsql \copy command will convert C-style escape sequences in all fields.)
This option is only supported when the --format option is specified. If --format text is used, --convert-c-escape is the default behavior. If --format csv or --format bcp is used, --no-convert-c-escape is the default.
- --date-field-options
See ybload Date Formats.
- --date-style YMD | DMY | MDY | MONDY | DMONY | Y2MD | DMY2 | MDY2 | MONDY2 | DMONY2
Define the date format in terms of the order and style of the date parts, to avoid any ambiguity in parsing dates. For example, --date-style MDY means accept date values such as 08-13-2016.
If you specify one of the Y2 values, such as DMY2, you must also specify a --y2base value.
You can specify the --date-style option more than once in a single ybload command. For example:
--y2base 1990 --date-style MDY --date-style MONDY --date-style DMY2
Note that you can use JSON-style formatting to abbreviate this syntax.
If --date-style is not specified, the defaults derive from --date-field-options.
- --decimal-field-options
See ybload Field Options.
Note that ybload accepts both the dot character (.) and the comma character (,) as the separator for decimal values.
- --default-field-options {JSON formatted <FieldOptions>}
Specify the field options to use for fields that do not have their own per-field options or per-type options. See ybload Field Options. The default value is {}.
- --delimiter SPECIAL_CHARACTER
Define the special character that the source file uses as its field delimiter. All of the following are supported:
- A single Unicode character
- A hex value that corresponds to any ASCII control code (such as 0x1f)
- A valid escape sequence
When the --format option is set, the delimiter may be a multi-byte character.
If you do not specify a field delimiter,
ybload auto-detects it from among the following characters: , | \t \u001F \uFFFA
See also Setting --format for Flat Files.
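For example, a sketch of loading a pipe-delimited file into the match table from the earlier example (the source file name and delimiter are assumptions for illustration):
./ybload -d premdb --username bobr -t match --delimiter '|' match_data.txt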
- --double-field-options
See ybload Field Options.
- --dryrun
Do a dry run of the load operation without committing any data to the table. See Using the Dry Run Option.
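For example, a dry run of the match load sketched above; no data is committed (the source file name is hypothetical):
./ybload -d premdb --username bobr -t match --dryrun match_data.csv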
- --duplicate-handler NONE | RANDOM | ORDER BY <sql clause>
Specify how to handle duplicate source rows when the --write-op option is set to insert, update, or upsert. (--duplicate-handler is ignored for --write-op delete loads.)
- NONE (the default): source rows are assumed to be unique. The ybload behavior is undefined if duplicate source rows exist; UPDATE and UPSERT operations will sometimes fail.
- RANDOM: a random matching source row is used to update each row in the target table.
- "ORDER BY <sql clause>": source rows are sorted, and the first matching row that is found is used to update each row in the target table. For example: "ORDER BY order_date DESC, order_time DESC". Enclose the ORDER BY clause in double quotes.
See ORDER BY Clause for the complete syntax that is supported, with the exception that the --duplicate-handler option does not support ordinal numbers to identify columns.
You cannot use a declared key column in the --duplicate-handler ORDER BY clause.
See also Handling Duplicate Rows and Examples with Duplicate Rows.
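For example, a sketch of an upsert that keeps the most recent source row for each key (the orders table, its order_id key, and the source file name are hypothetical; order_date and order_time follow the ORDER BY example above):
./ybload -d premdb --username bobr -t orders --write-op upsert --key-field-names order_id --duplicate-handler "ORDER BY order_date DESC, order_time DESC" orders.csv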
- --emptymarker STRING_WITH_ESCAPES
Define a marker that matches the character used to represent an empty string in your source file (an empty CHAR or VARCHAR field). The value may be a string, a character, or a valid escape sequence.
You may need to quote the emptymarker character. For example, you can use --emptymarker '\t' on the command line, but not --emptymarker \t.
Note: To use " as the emptymarker character, you have to quote strings with a different character (" is the default). For example: --emptymarker '\"' --quote-char '|'
If null bytes exist within strings, use the --on-zero-char REMOVE option.
See also NULL and Empty Markers.
- --encoding ENCODING_NAME | -E ENCODING_NAME
Specify the source file's encoding (character set: UTF8, LATIN9, ISO-8859-1, UTF16). If no encoding is specified, the encoding of the source file is assumed to match the database encoding. If the encoding of the source file does not match the encoding of the destination database, ybload will use Java transcoding to "translate" the source data to the encoding of the destination database.
Check the documentation for the version of Java you are using on the ybload client system. For example, if you are using the Oracle JVM, check this list of supported character sets. Names from either of the first two columns can be used with the --encoding option. See also the discussion of encodings in Creating Databases.
For the fastest load performance under ybload, export the data from the source system in the character set of the destination database. For example, load exported UTF8 data into a UTF8 database. Any required transcoding is likely to have an impact on load performance.
Note: To load UTF-16 data, create a UTF8 database and set the --encoding option to UTF16.
- --escape-char SPECIAL_CHARACTER
Specify an escape character: any single Unicode character. The behavior of this option depends on the --format choice:
- --format CSV: used to escape embedded quote characters inside quoted fields. The default escape character is ".
- --format TEXT: used to escape embedded field delimiters and line separators. The default escape character is \.
- --format BCP: disallowed with BCP format, which does not use escape characters.
When the --format option is set, the escape character may be a multi-byte character.
- --field-defs SQL_NAME | SQL_COLUMN_DEF,...
Define a comma-separated list of source field names or field definitions. A field name simply represents the name of the corresponding column in the table. A field definition also includes the data type and any constraints on the column. Column definitions are used to create temporary table columns for source-only fields, as needed by the
--duplicate-handler option for UPDATE and UPSERT loads. You do not need to specify a list if the source fields match the order, number, and data type of the destination table columns. The
--parse-header-line option, which only supports field names, overrides the --field-defs option.
Field names are case-insensitive unless they are quoted (same behavior as standard SQL). For example:
--field-defs id,name,"Balance",date
--field-defs id BIGINT PRIMARY KEY,update_order INT NOT NULL,balance,date
If the target table has default values assigned for certain columns, you can specify a partial column list with this option and the default values will be loaded for the other columns. For example, if a table t1 has three columns, c1, c2, and c3, and c1 was created with a DEFAULT constraint, the --field-defs list could be c2, c3.
See also Loading Generated Key Values.
- --format CSV | TEXT | BCP | PARQUET
Specify the formatting style of the incoming data (how the source files were formatted by the export or unload tool that produced them).
For flat files, this option refers to how field delimiters are protected in the data:
- CSV: Delimiters in field values were protected by wrapping the field values in quotes. For example: "2012, Mini Cooper S, ALL4"
- TEXT: Delimiters in field values were protected by preceding them with a backslash escape character. For example: 2012\, Mini Cooper S\, ALL4
- BCP: Delimiters in fields were not protected (for compatibility with the Microsoft SQL Server bcp tool). For example: 2012, Mini Cooper S, ALL4
See also Setting --format for Flat Files and Loading Tables from Parquet Files.
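For example, a sketch of loading a CSV export into the match table from the earlier example (the source file name is hypothetical):
./ybload -d premdb --username bobr -t match --format csv match_export.csv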
- --ignore-emptymarker-case, --no-ignore-emptymarker-case
The first option supports case-insensitive empty-marker comparisons. If you use --ignore-emptymarker-case, values of EMPTY, Empty, and empty are all recognized as empty values when --emptymarker is set to EMPTY. These options apply globally, regardless of how the --emptymarker option was specified; you cannot specify them per data type or per field.
- --ignore-nullmarker-case, --no-ignore-nullmarker-case
The first option supports case-insensitive null-marker comparisons. If you use --ignore-nullmarker-case, values of NULL, Null, and null are all recognized as null values when --nullmarker is set to NULL. These options apply globally, regardless of how the --nullmarker option was specified; you cannot specify them per data type or per field.
- --ignore-unsupported-schema, --no-ignore-unsupported-schema
If --ignore-unsupported-schema is specified, ybload proceeds with a parquet load when source fields contain unsupported data types or a nested structure. See Parquet Load Examples. Note that the load may still fail when a target column for unsupported data is declared NOT NULL or has no DEFAULT value. The failure occurs because ybload proceeds as if the unsupported fields are not in the file.
When --no-ignore-unsupported-schema is specified (or neither option is specified, which is the default behavior), ybload returns an error when source fields contain unsupported data types or a nested structure. See also Loading Tables from Parquet Files and Parquet Load Examples.
These options do not apply to loads of flat files.
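For example, a sketch of a parquet load that proceeds past unsupported source fields (the events table and file name are hypothetical):
./ybload -d premdb --username bobr -t events --format parquet --ignore-unsupported-schema events.parquet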
- --int96-as-timestamp, --no-int96-as-timestamp
Interpret INT96 data (without a logical type) as a timestamp to provide compatibility with third-party systems, such as Impala, Hive and Spark.
- --integer-field-options
See ybload Field Options.
- --ip-field-options
See ybload Field Options.
- --key-field-names SQL_NAME,...
Comma-separated list of source field names to be used as key fields for --write-op operations. Key field names must be declared as NOT NULL in the CREATE TABLE statement.
If this option is not specified, primary keys either specified with the --field-defs option or declared by the target table are used. If no key fields are specified and no primary key exists, an attempt to load a table with --write-op delete, update, or upsert will result in an error.
Field names are case-insensitive unless they are quoted (same behavior as standard SQL). For example:
--key-field-names user_name,"deptId"
The --parse-header-line option cannot be used in conjunction with the --key-field-names option.
- --linesep LINE_SEPARATOR
Define the line separator (or row separator) that is used in source files: any single Unicode character or the \r\n escape sequence.
If you do not specify a line separator,
ybload auto-detects it from among the following characters: \n \r\n \r \u001E \uFFFB
When the --format option is set, the line separator may be a multi-byte character.
- --locale LOCALE_NAME | -L LOCALE_NAME
The name of the locale to use for parsing dates, timestamps, and so on. If the locale is not specified, the database locale is assumed to be C.
Locale names must be of the following form:
<language code>[_<country code>[_<variant code>]]
For example:
--locale en
--locale en_US
--locale zh_CN
Variant codes are rarely used; for details, see the Java documentation.
- --mac-field-options
Specify field options for MACADDR and MACADDR8 fields. See ybload Field Options.
- --max-bad-rows NUMBER
Set the maximum number of rejected rows that ybload will tolerate before aborting and starting to roll back the transaction. (Additional bad rows may be reported before the transaction has finished aborting.) The default is -1, which means do not cancel and roll back, regardless of the number of bad rows.
Note: --max-bad-rows 32 means 32 bad rows are allowed; the load will fail on the 33rd bad row.
Rejected rows are written to the location specified with the --bad-row-file option.
- --nullmarker STRING
Define a string that matches the string used to represent null in your source file. If this option is unspecified or set to an empty string, adjacent delimiters without text between them are parsed as NULL values. This option supports valid escape sequences.
You cannot load data in which more than one value in the same column is intended to be parsed as NULL. However, you can load data in which different columns have different values for NULL. See Specifying NULL Behavior for Different Columns and NULL and Empty Markers.
- --num-cores NUMBER_MIN_1
Set the number of CPU cores that ybload attempts to use; the minimum value is 1. By default, ybload tries to saturate all of the CPU cores on the host computer. For example, the default value for --num-readers and the number of actual concurrent readers used by --read-sources-concurrently ALWAYS are both based on the number of cores on the host computer. The primary purpose of this setting is to restrict ybload to fewer resources when it shares the host computer with other programs.
- --num-header-lines [ NUMBER ]
Ignore one or more header lines at the top of the source file. (The first or only header line in a file is typically a list of field names.) The default is 0, or 1 if --parse-header-line is specified. The maximum number is 5. If you specify multiple source files, the first line is skipped in all of them.
- --num-readers
Define the behavior for reading input files in parallel. See ybload Advanced Processing Options.
Note: When you are loading from named pipes, setting this option equal to the number of pipes is recommended. Not loading from pipes concurrently may cause a deadlock with the program that the pipes communicate with.
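For example, a sketch of a load from four named pipes with one reader per pipe (the pipe names are hypothetical):
./ybload -d premdb --username bobr -t match --num-readers 4 --read-sources-concurrently ALWAYS pipe1 pipe2 pipe3 pipe4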
- --num-parsers-per-reader
Define multiple parsers per reader. See ybload Advanced Processing Options.
- --object-storage-*
See Object Storage Options. These options apply to loads and unloads from and to Azure, AWS S3, and S3-compatible systems.
- --on-extra-field [ REMOVE | ERROR ]
Specify REMOVE to allow rows to be loaded when the source file (CSV, TEXT, or BCP) contains either more fields than the columns defined in the header, as detected when the --parse-header-line option is used, or more fields than the columns defined with the --field-defs option.
If neither --parse-header-line nor --field-defs is specified, --on-extra-field has no effect; ybload expects the source file to have the same number of fields as the number of columns in the target table. The --parse-header-line option, if specified, overrides the --field-defs option.
This option takes effect only when the extra fields are at the end of the line in the source file. For example:
colA, colB, colC             <- header
data1, data2, data3, data4   <- on-extra-field
The default behavior (ERROR) is to reject rows with extra fields at the end of the line.
- --on-invalid-char [ REPLACE | ERROR ]
Specify the action to take when the source file contains characters that cannot be represented in the database (or characters that are invalid in the source file itself):
- The replacement character for a LATIN9 database is the question mark: 0x3F (?).
- The replacement character for a UTF8 database is the Unicode replacement character: U+FFFD (a question mark in a diamond).
The default is REPLACE.
- --on-missing-field [ SUPPLYNULL | ERROR ]
Specify SUPPLYNULL to allow rows to be loaded when the source file (CSV, TEXT, or BCP) contains either fewer fields than the columns defined in the header, as detected when the --parse-header-line option is used, or fewer fields than the columns defined with the --field-defs option.
If neither --parse-header-line nor --field-defs is specified, --on-missing-field has no effect; ybload expects the source file to have the same number of fields as the number of columns in the target table. The --parse-header-line option, if specified, overrides the --field-defs option.
This option takes effect only when the missing fields are at the end of the line in the source file. For example:
colA, colB, colC   <- header
data1, data2       <- on-missing-field
The default behavior (ERROR) is to reject rows with missing fields at the end of the line.
- --on-string-too-long TRUNCATE | ERROR
Truncate character strings that are longer than the specified column width (or return an error). The default is ERROR.
- --on-unescaped-embedded-quote [ PRESERVE | ERROR ]
Preserve or return an error when unescaped quotes are found inside quoted strings. The default is ERROR.
- --on-zero-char [ REMOVE | ERROR ]
Remove null bytes (0x00 characters) that appear within strings in CHAR and VARCHAR fields (or return an error). The default is ERROR.
- --parse-header-line
Use the header line at the top of the source file (a list of field names) to determine the column names in the target table. This option overrides the --field-defs option.
- --per-field-options {JSON Object}
Specify parsing options for individual fields. See ybload Field Options.
- --pre-parse-buf-size NUMBER_MIN_1
Specify the size of the pre-parse buffers, which should be larger than the size of any single row in the source data. The default value is 192744 bytes. (The maximum width of a row in a table is 64231 bytes.)
- --quote-char SPECIAL_CHARACTER
Define the character that is used in source files to quote field values that contain embedded delimiters. Specify any single Unicode character. The default is ". This option only applies when you are using --format CSV.
When --format CSV is set, the quote character may be a multi-byte character.
- --read-sources-concurrently ALWAYS | NEVER | ALLOW | <NUMBER>
Define the behavior for reading source files in parallel: ALWAYS, NEVER, ALLOW (the default), or a specific number of source files. See ybload Advanced Processing Options.
Note: If you are loading from multiple pipes and this option is not set to ALWAYS, ybload sets it to ALWAYS and returns an INFO message stating that change. (Not reading from pipes concurrently could cause a deadlock with the program that they communicate with.)
- --real-field-options
See ybload Field Options.
- --resume-partial-load-from-offset NUMBER
Skip the specified number of bytes in the source file in order to resume a failed bulk load. The default value is 0. See Resuming a Partial Load.
- --rows-per-transaction
Set the number of rows to load per commit. The default is set to the maximum number of rows that can be loaded (2^63 - 1). You can reduce this number to increase the frequency of commits when bulk loads are running. This option works in conjunction with --bytes-per-transaction. The threshold that is met first is applied.
- --serialize-nested-as-json, --no-serialize-nested-as-json
If --serialize-nested-as-json is specified, ybload processes nested fields in complex parquet files as JSON. If --no-serialize-nested-as-json is specified (or neither option is specified), ybload returns an error when attempting to map complex nested fields to the columns in the target table. These options do not apply to loads of flat files. See Loading Tables from Parquet Files and Parquet Load Examples.
- --skip-blank-lines, --no-skip-blank-lines
Skip blank lines in the source file (true) or detect blank rows as bad rows (false). The default value is true.
- --skip-comment-lines, --no-skip-comment-lines
Skip lines that are commented out in the source file or detect them as bad rows. The default value is false (do not skip).
Note: If you use --skip-comment-lines, make sure the data does not contain any lines that begin with the comment character (which defaults to # but may be set to a different character with the --comment-char option).
- --skip-missing-files, --no-skip-missing-files
If a file is removed after ybload has expanded the path to the source files into a list, skip the file and continue the load operation. This option is useful when millions of source files (or several very large files) are being loaded over an extended period of time. As the load progresses, an administrator might delete one or more files from the source location. You may or may not want to stop the load in this situation. The default behavior is to not skip missing files.
When a load proceeds because a missing file is skipped, you will see a message like this:
Skipped [filepath] because the file is removed after listing
- --skip-open-failed-files, --no-skip-open-failed-files
If a file fails to open after ybload has expanded the path to the source files into a list, skip the file and continue the load operation. It is possible that the original file listing contains one or more files that have been modified and cannot be opened (for example, if permissions on the file were changed after the load started). You may or may not want to stop the load in this situation. The default behavior is to not skip files that fail to open.
When a load proceeds because a file that fails to open is skipped, you will see a message like this:
Skipped [filepath] because the file can't be opened
This option only works when no bytes have already been read from the file in question. Files read from object storage may be loaded in parts, but a load will fail rather than skip a file that has been partially read.
- --smallint-field-options
See ybload Field Options.
- --source-compression NONE | AUTO | GZ | BZIP2 | XZ | PACK200 | LZ4
This option explicitly defines the type of compression used by source data and is primarily intended for data sources that do not have file names (such as STDIN).
The default value is AUTO. When specified, this option overrides other compression detection methods. If this option is not specified, for flat files ybload auto-detects the compression type based on their file name extension.
For parquet files, ybload auto-detects the compression type based on metadata in the files and supports the following compression types: SNAPPY, ZSTD, and LZO. You do not need to specify the --source-compression option for parquet files.
- --table (or -t)
Name the target table to load and optionally its schema: schema_name.table_name. If you do not specify the schema name, the table is assumed to be in the public schema. The schema of the target table is not based on the user's search_path, regardless of how it is set. The table must exist in the database that you are connecting to for the load. (Do not try to specify the database name as part of the table name. Use the YBDATABASE environment variable, -d, or --dbname.)
Note: If you used a quoted case-sensitive identifier to create the target table, you must quote the table name and escape the quotes in the ybload command line. For example, if your table is named PremLeagueStats:
-t \"PremLeagueStats\"
- --time-field-options
See ybload Field Options.
- --timestamp-field-options
See ybload Field Options.
- --timestamptz-field-options
See ybload Field Options.
- --trim-white, --no-trim-white
Trim or retain leading and trailing whitespace characters in each field. With --format csv, whitespace characters inside of quotes are always preserved. The default value is false (preserved).
- --trim-trailing-white, --no-trim-trailing-white
Trim or retain trailing whitespace characters in each field. With --format csv, whitespace characters inside of quotes are always preserved. The default value is false (preserved).
- --truncate-before-insert, --no-truncate-before-insert
Truncate the target table before inserting new rows. Use this option if you want to ensure that the load runs against an empty table. The TRUNCATE statement is executed in the first transaction of the bulk load session. If the bulk load fails before that transaction is committed, the TRUNCATE is rolled back.
To use this option, you must have DELETE, TRUNCATE, and INSERT privileges on the target table (in addition to BULK LOAD privilege on the database).
- --uuid-field-options
See ybload Field Options.
- --write-op INSERT | UPDATE | DELETE | UPSERT
How the source rows should be written to the target table:
- INSERT (the default): insert source rows as new rows (append them to the table).
- UPDATE: update rows that match source rows.
- DELETE: delete rows that match source rows.
- UPSERT: update rows that match source rows, and insert new rows.
Updates, deletes, and upserts require primary keys or declared key fields to match against the target table. See --key-field-names.
By default, source rows are inserted (appended to the table) and duplicate rows are not discarded. Incoming rows are assumed to be unique. Updates and upserts may require duplicate handling. See the --duplicate-handler option.
Note: Make sure the user running ybload has BULK LOAD privileges on the database and appropriate additional privileges on the target table:
- INSERT for default loads
- UPDATE and SELECT for --write-op update and --write-op upsert loads
- DELETE and SELECT for --write-op delete loads
See also Deletes, Updates, and Upserts and Delete, Update, and Upsert Examples.
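For example, a sketch of a delete load that removes matching rows from the match table (the match_id key column and source file name are hypothetical):
./ybload -d premdb --username bobr -t match --write-op delete --key-field-names match_id deleted_matches.csv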
- --y2base YEAR
Define the pivot year (such as 1970) for two-digit year values (such as 97 and 16). For example, if --y2base is set to 1970, two-digit years of 70 and later are assumed to be in the 1900s. Values of 69 and earlier are assumed to be in the 2000s.
If you want to specify one of the Y2 values for --date-style, such as DMY2, you must also specify a --y2base value.
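For example, a sketch of a load that accepts two-digit MDY dates with a 1970 pivot year (the source file name is hypothetical); under this setting, 08-13-97 is read as 1997 and 08-13-16 as 2016:
./ybload -d premdb --username bobr -t match --y2base 1970 --date-style MDY2 match_dates.csv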