Appearance
Specifying NULL Behavior for Different Columns
This section explains how you can use the --nullmarker
option in different ways in the same bulk load. You may run into a situation where different columns in the same table must be loaded with NULL
values, but the source file for the load defines NULL
in different ways.
By default, ybload
treats adjacent delimiters without text between them as NULL
values and adjacent quotes within delimiters as empty strings. For example, assume that a line in a comma-delimited source file looks like this:
123,,456,"",789
This line represents 5 fields or columns:
- Column 1: 123
- Column 2:
NULL
(If this column is not nullable,ybload
returns an error.) - Column 3: 456
- Column 4: empty string (If this column cannot contain an empty string,
ybload
returns an error; onlyCHAR
andVARCHAR
columns can contain empty strings.) - Column 5: 789
If the fields in your source files and their corresponding table columns conform with this pattern with respect to NULL
values and empty strings, ybload
will load the data correctly by default. However, if your data contains special values that are intended to represent NULL
, read the rest of this section to understand how to load those values correctly (or discard them as appropriate). See also the description of the --emptymarker
option under ybload Options.
Consider the DDL for the following table:
create table team_with_nulls(
teamid smallint not null,
htid smallint not null,
atid smallint not null,
name varchar(30),
nickname varchar(20),
city varchar(20),
stadium varchar(50),
debut date)
distribute replicate;
Note that the four VARCHAR
columns are nullable, and the final DATE
column is also nullable. Now consider the following 5 lines in the source file:
46,47,96,Wimbledon,Dons,London,Selhurst Park,1991/08/01
47,48,97,Wolverhampton Wanderers,Wolves,Wolverhampton,Molineux Stadium,2003/08/01
48,49,98,,null,,,----/--/--
49,50,99,,null,,,----/--/--
50,51,100,,null,,,----/--/--
- Columns 4, 6, and 7 have consecutive delimiters, implying
NULL
values. - Column 5 contains an explicit
null
character string, but this string must be stored asNULL
. - Column 8 has
----/--/--
in some rows. This value also needs to be loaded asNULL
.
Fortunately, ybload
has options that work for all fields in the source file, for all fields with a given data type, or for specific named fields. For example, within the same ybload
command, you could set the --nullmarker
option globally, for all VARCHAR
fields, and for one or more specific fields. The per-type and per-field options, if present, override the global settings.
In this case, the per-type option will not solve the issue with the VARCHAR
columns because one of them has a different marker for NULL
. However, the global default --nullmarker
option takes care of the columns where adjacent delimiters indicate NULL
, and you can specify a different NULL
value for column 5. For column 8, the only DATE
column in the table, you can use either a per-field option or a per-type option:
- Column 5: load
null
asNULL
, using:
--per-field-options '{"5":{"nullmarker":"null"}}'
- Column 8: load
----/--/--
asNULL
, using :
--date-field-options '{"nullmarker": "----/--/--"}'
or:
--per-field-options '{"8":{"nullmarker":"----/--/--"}}'
The complete ybload
command looks like this:
$ ybload -d premdb -t team_with_nulls --username bobr -W /home/premdata/team_with_nulls.csv
--format csv --date-field-options '{"nullmarker": "----/--/--"}' --per-field-options '{"5":{"nullmarker":"null"}}'
The global --nullmarker ''
option is not required because consecutive delimiters represent the default null marker.
The resulting rows in the table look like this:
$ ybsql premdb
Null display is "[NULL]"
...
premdb=# select * from team_with_nulls;
teamid | htid | atid | name | nickname | city | stadium | debut
--------+------+------+-------------------------+----------+---------------+------------------+------------
46 | 47 | 96 | Wimbledon | Dons | London | Selhurst Park | 1991-08-01
47 | 48 | 97 | Wolverhampton Wanderers | Wolves | Wolverhampton | Molineux Stadium | 2003-08-01
48 | 49 | 98 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL]
49 | 50 | 99 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL]
50 | 51 | 100 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL]
(5 rows)
Note that in this ybsql
session NULL
values are intentionally displayed as [NULL]
for readability. (See ybsql Display Properties for information about the \pset
command.)
Parent topic:Running a Bulk Load