Appearance
Loading Data Exported from SQL Server
This section covers best practices for loading data into a Yellowbrick database when the source of the data is a table in a Microsoft SQL Server database. There are specific recommendations for both exporting the data from SQL Server and loading the data with the ybload
program.
Best Practices for Exporting Data from SQL Server
Yellowbrick strongly recommends that you prepare a clean csv
file before trying to load it, especially if your character strings contain any of the following:
- The same character that is used as the field separator, such as commas or pipe symbols
- The same character that is used as the row separator, such as EOL markers
- Quotes
If you intend to use the SQL Server bcp
utility to export data, use the "query"
option to select the data out of the table and escape quotes and other characters that may cause problems when you load. By default, the bcp
utility does not escape quotes and separators that are found in character strings. If you use a simple "copy table to file" approach and your data has the characteristics listed above, you will need to set some special options in the ybload
command to parse the incoming data successfully. In general, it is more efficient to clean up the data on its way out of the source table than to work around data problems when you set up and run the ybload
operation.
For example, you can use a bcp
command that looks something like this:
bcp "query" queryout <filename> -c -S <server> -t,
...
where -t,
exports a comma-delimited file and "query"
is a SELECT statement. If column2
in table1
has character strings that contain quotes, your query should look like this:
select column1, '\"' + REPLACE(column2,'\"','\\\"') + '\"' as column2 from table1 ...
The purpose of the REPLACE
function is to replace double quotes within strings with escaped double quotes ('\"'
). The entire character string will also be enclosed by double quotes.
Note: If you have a pre-2016 version of SQL Server, which does not export data in the UTF-8 encoding, use the -w
option in the bcp
command (instead of the -c
option). The -w
option runs the export using Unicode characters.
By using the -w
option in the bcp
command and the --encoding utf16
option in the ybload
command, you will be able to export and load your data. Also make sure that your Yellowbrick database uses the utf8
encoding, not the default latin9
encoding.
For details about bcp
syntax and options, see the Microsoft SQL Server documentation.
Workarounds for Loading Exported Data
If you are unable to produce clean exported source files for bulk loading purposes, the ybload
program provides some options that can parse the data and work around problems:
- Null Bytes
- Null-byte strings (
0x00
) may be exported in your data, either representing complete empty fields or empty strings within fields. You can use the--emptymarker \0
option to replace0x00
with empty strings, and the--ignorenullbytes
option to ignore0x00
within strings. - Hexadecimal Values
- Timestamps may be exported as 64-bit hexadecimal values. The
ybload
program can parse and load these values if you use theinhex
option for the appropriate fields.
For details about bulk load syntax and options, see ybload Options.
Parent topic:Running a Bulk Load