Loading Data Exported from SQL Server
This section covers best practices for loading data into a Yellowbrick database when the source
of the data is a table in a Microsoft SQL Server database. There are specific recommendations for
both exporting the data from SQL Server and loading the data with the ybload
client.
Best Practices for Exporting Data from SQL Server
csv
file before
trying to load it, especially if your character strings contain any of the following:- The same character that is used as the field separator, such as commas or pipe symbols
- The same character that is used as the row separator, such as EOL markers
- Quotes
If you intend to use the SQL Server bcp
utility to export data, use the
"query"
option to select the data out of the table and escape quotes and other
characters that may cause problems when you load. By default, the bcp
utility
does not escape quotes and separators that are found in character strings. If you use a simple
"copy table to file" approach and your data has the characteristics listed above, you will need
to set some special options in the ybload
command to parse the incoming data
successfully. In general, it is more efficient to clean up the data on its way out of the source
table than to work around data problems when you set up and run the ybload
operation.
bcp
command that looks something like
this:bcp "query" queryout <filename> -c -S <server> -t,
...
-t,
exports a comma-delimited file and "query"
is a
SELECT statement. If column2
in table1
has character strings
that contain quotes, your query should look like this:
select column1, '\"' + REPLACE(column2,'\"','\\\"') + '\"' as column2 from table1 ...
The purpose of the REPLACE
function is to replace double quotes within
strings with escaped double quotes ('\"'
). The entire character string will
also be enclosed by double quotes.
-w
option in the bcp
command (instead of
the -c
option). The -w
option runs the export using Unicode
characters. By using the -w
option in the bcp
command and
the --encoding utf16
option in the ybload
command, you will
be able to export and load your data. Also make sure that your Yellowbrick database uses the
utf8
encoding, not the default latin9
encoding.
For details about bcp
syntax and options, see the Microsoft SQL Server documentation.
bcp Example on Windows
bcp DW.dbo.DimDate out dw.dbo_DimDate.csv -S ABCDE\PROD -T -c -t 0x1F -r 0x0A -a 32768
ybload
command will work with this bcp
command:ybload -d dw -t dbo.dimdate --delimiter 0x1F --format bcp dw.dbo.dimdate.csv
bcp Example on Linux
bcp
for Linux utility is available for extracting data from SQL Server.
Here is an example bcp
Linux
command:bcp "USE DW; SELECT * FROM dbo.DimDate(NOLOCK) ;" queryout "/mnt/data/dw.dbo_DimDate.csv" –S dw.abcd.com -c -t "0x1F" -a 32768
The same ybload
command that is shown in the Windows example will work in
this case.
Workarounds for Loading Exported Data
ybload
program provides some options that can parse the data and work around problems:- Null Bytes
- Null-byte strings (
0x00
) may be exported in your data, either representing complete empty fields or empty strings within fields. You can use the--emptymarker \0
option to replace0x00
with empty strings, and the--ignorenullbytes
option to ignore0x00
within strings. - Hexadecimal Values
- Timestamps may be exported as 64-bit hexadecimal values. The
ybload
program can parse and load these values if you use theinhex
option for the appropriate fields.
For details about bulk load syntax and options, see ybload Options.