Unloading Data to Parquet Files
This section explains how to unload data from Yellowbrick tables in Apache Parquet format (a binary, structured columnar storage format). Apache Parquet format is supported in ybload and ybunload operations. Certain options and parameters that work for flat files are not supported for parquet unloads, and a few options are specific to parquet unloads.
The standard set of Yellowbrick data types is automatically mapped to native and logical parquet data types; you do not need to specify any mapping. However, if you want to load (or reload) parquet data into a Yellowbrick table, you must make sure that the target column names in the table DDL match the names in the parquet schema. You can use parquet-tools to check the schema of parquet files before loading.
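The column-name requirement above can be checked mechanically once the names are known. The following is a minimal sketch, not a Yellowbrick utility: it assumes you have already copied the column names out of the parquet schema (for example, from parquet-tools output) and out of the table DDL. The function name, the sample column names, and the exact, case-sensitive comparison are all illustrative assumptions.

```python
def compare_column_names(parquet_columns, table_columns):
    """Return the names that appear on only one side of the load.

    Comparison is exact and case-sensitive (an assumption; adjust it if
    your environment treats identifiers differently).
    """
    parquet_set = set(parquet_columns)
    table_set = set(table_columns)
    return {
        "only_in_parquet": sorted(parquet_set - table_set),
        "only_in_table": sorted(table_set - parquet_set),
    }


# Hypothetical names, typed in by hand from the parquet schema and the DDL.
parquet_columns = ["seasonid", "matchday", "htid", "atid"]
table_columns = ["seasonid", "matchday", "htid", "ftscore"]

report = compare_column_names(parquet_columns, table_columns)
print(report)
```

Both lists in the report should be empty before you attempt the load; any name listed points to a mismatch between the parquet schema and the table DDL.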
The following ybunload options are specific to Parquet unloads:
- --format parquet: required for all unloads to parquet files. This format must be set in the ybunload command.
- Some optional Parquet Processing Options. You may need to set or modify some of these options, depending on the specific requirements of the data you are unloading.
Note: If you are unloading data to Azure Blob Storage or Azure Data Lake Storage Gen2, it must be unloaded in parquet format.
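As an illustration of where the required --format parquet option sits in a command, the sketch below assembles a ybunload argument list in Python. Only --format parquet comes from this section. The -d (database), -t (table), and -o (output location) flags, the helper name, and the sample values are assumptions for illustration; confirm the exact option names against the ybunload reference for your release.

```python
import shlex


def build_parquet_unload_command(database, table, output_dir):
    """Assemble a ybunload argument list for a parquet unload (sketch only)."""
    return [
        "ybunload",
        "-d", database,         # assumed flag: source database
        "-t", table,            # assumed flag: table to unload
        "-o", output_dir,       # assumed flag: output location
        "--format", "parquet",  # required for all unloads to parquet files
    ]


# Hypothetical database, table, and output directory.
command = build_parquet_unload_command("premdb", "match", "/tmp/unloads")
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each argument separate, so a path or table name containing spaces is passed through intact if the list is later handed to subprocess.run.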