Parquet Schema Mapping and Type Casting
This section lays out the mapping and casting support from Parquet types to Yellowbrick data types (the data types supported for storage in columns of Yellowbrick tables).
Mapping for Parquet Boolean Type
The Parquet BOOLEAN type maps directly to the Yellowbrick BOOLEAN data type. No other mappings are supported.
Mappings for Parquet INT32 Types
The following table indicates which Parquet INT32 data types map to Yellowbrick data types, either directly or with a cast. The first row below the header refers to the Parquet primitive type, and the subsequent rows refer to annotated logical types. A cell that names specific variants, such as "Yes: UINT(16), INT(32)", applies to those variants only.
Parquet type | CHAR, VARCHAR | BOOLEAN | SMALLINT | INT | BIGINT | REAL | DOUBLE | DECIMAL | DATE | TIME | TIMESTAMP, TIMESTAMPTZ | UUID | IPV4, IPV6 | MACADDR, MACADDR8 | BYTEA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT32 | Yes, with cast | No | Yes, with cast | Yes | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | No | No | No | No | No | No | No |
INT/UINT (8/16/32, sign) | Yes, with cast | No | Yes: INT(8), UINT(8), INT(16) | Yes: UINT(16), INT(32) | Yes: UINT(32) | Yes, with cast | Yes, with cast | Yes, with cast | No | No | No | No | No | No | No |
DECIMAL (1-9) | Yes, with cast | No | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes | No | No | No | No | No | No | No |
DATE | Yes, with cast | No | No | No | No | No | No | No | Yes | No | Yes, with cast | No | No | No | No |
TIME (MILLIS) | Yes, with cast | No | No | No | No | No | No | No | No | Yes | No | No | No | No | No |
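The direct entries in the INT/UINT rows, in this table and in the INT64 table that follows, are consistent with one rule: each annotated integer maps to the narrowest signed Yellowbrick integer type whose range covers every value the Parquet type can hold, and UINT(64), which no signed 64-bit type can hold, goes to DECIMAL. The Python sketch below encodes that reading of the tables so it can be checked; it is an illustration, not Yellowbrick's implementation. The name direct_integer_target is hypothetical, and the code assumes SMALLINT, INT, and BIGINT are 16-, 32-, and 64-bit signed types.

```python
# Signed Yellowbrick integer types, narrowest first, with their bit widths.
# Assumption: SMALLINT/INT/BIGINT are 16/32/64-bit two's-complement types.
SIGNED_TARGETS = [("SMALLINT", 16), ("INT", 32), ("BIGINT", 64)]


def value_range(bit_width: int, is_signed: bool) -> tuple[int, int]:
    """Inclusive (min, max) range of a Parquet INT(bit_width, is_signed)."""
    if is_signed:
        return -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    return 0, 2 ** bit_width - 1


def direct_integer_target(bit_width: int, is_signed: bool) -> str:
    """Hypothetical helper: narrowest signed target that holds every value."""
    if bit_width not in (8, 16, 32, 64):
        raise ValueError(f"unsupported Parquet integer width: {bit_width}")
    lo, hi = value_range(bit_width, is_signed)
    for name, width in SIGNED_TARGETS:
        t_lo, t_hi = value_range(width, True)
        if t_lo <= lo and hi <= t_hi:
            return name
    # Only UINT(64) gets here: its maximum exceeds BIGINT, so use DECIMAL.
    return "DECIMAL"


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        for signed in (True, False):
            label = f"{'INT' if signed else 'UINT'}({width})"
            print(f"{label:>9} -> {direct_integer_target(width, signed)}")
```

Running the file prints one line per width and signedness, which can be compared against the "Yes:" cells in the INT32 and INT64 tables.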
Mappings for Parquet INT64 Types
The following table indicates which Parquet INT64 data types map to Yellowbrick data types, either directly or with a cast. The first row below the header refers to the Parquet primitive type, and the subsequent rows refer to annotated logical types.
Parquet type | CHAR, VARCHAR | BOOLEAN | SMALLINT | INT | BIGINT | REAL | DOUBLE | DECIMAL | DATE | TIME | TIMESTAMP, TIMESTAMPTZ | UUID | IPV4, IPV6 | MACADDR, MACADDR8 | BYTEA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT64 | Yes, with cast | No | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | No | No | No | No | No | No | No |
INT/UINT(64, sign) | Yes, with cast | No | Yes, with cast | Yes, with cast | Yes: INT(64) | Yes, with cast | Yes, with cast | Yes: UINT(64) | No | No | No | No | No | No | No |
DECIMAL (1-18) | Yes, with cast | No | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes | No | No | No | No | No | No | No |
TIME (MICROS, NANOS) | Yes, with cast | No | No | No | No | No | No | No | No | Yes | No | No | No | No | No |
TIMESTAMP (UTC, unit) | Yes, with cast | No | No | No | No | No | No | No | Yes, with cast | Yes, with cast | Yes | No | No | No | No |
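The DATE, TIME, and TIMESTAMP annotations store plain integers counted from a fixed origin: per the Apache Parquet logical-type specification, DATE counts days since 1970-01-01, TIME counts MILLIS, MICROS, or NANOS since midnight, and TIMESTAMP counts the same units since the Unix epoch. The following Python sketch decodes such raw values with the standard library. The helper names are hypothetical, and because Python's datetime resolves only to microseconds, the sketch truncates NANOS values; that truncation is a property of the sketch, not a statement about how Yellowbrick stores them.

```python
from datetime import date, datetime, time, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(raw: int, unit: str) -> int:
    """Convert a raw MILLIS/MICROS/NANOS count to whole microseconds."""
    if unit == "MILLIS":
        return raw * 1_000
    if unit == "MICROS":
        return raw
    if unit == "NANOS":
        return raw // 1_000  # sketch-only: drops sub-microsecond digits
    raise ValueError(f"unknown Parquet time unit: {unit}")


def decode_date(days: int) -> date:
    """DATE (INT32): signed day count since 1970-01-01."""
    return EPOCH_DATE + timedelta(days=days)


def decode_time(raw: int, unit: str) -> time:
    """TIME: MILLIS in INT32, MICROS or NANOS in INT64, counted from midnight."""
    seconds, micro = divmod(to_micros(raw, unit), 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, micro)


def decode_timestamp(raw: int, unit: str, adjusted_to_utc: bool = True) -> datetime:
    """TIMESTAMP (INT64): signed unit count since 1970-01-01T00:00:00."""
    moment = EPOCH_UTC + timedelta(microseconds=to_micros(raw, unit))
    # isAdjustedToUTC=false means local (zone-less) semantics: return a naive datetime.
    return moment if adjusted_to_utc else moment.replace(tzinfo=None)


if __name__ == "__main__":
    print(decode_date(19_723))                            # INT32 DATE
    print(decode_time(3_723_004, "MILLIS"))               # INT32 TIME(MILLIS)
    print(decode_timestamp(1_700_000_000_000, "MILLIS"))  # INT64 TIMESTAMP
```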
Mapping for Parquet FLOAT, DOUBLE, and INT96 Types
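The following table indicates which Parquet FLOAT, DOUBLE, and INT96 data types map to Yellowbrick data types, either directly or with a cast. INT96 has two rows because its mapping depends on whether the --int96-as-timestamp or the --no-int96-as-timestamp option is in effect.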
Parquet type | CHAR, VARCHAR | BOOLEAN | SMALLINT | INT | BIGINT | REAL | DOUBLE | DECIMAL | DATE | TIME | TIMESTAMP, TIMESTAMPTZ | UUID | IPV4, IPV6 | MACADDR, MACADDR8 | BYTEA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FLOAT | Yes, with cast | No | No | No | No | Yes | Yes, with cast | No | No | No | No | No | No | No | No |
DOUBLE | Yes, with cast | No | No | No | No | Yes, with cast | Yes | No | No | No | No | No | No | No | No |
INT96 (--int96-as-timestamp) | Yes, with cast | No | No | No | No | No | No | No | Yes, with cast | Yes, with cast | Yes | No | No | No | No |
INT96 (--no-int96-as-timestamp) | Yes, with cast | No | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes | No | No | No | No | No | No | No |
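INT96 is a legacy timestamp encoding, deprecated in the Parquet format specification and historically written by Impala and Hive: 12 bytes holding a little-endian 64-bit count of nanoseconds since midnight, followed by a little-endian 32-bit Julian day number. The sketch below decodes the timestamp interpretation (the --int96-as-timestamp row) with Python's struct module. The function name is hypothetical, and the sketch does not model how the numeric interpretation in the --no-int96-as-timestamp row treats the same 12 bytes.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_EPOCH = 2_440_588
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode <int64 nanos-of-day><int32 Julian day>, both little-endian."""
    if len(raw) != 12:
        raise ValueError(f"INT96 values are 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_EPOCH
    # Python datetimes stop at microseconds, so sub-microsecond digits are dropped.
    return EPOCH_UTC + timedelta(days=days, microseconds=nanos_of_day // 1_000)


if __name__ == "__main__":
    # 12:34:56 on Julian day 2460311.
    sample = struct.pack("<qi", 45_296_000_000_000, 2_460_311)
    print(decode_int96_timestamp(sample))
```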
Mapping for Parquet Byte Array Types
The following table indicates which Parquet byte array data types map to Yellowbrick data types, either directly or with a cast. The first row below the header refers to the Parquet primitive type, and the subsequent rows refer to annotated logical types.
Parquet type | CHAR, VARCHAR | BOOLEAN | SMALLINT | INT | BIGINT | REAL | DOUBLE | DECIMAL | DATE | TIME | TIMESTAMP, TIMESTAMPTZ | UUID | IPV4, IPV6 | MACADDR, MACADDR8 | BYTEA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BYTE ARRAY | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes |
STRING/UTF-8 | Yes | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes |
ENUM | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes |
DECIMAL(N) | Yes, with cast | No | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes | No | No | No | No | No | No | No |
JSON | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes |
BSON | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes |
Note: When data is loaded from a Parquet byte array into a BYTEA or VARCHAR column, leading and trailing whitespace is preserved.
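For the DECIMAL annotation, Parquet stores only the unscaled integer; the scale comes from the schema. INT32 and INT64 hold that integer directly, while BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY hold it as big-endian two's-complement bytes. A minimal Python sketch of the conversion using the standard decimal module follows; decode_decimal is a hypothetical helper, not part of any Yellowbrick or Parquet library.

```python
from decimal import Decimal
from typing import Union


def decode_decimal(raw: Union[int, bytes], scale: int) -> Decimal:
    """Hypothetical helper: apply a DECIMAL column's scale to its unscaled value.

    raw is the integer itself for INT32/INT64-backed decimals, or the
    big-endian two's-complement bytes for BYTE_ARRAY/FIXED_LEN_BYTE_ARRAY.
    """
    if isinstance(raw, (bytes, bytearray)):
        unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    else:
        unscaled = raw
    # Rebuild the Decimal from its digit tuple with exponent -scale. This is
    # exact, unlike arithmetic, which would round to the context precision (28).
    sign, digits, _ = Decimal(unscaled).as_tuple()
    return Decimal((sign, digits, -scale))


if __name__ == "__main__":
    print(decode_decimal(12_345, 2))                        # INT32/INT64-backed
    print(decode_decimal(bytes.fromhex("00bc614e"), 3))     # byte-array-backed
```

The precision limits in the DECIMAL rows of the INT32 (1-9) and INT64 (1-18) tables follow from this layout: every 9-digit unscaled value fits in a signed 32-bit integer, and every 18-digit value fits in a signed 64-bit integer.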
Mapping for Parquet Fixed-Length Byte-Array Types
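The following table indicates which Parquet fixed-length byte-array data types map to Yellowbrick data types, either directly or with a cast. The first row below the header refers to the Parquet primitive type, and the subsequent rows refer to annotated logical types; in those rows, the number before the slash is the fixed length in bytes (for example, 16/UUID is a 16-byte array annotated as UUID).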
Parquet type | CHAR, VARCHAR | BOOLEAN | SMALLINT | INT | BIGINT | REAL | DOUBLE | DECIMAL | DATE | TIME | TIMESTAMP, TIMESTAMPTZ | UUID | IPV4, IPV6 | MACADDR, MACADDR8 | BYTEA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FIXED-LENGTH BYTE-ARRAY | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes |
16/UUID | Yes, with cast | No | No | No | No | No | No | No | No | No | No | Yes | No | No | Yes |
N/DECIMAL(N) | Yes, with cast | No | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes, with cast | Yes | No | No | No | No | No | No | No |
12/Interval | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
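The 16/UUID row covers FIXED_LEN_BYTE_ARRAY(16) columns annotated as UUID. The Parquet specification stores those 16 bytes in big-endian order, which is the order Python's uuid.UUID(bytes=...) expects. A small sketch of decoding such a value; decode_uuid is a hypothetical helper, and the canonical hyphenated string it prints is Python's formatting, not a claim about the text Yellowbrick produces when casting to CHAR or VARCHAR.

```python
import uuid


def decode_uuid(raw: bytes) -> uuid.UUID:
    """Decode a FIXED_LEN_BYTE_ARRAY(16) value annotated as UUID."""
    if len(raw) != 16:
        raise ValueError(f"UUID values must be 16 bytes, got {len(raw)}")
    # Parquet stores UUIDs big-endian, which is what UUID(bytes=...) expects.
    return uuid.UUID(bytes=raw)


if __name__ == "__main__":
    raw = bytes.fromhex("123e4567e89b12d3a456426614174000")
    print(decode_uuid(raw))  # canonical 8-4-4-4-12 hex form
```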