Appearance
BINARY
A binary string is a sequence of octets (or bytes). The binary strings specifically allow storing octets of value zero and other “non-printable” octets (usually, octets outside the decimal range 32 to 126). The operations on binary strings process the actual bytes. The binary data type is used to support the storage and manipulation of binary data. This forms the foundation for the JSONB
data type.
The binary data type has several aliases:
-BYTEA
: BYTEA
stands for “binary array”. The BYTEA
type does not have a length indicator but is currently limited to the same length as the maximum VARCHAR
.
-VARBINARY
: VARBINARY
or VARBINARY(n)
takes a length parameter in bytes.
-VARBYTE
: VARBYTE
or VARBYTE(n)
takes a length parameter in bytes.
Note: These type aliases all map to the same underlying BYTEA
type.
Prerequisite
To use a binary data type, enable the enable_full_bytea
configuration parameter.
Input BYTEA values
This section explains how to input BYTEA
values using the following formats supported by Yellowbrick:
- hex
- escape
- converting the characters into their byte representation depending on the encoding
bytea “hex” Format
The “hex” format encodes binary data as two hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x
(to distinguish it from the escape format). For input, the hexadecimal digits can be either uppercase or lowercase, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x
sequence).
sql
CREATE TABLE bytea_table (
data BYTEA
);
-- Insert a row with a bytea value in hexadecimal format
INSERT INTO bytea_table (data) VALUES ('\x48656C6C6F2C20776F726C6421');
-- Select and display the inserted bytea value
SELECT * FROM bytea_table;
-- \x48656c6c6f2c20776f726c6421
bytea “escape” Format
The “escape” format represents a binary string using ASCII characters and converts non-ASCII bytes into special escape sequences. When entering bytea values in escape format, octets of certain values must be escaped, while all octet values can be escaped. In general, to escape an octet, convert it into its three-digit octal value and precede it by a backslash. Backslash itself (octet decimal value 92) can alternatively be represented by double backslashes.
sql
CREATE TABLE bytea_table (
data BYTEA
);
-- Insert a row with a bytea value in escape format
INSERT INTO bytea_table (data) VALUES (E'Hell\157 wo\162ld!');
SET bytea_output TO escape;
-- Select and display the inserted bytea value
SELECT * FROM bytea_table;
-- Hello world!
Converting the characters into their byte representation depending on the encoding
sql
CREATE TABLE bytea_table (
data BYTEA
);
-- Insert a row with a bytea value
INSERT INTO bytea_table (data) VALUES ('Hello,world!');
-- Select and display the inserted bytea value.
SELECT * FROM bytea_table;
-- \x48656c6c6f2c776f726c6421
Examples of bytea_output configuration parameter
Set bytea_output to “hex” format
sql
SET bytea_output = 'hex';
SELECT '\xDEADBEEF'::bytea;
-- \xdeadbeef
This will set the output format of bytea
values to hexadecimal. When you query a bytea
column, it will be displayed in hexadecimal format.
Set bytea_output to “escape” format
In “escape” format, bytea
values are displayed as a series of octal numbers prefixed with \
and escaped characters are displayed as they are.
sql
SET bytea_output = 'escape';
SELECT 'abc \153\154\155 \052\251\124'::bytea;
-- abc klm *\251T
Note: Changing this setting affects the display format of bytea
values in query results but does not affect the actual storage format in the database.
Operators
The following table shows the operators that are supported:
Operator | Description |
---|---|
& | Bitwise And |
| | Bitwise Or |
# or ^ | Bitwise Xor |
~ | Bitwise Not |
<< | Shift Left |
>> | Logical Shift Right |
|| | Concat |
Casting Rules
BYTEA
does not support length type modifiers, it can only be shortened by calling the SUBSTRING
function. For type casting, if BYTEA
cannot be directly cast to a specified type, an error will occur as described in the sections below:
sql
[42846] ERROR: cannot cast type bytea to <target type>
Also, attempting to cast from a <source type>
to a BYTEA
results in the following error:
sql
[42846] ERROR: cannot cast type <source type> to bytea
To know how implicit casting is handled, please refer Implicit Casting.
BYTEA and UUID
Casting to and from the UUID type leads to an error. The recommended solution is to perform the casting through VARCHAR
.
BYTEA and FLOAT4/FLOAT8
Casting to and from FLOAT4
/FLOAT8
types results in an error. The solution to this issue is to perform the casting through VARCHAR
.
BYTEA and DECIMAL/NUMERIC
Casting to and from the DECIMAL
type causes an error. You can avoid this by casting through VARCHAR
. Alternatively, you can use the GET_BYTE
function to extract each byte as an integer.
BYTEA and VARCHAR/TEXT
Casting VARCHAR
/TEXT
to BYTEA
can take the following notations as input:
- Hex specified as
[0\\]x[a-fA-F0-9]
. - Binary specified as
B'[01]+
. - Octal Notation
E'\\\\000\\\\001\\\\002'
. - String values, these are interpreted as the byte sequence of the string (see below).
String byte sequences
We require that strings in non-hex, binary, or octal prefixes can still be cast to BYTEA
. The string value is interpreted as the UTF8
byte representation of the VARCHAR
.
Multibyte UTF8 sequence
sql
SELECT 'ท'::bytea::VARCHAR;
varchar
---------
\xe0b897
Which is the UTF8
binary representation of the "ท" character.
Note: Ensure that this process is run against a UTF8
encoded database for proper handling of text. Additionally, be aware that SQL clients may interpret encodings differently, which can sometimes result in variations in the displayed output.
When casting a BYTEA
to a VARCHAR
, the output is the hex string representation of the contents of the BYTEA
. This means that it is possible to overflow the max size of VARCHAR
if you have a sufficiently long BYTEA
. This results in the following error:
sql
[220BF] - Overflow when casting '<context>' to VARCHAR
With <context>
being the first 20 bytes of the BYTEA
. If more than 20 bytes are present, an ellipsis is added.
BYTEA and CHAR
Casting directly to CHAR
would lead to an error.
BYTEA and IP/MAC data types
Casting to or from BYTEA
and MAC
/IP
types results in an error. The workaround, again, is to use GET_BYTE
and manually form a string.
Functions
The following functions are supported.
Note that the Get_
style functions return an INTEGER
type, not a smaller type.
Function | Description |
---|---|
GET_BIT (BYTEA x, INT n) -> INT | Returns the nth bit in x |
GET_BYTE(BYTEA x, INT n) -> INT | Returns the nth byte |
LENGTH(BYTEA X) | Returns length (in bytes) of the binary (i.e. the size of the value in the type) |
BIT_COUNT(BYTEA X) | Returns the number of set bits in x |
SET_BIT(BYTEA x, INT n, INT v) | Set the nth bit to v |
SET_BYTE(BYTEA x, INT n, INT v) | Set the nth byte to v |
SUBSTR(BYTEA x, INT start [, INT count]) -> BYTEA | Returns the byte as start offset up until count (default 1) |
ENCODE(BYTEA x, VARCHAR format) -> VARCHAR | Formats x into a VARCHAR string which uses format ‘encoding.' We will support base64, escape, and hex formats |
DECODE(VARCHAR x, VARCHAR format) -> BYTEA | The reverse of encode. Turns x into a BYTEA(n) using format as encoding |
SHA1(BYTEA) -> BYTEA | Calculate SHA1 and returns a BYTEA |
SHA256(BYTEA) -> BYTEA | Calculates SHA256 and returns a BYTEA |
SHA512(BYTEA) -> BYTEA | Calculates SHA512 and returns a BYTEA |
MD5(BYTEA) -> VARCHAR(32) | Calculates MD5 and returns a hex respresentation as a VARCHAR. This is done for compatibility reasons |
Aggregating and Ordering
BYTEA
can be grouped and window aggregated. The only valid aggregate functions on BYTEA
are MIN
, MAX
and COUNT
(including COUNT DISTINCT
).
BYTEA
are ordered by their byte order (the same as LATIN9
).
Joining and Equality
Two BYTEA
types can only be equal if their lengths are the same and their byte patterns are the same. This means you can join on BYTEA
and perform equality match by doing simple memcmp
after length checks.
Distribution and Partitioning
The BYTEA
type cannot be used for partitioning and distribution defined in the Data definition language (DDL). Attempting to do so results in the following error:
sql
[42603] The column '<column name>' cannot be used as a {distribution | partition} key because it is of the BYTEA type.