Skip to content

BINARY

A binary string is a sequence of octets (or bytes). The binary strings specifically allow storing octets of value zero and other “non-printable” octets (usually, octets outside the decimal range 32 to 126). The operations on binary strings process the actual bytes. The binary data type is used to support the storage and manipulation of binary data. This forms the foundation for the JSONB data type.

The binary data type has several aliases:

-BYTEA: BYTEA stands for “binary array”. The BYTEA type does not have a length indicator but is currently limited to the same length as the maximum VARCHAR.

-VARBINARY: VARBINARY or VARBINARY(n) takes a length parameter in bytes.

-VARBYTE: VARBYTE or VARBYTE(n) takes a length parameter in bytes.

Note: These type aliases all map to the same underlying BYTEA type.

Prerequisite

To use a binary data type, enable the enable_full_bytea configuration parameter.

Input BYTEA values

This section explains how to input BYTEA values using the following formats supported by Yellowbrick:

  • hex
  • escape
  • converting the characters into their byte representation depending on the encoding

bytea “hex” Format

The “hex” format encodes binary data as two hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). For input, the hexadecimal digits can be either uppercase or lowercase, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence).

sql
CREATE TABLE bytea_table (
    data BYTEA
);

-- Insert a row with a bytea value in hexadecimal format
INSERT INTO bytea_table (data) VALUES ('\x48656C6C6F2C20776F726C6421');
 
-- Select and display the inserted bytea value
SELECT * FROM bytea_table;
-- \x48656c6c6f2c20776f726c6421

bytea “escape” Format

The “escape” format represents a binary string using ASCII characters and converts non-ASCII bytes into special escape sequences. When entering bytea values in escape format, octets of certain values must be escaped, while all octet values can be escaped. In general, to escape an octet, convert it into its three-digit octal value and precede it by a backslash. Backslash itself (octet decimal value 92) can alternatively be represented by double backslashes.

sql
CREATE TABLE bytea_table (
    data BYTEA
);

-- Insert a row with a bytea value in escape format
INSERT INTO bytea_table (data) VALUES (E'Hell\157 wo\162ld!');

SET bytea_output TO escape;

-- Select and display the inserted bytea value
SELECT * FROM bytea_table;
-- Hello world!

Converting the characters into their byte representation depending on the encoding

sql
CREATE TABLE bytea_table (
    data BYTEA
);

-- Insert a row with a bytea value
INSERT INTO bytea_table (data) VALUES ('Hello,world!');

-- Select and display the inserted bytea value. 
SELECT * FROM bytea_table;
-- \x48656c6c6f2c776f726c6421

Examples of bytea_output configuration parameter

Set bytea_output to “hex” format

sql
SET bytea_output = 'hex';

SELECT '\xDEADBEEF'::bytea;
-- \xdeadbeef

This will set the output format of bytea values to hexadecimal. When you query a bytea column, it will be displayed in hexadecimal format.

Set bytea_output to “escape” format

In “escape” format, bytea values are displayed as a series of octal numbers prefixed with \ and escaped characters are displayed as they are.

sql
SET bytea_output = 'escape';

SELECT 'abc \153\154\155 \052\251\124'::bytea;
-- abc klm *\251T

Note: Changing this setting affects the display format of bytea values in query results but does not affect the actual storage format in the database.

Operators

The following table shows the operators that are supported:

OperatorDescription
&Bitwise And
|Bitwise Or
# or ^Bitwise Xor
~Bitwise Not
<<Shift Left
>>Logical Shift Right
||Concat

Casting Rules

BYTEA does not support length type modifiers, it can only be shortened by calling the SUBSTRING function. For type casting, if BYTEA cannot be directly cast to a specified type, an error will occur as described in the sections below:

sql
[42846] ERROR: cannot cast type bytea to <target type>

Also, attempting to cast from a <source type> to a BYTEA results in the following error:

sql
[42846] ERROR: cannot cast type <source type> to bytea

To know how implicit casting is handled, please refer Implicit Casting.

BYTEA and UUID

Casting to and from the UUID type leads to an error. The recommended solution is to perform the casting through VARCHAR.

BYTEA and FLOAT4/FLOAT8

Casting to and from FLOAT4/FLOAT8 types results in an error. The solution to this issue is to perform the casting through VARCHAR.

BYTEA and DECIMAL/NUMERIC

Casting to and from the DECIMAL type causes an error. You can avoid this by casting through VARCHAR. Alternatively, you can use the GET_BYTE function to extract each byte as an integer.

BYTEA and VARCHAR/TEXT

Casting VARCHAR/TEXT to BYTEA can take the following notations as input:

  • Hex specified as [0\\]x[a-fA-F0-9].
  • Binary specified as B'[01]+.
  • Octal Notation E'\\\\000\\\\001\\\\002'.
  • String values, these are interpreted as the byte sequence of the string (see below).

String byte sequences

We require that strings in non-hex, binary, or octal prefixes can still be cast to BYTEA. The string value is interpreted as the UTF8 byte representation of the VARCHAR.

Multibyte UTF8 sequence

sql
SELECT 'ท'::bytea::VARCHAR;
 varchar
---------
\xe0b897

Which is the UTF8 binary representation of the "ท" character.
Note: Ensure that this process is run against a UTF8 encoded database for proper handling of text. Additionally, be aware that SQL clients may interpret encodings differently, which can sometimes result in variations in the displayed output.

When casting a BYTEA to a VARCHAR, the output is the hex string representation of the contents of the BYTEA. This means that it is possible to overflow the max size of VARCHAR if you have a sufficiently long BYTEA. This results in the following error:

sql
[220BF] - Overflow when casting '<context>' to VARCHAR

With <context> being the first 20 bytes of the BYTEA. If more than 20 bytes are present, an ellipsis is added.

BYTEA and CHAR

Casting directly to CHAR would lead to an error.

BYTEA and IP/MAC data types

Casting to or from BYTEA and MAC/IP types results in an error. The workaround, again, is to use GET_BYTE and manually form a string.

Functions

The following functions are supported.

Note that the Get_ style functions return an INTEGER type, not a smaller type.

FunctionDescription
GET_BIT (BYTEA x, INT n) -> INTReturns the nth bit in x
GET_BYTE(BYTEA x, INT n) -> INTReturns the nth byte
LENGTH(BYTEA X)Returns length (in bytes) of the binary (i.e. the size of the value in the type)
BIT_COUNT(BYTEA X)Returns the number of set bits in x
SET_BIT(BYTEA x, INT n, INT v)Set the nth bit to v
SET_BYTE(BYTEA x, INT n, INT v)Set the nth byte to v
SUBSTR(BYTEA x, INT start [, INT count]) -> BYTEAReturns the byte as start offset up until count (default 1)
ENCODE(BYTEA x, VARCHAR format) -> VARCHARFormats x into a VARCHAR string which uses format ‘encoding.' We will support base64, escape, and hex formats
DECODE(VARCHAR x, VARCHAR format) -> BYTEAThe reverse of encode. Turns x into a BYTEA(n) using format as encoding
SHA1(BYTEA) -> BYTEACalculate SHA1 and returns a BYTEA
SHA256(BYTEA) -> BYTEACalculates SHA256 and returns a BYTEA
SHA512(BYTEA) -> BYTEACalculates SHA512 and returns a BYTEA
MD5(BYTEA) -> VARCHAR(32)Calculates MD5 and returns a hex respresentation as a VARCHAR. This is done for compatibility reasons

Aggregating and Ordering

BYTEA can be grouped and window aggregated. The only valid aggregate functions on BYTEA are MIN, MAX and COUNT (including COUNT DISTINCT).

BYTEA are ordered by their byte order (the same as LATIN9).

Joining and Equality

Two BYTEA types can only be equal if their lengths are the same and their byte patterns are the same. This means you can join on BYTEA and perform equality match by doing simple memcmp after length checks.

Distribution and Partitioning

The BYTEA type cannot be used for partitioning and distribution defined in the Data definition language (DDL). Attempting to do so results in the following error:

sql
[42603] The column '<column name>' cannot be used as a {distribution | partition} key because it is of the BYTEA type.