Encrypting Sensitive Data

You can use the ENCRYPT_KS and DECRYPT_KS SQL functions to protect access to sensitive character data, such as Personally Identifiable Information (PII). Encryption functions scramble the input data they receive, rendering it unreadable to any user or role who does not have both the privileges and the access key required to decrypt it. Authorized users can run the encryption functions against any valid SQL expression that evaluates to a character string.

You can also create tables with encrypted columns; in this case, those columns are encrypted automatically when data is loaded or inserted. You do not need to call the ENCRYPT_KS function explicitly.

The Yellowbrick implementation of the ENCRYPT_KS and DECRYPT_KS functions contains code under the OpenSSL license.

Configuring Encryption for SQL Access

The ENCRYPT_KS and DECRYPT_KS functions are backed by an integrated "vault" on the manager node, which functions as a key manager (or keystore). The same vault is used in the manager node to secure keys for unlocking encrypted drives; see System Hardware Encryption.

Before you can use the SQL encryption functions, you have to unlock the keystore file and complete the following configuration steps:
  1. Enable SQL encryption for the system by running the following ybcli command on the manager node:
    YBCLI(12545) (PRIMARY - yb100-mgr0)> config keystore sql enable
    Are you sure you want to configure keystore for SQL integration?
    Response (yes/no): yes

    This command sets up access to the vault. The system looks up keys in the vault when they are specified as arguments to encryption functions in SQL queries.

  2. Grant CREATE privilege per schema to users and roles who will be able to create, delete, and describe keys created in that schema. (This is the same type of privilege that you would grant to users who are going to create tables and other objects.) For example:
    premdb=# grant create on schema public to yb100;
  3. Create keys with the CREATE KEY SQL command, using a valid hexadecimal string to define a "secret." Keys belong to the database and schema where you create them. For example:
    premdb=# CREATE KEY yb100key SECRET 'a1b2c3d4e5f0';
    This command creates a key for the public schema in the premdb database. The secret for the key is saved in the encryption keystore. (Authorized users pass in the key name, not its associated secret, when they run encryption functions.) Administrators can query the sys.key view to return a list of keys that have been created per database and schema. For example:
    premdb=# select * from sys.key;
     key_id |   name   | schema_id | owner_id |         creation_time         
      16480 | yb100key |      2200 |    16007 | 2019-11-07 15:26:30.577896-08
    (1 row)
  4. Grant ENCRYPT and/or DECRYPT access on keys to specific users and roles. For example:
    premdb=# grant all on key yb100key to yb100;

    This example grants ENCRYPT and DECRYPT privileges to members of the role named yb100.

  5. Test the functions by encrypting and then decrypting a column from a table:
    premdb=# create table encrypted_names as select encrypt_ks(nickname, yb100key) enc from team;
    SELECT 50
    premdb=# select decrypt_ks(enc,yb100key) from encrypted_names;
For more details about the commands used in these steps, see:

Calling the Functions

The ENCRYPT_KS and DECRYPT_KS functions take the following arguments:
  • An input expression (required), which must evaluate to a character string. For example, this could be a column name, the concatenation of values from two columns, or the result of another function.
  • The name of the key (required). Users specify keys to ensure that the data is encrypted and decrypted consistently every time the same source data is accessed. Keys are mapped to "secrets" stored in the vault.
  • One of three standard AES algorithms, all of which use Output Feedback Mode (OFB):
    • AES with a 128-bit key (the default)
    • AES with a 192-bit key
    • AES with a 256-bit key
  • An optional "initialization vector" provides an additional level of obfuscation. This parameter provides an efficient way to re-scramble the encrypted data without having to rebuild the overall character mapping that is generated when a new key is used. Instead of using a different key for each function call, you can use the same key and adjust the vector. Taken together, the key and vector parameters function like a login credential with two levels of privacy and security.

Using ybunload, you can unload data with a query that contains the ENCRYPT_KS function. You can reload the data as is with ybload, then decrypt it with a query or CTAS statement that uses the DECRYPT_KS function. (You cannot load and decrypt data in one step with ybload.)

Creating Tables with Encrypted Columns

When you create a table with encrypted columns, data inserted or loaded into those columns is automatically encrypted using the ENCRYPT_KS function. In turn, encrypted data is automatically decrypted when it is selected by users who have DECRYPT privilege on the key (and SELECT privilege on the table). Users writing queries against encrypted columns do not have to specify the DECRYPT_KS function. Users who do not have DECRYPT privilege on the key see the stored encrypted values.

The CREATE TABLE statement supports the ENCRYPTED column constraint for VARCHAR columns only. For example, the following table is defined with three encrypted VARCHAR columns: weekly_wages, dob (date of birth), and cob (country of birth):
premdb=# create table player(
playerid bigint not null,
teamid smallint not null,
seasonid smallint not null,
firstname varchar(30),
lastname varchar(30),
position char(1),
dob varchar(100) encrypted with(COLUMN_ENCRYPTION_KEY=playerkey,ENCRYPTION_TYPE=RANDOMIZED,ALGORITHM='AES_256_OFB'),
weekly_wages varchar(100) encrypted with(COLUMN_ENCRYPTION_KEY=playerkey,ENCRYPTION_TYPE=RANDOMIZED,ALGORITHM='AES_256_OFB'),
avg_mins_per_match double precision,
matches_played real,
cob varchar(100) encrypted with(COLUMN_ENCRYPTION_KEY=playerkey,ENCRYPTION_TYPE=RANDOMIZED,ALGORITHM='AES_256_OFB'));
premdb=# \d player
                                                                          Table "public.player"
       Column       |          Type          |                                                         Modifiers                                                         
 playerid           | bigint                 | not null
 teamid             | smallint               | not null
 seasonid           | smallint               | not null
 firstname          | character varying(30)  | 
 lastname           | character varying(30)  | 
 position           | character(1)           | 
 dob                | character varying(100) | encrypted with (column_encryption_key = premdb.public.playerkey, encryption_type = randomized, algorithm = 'aes_256_ofb')
 weekly_wages       | character varying(100) | encrypted with (column_encryption_key = premdb.public.playerkey, encryption_type = randomized, algorithm = 'aes_256_ofb')
 avg_mins_per_match | double precision       | 
 matches_played     | real                   | 
 cob                | character varying(100) | encrypted with (column_encryption_key = premdb.public.playerkey, encryption_type = randomized, algorithm = 'aes_256_ofb')

Distribution: Hash (playerid)
This table is then loaded from a CSV file with rows like these:
15,23,27,Virgil,Van Dijk,D,07-08-1991,180000,87.6345,37.4,Netherlands
User bobr belongs to a role with SELECT privilege on the table, but this role does not have DECRYPT privilege on the key. Therefore, user bobr sees rows like these:
premdb=> select * from player where playerid between 15 and 17;
-[ RECORD 1 ]------+-------------------------------------------------------------
playerid           | 15
teamid             | 23
seasonid           | 27
firstname          | Virgil
lastname           | Van Dijk
position           | D
dob                | CVdbnbTmmeYaaA4SZfOV590jszv8Ooj09zGHsIlB7Q9WhjS2ywNMrg9JVA==
weekly_wages       | CYJzBAQSvlufeMDyWtH2Yr5TA7Ao+ulFpP3eY2SwBSVpNb/DlQjn
avg_mins_per_match | 87.6345
matches_played     | 37.4
cob                | CYz7BPoHAjpNJF0E+O8K/gU8yhRj7oFgcCKvb37lG/LlSep/9M2UbFHHECo=
-[ RECORD 2 ]------+-------------------------------------------------------------
playerid           | 16
teamid             | 24
seasonid           | 27
firstname          | David
lastname           | Silva
position           | M
dob                | CUdWko/n2n6SbaaVb39Ck6RYwPH9+intp+oP0p7Uq5M+mMIPQa6iV+wMzA==
weekly_wages       | CXWWk/rZxpNJM5LtBfW32nx+eoQ/0X9NexX236RNAC3N4533++rD
avg_mins_per_match | 79.6723
matches_played     | 30.5
cob                | CXjnwR9g7+iFgQi8IfBsZRYpmj4+sgkmlvSHrZFdCnZmfG+q4YI=
-[ RECORD 3 ]------+-------------------------------------------------------------
playerid           | 17
teamid             | 22
seasonid           | 27
firstname          | Jamie
lastname           | Vardy
position           | F
dob                | CeVFkbgdYq2ZOjW7s0E/RQSWlN8W+TsvwYVu8W5EDDPRuRFJS5Mwu7/QyQ==
weekly_wages       | CRMmOw16X8foeWOsq0N+x7kTNXTTFsHo5aXXeH6yzhwmeWu1XMI=
avg_mins_per_match | 77.1986
matches_played     | 31.6
cob                | Cf9r4Q60m/5FtYWSa8jwu3164z0SoL8UDxuz0p3FPQaRQmTjU7oL7w==
If user bobr is granted DECRYPT privilege on the key (either directly or via a role in which he has membership), all of the column values are visible in clear text:
premdb=# grant decrypt on key playerkey to bobr;
premdb=# \c premdb bobr
You are now connected to database "premdb" as user "bobr".
premdb=> select * from player where playerid between 15 and 17;
 playerid | teamid | seasonid | firstname | lastname | position |    dob     | weekly_wages | avg_mins_per_match | matches_played |     cob     
       15 |     23 |       27 | Virgil    | Van Dijk | D        | 07-08-1991 | 180000       |            87.6345 |           37.4 | Netherlands
       16 |     24 |       27 | David     | Silva    | M        | 01-08-1986 | 250000       |            79.6723 |           30.5 | Spain
       17 |     22 |       27 | Jamie     | Vardy    | F        | 01-11-1987 | 80000        |            77.1986 |           31.6 | England
(3 rows)

For more details about creating tables with encrypted columns, see CREATE TABLE. For details about the specific algorithms used to encrypt data, see Encryption and Decryption Algorithms. For information about accessing encrypted data in a restored database, see Decrypting Columns in a Restored Database.