Overview

This section introduces concepts, terms, and behavior that you need to be familiar with before you start replicating Yellowbrick databases.

System Requirements

Note the following requirements for setting up and using database replication:

In most cases, you will replicate from a source system to a physically separate target system. However, you can also set up "loopback replication" into an aliased database on the same system where the source database resides. In this context, a "system" may be an on-premises appliance or a cloud data warehouse instance.
You can replicate any number of databases from one system to another.
You can only create and maintain 1 replica per database.
Network connectivity must be established and maintained between the source and target systems (with SSL trust configured).
Chained replication is not supported. For example, you can replicate database A on instance YB1 to database B on instance YB2, but you cannot in turn replicate database B on instance YB2 to database C on instance YB3.
Source and target systems need not be running on the same database software and hardware. For example, you can replicate a database from a Version 5.2.x on-premises appliance to a 6.2.x cloud data warehouse instance, or from a Version 6.1.x data warehouse instance to a 6.2.x instance.

For example, in this diagram database A on data warehouse instance YB1 is replicated to database A on instance YB2. (These instances could be part of the same cloud deployment or could reside in separate deployments.)

What Is Replicated and When

A database replica belongs to a single database. Therefore, a replica maintains a copy of all of the database objects that the source database contains, including tables, views, sequences, and stored procedures. You can either include all schemas or exclude one or more schemas (this feature is also available for regular backup and restore operations).

In addition to database-specific objects, by default replication maintains users, roles, and their privileges and grants. However, superusers are not replicated.

Replication is asynchronous; there is a short delay between a transaction commit in the source database and the moment when changes become visible in the target database. Replication occurs at regular intervals. You can set the interval initially when you create the replica, then modify it later if needed.

Replication Moving Parts

In order for replication to run smoothly, you need to make sure that the two systems are communicating securely, and you have to create some special database objects. This section is a quick summary of the "moving parts" that make replication possible. For details about the setup process, see Setting Up Replication.

SSL trust

A prerequisite for database replication over two systems is SSL trust, which must be established in both directions, based on imported SSL certificates that identify the hosts with fully qualified domain names (FQDNs).

Source and target databases

Replication depends on traffic between two physical databases, a source and a target. The source accepts reads and writes of all kinds and is fully active in that sense.

A target database for replication must be in HOT_STANDBY mode, which allows incremental writes via replication but prevents other write operations. Note that a target database is available for reads (queries) and creation of temporary database objects.

Remote servers

A remote server is a logical object that defines the relationship between a source system, where the remote server is defined, and the remote (or target) host that contains target databases for replication. A single remote server can be used to set up multiple replicas.

Replicas

A replica is a logical object that unites a source database, a remote server, and a target database. You can define only one replica per source database. A replica also sets the frequency for replication updates and a few other optional attributes. A reverse replica is an object that enables replication in the reverse direction, which is typically needed for failover and failback.

Replication cycles

A replication cycle is an event that occurs at the frequency specified by a replica. Replication can be set to run very frequently (such as every 60 seconds) or infrequently (such as only twice a day). Each cycle goes through a series of states until it is finished and the target system is fully updated. Transactional consistency is maintained between the source and target databases.

Replication and Backup/Restore

Database replication and database backup and restore are closely related operations. Replication makes use of backups and incremental restores to do its work; replication is a kind of automated backup and restore function. The main differences are that replication requires no intermediate storage and provides a means of failing over to a secondary system. Replicated data flows straight from one database to another without landing on disk anywhere. Replication is a good alternative to backup and restore for large systems when the space and cost requirements of NAS systems are prohibitive.

The synchronization of source and target databases depends on snapshots, which identify the state of a database at a given point in time. Each snapshot defines a set of changes that is replicated from the source to the target, providing a rollback capability if the two databases become out of sync.

A new database that is created by replication (or by a restore operation) cannot make use of previous backup chains and requires a new full backup of its own as soon as replication is started. You cannot take incremental backups against an existing backup chain for a replicated database. A new full backup, which implies a new backup chain, must be taken, then incremental backups can be taken against that new chain. (The backups in the original pre-replication backup chain may still be used for restore operations to other databases, but not for incremental backups of the new replicated database.) See also Backup Chains.

Replication Management and Monitoring

When replication is in progress, you can monitor replication cycles by querying system views. For monitoring purposes, information about replication cycles and current progress is visible on the source system, not the target system. Newly replicated data is visible on the target system only when a cycle is complete.

You can use ALTER DATABASE commands to modify the state or attributes of a replica:

Start, pause, or resume replication
Reverse the order of replication; promote the secondary system to be the primary
Modify the frequency
Rename a replica

You can take a target database out of HOT_STANDBY mode, but it will no longer accept replication updates; replication cannot be resumed. See HOT_STANDBY and READONLY Modes and Rolling Back a Replicated Database.

Parent topic:Database Replication

Data Warehouse Instances

Configuring SSL/TLS for Tools and Drivers

Secure Connections for ODBC/JDBC Clients and ybsql

Running a Bulk Load

Loading Tables from Parquet Files

ybload Command

Bulk Load Examples

Loading from Amazon S3

Loading from Azure Blob Storage

Setting Up the ybrelay Service

Setting up and Running a Spark Job

Creating WLM Resource Pools

Creating WLM Rules

Rule Examples

DECIMAL

Data Type Casting

SQL String Constants

CREATE EXTERNAL FORMAT

CREATE TABLE

GRANT

SELECT

GROUP BY Clause

Subqueries

SHOW

SQL Conditions

String Functions

ENCRYPT_KS

Pattern Matching

SQL Operators and Pattern Matching Functions

Regular Expression Details

Datetime Functions

Mathematical Functions

Aggregate Functions

Window Functions

Conditional Expressions

Formatting Functions

Type-Safe Casting Functions

JSON Functions

System Functions

Network Address Functions

Overview ​

System Requirements ​

What Is Replicated and When ​

Replication Moving Parts ​