Examples with Duplicate Rows

The following examples demonstrate the behavior of different types of ybload operations when the --duplicate-handler option is used.

UPSERT Example

For example, table five_seasons contains:

premdb=# select * from five_seasons;
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       3 | 1994-1995   |       22 | Blackburn Rovers
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
(5 rows)

Assume that a source file contains the following 15 records. Note that there are duplicate records for seasonid 3, 6, and 10. The winners column values are different in these duplicate rows.

0,1991-1992,22,
1,1992-1993,22,Manchester United
2,1993-1994,22,Manchester United
3,1994-1995,22,Blackburn Rovers
3,1994-1995,22,Blackburn Rovers FC
4,1995-1996,20,Manchester United
5,1996-1997,20,Manchester United
6,1997-1998,20,Arsenal
6,1997-1998,20,Arsenal FC
7,1998-1999,20,Manchester United
8,1999-2000,20,Manchester United
9,2000-2001,20,Manchester United
10,2001-2002,20,Arsenal
10,2001-2002,20,Arsenal FC
30,2020-2021,20,

If it does not matter which of the duplicates is used for key matching, you can set --duplicate-handler to none or random. If you want a deterministic approach to matching duplicates, use an ORDER BY clause in the ybload command. For example:

--write-op upsert --duplicate-handler "order by winners asc"

For the 15 records shown here, order by winners asc will produce different results from order by winners desc. Ascending order means that Arsenal sorts above Arsenal FC, and Blackburn Rovers sorts above Blackburn Rovers FC.

If order by winners asc is used, the rows where winners values do not end in FC will be the chosen duplicate rows. The results of an UPSERT load will be:

premdb=# select * from five_seasons order by 1 asc;
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
       0 | 1991-1992   |       22 | [NULL]
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       3 | 1994-1995   |       22 | Blackburn Rovers
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
       6 | 1997-1998   |       20 | Arsenal
       7 | 1998-1999   |       20 | Manchester United
       8 | 1999-2000   |       20 | Manchester United
       9 | 2000-2001   |       20 | Manchester United
      10 | 2001-2002   |       20 | Arsenal
      30 | 2020-2021   |       20 | [NULL]
(12 rows)

If order by winners desc is used instead for the same upsert load, the winners values that end in FC will be chosen for seasonid 3, 6, and 10.

Note that you cannot use a declared key column as an ORDER BY column in the ybload command.

INSERT Example

For example, consider the following source file, which contains 16 records, including three rows with a seasonid of 6:

0,1991-1992,22,
1,1992-1993,22,Manchester United
2,1993-1994,22,Manchester United
3,1994-1995,22,Blackburn Rovers
3,1994-1995,22,Blackburn Rovers FC
4,1995-1996,20,Manchester United
5,1996-1997,20,Manchester United
6,1997-1998,20,Arsenal
6,1997-1998,20,Arsenal FC
6,1997-1998,20,Arsenal FC
7,1998-1999,20,Manchester United
8,1999-2000,20,Manchester United
9,2000-2001,20,Manchester United
10,2001-2002,20,Arsenal
10,2001-2002,20,Arsenal FC
30,2020-2021,20,

Before the insert load, table five_seasons contains:

premdb=# select * from five_seasons;
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       3 | 1994-1995   |       22 | Blackburn Rovers
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
(5 rows)

An insert load without duplicate handling results in 21 rows:

premdb=# select * from five_seasons order by 1;
 seasonid | season_name | numteams |       winners       
----------+-------------+----------+---------------------
       0 | 1991-1992   |       22 | [NULL]
       1 | 1992-1993   |       22 | Manchester United
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       3 | 1994-1995   |       22 | Blackburn Rovers
       3 | 1994-1995   |       22 | Blackburn Rovers
       3 | 1994-1995   |       22 | Blackburn Rovers FC
       4 | 1995-1996   |       20 | Manchester United
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
       6 | 1997-1998   |       20 | Arsenal
       6 | 1997-1998   |       20 | Arsenal FC
       6 | 1997-1998   |       20 | Arsenal FC
       7 | 1998-1999   |       20 | Manchester United
       8 | 1999-2000   |       20 | Manchester United
       9 | 2000-2001   |       20 | Manchester United
      10 | 2001-2002   |       20 | Arsenal
      10 | 2001-2002   |       20 | Arsenal FC
      30 | 2020-2021   |       20 | [NULL]
(21 rows)

The same load with --duplicate-handler "order by winners asc" results in 17 rows:

premdb=# select * from five_seasons order by 1;
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
       0 | 1991-1992   |       22 | [NULL]
       1 | 1992-1993   |       22 | Manchester United
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       3 | 1994-1995   |       22 | Blackburn Rovers
       3 | 1994-1995   |       22 | Blackburn Rovers
       4 | 1995-1996   |       20 | Manchester United
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
       6 | 1997-1998   |       20 | Arsenal
       7 | 1998-1999   |       20 | Manchester United
       8 | 1999-2000   |       20 | Manchester United
       9 | 2000-2001   |       20 | Manchester United
      10 | 2001-2002   |       20 | Arsenal
      30 | 2020-2021   |       20 | [NULL]
(17 rows)

The only duplicates that are discarded are the rows with FC in the winners column. The duplicates for seasonid values 1 through 5 are loaded. They were already in the table and are not affected by the duplicate handling of new source rows.

Configuring SSL/TLS for Tools and Drivers

Secure Connections for ODBC/JDBC Clients and ybsql

sys.lock

Bulk Load Examples

Running a Bulk Load

Loading Tables from Parquet Files

ybload Command

Loading from Amazon S3

Loading from Azure Blob Storage

Setting up and Running a Spark Job

Setting Up the ybrelay Service

LDAP Authentication

Synchronizing Users and Groups

Appliance: Disk Encryption

Setting Up Encrypted Drives

Remote Diagnostics

System Alerts

Creating an Alert Endpoint

Using the System Management Console

ybcli Reference

ybcli: config

AWS Marketplace

Create Stack

Docker

Cloud: Configuration

Vanity DNS

Yellowbrick Manager

Cloud: Enterprise Edition Getting Started

SQL-Based Loads from External Storage

Cloud: Installation

CLI Install Instructions

Permissions

Private Install Instructions

Public Install Instructions

Cloud: Kubernetes Guides

CREATE EXTERNAL FORMAT

CREATE EXTERNAL TABLE

CREATE TABLE

GRANT

Plan Hinting

SELECT

GROUP BY Clause

Subqueries

Data Type Casting

DECIMAL

JSON

JSONB

SQL String Constants

Aggregate Functions

Conditional Expressions

Datetime Functions

Formatting Functions

Geospatial functions

Mathematical Functions

Network Address Functions

Pattern Matching

Regular Expression Details

SQL Operators and Pattern Matching Functions

SQL Conditions

SQL User Defined Function (UDF)

SQL UDF Create Function

String Functions

ENCRYPT_KS

System Functions

Type-Safe Casting Functions

Window Functions

Creating WLM Resource Pools

Creating WLM Rules

Rule Examples

Examples with Duplicate Rows ​

UPSERT Example ​

INSERT Example ​

Examples with Duplicate Rows

UPSERT Example

INSERT Example