Appearance
Examples with Duplicate Rows
The following examples demonstrate the behavior of different types of ybload
operations when the --duplicate-handler
option is used.
UPSERT Example
For example, table five_seasons
contains:
premdb=# select * from five_seasons;
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
(5 rows)
Assume that a source file contains the following 15 records. Note that there are duplicate records for seasonid 3
, 6
, and 10
. The winners
column values are different in these duplicate rows.
0,1991-1992,22,
1,1992-1993,22,Manchester United
2,1993-1994,22,Manchester United
3,1994-1995,22,Blackburn Rovers
3,1994-1995,22,Blackburn Rovers FC
4,1995-1996,20,Manchester United
5,1996-1997,20,Manchester United
6,1997-1998,20,Arsenal
6,1997-1998,20,Arsenal FC
7,1998-1999,20,Manchester United
8,1999-2000,20,Manchester United
9,2000-2001,20,Manchester United
10,2001-2002,20,Arsenal
10,2001-2002,20,Arsenal FC
30,2020-2021,20,
If it does not matter which of the duplicates is used for key matching, you can set --duplicate-handler
to none
or random
. If you want a deterministic approach to matching duplicates, use an ORDER BY
clause in the ybload
command. For example:
--write-op upsert --duplicate-handler "order by winners asc"
For the 15 records shown here, order by winners asc
will produce different results from order by winners desc
. Ascending order means that Arsenal
sorts above Arsenal FC
, and Blackburn Rovers
sorts above Blackburn Rovers FC
.
If order by winners asc
is used, the rows where winners
values do not end in FC
will be the chosen duplicate rows. The results of an UPSERT
load will be:
premdb=# select * from five_seasons order by 1 asc;
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
0 | 1991-1992 | 22 | [NULL]
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
6 | 1997-1998 | 20 | Arsenal
7 | 1998-1999 | 20 | Manchester United
8 | 1999-2000 | 20 | Manchester United
9 | 2000-2001 | 20 | Manchester United
10 | 2001-2002 | 20 | Arsenal
30 | 2020-2021 | 20 | [NULL]
(12 rows)
If order by winners desc
is used instead for the same upsert
load, the winners
values that end in FC
will be chosen for seasonid 3
, 6
, and 10
.
Note that you cannot use a declared key column as an ORDER BY
column in the ybload
command.
See also ybload Options.
INSERT Example
For example, consider the following source file, which contains 16 records, including three rows with a seasonid
of 6
:
0,1991-1992,22,
1,1992-1993,22,Manchester United
2,1993-1994,22,Manchester United
3,1994-1995,22,Blackburn Rovers
3,1994-1995,22,Blackburn Rovers FC
4,1995-1996,20,Manchester United
5,1996-1997,20,Manchester United
6,1997-1998,20,Arsenal
6,1997-1998,20,Arsenal FC
6,1997-1998,20,Arsenal FC
7,1998-1999,20,Manchester United
8,1999-2000,20,Manchester United
9,2000-2001,20,Manchester United
10,2001-2002,20,Arsenal
10,2001-2002,20,Arsenal FC
30,2020-2021,20,
Before the insert
load, table five_seasons
contains:
premdb=# select * from five_seasons;
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
(5 rows)
An insert
load without duplicate handling results in 21 rows:
premdb=# select * from five_seasons order by 1;
seasonid | season_name | numteams | winners
----------+-------------+----------+---------------------
0 | 1991-1992 | 22 | [NULL]
1 | 1992-1993 | 22 | Manchester United
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
3 | 1994-1995 | 22 | Blackburn Rovers
3 | 1994-1995 | 22 | Blackburn Rovers FC
4 | 1995-1996 | 20 | Manchester United
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
6 | 1997-1998 | 20 | Arsenal
6 | 1997-1998 | 20 | Arsenal FC
6 | 1997-1998 | 20 | Arsenal FC
7 | 1998-1999 | 20 | Manchester United
8 | 1999-2000 | 20 | Manchester United
9 | 2000-2001 | 20 | Manchester United
10 | 2001-2002 | 20 | Arsenal
10 | 2001-2002 | 20 | Arsenal FC
30 | 2020-2021 | 20 | [NULL]
(21 rows)
The same load with --duplicate-handler "order by winners asc"
results in 17 rows:
premdb=# select * from five_seasons order by 1;
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
0 | 1991-1992 | 22 | [NULL]
1 | 1992-1993 | 22 | Manchester United
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
3 | 1994-1995 | 22 | Blackburn Rovers
4 | 1995-1996 | 20 | Manchester United
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
6 | 1997-1998 | 20 | Arsenal
7 | 1998-1999 | 20 | Manchester United
8 | 1999-2000 | 20 | Manchester United
9 | 2000-2001 | 20 | Manchester United
10 | 2001-2002 | 20 | Arsenal
30 | 2020-2021 | 20 | [NULL]
(17 rows)
The only duplicates that are discarded are the rows with FC in the winners column. The duplicates for seasonid
values 1 through 5 are loaded. They were already in the table and are not affected by the duplicate handling of new source rows.