Appearance
Delete, Update, and Upsert Examples
This section contains examples of loads that use the --write-op delete
, update
, and upsert
modes.
UPDATE and UPSERT Example
For example, table five_seasons
contains:
premdb=# select * from five_seasons;
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
(5 rows)
Assume that the source file for the load contains ten rows in which the winners
value always ends with FC
:
1,1992-1993,22,Manchester United FC
2,1993-1994,22,Manchester United FC
3,1994-1995,22,Blackburn Rovers FC
4,1995-1996,20,Manchester United FC
5,1996-1997,20,Manchester United FC
6,1997-1998,20,Arsenal FC
7,1998-1999,20,Manchester United FC
8,1999-2000,20,Manchester United FC
9,2000-2001,20,Manchester United FC
10,2001-2002,20,Arsenal FC
If the seasonid
column is the primary key of the table and is used to match rows in the table with rows in the source file, a --write-op update
load would result in the modification of all five existing rows. Rows 6
through 10
would be read but not loaded.
Starting with the same five-row table and source file as the first UPDATE
example, a --write-op upsert
load would update five rows but also append the five new rows. The result would be a ten-row table.
premdb=# select * from five_seasons order by 1;
seasonid | season_name | numteams | winners
----------+-------------+----------+----------------------
1 | 1992-1993 | 22 | Manchester United FC
2 | 1993-1994 | 22 | Manchester United FC
3 | 1994-1995 | 22 | Blackburn Rovers FC
4 | 1995-1996 | 20 | Manchester United FC
5 | 1996-1997 | 20 | Manchester United FC
6 | 1997-1998 | 20 | Arsenal FC
7 | 1998-1999 | 20 | Manchester United FC
8 | 1999-2000 | 20 | Manchester United FC
9 | 2000-2001 | 20 | Manchester United FC
10 | 2001-2002 | 20 | Arsenal FC
(10 rows)
DELETE Example
For this example, assume that the five_seasons
table has the following 12 rows in it:
premdb=# select * from five_seasons order by 1;
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
0 | 1991-1992 | 22 | [NULL]
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
3 | 1994-1995 | 22 | Blackburn Rovers
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
6 | 1997-1998 | 20 | Arsenal
7 | 1998-1999 | 20 | Manchester United
8 | 1999-2000 | 20 | Manchester United
9 | 2000-2001 | 20 | Manchester United
10 | 2001-2002 | 20 | Arsenal
30 | 2020-2021 | 20 | [NULL]
(12 rows)
Assume that the source file contains 7 records as follows:
0,,,
3,1994-1995,22,Blackburn Rovers
6,1997-1998,20,Arsenal
10,2001-2002,20,Arsenal
11,,,
12,,,
30,2020-2021,20,
Using the seasonid
key field, a --write-op delete
load would delete 5 of these records (those with seasonid
0, 3, 6, 10, and 30). The resulting table would have 7 rows:
premdb=# select * from five_seasons order by 1;
seasonid | season_name | numteams | winners
----------+-------------+----------+-------------------
1 | 1992-1993 | 22 | Manchester United
2 | 1993-1994 | 22 | Manchester United
4 | 1995-1996 | 20 | Manchester United
5 | 1996-1997 | 20 | Manchester United
7 | 1998-1999 | 20 | Manchester United
8 | 1999-2000 | 20 | Manchester United
9 | 2000-2001 | 20 | Manchester United
(7 rows)
UPSERT Example with ybload Output and System View Queries
This example shows the ybload
command and output for a very small upsert
load. Then it shows how to query sys.log_load
and sys.log_query
to get more information about the load operation.
A table called minimatch
contains 15 rows:
premdb=# select * from minimatch order by seasonid,htid,atid;
seasonid | matchday | htid | atid | ftscore | htscore
----------+---------------------+------+------+---------+---------
22 | 2014-03-29 15:00:00 | 2 | 74 | 1-1 | 0-1
22 | 2014-02-12 00:00:00 | 2 | 75 | 0-0 | 0-0
22 | 2014-04-28 15:00:00 | 2 | 77 | 3-0 | 2-0
22 | 2013-10-19 00:00:00 | 2 | 78 | 4-1 | 1-0
22 | 2013-11-23 15:00:00 | 2 | 86 | 2-0 | 1-0
22 | 2013-09-22 00:00:00 | 2 | 87 | 3-1 | 2-1
22 | 2014-02-22 15:00:00 | 2 | 88 | 4-1 | 3-0
22 | 2014-03-25 00:00:00 | 2 | 89 | 2-2 | 0-1
22 | 2013-09-01 15:00:00 | 2 | 91 | 1-0 | 1-0
22 | 2014-05-04 00:00:00 | 2 | 93 | 1-0 | 1-0
25 | 2020-10-24 13:00:00 | 3 | 75 | 1-1 | 1-0
25 | 2020-10-24 12:30:00 | 4 | 76 | 6-2 | 2-2
25 | 2020-10-24 15:00:00 | 5 | 87 | 1-4 | 1-2
25 | 2020-10-25 11:00:00 | 6 | 89 | 0-3 | 0-1
25 | 2020-10-25 13:00:00 | 7 | 91 | 3-0 | 1-0
(15 rows)
The source file for an upsert load looks like this:
$ more minimatch_upsert.csv
22,2014-02-12 15:00:00,2,75,0-0,0-0
22,2013-10-19 15:00:00,2,78,4-1,1-0
22,2013-09-22 15:00:00,2,87,3-1,2-1
22,2014-03-25 15:00:00,2,89,2-2,0-1
22,2014-05-04 15:00:00,2,93,1-0,1-0
22,2014-03-29 15:00:00,2,74,1-1,0-1
22,2014-04-28 15:00:00,2,77,3-0,2-0
22,2013-11-23 15:00:00,2,86,2-0,1-0
22,2014-02-22 15:00:00,2,88,4-1,3-0
22,2013-09-01 15:00:00,2,91,1-0,1-0
25,2020-10-24 13:00:00,3,75,1-1,1-0
25,2020-10-24 12:30:00,4,76,6-2,2-2
25,2020-10-24 15:00:00,5,87,1-4,1-2
25,2020-10-25 11:00:00,6,89,0-3,0-1
25,2020-10-25 13:00:00,7,91,3-0,1-0
30,2020-10-26 13:00:00,8,88,4-2,2-2
The file contains 16 rows, including a new row with seasonid 30
. 10 rows in the file match the rows that are already in the table. 5 other rows match, except for a change in the timestamp for the matchday
column.
An upsert
load reads the 16
rows from the source file, using three key fields:
$ ybload -d premdb --username bobr -t minimatch -W --format csv --write-op upsert --key-field-names seasonid,htid,atid /home/brumsby/minimatch_upsert.csv
...
20:34:57.159 [ INFO] Flushing last 16 rows (of 16 total) in transaction #1 for minimatch
READ:576.0 B(333.6 B/s). ROWS G/B: 16/0( 9.27 /s). WRITE:544.0 B(315.1 B/s). TIME E/R: 0:00:01/ --:--:--20:34:57.174 [ INFO] Committing 16 rows into transaction #1 for minimatch ...
20:34:57.514 [ INFO] Committed transaction #1 after a total of 544 bytes and 16 good rows for minimatch
20:34:57.531 [ INFO] READ:576.0 B(278.5 B/s). ROWS G/B: 16/0( 7.74 /s). WRITE:544.0 B(263.0 B/s). TIME E/R: 0:00:02/ --:--:--
20:34:57.533 [ INFO] SUCCESSFUL BULK LOAD: Read 16 good rows from the source file(s) in 0:00:02 (READ: 278.5 B/s WRITE: 263.0 B/s). Number of rows affected is tracked in sys.log_query.
To find out more about the results of this load, query sys.log_load
:
premdb=# select session_key, table_name, username, end_time, transaction_first, inserted_rows
from sys.log_load where username='bobr' and table_name='minimatch' and error_string is null
order by end_time desc;
session_key | table_name | username | end_time | transaction_first | inserted_rows
------------------------------------------------------------------+------------+----------+----------------------------+-------------------+---------------
BrJx6HcnNQkNOGxorNSPD76G-br5LQwqtgcM5HPDCclCej4OOLAaqwzcd9wF1dnK | minimatch | bobr | 2020-10-29 20:34:57.552-07 | 92925 | 16
...
Given the transaction ID for the load in question, you can query the sys.log_query
view for more details and statistics. (Alternatively, do this in one step by joining the sys.log_load
view to the sys.log_query
view on transaction_first=transaction_id
.)
premdb=# select substr(query_text,1,25), rows_inserted
from sys.log_query where transaction_id=92925 order by done_time asc;
substr | rows_inserted
---------------------------+---------------
YBULKLOAD INTO "UPSERT_1_ | 16
UPDATE "premdb"."public". | 15
INSERT INTO "premdb"."pub | 1
ANALYZE HLL public.minima | 0
DROP TABLE ID pg_temp_10. | 0
(5 rows)
Note that an upsert
load is processed in five separate steps:
YBULKLOAD
into a temporary table.UPDATE
the target table.INSERT INTO
the target table.ANALYZE
the target table.DROP
the temporary table.
Based on the results of this query, you know that 16
rows were read from the source file, 15
rows were updated, and 1
completely new row was inserted.
Note that 15 rows are updated because 15 rows matched on the key fields: seasonid,htid,atid
. Only 5 of these rows were actually modified (their matchday
values were changed), but all of them were written to the database. In other words,10 existing rows were written to the database with the exact same information that was stored before, and 5 existing rows were written with a change to one column.
The updated table now contains the following 16 rows:
premdb=# select * from minimatch order by seasonid,htid,atid;
seasonid | matchday | htid | atid | ftscore | htscore
----------+---------------------+------+------+---------+---------
22 | 2014-03-29 15:00:00 | 2 | 74 | 1-1 | 0-1
22 | 2014-02-12 15:00:00 | 2 | 75 | 0-0 | 0-0
22 | 2014-04-28 15:00:00 | 2 | 77 | 3-0 | 2-0
22 | 2013-10-19 15:00:00 | 2 | 78 | 4-1 | 1-0
22 | 2013-11-23 15:00:00 | 2 | 86 | 2-0 | 1-0
22 | 2013-09-22 15:00:00 | 2 | 87 | 3-1 | 2-1
22 | 2014-02-22 15:00:00 | 2 | 88 | 4-1 | 3-0
22 | 2014-03-25 15:00:00 | 2 | 89 | 2-2 | 0-1
22 | 2013-09-01 15:00:00 | 2 | 91 | 1-0 | 1-0
22 | 2014-05-04 15:00:00 | 2 | 93 | 1-0 | 1-0
25 | 2020-10-24 13:00:00 | 3 | 75 | 1-1 | 1-0
25 | 2020-10-24 12:30:00 | 4 | 76 | 6-2 | 2-2
25 | 2020-10-24 15:00:00 | 5 | 87 | 1-4 | 1-2
25 | 2020-10-25 11:00:00 | 6 | 89 | 0-3 | 0-1
25 | 2020-10-25 13:00:00 | 7 | 91 | 3-0 | 1-0
30 | 2020-10-26 13:00:00 | 8 | 88 | 4-2 | 2-2
(16 rows)
Parent topic:Bulk Load Examples