Skip to content

Delete, Update, and Upsert Examples

This section contains examples of loads that use the --write-op delete, update, and upsert modes.

UPDATE and UPSERT Example

For example, table five_seasons contains:

premdb=# select * from five_seasons;
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       3 | 1994-1995   |       22 | Blackburn Rovers
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
(5 rows)

Assume that the source file for the load contains ten rows in which the winners value always ends with FC:

1,1992-1993,22,Manchester United FC
2,1993-1994,22,Manchester United FC
3,1994-1995,22,Blackburn Rovers FC
4,1995-1996,20,Manchester United FC
5,1996-1997,20,Manchester United FC
6,1997-1998,20,Arsenal FC
7,1998-1999,20,Manchester United FC
8,1999-2000,20,Manchester United FC
9,2000-2001,20,Manchester United FC
10,2001-2002,20,Arsenal FC

If the seasonid column is the primary key of the table and is used to match rows in the table with rows in the source file, a --write-op update load would result in the modification of all five existing rows. Rows 6 through 10 would be read but not loaded.

Starting with the same five-row table and source file as the first UPDATE example, a --write-op upsert load would update five rows but also append the five new rows. The result would be a ten-row table.

premdb=# select * from five_seasons order by 1;
 seasonid | season_name | numteams |       winners      
----------+-------------+----------+----------------------
       1 | 1992-1993   |       22 | Manchester United FC
       2 | 1993-1994   |       22 | Manchester United FC
       3 | 1994-1995   |       22 | Blackburn Rovers FC
       4 | 1995-1996   |       20 | Manchester United FC
       5 | 1996-1997   |       20 | Manchester United FC
       6 | 1997-1998   |       20 | Arsenal FC
       7 | 1998-1999   |       20 | Manchester United FC
       8 | 1999-2000   |       20 | Manchester United FC
       9 | 2000-2001   |       20 | Manchester United FC
      10 | 2001-2002   |       20 | Arsenal FC
(10 rows)

DELETE Example

For this example, assume that the five_seasons table has the following 12 rows in it:

premdb=# select * from five_seasons order by 1;
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
       0 | 1991-1992   |       22 | [NULL]
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       3 | 1994-1995   |       22 | Blackburn Rovers
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
       6 | 1997-1998   |       20 | Arsenal
       7 | 1998-1999   |       20 | Manchester United
       8 | 1999-2000   |       20 | Manchester United
       9 | 2000-2001   |       20 | Manchester United
      10 | 2001-2002   |       20 | Arsenal
      30 | 2020-2021   |       20 | [NULL]
(12 rows)

Assume that the source file contains 7 records as follows:

0,,,
3,1994-1995,22,Blackburn Rovers
6,1997-1998,20,Arsenal
10,2001-2002,20,Arsenal
11,,,
12,,,
30,2020-2021,20,

Using the seasonid key field, a --write-op delete load would delete 5 of these records (those with seasonid 0, 3, 6, 10, and 30). The resulting table would have 7 rows:

premdb=# select * from five_seasons order by 1;
 seasonid | season_name | numteams |      winners      
----------+-------------+----------+-------------------
       1 | 1992-1993   |       22 | Manchester United
       2 | 1993-1994   |       22 | Manchester United
       4 | 1995-1996   |       20 | Manchester United
       5 | 1996-1997   |       20 | Manchester United
       7 | 1998-1999   |       20 | Manchester United
       8 | 1999-2000   |       20 | Manchester United
       9 | 2000-2001   |       20 | Manchester United
(7 rows)

UPSERT Example with ybload Output and System View Queries

This example shows the ybload command and output for a very small upsert load. Then it shows how to query sys.log_load and sys.log_query to get more information about the load operation.

A table called minimatch contains 15 rows:

premdb=# select * from minimatch order by seasonid,htid,atid;
 seasonid |    matchday       | htid | atid | ftscore | htscore 
----------+---------------------+------+------+---------+---------
      22 | 2014-03-29 15:00:00 |    2 |   74 | 1-1     | 0-1
      22 | 2014-02-12 00:00:00 |    2 |   75 | 0-0     | 0-0
      22 | 2014-04-28 15:00:00 |    2 |   77 | 3-0     | 2-0
      22 | 2013-10-19 00:00:00 |    2 |   78 | 4-1     | 1-0
      22 | 2013-11-23 15:00:00 |    2 |   86 | 2-0     | 1-0
      22 | 2013-09-22 00:00:00 |    2 |   87 | 3-1     | 2-1
      22 | 2014-02-22 15:00:00 |    2 |   88 | 4-1     | 3-0
      22 | 2014-03-25 00:00:00 |    2 |   89 | 2-2     | 0-1
      22 | 2013-09-01 15:00:00 |    2 |   91 | 1-0     | 1-0
      22 | 2014-05-04 00:00:00 |    2 |   93 | 1-0     | 1-0
      25 | 2020-10-24 13:00:00 |    3 |   75 | 1-1     | 1-0
      25 | 2020-10-24 12:30:00 |    4 |   76 | 6-2     | 2-2
      25 | 2020-10-24 15:00:00 |    5 |   87 | 1-4     | 1-2
      25 | 2020-10-25 11:00:00 |    6 |   89 | 0-3     | 0-1
      25 | 2020-10-25 13:00:00 |    7 |   91 | 3-0     | 1-0
(15 rows)

The source file for an upsert load looks like this:

$ more minimatch_upsert.csv
22,2014-02-12 15:00:00,2,75,0-0,0-0
22,2013-10-19 15:00:00,2,78,4-1,1-0
22,2013-09-22 15:00:00,2,87,3-1,2-1
22,2014-03-25 15:00:00,2,89,2-2,0-1
22,2014-05-04 15:00:00,2,93,1-0,1-0
22,2014-03-29 15:00:00,2,74,1-1,0-1
22,2014-04-28 15:00:00,2,77,3-0,2-0
22,2013-11-23 15:00:00,2,86,2-0,1-0
22,2014-02-22 15:00:00,2,88,4-1,3-0
22,2013-09-01 15:00:00,2,91,1-0,1-0
25,2020-10-24 13:00:00,3,75,1-1,1-0
25,2020-10-24 12:30:00,4,76,6-2,2-2
25,2020-10-24 15:00:00,5,87,1-4,1-2
25,2020-10-25 11:00:00,6,89,0-3,0-1
25,2020-10-25 13:00:00,7,91,3-0,1-0
30,2020-10-26 13:00:00,8,88,4-2,2-2

The file contains 16 rows, including a new row with seasonid 30. 10 rows in the file match the rows that are already in the table. 5 other rows match, except for a change in the timestamp for the matchday column.

An upsert load reads the 16 rows from the source file, using three key fields:

$ ybload -d premdb --username bobr -t minimatch -W --format csv --write-op upsert --key-field-names seasonid,htid,atid /home/brumsby/minimatch_upsert.csv
...
20:34:57.159 [ INFO] Flushing last 16 rows (of 16 total) in transaction #1 for minimatch
READ:576.0 B(333.6 B/s). ROWS G/B: 16/0( 9.27 /s). WRITE:544.0 B(315.1 B/s).  TIME E/R:   0:00:01/ --:--:--20:34:57.174 [ INFO] Committing 16 rows into transaction #1 for minimatch ...
20:34:57.514 [ INFO] Committed transaction #1 after a total of 544 bytes and 16 good rows for minimatch
20:34:57.531 [ INFO] READ:576.0 B(278.5 B/s). ROWS G/B: 16/0( 7.74 /s). WRITE:544.0 B(263.0 B/s).  TIME E/R:   0:00:02/ --:--:--
20:34:57.533 [ INFO] SUCCESSFUL BULK LOAD: Read 16 good rows from the source file(s) in   0:00:02 (READ: 278.5 B/s WRITE: 263.0 B/s). Number of rows affected is tracked in sys.log_query.

To find out more about the results of this load, query sys.log_load:

premdb=# select session_key, table_name, username, end_time, transaction_first, inserted_rows 
from sys.log_load where username='bobr' and table_name='minimatch' and error_string is null 
order by end_time desc;
                          session_key                            | table_name | username |          end_time          | transaction_first | inserted_rows 
------------------------------------------------------------------+------------+----------+----------------------------+-------------------+---------------
 BrJx6HcnNQkNOGxorNSPD76G-br5LQwqtgcM5HPDCclCej4OOLAaqwzcd9wF1dnK | minimatch  | bobr     | 2020-10-29 20:34:57.552-07 |             92925 |            16
 ...

Given the transaction ID for the load in question, you can query the sys.log_query view for more details and statistics. (Alternatively, do this in one step by joining the sys.log_load view to the sys.log_query view on transaction_first=transaction_id.)

premdb=# select substr(query_text,1,25), rows_inserted 
from sys.log_query where transaction_id=92925 order by done_time asc;
         substr           | rows_inserted 
---------------------------+---------------
 YBULKLOAD INTO "UPSERT_1_ |            16
 UPDATE "premdb"."public". |            15
 INSERT INTO "premdb"."pub |             1
 ANALYZE HLL public.minima |             0
 DROP TABLE ID pg_temp_10. |             0
(5 rows)

Note that an upsert load is processed in five separate steps:

  • YBULKLOAD into a temporary table.
  • UPDATE the target table.
  • INSERT INTO the target table.
  • ANALYZE the target table.
  • DROP the temporary table.

Based on the results of this query, you know that 16 rows were read from the source file, 15 rows were updated, and 1 completely new row was inserted.

Note that 15 rows are updated because 15 rows matched on the key fields: seasonid,htid,atid. Only 5 of these rows were actually modified (their matchday values were changed), but all of them were written to the database. In other words,10 existing rows were written to the database with the exact same information that was stored before, and 5 existing rows were written with a change to one column.

The updated table now contains the following 16 rows:

premdb=# select * from minimatch order by seasonid,htid,atid;
 seasonid |      matchday       | htid | atid | ftscore | htscore 
----------+---------------------+------+------+---------+---------
      22 | 2014-03-29 15:00:00 |    2 |   74 | 1-1     | 0-1
      22 | 2014-02-12 15:00:00 |    2 |   75 | 0-0     | 0-0
      22 | 2014-04-28 15:00:00 |    2 |   77 | 3-0     | 2-0
      22 | 2013-10-19 15:00:00 |    2 |   78 | 4-1     | 1-0
      22 | 2013-11-23 15:00:00 |    2 |   86 | 2-0     | 1-0
      22 | 2013-09-22 15:00:00 |    2 |   87 | 3-1     | 2-1
      22 | 2014-02-22 15:00:00 |    2 |   88 | 4-1     | 3-0
      22 | 2014-03-25 15:00:00 |    2 |   89 | 2-2     | 0-1
      22 | 2013-09-01 15:00:00 |    2 |   91 | 1-0     | 1-0
      22 | 2014-05-04 15:00:00 |    2 |   93 | 1-0     | 1-0
      25 | 2020-10-24 13:00:00 |    3 |   75 | 1-1     | 1-0
      25 | 2020-10-24 12:30:00 |    4 |   76 | 6-2     | 2-2
      25 | 2020-10-24 15:00:00 |    5 |   87 | 1-4     | 1-2
      25 | 2020-10-25 11:00:00 |    6 |   89 | 0-3     | 0-1
      25 | 2020-10-25 13:00:00 |    7 |   91 | 3-0     | 1-0
      30 | 2020-10-26 13:00:00 |    8 |   88 | 4-2     | 2-2
(16 rows)

Parent topic:Bulk Load Examples