本文介绍了PostgreSQL Upsert使用系统列XMIN,XMAX和其他来区分插入和更新的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

免责声明:理论问题。

这里有人问了几个有关如何区分PostgreSQL upsert 语句。

Several questions here was asked about how to differentiate inserted and updated rows in the PostgreSQL upsert statement.

这是一个简单的示例:

nd@postgres=# create table t(i int primary key, x int);
nd@postgres=# insert into t values(1,1);
nd@postgres=# insert into t values(1,11),(2,22)
  on conflict(i) do update set x = excluded.i*11
  returning *, xmin, xmax;
╔═══╤════╤══════╤══════╗
║ i │ x  │ xmin │ xmax ║
╠═══╪════╪══════╪══════╣
║ 1 │ 11 │ 7696 │ 7696 ║
║ 2 │ 22 │ 7696 │    0 ║
╚═══╧════╧══════╧══════╝

因此, xmax > 0(或 xmax = xmin )-行已更新; xmax = 0-已插入行。

So, xmax > 0 (or xmax = xmin) - row was updated; xmax = 0 - row was inserted.

IMO不太清楚地解释了<$ c的含义$ c> xmin xmax 列。

IMO It is not too clear explained the meanings of the xmin and xmax columns here.

是否可以将逻辑基于这些列?

Is it possible to base the logic on those columns? Is there any more significant explanation about system columns (except the source code)?

最后我对更新/插入的行是否正确呢?

And finally is my guess right about updated/inserted rows?

推荐答案

我认为这是一个有趣的问题,值得深入解答。

I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.

简而言之:您的猜测是正确的,您可以使用以下 RETURNING 子句,以确定是否插入了行且未更新该行:

In short: Your guess is right, and you can use the following RETURNING clause to determine if the row was inserted and not updated:

RETURNING (xmax = 0) AS inserted

现在详细说明:

当行已更新,PostgreSQL不会修改数据,而是创建该行的新版本;当不再需要该旧版本时,它会被 autovacuum 删除。行的一个版本称为 tuple ,因此在PostgreSQL中每行可以有多个元组。

When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuples per row.

xmax 有两个不同的用途:


  1. 如文档中所述,可以是事务删除(或更新)元组( tuple是 row的另一个词)的交易的ID。只有交易ID在 xmin xmax 之间的交易才能看到该元组。如果没有事务ID小于 xmax 的事务,则可以安全删除旧元组。

  1. As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple (“tuple” is another word for “row”). Only transactions with a transaction ID between xmin and xmax can see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less than xmax.

xmax 也用于存储行锁。在PostgreSQL中,行锁不存储在锁表中,而是存储在元组中,以避免锁表溢出。

如果只有一个事务在行上具有锁,则 xmax 将包含锁定交易的交易ID。如果有多个事务对该行进行了锁定,则 xmax 包含所谓的 multixact 的编号,该数据结构是包含锁定交易的交易ID。

xmax is also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.
If only one transaction has a lock on the row, xmax will contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row, xmax contains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.

xmax 不完整,因为此字段的确切含义被认为是实现细节,并且在不知道元组的 t_infomask 时无法理解,该值无法通过立刻看到SQL。

The documentation of xmax is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask of the tuple, which is not immediately visible via SQL.

您可以安装contrib模块 pageinspect 来查看元组的此字段和其他字段。

You can install the contrib module pageinspect to view this and other fields of a tuple.

我运行了您的示例,这就是我使用 heap_page_items 函数检查详细信息时看到的内容(交易ID号

I ran your example, and this is what I see when I use the heap_page_items function to examine details (the transaction ID numbers are of course different in my case):

SELECT *, ctid, xmin, xmax FROM t;

┌───┬────┬───────┬────────┬────────┐
│ i │ x  │ ctid  │  xmin  │  xmax  │
├───┼────┼───────┼────────┼────────┤
│ 1 │ 11 │ (0,2) │ 102508 │ 102508 │
│ 2 │ 22 │ (0,3) │ 102508 │      0 │
└───┴────┴───────┴────────┴────────┘
(2 rows)

SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,
       to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2
FROM heap_page_items(get_raw_page('laurenz.t', 0));

┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐
│ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │
├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤
│  1 │   8160 │ 102507 │ 102508 │ (0,2)  │ 500        │ 4002        │
│  2 │   8128 │ 102508 │ 102508 │ (0,2)  │ 2190       │ 8002        │
│  3 │   8096 │ 102508 │      0 │ (0,3)  │ 900        │ 2           │
└────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘
(3 rows)

含义 t_infomask t_infomask2 的内容可以在 src / include / access / htup_details.h 。 lp_off 是页面中元组数据的偏移量,而 t_ctid 当前元组ID ,由页码和页内的元组编号组成。由于该表是新创建的,因此所有数据都在页面0中。

The meanings of t_infomask and t_infomask2 can be found in src/include/access/htup_details.h. lp_off is the offset of the tuple data in the page, and t_ctid is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.

让我讨论 heap_page_items 。


  1. 线指针处 lp )1我们找到旧的,更新的元组。它最初具有 ctid =(0,1),但是在更新过程中已对其进行了修改,以包含当前版本的元组ID。元组由事务102507创建,并由事务102508(发出 INSERT ... ON CONFLICT 的事务)使之无效。该元组不再可见,在 VACUUM 期间将被删除。

  1. At line pointer (lp) 1 we find the old, updated tuple. It originally had ctid = (0,1), but that got modified to contain the tuple ID of the current version during update. The Tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued the INSERT ... ON CONFLICT). This tuple is not visible any more an will get removed during VACUUM.

t_infomask 显示 xmin xmax 都属于已提交的事务,因此显示创建元组的时间并删除。 t_infomask2 显示该元组已使用HOT(仅堆元组)更新进行了更新,这意味着更新后的元组与原始元组且未修改任何索引列(请参见 src / backend / access / heap / README.HOT )。

t_infomask shows that both xmin and xmax belong to committed transactions and consequently show when the tuples was created and deleted. t_infomask2 shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (see src/backend/access/heap/README.HOT).

在第2行指针处,我们看到由事务 INSERT ... ON CONFLICT (事务102508)创建的新的,更新的元组。

At line pointer 2 we see the new, updated tuple that was created by transaction the INSERT ... ON CONFLICT (transaction 102508).

t_infomask 显示此元组是更新的结果, xmin 是有效,并且 xmax 包含 KEY SHARE 行锁(由于交易完成,该锁不再适用)。该行锁定是在 INSERT ... ON CONFLICT 处理期间获取的。 t_infomask2 表明这是一个HOT元组。

t_infomask shows that this tuple is the result of an update, xmin is valid, and xmax contains a KEY SHARE row lock (which is no longer relevant since the transaction has completed). This row lock was taken during INSERT ... ON CONFLICT processing. t_infomask2 shows that this is a HOT tuple.

在第3行指针处,我们看到了新插入的行。

At line pointer 3 we see the newly inserted row.

t_infomask 显示 xmin 是有效的,而 xmax 无效。 xmax 设置为0,因为此值始终用于新插入的元组。

t_infomask shows that xmin is valid and xmax is invalid. xmax is set to 0 because this value is always used for newly inserted tuples.

因此,更新行的非零 xmax 是由行锁引起的实现工件。可以想象有一天会重新实现 INSERT ... ON CONFLICT ,以便改变这种行为,但我认为这不太可能。

So the nonzero xmax of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT is reimplemented one day so that this behaviour changes, but I think that is unlikely.

这篇关于PostgreSQL Upsert使用系统列XMIN,XMAX和其他来区分插入和更新的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-05 02:41