本文介绍了PostgreSQL Upsert 使用系统列 XMIN、XMAX 等区分插入和更新的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

免责声明:理论问题.

这里问了几个关于如何区分 PostgreSQL upsert 语句中插入和更新的行的问题.

Several questions here was asked about how to differentiate inserted and updated rows in the PostgreSQL upsert statement.

这是一个简单的例子:

create table t(i int primary key, x int);
insert into t values(1,1);
insert into t values(1,11),(2,22)
    on conflict(i) do update set x = excluded.i*11
    returning *, xmin, xmax;

╔═══╤════╤══════╤══════╗
║ i │ x  │ xmin │ xmax ║
╠═══╪════╪══════╪══════╣
║ 1 │ 11 │ 7696 │ 7696 ║
║ 2 │ 22 │ 7696 │    0 ║
╚═══╧════╧══════╧══════╝

所以,xmax >0 (或 xmax = xmin) - 行被更新;xmax = 0 - 已插入行.

So, xmax > 0 (or xmax = xmin) - row was updated; xmax = 0 - row was inserted.

IMO 对xminxmax 列的含义的解释不太清楚此处.

IMO It is not too clear explained the meanings of the xmin and xmax columns here.

是否可以将逻辑基于这些列?关于系统列有没有更有意义的解释(源代码除外)?

Is it possible to base the logic on those columns? Is there any more significant explanation about system columns (except the source code)?

最后,我对更新/插入的行的猜测是否正确?

And finally is my guess right about updated/inserted rows?

推荐答案

我认为这是一个值得深入回答的有趣问题;如果有点长,请耐心等待.

I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.

简而言之:您的猜测是对的,您可以使用以下 RETURNING 子句来确定该行是否已插入且未更新:

In short: Your guess is right, and you can use the following RETURNING clause to determine if the row was inserted and not updated:

RETURNING (xmax = 0) AS inserted

现在详细解释:

当一行被更新时,PostgreSQL 不会修改数据,而是创建该行的新版本;当不再需要旧版本时,autovacuum 将删除它.行的一个版本称为元组,因此在 PostgreSQL 中每行可以有多个元组.

When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuples per row.

xmax 有两个不同的用途:

  1. 如文档中所述,它可以是删除(或更新)元组的事务的事务 ID(元组"是行"的另一个词).只有事务 ID 介于 xminxmax 之间的事务才能看到元组.如果没有事务 ID 小于 xmax 的事务,则可以安全地删除旧元组.

  1. As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple ("tuple" is another word for "row"). Only transactions with a transaction ID between xmin and xmax can see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less than xmax.

xmax 也用于存储行锁.在PostgreSQL中,行锁不是存放在锁表中,而是存放在元组中,避免锁表溢出.
如果只有一个事务在该行上锁定,xmax 将包含锁定事务的事务 ID.如果有多个事务锁定该行,xmax 包含所谓的 multixact 的编号,这是一种数据结构,依次包含锁定事务.

xmax is also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.
If only one transaction has a lock on the row, xmax will contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row, xmax contains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.

xmax 的文档并不完整,因为这个字段的确切含义被认为是一个实现细节,如果不知道元组的 t_infomask 就无法理解,即不能通过 SQL 立即看到.

The documentation of xmax is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask of the tuple, which is not immediately visible via SQL.

您可以安装 contrib 模块 pageinspect 来查看元组的此字段和其他字段.

You can install the contrib module pageinspect to view this and other fields of a tuple.

我运行了您的示例,这就是我使用 heap_page_items 函数检查详细信息时看到的内容(交易 ID 号在我的情况下当然不同):

I ran your example, and this is what I see when I use the heap_page_items function to examine details (the transaction ID numbers are of course different in my case):

SELECT *, ctid, xmin, xmax FROM t;

┌───┬────┬───────┬────────┬────────┐
│ i │ x  │ ctid  │  xmin  │  xmax  │
├───┼────┼───────┼────────┼────────┤
│ 1 │ 11 │ (0,2) │ 102508 │ 102508 │
│ 2 │ 22 │ (0,3) │ 102508 │      0 │
└───┴────┴───────┴────────┴────────┘
(2 rows)

SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,
       to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2
FROM heap_page_items(get_raw_page('laurenz.t', 0));

┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐
│ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │
├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤
│  1 │   8160 │ 102507 │ 102508 │ (0,2)  │ 500        │ 4002        │
│  2 │   8128 │ 102508 │ 102508 │ (0,2)  │ 2190       │ 8002        │
│  3 │   8096 │ 102508 │      0 │ (0,3)  │ 900        │ 2           │
└────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘
(3 rows)

t_infomaskt_infomask2的含义可以在src/include/access/htup_details.h中找到.lp_off是元组数据在页面中的偏移量,t_ctid当前元组ID,由页码和元组号组成页面内.由于表是新创建的,所有数据都在第0页.

The meanings of t_infomask and t_infomask2 can be found in src/include/access/htup_details.h. lp_off is the offset of the tuple data in the page, and t_ctid is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.

让我讨论 heap_page_items 返回的三行.

Let me discuss the three rows returned by heap_page_items.

  1. 行指针 (lp) 1 我们找到旧的、更新的元组.它最初具有 ctid = (0,1),但在更新期间被修改为包含当前版本的元组 ID.元组由事务 102507 创建并由事务 102508(发出 INSERT ... ON CONFLICT 的事务)无效.这个元组不再可见,将在 VACUUM 期间被删除.

  1. At line pointer (lp) 1 we find the old, updated tuple. It originally had ctid = (0,1), but that got modified to contain the tuple ID of the current version during update. The Tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued the INSERT ... ON CONFLICT). This tuple is not visible any more an will get removed during VACUUM.

t_infomask 显示 xminxmax 都属于已提交的事务,因此会显示元组的创建和删除时间.t_infomask2 显示元组被更新为 HOT(heap only tuple)更新,这意味着更新的元组与原始元组在同一页并且没有索引列已修改(请参阅 src/backend/access/heap/README.HOT).

t_infomask shows that both xmin and xmax belong to committed transactions and consequently show when the tuples was created and deleted. t_infomask2 shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (see src/backend/access/heap/README.HOT).

在第 2 行指针处,我们看到由事务 INSERT ... ON CONFLICT(事务 102508)创建的新的更新元组.

At line pointer 2 we see the new, updated tuple that was created by transaction the INSERT ... ON CONFLICT (transaction 102508).

t_infomask 显示这个元组是更新的结果,xmin 是有效的,并且 xmax 包含一个 KEY SHARE 行锁(自从事务完成后不再相关).这个行锁是在 INSERT ... ON CONFLICT 处理过程中取得的.t_infomask2 表明这是一个 HOT 元组.

t_infomask shows that this tuple is the result of an update, xmin is valid, and xmax contains a KEY SHARE row lock (which is no longer relevant since the transaction has completed). This row lock was taken during INSERT ... ON CONFLICT processing. t_infomask2 shows that this is a HOT tuple.

在第 3 行指针处,我们看到新插入的行.

At line pointer 3 we see the newly inserted row.

t_infomask 显示 xmin 有效,xmax 无效.xmax 设置为 0,因为此值始终用于新插入的元组.

t_infomask shows that xmin is valid and xmax is invalid. xmax is set to 0 because this value is always used for newly inserted tuples.

所以更新行的非零 xmax 是由行锁引起的实现工件.可以想象,INSERT ... ON CONFLICT 有一天会被重新实现,因此这种行为会发生变化,但我认为这不太可能.

So the nonzero xmax of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT is reimplemented one day so that this behaviour changes, but I think that is unlikely.

这篇关于PostgreSQL Upsert 使用系统列 XMIN、XMAX 等区分插入和更新的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-05 02:53