问题描述
免责声明:理论问题.
这里问了几个关于如何区分 PostgreSQL upsert
语句中插入和更新的行的问题.
Several questions here was asked about how to differentiate inserted and updated rows in the PostgreSQL upsert
statement.
这是一个简单的例子:
create table t(i int primary key, x int);
insert into t values(1,1);
insert into t values(1,11),(2,22)
on conflict(i) do update set x = excluded.i*11
returning *, xmin, xmax;
╔═══╤════╤══════╤══════╗
║ i │ x │ xmin │ xmax ║
╠═══╪════╪══════╪══════╣
║ 1 │ 11 │ 7696 │ 7696 ║
║ 2 │ 22 │ 7696 │ 0 ║
╚═══╧════╧══════╧══════╝
所以,xmax
>0 (或 xmax
= xmin
) - 行被更新;xmax
= 0 - 已插入行.
So, xmax
> 0 (or xmax
= xmin
) - row was updated; xmax
= 0 - row was inserted.
IMO 对xmin
和xmax
列的含义的解释不太清楚此处.
IMO It is not too clear explained the meanings of the xmin
and xmax
columns here.
是否可以将逻辑基于这些列?关于系统列有没有更有意义的解释(源代码除外)?
Is it possible to base the logic on those columns? Is there any more significant explanation about system columns (except the source code)?
最后,我对更新/插入的行的猜测是否正确?
And finally is my guess right about updated/inserted rows?
推荐答案
我认为这是一个值得深入回答的有趣问题;如果有点长,请耐心等待.
I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.
简而言之:您的猜测是对的,您可以使用以下 RETURNING
子句来确定该行是否已插入且未更新:
In short: Your guess is right, and you can use the following RETURNING
clause to determine if the row was inserted and not updated:
RETURNING (xmax = 0) AS inserted
现在详细解释:
当一行被更新时,PostgreSQL 不会修改数据,而是创建该行的新版本;当不再需要旧版本时,autovacuum 将删除它.行的一个版本称为元组,因此在 PostgreSQL 中每行可以有多个元组.
When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuples per row.
xmax
有两个不同的用途:
如文档中所述,它可以是删除(或更新)元组的事务的事务 ID(元组"是行"的另一个词).只有事务 ID 介于
xmin
和xmax
之间的事务才能看到元组.如果没有事务 ID 小于xmax
的事务,则可以安全地删除旧元组.
As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple ("tuple" is another word for "row"). Only transactions with a transaction ID between
xmin
andxmax
can see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less thanxmax
.
xmax
也用于存储行锁.在PostgreSQL中,行锁不是存放在锁表中,而是存放在元组中,避免锁表溢出.
如果只有一个事务在该行上锁定,xmax
将包含锁定事务的事务 ID.如果有多个事务锁定该行,xmax
包含所谓的 multixact 的编号,这是一种数据结构,依次包含锁定事务.
xmax
is also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.
If only one transaction has a lock on the row, xmax
will contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row, xmax
contains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.
xmax
的文档并不完整,因为这个字段的确切含义被认为是一个实现细节,如果不知道元组的 t_infomask
就无法理解,即不能通过 SQL 立即看到.
The documentation of xmax
is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask
of the tuple, which is not immediately visible via SQL.
您可以安装 contrib 模块 pageinspect
来查看元组的此字段和其他字段.
You can install the contrib module pageinspect
to view this and other fields of a tuple.
我运行了您的示例,这就是我使用 heap_page_items
函数检查详细信息时看到的内容(交易 ID 号在我的情况下当然不同):
I ran your example, and this is what I see when I use the heap_page_items
function to examine details (the transaction ID numbers are of course different in my case):
SELECT *, ctid, xmin, xmax FROM t;
┌───┬────┬───────┬────────┬────────┐
│ i │ x │ ctid │ xmin │ xmax │
├───┼────┼───────┼────────┼────────┤
│ 1 │ 11 │ (0,2) │ 102508 │ 102508 │
│ 2 │ 22 │ (0,3) │ 102508 │ 0 │
└───┴────┴───────┴────────┴────────┘
(2 rows)
SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2
FROM heap_page_items(get_raw_page('laurenz.t', 0));
┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐
│ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │
├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤
│ 1 │ 8160 │ 102507 │ 102508 │ (0,2) │ 500 │ 4002 │
│ 2 │ 8128 │ 102508 │ 102508 │ (0,2) │ 2190 │ 8002 │
│ 3 │ 8096 │ 102508 │ 0 │ (0,3) │ 900 │ 2 │
└────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘
(3 rows)
t_infomask
和t_infomask2
的含义可以在src/include/access/htup_details.h
中找到.lp_off
是元组数据在页面中的偏移量,t_ctid
是当前元组ID,由页码和元组号组成页面内.由于表是新创建的,所有数据都在第0页.
The meanings of t_infomask
and t_infomask2
can be found in src/include/access/htup_details.h
. lp_off
is the offset of the tuple data in the page, and t_ctid
is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.
让我讨论 heap_page_items
返回的三行.
Let me discuss the three rows returned by heap_page_items
.
在行指针 (
lp
) 1 我们找到旧的、更新的元组.它最初具有ctid = (0,1)
,但在更新期间被修改为包含当前版本的元组 ID.元组由事务 102507 创建并由事务 102508(发出INSERT ... ON CONFLICT
的事务)无效.这个元组不再可见,将在VACUUM
期间被删除.
At line pointer (
lp
) 1 we find the old, updated tuple. It originally hadctid = (0,1)
, but that got modified to contain the tuple ID of the current version during update. The Tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued theINSERT ... ON CONFLICT
). This tuple is not visible any more an will get removed duringVACUUM
.
t_infomask
显示 xmin
和 xmax
都属于已提交的事务,因此会显示元组的创建和删除时间.t_infomask2
显示元组被更新为 HOT(heap only tuple)更新,这意味着更新的元组与原始元组在同一页并且没有索引列已修改(请参阅 src/backend/access/heap/README.HOT
).
t_infomask
shows that both xmin
and xmax
belong to committed transactions and consequently show when the tuples was created and deleted. t_infomask2
shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (see src/backend/access/heap/README.HOT
).
在第 2 行指针处,我们看到由事务 INSERT ... ON CONFLICT
(事务 102508)创建的新的更新元组.
At line pointer 2 we see the new, updated tuple that was created by transaction the INSERT ... ON CONFLICT
(transaction 102508).
t_infomask
显示这个元组是更新的结果,xmin
是有效的,并且 xmax
包含一个 KEY SHARE
行锁(自从事务完成后不再相关).这个行锁是在 INSERT ... ON CONFLICT
处理过程中取得的.t_infomask2
表明这是一个 HOT 元组.
t_infomask
shows that this tuple is the result of an update, xmin
is valid, and xmax
contains a KEY SHARE
row lock (which is no longer relevant since the transaction has completed). This row lock was taken during INSERT ... ON CONFLICT
processing. t_infomask2
shows that this is a HOT tuple.
在第 3 行指针处,我们看到新插入的行.
At line pointer 3 we see the newly inserted row.
t_infomask
显示 xmin
有效,xmax
无效.xmax
设置为 0,因为此值始终用于新插入的元组.
t_infomask
shows that xmin
is valid and xmax
is invalid. xmax
is set to 0 because this value is always used for newly inserted tuples.
所以更新行的非零 xmax
是由行锁引起的实现工件.可以想象,INSERT ... ON CONFLICT
有一天会被重新实现,因此这种行为会发生变化,但我认为这不太可能.
So the nonzero xmax
of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT
is reimplemented one day so that this behaviour changes, but I think that is unlikely.
这篇关于PostgreSQL Upsert 使用系统列 XMIN、XMAX 等区分插入和更新的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!