问题描述
免责声明:理论问题。
这里有人问了几个有关如何区分PostgreSQL upsert $ c中插入和更新的行的问题。 $ c>语句。
Several questions here was asked about how to differentiate inserted and updated rows in the PostgreSQL upsert
statement.
这是一个简单的示例:
nd@postgres=# create table t(i int primary key, x int);
nd@postgres=# insert into t values(1,1);
nd@postgres=# insert into t values(1,11),(2,22)
on conflict(i) do update set x = excluded.i*11
returning *, xmin, xmax;
╔═══╤════╤══════╤══════╗
║ i │ x │ xmin │ xmax ║
╠═══╪════╪══════╪══════╣
║ 1 │ 11 │ 7696 │ 7696 ║
║ 2 │ 22 │ 7696 │ 0 ║
╚═══╧════╧══════╧══════╝
因此, xmax
> 0(或 xmax
= xmin
)-行已更新; xmax
= 0-已插入行。
So, xmax
> 0 (or xmax
= xmin
) - row was updated; xmax
= 0 - row was inserted.
IMO不太清楚地解释了<$ c的含义$ c> xmin 和 xmax
列。
IMO It is not too clear explained the meanings of the xmin
and xmax
columns here.
是否可以将逻辑基于这些列?
Is it possible to base the logic on those columns? Is there any more significant explanation about system columns (except the source code)?
最后我对更新/插入的行是否正确呢?
And finally is my guess right about updated/inserted rows?
推荐答案
我认为这是一个有趣的问题,值得深入解答。
I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.
简而言之:您的猜测是正确的,您可以使用以下 RETURNING
子句,以确定是否插入了行且未更新该行:
In short: Your guess is right, and you can use the following RETURNING
clause to determine if the row was inserted and not updated:
RETURNING (xmax = 0) AS inserted
现在详细说明:
当行已更新,PostgreSQL不会修改数据,而是创建该行的新版本;当不再需要该旧版本时,它会被 autovacuum 删除。行的一个版本称为 tuple ,因此在PostgreSQL中每行可以有多个元组。
When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuples per row.
xmax
有两个不同的用途:
-
如文档中所述,可以是事务删除(或更新)元组( tuple是 row的另一个词)的交易的ID。只有交易ID在
xmin
和xmax
之间的交易才能看到该元组。如果没有事务ID小于xmax
的事务,则可以安全删除旧元组。
As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple (“tuple” is another word for “row”). Only transactions with a transaction ID between
xmin
andxmax
can see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less thanxmax
.
xmax
也用于存储行锁。在PostgreSQL中,行锁不存储在锁表中,而是存储在元组中,以避免锁表溢出。
如果只有一个事务在行上具有锁,则 xmax
将包含锁定交易的交易ID。如果有多个事务对该行进行了锁定,则 xmax
包含所谓的 multixact 的编号,该数据结构是包含锁定交易的交易ID。
xmax
is also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.
If only one transaction has a lock on the row, xmax
will contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row, xmax
contains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.
xmax $的文档c $ c>不完整,因为此字段的确切含义被认为是实现细节,并且在不知道元组的
t_infomask
时无法理解,该值无法通过立刻看到SQL。
The documentation of xmax
is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask
of the tuple, which is not immediately visible via SQL.
您可以安装contrib模块 pageinspect
来查看元组的此字段和其他字段。
You can install the contrib module pageinspect
to view this and other fields of a tuple.
我运行了您的示例,这就是我使用 heap_page_items
函数检查详细信息时看到的内容(交易ID号
I ran your example, and this is what I see when I use the heap_page_items
function to examine details (the transaction ID numbers are of course different in my case):
SELECT *, ctid, xmin, xmax FROM t;
┌───┬────┬───────┬────────┬────────┐
│ i │ x │ ctid │ xmin │ xmax │
├───┼────┼───────┼────────┼────────┤
│ 1 │ 11 │ (0,2) │ 102508 │ 102508 │
│ 2 │ 22 │ (0,3) │ 102508 │ 0 │
└───┴────┴───────┴────────┴────────┘
(2 rows)
SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2
FROM heap_page_items(get_raw_page('laurenz.t', 0));
┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐
│ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │
├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤
│ 1 │ 8160 │ 102507 │ 102508 │ (0,2) │ 500 │ 4002 │
│ 2 │ 8128 │ 102508 │ 102508 │ (0,2) │ 2190 │ 8002 │
│ 3 │ 8096 │ 102508 │ 0 │ (0,3) │ 900 │ 2 │
└────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘
(3 rows)
含义 t_infomask
和 t_infomask2
的内容可以在 src / include / access / htup_details.h 。
lp_off
是页面中元组数据的偏移量,而 t_ctid
是当前元组ID ,由页码和页内的元组编号组成。由于该表是新创建的,因此所有数据都在页面0中。
The meanings of t_infomask
and t_infomask2
can be found in src/include/access/htup_details.h
. lp_off
is the offset of the tuple data in the page, and t_ctid
is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.
让我讨论 heap_page_items
。
-
在线指针处(
lp
)1我们找到旧的,更新的元组。它最初具有ctid =(0,1)
,但是在更新过程中已对其进行了修改,以包含当前版本的元组ID。元组由事务102507创建,并由事务102508(发出INSERT ... ON CONFLICT
的事务)使之无效。该元组不再可见,在VACUUM
期间将被删除。
At line pointer (
lp
) 1 we find the old, updated tuple. It originally hadctid = (0,1)
, but that got modified to contain the tuple ID of the current version during update. The Tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued theINSERT ... ON CONFLICT
). This tuple is not visible any more an will get removed duringVACUUM
.
t_infomask
显示 xmin
和 xmax
都属于已提交的事务,因此显示创建元组的时间并删除。 t_infomask2
显示该元组已使用HOT(仅堆元组)更新进行了更新,这意味着更新后的元组与原始元组且未修改任何索引列(请参见 src / backend / access / heap / README.HOT
)。
t_infomask
shows that both xmin
and xmax
belong to committed transactions and consequently show when the tuples was created and deleted. t_infomask2
shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (see src/backend/access/heap/README.HOT
).
在第2行指针处,我们看到由事务 INSERT ... ON CONFLICT
(事务102508)创建的新的,更新的元组。
At line pointer 2 we see the new, updated tuple that was created by transaction the INSERT ... ON CONFLICT
(transaction 102508).
t_infomask
显示此元组是更新的结果, xmin
是有效,并且 xmax
包含 KEY SHARE
行锁(由于交易完成,该锁不再适用)。该行锁定是在 INSERT ... ON CONFLICT
处理期间获取的。 t_infomask2
表明这是一个HOT元组。
t_infomask
shows that this tuple is the result of an update, xmin
is valid, and xmax
contains a KEY SHARE
row lock (which is no longer relevant since the transaction has completed). This row lock was taken during INSERT ... ON CONFLICT
processing. t_infomask2
shows that this is a HOT tuple.
在第3行指针处,我们看到了新插入的行。
At line pointer 3 we see the newly inserted row.
t_infomask
显示 xmin
是有效的,而 xmax
无效。 xmax
设置为0,因为此值始终用于新插入的元组。
t_infomask
shows that xmin
is valid and xmax
is invalid. xmax
is set to 0 because this value is always used for newly inserted tuples.
因此,更新行的非零 xmax
是由行锁引起的实现工件。可以想象有一天会重新实现 INSERT ... ON CONFLICT
,以便改变这种行为,但我认为这不太可能。
So the nonzero xmax
of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT
is reimplemented one day so that this behaviour changes, but I think that is unlikely.
这篇关于PostgreSQL Upsert使用系统列XMIN,XMAX和其他来区分插入和更新的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!