Is there a way to track or get the total number of batch iterations JPA completed before failing due to a BatchUpdateException?

Problem description

I have a requirement to persist N entities using Spring-JPA (Hibernate), and I have set my Spring batch size = M, where M < N.

I submit all N entities to the repository, following the logic below:

entities.forEach(entity -> entityManager.persist(entity));
entityManager.flush();

The entire operation is wrapped in @Transactional.

Based on https://vladmihalcea.com/how-to-find-which-statement-failed-in-a-jdbc-batch-update, I am getting better results, but the challenge is that BatchUpdateException.getUpdateCounts() gives the count persisted within the failing batch operation only, not the overall count including all internal iterations completed before the failure.

For example, suppose I need to persist 100 entities, with a Spring batch size of 5:

spring.jpa.properties.hibernate.jdbc.batch_size=5

and the 13th record is a bad record causing a failure. BatchUpdateException.getUpdateCounts() returns an array of length 2, because the failure happened in the 3rd iteration of the batch cycle. Instead, I would like to get the count of 12 successful inserts. Is there any API or some way of tracking this without keeping track externally? Tracking externally would defeat my purpose, since it means calling flush multiple times:

AtomicInteger ai = new AtomicInteger(0);
entities.forEach(entity -> {
    entityManager.persist(entity);
    if (ai.incrementAndGet() % batchsize == 0) {
        entityManager.flush();
    }
});
entityManager.flush();
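To make the numbers concrete, here is a plain-Java sketch of the counting in the example above (the figures are the hypothetical ones from the question: 100 entities, batch size 5, 13th record bad), showing why getUpdateCounts() reports 2 while 12 rows actually succeeded:

```java
public class BatchCountSketch {
    // Rows persisted before the failure = full batches completed * batch size
    // + successful rows inside the failing batch (getUpdateCounts().length).
    static int totalPersisted(int batchSize, int failingRecord, int[] updateCounts) {
        // Full batches completed before the batch containing the bad record:
        int completedBatches = (failingRecord - 1) / batchSize;
        return completedBatches * batchSize + updateCounts.length;
    }

    public static void main(String[] args) {
        // Batch size 5, 13th record bad: getUpdateCounts() has length 2,
        // but 2 full batches (10 rows) + 2 rows succeeded before the failure.
        int[] updateCounts = {1, 1};
        System.out.println(totalPersisted(5, 13, updateCounts)); // prints 12
    }
}
```

The arithmetic itself is trivial; the catch, as the question notes, is that the number of completed batches is not available from the exception and would have to be tracked externally.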

Thanks

Answer

There are several pieces of news concerning batch inserts to Oracle 12 using Hibernate. The good one first.

Hibernate Oracle 12 batch insert

If you set the property

 <property name="hibernate.jdbc.batch_size" value="3"/>

it is a bit tricky to recognise that batching is active, as the Hibernate logging does not differ from normal-mode logging. Probably because Oracle has no syntax for passing a collection of values to an INSERT, you see a log of single insert statements:

 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
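One way to confirm batching without resorting to a database trace (my own suggestion, not from the original answer) is to enable Hibernate's statistics, which report a per-session summary including the number of JDBC batches executed:

```properties
# Hypothetical application.properties fragment (Spring Boot property naming assumed)
spring.jpa.properties.hibernate.generate_statistics=true
# With statistics enabled, Hibernate logs a session summary containing a line
# like "... nanoseconds spent executing N JDBC batches ..." after each session.
```

If N matches your expected batch count (entities / batch_size), batching is working.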

But by examining the Oracle 10046 trace you can see that each execution of the INSERT cursor processes batch_size rows (see the parameter r=3 in the EXEC trace line; the batch size is set to 3):

 PARSING IN CURSOR #347407728 ..
 insert into AUTHOR (name, AUTHOR_ID) values (:1 , :2 )
 END OF STMT 

 EXEC #347407728:....,r=3,...

Note that, unfortunately, you can't use an IDENTITY column for the primary key in batch mode:

  AUTHOR_ID INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,

Using IDENTITY will turn off batch mode.

getUpdateCount

The second piece of good news is that if you get an exception during batch processing, you can obtain the update counts of the current batch. You must unnest the PersistenceException you receive, as in this pseudocode:

 e.getCause().getSQLException().getUpdateCounts()

Note however that you need to be on Oracle 12 and use the corresponding JDBC driver to see the exact update counts; in former versions you will see only an unspecific error (a single negative number).
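The unnesting in the pseudocode above can be sketched as a small cause-chain walk. This is a minimal sketch using only java.sql types; the exact depth of nesting depends on your JPA provider and driver, so walking the whole chain is safer than a fixed getCause() sequence:

```java
import java.sql.BatchUpdateException;

public class BatchExceptionUnnester {
    /** Walks the cause chain looking for the underlying BatchUpdateException. */
    public static BatchUpdateException findBatchUpdateException(Throwable t) {
        while (t != null) {
            if (t instanceof BatchUpdateException) {
                return (BatchUpdateException) t;
            }
            t = t.getCause();
        }
        return null; // no batch failure found in the chain
    }
}
```

Given the PersistenceException caught around flush(), `findBatchUpdateException(e)` returns the exception whose getUpdateCounts() holds the per-row results of the failing batch.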

Putting it together

So, combining those two features, you can (at least in theory) identify the failed record.

Example for batch_size = 3

For six records you see the lines:

 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)
 Hibernate: insert into AUTHOR (name, AUTHOR_ID) values (?, ?)

i.e. two batches were started; the second batch failed with two rows successfully processed:

 BatchUpdateException - update count: [1, 1]

This means 3 + 2 rows were OK and the 6th row failed.

Summary

You may argue that the Hibernate people didn't do their homework, and that reading logs is not a good approach to identifying the problem. I have no argument against that; I can only offer some insight into what you might hear from the Hibernate authors (note that I have no relation to Hibernate other than occasionally troubleshooting database problems).

Validate the input

This is of course debatable, but when using batch input you should pre-validate the data so that no exception occurs at all.
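As a minimal illustration of such pre-validation (a generic sketch, not tied to any particular entity type or validation framework), one can partition the input before persisting, so that only valid records ever reach the batch:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PreValidation {
    /** Splits records into valid (key true) and invalid (key false) groups. */
    public static <T> Map<Boolean, List<T>> partition(List<T> records,
                                                      Predicate<T> isValid) {
        return records.stream().collect(Collectors.partitioningBy(isValid));
    }
}
```

You would persist `partition(...).get(true)` in batches and report `get(false)` separately, instead of letting a bad record abort a batch mid-flight.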

Flush each batch

You argued against it, but there is actually no real performance penalty. On each flush the INSERT cursor is closed and re-opened, but thanks to Oracle cursor caching this is no big deal.
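The per-batch flush with a running count can be sketched generically. The persist and flush actions are abstracted as callbacks here, since a real EntityManager is assumed but not shown; a minimal sketch, not a definitive implementation:

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchFlusher {
    /**
     * Persists entities one by one, flushing after every batchSize inserts.
     * Returns the number of entities successfully flushed before any failure,
     * which is exactly the count the original question asks for.
     */
    public static <T> int persistInBatches(List<T> entities, int batchSize,
                                           Consumer<T> persist, Runnable flush) {
        int flushed = 0;  // running count of successfully flushed entities
        int pending = 0;  // entities persisted since the last flush
        try {
            for (T entity : entities) {
                persist.accept(entity);
                pending++;
                if (pending == batchSize) {
                    flush.run();
                    flushed += pending;
                    pending = 0;
                }
            }
            flush.run(); // flush the final, possibly partial, batch
            flushed += pending;
        } catch (RuntimeException e) {
            // Sketch only: a real caller would log or rethrow; flushed already
            // holds the count of rows confirmed before the failing batch.
        }
        return flushed;
    }
}
```

With `entityManager::persist` and `entityManager::flush` as the callbacks, this is the externally tracked variant the question wanted to avoid, but per the point above, the extra flushes cost little.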

Performance is not your first goal

Above all, when deciding to use Hibernate for batch data entry, performance is definitely not your first goal. You opt for comfortable data entry and pay some performance tax for it.

My tests show an elapsed time of about 50 seconds for storing 100K simple objects with a batch size of 1000. That is not a bad average of 0.4 ms per object, but a direct SQL INSERT processes the same 100K rows in under 2 seconds. So for singular steps, such as migrations and upgrades with an extremely narrow time window, you could profit from using direct JDBC or even plain SQL.

