本文介绍了JDBC批处理操作的理解的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在我的应用程序中使用Hibernate ORM和PostgreSQL,有时我使用批处理操作。
起初我不明白为什么在批量大小= 25的日志中,会生成25个查询,并且最初认为它无法正常工作。
但之后我查看了pg驱动程序的源代码,并在PgStatement类中找到以下行:

I use Hibernate ORM and PostgreSQL in my application, and sometimes i use batch operations.And at first I didn't understand why in the logs with size of the batch = 25, 25 queries are generated, and at first thought that it does not work correctly.But after that I looked at the source code of the pg driver and found the following lines in the PgStatement class:

 public int[] executeBatch() throws SQLException {
        this.checkClosed();
        this.closeForNextExecution();
        if (this.batchStatements != null && !this.batchStatements.isEmpty()) {
            this.transformQueriesAndParameters();
//confuses next line, because we have array of identical queries
            Query[] queries = (Query[])this.batchStatements.toArray(new Query[0]);
            ParameterList[] parameterLists =
(ParameterList[])this.batchParameters.toArray(new ParameterList[0]);
            this.batchStatements.clear();
            this.batchParameters.clear();

和PgPreparedStatement类

and in PgPreparedStatement class

    public void addBatch() throws SQLException {
        checkClosed();
        if (batchStatements == null) {
          batchStatements = new ArrayList<Query>();
          batchParameters = new ArrayList<ParameterList>();
        }

        batchParameters.add(preparedParameters.copy());
        Query query = preparedQuery.query;
    //confuses next line
        if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {
          batchStatements.add(query);
        }
      }

我注意到事实证明,如果大小批次为25,
25个查询将附带参数发送。

I noticed that it turns out that if the size of the batch goes 25,25 queries are sent with the parameters attached to them.

数据库日志确认这一点,例如:

Logs of the database confirm this, for example:

2017-12-06 01:22:08.023 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_3: BEGIN
2017-12-06 01:22:08.024 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_4: select nextval ('tests_id_seq')
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '1'
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '2'
...
x23 queries with parameters
...
2017-12-06 01:22:08.063 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_5: COMMIT

但我认为必须使用25个参数的数组执行一个查询。
或者我不明白批量插入如何与预准备语句一起使用?
为什么要复制一次查询n次?

But i thought one query must be executed with an array of 25 parameters.Or I don't understand how batch inserts work with a prepared statement?Why duplicate one query n times?

毕竟,我试图在这个地方调试我的查询

After all, i tried to debug my queries on this place

if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {

并注意到我的查询总是SimpleQuery的实例而不是BatchedQuery。也许这是问题的解决方案?有关BatchedQuery的信息我找不到

and noticed that my queries are always instance of SimpleQuery instead of BatchedQuery. Maybe this is the solution to the problem? Information about BatchedQuery i couldn't find

推荐答案

可能涉及各种批处理,我将介绍PostgreSQL JDBC驱动程序( pgjdbc)它的一部分。

There might be various kinds of batching involved, and I would cover PostgreSQL JDBC driver (pgjdbc) part of it.

TL; DR:在使用批处理API的情况下,pgjdbc确实使用较少的网络回合。 仅当 reWriteBatchedInserts = true 传递给pgjdbc连接设置时才使用BatchedQuery

TL;DR: pgjdbc does use less network roundrips in case batch API is used. BatchedQuery is used only if reWriteBatchedInserts=true is passed to the pgjdbc connection settings.

您可能会找到相关(幻灯片44,...)

You might find https://www.slideshare.net/VladimirSitnikv/postgresql-and-jdbc-striving-for-high-performance relevant (slide 44,...)

在查询执行方面,网络延迟通常是经过时间的重要部分。

When it comes to query execution, network latency is often a significant part of the elapsed time.

假设案例是插入10行。

Suppose the case is to insert 10 rows.


  1. 没有批处理(例如,循环中只有 PreparedStatement#execute )。驱动程序将执行以下操作

  1. No batching (e.g. just PreparedStatement#execute in a loop). The driver would perform the following

execute query
sync <-- wait for the response from the DB
execute query
sync <-- wait for the response from the DB
execute query
sync <-- wait for the response from the DB
...

值得注意的时间将花在等待数据库上

Notable time would be spent in the "waiting for the DB"

JDBC批处理API。即 PreparedStatement#addBatch()使驱动程序能够在单个网络往返中发送多个查询执行。然而,当前的实现仍然会将大批量分成较小的批次以避免TCP死锁。

JDBC batch API. That is PreparedStatement#addBatch() enables driver to send multiple "query executions" in a single network roundtrip. Current implementation, however would still split large batches into smaller ones to avoid TCP deadlock.

行动会好得多:

execute query
...
execute query
execute query
execute query
sync <-- wait for the response from the DB


  • 注意,即使使用# addBatch ,有执行查询命令的开销。服务器需要花费大量时间来单独处理每条消息。

  • Note, that even with #addBatch, there's overhead of "execute query" commands. It does take server notable time to process each message individually.

    减少查询数量的方法之一是使用多值插入。例如:

    One of the ways to reduce the number of queries is to use multi-values insert. For instance:

    insert into tab(a,b,c) values (?,?,?), (?,?,?), ..., (?,?,?)
    

    此PostgreSQL可以插入多个一次排。缺点是您没有详细的(每行)错误消息。目前Hibernate没有实现多值插入。

    This PostgreSQL enables to insert multiple rows at once. The drawback is you don't have detailed (per-row) error message. Currently Hibernate does not implement multi-values insert.

    但是自从9.4.1209(2016-07-15)起,pgjdbc可以动态地将常规批量插入重写为多值。

    However pgjdbc can rewrite regular batch inserts into multi-values on the fly since 9.4.1209 (2016-07-15).

    为了激活多值重写,您需要添加 reWriteBatchedInserts = true 连接属性。该功能最初是在)

    It is smart enough to use 2 statements in order to insert 10 rows. The first one is 8-valued statement, and the second one is 2-valued statement. Usage of powers of two enables pgjdbc to keep the number of distinct statements sane, and that improves performance as often-used statements are server-prepared (see What's the life span of a PostgreSQL server-side prepared statement )

    BatchedQuery 表示这种多值语句,因此您将看到中使用的类reWriteBatchedInserts = true 仅限案例。

    BatchedQuery is representing that kind of multi-valued statements, so you will see that class used in reWriteBatchedInserts=true case only.

    该功能的缺点可能包括:作为批处理结果的较低细节。例如,常规批处理为您提供per statement rowcount,但在多值情况下,您只需获得语句已完成状态。最重要的是,动态重写器可能无法解析某些SQL语句(例如)。

    The drawbacks of the feature might include: lower details as the "batch result". For instance, regular batch gives you "per statement rowcount", however in multi-values case you just get "statement completed" status. On top of that, on-the-fly rewritter might fail to parse certain SQL statements (e.g. https://github.com/pgjdbc/pgjdbc/issues/1045 ).

    这篇关于JDBC批处理操作的理解的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

  • 08-12 04:41