JDBC批处理操作理解

本文介绍了JDBC批处理操作理解的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在我的应用程序中使用 Hibernate ORM 和 PostgreSQL，有时我使用批处理操作.起初我不明白为什么在batch size = 25的日志中，会生成25个查询，一开始以为它不能正常工作.但是之后我查看了pg驱动的源代码，在PgStatement类中发现了以下几行:

I use Hibernate ORM and PostgreSQL in my application, and sometimes i use batch operations.And at first I didn't understand why in the logs with size of the batch = 25, 25 queries are generated, and at first thought that it does not work correctly.But after that I looked at the source code of the pg driver and found the following lines in the PgStatement class:

 public int[] executeBatch() throws SQLException {
        this.checkClosed();
        this.closeForNextExecution();
        if (this.batchStatements != null && !this.batchStatements.isEmpty()) {
            this.transformQueriesAndParameters();
//confuses next line, because we have array of identical queries
            Query[] queries = (Query[])this.batchStatements.toArray(new Query[0]);
            ParameterList[] parameterLists =
(ParameterList[])this.batchParameters.toArray(new ParameterList[0]);
            this.batchStatements.clear();
            this.batchParameters.clear();

在 PgPreparedStatement 类中

and in PgPreparedStatement class

    public void addBatch() throws SQLException {
        checkClosed();
        if (batchStatements == null) {
          batchStatements = new ArrayList<Query>();
          batchParameters = new ArrayList<ParameterList>();
        }

        batchParameters.add(preparedParameters.copy());
        Query query = preparedQuery.query;
    //confuses next line
        if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {
          batchStatements.add(query);
        }
      }

我注意到事实证明，如果批次的大小变为 25，发送了 25 个带有附加参数的查询.

I noticed that it turns out that if the size of the batch goes 25,25 queries are sent with the parameters attached to them.

数据库的日志证实了这一点，例如:

Logs of the database confirm this, for example:

2017-12-06 01:22:08.023 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_3: BEGIN
2017-12-06 01:22:08.024 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_4: select nextval ('tests_id_seq')
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '1'
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '2'
...
x23 queries with parameters
...
2017-12-06 01:22:08.063 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_5: COMMIT

但我认为必须使用包含 25 个参数的数组来执行一个查询.或者我不明白批量插入如何与准备好的语句一起工作?为什么重复一个查询 n 次?

But i thought one query must be executed with an array of 25 parameters.Or I don't understand how batch inserts work with a prepared statement?Why duplicate one query n times?

毕竟，我试图在这个地方调试我的查询

After all, i tried to debug my queries on this place

if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {

并注意到我的查询始终是 SimpleQuery 的实例，而不是 BatchedQuery.也许这是解决问题的方法?我找不到有关 BatchedQuery 的信息

and noticed that my queries are always instance of SimpleQuery instead of BatchedQuery. Maybe this is the solution to the problem? Information about BatchedQuery i couldn't find

推荐答案

可能涉及到各种批处理，我将介绍其中的 PostgreSQL JDBC 驱动程序 (pgjdbc) 部分.

There might be various kinds of batching involved, and I would cover PostgreSQL JDBC driver (pgjdbc) part of it.

TL;DR:在使用批处理 API 的情况下，pgjdbc 确实使用较少的网络往返.BatchedQuery 仅在 reWriteBatchedInserts=true 传递给 pgjdbc 连接设置时使用.

TL;DR: pgjdbc does use less network roundrips in case batch API is used. BatchedQuery is used only if reWriteBatchedInserts=true is passed to the pgjdbc connection settings.

您可能会发现 https://www.slideshare.net/VladimirSitnikv/postgresql-and-jdbc-striving-for-high-performance 相关(幻灯片 44，...)

You might find https://www.slideshare.net/VladimirSitnikv/postgresql-and-jdbc-striving-for-high-performance relevant (slide 44,...)

在查询执行方面，网络延迟通常是所用时间的重要组成部分.

When it comes to query execution, network latency is often a significant part of the elapsed time.

假设情况是插入 10 行.

Suppose the case is to insert 10 rows.

没有批处理(例如，在循环中只是PreparedStatement#execute).驱动程序将执行以下操作

No batching (e.g. just PreparedStatement#execute in a loop). The driver would perform the following

execute query
sync <-- wait for the response from the DB
execute query
sync <-- wait for the response from the DB
execute query
sync <-- wait for the response from the DB
...

在等待数据库"中花费了大量时间

Notable time would be spent in the "waiting for the DB"

JDBC 批处理 API.即 PreparedStatement#addBatch() 使驱动程序能够在单个网络往返中发送多个查询执行".然而，当前的实现仍然会将大批量拆分为小批量以避免 TCP 死锁.

JDBC batch API. That is PreparedStatement#addBatch() enables driver to send multiple "query executions" in a single network roundtrip. Current implementation, however would still split large batches into smaller ones to avoid TCP deadlock.

行动会更好:

execute query
...
execute query
execute query
execute query
sync <-- wait for the response from the DB

请注意，即使使用 #addBatch，也存在执行查询"命令的开销.单独处理每条消息确实需要服务器花费大量时间.

Note, that even with #addBatch, there's overhead of "execute query" commands. It does take server notable time to process each message individually.

减少查询次数的方法之一是使用多值插入.例如:

One of the ways to reduce the number of queries is to use multi-values insert. For instance:

insert into tab(a,b,c) values (?,?,?), (?,?,?), ..., (?,?,?)

这个 PostgreSQL 可以一次插入多行.缺点是您没有详细的(每行)错误消息.目前 Hibernate 没有实现多值插入.

This PostgreSQL enables to insert multiple rows at once. The drawback is you don't have detailed (per-row) error message. Currently Hibernate does not implement multi-values insert.

然而，从 9.4.1209 (2016-07-15) 开始，pgjdbc 可以将常规批量插入重写为多值.

However pgjdbc can rewrite regular batch inserts into multi-values on the fly since 9.4.1209 (2016-07-15).

为了激活多值重写，您需要添加reWriteBatchedInserts=true 连接属性.该功能最初是在 https://github.com/pgjdbc/pgjdbc/pull/491

In order to activate multi-values rewrite, you need to add reWriteBatchedInserts=true connection property. The feature was initially developed in https://github.com/pgjdbc/pgjdbc/pull/491

使用 2 条语句来插入 10 行已经足够聪明了.第一个是8值语句，第二个是2值语句.使用 2 的幂使 pgjdbc 能够保持不同语句的数量正常，这提高了性能，因为常用语句是服务器准备好的(参见 PostgreSQL 服务器端准备好的语句的生命周期是多少 )

It is smart enough to use 2 statements in order to insert 10 rows. The first one is 8-valued statement, and the second one is 2-valued statement. Usage of powers of two enables pgjdbc to keep the number of distinct statements sane, and that improves performance as often-used statements are server-prepared (see What's the life span of a PostgreSQL server-side prepared statement )

BatchedQuery 表示这种多值语句，因此您将看到该类仅在 reWriteBatchedInserts=true 情况下使用.

BatchedQuery is representing that kind of multi-valued statements, so you will see that class used in reWriteBatchedInserts=true case only.

该功能的缺点可能包括:作为批处理结果"的细节较低.例如，常规批处理为您提供每条语句行数"，但是在多值情况下，您只会获得语句完成"状态.最重要的是，即时重写器可能无法解析某些 SQL 语句(例如 https://github.com/pgjdbc/pgjdbc/issues/1045).

The drawbacks of the feature might include: lower details as the "batch result". For instance, regular batch gives you "per statement rowcount", however in multi-values case you just get "statement completed" status. On top of that, on-the-fly rewritter might fail to parse certain SQL statements (e.g. https://github.com/pgjdbc/pgjdbc/issues/1045 ).

这篇关于JDBC批处理操作理解的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！