Is HBase batch put(List<Put>) faster than put(Put)? What is the capacity of a Put object?

Question

I am working on a batch job that writes a batch of Put objects into HBase through HTableInterface. The API offers two methods, HTableInterface.put(List<Put>) and HTableInterface.put(Put).

I am wondering, for the same number of Put objects, is the batch put faster than putting them one by one?

Another question: I am putting a very large Put object, which caused the job to fail. There seems to be a limit on the size of a Put object. How large can it be?

Recommended answer

put(List<Put> puts) and put(Put aPut) are the same under the hood: both call doPut(List<Put> puts).

What matters is the write buffer size, as mentioned by @ozhang. The default value is 2 MB:

<property>
  <name>hbase.client.write.buffer</name>
  <value>2097152</value>
</property>

There will be one RPC every time the write buffer fills up and a flushCommits() is triggered. So if your application is flushing too often because your objects are relatively big, try increasing the write buffer size; that should reduce the number of flushes.
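To make the relationship between buffer size and flush count concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: the class WriteBufferSketch, the method flushesFor, and the rule "flush once the buffered bytes exceed the limit" are assumptions made for illustration, and the model assumes that client-side buffering is actually in effect (in HBase terms, that auto-flush is turned off). It counts how many flushes a stream of equally sized payloads would trigger, whether they are submitted one at a time or as a list.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBufferSketch {

    /** Toy stand-in for a client-side write buffer (hypothetical, not an HBase class). */
    static final class BufferedWriter {
        private final long bufferLimitBytes;
        private final List<byte[]> pending = new ArrayList<>();
        private long pendingBytes = 0;
        private int flushCount = 0;

        BufferedWriter(long bufferLimitBytes) {
            this.bufferLimitBytes = bufferLimitBytes;
        }

        /** Buffers one payload and flushes once the buffered bytes exceed the limit. */
        void put(byte[] payload) {
            pending.add(payload);
            pendingBytes += payload.length;
            if (pendingBytes > bufferLimitBytes) {
                flush();
            }
        }

        /** The list variant goes through the same buffering path as the single variant. */
        void putAll(List<byte[]> payloads) {
            for (byte[] payload : payloads) {
                put(payload);
            }
        }

        /** Models one round trip: empties the buffer and counts the flush. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            pending.clear();
            pendingBytes = 0;
            flushCount++;
        }

        int flushCount() {
            return flushCount;
        }
    }

    /** Returns the number of flushes needed to write putCount payloads of putSizeBytes each. */
    public static int flushesFor(int putCount, int putSizeBytes, long bufferLimitBytes, boolean asList) {
        BufferedWriter writer = new BufferedWriter(bufferLimitBytes);
        List<byte[]> payloads = new ArrayList<>();
        for (int i = 0; i < putCount; i++) {
            payloads.add(new byte[putSizeBytes]);
        }
        if (asList) {
            writer.putAll(payloads);
        } else {
            for (byte[] payload : payloads) {
                writer.put(payload);
            }
        }
        writer.flush(); // final explicit flush for whatever is still buffered
        return writer.flushCount();
    }

    public static void main(String[] args) {
        int putCount = 1000;
        int putSize = 512 * 1024;            // 512 KB per payload (illustrative value)
        long defaultBuffer = 2L * 1024 * 1024; // 2 MB, the default quoted above
        long largerBuffer = 8L * 1024 * 1024;  // 8 MB, an arbitrary larger setting

        System.out.println("2 MB buffer, one by one: " + flushesFor(putCount, putSize, defaultBuffer, false));
        System.out.println("2 MB buffer, as a list:  " + flushesFor(putCount, putSize, defaultBuffer, true));
        System.out.println("8 MB buffer, as a list:  " + flushesFor(putCount, putSize, largerBuffer, true));
    }
}
```

In this model the single and list variants produce the same number of flushes for a given buffer size, and only the buffer size changes the count. Real clients differ in details such as how an object's buffered size is estimated, so treat the numbers as illustrative.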
