
Problem description



I'm currently evaluating HBase as a datastore, but one question was left unanswered: HBase stores many copies of the same object on many nodes (aka replication). As HBase features so-called strong consistency (in contrast to eventual consistency), it guarantees that every replica returns the same value when read.

As I understand the HBase concept, when reading values, the HBase master is first queried for the RegionServer (there must be more than one) serving the data. Then I can issue read and write requests without involving the master. How, then, can replication work?

  • How does HBase provide consistency?
  • How do write operations work internally?
  • Do write operations block until all replicas are written (=> synchronous replication)? If yes, who manages this transfer?
  • How does HDFS come into the game?

I have already read the BigTable paper and searched the docs, but I found no further information on the architecture of HBase.

Thanks!

Solution

HBase does not do any replication in the way that you are thinking. It is built on top of HDFS, which provides replication for the data blocks that make up the HBase tables. However, only one RegionServer ever serves or writes data for any given row.

Usually RegionServers are colocated with data nodes. All data writes in HDFS go first to the local node (if possible), then to another node on the same rack, and then to a node on a different rack (given the default replication factor of 3 in HDFS). So a RegionServer will eventually end up serving all of its data from the local server.
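To make the placement policy above concrete, here is a toy sketch of HDFS's default rack-aware replica placement for a replication factor of 3. This is an illustration of the idea only, not HDFS source code; the node and rack names are made up:

```python
# Toy model of HDFS's default replica placement (replication factor 3):
# 1st replica on the writer's local node, 2nd on a different node in the
# same rack, 3rd on a node in a different rack. Simplified sketch, not
# HDFS's actual placement code.

def place_replicas(writer, topology):
    """topology maps node name -> rack name; returns the 3 chosen nodes."""
    local_rack = topology[writer]
    same_rack = [n for n in topology
                 if topology[n] == local_rack and n != writer]
    other_rack = [n for n in topology if topology[n] != local_rack]
    # Deterministically take the first candidate; real HDFS chooses randomly
    # among eligible nodes.
    return [writer, same_rack[0], other_rack[0]]

nodes = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackB"}
print(place_replicas("dn1", nodes))  # ['dn1', 'dn2', 'dn3']
```

Because the first replica always lands on the writer's local node, a RegionServer that writes all of its regions' data ends up with a local copy of every block it serves.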

As for blocking: the only block is until the WAL (write-ahead log) is flushed to disk. This guarantees that no data is lost, as the log can always be replayed. Note that older versions of HBase did not have this worked out, because HDFS did not support a durable append operation until recently. We are in a strange state for the moment, as there is no official Apache release of Hadoop that supports both append and HBase. In the meantime, you can either apply the append patch yourself or use the Cloudera distribution (recommended).
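The write path described above (acknowledge once the WAL is synced, replay the log on recovery) can be sketched as a toy in-memory model. This is a simplified illustration of the write-ahead-log idea, not HBase's implementation:

```python
# Toy write-ahead-log model: a write is acknowledged only after it has been
# appended to the log; the in-memory store can always be rebuilt by
# replaying the log. Simplified sketch, not HBase code.

class ToyRegionServer:
    def __init__(self):
        self.wal = []        # stands in for the on-disk, synced WAL
        self.memstore = {}   # in-memory view; lost on a crash

    def put(self, row, value):
        self.wal.append((row, value))  # the "flush to disk" step; this is
                                       # the only point the client blocks on
        self.memstore[row] = value     # then apply the edit in memory

    def recover(self):
        # After a crash the memstore is gone; rebuild it by replaying the WAL.
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value

rs = ToyRegionServer()
rs.put("row1", "a")
rs.put("row1", "b")
rs.memstore.clear()   # simulate a crash losing all in-memory state
rs.recover()
print(rs.memstore)    # {'row1': 'b'} — no acknowledged write is lost
```

The point of the sketch is that durability comes from the log alone: once `put` has appended to the WAL, a crash of the in-memory store costs nothing, because replay reconstructs the same state.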

HBase does have a related replication feature that will allow you to replicate data from one cluster to another.

