When using HBase as a source for MapReduce, can I extend TableInputFormatBase to create multiple splits and multiple mappers for each region?

Question

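I'm thinking about using HBase as a source for one of my MapReduce jobs. I know that TableInputFormat specifies one input split (and thus one mapper) per region. However, this seems inefficient. I'd really like to have multiple mappers working on a given region at once. Can I achieve this by extending TableInputFormatBase? Can you please point me to an example? Furthermore, is this even a good idea?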



Thanks for the help.

Answer

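You need a custom input format that extends InputFormat. You can get an idea of how to do this from the answer to the question "I want to scan lots of data (Range based queries), what all optimizations I can do while writing the data so that scan becomes faster". This is a good idea if the data processing time is greater than the data retrieval time.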
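
To make the idea concrete: a custom input format would override getSplits and turn each region's row-key range into several smaller ranges, one per mapper. The sketch below shows only that range arithmetic in plain Java, with no HBase dependency. KeyRangeSplitter, split and toFixedWidth are hypothetical names invented for this illustration, not part of HBase or Hadoop. It assumes both keys are non-empty and that the start key sorts before the end key even after zero-padding, so the first and last regions of a table (which have an empty start or end key) would need separate handling.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not an HBase or Hadoop class. */
final class KeyRangeSplitter {

    /**
     * Returns pieces + 1 boundary keys, from start to end inclusive, spaced
     * evenly when the keys are read as unsigned big-endian numbers.
     * Assumption: both keys are non-empty and start sorts before end.
     */
    static byte[][] split(byte[] start, byte[] end, int pieces) {
        if (pieces < 1) {
            throw new IllegalArgumentException("pieces must be >= 1");
        }
        // Right-pad both keys with zero bytes to a common width. One extra
        // byte leaves room for boundaries between adjacent one-byte keys.
        int width = Math.max(start.length, end.length) + 1;
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        byte[][] bounds = new byte[pieces + 1][];
        bounds[0] = start.clone();
        bounds[pieces] = end.clone();
        for (int i = 1; i < pieces; i++) {
            // lo + span * i / pieces, using integer division
            BigInteger mid = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(pieces)));
            bounds[i] = toFixedWidth(mid, width);
        }
        return bounds;
    }

    /** Converts a non-negative value to exactly width big-endian bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        // toByteArray may carry a leading sign byte or be shorter than width.
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

Each adjacent pair of boundaries would then become the start row and stop row of one sub-split. Note that every extra mapper adds another concurrent scan against the same region server, which is why the approach mainly pays off when map-side processing dominates, as the answer says.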


