Problem description

I find that my Impala cluster's performance is unstable. Normally a query finishes in only a few seconds (less than 10s), but occasionally it takes more than 40s, and this situation lasts for a few minutes. When that happens, according to the profile, TotalRawHdfsOpenFileTime is very high, which implies that most of the time is spent opening HDFS files.

So what could be causing this, and how can I fix it?

Recommended answer

This is time spent opening files. If you're querying HDFS, this often means it's spending time waiting on the NameNode, which has to be asked for file metadata (such as block locations) every time a file is opened.
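
To check whether open time really dominates a slow query, it can help to pull every `TotalRawHdfsOpenFileTime` counter out of a saved profile and compare the largest one with the query's wall-clock time. The following is a minimal Python sketch, assuming profile lines look like `- TotalRawHdfsOpenFileTime(*): 35s123ms` and that durations use the suffixes `h`, `m`, `s`, `ms`, `us` and `ns`. The helper names `parse_impala_duration` and `open_file_times` are hypothetical, and the sample profile text is made up for illustration.

```python
import re
from typing import List

# Assumed duration format, e.g. "1s567ms", "41.925ms", "2m3s", "0.000ns".
# "ms" must be tried before "m" so that "567ms" is not read as "567m" followed by "s".
_UNIT = r"(h|ms|us|ns|m|s)"
_PART = re.compile(r"(\d+(?:\.\d+)?)" + _UNIT)
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?" + _UNIT + r")+")
_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# Assumed counter line format: "TotalRawHdfsOpenFileTime(*): <duration>".
_COUNTER = re.compile(r"TotalRawHdfsOpenFileTime(?:\(\*\))?:\s*([0-9.hmsun]+)")


def parse_impala_duration(text: str) -> float:
    """Convert a profile duration string such as '1s567ms' to seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _SECONDS[unit] for value, unit in _PART.findall(text))


def open_file_times(profile_text: str) -> List[float]:
    """Return every TotalRawHdfsOpenFileTime value (in seconds) found in a profile dump."""
    return [parse_impala_duration(m) for m in _COUNTER.findall(profile_text)]


if __name__ == "__main__":
    # Made-up excerpt in the assumed profile format.
    sample = """
          - TotalRawHdfsOpenFileTime(*): 35s123ms
          - TotalRawHdfsReadTime(*): 41.925ms
          - TotalRawHdfsOpenFileTime(*): 2m3s
    """
    times = open_file_times(sample)
    print(f"worst open time: {max(times):.3f}s")  # worst open time: 123.000s
```

If the largest value is close to the query's total time, the bottleneck is most likely in opening files (and therefore in NameNode round trips) rather than in reading or decoding the data.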

In many production deployments that ran into this bottleneck, we saw dramatic improvements after enabling file handle caching: https://docs.cloudera.com/documentation/enterprise/5-15-x/topics/impala_scalability.html#scalability_file_handle_cache
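
File handle caching helps because a cached handle lets repeated scans of the same file skip the open, and with it the NameNode round trip. The sketch below illustrates that idea in Python as an LRU cache with an idle timeout. It is a conceptual illustration only, not Impala's implementation: `FileHandleCache`, `open_fn` and `close_fn` are hypothetical names, and the real settings (such as the `--max_cached_file_handles` impalad flag for the cache size) are described on the linked page.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Tuple, TypeVar

H = TypeVar("H")


class FileHandleCache(Generic[H]):
    """Hypothetical LRU cache of open file handles keyed by path, with an idle timeout."""

    def __init__(self, open_fn: Callable[[str], H], close_fn: Callable[[H], None],
                 capacity: int, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._open_fn = open_fn
        self._close_fn = close_fn
        self._capacity = capacity
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        # path -> (handle, last-used time), ordered from least to most recently used.
        self._entries: "OrderedDict[str, Tuple[H, float]]" = OrderedDict()
        self.opens = 0  # number of slow-path opens (cache misses)

    def get(self, path: str) -> H:
        now = self._clock()
        self._evict_idle(now)
        if path in self._entries:
            handle, _ = self._entries.pop(path)  # hit: no open, no NameNode round trip
        else:
            handle = self._open_fn(path)  # miss: the expensive open
            self.opens += 1
            while len(self._entries) >= self._capacity:
                _, (old, _) = self._entries.popitem(last=False)  # evict least recently used
                self._close_fn(old)
        self._entries[path] = (handle, now)  # (re)insert as most recently used
        return handle

    def _evict_idle(self, now: float) -> None:
        # Entries are ordered by last use, so idle ones sit at the front.
        while self._entries:
            path, (handle, last_used) = next(iter(self._entries.items()))
            if now - last_used < self._idle_timeout_s:
                break
            del self._entries[path]
            self._close_fn(handle)


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo is deterministic
    closed = []
    cache = FileHandleCache(open_fn=lambda p: f"handle:{p}", close_fn=closed.append,
                            capacity=2, idle_timeout_s=60.0, clock=lambda: now[0])
    for path in ["a", "b", "a", "c", "a"]:
        cache.get(path)
    print(cache.opens)  # 3
    now[0] = 120.0  # two minutes later every cached handle is idle
    cache.get("c")
    print(cache.opens)  # 4
```

In the demo, the second access to `a` is served from the cache, `b` is evicted when `c` arrives and the cache is full, and after the clock jumps past the idle timeout every handle is closed, so the next access to `c` pays for an open again.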

09-11 08:18