This article explains how to handle the problem "SPARK SQL fails if there is no specified partition path available". It should be a useful reference for anyone facing the same issue; interested readers can follow along below.

Problem Description

I am using the Hive Metastore in EMR. I am able to query the table manually through Hive SQL.
But when I use the same table in a Spark job, it says "Input path does not exist: s3://"

I have deleted the above partition path in s3://.., but the query still works in Hive without dropping the partition at the table level. However, it is not working in the PySpark job.

Here is my full code:

from pyspark import SparkContext, HiveContext
from pyspark import SQLContext
from pyspark.sql import SparkSession

sc = SparkContext(appName = "test")
sqlContext = SQLContext(sparkContext=sc)
sqlContext.sql("select count(*) from logan_test.salary_csv").show()
print("done..")

I submitted my job as below to use the Hive catalog tables:

spark-submit test.py --files /usr/lib/hive/conf/hive-site.xml

Solution

I have had a similar error with HDFS, where the metastore kept a partition for the table but the directory was missing.

Check s3... If the path is missing, or you deleted it, you need to run MSCK REPAIR TABLE from Hive. Sometimes this doesn't work, and you actually do need a DROP PARTITION.
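For illustration, a minimal sketch of both statements as they could be run from the Hive CLI against the salary_csv table from the question; the partition spec (year=2017) is a hypothetical placeholder for whatever partition column the table actually uses:

-- Re-scan the table's S3 location and update the partition metadata;
-- as noted above, this may not remove the stale entry
MSCK REPAIR TABLE logan_test.salary_csv;

-- If the missing partition is still registered, drop it explicitly
-- ((year=2017) is a hypothetical partition spec for this sketch)
ALTER TABLE logan_test.salary_csv DROP IF EXISTS PARTITION (year=2017);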

That property, spark.sql.hive.verifyPartitionPath, is false by default; when set to true, Spark checks that each partition path exists before reading it and skips the missing ones. You set configuration properties by passing a SparkConf object to SparkContext:

from pyspark import SparkConf, SparkContext

# Enable partition path verification so that partitions whose directories
# no longer exist are skipped instead of failing the read
conf = SparkConf().setAppName("test").set("spark.sql.hive.verifyPartitionPath", "true")
sc = SparkContext(conf=conf)

Or, the Spark 2 way is to use a SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("test") \
    .config("spark.sql.hive.verifyPartitionPath", "true") \
    .enableHiveSupport() \
    .getOrCreate()
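With either configuration in place, the query from the question can be rerun; below is a minimal sketch using the Hive-enabled session created above:

# The same query as in the question; with verifyPartitionPath enabled,
# partitions whose S3 directories no longer exist should be skipped
# instead of failing the whole job.
spark.sql("select count(*) from logan_test.salary_csv").show()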

This concludes the article on "SPARK SQL fails if there is no specified partition path available". We hope the answer above is helpful, and thank you for your continued support!
