This article explains how to handle the "Detected implicit cartesian product" exception that Spark 2.4.0 raises for a left join with an empty right DataFrame. The question and the recommended answer below may serve as a useful reference for anyone hitting the same problem.

Problem Description

It appears that between Spark 2.2.1 and Spark 2.4.0, the behavior of a left join with an empty right DataFrame changed from succeeding to failing with "AnalysisException: Detected implicit cartesian product for LEFT OUTER join between logical plans".

For example:

import org.apache.spark.sql.functions.lit
import spark.implicits._

// Empty DataFrame with id and brand columns (the right side of the join).
val emptyDf = spark.emptyDataFrame
  .withColumn("id", lit(0L))
  .withColumn("brand", lit(""))
// Single-row DataFrame used as the left side.
val nonemptyDf = ((1L, "a") :: Nil).toDF("id", "size")
val neje = nonemptyDf.join(emptyDf, Seq("id"), "left")
neje.show()

In 2.2.1, the result is:

+---+----+-----+
| id|size|brand|
+---+----+-----+
|  1|   a| null|
+---+----+-----+

However, in 2.4.0, I get the following exception:

org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for LEFT OUTER join between logical plans
LocalRelation [id#278L, size#279]
and
Project [ AS brand#55]
+- LogicalRDD false
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;

Here is the full plan explanation for the latter:

> neje.explain(true)

== Parsed Logical Plan ==
'Join UsingJoin(LeftOuter,List(id))
:- Project [_1#275L AS id#278L, _2#276 AS size#279]
:  +- LocalRelation [_1#275L, _2#276]
+- Project [id#53L,  AS brand#55]
   +- Project [0 AS id#53L]
      +- LogicalRDD false

== Analyzed Logical Plan ==
id: bigint, size: string, brand: string
Project [id#278L, size#279, brand#55]
+- Join LeftOuter, (id#278L = id#53L)
   :- Project [_1#275L AS id#278L, _2#276 AS size#279]
   :  +- LocalRelation [_1#275L, _2#276]
   +- Project [id#53L,  AS brand#55]
      +- Project [0 AS id#53L]
         +- LogicalRDD false

== Optimized Logical Plan ==
org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for LEFT OUTER join between logical plans
LocalRelation [id#278L, size#279]
and
Project [ AS brand#55]
+- LogicalRDD false
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for LEFT OUTER join between logical plans
LocalRelation [id#278L, size#279]
and
Project [ AS brand#55]
+- LogicalRDD false
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;

Other observations:

  • If only the left DataFrame is empty, the join succeeds.
  • A similar behavior change applies to a right join with an empty left DataFrame.
  • Interestingly, however, both versions fail with an AnalysisException for an inner join when both DataFrames are empty (the sketch after this list reproduces these variants).
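
The following is a minimal sketch (not from the original question) that reproduces the three variants above. It reuses emptyDf and nonemptyDf from the earlier example and assumes an active SparkSession with spark.implicits._ in scope; emptyDf2 is a hypothetical second empty DataFrame introduced only for the inner-join case.

// Empty left side only: reported to succeed in both versions.
emptyDf.join(nonemptyDf, Seq("id"), "left").show()

// Right join where the left DataFrame is empty: same behavior change as above.
emptyDf.join(nonemptyDf, Seq("id"), "right").show()

// Hypothetical second empty DataFrame for the inner-join case.
val emptyDf2 = spark.emptyDataFrame
  .withColumn("id", lit(0L))
  .withColumn("color", lit(""))

// Inner join of two empty DataFrames: reported to raise an AnalysisException
// in both 2.2.1 and 2.4.0.
emptyDf.join(emptyDf2, Seq("id"), "inner").show()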

Is this a regression or by design? The earlier behavior seems more correct to me. I have not been able to find any relevant information in the Spark release notes, Spark JIRA issues, or Stack Overflow questions.

Recommended Answer

I didn't have quite your problem, but at least I hit the same error, and I fixed it by explicitly allowing the cross join:

spark.conf.set("spark.sql.crossJoin.enabled", "true")
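
The quoted error itself offers the same two options: use the CROSS JOIN syntax, or enable implicit cartesian products via this configuration flag. As a variation not shown in the original answer, the flag can also be supplied when the SparkSession is built; the snippet below is a minimal sketch under that assumption, with a placeholder app name and master.

import org.apache.spark.sql.SparkSession

// Sketch: enable implicit cartesian products at session creation time.
// "empty-right-join-demo" and local[*] are placeholder values.
val spark = SparkSession.builder()
  .appName("empty-right-join-demo")
  .master("local[*]")
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreate()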

This concludes the article on the "Detected implicit cartesian product" exception raised by Spark 2.4.0 for a left join with an empty right DataFrame; hopefully the recommended answer is helpful.
