本文介绍了使用 extraOptimizations 转换 Spark SQL AST的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想将 SQL 字符串作为用户输入,然后在执行前对其进行转换.特别是,我想修改顶级投影(select 子句),注入要由查询检索的额外列.

I'm wanting to take a SQL string as a user input, then transform it before execution. In particular, I want to modify the top-level projection (select clause), injecting additional columns to be retrieved by the query.

我希望通过使用 sparkSession.experimental.extraOptimizations 连接到 Catalyst 来实现这一目标.我知道我正在尝试的不是严格意义上的优化(转换改变了 SQL 语句的语义),但 API 似乎仍然合适.但是,查询执行器似乎忽略了我的转换.

I was hoping to achieve this by hooking into Catalyst using sparkSession.experimental.extraOptimizations. I know that what I'm attempting isn't strictly speaking an optimisation (the transformation changes the semantics of the SQL statement), but the API still seems suitable. However, my transformation seems to be ignored by the query executor.

这是一个最小的例子来说明我遇到的问题.首先定义一个行案例类:

Here is a minimal example to illustrate the issue I'm having. First define a row case class:

case class TestRow(a: Int, b: Int, c: Int)

然后定义一个简单地丢弃任何投影的优化规则:

Then define an optimisation rule which simply discards any projection:

object RemoveProjectOptimisationRule extends Rule[LogicalPlan] {
    def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
        case x: Project => x.child
    }
}

现在创建一个数据集,注册优化,并运行 SQL 查询:

Now create a dataset, register the optimisation, and run a SQL query:

// Create a dataset and register table.
val dataset = List(TestRow(1, 2, 3)).toDS()
val tableName: String = "testtable"
dataset.createOrReplaceTempView(tableName)

// Register "optimisation".
sparkSession.experimental.extraOptimizations =
    Seq(RemoveProjectOptimisationRule)

// Run query.
val projected = sqlContext.sql("SELECT a FROM " + tableName + " WHERE a = 1")

// Print query result and the queryExecution object.
println("Query result:")
projected.collect.foreach(println)
println(projected.queryExecution)

输出如下:

Query result:
[1]

== Parsed Logical Plan ==
'Project ['a]
+- 'Filter ('a = 1)
   +- 'UnresolvedRelation `testtable`

== Analyzed Logical Plan ==
a: int
Project [a#3]
+- Filter (a#3 = 1)
   +- SubqueryAlias testtable
      +- LocalRelation [a#3, b#4, c#5]

== Optimized Logical Plan ==
Filter (a#3 = 1)
+- LocalRelation [a#3, b#4, c#5]

== Physical Plan ==
*Filter (a#3 = 1)
+- LocalTableScan [a#3, b#4, c#5]

我们看到结果与原始 SQL 语句的结果相同,没有应用转换.然而,在打印逻辑和物理计划时,投影确实已被删除.我还确认(通过调试日志输出)确实正在调用转换.

We see that the result is identical to that of the original SQL statement, without the transformation applied. Yet, when printing the logical and physical plans, the projection has indeed been removed. I've also confirmed (through debug log output) that the transformation is indeed being invoked.

对这里发生的事情有什么建议吗?也许优化器只是忽略了改变语义的优化"?

Any suggestions as to what's going on here? Maybe the optimiser simply ignores "optimisations" that change semantics?

如果使用优化不是可行的方法,有人可以提出替代方案吗?我真正想做的就是解析输入的 SQL 语句,对其进行转换,然后将转换后的 AST 传递给 Spark 执行.但据我所知,用于执行此操作的 API 是 Spark sql 包私有的.可能可以使用反射,但我想避免这种情况.

If using the optimisations isn't the way to go, can anybody suggest an alternative? All I really want to do is parse the input SQL statement, transform it, and pass the transformed AST to Spark for execution. But as far as I can see, the APIs for doing this are private to the Spark sql package. It may be possible to use reflection, but I'd like to avoid that.

任何指针将不胜感激.

推荐答案

正如您所猜想的那样,这是行不通的,因为我们假设优化器不会更改查询结果.

As you guessed, this is failing to work because we make assumptions that the optimizer will not change the results of the query.

具体来说,我们缓存来自分析器的模式(并假设优化器不会更改它).将行转换为外部格式时,我们使用此模式并因此截断结果中的列.如果你做的不仅仅是截断(即改变数据类型),这甚至可能会崩溃.

Specifically, we cache the schema that comes out of the analyzer (and assume the optimizer does not change it). When translating rows to the external format, we use this schema and thus are truncating the columns in the result. If you did more than truncate (i.e. changed datatypes) this might even crash.

正如您在 ,它实际上产生了您在封面下期望的结果.我们计划在不久的将来某个时候开放更多钩子,让您可以在查询执行的其他阶段修改计划.有关详细信息,请参阅 SPARK-18127.

这篇关于使用 extraOptimizations 转换 Spark SQL AST的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-11 07:22