Apache Spark: adding a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame

Problem description

I'm trying to add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame, using the Scala API.

Starting dataframe:

color
Red
Green
Blue

Desired dataframe (SQL syntax: CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool):

color bool
Red   0
Green 1
Blue  0

How should I implement this logic?

Recommended answer

In the upcoming Spark 1.4.0 release (expected within the next couple of days), you can use the when/otherwise syntax:

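// Imports; assumes an existing SQLContext named sqlContext, as in spark-shell
import org.apache.spark.sql.functions.when
import sqlContext.implicits._

// Create the dataframe
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")

// Use when/otherwise syntax: 1 where color is "Green", 0 otherwise
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))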
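
Since the question states the logic in SQL, the same expression can also be passed to Spark as a SQL string through selectExpr. The following is only a minimal sketch and is not part of the original answer: it assumes a newer Spark (2.x or later) where SparkSession is the entry point, and the names CaseWhenSqlExample and addBool are hypothetical, chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CaseWhenSqlExample {
  // Hypothetical helper: appends a "bool" column computed by a SQL CASE WHEN expression.
  // selectExpr parses each string argument as a SQL expression.
  def addBool(df: DataFrame): DataFrame =
    df.selectExpr(
      "color",
      "CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool"
    )

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ running in local mode
    val spark = SparkSession.builder()
      .appName("case-when-sql")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same starting data as in the question
    val df = Seq("Red", "Green", "Blue").toDF("color")

    addBool(df).show()
    spark.stop()
  }
}
```

This keeps the column name bool from the question. Whether the SQL string or when/otherwise reads better is largely a matter of taste; the Column-based form has the advantage that a typo in a method name is caught at compile time, while a typo inside the SQL string only fails when the expression is parsed at run time.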

If you are using Spark 1.3.0, you can choose to use a UDF:

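// Import the udf helper
import org.apache.spark.sql.functions.udf

// Define the UDF: returns 1 for "Green", 0 for anything else
val isGreen = udf((color: String) => if (color == "Green") 1 else 0)
val df2 = df.withColumn("Green_Ind", isGreen($"color"))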
