PySpark: list() in withColumn() only works once, then AssertionError: col should be Column

Problem description

I have a DataFrame with 6 string columns named 'Spclty1' ... 'Spclty6' and another 6 named 'StartDt1' ... 'StartDt6'. I want to zip them and collapse them into a single column that looks like this: [[Spclty1, StartDt1] ... [Spclty6, StartDt6]]

I first tried collapsing just the 'Spclty' columns into a list like this:

DF = DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6')))
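For reference, list(...) applied to the selected DataFrame builds a plain Python list, never a single Column. A minimal pure-Python sketch of the mechanism, assuming (as in recent PySpark releases, to my knowledge) that a DataFrame supports integer indexing df[0], df[1], ... but defines no __iter__: list() then falls back to Python's legacy sequence protocol and calls __getitem__ with 0, 1, 2, ... until IndexError. FakeFrame is a hypothetical stand-in so the sketch runs without Spark.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame: integer indexing, no __iter__."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getitem__(self, index):
        # Like DataFrame[int]: look up a column by position. An out-of-range
        # position raises IndexError, which is what ends list()'s iteration.
        return f"Column<{self.columns[index]}>"


frame = FakeFrame(["Spclty1", "Spclty2", "Spclty3"])
# No __iter__, so list() calls frame[0], frame[1], ... until IndexError.
result = list(frame)
print(type(result).__name__, result)
```

The result is a list whose items are column objects, which is exactly the kind of value withColumn() rejects for its second argument.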

This worked the first time I executed it, giving me a new column called 'Spclty' containing rows such as ['014', '124', '547', '000', '000', '000'], as expected.

Then, I added a line to my script to do the same thing on a different set of 6 string columns, named 'StartDt1'...'StartDt6':

DF = DF.withColumn('StartDt', list(DF.select('StartDt1', 'StartDt2', 'StartDt3', 'StartDt4', 'StartDt5', 'StartDt6')))

This resulted in AssertionError: col should be Column.

After I ran out of things to try, I tried the original operation again (as a sanity check):

DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6'))).collect()

and got the same AssertionError.

So it would be good to understand why it worked only the first time, but the main question is: what is the correct way to zip columns into a collection of dict-like elements in Spark?

Recommended answer

.withColumn() expects a single Column object as its second argument, but you are supplying a Python list.

09-22 19:25