This article describes how to assign a new column in pandas in a way similar to pyspark.

Problem description

I have the following dataframe:

df = pd.DataFrame([['A', 1],['B', 2],['C', 3]], columns=['index', 'result'])
index  result
A      1
B      2
C      3

I would like to create a new column, for example by multiplying the column 'result' by two, and I am curious whether there is a way to do it in pandas the way pyspark does it.

In pyspark:
from pyspark.sql import functions as F

df = df.withColumn("result_multiplied", F.col("result") * 2)

I don't like having to write the name of the dataframe every time I perform an operation, as is usually done in pandas, for example:

In pandas:
df['result_multiplied'] = df['result']*2

Recommended answer

Use DataFrame.assign:

df = df.assign(result_multiplied = df['result']*2)
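As a minimal sketch of why the result is assigned back to df: DataFrame.assign returns a new DataFrame and leaves the original untouched. The variable name out is only an illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([["A", 1], ["B", 2], ["C", 3]], columns=["index", "result"])

# assign returns a new DataFrame; df itself is not modified
out = df.assign(result_multiplied=df["result"] * 2)

print(list(df.columns))   # the original columns only
print(list(out.columns))  # the original columns plus result_multiplied
```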

Or, if the result column is processed earlier in the same chain of calls, a lambda function is necessary so that the computation uses the already-processed values in the result column:

df = df.assign(result_multiplied = lambda x: x['result']*2)
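With the lambda form, a whole chain can be written without repeating the dataframe name, which is close to the pyspark style from the question. The sketch below assumes a reasonably recent pandas; the column result_plus_one is a hypothetical extra column added only to show that a later step can read a column created by an earlier one.

```python
import pandas as pd

df = (
    pd.DataFrame([["A", 1], ["B", 2], ["C", 3]], columns=["index", "result"])
    # each lambda receives the intermediate DataFrame at that point in the chain
    .assign(result_multiplied=lambda x: x["result"] * 2)
    # result_plus_one is hypothetical; it reads the column created one step earlier
    .assign(result_plus_one=lambda x: x["result_multiplied"] + 1)
)
print(df)
```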

The sample below shows the difference: result_multiplied is computed from the original df['result'], while result_multiplied1 uses the column as it is after mul(2):

df = df.mul(2).assign(result_multiplied = df['result']*2,
                      result_multiplied1 = lambda x: x['result']*2)
print (df)
  index  result  result_multiplied  result_multiplied1
0    AA       2                  2                   4
1    BB       4                  4                   8
2    CC       6                  6                  12
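If a signature closer to pyspark's withColumn(name, expression) is preferred, one option is a small wrapper around assign. The helper with_column below is hypothetical and not part of pandas; it relies on dictionary unpacking, which also allows column names that are not valid Python identifiers.

```python
import pandas as pd


def with_column(frame, name, func):
    """Hypothetical helper: return a copy of frame with column `name` set to func(frame)."""
    # Dictionary unpacking lets assign accept any string as the column name
    return frame.assign(**{name: func})


df = pd.DataFrame([["A", 1], ["B", 2], ["C", 3]], columns=["index", "result"])

# DataFrame.pipe passes the DataFrame as the first argument of with_column
out = df.pipe(with_column, "result multiplied", lambda x: x["result"] * 2)
print(out)
```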


11-03 14:42