I've experimented a lot with various combinations of groupby, stack(), and expanding(), to no avail.

Recommended answer

This is my own attempt at calculating the expanding Z-scores while pooling all of the columns. Comments on how to do it more efficiently would be welcome.

    import numpy as np
    import pandas as pd


    def pooled_expanding_zscore(df, min_periods=2):
        """Calculates an expanding Z-Score down the rows of the DataFrame
        while pooling all of the columns.

        Assumes that the index is named and not hierarchical.
        Assumes that df does not have columns named 'exp_mean' and 'exp_std'.
        """
        # Get the sorted column names without mutating df.columns in place;
        # the last one marks the end of each row once the data is stacked
        colNames = sorted(df.columns)
        lastCol = colNames[-1]

        # Index name
        indexName = df.index.name

        # Normalize DataFrame to long format, ordered by row and then by column
        df_stacked = (pd.melt(df.reset_index(), id_vars=indexName)
                      .sort_values(by=[indexName, 'variable']))

        # Calculate the expanding mean and standard deviation on df_stacked
        # (two observations are the minimum needed for a standard deviation)
        df_exp = df_stacked['value'].expanding(2)
        df_stacked['exp_mean'] = df_exp.mean()
        df_stacked['exp_std'] = df_exp.std()

        # Keep just the rows where 'variable' == lastCol
        exp_stats = (df_stacked.loc[df_stacked['variable'] == lastCol,
                                    [indexName, 'exp_mean', 'exp_std']]
                     .set_index(indexName))

        # Add exp_mean and exp_std back to df
        df = pd.concat([df, exp_stats], axis=1)

        # Calculate Z-Score (to_numpy() replaces the removed as_matrix())
        df_mat = df.loc[:, colNames].to_numpy()
        exp_mean_mat = df['exp_mean'].to_numpy()[:, np.newaxis]
        exp_std_mat = df['exp_std'].to_numpy()[:, np.newaxis]
        zScores = pd.DataFrame((df_mat - exp_mean_mat) / exp_std_mat,
                               index=df.index, columns=colNames)

        # Use min_periods to kill off early rows
        zScores.iloc[:min_periods - 1, :] = np.nan
        return zScores
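On the efficiency question, one possible alternative is to skip the melt and sort entirely and build the pooled statistics from running sums. The sketch below is not part of the original answer: the function name pooled_expanding_zscore_cumsum is hypothetical, and it assumes an all-numeric DataFrame in which NaN marks missing observations. It keeps the sample standard deviation (ddof=1) and the same min_periods row masking as the function above, but returns the columns in their original order rather than sorted. The sum-of-squares variance formula is less numerically stable than the pandas expanding implementation, so treat it as a starting point and check it against your own data.

```python
import numpy as np
import pandas as pd


def pooled_expanding_zscore_cumsum(df, min_periods=2):
    """Hypothetical sketch: pooled expanding Z-score built from running sums.

    Assumes df is all-numeric; NaN entries are treated as missing.
    """
    values = df.to_numpy(dtype=float)
    valid = ~np.isnan(values)
    filled = np.where(valid, values, 0.0)

    # Running count, sum and sum of squares of every observation
    # seen up to and including each row (all columns pooled)
    n = valid.sum(axis=1).cumsum()
    s = filled.sum(axis=1).cumsum()
    ss = (filled ** 2).sum(axis=1).cumsum()

    with np.errstate(divide='ignore', invalid='ignore'):
        mean = s / n
        # Sample variance (ddof=1), matching the pandas default for std()
        var = (ss - n * mean ** 2) / (n - 1)
        # Clip tiny negative values caused by floating-point rounding
        std = np.sqrt(np.clip(var, 0.0, None))
        z = (values - mean[:, np.newaxis]) / std[:, np.newaxis]

    zScores = pd.DataFrame(z, index=df.index, columns=df.columns)
    # Blank out the rows before min_periods
    zScores.iloc[:min_periods - 1, :] = np.nan
    return zScores


# Illustrative data only
example = pd.DataFrame(
    {'a': [1.0, 2.0, 4.0], 'b': [3.0, 5.0, 9.0]},
    index=pd.date_range('2020-01-01', periods=3, name='date'),
)
print(pooled_expanding_zscore_cumsum(example))
```

Each row is scored against the mean and standard deviation of every value in that row and all rows before it, which is the same pooling the stacked version achieves by reading the expanding statistics at the last column of each row.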