Complicated (for me) reshaping from wide to long in pandas

Question

Individuals (indexed from 0 to 5) choose between two locations: A and B. My data is in wide format and contains characteristics that vary by individual (ind_var) and characteristics that vary only by location (location_var).

For example, I have:

In [281]:

import pandas as pd

df_reshape_test = pd.DataFrame( {'location' : ['A', 'A', 'A', 'B', 'B', 'B'],
                'dist_to_A' : [0, 0, 0, 50, 50, 50],
                'dist_to_B' : [50, 50, 50, 0, 0, 0],
                'location_var': [10, 10, 10, 14, 14, 14],
                'ind_var': [3, 8, 10, 1, 3, 4]})

df_reshape_test

Out[281]:
    dist_to_A   dist_to_B   ind_var location location_var
0    0            50             3   A       10
1    0            50             8   A       10
2    0            50            10   A       10
3    50           0              1   B       14
4    50           0              3   B       14
5    50           0              4   B       14

The variable 'location' is the location chosen by the individual. dist_to_A is the distance to location A from the location chosen by the individual (and likewise dist_to_B for location B).

I'd like my data to have this form:

    choice  dist_S  ind_var location    location_var
0    1        0       3         A           10
0    0       50       3         B           14
1    1        0       8         A           10
1    0       50       8         B           14
2    1        0      10         A           10
2    0       50      10         B           14
3    0       50       1         A           10
3    1        0       1         B           14
4    0       50       3         A           10
4    1        0       3         B           14
5    0       50       4         A           10
5    1        0       4         B           14

where choice == 1 indicates that the individual has chosen that location, and dist_S is the distance from the chosen location.

I read about the .stack method but couldn't figure out how to apply it to this case. Thanks for your time!

NOTE: this is just a simple example. The datasets I'm working with have varying numbers of locations and varying numbers of individuals per location, so I'm looking for a flexible solution if possible.

Recommended answer

In fact, pandas has a wide_to_long function that can conveniently do what you intend to do.

import numpy as np
import pandas as pd

df = pd.DataFrame( {'location' : ['A', 'A', 'A', 'B', 'B', 'B'], 
                'dist_to_A' : [0, 0, 0, 50, 50, 50], 
                'dist_to_B' : [50, 50, 50, 0, 0, 0], 
                'location_var': [10, 10, 10, 14, 14, 14], 
                'ind_var': [3, 8, 10, 1, 3, 4]})

df['ind'] = df.index

# The `location` and `location_var` columns correspond to the choices;
# record them as dictionaries and drop them
# (just realized you had a cleaner way, copied from yours).

ind_to_loc = dict(df['location'])
loc_dict = df.groupby('location')['location_var'].agg(lambda x : int(np.mean(x))).to_dict()
df.drop(['location_var', 'location'], axis = 1, inplace = True)
# now reshape; the suffixes (A, B) are not numeric, so recent pandas
# versions need an explicit `suffix` pattern
df_long = pd.wide_to_long(df, ['dist_to_'], i = 'ind', j = 'location', suffix = r'\w+') 

# use the dictionaries to get variables `choice` and `location_var` back.

df_long['choice'] = df_long.index.map(lambda x: ind_to_loc[x[0]])
df_long['location_var'] = df_long.index.map(lambda x : loc_dict[x[1]])
print(df_long.sort_index())

This gives you the table you asked for:

              ind_var  dist_to_ choice  location_var
ind location                                        
0   A               3         0      A            10
    B               3        50      A            14
1   A               8         0      A            10
    B               8        50      A            14
2   A              10         0      A            10
    B              10        50      A            14
3   A               1        50      B            10
    B               1         0      B            14
4   A               3        50      B            10
    B               3         0      B            14
5   A               4        50      B            10
    B               4         0      B            14

Of course, you can generate a choice variable that takes the values 0 and 1 if that's what you want.
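As a rough sketch of that last step (not part of the original answer), the 0/1 indicator can be built by comparing each row's location with the location that individual actually chose. The names `chosen`, `loc_var` and `long_df` are made up for this illustration, and the `sep` and `suffix` arguments assume a reasonably recent pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A': [0, 0, 0, 50, 50, 50],
    'dist_to_B': [50, 50, 50, 0, 0, 0],
    'location_var': [10, 10, 10, 14, 14, 14],
    'ind_var': [3, 8, 10, 1, 3, 4],
})
df['ind'] = df.index

# Lookup tables (hypothetical names): chosen location per individual,
# and the location-level characteristic per location.
chosen = df.set_index('ind')['location']
loc_var = df.groupby('location')['location_var'].first()

# Reshape: dist_to_A / dist_to_B become one row per (ind, location).
long_df = pd.wide_to_long(
    df.drop(columns=['location', 'location_var']),
    stubnames='dist_to', i='ind', j='location', sep='_', suffix=r'\w+',
).sort_index().reset_index()

long_df = long_df.rename(columns={'dist_to': 'dist_S'})
# choice is 1 on the row whose location is the one the individual chose.
long_df['choice'] = (long_df['location'] == long_df['ind'].map(chosen)).astype(int)
long_df['location_var'] = long_df['location'].map(loc_var)
print(long_df)
```

Because the lookups are keyed by individual and by location rather than by position, the same steps should carry over to data with more locations or uneven numbers of individuals per location, as long as each wide column follows the `dist_to_<location>` naming pattern.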

