问题描述
我有一个 15000 个村庄的数据集,对于 1 个地区,有 12 个街区/Taluka,在那个地区种植了几种作物,我必须检查该村庄的作物播种面积,并选择 10 个村庄用于以随机抽样为基础的每种作物,我的第一步是删除数据集中的 0 个播种面积的村庄,删除 0 个播种面积后我得到 6674 个村庄,接下来我检查,在一个地区,在一个街区/Taluka 中有多少村庄是剩余的,所以我使用 pivot 和 group by 函数来检查.枢轴后我可以看到在一个街区/塔卢卡中只剩下不到 10 个村庄,所以在那段时间我需要删除那个输出少于 10 个村庄的街区/塔卢卡,但接下来我很难如何使用计数函数获取数据,数据透视表仅给出数字 102,42....等但我可以看到实际数据村名称,hec 中的播种面积等..这是我的代码
I have a dataset for 15000 Villages,For 1 district,there are 12 blocks/Taluka, In that district there are several crops grown,I have to check that, crop wise sown area for that villages, and select 10 villages for each crop in a random sampling basis , My first step is to remove 0 sown area villages in a data set, after removing 0 sown area I get 6674 villages, next I am check that, in a district,In a block/Taluka how many villages are remaining,so I am use pivot and group by function for check that. After pivot I can seen that In a block/taluka there are only remaining less than 10 number of villages, so in that time I need to deleted that block/taluka which are giving output of less than 10 villages, but next I am struggle to how to get data from using count function, pivot table give that only number 102,42....etc but where I can see that actual data village name,sown area in hec etc..here is my code
import pandas as pd
import numpy as np
d=pd.read_excel("/media/desktop/District.xlsx","Data")
d.drop(d.loc[d['Area in hec']==0].index, inplace=True)
d.count()
Sr no 6674
District 6674
Taluka 6674
Revenue Circle 6674
Village Name 6674
Crop 6674
Area in hec 6674
pivot = d.pivot_table(index=['Taluka','Crop'], values=['Area in hec'], aggfunc='count')
pivot=pivot.reset_index()
pivot.loc[pivot['Area in hec'] >= 10]
Taluka Crop Area in hec
0 Ahmednagar Bajra 102
2 Ahmednagar Cotton 33
3 Ahmednagar Greengram 86
4 Ahmednagar Maize 77
5 Ahmednagar Redgram 24
6 Ahmednagar Soyabean 74
7 Akole Bajra 78
8 Akole Blackgram 29
10 Akole Groundnut 162
11 Akole Maize 91
12 Akole Paddy 125
13 Akole Soyabean 129
14 Jamkhed Bajra 86
15 Jamkhed Blackgram 87
16 Jamkhed Cotton 86
17 Jamkhed Greengram 87
18 Jamkhed Groundnut 13
19 Jamkhed Maize 87
20 Jamkhed Onion 47
21 Jamkhed Redgram 87
22 Jamkhed Soyabean 65
23 Karjat Bajra 119
24 Karjat Blackgram 111
25 Karjat Cotton 106
26 Karjat Greengram 118
27 Karjat Groundnut 34
28 Karjat Maize 119
29 Karjat Onion 107
30 Karjat Redgram 103
31 Karjat Sesame(Til) 10
.. ... ... ...
63 Pathardi Groundnut 118
64 Pathardi Maize 123
65 Pathardi Onion 77
66 Pathardi Redgram 132
67 Pathardi Sesame(Til) 25
68 Pathardi Soyabean 26
70 Rahuri Bajra 44
72 Rahuri Cotton 72
73 Rahuri Greengram 20
75 Rahuri Maize 54
77 Rahuri Soyabean 60
78 Sangamner Bajra 163
80 Sangamner Cotton 39
81 Sangamner Greengram 37
82 Sangamner Groundnut 75
83 Sangamner Maize 179
84 Sangamner Redgram 46
85 Sangamner Soyabean 137
86 Shevgaon Bajra 98
88 Shevgaon Cotton 112
89 Shevgaon Greengram 31
90 Shevgaon Groundnut 41
91 Shevgaon Maize 54
92 Shevgaon Onion 31
93 Shevgaon Redgram 98
94 Shevgaon Soyabean 15
95 Shrirampur Bajra 15
96 Shrirampur Cotton 50
97 Shrirampur Maize 54
99 Shrirampur Soyabean 40
[85 rows x 3 columns]
另外,我尝试过 groupby 功能
Also, I have tried groupby function
Groupby=d.groupby(['Taluka', 'Crop'])['Village Name'].aggregate('count')
Groupby
Taluka Crop
Ahmednagar Bajra 102
Blackgram 3
Cotton 33
Greengram 86
Maize 77
Redgram 24
Soyabean 74
Akole Bajra 78
Blackgram 29
Greengram 9
Groundnut 162
Maize 91
Paddy 125
Soyabean 129
Jamkhed Bajra 86
Blackgram 87
Cotton 86
Greengram 87
Groundnut 13
Maize 87
Onion 47
Redgram 87
Soyabean 65
Karjat Bajra 119
Blackgram 111
Cotton 106
Greengram 118
Groundnut 34
Maize 119
Onion 107
...
Rahuri Bajra 44
Blackgram 1
Cotton 72
Greengram 20
Groundnut 8
Maize 54
Redgram 7
Soyabean 60
Sangamner Bajra 163
Blackgram 7
Cotton 39
Greengram 37
Groundnut 75
Maize 179
Redgram 46
Soyabean 137
Shevgaon Bajra 98
Blackgram 9
Cotton 112
Greengram 31
Groundnut 41
Maize 54
Onion 31
Redgram 98
Soyabean 15
Shrirampur Bajra 15
Cotton 50
Maize 54
Redgram 4
Soyabean 40
Name: Village Name, dtype: int64
现在,我想要这个数据,即艾哈迈德纳加尔块的 102 个村庄用于种植 bajra,艾哈迈德纳加尔块/塔卢卡的 33 个村庄用于种植棉花等.
now, I want this data i.e list of 102 villages for Ahmednagar block for crop bajra ,33 villages for Ahmednagar block/taluka for crop cotton..etc.
任何帮助都会帮助我解决这个问题,谢谢
Any help it will helps me to solve this,Thanks
推荐答案
我得到了答案.我使用的以下代码,
I got the answer. The following code I used,
import pandas as pd
import numpy as np
d=pd.read_excel("/media/desktop/District.xlsx","Data")
d.drop(d.loc[d['Area in hec']==0].index, inplace=True)
d.count()
def f(x):
x['No.of Villages'] = x.groupby(['Taluka','Crop'])['Area in hec'].transform('count')
x['No.of Villages'] = x['No.of Villages'].fillna('')
return x
df1 = d.groupby(['Taluka','Crop']).apply(f)
Final=df1.loc[df1['No.of Villages'] >= 10]
这篇关于在pandas中使用Groupby或pivot count函数后如何应用一些分析并获取原始数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!