This article describes how to deal with pandas Series.value_counts returning an inconsistent order for strings that have equal counts.

Problem description

When I run the following code:

import pandas
s = pandas.Series(['c', 'a', 'b', 'a', 'b'])
print(s.value_counts())

Sometimes I get:

a    2
b    2
c    1
dtype: int64

Other times I get:

b    2
a    2
c    1
dtype: int64

That is, the index order returned for equal counts is not always the same. I couldn't reproduce this when the Series values are integers instead of strings.

Why does this happen, and what is the most efficient way to get the same index order every time?

I want the result to still be sorted in descending order by count, but with a consistent order among items that have equal counts.

I'm running Python 3.7.0 and pandas 0.23.4

Recommended answer

Given a series, you have a few options for sorting consistently:

import numpy as np
import pandas as pd
s = pd.Series(['a', 'b', 'a', 'c', 'c'])
c = s.value_counts()  # counts, in descending order by default

Sort by index

Use pd.Series.sort_index:

res = c.sort_index()

a    2
b    1
c    2
dtype: int64
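Because sort_index orders by the labels themselves, its result does not depend on the order in which value_counts happened to emit tied items. A minimal sketch checking this, assuming only pandas and the standard library; the helper name counts_by_label and the shuffling loop are hypothetical, for illustration only:

```python
import random

import pandas as pd


def counts_by_label(values):
    """Count occurrences, then order the result by label (hypothetical helper)."""
    return pd.Series(values).value_counts().sort_index()


data = ['a', 'b', 'a', 'c', 'c']
baseline = counts_by_label(data)

# Shuffle the input several times; the label-sorted counts never change.
rng = random.Random(0)
for _ in range(5):
    shuffled = data[:]
    rng.shuffle(shuffled)
    assert counts_by_label(shuffled).equals(baseline)

print(baseline.index.tolist(), baseline.tolist())  # ['a', 'b', 'c'] [2, 1, 2]
```

The trade-off is that the result is no longer ordered by count at all.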

Sort by count (arbitrary for ties)

For descending counts, do nothing, as this is already the default order returned by value_counts. Otherwise, you can use pd.Series.sort_values, which defaults to ascending=True. In either case, make no assumptions about how ties are handled.

res = c.sort_values()

b    1
c    2
a    2
dtype: int64
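Since the order of tied items is unspecified, code that consumes the sort_values result should only rely on the counts being non-decreasing. A minimal sketch of what is and is not safe to assume, redefining s and c from the example above so it runs on its own:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'c'])
c = s.value_counts()

res = c.sort_values()  # ascending=True by default

counts = res.tolist()
# Safe to rely on: counts are in non-decreasing order.
assert counts == sorted(counts)
# Safe to rely on: the set of labels is unchanged.
assert set(res.index) == {'a', 'b', 'c'}
# NOT safe to rely on: whether 'a' or 'c' comes first among the tied 2s.
print(counts)  # [1, 2, 2]
```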

Since c is already in descending order, you can get ascending order more efficiently by reversing it with c.iloc[::-1] instead of sorting again.
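Reversing with iloc is a linear-time slice rather than a new sort, but it also reverses whatever order the tied items happened to be in. A minimal sketch, reusing the example series from the answer:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'c'])
c = s.value_counts()  # descending counts (default sort=True)

rev = c.iloc[::-1]  # ascending counts, no re-sort

# The reversed series is exactly c read backwards, labels and counts alike.
assert rev.index.tolist() == c.index.tolist()[::-1]
assert rev.tolist() == c.tolist()[::-1]
print(rev.tolist())  # [1, 2, 2]
```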

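You can use numpy.lexsort to sort by count and then by index. Note the reversed key order: np.lexsort treats the last key as the primary sort key, so -c.values (negated so that larger counts come first) is used first for sorting, and c.index only breaks ties.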

res = c.iloc[np.lexsort((c.index, -c.values))]

a    2
c    2
b    1
dtype: int64
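If you would rather stay inside pandas, the same "count descending, then label ascending" order can be expressed with DataFrame.sort_values on two columns. This is a swapped-in alternative, not part of the answer above, and the column names value and count are arbitrary choices. The sketch also cross-checks it against the np.lexsort version:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'c'])
c = s.value_counts()

# Answer's approach: np.lexsort uses the LAST key as the primary key.
via_lexsort = c.iloc[np.lexsort((c.index, -c.values))]

# Alternative: move the labels into a column and sort on two keys,
# count descending first, then label ascending to break ties.
frame = c.rename_axis('value').reset_index(name='count')
via_frame = (
    frame.sort_values(['count', 'value'], ascending=[False, True])
         .set_index('value')['count']
)

# Both give the same deterministic order.
assert via_frame.index.tolist() == via_lexsort.index.tolist()
assert via_frame.tolist() == via_lexsort.tolist()
print(via_frame.index.tolist(), via_frame.tolist())  # ['a', 'c', 'b'] [2, 2, 1]
```

Sorting on two explicit, named keys avoids relying on np.lexsort's last-key-first convention, at the cost of building a temporary DataFrame.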

10-24 15:10