水桶之类的复杂性如何为O（n + k）的，如果我们用链表实现的桶？

本文介绍了水桶之类的复杂性如何为O（n + k）的，如果我们用链表实现的桶？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我很好奇，为什么桶排序为O（N + K）运行时，如果我们使用的是链表实现的桶。举例来说，假设我们有这个输入：

I am curious about why bucket sort has a runtime of O(n + k) if we use buckets implemented with linked lists. For example, suppose that we have this input:

n = no of element= 8
k = range = 3

array = 2,2,1,1,1,3,1,3

该桶将是这样的：

The buckets will look like this:

1: 1 -> 1 -> 1 -> 1
2: 2 -> 2
3: 3 -> 3

花费的总时间插入这些桶是O（n），假设我们存储尾指针的链表。

The total time spent inserting into these buckets is O(n), assuming that we store a tail pointer in the linked lists.

有关删除，我们必须去每个桶，然后删除该桶的每个节点。因此，复杂度应该是O（K *的桶链表的平均长度），因为我们遍历每个链表。

For deleting we have to go to each bucket and then delete each node in that bucket. Hence complexity should be O(K * average length of link list of bucket) as we are traversing each linked list.

不过，我读了桶排序的复杂度为O（N + K）。为什么不同意这种说法与我的分析？请纠正我，因为我仍然在学习计算复杂性。

However, I read that bucket sort's complexity is O(n + k). Why doesn't this agree with my analysis? Please correct me as I am still learning computational complexity.

推荐答案

您的分析的几乎的正确的，但有一个重要的细节，你失踪了。

Your analysis is almost correct, but there's an important detail that you're missing.

现在，你是正确的，在整个输入数组迭代的元素分布到桶花费时间为O（n）。但是，你稍微偏离的时候，你说的时间组装阵列所需的总金额为O（K *（每桶元素）的平均数）。请注意，因为有n个元素和k桶，这将出来O（K *（N / K））= O（N），为O（n）的总运行时间。要知道为什么真正的答案是O（n + k）的，我们需要更仔细地看看这个大O的术语。

Right now, you are correct that iterating across the input array to distribute the elements into buckets takes time O(n). However, you are slightly off when you say that the total amount of time required to assemble the array is O(k * (average number of elements per bucket)). Note that because there are n elements and k buckets, this would come out to O(k * (n / k)) = O(n), for a total runtime of O(n). To see why the real answer is O(n + k), we need to look more carefully at that big-O term.

对于初学者来说，你是绝对正确的时间，你花费在每个桶的平均金额为O（N / K）。那你说，既然有k个桶，总运行时间则是O（K *（N / K））= O（N）。然而，这是不正确的：尤其是不可以真的，K * O（N / K）= O（N）。这样做的原因是，术语O（N / K）则躲在一个常数因子。当你访问的每个水桶和一起来看看它包含的元素，它并不需要完全N / K的时间，或N / K的时间甚至一些常数倍。例如，如果桶是空的怎么办？在这种情况下，你仍然花费一定的时间在看斗，因为你必须确定，你不应该迭代它的元素。因此，每个桶所需的时间更精确的重新presentation是类似于C （N / K）+ C ，其中c 0 和C 是特定于实现的常量。这当然pression，当然，O（N / K）。

For starters, you are absolutely right that the average amount of time that you spend on each bucket is O(n / k). You then say that since there are k buckets, the total runtime is then O(k * (n / k)) = O(n). However, this is incorrect: in particular, it is not true that k * O(n / k) = O(n). The reason for this is that the term O(n / k) is hiding a constant factor. When you visit each bucket and take a look at the elements it contains, it doesn't take exactly n / k time, or even some constant multiple of n / k time. For example, what happens if the bucket is empty? In that case, you're still spending some amount of time looking at the bucket, since you have to determine that you shouldn't iterate over its elements. Thus a more accurate representation of the time required per bucket is something like c(n / k) + c, where c and c are implementation-specific constants. This expression is, of course, O(n / k).

美中不足的是，当你被K乘以这个EX pression得到工作完成总量会发生什么。如果计算

The catch is what happens when you multiply this expression by k to get the total amount of work done. If you compute

K *（C （N / K）+ C ）

您获得

C N + C个 1 K

正如你所看到的，这个前pression直接取决于K，所以总的运行时间为O（n + k）。

As you can see, this expression depends directly on k, so the total runtime is O(n + k).

有一个更直接的方式，在这个结果到达将是看看code的桶排序的第二个步骤，它看起来是这样的：

A more direct way to arrive at this result would be to look at the code for the second step of the bucket sort, which looks like this:

For each bucket b:
    For each element x in b:
        Append x to the array.

多少工作是整体做了什么？那么，有k个不同的桶，因此最外层循环必须至少需要O（k）的时间，因为我们来看看在每个桶。内，内循环将执行一个总为O（n）整体倍的，因为总分布在水桶n个元素的存在。由此，我们可以得到O（N + K）的总运行时间。

How much work is done overall? Well, there are k different buckets, so the outermost loop must take at least O(k) time, because we have to look in each bucket. Inside, the inner loop will execute a total of O(n) times overall, because there are a total of n elements distributed across the buckets. From this, we get the O(n + k) total runtime.

的原因，这是重要的是，它意味着如果尝试做一个桶排序与一个巨大的料斗数量（例如，大于n大得多），运行时可能是由扫描整个所有的所需要的时间占主导地位水桶找你实际使用的水桶，即使大部分都是空的。这基数排序是非常有用的原因是，它采用桶多次重复那种那里只有两个水桶，其中O（n）的运行时间为O（n + 2）=。因为你只需要做，而不是○○（LG U）本次迭代（其中U是数组中的最大值），运行时间为O（n LG U）（N + U）你会得到斗排序，这是非常糟糕的。

The reason that this is important is that it means that if you try doing a bucket sort with a huge number of buckets (say, much greater than n), the runtime might be dominated by the time required to scan over all the buckets looking for the buckets that you actually used, even if most of them are empty. The reason that radix sort is useful is that it uses multiple iterations of bucket sort where there are only two buckets, which runs in time O(n + 2) = O(n). Since you only need to do O(lg U) iterations of this (where U is the maximum value in the array), the runtime is O(n lg U) instead of the O(n + U) you'd get from bucket sort, which is much worse.

希望这有助于！

这篇关于水桶之类的复杂性如何为O（n + k）的，如果我们用链表实现的桶？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！