How do you use multithreading correctly in OpenCV in 2019?


Question

I read some articles and posts about multithreading in OpenCV:

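On the one hand, you can build OpenCV with TBB or OpenMP support, which parallelizes OpenCV's functions internally.
On the other hand, you can create multiple threads yourself and call functions in parallel to achieve multithreading at the application level.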

But I couldn't find a consistent answer as to which method of multithreading is the right way to go.

Regarding TBB, an answer from 2012 with 5 upvotes:

Regarding multithreading at the application level, a comment from a moderator on answers.opencv.org:

But another answer with 3 upvotes states:

Problem Description:

So I thought it was at least okay to use (multi)threading at the application level. But I encountered strange performance problems when running my program for longer periods of time.

After investigating these performance problems, I created this minimal, complete, and verifiable example:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
#include <chrono>
#include <thread>

using namespace cv;
using namespace std;
using namespace std::chrono;

void blurSlowdown(void*) {
    Mat m1(360, 640, CV_8UC3);
    Mat m2(360, 640, CV_8UC3);
    medianBlur(m1, m2, 3);
}

int main()
{
    for (;;) {
        high_resolution_clock::time_point start = high_resolution_clock::now();

        for (int k = 0; k < 100; k++) {
            thread t(blurSlowdown, nullptr);
            t.join(); //INTENTIONALLY PUT HERE READ PROBLEM DESCRIPTION
        }

        high_resolution_clock::time_point end = high_resolution_clock::now();
        cout << duration_cast<microseconds>(end - start).count() << endl;
    }
}

Actual Behavior:

If the program runs for an extended period of time, the time spans printed by

cout << duration_cast<microseconds>(end - start).count() << endl;

keep getting larger.

After running the program for around 10 minutes, the printed time spans have doubled, which cannot be explained by normal fluctuations.

I would expect the time spans to stay pretty much constant, even though they might be longer than when calling the function directly.

When calling the function directly:

[...]
for (int k = 0; k < 100; k++) {
    blurSlowdown(nullptr);
}
[...]

The printed time spans stay constant.

When not calling a cv function:

void blurSlowdown(void*) {
    Mat m1(360, 640, CV_8UC3);
    Mat m2(360, 640, CV_8UC3);
    //medianBlur(m1, m2, 3);
}

The printed time spans stay constant too. So there must be something wrong when using threading in combination with OpenCV functions.

I know that the code above does NOT achieve actual multithreading: only one thread at a time is active and calling the blurSlowdown() function.
I know that creating threads and cleaning them up afterwards is not free and will be slower than calling the function directly.
This is NOT about the code being slow in general. The problem is that the printed time spans get longer and longer over time.
The problem is not specific to the medianBlur() function, since it also happens with other functions such as erode() or blur().
The problem was reproduced on a Mac with clang++, see the comment by @Mark Setchell.
The problem is amplified when using the debug library instead of the release library.

Windows 10 64-bit
MSVC compiler
OpenCV 3.4.2 official binaries

Is it okay to use (multi)threading at the application level with OpenCV?
If yes, why are the time spans printed by my program above GROWING over time?
If no, why is OpenCV then considered thread safe, and how should the statement from Kirill Kornyakov be interpreted instead?
Is TBB / OpenMP widely supported in 2019?
If yes, what offers better performance: multithreading at the application level (if allowed) or TBB / OpenMP?

Recommended answer

First of all, thank you for the clarity of the question.

Q: Is it okay to use (multi)threading at the application level with OpenCV?

A: Yes, it is totally fine to use multithreading at the application level with OpenCV, as long as you are using functions that can take advantage of it, such as blurring or colour space conversion. For these you can split the image into multiple parts, apply the function to each part, and then recombine the parts to give the final output.

Some functions, such as Hough or pca_analysis, cannot give correct results when they are applied to separate sections of an image and the sections are then recombined. Applying application-level multithreading to such functions may give incorrect results and should therefore be avoided.
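Why splitting can change a result is visible even with a tiny neighbourhood filter. The following is a minimal sketch in plain C++ without OpenCV (mean3 and mean3Split are hypothetical names, not library functions): a 3-tap mean with replicated borders, applied once to a whole signal and once to two halves that are processed independently. The values next to the cut differ, because each half replicates its own edge instead of seeing its neighbour's data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 3-tap mean with replicated borders over src[begin, end).
// Only samples inside [begin, end) are read, as if the part were a separate image.
std::vector<int> mean3(const std::vector<int>& src, std::size_t begin, std::size_t end)
{
    assert(begin < end && end <= src.size());
    std::vector<int> dst;
    dst.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) {
        const int left  = src[i == begin   ? i : i - 1]; // replicate at the left edge
        const int right = src[i + 1 == end ? i : i + 1]; // replicate at the right edge
        dst.push_back((left + src[i] + right) / 3);
    }
    return dst;
}

// Filter the two halves independently and concatenate the results.
std::vector<int> mean3Split(const std::vector<int>& src)
{
    assert(src.size() >= 2);
    const std::size_t mid = src.size() / 2;
    std::vector<int> out = mean3(src, 0, mid);
    const std::vector<int> second = mean3(src, mid, src.size());
    out.insert(out.end(), second.begin(), second.end());
    return out;
}
```

For the input {0, 0, 0, 9, 9, 9}, filtering the whole signal gives {0, 0, 3, 6, 9, 9}, while the split version gives {0, 0, 0, 9, 9, 9}. Letting each part also read a few samples of its neighbour before filtering is one common way to avoid such seams; functions that need global information cannot be fixed this way.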

As πάντα ῥεῖ mentioned, your implementation of multithreading will not give you an advantage, because you are joining each thread inside the for loop itself. I would suggest you use promise and future objects (if you want an example, let me know in the comments and I will share a snippet).
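A minimal sketch of the future-based approach, using only the standard library and assuming the work can be cut into independent ranges. processRows and runInParallel are hypothetical names; in a real program the body of processRows would call an OpenCV function on one part of the image. All tasks are started first and only then waited on, so they can actually overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Stand-in for per-part work: adds 1 to every value in [begin, end).
// Each task touches a disjoint range, so no locking is needed.
void processRows(std::vector<int>& data, std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i)
        data[i] += 1;
}

// Split data into `parts` ranges, process them concurrently, then wait for all.
void runInParallel(std::vector<int>& data, std::size_t parts)
{
    assert(parts > 0);
    std::vector<std::future<void>> pending;
    pending.reserve(parts);
    for (std::size_t p = 0; p < parts; ++p) {
        // Integer arithmetic that covers every element even if size % parts != 0.
        const std::size_t begin = data.size() * p / parts;
        const std::size_t end   = data.size() * (p + 1) / parts;
        pending.push_back(std::async(std::launch::async, processRows,
                                     std::ref(data), begin, end));
    }
    for (auto& f : pending)
        f.get(); // wait only after everything has been started
}
```

Note that std::async with std::launch::async may still start a new thread for every call, depending on the implementation, so this sketch addresses the missing overlap rather than the cost of creating threads.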

The answer below took a lot of research. Thank you for asking this question, it really helped me add to my knowledge of multithreading :)

Q: If yes, why are the time spans printed by my program above GROWING over time?

A: After a lot of research I found that creating and destroying threads takes a lot of CPU and memory resources. When we initialize a thread (in your code, with the line thread t(blurSlowdown, nullptr);), an identifier is written to the memory location this variable refers to, and this identifier lets us refer to the thread. Your program creates and destroys threads at a very high rate. As I understand it, a program is allocated a thread pool through which it can run and destroy threads. To keep it short, this is what happens:

When you create a thread, an identifier that points to this thread is created.
When you destroy the thread, that memory is freed.

BUT

When you create a thread again almost immediately after the first one was destroyed, the identifier of the new thread points to a new location in the thread pool (a location other than that of the previous thread).

After threads have been created and destroyed repeatedly, the thread pool is exhausted, so the CPU is forced to slow down our program a bit until space in the thread pool has been freed again for a new thread.

Intel TBB and OpenMP are very good at thread pool management, so this problem may not occur when using them.
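To illustrate the idea of reusing threads instead of creating one per task, here is a minimal sketch of a fixed-size worker pool in standard C++. This is not how TBB or OpenMP are implemented, and WorkerPool is a hypothetical class; it only shows threads being created once and then fed tasks through a queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool
{
public:
    // Start the worker threads once; they live as long as the pool.
    explicit WorkerPool(std::size_t threadCount)
    {
        assert(threadCount > 0);
        for (std::size_t i = 0; i < threadCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    // Ask the workers to finish the queued tasks, then join them.
    ~WorkerPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeUp_.notify_all();
        for (auto& w : workers_)
            w.join();
    }

    // Queue a task; the returned future becomes ready once the task has run.
    std::future<void> submit(std::function<void()> task)
    {
        auto packaged = std::make_shared<std::packaged_task<void()>>(std::move(task));
        std::future<void> done = packaged->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([packaged] { (*packaged)(); });
        }
        wakeUp_.notify_one();
        return done;
    }

private:
    void workerLoop()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeUp_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return; // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // run outside the lock so other workers can pick up tasks
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeUp_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

In the loop from the question, a pool like this could be constructed once before the outer loop, and the pair thread t(...); t.join(); replaced by pool.submit(...).get(). Whether that removes the slowdown described in the question is not something this sketch demonstrates.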

Q: Is TBB widely supported in 2019?

A: Yes, you can take advantage of TBB in your OpenCV program, provided you also enable TBB support when building OpenCV.

Here is a program that applies medianBlur in parallel using TBB:

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include <chrono>

using namespace cv;
using namespace std;
using namespace std::chrono;

class Parallel_process : public cv::ParallelLoopBody
{

private:
    cv::Mat img;
    cv::Mat& retVal;
    int size;
    int diff;

public:
    Parallel_process(cv::Mat inputImgage, cv::Mat& outImage,
                     int sizeVal, int diffVal)
        : img(inputImgage), retVal(outImage),
          size(sizeVal), diff(diffVal)
    {
    }

    virtual void operator()(const cv::Range& range) const
    {
        for(int i = range.start; i < range.end; i++)
        {
            /* divide image in 'diff' number
               of parts and process simultaneously */

            cv::Mat in(img, cv::Rect(0, (img.rows/diff)*i,
                                     img.cols, img.rows/diff));
            cv::Mat out(retVal, cv::Rect(0, (retVal.rows/diff)*i,
                                         retVal.cols, retVal.rows/diff));

            cv::medianBlur(in, out, size);
        }
    }
};

int main()
{
    VideoCapture cap(0);

    cv::Mat img, out;

    while(1)
    {
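        // stop when no frame is delivered; the ROI code above fails on an empty image
        if (!cap.read(img) || img.empty())
            break;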
        out = cv::Mat::zeros(img.size(), CV_8UC3);

        // split the work into 8 ranges; parallel_for_ runs them on the
        // parallel backend OpenCV was built with (TBB in this setup)
        auto start1 = high_resolution_clock::now();
        cv::parallel_for_(cv::Range(0, 8), Parallel_process(img, out, 9, 8));
        //cv::medianBlur(img, out, 9); //Uncomment to compare time w/o TBB
        auto stop1 = high_resolution_clock::now();
        auto duration1 = duration_cast<microseconds>(stop1 - start1);

        auto time_taken1 = duration1.count()/1000;
        cout << "TBB Time: " <<  time_taken1 << "ms" << endl;

        cv::imshow("image", img);
        cv::imshow("blur", out);
        cv::waitKey(1);
    }

    return 0;
}

On my machine, the TBB implementation takes around 10 ms, while the version without TBB takes around 40 ms.

Q: If yes, what offers better performance: multithreading at the application level (if allowed) or TBB / OpenMP?

A: I would suggest using TBB/OpenMP over POSIX multithreading (pthread/thread), because TBB gives you better control over threads and a better structure for writing parallel code, and it manages the pthreads internally. If you use pthreads directly, you have to take care of synchronization and thread safety in your own code. These frameworks abstract away the thread handling, which can otherwise get very complex.
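As a small illustration of the synchronization work you take on when managing threads yourself, the sketch below (standard C++ only; sumWithThreads is a hypothetical helper) lets several std::thread workers add partial sums into one shared total, which has to be protected by a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Sum `values` using `parts` threads that share one result variable.
long long sumWithThreads(const std::vector<int>& values, std::size_t parts)
{
    assert(parts > 0);
    long long total = 0;
    std::mutex totalMutex; // guards `total`

    std::vector<std::thread> workers;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t begin = values.size() * p / parts;
        const std::size_t end   = values.size() * (p + 1) / parts;
        workers.emplace_back([&values, &total, &totalMutex, begin, end] {
            // Compute a private partial result first, without holding the lock.
            const long long partial =
                std::accumulate(values.begin() + begin, values.begin() + end, 0LL);
            // Without this lock, the concurrent updates of `total` would be a data race.
            std::lock_guard<std::mutex> lock(totalMutex);
            total += partial;
        });
    }
    for (auto& w : workers)
        w.join();
    return total;
}
```

With a framework such as TBB or OpenMP, the splitting, the thread management and the final reduction are typically expressed through the framework's own constructs instead of hand-written locking like this.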

Edit: I checked the comments about image dimensions that are not compatible with the number of threads you want to divide the processing into. Here is a potential workaround (I haven't tested it, but it should work): scale the image to compatible dimensions, for example:

If your image resolution is 485 x 647, scale it to 488 x 648, pass it to Parallel_process, and then scale the output back to the original size of 485 x 647.
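The arithmetic behind this workaround can be sketched as follows (plain C++; roundUpToMultiple and stripRange are hypothetical helpers, not OpenCV functions). The first rounds a dimension up to the next multiple of the number of parts. The second is an alternative that avoids rescaling: it computes a row range for each part so that the ranges cover every row even when the height is not divisible by the number of parts.

```cpp
#include <cassert>
#include <utility>

// Smallest multiple of `parts` that is >= value.
int roundUpToMultiple(int value, int parts)
{
    assert(value >= 0 && parts > 0);
    return (value + parts - 1) / parts * parts;
}

// Row range [first, second) for part `index` out of `parts`.
// The ranges are contiguous and together cover exactly [0, rows).
std::pair<int, int> stripRange(int rows, int parts, int index)
{
    assert(rows >= 0 && parts > 0 && index >= 0 && index < parts);
    const long long r = rows; // avoid int overflow in the multiplication
    return { static_cast<int>(r * index / parts),
             static_cast<int>(r * (index + 1) / parts) };
}
```

With the values from the text and 8 parts, roundUpToMultiple(485, 8) gives 488 and roundUpToMultiple(647, 8) gives 648. Using stripRange inside Parallel_process instead of the fixed img.rows/diff height would be a change to the original code and is untested here.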

For a comparison of TBB and OpenMP, check this answer
