This article walks through a simple way to run multiple independent tasks in parallel in Python; it may be a useful reference if you are tackling a similar problem.

Problem description

So I have a bunch of functions that don't depend on each other to do their work, and each of them takes quite some time. I thought I could save runtime if I could use multiple threads. For example:

axial_velocity = calc_velocity(data_axial, factors_axial)
radial_velocity = calc_velocity(data_radial, factors_radial)
circumferential_velocity = calc_velocity(data_circ, factors_circ)

All my variables so far are lists (pretty long lists, too).

I have to do this for every input file, and it takes hours when there are more than 200 of them (I expect about 1000+).

To reduce the runtime I tried to compute (and especially sanity-check) the data as little as possible, which helped greatly, but the next improvement would be to use one thread per set of data.

I've tried something like this (oversimplified):

from multiprocessing import Pool

def calc_velocity(data, factor):
    buffer_list = []
    for index, line in enumerate(data):
        buffer_list.append(data[index] * factor[index])
    return buffer_list

data_axial = [1, 2, 3]
factors_axial = [3, 2, 1]

if __name__ == '__main__':
    p = Pool(4)
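    # NOTE: this fails - Pool.map() takes a function and an iterable, not an `args` keyword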
    axial_velocity = p.map(calc_velocity, args = (data_axial, factors_axial))

and:

from multiprocessing import Process


def calc_velocity(data_pack):
    data = []
    factor = []
    data.extend(data_pack[0])
    factor.extend(data_pack[1])
    buffer_list = []
    for index, line in enumerate(data):
        buffer_list.append(data[index] * factor[index])
    return buffer_list


data_axial = [1, 2, 3]
factors_axial = [3, 2, 1]

if __name__ == '__main__':
    data_pack = []
    data_pack.append(data_axial)
    data_pack.append(factors_axial)
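    # NOTE: this fails - args is unpacked into two positional arguments for a
    # one-argument target, and a Process cannot return the function's result anyway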
    p = Process(target = calc_velocity, args = data_pack)
    p.start()
    p.join()
    print p

None of these work, and I can't figure out how to make them work.

Recommended answer

If you don't need each result the moment it completes, a simple multiprocessing.Pool.map() is more than enough to split your task into separate processes that run in parallel, e.g.:

import multiprocessing

def worker(args):  # a worker function invoked for each sub-process
    data, factor = args[0], args[1]  # Pool.map() sends a single argument so unpack them
    return [e * factor[i] for i, e in enumerate(data)]

# placeholder example lists standing in for the asker's real data sets
data_axial, factors_axial = [1, 2, 3], [3, 2, 1]
data_radial, factors_radial = [4, 5, 6], [6, 5, 4]
data_circ, factors_circ = [7, 8, 9], [9, 8, 7]

if __name__ == "__main__":  # important process guard for cross-platform use
    calc_pool = multiprocessing.Pool(processes=3)  # we only need 3 processes
    data = (  # pack our data for multiprocessing.Pool.map() ingestion
        (data_axial, factors_axial),
        (data_radial, factors_radial),
        (data_circ, factors_circ)
    )
    # run our processes and await responses
    axial_velocity, radial_velocity, circumferential_velocity = calc_pool.map(worker, data)
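
Note that Pool.map() returns its results in the same order as the input iterable, which is what makes the tuple unpacking above safe even though the processes may finish in any order.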

However, the concerning part of your question lies in the hint that you have a large amount of data to pass around - when Python uses multiprocessing it doesn't share its memory, and while it can use copy-on-write optimization (at least on systems with fork), passing data between processes always invokes an extremely slow pickle/unpickle routine to pack and send the data.

For that reason, make sure the amount of data you exchange is minimal - for example, if you are loading data_axial and factors_axial from a file, it is better to send just the file path(s) and let the worker() process load/parse the file(s) itself than to load the file in your main process and then send over the loaded data.
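
To illustrate that idea, here is a minimal sketch in which each worker receives only a file path and parses the file itself, so only short strings get pickled between processes; the file names and the whitespace-separated two-column format are assumptions made up for this example:

import multiprocessing

def worker(path):  # receives a short string instead of the full data set
    data, factors = [], []
    with open(path) as fh:  # assumed format: "<data> <factor>" per line
        for line in fh:
            d, f = line.split()
            data.append(float(d))
            factors.append(float(f))
    return [d * f for d, f in zip(data, factors)]

if __name__ == "__main__":
    paths = ["axial.txt", "radial.txt", "circ.txt"]  # assumed file names
    calc_pool = multiprocessing.Pool(processes=3)
    axial_velocity, radial_velocity, circumferential_velocity = calc_pool.map(worker, paths)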

If you need to frequently (and randomly) access large amounts of (mutable) shared data from your sub-processes, I'd suggest using an in-memory database for the task, such as Redis.
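
For completeness, a rough sketch of that approach - it assumes a Redis server running on localhost and the third-party redis package installed, neither of which is part of the original answer. The main process stores the shared list once, and each worker reads it by key instead of receiving it through pickling:

import multiprocessing
import redis  # third-party client: pip install redis

def worker(key):
    r = redis.Redis(host="localhost", port=6379)  # each process opens its own connection
    factors = [float(x) for x in r.lrange(key, 0, -1)]  # stored values come back as bytes/strings
    return sum(factors)  # placeholder computation on the shared data

if __name__ == "__main__":
    r = redis.Redis(host="localhost", port=6379)
    r.delete("factors_axial")
    r.rpush("factors_axial", 3, 2, 1)  # store the shared list once
    pool = multiprocessing.Pool(processes=2)
    results = pool.map(worker, ["factors_axial"])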

That concludes this article on simple multitasking; hopefully the recommended answer above helps.
