Problem description

I am running Python scripts on a computing cluster (Slurm) in two stages that must run sequentially. I wrote two Python scripts, one for Stage 1 and another for Stage 2. Every morning I check visually whether all Stage 1 jobs have completed; only then do I start Stage 2.

Is there a more elegant, automated way to combine all stages and the job management in a single Python script? How can I tell whether a job has completed?
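One way to tell whether a job has finished, without watching the queue by eye, is to ask Slurm's accounting database with `sacct`. This is a minimal sketch, assuming `sacct` is on the PATH and job accounting is enabled on the cluster; the set of "finished" states and the helper names (`parse_sacct_state`, `job_state`, `job_finished`) are illustrative assumptions, not part of the original question.

```python
import subprocess
from typing import Optional

# Assumed set of Slurm job states after which the job will not run again.
# Check your site's sacct documentation (e.g. requeue policies) before relying on it.
FINISHED_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT",
    "OUT_OF_MEMORY", "NODE_FAIL", "BOOT_FAIL", "DEADLINE",
}


def parse_sacct_state(output: str) -> Optional[str]:
    """Return the state from `sacct -n -X -P -o State` output, or None if it is empty."""
    for line in output.splitlines():
        line = line.strip()
        if line:
            # Cancelled jobs are reported as e.g. "CANCELLED by 1234"; keep the first word.
            return line.split()[0]
    # sacct may print nothing for a job it has not recorded yet.
    return None


def job_state(job_id: str) -> Optional[str]:
    """Query Slurm accounting for the state of one job allocation (no job steps)."""
    result = subprocess.run(
        ["sacct", "-j", job_id, "-n", "-X", "-P", "-o", "State"],
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_state(result.stdout)


def job_finished(job_id: str) -> bool:
    """True once the job has reached one of the assumed terminal states."""
    return job_state(job_id) in FINISHED_STATES
```

With this, a check such as `job.complete_stage1()` could reduce to `job_finished(stage1_id)`. Note that a job ending in `FAILED` or `TIMEOUT` also counts as finished here, so before starting Stage 2 it is probably worth checking for `COMPLETED` specifically.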

The workflow is similar to the following:

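from time import sleep

# job_list and its jobs stand in for the asker's own job-tracking objects.
while not job_list.all_complete():
    for job in job_list:
        if job.empty():
            # Nothing submitted yet for this job: start Stage 1.
            job.submit_stage1()
        elif job.complete_stage1() and not job.stage2_submitted():
            # Stage 1 is done; submit Stage 2 only once.
            # stage2_submitted() is a hypothetical helper, not in the original.
            job.submit_stage2()

    # Poll once a minute rather than busy-waiting.
    sleep(60)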
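Instead of polling, Slurm can enforce the ordering itself through job dependencies. Each Stage 1 job is submitted with `sbatch --parsable`, which prints only the job ID, and Stage 2 is submitted with `--dependency=afterok:<id>:<id>...`, so it starts only after every listed job has exited successfully. This is a minimal sketch, assuming hypothetical batch scripts `stage1.sh` and `stage2.sh` that take their arguments on the command line; the helper names are also illustrative.

```python
import subprocess
from typing import List


def parse_job_id(sbatch_output: str) -> str:
    """`sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the job id."""
    return sbatch_output.strip().split(";")[0]


def dependency_flag(job_ids: List[str]) -> str:
    """Build an sbatch option that holds a job until all given jobs exit with code 0."""
    return "--dependency=afterok:" + ":".join(job_ids)


def submit(args: List[str]) -> str:
    """Run sbatch (options first, then the script and its arguments) and return the job id."""
    result = subprocess.run(
        ["sbatch", "--parsable", *args],
        capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)


def submit_pipeline(n_inputs: int) -> str:
    """Submit n_inputs Stage 1 jobs, then one Stage 2 job that depends on all of them."""
    if n_inputs < 1:
        raise ValueError("need at least one Stage 1 job to depend on")
    # Hypothetical script: each Stage 1 job receives its input index as an argument.
    stage1_ids = [submit(["stage1.sh", str(i)]) for i in range(n_inputs)]
    # Slurm keeps this job pending until every Stage 1 job has completed successfully.
    return submit([dependency_flag(stage1_ids), "stage2.sh"])
```

Calling `submit_pipeline(4)` once from a login node would replace the morning check. If any Stage 1 job fails, the `afterok` dependency can never be satisfied, so the Stage 2 job will not run; depending on site configuration it either stays pending or is cancelled.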

Recommended answer

You have several options:

06-29 16:16