Python: cluster job management
Question
I am running Python scripts on a computing cluster (Slurm) in two sequential stages. I wrote two Python scripts, one for Stage 1 and another for Stage 2. Every morning I visually check whether all Stage 1 jobs have completed. Only then do I start Stage 2.
Is there a more elegant, automated way to do this by combining all stages and the job management in a single Python script? How can I tell whether a job has completed?
The workflow is similar to the following:
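from time import sleep

# Pseudocode: job_list and the job methods are placeholders, not a real API.
while not job_list.all_complete():
    for job in job_list:
        if job.empty():
            job.submit_stage1()
        if job.complete_stage1():
            job.submit_stage2()
    sleep(60)  # poll once a minute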
Recommended answer
You have several options:
- use the Slurm Python API to manage the jobs
- use job dependencies (search for --dependency in the sbatch man page)
- have the submission script for stage 1 submit the job for stage 2 when it finishes
- use a workflow management system such as
  - Fireworks https://materialsproject.github.io/fireworks/
  - Bosco https://osg-bosco.github.io/docs/
  - Slurm pipelines https://github.com/acorg/slurm-pipeline
  - Luigi https://github.com/spotify/luigi
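As an illustration of the job-dependency option, here is a minimal sketch that submits jobs from Python by calling sbatch through subprocess. It assumes sbatch is on the PATH. The helper names (build_sbatch_command, parse_job_id, submit) are made up for this example and are not part of any Slurm library. It relies on two documented sbatch options: --parsable, which prints only the job ID (optionally followed by ";cluster"), and --dependency=afterok:<id>, which holds a job until the listed jobs have finished with exit code 0.

```python
import subprocess


def build_sbatch_command(script, depends_on=()):
    """Return the sbatch argument list for `script`.

    `depends_on` is an iterable of job IDs that must finish successfully
    (exit code 0) before this job is allowed to start.
    """
    cmd = ["sbatch", "--parsable"]
    job_ids = [str(job_id) for job_id in depends_on]
    if job_ids:
        cmd.append("--dependency=afterok:" + ":".join(job_ids))
    cmd.append(script)
    return cmd


def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output ("id" or "id;cluster")."""
    return sbatch_output.strip().split(";")[0]


def submit(script, depends_on=()):
    """Submit `script` with sbatch and return its job ID as a string."""
    result = subprocess.run(
        build_sbatch_command(script, depends_on),
        check=True,           # raise CalledProcessError if sbatch fails
        capture_output=True,  # collect stdout so the job ID can be parsed
        text=True,
    )
    return parse_job_id(result.stdout)
```

With hypothetical batch scripts stage1.sh and stage2.sh wrapping the two Python scripts, stage1_id = submit("stage1.sh") followed by submit("stage2.sh", depends_on=[stage1_id]) queues both stages at once. Slurm then starts stage 2 only after stage 1 has succeeded, so neither a polling loop nor a manual morning check is needed.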