Single node pipeline processing #58

Open
manuparra opened this issue Jun 2, 2023 · 2 comments

@manuparra

Hi all, I've been reviewing the code and I see that the pipeline scripts are tightly coupled to their execution under SLURM (which makes sense, since the pipeline is designed for that kind of cluster). We are running some tests to execute the pipeline without SLURM, directly using Singularity, but the scripts reference SLURM-specific variables and procedures, which makes this complicated.

Do you think it could be ported to a model without SLURM?

@Jordatious (Collaborator) commented Jun 2, 2023

Hi @manuparra, supporting other non-SLURM platforms is generally on the roadmap, although I must admit that single-node/VM support isn't a strong part of that, as multi-node processing is a core part of the pipeline's design. However, I believe one can use MPI on a single node/VM anyway, assuming there are enough cores to make it worthwhile. A few tasks like tclean can also make use of multiple cores through OpenMP.
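For reference, a rough sketch of what either of those looks like on a single node (the core counts and the .py script names below are just placeholders, not the pipeline's actual calls):

# Option 1: MPI parallelism on a single node via mpicasa (ships with CASA);
# core count and script name are placeholders
mpicasa -n 8 casa --nogui --log2term -c partition.py

# Option 2: OpenMP threads for tasks that support them (e.g. tclean),
# when running plain (non-MPI) CASA
export OMP_NUM_THREADS=8
casa --nogui --log2term -c quick_tclean.py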

So, with a few tweaks, I have used the pipeline successfully on a single VM. However, that was before version 1.1, in which we introduced SPW splitting.

To do this, one can take the sbatch scripts the pipeline writes out and simply run them as bash scripts, since the #SBATCH lines are commented out. The tweaks include removing the SLURM srun wrapper, which is easily done within the code, or otherwise making srun a script/alias or something the system understands (see the shim sketch just after the command chain below). Another tweak is to write your own submit_pipeline.sh script where, instead of using SLURM dependencies, you use the bash && operator; this achieves a similar effect by running each job only after the previous one finishes successfully, and stopping the chain if a job crashes. So, for example, if you first make all the sbatch scripts executable (e.g. with chmod +x validate_input.sbatch) and use the default scripts in the scripts config parameter, you could write something like this:

./partition.sbatch && ./validate_input.sbatch && ./flag_round_1.sbatch && ./calc_refant.sbatch && ./setjy.sbatch && ./xx_yy_solve.sbatch && ./xx_yy_apply.sbatch && ./flag_round_2.sbatch && ./xx_yy_solve.sbatch && ./xx_yy_apply.sbatch && ./split.sbatch && ./quick_tclean.sbatch
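As for the srun tweak, the crudest possible stand-in is just a pass-through script placed somewhere on your PATH. This is only a sketch: it assumes srun is invoked as srun <command> [args...] with no srun-specific flags, which would otherwise need stripping.

#!/bin/bash
# Minimal srun stand-in: ignore SLURM entirely and just run the wrapped command.
# Save it as e.g. ~/bin/srun, make it executable, and put ~/bin early in PATH.
exec "$@"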

And you might also want to look at redirecting the output to a log within your sbatch scripts, with something like this:

1> logs/validate_input.out 2> logs/validate_input.err
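Putting those pieces together, a hand-rolled submit_pipeline.sh could look something like the sketch below (the step names assume the default scripts config parameter, so adjust them to your own setup):

#!/bin/bash
# Hand-rolled submit_pipeline.sh sketch: run each step in order, stop on the
# first failure (set -e, equivalent to chaining with &&), and keep per-step logs.
set -e
mkdir -p logs
for step in partition validate_input flag_round_1 calc_refant setjy \
            xx_yy_solve xx_yy_apply flag_round_2 xx_yy_solve xx_yy_apply \
            split quick_tclean; do
    echo "Running ${step}..."
    # Note: repeated steps (e.g. xx_yy_solve) will overwrite their earlier logs
    ./${step}.sbatch 1> "logs/${step}.out" 2> "logs/${step}.err"
done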

However, doing this with SPW splitting (where nspw > 1) would require a bit more thought. Writing a script to launch the custom submit_pipeline.sh scripts inside each SPW directory wouldn't be difficult, but the trick after that would be automatically running the post-cross-calibration scripts, such as concatenation, further selfcal and science imaging. It would be easy enough to split that off into a separate step following the previous example; it would just require further intervention by the user after the first cross-calibration steps have run over all SPWs.

Another trick that might be useful is the [-l --local] option, which bypasses SLURM/srun and builds the pipeline without it.

Are you thinking of doing some of this development yourself? What's the platform and software you're using? I'd be happy to walk you through doing some of these things if it's useful.

@Jordatious (Collaborator)

Hi @manuparra, I suppose you could consider a high-level script that runs the custom submit_pipeline.sh scripts inside the SPW directories, and then also uses the && and || operators (e.g. where you wish to run concat, selfcal and science imaging even if some SPWs failed) to run the final imaging steps in a similar way.
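As a rough sketch (the SPW directory glob and the post-calibration script names below are assumptions, so substitute whatever your setup actually writes out):

#!/bin/bash
# Top-level wrapper sketch for nspw > 1: run cross-calibration in every SPW
# directory, then run the post-cross-calibration steps even if some SPWs failed.
for spw in */; do
    [ -x "${spw}submit_pipeline.sh" ] || continue
    ( cd "$spw" && ./submit_pipeline.sh ) \
        || echo "Cross-calibration failed in ${spw}, continuing anyway"
done
# Post-cross-calibration steps (concatenation, selfcal, science imaging);
# these script names are placeholders for whatever the pipeline generates.
./concat.sbatch && ./selfcal_part1.sbatch && ./science_image.sbatch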

Of course, beyond that, you could abandon the bash approach altogether and use Python or something else, but that may require quite a bit of development.
