# Workload Manager

On HIMster 2 / Mogon 2, load the following module first:
```bash
module load lang/Python/3.6.6-foss-2018b
```
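
To check that the module is active, the interpreter version should match the module name:
```bash
python3 --version   # expected output: Python 3.6.6
```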

On a single node, run:
```bash
mpirun -n 4 ./wkmgr.py -v date
```
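
If you want one rank per core of the local node, the rank count can be taken from the machine instead of hard-coding it (a small convenience sketch using the standard `nproc` tool):
```bash
# start one MPI rank per available core on the local node
mpirun -n "$(nproc)" ./wkmgr.py -v date
```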

For multiple nodes:
```bash
salloc -p parallel --reservation=himkurs -A m2_himkurs -N 1 -t 1:00:00
module load math/SUNDIALS/2.7.0-intel-2018.03
module load lang/Python/3.6.6-foss-2018b
srun -n 20 ~/workload-manager/wkmgr.py -v ~/workload-manager/examples/LGS/PulsedLGS ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50
```
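
The same example can run non-interactively as a batch script; a minimal sketch that reuses the parameters of the `salloc` line above (the course reservation and account are specific to the HIM course, adapt them to your own allocation):
```bash
#!/bin/bash
#SBATCH -p parallel
#SBATCH --reservation=himkurs   # course reservation; drop or adapt outside the course
#SBATCH -A m2_himkurs           # course account; use your own project account
#SBATCH -N 1
#SBATCH -n 20
#SBATCH -t 1:00:00

module load math/SUNDIALS/2.7.0-intel-2018.03
module load lang/Python/3.6.6-foss-2018b
srun -n 20 ~/workload-manager/wkmgr.py -v ~/workload-manager/examples/LGS/PulsedLGS ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50
```
Submit it with `sbatch <scriptname>.sh`.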

To use the loader (so far untested on the cluster), replace `wkmgr.py` with `wkloader.py`.
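
Applied to the multi-node example above, the call then reads:
```bash
srun -n 20 ~/workload-manager/wkloader.py -v ~/workload-manager/examples/LGS/PulsedLGS ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50
```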

## Hints

### When to Use
- A single fast analysis step (e.g. your analysis executable runs for only a minute)
- Thousands (or more) of individual analysis steps (see the sketch after this list)
- Using all cores in node-exclusive partitions (on Mogon 2, not on HIMster 2)
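
A hypothetical sketch of the many-short-tasks case; `my_analysis.sh` and the input layout are made up for illustration, and the call signature mirrors the LGS example above, which appears to take the executable followed by an input directory:
```bash
# hypothetical: create 5000 small work items ...
mkdir -p inputs
for i in $(seq 1 5000); do
  echo "$i" > "inputs/task_${i}.txt"
done
# ... and let the workload manager distribute them over 20 ranks
mpirun -n 20 ./wkmgr.py -v ./my_analysis.sh inputs
```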


### Comparison

- Queue-based work distribution with equal load balancing (in contrast to SLURM multiprog or [staskfarm](https://github.com/cmeesters/staskfarm) from [MogonWiki Node local scheduling](https://mogonwiki.zdv.uni-mainz.de/dokuwiki/node_local_scheduling))
- Usage of MPI:
  - Large connected jobs (>200 cores) are preferred by the job manager
  - Efficiently supports both node-local and multi-node usage
  - Preserves the environment, also in multi-node situations; GNU parallel does this only node-locally (see the example after this list)
- Usage of Python makes it simple for users to adapt the tool
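
A quick way to see the environment forwarding in action (a sketch; `MY_PARAM` is an arbitrary example variable, and `env` serves as a trivial payload):
```bash
# set a variable in the submission shell ...
export MY_PARAM=42
# ... and run `env` as the payload: each work item's output should
# list MY_PARAM; on the cluster the same holds when launched via srun
mpirun -n 4 ./wkmgr.py -v env
```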