# Workload Manager

On HIMster 2 / Mogon 2, load the following module first:
```bash
module load lang/Python/3.6.6-foss-2018b
```

On a single node, run with:
```bash
mpirun -n 4 ./wkmgr.py -v date
```

or, for multiple nodes:
```bash
#reserve resources
salloc -p parallel --reservation=himkurs -A m2_himkurs -N 1 -t 1:00:00

#load modules for demo analysis and MPI4Py
module load math/SUNDIALS/2.7.0-intel-2018.03
module load lang/Python/3.6.6-foss-2018b

#run
srun -n 20 ~/workload-manager/wkmgr.py -v -i ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50 ~/workload-manager/examples/LGS/PulsedLGS
```
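
For non-interactive use, the same setup can go into a batch script. The following is only a minimal sketch (not one of the shipped examples): it reuses the partition, account, reservation, modules and `srun` call from the interactive example above; adjust `-N`, `-n`, the reservation and the time limit to your own allocation.

```bash
#!/bin/bash
#SBATCH -p parallel
#SBATCH -A m2_himkurs
#SBATCH --reservation=himkurs
#SBATCH -N 1
#SBATCH -n 20
#SBATCH -t 01:00:00

#load modules for demo analysis and MPI4Py
module load math/SUNDIALS/2.7.0-intel-2018.03
module load lang/Python/3.6.6-foss-2018b

#run
srun -n 20 ~/workload-manager/wkmgr.py -v -i ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50 ~/workload-manager/examples/LGS/PulsedLGS
```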

To use the loader (so far untested on the cluster), replace `wkmgr.py` with `wkloader.py`.
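
As a minimal, hypothetical example (the `srun` line from above with the script name swapped, not verified on the cluster):

```bash
srun -n 20 ~/workload-manager/wkloader.py -v -i ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50 ~/workload-manager/examples/LGS/PulsedLGS
```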

## Hints

### When to Use
- Single, fast analysis steps (e.g. your analysis runs for only a minute per step)
- Thousands or more of individual analysis steps
- Making use of all cores in node-exclusive partitions (Mogon 2, not on HIMster 2)


### Comparison

- Queue-based work distribution with equal load balancing (in contrast to SLURM multiprog or [staskfarm](https://github.com/cmeesters/staskfarm) from the [MogonWiki node-local scheduling page](https://mogonwiki.zdv.uni-mainz.de/dokuwiki/node_local_scheduling))
- Usage of MPI:
  - large connected jobs (>200 cores) are preferred by the job manager
  - efficiently supports both node-local and multi-node usage
  - keeps the environment, also in multi-node situations (GNU parallel does this only node-locally)
- Usage of Python makes changes simple for users

- Only one disadvantage: the number of ranks is fixed during runtime, in contrast to SLURM jobs.