Commit 3b182fa1 authored by Peter-Bernd Otte

Improved First Steps Manual

parent 749f6819
@@ -12,7 +12,7 @@ If at least one condition is met, then the use is recommended:
- Usage of MPI:
- large connected jobs (>200 cores) are preferred by the job manager
- efficiently supports both node-local and multi-node usage
- keeps environment , also in multi node sutiations (with GNU parallel only on node local)
- keeps the environment, also in multi-node situations (GNU parallel does this only node-locally)
- Usage of Python makes changes simple for users
- Only one disadvantage: the number of ranks is fixed during runtime -- in contrast to SLURM jobs.
@@ -97,20 +97,29 @@ Complete [the installation steps](#installation) first, see above.
```
to enable Python 3.6 and MPI4Py support. You can also add this line to your `~/.bashrc` configuration file to speed up the process when you log in again.
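If you want this to happen automatically, a minimal sketch (assuming the Python module line used in the examples further below):
```bash
# append the module command to your login shell configuration so it is loaded on every login
echo "module load lang/Python/3.6.6-foss-2018b" >> ~/.bashrc
```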
2. Next, test the parameters for the workload-manager. To do so, run short tests (with the dry-run option) on the headnode. More examples with different parameters see the next chapter
* On a head node run with
```bash
./wkmgr.py -n [YOUR EXECUTABLE]
```
* Or reserve a dedicated node for this purpose first, eg
```bash
salloc -p devel -A m2_him_exp -N 1 -t 1:30:00
#or during a turorial
salloc -p parallel --reservation=himkurs -A m2_himkurs -N 1 -t 1:30:00
````
#and do some test runs like in the head node case.
3. Once you found the right launcher arguments, submit the job interactively with
2. Next, test the parameters for the workload-manager. To do so, run short tests (IMPORTANT: with the dry-run option) on the headnode. Please note that no computation is allowed on the headnode, only tests.
On a head node, run:
```bash
./wkmgr.py -n [YOUR EXECUTABLE]
```
See the next chapter for [more examples with different parameters](#examples-and-faq).
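For instance, a dry run of the LGS example that ships with the repository could look like the sketch below (assuming the dry-run flag `-n` can be combined with the `-v` and `-i` options used in the full runs later on):
```bash
# dry run: only print what would be executed, no computation is started on the headnode
./wkmgr.py -n -v -i ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50 ~/workload-manager/examples/LGS/PulsedLGS
```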
3. Once you have found the right launcher arguments, run your job interactively.
First reserve some resources:
```bash
# Reserve a dedicated node for this purpose
salloc -p devel -A m2_him_exp -N 1 -t 1:30:00
#or ONLY during a tutorial at HIM
#salloc -p parallel --reservation=himkurs -A m2_himkurs -N 1 -t 1:30:00
```
and wait until the command returns the name of a free node. Log into this machine:
```bash
ssh [NODENAME returned from salloc]
```
and run on this machine:
```bash
#load modules for demo analysis and MPI4Py
module purge
@@ -118,10 +127,21 @@ Complete [the installation steps](#installation) first, see above.
module load lang/Python/3.6.6-foss-2018b
#run some example provided in the git repository
srun -n 20 ~/workload-manager/wkmgr.py -v -i ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50 ~/workload-manager/examples/LGS/PulsedLGS
mpirun -n 20 ~/workload-manager/wkmgr.py -v -i ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50 ~/workload-manager/examples/LGS/PulsedLGS
```
`mpirun` works in this context only on the node itself. If you reserve more than a single node, you need to use `srun`.
`srun` works in all cases, from a single core up to the multi-node scenario. Its only current drawback is that it does not print the output of the called program. If you are in doubt, run your first tests with `mpirun` and then continue with `srun`.
To use `srun`, again start by reserving resources, e.g. two full nodes:
```bash
salloc -p parallel -A m2_him_exp -N 2 -t 1:30:00
```
Interactive in this context means that you first allocate resources and later perform one or several run steps with `srun` (see the sketch at the end of this step).
4. Run your jobs scripted:
and run the following command from the headnode (yes, this differs from `mpirun`):
```bash
srun -n 32 ~/workload-manager/wkmgr.py -v -i ~/workload-manager/examples/LGS/Run27_LaPalma_Profile_I50 ~/workload-manager/examples/LGS/PulsedLGS
```
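For illustration, a minimal sketch of several run steps inside one allocation (the input directories `dataset_A` and `dataset_B` are hypothetical placeholders):
```bash
# typed on the headnode, inside the shell that salloc opened for the allocation above
srun -n 32 ~/workload-manager/wkmgr.py -v -i dataset_A [YOUR EXECUTABLE]   # job step 0
srun -n 32 ~/workload-manager/wkmgr.py -v -i dataset_B [YOUR EXECUTABLE]   # job step 1
exit   # leave the salloc shell and release the allocation
```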
4. Once you are happy with the results from interactive job submission, make your life easier and run your jobs scripted, non-interactively:
```bash
#!/bin/bash
#-----------------------------------------------------------------
@@ -171,7 +191,7 @@ Two options:
ssh {REPLACE WITH A COMPUTER NAME, eg z0477} "cat /proc/cpuinfo | grep processor | wc -l"
```
Note, that `/procs/cpuinfo` normally reports a number, which is twice as high as the number of cores. The effect comes from the point, that it treats [HyperThreating](https://en.wikipedia.org/wiki/Hyper-threading) in the same way as normal processors. Generally speaking, better **use only the number of cores** in your jobs.
Note that `/proc/cpuinfo` normally reports a number that is twice the number of physical cores, because it counts [Hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) logical processors in the same way as physical ones. Generally speaking, it is better to **use only the number of physical cores** in your jobs.
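To check the number of physical cores yourself, a minimal sketch using standard Linux tools (not specific to this cluster):
```bash
# count unique (socket, core) pairs, i.e. physical cores without Hyper-threading siblings
grep -E "^(physical id|core id)" /proc/cpuinfo | paste - - | sort -u | wc -l
# lscpu shows the same breakdown: Socket(s) x Core(s) per socket x Thread(s) per core
lscpu | grep -E "Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core"
```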
#### Input / Output File Example
Task: Run the analysis binary for each input file in MyInputDirectory on 20 cores
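A minimal sketch of one possible invocation, reusing the flags from the examples above (`MyInputDirectory` and `[YOUR EXECUTABLE]` are the task's placeholders):
```bash
# 20 ranks; the workload manager hands one input file from MyInputDirectory to each work item
srun -n 20 ~/workload-manager/wkmgr.py -v -i MyInputDirectory [YOUR EXECUTABLE]
```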
......