Channel: Oliver C. Grant

Cluster Slurm Jobs Submission scontrol hold release sinfo


Submitting to specific nodes:

sbatch --exclude=node[001-008] submit.GPU.thor.sh
sbatch --nodelist=node010 submit.GPU.thor.sh
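
The same constraints can also live inside the job script as #SBATCH directives, so you don't have to retype them on every submission. A minimal sketch, assuming a bash job script; the echo line and the `allocated` variable are only illustrative and are not part of the original submit.GPU.thor.sh:

```shell
#!/bin/bash
#SBATCH --nodelist=node010          # only run on node010
#SBATCH --exclude=node[001-008]     # never run on node001 to node008

# SLURM_JOB_NODELIST is set by Slurm inside a running job; the fallback text
# only appears when the script is run by hand outside the queue.
allocated="${SLURM_JOB_NODELIST:-none (not running under Slurm)}"
echo "Allocated nodes: $allocated"
```

The #SBATCH lines have to come before the first real command in the script. Options given on the sbatch command line take precedence over the directives in the file.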

Holding jobs in the queue:

This lets you keep jobs in the queue that won’t run even if there are resources available. It is usually done to let others go ahead of you.

scontrol hold $jobID (get the ID from the leftmost column of the squeue output)
and then, when you want the job to run:
scontrol release $jobID
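
If you have many jobs queued, holding them one at a time gets tedious. Here is a rough sketch of a helper that holds every pending job belonging to you; the function name is made up for illustration, and it assumes plain (non-array) job IDs:

```shell
# Hold every pending job that belongs to the current user.
hold_my_pending_jobs() {
    # -h: no header, -u: only this user, -t PENDING: only queued jobs,
    # -o "%i": print just the job ID
    squeue -h -u "$(id -un)" -t PENDING -o "%i" | while read -r jobid; do
        scontrol hold "$jobid"
    done
}

# Usage on the login node:
#   hold_my_pending_jobs
```

Running jobs are not touched, because only jobs in the PENDING state are listed. Release them afterwards with scontrol release, one job ID at a time.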

Checking status of nodes:

sinfo (“mix” means the node is running jobs but still has free cores, “idle” means it is free, and both “down” and “drain” mean it is unavailable):

[oliver@thoreau 0.4.0_glycoproteinLys.pdb]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*        up   infinite      5  down* node[003-007]
defq*        up   infinite      1  drain node001
defq*        up   infinite      3    mix node[002,009-010]
mdaas        up   infinite      2   idle node[008,011]
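
To pick out just the problem nodes from a long listing, you can filter the output. This is a small sketch that assumes the default sinfo column layout shown above (STATE in column 5, NODELIST in column 6); the function name is hypothetical:

```shell
# Read plain `sinfo` output on stdin and print the STATE and NODELIST columns
# for every row whose state starts with "down" or "drain".
bad_nodes() {
    awk 'NR > 1 && $5 ~ /^(down|drain)/ { print $5, $6 }'
}

# Usage on the login node:
#   sinfo | bad_nodes
```

The node lists it prints are in the same form that sbatch --exclude accepts. Running sinfo -R additionally shows the reason recorded for each down or drained node.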

Checking GPU status:

ssh node009
nvidia-smi
[oliver@node009 ~]$ nvidia-smi
Mon May 13 01:06:36 2024 
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     On   | 00000000:3B:00.0 Off |                  Off |
| 38%   64C    P2   187W / 230W |    754MiB / 16125MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     On   | 00000000:5E:00.0 Off |                  Off |
| 33%   38C    P2    64W / 230W |    466MiB / 16125MiB |     24%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Quadro RTX 5000     On   | 00000000:AF:00.0 Off |                  Off |
| 33%   23C    P8     7W / 230W |      0MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Quadro RTX 5000     On   | 00000000:D8:00.0 Off |                  Off |
| 33%   23C    P8     7W / 230W |      0MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    289931      C   ...ps/amber20/bin/pmemd.cuda      751MiB |
|    1   N/A  N/A    293344      C   ...ps/amber20/bin/pmemd.cuda      463MiB |
+-----------------------------------------------------------------------------+

GPU-Util tells you how busy each GPU is. In the Processes table, each process should be on its own GPU, i.e. the GPU column should read 0 and 1, not 0 and 0.
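
Instead of reading the Processes table by eye, you can check for doubled-up GPUs with a short script. This is a sketch that relies on nvidia-smi's CSV query output, which identifies GPUs by UUID rather than by index; the function name is made up for illustration:

```shell
# Read "gpu_uuid, pid" lines on stdin and print every GPU that is running
# more than one compute process, followed by the process count.
shared_gpus() {
    awk -F', ' '{ n[$1]++ } END { for (g in n) if (n[g] > 1) print g, n[g] }'
}

# Usage on a compute node:
#   nvidia-smi --query-compute-apps=gpu_uuid,pid --format=csv,noheader | shared_gpus
```

No output means every process has a GPU to itself. To translate a UUID back to the 0 to 3 index, run nvidia-smi --query-gpu=index,uuid --format=csv.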

 

 

