TSCC houses the 640-core supercomputer as part of a resource sharing system which allows researchers to perform calculations and experiments when they need extra computing power.
Your first login session will familiarize you with the cluster, teach you how to do some useful tasks on the queue, and help you set up a common, shared directory structure.
Learning goals:
To check whether you, the user, are authorized to log into a server, the server has a few options. It can ask you for a password, which has to match exactly, or it can check whether you are an "Authorized User" by testing whether a private key in your ~/.ssh/ directory, such as ~/.ssh/id_rsa, matches one of the public keys listed in the server's authorized_keys file. What underlies this concept, which is called "Public Key Cryptography," is cryptography, number theory, and hashing - learn more about it here.
Copy the biom262_rsa private key that was emailed out into ~/.ssh. To avoid errors, you may also need to add the key with the extension .pri, which we will do as well.
cp ~/Downloads/biom262_rsa ~/.ssh
cp ~/Downloads/biom262_rsa ~/.ssh/biom262_rsa.pri
Use ssh-add to add this private key to the list of keys that ssh tries when matching you up with the accepted users of a server.
ssh-add ~/.ssh/biom262_rsa
ssh-add ~/.ssh/biom262_rsa.pri
(Instructions compiled from here1, Analyzing Next-Gen Seq (ANGUS) data workshop, and How To Forge)
Get the latest PuTTY executables (link to zip). They don't need to be installed, so you can put the files on your desktop or wherever you like to keep your programs. You'll need the putty.exe, pscp.exe, and pageant.exe files.
First, open PUTTYGEN.EXE. Click "Load" to load an existing private key file, and select the Biom262_RSA file as the key. Save the key as a private key, and don't set a passphrase.
Next, open PAGEANT.EXE. Check your taskbar on the bottom-right (you may need to click the triangle to expand the taskbar) and click the syspanel icon: Add the key you just generated using PUTTYGEN.EXE and it should appear on the Pageant Key List.
(proceed to the steps shown below now)
In your terminal, type the following (you'll need to replace "##" with your number). This should not ask for a password. If it asks for a password, raise your hand.
$ ssh ucsd-train##@tscc.sdsc.edu
Rocks 6.2 (SideWinder)
Profile built 17:40 06-Jan-2016
Kickstarted 18:26 06-Jan-2016
TSCC Cluster Login Node
For information on using the TSCC, please visit http://idi.ucsd.edu/computing
By using the TSCC, you agree to the Acceptable Use Policy found on
http://idi.ucsd.edu/_files/TSCC-Acceptable-Use-Policy.pdf
PLEASE NOTE: a portion of the /oasis/tscc/scratch filesystem failed during
Tuesday's outage. We are working on repairs; in the meantime, we have
mounted the old (pre 12/7) filesystem read-only.
[ucsd-train##@tscc-login1 ~]$
You may get a message like the one below. This is expected the first time you connect to any server. Type "yes" and press "Enter" (pressing "Enter" alone is not accepted as an answer).
The authenticity of host '192.168.0.100 (192.168.0.100)' can't be established.
RSA key fingerprint is 3f:1b:f4:bd:c5:aa:c1:1f:bf:4e:2e:cf:53:fa:d8:59.
Are you sure you want to continue connecting (yes/no)?
If you get this error:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for '/Users/olga/.ssh/biom262_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /Users/olga/.ssh/biom262_rsa
Password:
Then you need to make the key not readable by "group" and "other". Do this with:
chmod go-r ~/.ssh/biom262_rsa
Which subtracts the "r" (reading) permission from "g" (group) and "o" (other).
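The effect of chmod go-r can be rehearsed on a throwaway file before touching a real key. The sketch below is illustrative only: the temporary file is a stand-in for ~/.ssh/biom262_rsa, and the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Demonstrate chmod go-r on a temporary stand-in for a private key file.
key_file=$(mktemp)            # hypothetical stand-in for ~/.ssh/biom262_rsa

chmod 644 "$key_file"         # reproduce the "too open" state from the warning
before=$(ls -l "$key_file" | cut -c1-10)

chmod go-r "$key_file"        # remove read permission for group and other
after=$(ls -l "$key_file" | cut -c1-10)

echo "before: $before"        # -rw-r--r--
echo "after:  $after"         # -rw-------

rm -f "$key_file"
```

The first ten characters of a long listing are the file type plus the user, group, and other permission triplets, so the change shows up as the two "r" characters disappearing.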
Double-click the PuTTY exe.
Enter tscc.sdsc.edu as the host (not lyorn.idyll.org as above). You may get a warning like the one below. This is completely normal when you first log on to a new server. Click "Yes".
You'll get a terminal like the one below. Make sure to log in with your "ucsd-train##" username.
Phew, we made it to TSCC!
Create the base storage location for your code development (or just use your home area):
mkdir code
mkdir notebooks
mkdir data
ln -s /oasis/tscc/scratch/$USER $HOME/scratch
Now look at what's there in your home directory with ls -l (the -l stands for "long listing").
The output should look like this:
total 10
drwxr-xr-x 2 ucsd-train12 biom262-group 2 Jan 4 11:57 code
drwxr-xr-x 2 ucsd-train12 biom262-group 2 Jan 4 11:57 data
drwxr-xr-x 2 ucsd-train12 biom262-group 2 Jan 4 11:57 notebooks
lrwxrwxrwx 1 ucsd-train12 biom262-group 32 Jan 4 11:57 scratch -> /oasis/tscc/scratch/ucsd-train12
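The same layout can be rehearsed in a temporary sandbox, which is handy if you want to see what mkdir and ln -s do before running them in your home directory. This is a sketch under stated assumptions: the sandbox and fake_scratch paths are hypothetical stand-ins for $HOME and /oasis/tscc/scratch/$USER.

```shell
#!/usr/bin/env bash
# Rehearse the directory layout in a temporary sandbox instead of $HOME.
sandbox=$(mktemp -d)                      # hypothetical stand-in for your home directory
scratch_target="$sandbox/fake_scratch"    # hypothetical stand-in for /oasis/tscc/scratch/$USER
mkdir -p "$scratch_target"

# -p makes mkdir succeed even if a directory already exists
mkdir -p "$sandbox/code" "$sandbox/notebooks" "$sandbox/data"

# -s creates a symbolic link; the target comes first, the link name last
ln -s "$scratch_target" "$sandbox/scratch"

ls -l "$sandbox"
link_target=$(readlink "$sandbox/scratch")
echo "scratch points to: $link_target"
```

In the long listing, the leading "l" and the "->" arrow mark scratch as a symbolic link rather than a real directory.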
Unix commands are written in "BASH" (which stands for "Bourne-again shell"; the "Bourne shell" was an earlier shell, but someone thought they could do better, so they made BASH).
Set a BASH environment variable
export STR="hello world"
Access a variable
echo $STR
The most important environment variable is $PATH. Folders in this path are automatically searched when looking for executable tools via auto-complete or which:
echo $PATH
which programname.sh
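To see the $PATH lookup in action, the sketch below puts a tiny script into a temporary directory, prepends that directory to $PATH, and then finds the script by name. The directory and the script contents are hypothetical; command -v is used in place of which because it is built into the shell.

```shell
#!/usr/bin/env bash
# Show how a directory on $PATH makes a script discoverable by name.
tool_dir=$(mktemp -d)                       # hypothetical directory for the demo
cat > "$tool_dir/programname.sh" <<'EOF'
#!/usr/bin/env bash
echo "found me"
EOF
chmod +x "$tool_dir/programname.sh"         # must be executable to be found

export PATH="$tool_dir:$PATH"               # prepend so this directory is searched first

found_at=$(command -v programname.sh)       # shell built-in with the same purpose as which
echo "$found_at"
tool_output=$(programname.sh)               # runs without typing the full path
echo "$tool_output"
```

The change to $PATH lasts only for the current session unless the export line is also added to ~/.bashrc.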
Customize your BASH profile by editing your ~/.bashrc file with nano:
nano ~/.bashrc
This command is executed each time you log in to TSCC:
source ~/.bashrc
A convenient command to add to your .bashrc is an alias that does a "long listing" of files with a few keystrokes:
alias ll='ls -lha'
To add this to your .bashrc, type nano ~/.bashrc (or use some other editor that you prefer), which will open up a text-only editor, and add the line above. Then save and quit.
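If you would rather not open an editor, the alias line can also be appended from the command line. The sketch below is one possible approach, not the course's method: it writes to a temporary file standing in for ~/.bashrc, and the add_alias helper is a hypothetical name introduced for the example.

```shell
#!/usr/bin/env bash
# Append the alias to an rc file only if it is not already there.
rc_file=$(mktemp)                 # hypothetical stand-in; on TSCC you would use ~/.bashrc
alias_line="alias ll='ls -lha'"

add_alias() {
    # grep -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$alias_line" "$rc_file" || echo "$alias_line" >> "$rc_file"
}

add_alias
add_alias                         # second call is a no-op because the line already exists

line_count=$(grep -cxF "$alias_line" "$rc_file")
echo "alias appears $line_count time(s)"
```

Checking before appending keeps the file from collecting duplicate lines if the command is run more than once.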
( optional ) Additional details on BASH profile customization
When you log in to TSCC, you are connected to a "login node". When executing a task, you should always use an "execution node".
Write a small script to test with. The following command will create test_script.sh if it doesn't exist already. Write echo "hello I am a test" into the file.
$ nano test_script.sh
Now you should be able to look at the file with cat and see the contents you just wrote.
$ cat test_script.sh
echo "hello I am a test"
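The file can also be created without an editor and tried out locally before it goes anywhere near the queue. A minimal sketch, assuming a temporary working directory (work_dir is a hypothetical name) so nothing in your home directory is overwritten:

```shell
#!/usr/bin/env bash
# Create test_script.sh without an editor and run it locally.
work_dir=$(mktemp -d)             # hypothetical scratch directory for the demo
cd "$work_dir" || exit 1

# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc
cat > test_script.sh <<'EOF'
echo "hello I am a test"
EOF

script_contents=$(cat test_script.sh)
script_output=$(bash test_script.sh)
echo "$script_output"
```

Running the script with bash first is a cheap way to catch typos before waiting on the scheduler.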
To submit a script that you wrote, in this case called test_script.sh, to TSCC with a time limit of 10 minutes (time is in hours:minutes:seconds), run:
$ qsub -q hotel -l nodes=1:ppn=1 -l walltime=0:10:00 test_script.sh
3962194.tscc-mgr.local
What happened is this: you typed some keystrokes on your laptop, which were sent to the "head node" on TSCC (the first node you get onto when you log on; it appears as tscc-login1 or tscc-login2). A "node" is equivalent to a single computer.
Then, the head node told the job scheduler to run your script, test_script.sh, for a maximum of 10 minutes, on one node (one computer) and with one processor per node (ppn sets how many cores of that computer to use; the maximum is 16, but be nice and request 8 or fewer). In general, increase ppn, the processors per node, before increasing nodes. Your program probably wants to share memory between its processes, and this is harder to do across computers, so spreading a job over several nodes tends to make your code slower.
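The resource requests passed on the qsub command line can also live inside the script as #PBS comment lines. The sketch below assumes TSCC's scheduler reads these directives the way standard Torque/PBS does; the message variable and the fallback to the current directory are additions for illustration, so the same file also runs unchanged with plain bash.

```shell
#!/usr/bin/env bash
#PBS -q hotel
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:10:00

# The scheduler sets PBS_O_WORKDIR to the directory qsub was called from.
# Outside the scheduler the variable is unset, so fall back to the current directory.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

message="hello I am a test"
echo "$message"
```

With the directives in the file, the submission should shrink to qsub test_script.sh, and the requested resources are recorded next to the code that needs them.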
The nice thing about supercomputers is that you can submit a job, close your laptop, get on an airplane, and it will still be running!
Since we didn't specify a name for the job, two files will be created from the results of our script:
test_script.sh.o3962194 # o = output
test_script.sh.e3962194 # e = error
Where the .o file captures the output sent via stdout and the .e file captures the output sent via stderr. In this case the .e file should be empty:
$ cat test_script.sh.e3962194
(No output)
And the .o file should contain the text we wrote and the node this script was run on:
$ cat test_script.sh.o3962194
hello I am a test
Nodes: tscc-0-25
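The split between the .o and .e files can be imitated locally with ordinary shell redirection. This is only a sketch of the idea, not what the scheduler runs internally: noisy_script.sh is a hypothetical script that writes to both streams, and the file names are built from the job ID string that qsub printed.

```shell
#!/usr/bin/env bash
# Imitate locally how a job's stdout and stderr end up in separate files.
work_dir=$(mktemp -d)                     # hypothetical scratch directory
cd "$work_dir" || exit 1

cat > noisy_script.sh <<'EOF'
echo "hello I am a test"
echo "something went wrong" >&2
EOF

job_id="3962194.tscc-mgr.local"           # ID string as printed by qsub
job_number="${job_id%%.*}"                # drop everything from the first dot onward

# > captures stdout, 2> captures stderr
bash noisy_script.sh > "noisy_script.sh.o${job_number}" 2> "noisy_script.sh.e${job_number}"

stdout_text=$(cat "noisy_script.sh.o${job_number}")
stderr_text=$(cat "noisy_script.sh.e${job_number}")
echo "stdout file: $stdout_text"
echo "stderr file: $stderr_text"
```

A script that only uses echo without >&2, like test_script.sh, leaves the error file empty.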
qsub
To submit interactive jobs (which you will need for running Jupyter notebooks), do:
qsub -I -q hotel -l nodes=1:ppn=1 -l walltime=0:30:00
To submit to the home-scrm queue, set -q home-scrm and add -W group_list=scrm-group to your qsub command:
qsub -I -l walltime=0:30:00 -q home-scrm -W group_list=scrm-group
Check the status of your jobs:
qstat -u $USER
To check the status of your array jobs, you need to specify -t to see the status of the individual array pieces.
qstat -t -u $USER
Example output:
tscc-mgr.local:
                                                                                  Req'd  Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
3924211.tscc-mgr.local  obotvinnik  home-scr STDIN             21407     1      8     -- 168:00:00 R 133:31:05
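When there are many jobs, it can help to reduce this output to the few columns you care about. The sketch below assumes the column layout matches the example output (job ID first, state in the tenth column, elapsed time in the eleventh) and that job names contain no spaces; it runs on the sample line so it works off the cluster too.

```shell
#!/usr/bin/env bash
# Pull the job ID, state, and elapsed time out of qstat-style output.
# The sample line is copied from the example output; on the cluster you would
# pipe the output of qstat -u $USER into the same awk program instead.
sample_line='3924211.tscc-mgr.local obotvinnik home-scr STDIN 21407 1 8 -- 168:00:00 R 133:31:05'

# Field 1 is the job ID, field 10 the state (R = running, Q = queued),
# field 11 the elapsed time. Lines whose first field does not start with a
# digit (headers, separators) are skipped.
summary=$(echo "$sample_line" | awk '$1 ~ /^[0-9]/ { print $1, $10, $11 }')
echo "$summary"
```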
Killing jobs
qdel 2006527
Kill an array job
qdel 2006527[]
Kill all your jobs
qdel $(qselect -u $USER)
Check the status of the queue (so you know which queues to NOT submit to!)
Example output is:
$ qstat -q
server: tscc-mgr.local
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
home-dkeres -- -- -- -- 2 0 -- E R
home-komunjer -- -- -- -- 0 0 -- E R
home-ong -- -- -- -- 2 0 -- E R
home-tg -- -- -- -- 0 0 -- E R
home-yeo -- -- -- -- 3 1 -- E R
home-visres -- -- -- -- 0 0 -- E R
home-mccammon -- -- -- -- 15 29 -- E R
home-scrm -- -- -- -- 1 0 -- E R
hotel -- -- 168:00:0 -- 232 26 -- E R
home-k4zhang -- -- -- -- 0 0 -- E R
home-kkey -- -- -- -- 0 0 -- E R
home-kyang -- -- -- -- 2 1 -- E R
home-jsebat -- -- -- -- 1 0 -- E R
pdafm -- -- 72:00:00 -- 1 0 -- E R
condo -- -- 08:00:00 -- 18 6 -- E R
gpu-hotel -- -- 336:00:0 -- 0 0 -- E R
glean -- -- -- -- 24 75 -- E R
gpu-condo -- -- 08:00:00 -- 16 36 -- E R
home-fpaesani -- -- -- -- 4 2 -- E R
home-builder -- -- -- -- 0 0 -- E R
home -- -- -- -- 0 0 -- E R
home-mgilson -- -- -- -- 0 4 -- E R
home-eallen -- -- -- -- 0 0 -- E R
----- -----
321 180
To show available processors
$ showbf
backfill window (user: 'ucsd-train12' group: 'biom262-group' partition: ALL) Mon Jan 4 11:51:55
1258 procs available for 6:41:01
1246 procs available for 6:46:49
1234 procs available for 6:47:55
1222 procs available for 6:58:56
1210 procs available for 7:03:56
1198 procs available for 7:07:21
1197 procs available for 7:53:57
1196 procs available for 2:22:28:47
1189 procs available for 3:21:36:13
1181 procs available for 3:21:36:37
1171 procs available for 7:18:01:48
1169 procs available for 7:18:02:38
1168 procs available for 11:13:35:19
1152 procs available for 11:13:38:46
1151 procs available for 11:13:39:00
1150 procs available for 11:13:39:06
1149 procs available for 11:13:39:08
1148 procs available for 11:13:39:20
1146 procs available for 12:16:17:21
1145 procs available for 12:16:24:43
1144 procs available for 12:19:30:54
1128 procs available for 12:19:32:45
1112 procs available for 12:19:47:11
1097 procs available for 13:02:59:20
1095 procs available for 18:00:46:12
1085 procs available for 18:00:52:42
1073 procs available for 18:01:18:11
1061 procs available for 19:08:20:06
1059 procs available for 32:23:59:07
1055 procs available for 39:06:02:29
1051 procs available for 39:08:29:56
1047 procs available with no timelimit
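The durations in this listing are easier to compare against a walltime request once they are converted to seconds. The sketch below assumes the durations follow a [[days:]hours:]minutes:seconds pattern, which is how the sample output reads; to_seconds is a hypothetical helper, not part of showbf.

```shell
#!/usr/bin/env bash
# Convert a [[DD:]HH:]MM:SS duration into seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        mult[0] = 1; mult[1] = 60; mult[2] = 3600; mult[3] = 86400
        total = 0
        # walk the fields right to left: seconds, minutes, hours, days
        for (i = NF; i >= 1; i--) total += $i * mult[NF - i]
        print total
    }'
}

short_window=$(to_seconds "6:41:01")       # first line of the sample output
long_window=$(to_seconds "2:22:28:47")     # first line that includes a day field
walltime=$(to_seconds "0:10:00")           # the walltime requested for test_script.sh

echo "$short_window"
echo "$long_window"
echo "$walltime"
```

A job whose walltime in seconds is below a window's length fits into that backfill window, provided it also asks for no more processors than the window offers.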
Show specs of all nodes (show first 20 lines for brevity)
$ pbsnodes -a | head -n 20
tscc-0-0
state = job-exclusive
np = 16
properties = rack0,ib,ibswitch1,mem64,hotel-node,ibgroup0,sandy
ntype = cluster
jobs = 0/3939246.tscc-mgr.local,1/3939246.tscc-mgr.local,2/3939246.tscc-mgr.local,3/3939246.tscc-mgr.local,4/3939246.tscc-mgr.local,5/3939246.tscc-mgr.local,6/3939246.tscc-mgr.local,7/3939246.tscc-mgr.local,8/3939246.tscc-mgr.local,9/3939246.tscc-mgr.local,10/3939246.tscc-mgr.local,11/3939246.tscc-mgr.local,12/3939246.tscc-mgr.local,13/3939246.tscc-mgr.local,14/3939246.tscc-mgr.local,15/3939246.tscc-mgr.local
status = rectime=1451937165,varattr=,jobs=3939246.tscc-mgr.local,state=free,netload=326963399719,gres=,loadave=1.00,ncpus=16,physmem=66068376kb,availmem=62016840kb,totmem=68116372kb,idletime=834178,nusers=1,nsessions=1,sessions=25478,uname=Linux tscc-0-0.sdsc.edu 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
tscc-0-1
state = free
np = 16
properties = rack0,ib,ibswitch1,mem64,hotel-node,ibgroup0,sandy
ntype = cluster
jobs = 0/3940449[3162].tscc-mgr.local,1/3940449[3162].tscc-mgr.local,2/3940449[3162].tscc-mgr.local,3/3940449[3162].tscc-mgr.local,4/3940449[3162].tscc-mgr.local,5/3940449[3162].tscc-mgr.local,6/3940449[3162].tscc-mgr.local,7/3940449[3162].tscc-mgr.local
status = rectime=1451937126,varattr=,jobs=3940449[3162].tscc-mgr.local,state=free,netload=25069731960580,gres=,loadave=15.57,ncpus=16,physmem=66068376kb,availmem=52672300kb,totmem=68116372kb,idletime=10776,nusers=1,nsessions=1,sessions=60315,uname=Linux tscc-0-1.sdsc.edu 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003