Jupyter notebooks are a way that you can have code, text, images, and math all live together in harmony. This concept is called "Literate Programming", where all code that is written also has comments or an explanation of why it was written (which is not always obvious). In particular, the Jupyter notebook allows you to write some code, run (evaluate) it, and see the output all at once. This is called a "read-eval-print loop" or REPL.
By the end of this notebook, you will have a biom262-specific environment that has only your specific packages.
First, log in to TSCC:
ssh ucsd-train##@tscc.sdsc.edu
There's an issue with Mac OS X El Capitan where you keep getting asked for the password. To fix this, you'll need to do ssh-add to add your private key again:
ssh-add ~/.ssh/biom262_rsa.pri
Remember that this means the file ~/.ssh/biom262_rsa.pri has to already exist. If not, you will need to follow these instructions again to set it up.
I suggest adding this command to your ~/.bash_profile:
nano ~/.bash_profile
And add the line to the end of the file:
ssh-add ~/.ssh/biom262_rsa.pri
This will get run EVERY time you open a new terminal window or tab, and you shouldn't have to do the stupid ssh-add thing ever again!
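If you would rather not open an editor, the same line can be appended from the command line. The sketch below is one possible way to do it, not part of the original instructions: the helper name add_line_once is made up for illustration, and it only adds the line when that exact line is not already in the file, so running it twice does no harm. It is demonstrated on a scratch file rather than your real ~/.bash_profile.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
add_line_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as plain text
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file instead of the real ~/.bash_profile
demo_profile="$(mktemp)"
add_line_once 'ssh-add ~/.ssh/biom262_rsa.pri' "$demo_profile"
add_line_once 'ssh-add ~/.ssh/biom262_rsa.pri' "$demo_profile"

# The line appears only once, even though the helper ran twice
cat "$demo_profile"
```

To use it for real, pass ~/.bash_profile as the second argument instead of the scratch file.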
Download the Anaconda Python/R package manager using wget (web-get). The link below is from the Anaconda downloads page. This takes some time.
wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda3-2.4.1-Linux-x86_64.sh
To install Anaconda, run the shell script with bash (this will take some time). It will ask you a bunch of questions; use the defaults for them (press enter for all), except for the PATH question described below.
bash Anaconda3-2.4.1-Linux-x86_64.sh
If you get this warning:
WARNING:
You currently have a PYTHONPATH environment variable set. This may cause
unexpected behavior when running the Python interpreter in Anaconda3.
For best results, please verify that your PYTHONPATH only points to
directories of packages that are compatible with the Python interpreter
in Anaconda3: /home/ucsd-train46/anaconda3
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/ucsd-train46/.bashrc ? [yes|no]
[no] >>>
Type "yes" to confirm that you do want the Anaconda3 install location added to your PATH.
Anaconda may say that you need to open another terminal window to activate it - don't listen to it. Follow the instructions below to activate.
IMPORTANT: Make sure this command truly added lines to your ~/.bashrc file.
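One way to check is to search the file for the install folder name. This is a minimal sketch, assuming the line the installer adds contains the text anaconda3 (as the install location shown in the warning suggests); the helper name mentions_anaconda is made up for illustration.

```shell
# Hypothetical helper: succeed if the given file has a line containing "anaconda3".
mentions_anaconda() {
    grep -q 'anaconda3' "$1" 2>/dev/null
}

if mentions_anaconda "$HOME/.bashrc"; then
    echo "~/.bashrc mentions anaconda3"
else
    echo "~/.bashrc does not mention anaconda3"
fi
```

If the second message appears, the installer did not modify ~/.bashrc and the steps that follow will not find conda.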
This has added the folder ~/anaconda3 to your system and added stuff to your $PATH variable in ~/.bashrc, but your current $PATH variable has not been updated, and therefore the terminal has no idea where this newfangled thing is. If you try to do any conda command, you'll get an error:
[ucsd-train12@tscc-login2 ~]$ conda --help
-bash: conda: command not found
To activate conda, use source on your .bashrc:
source ~/.bashrc
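To see why updating $PATH matters, here is a small self-contained sketch of how the shell finds commands. It does not touch Anaconda at all: it creates a scratch folder with a fake program (the name mytool is made up), shows that the shell cannot find it, and then prepends the folder to $PATH the same way the lines in ~/.bashrc do for the Anaconda folder.

```shell
# Make a scratch folder containing a fake program (the name "mytool" is made up)
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "hello from mytool"\n' > "$demo_dir/mytool"
chmod +x "$demo_dir/mytool"

# Not found yet: the folder is not on $PATH
command -v mytool || echo "mytool: command not found"

# Prepend the folder to $PATH for the current shell
PATH="$demo_dir:$PATH"
export PATH

# Now the shell can find it and run it
command -v mytool
mytool
```

The change to $PATH here only lasts for the current terminal session, which is why the installer writes its version into ~/.bashrc.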
Check your python version:
$ python -V
Python 3.5.1 :: Anaconda 2.4.1 (64-bit)
If you do python -v (little "v") by accident, exit Python with:
>>> quit()
$
And make sure your Python is pointing to the Anaconda Python:
$ which python
/home/ucsd-train##/anaconda3/bin/python
For biom262, we'll want to install these additional packages to Anaconda:
r (the R language)
r-irkernel (the R kernel, so Jupyter notebooks can run R)
seaborn (nicer plots in Python)
Set up your biom262-specific conda environment by installing them. By specifying these packages, we're also specifying all their dependencies (the other packages that each of these requires).
conda install --channel r r r-irkernel seaborn
Here's that big command broken down:
conda - the base command (like how git was the base command you used for git stuff). Every conda subcommand is actually conda-subcommand, e.g. conda-create, under the hood, but we use it with just the spaces for convenience.
install - The conda subcommand to install packages into the current environment
--channel r - A "channel" is a URL to a folder that contains packages that you can install. Anaconda doesn't come with the R channel by default so we have to specify it here.
r r-irkernel seaborn - The packages to install.
The output is quite big; it will look something like this:
$ conda install --channel https://conda.anaconda.org/r r r-irkernel seaborn
Warning: could not import binstar_client (invalid token (pkg_resources.py, line 44))
Fetching package metadata: ......
Solving package specifications: ...................................................................
Package plan for installation in environment /home/ucsd-train01/anaconda3:
The following packages will be downloaded:
package | build
---------------------------|-----------------
glib-2.43.0 | 2 7.4 MB r
decorator-4.0.6 | py35_0 6 KB defaults
numpy-1.10.2 | py35_0 5.8 MB defaults
pyzmq-15.1.0 | py35_0 782 KB defaults
requests-2.9.0 | py35_0 647 KB defaults
setuptools-19.1.1 | py35_0 348 KB defaults
cairo-1.12.18 | 6 594 KB defaults
conda-3.19.0 | py35_0 180 KB defaults
scipy-0.16.1 | np110py35_0 23.3 MB defaults
harfbuzz-0.9.35 | 6 1.1 MB r
pango-1.36.8 | 3 796 KB r
nbconvert-4.1.0 | py35_0 275 KB defaults
r-base-3.2.2 | 0 20.6 MB r
seaborn-0.6.0 | np110py35_0 257 KB defaults
r-3.2.2 | 0 2 KB r
r-base64enc-0.1_3 | r3.2.2_0 25 KB r
r-boot-1.3_17 | r3.2.2_0 575 KB r
r-cluster-2.0.3 | r3.2.2_0 466 KB r
r-codetools-0.2_14 | r3.2.2_0 45 KB r
r-digest-0.6.8 | r3.2.2_2 93 KB r
r-foreign-0.8_66 | r3.2.2_0 220 KB r
r-jsonlite-0.9.17 | r3.2.2_0 927 KB r
r-kernsmooth-2.23_15 | r3.2.2_0 84 KB r
r-lattice-0.20_33 | r3.2.2_0 698 KB r
r-magrittr-1.5 | r3.2.2_1 154 KB r
r-mass-7.3_45 | r3.2.2_0 1.0 MB r
r-nnet-7.3_11 | r3.2.2_0 99 KB r
r-repr-0.3 | r3.2.2_0 44 KB r
r-rpart-4.1_10 | r3.2.2_0 861 KB r
r-rzmq-0.7.7 | r3.2.2_3 60 KB r
r-spatial-7.3_11 | r3.2.2_0 122 KB r
r-stringi-1.0_1 | r3.2.2_0 10.7 MB r
r-survival-2.38_3 | r3.2.2_0 4.4 MB r
r-uuid-0.1_2 | r3.2.2_0 18 KB r
r-class-7.3_14 | r3.2.2_0 82 KB r
r-irdisplay-0.3 | r3.2.2_0 23 KB r
r-matrix-1.2_2 | r3.2.2_0 3.1 MB r
r-nlme-3.1_122 | r3.2.2_0 2.0 MB r
r-stringr-1.0.0 | r3.2.2_0 78 KB r
r-evaluate-0.8 | r3.2.2_0 39 KB r
r-mgcv-1.8_9 | r3.2.2_0 1.8 MB r
r-irkernel-0.5 | r3.2.2_1 71 KB r
r-recommended-3.2.2 | r3.2.2_0 707 B r
------------------------------------------------------------
Total: 89.7 MB
The following NEW packages will be INSTALLED:
cairo: 1.12.18-6 defaults
glib: 2.43.0-2 r
harfbuzz: 0.9.35-6 r
libgcc: 4.8.5-1 r
ncurses: 5.9-4 r
pango: 1.36.8-3 r
pcre: 8.31-0 defaults
pixman: 0.32.6-0 defaults
r: 3.2.2-0 r
r-base: 3.2.2-0 r
r-base64enc: 0.1_3-r3.2.2_0 r
r-boot: 1.3_17-r3.2.2_0 r
r-class: 7.3_14-r3.2.2_0 r
r-cluster: 2.0.3-r3.2.2_0 r
r-codetools: 0.2_14-r3.2.2_0 r
r-digest: 0.6.8-r3.2.2_2 r
r-evaluate: 0.8-r3.2.2_0 r
r-foreign: 0.8_66-r3.2.2_0 r
r-irdisplay: 0.3-r3.2.2_0 r
r-irkernel: 0.5-r3.2.2_1 r
r-jsonlite: 0.9.17-r3.2.2_0 r
r-kernsmooth: 2.23_15-r3.2.2_0 r
r-lattice: 0.20_33-r3.2.2_0 r
r-magrittr: 1.5-r3.2.2_1 r
r-mass: 7.3_45-r3.2.2_0 r
r-matrix: 1.2_2-r3.2.2_0 r
r-mgcv: 1.8_9-r3.2.2_0 r
r-nlme: 3.1_122-r3.2.2_0 r
r-nnet: 7.3_11-r3.2.2_0 r
r-recommended: 3.2.2-r3.2.2_0 r
r-repr: 0.3-r3.2.2_0 r
r-rpart: 4.1_10-r3.2.2_0 r
r-rzmq: 0.7.7-r3.2.2_3 r
r-spatial: 7.3_11-r3.2.2_0 r
r-stringi: 1.0_1-r3.2.2_0 r
r-stringr: 1.0.0-r3.2.2_0 r
r-survival: 2.38_3-r3.2.2_0 r
r-uuid: 0.1_2-r3.2.2_0 r
seaborn: 0.6.0-np110py35_0 defaults
The following packages will be UPDATED:
conda: 3.18.8-py35_0 defaults --> 3.19.0-py35_0 defaults
decorator: 4.0.4-py35_0 defaults --> 4.0.6-py35_0 defaults
nbconvert: 4.0.0-py35_0 defaults --> 4.1.0-py35_0 defaults
numpy: 1.10.1-py35_0 defaults --> 1.10.2-py35_0 defaults
pyzmq: 14.7.0-py35_1 defaults --> 15.1.0-py35_0 defaults
requests: 2.8.1-py35_0 defaults --> 2.9.0-py35_0 defaults
scipy: 0.16.0-np110py35_1 defaults --> 0.16.1-np110py35_0 defaults
setuptools: 18.5-py35_0 defaults --> 19.1.1-py35_0 defaults
Proceed ([y]/n)?
Press "y" to proceed
Fetching packages ...
r-base64enc-0. 100% |######################################################| Time: 0:00:00 347.71 kB/s
r-digest-0.6.8 100% |######################################################| Time: 0:00:01 81.73 kB/s
r-jsonlite-0.9 100% |######################################################| Time: 0:00:18 52.38 kB/s
r-magrittr-1.5 100% |######################################################| Time: 0:00:00 436.33 kB/s
r-repr-0.3-r3. 100% |######################################################| Time: 0:00:00 75.39 kB/s
r-rzmq-0.7.7-r 100% |######################################################| Time: 0:00:00 294.29 kB/s
r-stringi-1.0_ 100% |######################################################| Time: 0:00:03 3.71 MB/s
r-uuid-0.1_2-r 100% |######################################################| Time: 0:00:00 247.64 kB/s
r-irdisplay-0. 100% |######################################################| Time: 0:00:00 329.61 kB/s
r-stringr-1.0. 100% |######################################################| Time: 0:00:00 366.92 kB/s
r-evaluate-0.8 100% |######################################################| Time: 0:00:00 381.25 kB/s
r-irkernel-0.5 100% |######################################################| Time: 0:00:00 342.46 kB/s
Extracting packages ...
[ COMPLETE ]|#########################################################################| 100%
Linking packages ...
[ COMPLETE ]|#########################################################################| 100%
Start the jupyter notebook server, where "####" is some number larger than 1024 (this is for a unique "port" number - yes like a port for boats and ships - that your notebook will run on). The & ("ampersand") at the end is important, because it tells the Jupyter process to run in the background, so we can run other commands on top.
$ jupyter notebook --no-browser --port #### &
[1] 12583
(biom262)[ucsd-train12@tscc-login2 ~]$ [I 13:23:05.786 NotebookApp] Writing notebook server cookie secret to /home/ucsd-train12/.local/share/jupyter/runtime/notebook_cookie_secret
[I 13:23:06.291 NotebookApp] Serving notebooks from local directory: /home/ucsd-train12
[I 13:23:06.291 NotebookApp] 0 active kernels
[I 13:23:06.291 NotebookApp] The IPython Notebook is running at: http://localhost:7788/
[I 13:23:06.291 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
The [1] 12583 shows you the job number ([1]) and the process ID (12583) of the jupyter process. You can convince yourself that it's really there by using ps, which shows you the processes running in your current terminal session:
$ ps
PID TTY TIME CMD
6593 pts/44 00:00:00 bash
12583 pts/44 00:00:02 jupyter-noteboo
12592 pts/44 00:00:00 links
14208 pts/44 00:00:00 ps
Sometimes ps shows more than one jupyter process. This can happen if you ran jupyter notebook and killed it but wanted to run it again. Even though the jupyter notebook you stopped looked dead, there's still a leftover process running (loosely called a "zombie"). To kill it, first look at ps:
$ ps
PID TTY TIME CMD
6593 pts/44 00:00:00 bash
12583 pts/44 00:00:02 jupyter-noteboo
12592 pts/44 00:00:00 links
14198 pts/44 00:00:01 jupyter-noteboo
14208 pts/44 00:00:00 ps
See which processes are associated with jupyter and then use kill -9 to stop them ("kill" sends a signal to the process, and -9 means SIGKILL, the meanest form, like premeditated murder of the program: it cannot be caught or ignored).
$ kill -9 12583
$ kill -9 14198
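The whole background-process life cycle can be tried safely with a stand-in program. The sketch below is an illustration rather than part of the course setup: it uses sleep in place of jupyter, records the process ID with $!, and tries a plain kill (which sends the gentler SIGTERM) before falling back to kill -9.

```shell
# Start a harmless stand-in for a long-running server in the background
sleep 300 &
pid=$!                      # $! holds the process ID of the most recent background job
echo "started background process $pid"

# kill -0 sends no signal; it only checks that the process exists
kill -0 "$pid" && echo "process $pid is running"

# Ask politely first (plain kill sends SIGTERM), then escalate to -9 (SIGKILL) only if needed
kill "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"
fi

# Collect the exit status so the shell forgets the job
wait "$pid" 2>/dev/null || true
echo "process $pid is gone"
```

Trying plain kill first gives a program the chance to shut down cleanly; -9 is the fallback for a process that does not respond.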
If you're getting this screen:
Then you forgot the --no-browser flag. Try again:
jupyter notebook --no-browser --port ####
If you get this error:
[ucsd-train21@tscc-login1 ~]$ jupyter notebook —no-browser —port 1224
[C 18:12:26.584 NotebookApp] No such file or directory: /home/ucsd-train21/—no-browser
That means the flags were not typed with two plain hyphens each: --no-browser and --port. Try again with:
jupyter notebook --no-browser --port ####
NameError: name 'pkg_resources' is not defined
You may get this error:
$ jupyter notebook --no-browser --port 7788 &
[1] 47665
[ucsd-train01@tscc-login1 ~]$ Traceback (most recent call last):
File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/path.py", line 122, in <module>
import pkg_resources
File "/opt/biotools/bx-python/lib/python2.7/site-packages/distribute-0.6.10-py2.7.egg/pkg_resources.py", line 44
def _bypass_ensure_directory(name, mode=0777):
^
SyntaxError: invalid token
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ucsd-train01/anaconda3/bin/jupyter-notebook", line 4, in <module>
from notebook.notebookapp import main
File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/notebook/notebookapp.py", line 83, in <module>
from IPython.paths import get_ipython_dir
File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/IPython/__init__.py", line 48, in <module>
from .terminal.embed import embed
File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/IPython/terminal/embed.py", line 16, in <module>
from IPython.core.interactiveshell import DummyMod
File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 31, in <module>
from pickleshare import PickleShareDB
File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/pickleshare.py", line 41, in <module>
from path import path as Path
File "/home/ucsd-train01/anaconda3/lib/python3.5/site-packages/path.py", line 126, in <module>
except pkg_resources.DistributionNotFound:
NameError: name 'pkg_resources' is not defined
^C
[1]+ Exit 1 jupyter notebook --no-browser --port 7788
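The issue is that there are multiple Pythons around: your PYTHONPATH variable points to Python 2.7 libraries (the /opt/biotools/bx-python path in the traceback), and Anaconda's Python 3.5 fails when it tries to import them. To get rid of this, do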
export PYTHONPATH=
which will empty your PYTHONPATH for the current terminal session, so that Python no longer searches those extra library folders.
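To see what the variable held before you clear it, you can print it on either side of the command. This is a small sketch, not a required step; the text "<empty>" is just a placeholder printed when the variable has no value.

```shell
# Show what PYTHONPATH currently holds ("<empty>" if it is unset or empty)
echo "before: ${PYTHONPATH:-<empty>}"

# Same command as above: set the variable to an empty string
export PYTHONPATH=

# Confirm that it is now empty
echo "after:  ${PYTHONPATH:-<empty>}"
```

Because export only changes the current shell, a new terminal window will start with the old value again.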
*Note: These commands must be done on your home laptop in another terminal window, NOT on TSCC*
Now, back on your home laptop, open another tab in your terminal window. To send this notebook back to your laptop from TSCC (aka "tunneling"), use this command (replace #### and username with your own port and username). You will also need to replace tscc-login# with either tscc-login1 or tscc-login2, whichever you got randomly assigned to when you logged in.
ssh -NL ####:localhost:#### username@tscc-login#.sdsc.edu
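Since the same port number has to appear twice and the login node has to match, it can help to build the command from its three pieces. The sketch below only prints the command; it does not connect anywhere. The helper name tunnel_command is made up for illustration, and the example values (port 7788, ucsd-train12, tscc-login2) are taken from the sample output earlier in this notebook. In the ssh command itself, -N means "don't run a remote command" and -L sets up forwarding from a local port.

```shell
# Hypothetical helper: print the tunneling command for a given port, username, and login node.
tunnel_command() {
    port="$1"; user="$2"; node="$3"
    # Reject anything that is empty, not all digits, or longer than 5 digits
    case "$port" in
        ''|*[!0-9]*|??????*) echo "port must be a number of at most 5 digits" >&2; return 1 ;;
    esac
    # Ports up to 1024 are reserved; 65535 is the largest port number
    if [ "$port" -le 1024 ] || [ "$port" -gt 65535 ]; then
        echo "port must be larger than 1024 and at most 65535" >&2
        return 1
    fi
    echo "ssh -NL ${port}:localhost:${port} ${user}@${node}.sdsc.edu"
}

# Prints: ssh -NL 7788:localhost:7788 ucsd-train12@tscc-login2.sdsc.edu
tunnel_command 7788 ucsd-train12 tscc-login2
```

Copy the printed line into the terminal on your laptop to open the tunnel.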
The first time you connect, you'll get output like this. Say "yes" that you want to continue connecting (type the full word; a plain "y" is not accepted). SSH asks this whenever it connects to a server it has not seen before.
MacBook:~ benlewis$ ssh -NL 1225:localhost:1225 ucsd-train21@tscc-login1.sdsc.edu
The authenticity of host 'tscc-login1.sdsc.edu (132.249.107.90)' can't be established.
RSA key fingerprint is ee:40:c3:c6:19:03:9d:29:23:e6:ee:82:80:02:87:9b.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'tscc-login1.sdsc.edu' (RSA) to the list of known hosts.
"Write failed: Broken pipe"
If you see a "Write failed: Broken pipe" output in your laptop terminal, that means you've lost connection to the server. It's nothing to worry about. Solution: Re-run your ssh command (either logging in to TSCC or doing the tunneling) and you'll be all set.
"bind: Address already in use"
If you try to do the tunneling and you see this kind of output:
MacBook:~ benlewis$ ssh -NL 1224:localhost:1224 ucsd-train21@tscc-login2.sdsc.edu
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 1224
Could not request local forwarding
That happens when you've run the ssh -NL ####:localhost:#### ucsd-train##@tscc-login#.sdsc.edu command more than once, and an earlier tunnel is still holding that port number on your laptop.
Solution: Close the tab and open a new one - start fresh! Try doing the ssh -NL ... command again.
To get Jupyter notebook on your computer, you'll need to set up an "SSH Tunnel" that "listens" to that particular port, and thus gets the Jupyter notebook from TSCC.
So that you only need to have one PuTTY session open, we'll make a new TSCC Session. Create one with ucsd-train##@tscc-login#.sdsc.edu, and call the session "TSCC Jupyter".
Go to "Connection > SSH > Auth." Click the checkbox next to "Allow agent forwarding" and add the "Putty Private Key" that you created with the biom262_rsa file.
Go to "Connection > SSH > Tunnels." Then enter:
#### for your source port
localhost:#### for your Destination
You should now see this:
So you don't have to do this every time... Save your settings! Go all the way back to the "Session" window and click "Save"
jupyter notebook command
You'll see this kind of output:
And now you can move on to the next step to view the notebooks! Go to http://localhost:#### in your browser (Chrome, Firefox, IE).
Connect to the jupyter notebook server at http://localhost:####/.
You should see a page that looks like this:
Start a new notebook using the dropdown menu in the top right of the screen: