This is a Jupyter notebook. Here is a Jupyter tutorial.
This cell is a Markdown cell: it contains text rather than executable code.
You can also style text in a Markdown cell using HTML.
Markdown cells in Jupyter support MathJax, so you can write lovely typeset mathematics, e.g., $ e^{i\pi} = -1$.
$$ e^x \equiv \sum_{j=0}^\infty \frac{x^j}{j!}. $$
Here is a demo of Markdown and MathJax in Jupyter.
This particular Jupyter notebook is an R notebook (Jupyter also supports Python and Julia, among other programming languages). This notebook is communicating with an R kernel and can execute R commands. R is a high-level statistics package with a versatile programming language.
Here is an introduction to R.
For a more thorough introduction to R, see http://statistics.berkeley.edu/computing/r-bootcamp and https://github.com/berkeley-scf/r-bootcamp-2014/blob/master/schedule/schedule.pdf
Most of Jupyter's functionality is clear from its drop-down menus: commands to insert or delete cells, to execute cells, to clear output, etc.
Among the most useful features of Jupyter are its help functions and tab completion.
For instance, pressing the Tab key while you are typing the name of a function will give you a list of functions that start with the letters you have typed so far.
Please click "help" and take the User Interface Tour.
The rest of this notebook is a brief introduction to R (within Jupyter). We will see more of R in later sections of the course, as we encounter particular topics: linear algebra, least squares, optimization, random number generation, the bootstrap, etc.
# This is a code cell (but this line is a comment, because it starts with '#')
# This is an R notebook, so you can type R commands into this cell, for example:
print('Hello world!')
[1] "Hello world!"
# more R
# arithmetic
5+2
5^2
5/2
5/2; # a trailing semicolon is allowed, but (unlike in MATLAB) it does not suppress printed output in R
sqrt(-1) # no answer among the real numbers
sqrt(-1 + 0i) # R understands complex numbers, though
Warning message: In sqrt(-1): NaNs produced
[1] 0+1i
# variables
x <- 5; # assignment uses a left arrow, <-
x
6 -> y; # can assign in the other direction, too
y
x^2
sqrt(x)
# R has some pre-defined values
pi
# R represents missing values by "NA"; "not a number" (e.g., 0/0) is a different value, "NaN", so be careful
x <- NA;
x + 1 # arithmetic with "NA" gives "NA"
is.na(x) # check whether x is missing (is.na is also TRUE for NaN)
is.na(pi)
[1] NA
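R distinguishes two special values here: `NaN` ("not a number", the result of undefined arithmetic such as 0/0) and `NA` (a missing value). The following is a minimal sketch of how the base functions `is.na` and `is.nan` treat each; the variable names are only illustrative.

```r
# NaN arises from undefined arithmetic; NA marks a missing value
undefined   <- 0/0      # NaN
missing_val <- NA       # NA

is.nan(undefined)       # TRUE
is.nan(missing_val)     # FALSE: NA is missing, not "not a number"
is.na(undefined)        # TRUE: is.na() is also TRUE for NaN
is.na(missing_val)      # TRUE
```

In practice, `is.na` is the test to use when either kind of value should be excluded.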
# ranges
1:5 # up
5:1 # down
seq(from=10, to=5, by=-1) # more general
seq(from=10, to=8, by = -0.1) # not just integers
seq(length=5, from=-1, by=0.5) # a different way to call "seq"
# careful about order of operations! Sequences are expanded _before_ arithmetic operations
1:6-1 # range is expanded into [1, 2, 3, 4, 5, 6], then scalar 1 is subtracted from each element
1:(6-1)
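The same precedence rule interacts with unary minus and exponentiation, both of which bind more tightly than `:`. A short sketch using only base R (`n` is an illustrative name); it also shows `seq_len`, which avoids a surprise when the upper limit is zero.

```r
n <- 3
-1:n        # unary minus binds tighter than ':', so this is (-1):3 = -1 0 1 2 3
-(1:n)      # -1 -2 -3
1:n^2       # '^' binds tighter than ':', so this is 1:9
(1:n)^2     # 1 4 9

n <- 0
1:n         # counts down: 1 0, probably not what a loop wants
seq_len(n)  # integer(0): an empty sequence
```

When in doubt, parentheses make the intended order explicit.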
# concatenation with c() to make vectors (called "lists" informally in this notebook; R also has a separate "list" type)
c(1,2,3,4)
c(1,2,3,4,x)
# list of repeated values
rep(y,3)
rep(3:1,3) # defaults to putting 3 copies of the first argument end-to-end
rep(3:1, times=3)
rep(3:1, each=3) # copy each element 3 times
# how long is a list?
length(y)
# indexing a list
y[1] # R uses 1-based indexing
y[5]
y[6] # nothing there!
[1] NA
[1] NA
# slightly more advanced indexing
#
y <- 1:5;
y[-2] # y without its second element
y[2:4] # 2nd through 4th elements of y
y[y>2] # logical indexing
y[ y > 1 & y < 5] # more logical indexing
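Logical indexing can also appear on the left-hand side of an assignment, and the base function `which` reports the positions at which a condition holds. A small sketch with an illustrative vector `v`:

```r
v <- c(4, 8, 15, 16, 23, 42)
which(v > 10)      # positions where the condition holds: 3 4 5 6
v[v > 10] <- 0     # replace every element greater than 10
v                  # 4 8 0 0 0 0
```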
# building arrays
x <- 1:5;
y <- 6:10;
# column binding
cbind(x,y) # glue these together as columns
cbind(x,y,x^2)
x | y |
---|---|
1 | 6 |
2 | 7 |
3 | 8 |
4 | 9 |
5 | 10 |
x | y | |
---|---|---|
1 | 6 | 1 |
2 | 7 | 4 |
3 | 8 | 9 |
4 | 9 | 16 |
5 | 10 | 25 |
# Data frames
# Data frames are like arrays, but with names for the columns; the columns may also have different types
x <- data.frame(var1=1:5, time=3:7, val=runif(5)); # "runif" generates uniform random numbers
x
x$val
| | var1 | time | val |
---|---|---|---|
1 | 1 | 3 | 0.9480642 |
2 | 2 | 4 | 0.7340699 |
3 | 3 | 5 | 0.9362691 |
4 | 4 | 6 | 0.7923703 |
5 | 5 | 7 | 0.1712906 |
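Rows of a data frame can be selected with the same logical indexing used for vectors, and each column keeps its own type. A minimal sketch; the data frame `df` and its column names are made up for illustration.

```r
df <- data.frame(id = 1:3,
                 label = c("a", "b", "c"),
                 score = c(0.5, 1.5, 2.5),
                 stringsAsFactors = FALSE)  # keep label as character in every R version
sapply(df, class)         # one type per column: integer, character, numeric
df[df$score > 1, ]        # rows where score exceeds 1
df[["score"]]             # same as df$score
nrow(df); ncol(df)        # dimensions: 3 rows, 3 columns
```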
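# row binding
# here x is the data frame defined above (3 columns) and y is 6:10 (5 elements);
# in the output below, only the first 3 elements of y and log(y) appear, one per column
rbind(x,y,log(y))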
| | var1 | time | val |
---|---|---|---|
1 | 1 | 3 | 0.9480642 |
2 | 2 | 4 | 0.7340699 |
3 | 3 | 5 | 0.9362691 |
4 | 4 | 6 | 0.7923703 |
5 | 5 | 7 | 0.1712906 |
6 | 6 | 7 | 8 |
7 | 1.791759 | 1.94591 | 2.079442 |
# changing the dimension of an array
x <- 1:20;
dim(x) <- c(4,5);
x # note that the array is filled column by column
dim(x) <- c(5,4);
x
# you can turn x back into a vector
x <- as.vector(x);
x
# an alternative
x <- array(1:20, dim=c(4,5));
x
1 | 5 | 9 | 13 | 17 |
2 | 6 | 10 | 14 | 18 |
3 | 7 | 11 | 15 | 19 |
4 | 8 | 12 | 16 | 20 |
1 | 6 | 11 | 16 |
2 | 7 | 12 | 17 |
3 | 8 | 13 | 18 |
4 | 9 | 14 | 19 |
5 | 10 | 15 | 20 |
1 | 5 | 9 | 13 | 17 |
2 | 6 | 10 | 14 | 18 |
3 | 7 | 11 | 15 | 19 |
4 | 8 | 12 | 16 | 20 |
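Both `dim<-` and `array` fill column by column. When row-by-row filling is wanted, the base function `matrix` takes a `byrow` argument; transposing with `t` is an alternative. A brief sketch (the name `m` is illustrative):

```r
m <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)
m[1, ]      # first row: 1 2 3 4 5
m[, 1]      # first column: 1 6 11 16
# equivalent: fill a 5-by-4 array columnwise, then transpose
identical(m, t(array(1:20, dim = c(5, 4))))   # TRUE
```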
# "recycling" in array construction
x <- 1:5;
array(x, dim=c(5,4))
array(x, dim=c(4,5))
1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 |
4 | 4 | 4 | 4 |
5 | 5 | 5 | 5 |
1 | 5 | 4 | 3 | 2 |
2 | 1 | 5 | 4 | 3 |
3 | 2 | 1 | 5 | 4 |
4 | 3 | 2 | 1 | 5 |
# operations on lists
y <- 1:5;
y^2
sqrt(y)
log(y)
# DANGER: R "recycles" values in vector arithmetic to make lengths match. This can have unexpected consequences
x <- 1:5;
y <- 10:8;
x + 2*y + 3 # this "should" fail because x and y have different lengths. But instead,
# R computes 2*y = [20, 18, 16] and recycles it 1 2/3 times to make
# [20, 18, 16, 20, 18], so the length matches that of x, the longest vector in the sum.
# 3 is added as a scalar to every element. Because 5 is not a multiple of 3,
# R does at least issue a warning.
Warning message: In x + 2 * y: longer object length is not a multiple of shorter object length
# MORE DANGER: when the longer length is a multiple of the shorter one, R recycles silently, with no warning
x <- 1:6;
y <- 10:8;
x + 2*y + 3 # x and y have different lengths, but R does not complain at all here:
# 2*y = [20, 18, 16] is recycled exactly twice to make [20, 18, 16, 20, 18, 16],
# so the length matches that of x, the longest vector in the sum.
# 3 is added as a scalar to every element
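One way to guard against accidental recycling is to check lengths explicitly before doing arithmetic. A minimal sketch follows; `add_same_length` is a hypothetical helper written for this illustration, not a base R function.

```r
# hypothetical helper: elementwise sum that refuses to recycle
add_same_length <- function(a, b) {
  stopifnot(length(a) == length(b))   # stop with an error instead of recycling
  a + b
}

add_same_length(1:3, 10:8)            # 11 11 11

# add_same_length(1:6, 10:8) stops with an error;
# tryCatch lets us observe that without halting the notebook
result <- tryCatch(add_same_length(1:6, 10:8),
                   error = function(e) "lengths differ")
result                                # "lengths differ"
```

Scalars would also be rejected by this strict check, so a real version might allow length-1 arguments.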
# logical (Boolean) variables
T
F
x <- T;
x
1 == 2
1 > 2
1 >= 2
1 != 2
# logical operators
!T
!F
T | F # logical "or"
F | F
T & T # logical "and"
T & F
!(T & F)
# numerical values can be cast as Booleans: 0 is FALSE, any nonzero number is TRUE
0 & T
0 | T
1 & T
pi & T
# but not strings:
"hello" & T # fail
Error in "hello" & T: operations are possible only for numeric, logical or complex types
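`T` and `F` are ordinary variables that happen to hold `TRUE` and `FALSE`, so they can be reassigned; `TRUE` and `FALSE` are reserved words and cannot. The sketch below shows why spelling out `TRUE` is safer, along with the elementwise behaviour of `&` and `|` and the base functions `any` and `all`. The vectors `a` and `b` are illustrative.

```r
T <- 0            # legal: T is just a variable
isTRUE(T)         # FALSE now
rm(T)             # remove our copy; T refers to base R's TRUE again
isTRUE(T)         # TRUE

# '&' and '|' work elementwise on vectors
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)
a & b             # TRUE FALSE FALSE
a | b             # TRUE TRUE FALSE
any(a & b)        # TRUE: at least one element is TRUE
all(a | b)        # FALSE: the third element is FALSE
```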
# sorting, max, min
x <- 10:5;
x
sort(x)
# what permutation puts the list in sorted order?
order(x)
#
max(x)
min(x)
max(x^2)
max(x)^2
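Because `order` returns indices rather than values, it can be used to sort one vector by another. A sketch with made-up data (`heights` and `names_vec` are illustrative):

```r
heights   <- c(170, 155, 182, 164)
names_vec <- c("Ann", "Bo", "Cy", "Di")

idx <- order(heights)               # 2 4 1 3
heights[idx]                        # same as sort(heights): 155 164 170 182
names_vec[idx]                      # by increasing height: "Bo" "Di" "Ann" "Cy"
order(heights, decreasing = TRUE)   # 3 1 4 2
```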
# other useful functions on lists
x <- 1:10
sum(x) # sum of the elements
prod(x) # product of all the elements
mean(x) # mean of the elements
x[11] <- NA # now x has an element that's NA
sum(x) # returns NA, because x contains an NA
prod(x)
mean(x)
sum(x, na.rm = T) # sum, omitting NAs
prod(x, na.rm = T) # product, omitting NAs
mean(x, na.rm = T) # mean, omitting NAs
[1] NA
[1] NA
[1] NA
# utilities
help("c") # help on a particular function
c {base} | R Documentation |
This is a generic function which combines its arguments.
The default method combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed.
c(..., recursive = FALSE)
... |
objects to be concatenated. |
recursive |
logical. If |
The output type is determined from the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression. Pairlists are treated as lists, but non-vector components (such as names and calls) are treated as one-element lists which cannot be unlisted even if recursive = TRUE.
c is sometimes used for its side effect of removing attributes except names, for example to turn an array into a vector. as.vector is a more intuitive way to do this, but also drops names. Note too that methods other than the default are not required to do this (and they will almost certainly preserve a class attribute).
This is a primitive function.
NULL or an expression or a vector of an appropriate mode. (With no arguments the value is NULL.)
This function is S4 generic, but with argument list (x, ..., recursive = FALSE).
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
unlist and as.vector to produce attribute-free vectors.
c(1,7:9)
c(1:5, 10.5, "next")
## uses with a single argument to drop attributes
x <- 1:4
names(x) <- letters[1:4]
x
c(x) # has names
as.vector(x) # no names
dim(x) <- c(2,2)
x
c(x)
as.vector(x)
## append to a list:
ll <- list(A = 1, c = "C")
## do *not* use
c(ll, d = 1:3) # which is == c(ll, as.list(c(d = 1:3))
## but rather
c(ll, d = list(1:3)) # c() combining two lists
c(list(A = c(B = 1)), recursive = TRUE)
c(options(), recursive = TRUE)
c(list(A = c(B = 1, C = 2), B = c(E = 7)), recursive = TRUE)
# more utilities
ls() # what variables are in the workspace?
rm(y) # delete the variable y
y
ls()
Error in eval(expr, envir, enclos): object 'y' not found
# printing and flow control
x <- 3;
for (i in 1:x) { # first flow control
print(x^i)
}
i <- 1;
while (i <= 5) { # second flow control
print(x^i)
i <- i+1;
}
if (1 < 2) { # third flow control
print('1 is less than 2')
}
if (2 < 1) {
print('2 is less than 1')
} else {
print('2 is not less than 1')
}
if (2 < 0) {
print('2 is less than 0')
} else if (2 < 1) {
print('2 is less than 1')
} else {
print('2 is neither less than 1 nor less than zero')
}
[1] 3
[1] 9
[1] 27
[1] 3
[1] 9
[1] 27
[1] 81
[1] 243
[1] "1 is less than 2"
[1] "2 is not less than 1"
[1] "2 is neither less than 1 nor less than zero"
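Because arithmetic in R is vectorized, loops like the `for` and `while` loops above can often be replaced by a single expression. A brief sketch, which also uses the base functions `sapply` (apply a function element by element) and `ifelse` (a vectorized if/else):

```r
x <- 3
x^(1:3)                        # 3 9 27, the same values as the for loop
x^(1:5)                        # 3 9 27 81 243, the same values as the while loop
sapply(1:3, function(i) x^i)   # explicit element-by-element version: 3 9 27

# vectorized if/else: one answer per element of the condition
ifelse(c(2, 0, 1) < 1, "less than 1", "not less than 1")
```

Explicit loops remain the natural choice when each step depends on the result of the previous one.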
# R has very extensive plotting capabilities.
# The following examples are from the Jupyter R example page, https://try.jupyter.org/
x <- stats::rnorm(50)
opar <- par(bg = "white")
# base graphics calls each draw on the current plot, so they are issued one after another
plot(x, ann = FALSE, type = "n")
abline(h = 0, col = gray(.90))
lines(x, col = "green4", lty = "dotted")
points(x, bg = "limegreen", pch = 21)
title(main = "Simple Use of Color In a Plot",
      xlab = "Just a Whisper of a Label",
      col.main = "blue", col.lab = gray(.8),
      cex.main = 1.2, cex.lab = 1.0, font.main = 4, font.lab = 3)
# color wheel example from try.jupyter.org
par(bg = "gray")
pie(rep(1,24), col = rainbow(24), radius = 0.9)
title(main = "A Sample Color Wheel", cex.main = 1.4, font.main = 3)
title(xlab = "(Use this as a test of monitor linearity)",
      cex.lab = 0.8, font.lab = 3)
Next chapter: Mathematical Preliminaries