This notebook discusses how we can deal with functions $f:R^n \rightarrow R$, that is, functions which take a point in $R^n$ and return a real number. An example would be $f(x,y) = x^2 + y^3$. For the most part we specialize to $n=2$.
For this task, we use the Plots
package for graphing, the LinearAlgebra
package for some vector operations, and the ForwardDiff
package for automatic differentiation:
using Plots, LinearAlgebra, ForwardDiff
Plots.PyPlotBackend()
For functions from $R^2$ to $R$, we can visualize them in various ways. We discuss surface plots and contour plots
A plot of the set of points $(x ,y, f(x,y))$ is known as a surface plot. These may be graphed with the Plots
package.
For a numeric function, we need to create values for $x$, $y$ and then compute a grid of numbers for $f(x,y)$. For example,
f(x, y) = x^2 + y^2
xs = range(-5, stop=5, length=50)
ys = range(-5, stop=5, length=50)
zs = [f(x,y) for y in ys, x in xs]
surface(xs, ys, zs)
(The computation of zs
can be skipped by calling surface(xs, ys, f)
directly.)
Using symbolic values is quite similar, save for the call N.(zs)
to make numeric, not symbolic, values:
using SymPy
@vars x y real=true
zs = [x^2 - y^2 for y in ys, x in xs]
surface(xs, ys, N.(zs))
An alternative plot to a surface plot is the contour plot. These graph in the $x-y$ plane the solutions to $f(x,y) = c$ for different values of $c$. Contour plots in 2D are produced in Plots
with the contour
function. The same data preparation as surface
is used:
f(x, y) = x^2 + y^2
zs = [f(x,y) for y in ys, x in xs]
contour(xs, ys, zs)
The named argument levels=[a,b,...]
will bypass the automatic generation of the levels to plot and will instead plot those corresponding to the vector of values. (Though not all backends respect this argument.)
Some backends for Plots
allow for selecting the viewing position with a mouse (e.g. plotly()
), but the one used within the notebook does not. This example shows how to use Interact
to mimic that ability.
The viewing elevation and azimuth can be adjusted with the camera=(azimuth, elevation)
argument. (If we were using spherical coordinates, $(\rho, \theta, \phi)$, to describe the position, then the elevation would be $90 - \theta$ and the azimuth $\phi$, in degrees.)
Specifying the azimuth and elevation can be made interactive using the Interact
package by the following pattern:
f(x, y) = x^2 + y^2
zs = [f(x,y) for y in ys, x in xs]
using Interact
@manipulate for azim in -90:90, elev in 0:90
    surface(xs, ys, zs, camera=(azim, elev))
end
The key is to put the drawing commands within the @manipulate
call.
A vector can be added to a plot. As with parametric plots, we utilize this function that allows a vector to be added by specifying the point to anchor the vector at and the vector:
xs_ys(vs) = (A=hcat(vs...); Tuple([A[i,:] for i in eachindex(vs[1])]))
xs_ys(v,vs...) = xs_ys([v, vs...])
xs_ys(r::Function, a, b, n=100) = xs_ys(r.(range(a, stop=b, length=n)))
function arrow!(plt::Plots.Plot, p, v; kwargs...)
    if length(p) == 2
        quiver!(plt, xs_ys([p])..., quiver=Tuple(xs_ys([v])); kwargs...)
    elseif length(p) == 3
        # 3d quiver needs support
        # https://github.com/JuliaPlots/Plots.jl/issues/319#issue-159652535
        # headless arrow instead
        plot!(plt, xs_ys(p, p+v)...; kwargs...)
    end
end
arrow!(p,v;kwargs...) = arrow!(Plots.current(), p, v; kwargs...)
arrow! (generic function with 2 methods)
(Again, this is in the MTH229
package.)
Here we draw the contours of some function g
with specified levels and add a few vectors:
g(x,y) = x^2 + 2y^2
xs = ys = range(-2, stop=2, length=50)
zs = [g(x,y) for y in ys, x in xs]
contour(xs, ys, zs, levels=[1,2,3,4,5])
arrow!([ 1,1], 0.25*[ 1,-1]) ## start at [1,1] in direction [1,-1]
arrow!([-1,1], 0.25*[-1, 1])
Not all surfaces are easily described by points of the form $(x,y,f(x,y))$ for some function $f$, just as not all two-dimensional curves are well described by the graph of a function. In the case of two-dimensional curves we parameterized them by describing the $x$ and $y$ positions in terms of functions of a third variable, $t$. In a similar manner we can parameterize a surface, though it takes two variables to describe.
Take for example the task of plotting a sphere. A sphere is most easily described in spherical coordinates which map $(r, \theta, \phi)$ into $(x,y,z)$ by:
## Spherical
X(r,theta,phi) = r * sin(theta) * sin(phi)
Y(r,theta,phi) = r * sin(theta) * cos(phi)
Z(r,theta,phi) = r * cos(theta)
Z (generic function with 1 method)
(We use capital letters, as we have used x
and y
for symbolic objects previously in this notebook and redefining them as variables causes an error.)
We then generate xs
, ys
and zs
by iterating over a region described in terms of $\theta$ and $\phi$, as $r$ is constant for a sphere:
thetas = range(0, stop=pi, length=250)
phis = range(0, stop=pi/2, length=250)
xs = [X(1, theta, phi) for theta in thetas, phi in phis]
ys = [Y(1, theta, phi) for theta in thetas, phi in phis]
zs = [Z(1, theta, phi) for theta in thetas, phi in phis]
surface(xs, ys, zs)
The repeating of similar lines to define xs
, ys
, and zs
can be avoided, but it isn't much nicer with this solution:
H(theta, phi) = [X(1, theta, phi), Y(1, theta, phi), Z(1, theta, phi)]
out = [ [H(theta, phi)[i] for theta in thetas, phi in phis] for i in 1:3]
surface(out...)
A plane in 3 dimensions is described by a point and two vectors. The two vectors must be non-collinear, which means their cross product should be non-zero. Though a bit tedious, a graph is easy to make once we parameterize the plane:
p = [1,1,1]
u = [1,0,1]
v = [0,1,1]
@assert norm(u × v) > 0 # \times<tab> for ×
plane(a,b) = p + a*u + b*v
as = bs = range(-1, stop=1, length=50)
xs = [plane(a, b)[1] for b in bs, a in as]
ys = [plane(a, b)[2] for b in bs, a in as]
zs = [plane(a, b)[3] for b in bs, a in as]
surface(xs, ys, zs)
arrow!(p, u, color="red")
arrow!(p, v, color="yellow")
For a function $f:R^n \rightarrow R$ the concept of a derivative is extended to partial derivatives in each of the $n$ variables. The partial derivative in $x$ is defined by holding $y$ constant while a derivative in $x$ is taken.
The definition is:
$$\frac{\partial f}{\partial x}(x, y) = \lim_{h \rightarrow 0} \frac{f(x+h, y) - f(x, y)}{h}.$$
The gradient of $f$, $\nabla f$, is the vector-valued function of partial derivatives $[\partial f / \partial x, \partial f / \partial y]$ (taking $R^2$ into $R^2$.)
The ForwardDiff
package provides a gradient
function for evaluating the gradient at a point. The functions used must take a vector and return a scalar quantity. This requires some redefinitions. For example, we can define $f(x,y) = x^2 -2xy+y^3$ with:
f(x,y) = x^2 - 2x*y + y^3
f (generic function with 1 method)
Or
f(x) = x[1]^2 - 2x[1] * x[2] + x[2]^3
f (generic function with 2 methods)
The latter is a bit harder to read. It is possible to use a trick like this:
f(x,y) = x^2 - 2x*y + y^3
f(x) = f(x...)
f (generic function with 2 methods)
This uses julia
's multiple dispatch to relate the easier-to-read, two-argument function to the function taking a vector.
With this, the gradient at, say x=[1,2]
can be found with:
using ForwardDiff
import ForwardDiff: gradient
gradient(f, [1,2])
2-element Array{Int64,1}:
 -2
 10
Compare this answer with that generated by hand (taking partial in $x$ and the partial in $y$):
dx(x,y) = 2x - 2y
dy(x,y) = -2x + 3y^2
(dx(1,2), dy(1,2))
(-2, 10)
We can also find partial derivatives using SymPy
's diff
function. The second argument to that function specifies a variable, the third the degree of the derivative (it has a default of 1, so can usually be omitted). The basic partial derivatives $\partial f/ \partial x$ and $\partial f/ \partial y$ are found as follows:
@vars x y real=true
f(x,y) = x^2 - 2x*y + y^3
ex = f(x, y)
partial_x = diff(ex, x, 1)
partial_y = diff(ex, y) # 1 is default degree
[partial_x, partial_y] # The gradient, or
[diff(ex, s) for s in (x, y)] # finds gradient in one swoop
The Hessian matrix is a matrix of second partial derivatives. We can find it as follows:
f(x,y) = x^2 - 2x*y + y^3
ex = f(x, y)
hess = [diff(diff(ex, a), b) for b in (x,y), a in (x,y)]
The Hessian is also available for SymPy
objects through sympy.hessian
:
sympy.hessian(ex, (x,y))
The gradient is not, so we define a helper function:
Grad(ex::SymPy.Sym, vars=free_symbols(ex)) = [diff(ex, v) for v in vars]
Grad (generic function with 2 methods)
(As gradient
is defined in the ForwardDiff
package, we avoid a name collision, which can be worked around, by using Grad
.)
The partial derivative in $x$ at a point, $(a,b)$, can be viewed by slicing the graph of $f(x,y)$ with the plane $y=b$. This intersection will result in a function whose derivative is just the partial derivative.
The derivative indicates the slope of the tangent line, so the partial derivative in $x$ at $(a,b)$ indicates the slope of the surface were one constrained to move in the fixed direction $y=b$.
The directional derivative, $\nabla_v(f)$, is the slope were one constrained to move in a direction specified by a vector $v$. This generalizes the partial in $x$, where $v$ may be taken to be $c \cdot [1,0]$, and the partial in $y$, where $v$ may be taken to be $c \cdot [0,1]$, for some scalar $c$. For an arbitrary $v$, the directional derivative is defined by the limit of $(f(x + hv) - f(x))/h$ as $h \rightarrow 0$, where $x$ and $v$ are vectors, but may be more easily written in terms of the gradient via $\nabla_v(f) = \nabla f \cdot v$.
The above includes the magnitude of $v$ in its interpretation, where this is the slope of a curve being traversed in the direction of $v$ with speed given by $\| v\|$. It is not uncommon to have a definition in terms of moving at unit speed, as this is more intrinsic to the function $f$.
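The dot-product form can be checked numerically. Here is a minimal sketch, using a finite-difference gradient (rather than the ForwardDiff approach used elsewhere in this notebook) for $f(x,y) = x^2 - 2xy + y^3$:

```julia
# Directional derivative as ∇f(p)⋅v, with ∇f approximated by forward differences
f(x) = x[1]^2 - 2x[1]*x[2] + x[2]^3
fd_grad(f, p; h=1e-6) = [(f(p .+ h .* e) - f(p)) / h
                         for e in ([1.0, 0.0], [0.0, 1.0])]
p = [1.0, 2.0]
v = [3.0, 4.0] / 5            # a unit vector
Dv = sum(fd_grad(f, p) .* v)  # ≈ ∇f(p)⋅v = (-2)*(3/5) + 10*(4/5) = 34/5
```

This agrees (up to the finite-difference error) with the by-hand gradient $[-2, 10]$ at $(1,2)$ found below.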
Take $v$ to be a unit vector; then the directional derivative satisfies $\nabla_v(f) = \nabla f \cdot v = \| \nabla f \| \cos(\theta),$
where $\theta$ is the angle between the gradient and $v$. Taking $v$ in the direction of the gradient will maximize this value. Hence the gradient points in the direction of steepest ascent on the surface.
That is, if one is on a hill described by $(x,y,f(x,y))$, then walking so that your $(x,y)$ is in the direction of the gradient will walk in the steepest possible way. Contrast this to walking along a contour, $f(x,y) = k$, which by definition will have no slope up or down, as the $z$ coordinate stays at $k$. In some sense the gradient and the contour are orthogonal concepts. In fact, if we plot, we can see that the gradient is perpendicular to the tangent line of the contour. For example, taking the simple function $f(x,y) = x^2 + y^2$:
f(x,y) = x^2 + y^2
grad_f(x,y) = [2*x, 2*y]
xs = ys = range(-3, stop=3, length=50)
zs = [f(x,y) for y in ys, x in xs]
contour(xs, ys, zs, levels = [1,2,4,9], aspect_ratio=:equal)
## add gradient
arrow!([sqrt(3)/2,1/2], grad_f(sqrt(3)/2,1/2))
## tangent to contour by implicit differentiation solves 2x + 2y y' = 0
## or y' = -x/y (which is in direction [y, -x])
arrow!([sqrt(3)/2,1/2], [1/2, -sqrt(3)/2])
(The scale of the $x$ and $y$ axis is equal through the extra argument aspect_ratio=:equal
.)
Were this not the case, then the directional derivative in the direction of the tangent to the contour would have non-zero magnitude (the directional derivative is a dot product with the gradient, which is non-zero when both vectors are non-zero and not perpendicular). That would mean moving in that direction has a slope up or down, but that direction is along a contour, which is by definition flat.
The expression above to find the implicit derivative $dy/dx$ for $f(x,y) = x^2 + y^2$ can be done in SymPy
, but requires some work. The key is to define a symbolic function, F = SymFunction("F")
and use that to substitute in for y
:
F = SymFunction("F")
@vars x y real=true
ex = x^2 + y^2
tmp = diff(subs(ex, y => F(x)), x) # 2x + 2*F(x)*dF/dx
ex1 = solve(tmp, diff(F(x),x)) # solve for dF/dx in 2x + 2F(x) dF/dx, gives [-x/F(x)]
tl = subs(ex1[1], F(x) => y) # now it is [-x/y]
[diff(ex,s) for s in [x,y]] ⋅ [1, tl] # 0 as orthogonal.
(This uses the fact that the vector [1,m]
is parallel to a line with slope m
.)
Of course, the last part is of a more general nature. For example, suppose $r(t) = \langle x(t), y(t) \rangle$ parameterizes the contour $G(x,y) = k$. Then $(G \circ r)(t) = k$, a constant, so it has derivative $0$. But by the chain rule, $0 = \nabla G(x(t), y(t)) \cdot r'(t)$, so the gradient is always perpendicular to the tangent vector, as was the case above.
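This can be checked numerically for the circle $G(x,y) = x^2 + y^2 = 1$ with $r(t) = \langle \cos(t), \sin(t) \rangle$ (a small sketch):

```julia
# ∇G(r(t)) ⋅ r'(t) should vanish along the contour G(x,y) = 1
G(x, y) = x^2 + y^2
∇G(x, y) = [2x, 2y]
r(t)  = [cos(t), sin(t)]     # parameterizes G(x,y) == 1
rp(t) = [-sin(t), cos(t)]    # r'(t)
t0 = 0.7
sum(∇G(r(t0)...) .* rp(t0))  # essentially 0
```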
The fact that the gradient points in the direction of greatest increase can be exploited. It generalizes the characterization of local extrema for differentiable functions as occurring only where the derivative is zero (a critical point). The same is true for two-variable functions, where a critical point is where the gradient is the zero vector. (By assuming differentiability we avoid talking about those critical points where the derivative is undefined, such as at a cusp.)
The definition of a local maximum of a function $f$ is not much different when $f:R^n \rightarrow R$ for $n = 2$ than the $n=1$ case: $f(a,b)$ is a local maximum of $f$ if $f(a,b) \geq f(x,y)$ for any $(x,y)$ in an open disk about $(a,b)$. That is, the value $f(a,b)$ is the largest in some nearby neighborhood, at least. Using a mountain analogy, this can be the ultimate peak, or a lower peak, it just needs to be the tallest value in some (open) region. A local minimum has a similar definition.
If $f$ is differentiable and $(a,b)$ is an interior point of the domain, then at a local extremum both partial derivatives will be 0. This is a direct analogue of the $n=1$ case.
However, as with the $n=1$ case, it is not enough to know that both partials are $0$ to conclude you are at a local extremum. An example where this is not the case is a saddle point. The function $f(x,y) = y^2 - y^4 - x^2$ demonstrates this at the origin.
f1(x,y) = y^2 - y^4 - x^2
xs = ys = range(-1, stop=1, length=50)
zs = [f1(x,y) for y in ys, x in xs]
surface(xs, ys, zs)
There is a test though that involves the Hessian, whose determinant is $d = f_{xx}f_{yy} - f_{xy}^2$:
The function $f$ has a local maximum if $f_{xx} < 0$ and $d > 0$
The function $f$ has a local minimum if $f_{xx} > 0$ and $d > 0$
The function $f$ has a saddle point if $d < 0$
The test is otherwise inconclusive.
(The second derivative test when $n=1$ is similar to the first, second and fourth statements without consideration of $d$.)
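Applying the test by hand to the saddle example $f(x,y) = y^2 - y^4 - x^2$ above, with the second partials written out directly (a sketch):

```julia
# Second partials of f(x,y) = y^2 - y^4 - x^2, computed by hand
fxx(x, y) = -2.0
fyy(x, y) = 2 - 12y^2
fxy(x, y) = 0.0
d = fxx(0, 0) * fyy(0, 0) - fxy(0, 0)^2  # determinant of the Hessian at (0,0)
d < 0                                    # true: a saddle point at the origin
```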
Let's see this in use: maximize $g(x,y) = 2 + 2x + 2y - x^2 - y^2$ over the triangular region bounded by $x=0$, $y=0$ and $y=9-x$.
g(x,y) = 2 + 2x + 2y - x^2 - y^2
h(x,y) = (x>0) & (y>0) & (y < 9-x) ? g(x,y) : NaN
xs = ys = range(0, stop=9, length=250)
zs = [h(x,y) for y in ys, x in xs]
surface(xs, ys, zs)
We can see that the maximum occurs near the origin, and not out at the edge. Let's find it symbolically:
@vars x y real=true
grad = [diff(g(x,y), x), diff(g(x,y),y)]
solve(grad)
Dict{Any,Any} with 2 entries:
  x => 1
  y => 1
The answer $x=1$, $y=1$ is in the region and has a zero gradient, as we can check by substituting in these values for $x$ and $y$ in the gradient expression:
subs.(grad, x => 1, y => 1)
The value of $g_{xx}$ is the upper-left entry of the Hessian, which is $-2$. By the theorem, the point $(1,1)$ will correspond to a local maximum if the determinant is positive, and it is:
hess = sympy.hessian(g(x,y), (x,y))
det(hess)
The relative maximum is the value:
g(1,1)
4
Is this the maximum value over the region? For that, we need to check the boundary of the region, that is when $x=0$ or $y=0$ or $y=9-x$. This is (tediously) done in three steps:
Check the values with $x=0$. Then we have the function $h(y) = g(0,y)$ over $[0,9]$. This is a function of a single variable; a larger value would occur at an endpoint or at a critical point:
h(y) = g(0,y)
out = solve(diff(h(y), y))
append!(out, [0, 9])
h.(out)
Nope, nothing bigger than $4$.
Check the values with $y=0$. Then we have $h(x) = g(x,0)$ over $[0,9]$. Repeating gives the same answer:
h(x) = g(x,0)
out = solve(diff(h(x), x))
append!(out, [0, 9])
h.(out)
(You could skip this last one by noting that $x$ and $y$ enter symmetrically.)
Check values $y = 9-x$. Here we parameterize by $t$ going from $0$ to $9$:
@vars t
h(t) = g(t, 9-t)
out = solve(diff(h(t), t))
append!(out, [0, 9])
h.(out)
Again, all these values are less than $4$, so we can conclude that $4$ is a local maximum and the absolute maximum over the region.
Unconstrained optimization is related to the gradient. A related, but different question is constrained optimization. A generic description would be along the lines of
Maximize $f(x,y)$ given $g(x,y) = k$.
For this problem the gradient of $f$ may never be $0$ along the contour of $g$. However, a maximal value will be determined by the gradient. For example, the gradient of $f$ should be perpendicular to the contour lines of $g$, as otherwise we could move in the direction of the projection of the gradient of $f$ onto the tangent of the contour line of $g$ and increase our function. However, the gradient of $g$ is also perpendicular to the tangent of the contour lines of $g$, so in fact we need the gradient of $f$ and the gradient of $g$ to be parallel.
That is, in addition to satisfying the constraint, a necessary condition for an optimal value is that for some $\lambda$: $\nabla f = \lambda \nabla g.$
This is the basis for the technique of Lagrange Multipliers.
The following graphic illustrates the method for the problem where we maximize $f(x,y) = 5 + x + y^2$ over the constraint formed by $g(x,y) = x^2 + y^2 = 1$. We make a contour plot, then draw for a few points the gradient of $f$ and the gradient of $g$. When the arrows are parallel, we have a possible maximum. In the figure this occurs at $t = \pi/3$.
g(x,y) = x^2 + y^2
∇g(x,y) = [2x, 2y]
f(x,y) = 5 + x + y^2
∇f(x,y) = [1, 2*y]
uvec(u) = u/norm(u)
xs = ys = range(-3, stop=3, length=50) ## should be square!
ts = [0, pi/6, pi/3, pi/2]
zcs = [g(x,y) for y in ys, x in xs]
p = contour(xs, ys, zcs, levels=[1,4,9])
## parameterize g(x,y) == 1
rad(t) = [cos(t), sin(t)]
for t in ts
    arrow!(p, rad(t), uvec(∇g(rad(t)...)), color=:blue)
    arrow!(p, rad(t), uvec(∇f(rad(t)...)), color=:red)
end
p
To solve this problem algebraically, we can use SymPy
as follows:
using SymPy
@vars x y real=true
f(x,y) = 5 + x + y^2
g(x,y) = x^2 + y^2
gradf = Grad(f(x,y), [x,y]) # add [x,y] to ensure order
gradg = Grad(g(x,y), [x,y])
[gradf gradg] # gradf in first column, gradg in second
We see from the first row that $1 = \lambda 2x$ and from the second that $2y = \lambda 2y$. If $y=0$, then $1 = \lambda 2 x$ can have $x$ take any value. Similarly, if $x=1/2$ then $\lambda=1$, and from $2y=1\cdot 2y$ we see that $y$ can take any value. However, the constraint imposed by $g(x,y)=1$ limits the values in each case, giving us 4 possible answers: $(1,0)$, $(-1,0)$, $(1/2, \sqrt{3}/2)$ and $(1/2, -\sqrt{3}/2)$. Of these, the latter two can be seen to be associated with the maximum value along the constraint, the second is the minimum, and the first a local minimum.
We could use SymPy
's solve
feature to do this directly. First we introduce a lambda
variable and then using the multi-variable form of solve
:
@vars lambda real=true
out = solve(gradf - lambda * gradg)
1-element Array{Dict{Any,Any},1}:
 Dict(lambda=>1,x=>1/2)
We see that the x=1/2
case is found, but we miss the $y=0$ case. To use the constraint to find $y$, we have:
ex = subs(g(x,y), x => 1//2) # or out[1][x] to fish out the 1/2
solve(ex - 1, y) # two answers
The value of $(1/2, \sqrt{3}/2)$ corresponds to an angle of 60 degrees, as expected, and can be checked to be a maximum, as can the value $(1/2, -\sqrt{3}/2)$.
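A quick numeric check of this is possible (a sketch: sample $f$ along the unit circle and locate the largest value):

```julia
# Evaluate f along the constraint circle and find where it is largest
f(x, y) = 5 + x + y^2
ts = range(0, stop=2pi, length=1000)
vals = [f(cos(t), sin(t)) for t in ts]
tmax = ts[argmax(vals)]  # near π/3 (the mirror point 5π/3 attains the same value)
```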
It can be simpler to solve this by adding the constraint as an equation. We do this here by pushing it onto the equations to solve:
exs = gradf - lambda * gradg
push!(exs, g(x,y) - 1)
out = solve(exs)
4-element Array{Dict{Any,Any},1}:
 Dict(lambda=>-1/2,x=>-1,y=>0)
 Dict(lambda=>1/2,x=>1,y=>0)
 Dict(lambda=>1,x=>1/2,y=>-sqrt(3)/2)
 Dict(lambda=>1,x=>1/2,y=>sqrt(3)/2)
The display of the values is not very good, but the output gives four answers which we can check via:
[f(o[x], o[y]) for o in out]
The third and fourth (with x=1/2
) are the largest, the one with x=-1
the smallest, and the one with x=1
a local extremum, but not an extremum over the entire set of values specified by the constraint.
The Lagrangian associated with the constrained problem is $l(x,y,\lambda) = f(x,y) - \lambda h(x,y)$, where $h(x,y) = g(x,y) - k$. The solutions to $\nabla l = 0$ are the same as above.
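For the circle example above this can be verified directly; here is a sketch checking that $\nabla l$ vanishes at the solution $(x, y, \lambda) = (1/2, \sqrt{3}/2, 1)$ found earlier:

```julia
# Gradient of l(x,y,λ) = (5 + x + y^2) - λ*(x^2 + y^2 - 1), written out by hand
∇l(x, y, λ) = [1 - λ*2x,          # ∂l/∂x
               2y - λ*2y,         # ∂l/∂y
               -(x^2 + y^2 - 1)]  # ∂l/∂λ
∇l(1/2, sqrt(3)/2, 1)             # essentially [0, 0, 0]
```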
Find the maximal volume of box with fixed surface area of 64 square units (problem is from Paul's Online Math Notes ).
@vars a b c real=true
Vol(a,b,c) = a*b*c
SA(a,b,c) = 2*(a*b + a*c + b*c)
gradf = Grad(Vol(a,b,c), [a,b,c])
gradg = Grad(SA(a,b,c), [a,b,c])
[gradf gradg] # gradf in first column,...
What is $\lambda$? Here we add the constraint to the equations and let SymPy
do the work.
@vars lambda real=true
exs = gradf - lambda * gradg
push!(exs, SA(a,b,c) - 64)
out = solve(exs, [lambda, a, b, c])
out[2]
(sqrt(6)/3, 4*sqrt(6)/3, 4*sqrt(6)/3, 4*sqrt(6)/3)
The two answers differ only by minus signs. Only the second is possible and yields a cube with sides of length $4\sqrt{6}/3$.
Find the minimal surface area of a box with fixed volume 64 cubic units (problem is from Paul's Online Math Notes ).
The basic equations are identical, save for the constraint being switched. We have then:
@vars a b c real=true
@vars lambda real=true
Vol(a,b,c) = a*b*c
SA(a,b,c) = 2*(a*b + a*c + b*c)
gradf = Grad(Vol(a,b,c), [a,b,c])
gradg = Grad(SA(a,b,c), [a,b,c])
exs = gradf - lambda * gradg
push!(exs, Vol(a,b,c) - 64)
And now
solve(exs)
1-element Array{Dict{Any,Any},1}:
 Dict(lambda=>1,c=>4,a=>4,b=>4)
This too has a solution which is a cube, here with side lengths of 4. However, note we minimized here, but maximized in the previous question.
A right circular can is to hold 355 $mm^3$. Find the dimensions of the radius, $r$, and height, $h$, that minimize surface area. (Problem modified from one here.)
@vars radius height real=true
@vars lambda real=true
Vol(r,h) = PI*r^2* h
SA(r,h) = 2*PI*r^2 + 2*PI*r*h
gradf = Grad(Vol(radius, height), [radius, height])
gradg = Grad(SA(radius, height), [radius, height])
exs = gradf - lambda * gradg
push!(exs, Vol(radius,height) - 355)
solve(exs)[1]
Dict{Any,Any} with 3 entries:
  radius => 2^(2/3)*355^(1/3)/(2*pi^(1/3))
  lambda => 2^(2/3)*355^(1/3)/(4*pi^(1/3))
  height => 2^(2/3)*355^(1/3)/pi^(1/3)
(There is only one answer, so we use [1]
to display just that, as otherwise the display is not great.)
If we had solved without adding the extra constraint, we see the relationship between h
and r
more clearly:
solve(gradf - lambda * gradg)[3] # third one has answer
Dict{Any,Any} with 2 entries:
  lambda => radius/2
  height => 2*radius
That height => 2*radius
says the profile of this can is a square, with the height equal to the diameter.