In this notebook, we construct some typical sparse matrices (the type that would arise from discretizing an elliptic PDE) in Julia, explore how they are represented and visualized, look at sparse-direct solvers via sparse-Cholesky factorization, and examine the impact of the matrix ordering on the sparse-direct solver.
As a "nice" example of a sparse matrix, which has a structure typical of many applications arising from discretized PDEs and similar situations, let us consider a discrete Laplacian, analogous to $-\nabla^2 = -\partial^2/\partial x^2 - \partial^2/\partial y^2$ on a 2d $n_1 \times n_2$ grid, with $m = n_1 n_2$ degrees of freedom. The $m\times m$ matrix resulting from a classic five-point stencil on a 2d grid is sparse ($< 5m$ nonzero entries) and is real-symmetric and positive-definite.
How this matrix is constructed is outside the scope of 18.335, but see my notes from 18.303 on the subject. Essentially, we first construct a 1d "difference" matrix $D$ analogous to $\partial/\partial x$ on a 1d grid with Dirichlet (zero) boundary conditions, then form the second-difference matrix analogous to $-\partial^2/\partial x^2$ via $D^T D$, and finally we put these 1d matrices together into a "2d" matrix via Kronecker products (`kron` in Julia). We use `spdiagm` to construct the (bidiagonal) difference matrices as sparse matrices in Julia, and as a result all of the subsequent matrix products are computed in an efficient sparse fashion.
# returns -∇² (discrete Laplacian, real-symmetric positive-definite) on n₁×n₂ grid
function ∇²(n₁,n₂)
o₁ = ones(n₁)
∂₁ = spdiagm((-o₁,o₁),(-1,0),n₁+1,n₁)
o₂ = ones(n₂)
∂₂ = spdiagm((-o₂,o₂),(-1,0),n₂+1,n₂)
return kron(speye(n₂), ∂₁'*∂₁) + kron(∂₂'*∂₂, speye(n₁))
end
∇² (generic function with 1 method)
The resulting matrix data structure (technically in compressed-sparse-column/CSC format) only stores the nonzero entries:
∇²(4,3)
12x12 sparse matrix with 46 Float64 entries:
	[1 ,  1]  =  4.0
	[2 ,  1]  =  -1.0
	[5 ,  1]  =  -1.0
	[1 ,  2]  =  -1.0
	[2 ,  2]  =  4.0
	[3 ,  2]  =  -1.0
	[6 ,  2]  =  -1.0
	[2 ,  3]  =  -1.0
	[3 ,  3]  =  4.0
	[4 ,  3]  =  -1.0
	⋮
	[6 , 10]  =  -1.0
	[9 , 10]  =  -1.0
	[10, 10]  =  4.0
	[11, 10]  =  -1.0
	[7 , 11]  =  -1.0
	[10, 11]  =  -1.0
	[11, 11]  =  4.0
	[12, 11]  =  -1.0
	[8 , 12]  =  -1.0
	[11, 12]  =  -1.0
	[12, 12]  =  4.0
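To make the CSC layout concrete, here is an illustrative pure-Python sketch of the same storage scheme. The field names `colptr`, `rowval`, and `nzval` mirror Julia's `SparseMatrixCSC` (including 1-based indices), but this is just a toy converter, not Julia's implementation:

```python
# Minimal illustration of compressed-sparse-column (CSC) storage:
# colptr[j] gives where column j's entries begin inside rowval/nzval.

def dense_to_csc(A):
    """Convert a dense matrix (list of rows) to (colptr, rowval, nzval)."""
    nrows, ncols = len(A), len(A[0])
    colptr, rowval, nzval = [1], [], []
    for j in range(ncols):
        for i in range(nrows):
            if A[i][j] != 0:
                rowval.append(i + 1)   # 1-based row index, as in Julia
                nzval.append(A[i][j])
        colptr.append(len(nzval) + 1)  # start of the next column
    return colptr, rowval, nzval

# A small tridiagonal example, like one block of the discrete Laplacian:
A = [[ 4, -1,  0],
     [-1,  4, -1],
     [ 0, -1,  4]]
colptr, rowval, nzval = dense_to_csc(A)
print(colptr)  # [1, 3, 6, 8]
print(rowval)  # [1, 2, 1, 2, 3, 2, 3]
print(nzval)   # [4, -1, -1, 4, -1, -1, 4]
```

Only the column pointers plus two length-`nnz` arrays are stored, so a column's entries can be found without scanning the whole matrix.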
We can use the `full` function to convert it back to a dense matrix, which is a bit easier to read (but grossly inefficient for large sizes):
full(∇²(4,3))
12x12 Array{Float64,2}:
  4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
 -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0  -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0  -1.0   4.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0
 -1.0   0.0   0.0   0.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0
  0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0
  0.0   0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0
  0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0   0.0   0.0   0.0  -1.0
  0.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   4.0  -1.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0
  0.0   0.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0
We can also use the `spy` function in PyPlot (analogous to the function of the same name in Matlab) to visualize the sparse-matrix structure:
using PyPlot
spy(∇²(10,5))
PyObject <matplotlib.image.AxesImage object at 0x31fab5750>
Since our matrix is real-symmetric positive-definite, we can solve systems of equations using the Cholesky factorization, which is provided in an abstract form by the `cholfact` function. For a sparse matrix, Julia calls the CHOLMOD sparse-direct Cholesky algorithms in SuiteSparse. In fact, the Cholesky factorization is also called automatically by `\` (falling back to LU for non-positive-definite matrices), and we can also call a sparse LU factorization (from UMFPACK) explicitly via `lufact`. Let's time them for a big problem:
A = ∇²(600,200) # a 120,000×120,000 matrix
b = rand(600*200)
@time A \ b
@time cholfact(A) \ b
@time lufact(A) \ b;
elapsed time: 1.232090894 seconds (967 kB allocated)
elapsed time: 1.188330829 seconds (957 kB allocated)
elapsed time: 1.627708219 seconds (11 MB allocated)
Note that storing a dense matrix of this size would require $8\times120000^2$ bytes, or about 100 GiB! On my machine, Cholesky factorization of a $1200\times1200$ dense matrix takes about 0.03 seconds, so scaling by $100^3$ (dense Cholesky costs $O(m^3)$ operations) yields an expected dense-matrix factorization time (on a theoretical version of my machine with enough memory) of about 8.3 hours.
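The arithmetic behind these estimates is worth spelling out; a quick sanity check (sketched here in Python, using the numbers quoted above):

```python
# Back-of-the-envelope check of the dense-matrix cost estimates.
m_small, m_big = 1200, 120000
ratio = m_big // m_small            # the big matrix is 100x larger

mem_bytes = 8 * m_big**2            # one Float64 (8 bytes) per entry
print(mem_bytes / 2**30)            # ≈ 107 GiB, i.e. "about 100 GiB"

t_small = 0.03                      # measured dense Cholesky time (seconds)
t_big = t_small * ratio**3          # O(m^3) flop scaling
print(t_big / 3600)                 # ≈ 8.3 hours
```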
The `spy` function doesn't know how to visualize a `cholfact` object directly, but we can convert it to an ordinary sparse matrix via `sparse` and then plot its nonzero pattern. By default, `cholfact` uses an approximate minimum-degree (AMD) ordering, which does a darn good job with the sparse-Laplacian matrix here:
spy(sparse(cholfact(∇²(10,5))))
title("sparse Cholesky factor, AMD ordering")
PyObject <matplotlib.text.Text object at 0x31fc37990>
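CHOLMOD's AMD is far more sophisticated (approximate degrees, aggressive absorption, postordering), but the core greedy idea is simple: repeatedly eliminate a vertex of minimum degree in the graph of the matrix, since eliminating a vertex adds fill only among its current neighbors. An illustrative pure-Python sketch of that exact (non-approximate) minimum-degree heuristic, not CHOLMOD's algorithm:

```python
# Greedy exact minimum-degree ordering on an adjacency-set graph.
# Eliminating vertex k turns its remaining neighbors into a clique,
# so picking a low-degree vertex keeps the added fill small.

def min_degree_order(adj):
    """Return an elimination order for adjacency sets adj (dict of sets)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # local, mutable copy
    order = []
    remaining = set(adj)
    while remaining:
        k = min(remaining, key=lambda v: len(adj[v] & remaining))
        order.append(k)
        remaining.discard(k)
        nbrs = adj[k] & remaining
        for u in nbrs:                 # clique the surviving neighbors
            adj[u] |= nbrs - {u}
    return order

# A 1d chain 0-1-2-3-4: minimum degree eliminates from an endpoint inward,
# which for a tridiagonal matrix produces no fill at all.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(min_degree_order(chain))
```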
To investigate the effect of different permutations on the Cholesky fill-in, it is useful to provide a permutation different than the default AMD permutation. Fortunately, the CHOLMOD package allows you to provide a user-defined permutation when computing the factorization, and Julia was recently updated to give access to this feature via a `perm` argument to `cholfact`.
if VERSION < v"0.4.0-dev+4289"
error("You need a newer version of Julia to access user-specified sparse-Cholesky permutations")
end
First, let us try sparse Cholesky with no reordering at all, by just passing `1:m` as the permutation. As expected, this results in a lot more fill-in: for an $n\times n$ grid, the natural (column-major) ordering yields a banded factor of bandwidth $n = \sqrt{m}$, and in particular we expect $\Theta(m^{3/2})$ fill:
spy(sparse(cholfact(∇²(10,5), perm=1:50)))
title("sparse Cholesky factor, natural ordering")
PyObject <matplotlib.text.Text object at 0x31fceba10>
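Fill-in is purely a property of the sparsity graph: eliminating vertex $k$ turns its not-yet-eliminated neighbors into a clique, and any clique edge not already present is fill. So the fill counts plotted below can be predicted symbolically, with no floating-point work at all. A pure-Python sketch (illustrative only, for the square-grid case in natural order):

```python
# Symbolic Cholesky: count factor nonzeros by graph elimination alone.

def grid_graph(n):
    """Adjacency sets for the n×n 5-point-stencil grid, vertices 0..n²-1."""
    adj = {v: set() for v in range(n * n)}
    for i in range(n):
        for j in range(n):
            v = i + j * n                  # column-major vertex index
            if i + 1 < n:
                adj[v].add(v + 1); adj[v + 1].add(v)
            if j + 1 < n:
                adj[v].add(v + n); adj[v + n].add(v)
    return adj

def cholesky_nnz(adj, order):
    """# of nonzeros in the Cholesky factor for an elimination order."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # local copy
    eliminated, nnz = set(), 0
    for k in order:
        nbrs = adj[k] - eliminated
        nnz += len(nbrs) + 1               # column k: neighbors + diagonal
        for u in nbrs:                     # clique the surviving neighbors
            adj[u] |= nbrs - {u}
        eliminated.add(k)
    return nnz

# natural ordering: the nonzero count grows much faster than the m ≈ 5n²
# nonzeros of the matrix itself
for n in (4, 8, 16):
    print(n, cholesky_nnz(grid_graph(n), range(n * n)))
```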
N = 10:10:400
m = N.^2
loglog(m, [nnz(cholfact(∇²(n,n), perm=1:n*n)) for n in N], "r*")
loglog(m, [nnz(cholfact(∇²(n,n))) for n in N], "bo")
loglog(m, m.^(3/2), "r-")
loglog(m, m .* log2(m), "b-")
xlabel(L"matrix size $m$")
ylabel("# nonzeros in Cholesky")
legend(["natural", "AMD", L"m^{3/2}", L"m \log_2(m)"], loc="upper left")
PyObject <matplotlib.legend.Legend object at 0x320485ed0>
Since we know that our matrix comes from a 5-point stencil on a perfectly regular grid, it is straightforward to implement a nested-dissection ordering by a little function that recursively divides the grid in two along the longer axis:
# return nested-dissection permutation for an m × n grid (in column-major order). ndissect is an optional
# number of dissection steps before reverting to the normal ordering, and lm is an optional leading dimension
# (used for indexing computations).
function nest(m,n, ndissect=30, lm=m)
if ndissect <= 0 || m*n < 5
return reshape([i + (j-1)*lm for i=1:m, j=1:n], m*n)
elseif m >= n
msep = div(m,2)
N1 = nest(msep-1,n, ndissect-1, lm)
N2 = nest(m-msep,n, ndissect-1, lm) + msep
Ns = msep + (0:n-1)*lm
return [N1; N2; Ns]
else
nsep = div(n,2)
N1 = nest(m,nsep-1, ndissect-1, lm)
N2 = nest(m,n-nsep, ndissect-1, lm) + nsep*lm
Ns = (1:m) + (nsep-1)*lm
return [N1; N2; Ns]
end
end
nest (generic function with 3 methods)
As a quick check, let's make sure that it outputs each index exactly once:
sort(nest(30,60)) == [1:30*60;]
true
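The same recursion is easy to transcribe into other languages; here is an illustrative pure-Python port of `nest` (keeping the 1-based, column-major indexing of the Julia version), with the same sanity check:

```python
# Nested-dissection permutation for an m×n grid: recursively split along
# the longer axis, ordering the separator row/column last.

def nest(m, n, ndissect=30, lm=None):
    """1-based nested-dissection permutation; lm is the leading dimension."""
    if lm is None:
        lm = m
    if ndissect <= 0 or m * n < 5:
        # natural column-major order of the (sub)grid
        return [i + (j - 1) * lm for j in range(1, n + 1)
                                 for i in range(1, m + 1)]
    if m >= n:                      # split along the row direction
        msep = m // 2
        N1 = nest(msep - 1, n, ndissect - 1, lm)
        N2 = [x + msep for x in nest(m - msep, n, ndissect - 1, lm)]
        Ns = [msep + j * lm for j in range(n)]      # separator row
    else:                           # split along the column direction
        nsep = n // 2
        N1 = nest(m, nsep - 1, ndissect - 1, lm)
        N2 = [x + nsep * lm for x in nest(m, n - nsep, ndissect - 1, lm)]
        Ns = [i + (nsep - 1) * lm for i in range(1, m + 1)]  # separator column
    return N1 + N2 + Ns

# same sanity check as above: every index appears exactly once
assert sorted(nest(30, 60)) == list(range(1, 30 * 60 + 1))
```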
Just by looking at the sparsity pattern, it's clearly a bit different from AMD:
spy(sparse(cholfact(∇²(10,5), perm=nest(10,5))))
title("sparse Cholesky factor, nested-dissection ordering")
PyObject <matplotlib.text.Text object at 0x3277b6710>
We can use the `nnz` function to compare the fill-in for the natural, nested-dissection, and AMD orderings on a $100\times120$ grid. CHOLMOD's default ordering (AMD plus a postorder permutation) is actually pretty darn good, and slightly beats our hand-rolled nested dissection; both are about 3 times better than the natural ordering:
m = 100
n = 120
nnz(cholfact(∇²(m,n), perm=1:m*n)), nnz(cholfact(∇²(m,n), perm=nest(m,n))), nnz(cholfact(∇²(m,n)))
(1333457,435519,396160)
We can also look at the effect of different levels of nesting:
figure(figsize=(10,10))
n1, n2 = 10, 5
m = n1*n2
subplot(2,2,1)
F = cholfact(∇²(n1,n2), perm=1:m)
spy(sparse(F))
title("natural, $(nnz(F)/m^2*100)% fill", y=1.1)
subplot(2,2,2)
F = cholfact(∇²(n1,n2), perm=nest(n1,n2, 1))
spy(sparse(F))
title("1 dissection, $(nnz(F)/m^2*100)% fill", y=1.1)
subplot(2,2,3)
F = cholfact(∇²(n1,n2), perm=nest(n1,n2, 2))
spy(sparse(F))
title("2 dissections, $(nnz(F)/m^2*100)% fill", y=1.1)
subplot(2,2,4)
F = cholfact(∇²(n1,n2), perm=nest(n1,n2, 3))
spy(sparse(F))
title("3 dissections, $(nnz(F)/m^2*100)% fill", y=1.1)
PyObject <matplotlib.text.Text object at 0x32e4c2e50>