In this notebook, we construct some typical sparse matrices (the type that would arise from discretizing an elliptic PDE) in Julia, explore how they are represented and visualized, look at sparse-direct solvers via sparse-Cholesky factorization, and examine the impact of the matrix ordering on the sparse-direct solver.
As a "nice" example of a sparse matrix, which has a structure typical of many applications arising from discretized PDEs and similar situations, let us consider a discrete Laplacian, analogous to $-\nabla^2 = -\partial^2/\partial x^2 - \partial^2/\partial y^2$ on a 2d $n_1 \times n_2$ grid, with $m = n_1 n_2$ degrees of freedom. The $m\times m$ matrix resulting from a classic five-point stencil on a 2d grid is sparse ($< 5m$ nonzero entries) and is real-symmetric and positive-definite.
How this matrix is constructed is outside the scope of 18.335, but see my notes from 18.303 on the subject. Essentially, we first construct a 1d "difference" matrix $D$ analogous to $\partial/\partial x$ on a 1d grid with Dirichlet (zero) boundary conditions, then form the second-difference matrix analogous to $-\partial^2/\partial x^2$ via $D^T D$, and finally we put these 1d matrices together into a "2d" matrix via Kronecker products (`kron` in Julia). We use `spdiagm` to construct the (bidiagonal) difference matrices as sparse matrices in Julia, and as a result all of the subsequent matrix products are computed in an efficient sparse fashion.
# returns -∇² (discrete Laplacian, real-symmetric positive-definite) on n₁×n₂ grid
function ∇²(n₁,n₂)
o₁ = ones(n₁)
∂₁ = spdiagm((-o₁,o₁),(-1,0),n₁+1,n₁)
o₂ = ones(n₂)
∂₂ = spdiagm((-o₂,o₂),(-1,0),n₂+1,n₂)
return kron(speye(n₂), ∂₁'*∂₁) + kron(∂₂'*∂₂, speye(n₁))
end
∇² (generic function with 1 method)
The resulting matrix data structure (technically in compressed-sparse-column/CSC format) only stores the nonzero entries:
∇²(4,3)
12x12 sparse matrix with 46 Float64 entries:
	[1 ,  1]  =  4.0
	[2 ,  1]  =  -1.0
	[5 ,  1]  =  -1.0
	[1 ,  2]  =  -1.0
	[2 ,  2]  =  4.0
	[3 ,  2]  =  -1.0
	[6 ,  2]  =  -1.0
	[2 ,  3]  =  -1.0
	[3 ,  3]  =  4.0
	[4 ,  3]  =  -1.0
	⋮
	[6 , 10]  =  -1.0
	[9 , 10]  =  -1.0
	[10, 10]  =  4.0
	[11, 10]  =  -1.0
	[7 , 11]  =  -1.0
	[10, 11]  =  -1.0
	[11, 11]  =  4.0
	[12, 11]  =  -1.0
	[8 , 12]  =  -1.0
	[11, 12]  =  -1.0
	[12, 12]  =  4.0
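To make the CSC layout concrete, here is an illustrative pure-Python sketch of the same storage scheme. The field names `colptr`, `rowval`, and `nzval` mirror Julia's `SparseMatrixCSC` (including 1-based indices), but this is just a toy converter, not Julia's implementation:

```python
# Minimal illustration of compressed-sparse-column (CSC) storage:
# colptr[j] gives where column j's entries begin inside rowval/nzval.

def dense_to_csc(A):
    """Convert a dense matrix (list of rows) to (colptr, rowval, nzval)."""
    nrows, ncols = len(A), len(A[0])
    colptr, rowval, nzval = [1], [], []
    for j in range(ncols):
        for i in range(nrows):
            if A[i][j] != 0:
                rowval.append(i + 1)   # 1-based row index, as in Julia
                nzval.append(A[i][j])
        colptr.append(len(nzval) + 1)  # start of the next column
    return colptr, rowval, nzval

# A small tridiagonal example, like one block of the discrete Laplacian:
A = [[ 4, -1,  0],
     [-1,  4, -1],
     [ 0, -1,  4]]
colptr, rowval, nzval = dense_to_csc(A)
print(colptr)  # [1, 3, 6, 8]
print(rowval)  # [1, 2, 1, 2, 3, 2, 3]
print(nzval)   # [4, -1, -1, 4, -1, -1, 4]
```

Only the column pointers plus two length-`nnz` arrays are stored, so a column's entries can be found without scanning the whole matrix.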
We can use the `full` function to convert it back to a dense matrix, which is a bit easier to read (but grossly inefficient for large sizes):
full(∇²(4,3))
12x12 Array{Float64,2}:
  4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
 -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0  -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0  -1.0   4.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0
 -1.0   0.0   0.0   0.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0   0.0
  0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0   0.0
  0.0   0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0   0.0   0.0  -1.0   0.0
  0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0   0.0   0.0   0.0  -1.0
  0.0   0.0   0.0   0.0  -1.0   0.0   0.0   0.0   4.0  -1.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0  -1.0
  0.0   0.0   0.0   0.0   0.0   0.0   0.0  -1.0   0.0   0.0  -1.0   4.0
We can also use the `spy` function in PyPlot (analogous to the function of the same name in Matlab) to visualize the sparse-matrix structure:
using PyPlot
spy(∇²(10,5))
PyObject <matplotlib.image.AxesImage object at 0x31fab5750>
Since our matrix is real-symmetric positive-definite, we can solve systems of equations using the Cholesky factorization, which is provided in an abstract form by the `cholfact` function. For a sparse matrix, Julia calls the CHOLMOD sparse-direct Cholesky algorithms in SuiteSparse. In fact, the Cholesky factorization is also called automatically by `\` (falling back to LU for non-positive-definite matrices), and we can also call a sparse LU factorization (from UMFPACK) explicitly via `lufact`. Let's time them for a big problem:
A = ∇²(600,200) # a 120,000×120,000 matrix
b = rand(600*200)
@time A \ b
@time cholfact(A) \ b
@time lufact(A) \ b;
elapsed time: 1.232090894 seconds (967 kB allocated)
elapsed time: 1.188330829 seconds (957 kB allocated)
elapsed time: 1.627708219 seconds (11 MB allocated)
Note that storing a dense matrix of this size would require $8\times120000^2$ bytes, or about 100 GiB! On my machine, Cholesky factorization of a $1200\times1200$ dense matrix takes about 0.03 seconds, so scaling by $100^3$ (dense Cholesky costs $O(m^3)$ operations) yields an expected dense-matrix factorization time (on a theoretical version of my machine with enough memory) of about 8.3 hours.
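The arithmetic behind these estimates is worth spelling out; a quick sanity check (sketched here in Python, using the numbers quoted above):

```python
# Back-of-the-envelope check of the dense-matrix cost estimates.
m_small, m_big = 1200, 120000
ratio = m_big // m_small            # the big matrix is 100x larger

mem_bytes = 8 * m_big**2            # one Float64 (8 bytes) per entry
print(mem_bytes / 2**30)            # ≈ 107 GiB, i.e. "about 100 GiB"

t_small = 0.03                      # measured dense Cholesky time (seconds)
t_big = t_small * ratio**3          # O(m^3) flop scaling
print(t_big / 3600)                 # ≈ 8.3 hours
```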
The `spy` function doesn't know how to visualize a `cholfact` object directly, but we can convert it to an ordinary sparse matrix via `sparse` and then plot its nonzero pattern. By default, `cholfact` uses an approximate minimum-degree (AMD) ordering, which does a darn good job with the sparse-Laplacian matrix here:
spy(sparse(cholfact(∇²(10,5))))
title("sparse Cholesky factor, AMD ordering")
PyObject <matplotlib.text.Text object at 0x31fc37990>
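CHOLMOD's AMD is far more sophisticated (approximate degrees, aggressive absorption, postordering), but the core greedy idea is simple: repeatedly eliminate a vertex of minimum degree in the graph of the matrix, since eliminating a vertex adds fill only among its current neighbors. An illustrative pure-Python sketch of that exact (non-approximate) minimum-degree heuristic, not CHOLMOD's algorithm:

```python
# Greedy exact minimum-degree ordering on an adjacency-set graph.
# Eliminating vertex k turns its remaining neighbors into a clique,
# so picking a low-degree vertex keeps the added fill small.

def min_degree_order(adj):
    """Return an elimination order for adjacency sets adj (dict of sets)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # local, mutable copy
    order = []
    remaining = set(adj)
    while remaining:
        k = min(remaining, key=lambda v: len(adj[v] & remaining))
        order.append(k)
        remaining.discard(k)
        nbrs = adj[k] & remaining
        for u in nbrs:                 # clique the surviving neighbors
            adj[u] |= nbrs - {u}
    return order

# A 1d chain 0-1-2-3-4: minimum degree eliminates from an endpoint inward,
# which for a tridiagonal matrix produces no fill at all.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(min_degree_order(chain))
```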
To investigate the effect of different permutations on the Cholesky fill-in, it is useful to provide a permutation different than the default AMD permutation. Fortunately, the CHOLMOD package allows you to provide a user-defined permutation when computing the factorization, and Julia was recently updated to give access to this feature via a `perm` argument to `cholfact`.
if VERSION < v"0.4.0-dev+4289"
error("You need a newer version of Julia to access user-specified sparse-Cholesky permutations")
end
First, let us try sparse Cholesky with no reordering at all, by just passing `1:m` as the permutation. As expected, this results in a lot more fill-in: for an $n\times n$ grid, the natural (column-major) ordering yields a banded factor of bandwidth $n = \sqrt{m}$, and in particular we expect $\Theta(m^{3/2})$ fill:
spy(sparse(cholfact(∇²(10,5), perm=1:50)))
title("sparse Cholesky factor, natural ordering")
PyObject <matplotlib.text.Text object at 0x31fceba10>
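Fill-in is purely a property of the sparsity graph: eliminating vertex $k$ turns its not-yet-eliminated neighbors into a clique, and any clique edge not already present is fill. So the fill counts plotted below can be predicted symbolically, with no floating-point work at all. A pure-Python sketch (illustrative only, for the square-grid case in natural order):

```python
# Symbolic Cholesky: count factor nonzeros by graph elimination alone.

def grid_graph(n):
    """Adjacency sets for the n×n 5-point-stencil grid, vertices 0..n²-1."""
    adj = {v: set() for v in range(n * n)}
    for i in range(n):
        for j in range(n):
            v = i + j * n                  # column-major vertex index
            if i + 1 < n:
                adj[v].add(v + 1); adj[v + 1].add(v)
            if j + 1 < n:
                adj[v].add(v + n); adj[v + n].add(v)
    return adj

def cholesky_nnz(adj, order):
    """# of nonzeros in the Cholesky factor for an elimination order."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # local copy
    eliminated, nnz = set(), 0
    for k in order:
        nbrs = adj[k] - eliminated
        nnz += len(nbrs) + 1               # column k: neighbors + diagonal
        for u in nbrs:                     # clique the surviving neighbors
            adj[u] |= nbrs - {u}
        eliminated.add(k)
    return nnz

# natural ordering: the nonzero count grows much faster than the m ≈ 5n²
# nonzeros of the matrix itself
for n in (4, 8, 16):
    print(n, cholesky_nnz(grid_graph(n), range(n * n)))
```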
N = 10:10:400
m = N.^2
loglog(m, [nnz(cholfact(∇²(n,n), perm=1:n*n)) for n in N], "r*")
loglog(m, [nnz(cholfact(∇²(n,n))) for n in N], "bo")
loglog(m, m.^(3/2), "r-")
loglog(m, m .* log2(m), "b-")
xlabel(L"matrix size $m$")
ylabel("# nonzeros in Cholesky")
legend(["natural", "AMD", L"m^{3/2}", L"m \log_2(m)"], loc="upper left")
PyObject <matplotlib.legend.Legend object at 0x320485ed0>
Since we know that our matrix comes from a 5-point stencil on a perfectly regular grid, it is straightforward to implement a nested-dissection ordering by a little function that recursively divides the grid in two along the longer axis:
# return nested-dissection permutation for an m × n grid (in column-major order). ndissect is an optional
# number of dissection steps before reverting to the normal ordering, and lm is an optional leading dimension
# (used for indexing computations).
function nest(m,n, ndissect=30, lm=m)
if ndissect <= 0 || m*n < 5
return reshape([i + (j-1)*lm for i=1:m, j=1:n], m*n)
elseif m >= n
msep = div(m,2)
N1 = nest(msep-1,n, ndissect-1, lm)
N2 = nest(m-msep,n, ndissect-1, lm) + msep
Ns = msep + (0:n-1)*lm
return [N1; N2; Ns]
else
nsep = div(n,2)
N1 = nest(m,nsep-1, ndissect-1, lm)
N2 = nest(m,n-nsep, ndissect-1, lm) + nsep*lm
Ns = (1:m) + (nsep-1)*lm
return [N1; N2; Ns]
end
end
nest (generic function with 3 methods)
As a quick check, let's make sure that it outputs each index exactly once:
sort(nest(30,60)) == [1:30*60;]
true
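The same recursion is easy to transcribe into other languages; here is an illustrative pure-Python port of `nest` (keeping the 1-based, column-major indexing of the Julia version), with the same sanity check:

```python
# Nested-dissection permutation for an m×n grid: recursively split along
# the longer axis, ordering the separator row/column last.

def nest(m, n, ndissect=30, lm=None):
    """1-based nested-dissection permutation; lm is the leading dimension."""
    if lm is None:
        lm = m
    if ndissect <= 0 or m * n < 5:
        # natural column-major order of the (sub)grid
        return [i + (j - 1) * lm for j in range(1, n + 1)
                                 for i in range(1, m + 1)]
    if m >= n:                      # split along the row direction
        msep = m // 2
        N1 = nest(msep - 1, n, ndissect - 1, lm)
        N2 = [x + msep for x in nest(m - msep, n, ndissect - 1, lm)]
        Ns = [msep + j * lm for j in range(n)]      # separator row
    else:                           # split along the column direction
        nsep = n // 2
        N1 = nest(m, nsep - 1, ndissect - 1, lm)
        N2 = [x + nsep * lm for x in nest(m, n - nsep, ndissect - 1, lm)]
        Ns = [i + (nsep - 1) * lm for i in range(1, m + 1)]  # separator column
    return N1 + N2 + Ns

# same sanity check as above: every index appears exactly once
assert sorted(nest(30, 60)) == list(range(1, 30 * 60 + 1))
```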
Just by looking at the sparsity pattern, it's clearly a bit different from AMD:
spy(sparse(cholfact(∇²(10,5), perm=nest(10,5))))
title("sparse Cholesky factor, nested-dissection ordering")
PyObject <matplotlib.text.Text object at 0x3277b6710>
We can use the `nnz` function to compare the fill-in for the natural, nested-dissection, and AMD orderings on a $100\times120$ grid. CHOLMOD's default ordering (AMD plus a postorder permutation) is actually pretty darn good, and slightly beats our hand-rolled nested dissection; both are about 3 times better than the natural ordering:
m = 100
n = 120
nnz(cholfact(∇²(m,n), perm=1:m*n)), nnz(cholfact(∇²(m,n), perm=nest(m,n))), nnz(cholfact(∇²(m,n)))
(1333457,435519,396160)
We can also look at the effect of different levels of nesting:
figure(figsize=(10,10))
n1, n2 = 10, 5
m = n1*n2
subplot(2,2,1)
F = cholfact(∇²(n1,n2), perm=1:m)
spy(sparse(F))
title("natural, $(nnz(F)/m^2*100)% fill", y=1.1)
subplot(2,2,2)
F = cholfact(∇²(n1,n2), perm=nest(n1,n2, 1))
spy(sparse(F))
title("1 dissection, $(nnz(F)/m^2*100)% fill", y=1.1)
subplot(2,2,3)
F = cholfact(∇²(n1,n2), perm=nest(n1,n2, 2))
spy(sparse(F))
title("2 dissections, $(nnz(F)/m^2*100)% fill", y=1.1)
subplot(2,2,4)
F = cholfact(∇²(n1,n2), perm=nest(n1,n2, 3))
spy(sparse(F))
title("3 dissections, $(nnz(F)/m^2*100)% fill", y=1.1)
PyObject <matplotlib.text.Text object at 0x32e4c2e50>