Recent Julia issues like #5705 have adjusted the bounds on linear algebra tests. The purpose of this notebook is to provide more extensive numerical demonstrations of why the current linear algebra tests are fundamentally broken and need to be systematically replaced by more theoretically justified error bounds, as explained in #5605.
Here is a snippet of code representing a typical linear algebra test in Julia:
#Test linear solve on upper triangular matrices
n=10
eltya=Float64 #eltype of matrix
eltyb=Float64 #eltype of vector
A=convert(Matrix{eltya}, randn(n,n))
A=triu(A)
b=convert(Vector{eltyb}, randn(n))
ε = max(eps(abs(float(one(eltya)))),eps(abs(float(one(eltyb)))))
x = A \ b
#@test_approx_eq_eps triu(a)*x b 20000ε
#Instead of the actual assertion, show the values being compared
@show norm(A*x), norm(b), 20000ε
abs(norm(A*x) - norm(b))/ε
(norm(*(A,x)),norm(b),*(20000,ε)) => (4.026090098980171,4.026090098980171,4.440892098500626e-12)
0.0
Mathematically, the test being performed corresponds to asserting that the inequality

$$\left| \|Ax\| - \|b\| \right| \le C\epsilon$$

is valid, where ϵ is machine epsilon and C = 20000 is an arbitrary magic constant.
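Spelled out as a plain Boolean check, the assertion amounts to the following (the helper name is purely illustrative):

#Illustrative helper: the Boolean check performed by the test, with magic constant C
passes_magic_test(A, x, b, C, ε) = abs(norm(A*x) - norm(b)) <= C*ε
passes_magic_test(A, x, b, 20000, ε)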
Most of the time, this test will appear to be well-behaved. The deviation between Ax and b appears to be small in norm, and it looks like the magic number 20000 would obviously cover any contingency. But how robust is this observation?
t=10^6
n=10
T=Float64
b=randn(n)
c=zeros(t) #log10 of magic numbers
ε=eps(real(one(T)))
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x = A \ b
    c[i] = max(log10(abs(norm(A*x)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
end
@show minimum(c), maximum(c)
x, y = hist(c, 50)
(minimum(c),maximum(c)) => (-2.0,12.659850021280592)
(-2.5:0.5:13.0,[67288,0,0,0,0,99814,180104,156724,160350,120193 … 65,24,14,0,5,0,0,0,0,1])
The statistics show that every now and then, a nearly singular matrix is sampled, causing the magic number to blow up.
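To see this directly, we can force near-singularity by hand by shrinking one diagonal entry (purely illustrative; the constant 1e-12 is arbitrary):

#Force a nearly singular upper triangular matrix
A = triu(randn(n,n))
A[1,1] = 1e-12 #shrink one pivot so that A is nearly singular
x = A \ b
@show cond(A), abs(norm(A*x)-norm(b))/ε #both typically far exceed 20000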
Where do the old and new bounds 15000ϵ and 20000ϵ lie on the distribution?
using Gadfly
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("log10(c)"), Guide.YLabel("Density"),
xintercept=[log10(15000), log10(20000)], Geom.vline(color="orange"))
Despite the large change in the absolute magnitude of the magic constant, the threshold shifts only slightly on this histogram of log10(c), because the distribution of the magic number c has an extremely long tail. What is the probability that a given random matrix will fail the old and new cutoffs?
for cutoff in (15000, 20000)
    p = sum(c .> log10(cutoff))/t
    @show cutoff, p
end
(cutoff,p) => (15000,0.032434)
(cutoff,p) => (20000,0.027863)
The change of the magic number in PR #5707 decreases the probability of failure by an essentially negligible amount.
Plotting the failure rate against log10(c) shows that, in the grand scheme of things, the change in the magic constant accomplishes very little: it would take an enormous increase in c to make the failure rate negligible, by which point the test would be too loose to be practically useful.
plot(x=x, y=1-cumsum(y)/t, Geom.line, Guide.XLabel("log10(c)"), Guide.YLabel("Probability of failure"),
xintercept=[log10(15000), log10(20000)], Geom.vline(color="Orange"))
Let's have a look at which method is being dispatched by the linear solver.
@which A\b
@which \(Triangular(A), b)
@which Base.LinAlg.naivesub!(Triangular(A), b)
For a triangular matrix A, this either dispatches to the LAPACK xTRTRS subroutine, which in turn calls the substitution solver xTRSM, or to the naïve substitution solver Base.LinAlg.naivesub!, depending on the element type.
What does theory tell us about the error bounds on substitution? You can derive an error bound using first-order matrix perturbation theory, or if you're lazy, consult a reference like Higham's Accuracy and Stability of Numerical Algorithms, Ch. 8 (pdf).
Error bound results usually come in two flavors. First, we have forward error bounds, which bound the norm of the difference between the computed solution x̂ and the exact solution x.
For the triangular system, we have the forward error bound (Higham, Theorem 8.5):
$$\frac{\|x - \hat{x}\|_\infty}{\|x\|_\infty} \le \frac{\mathrm{cond}(A,x)\,\gamma_n}{1 - \mathrm{cond}(A)\,\gamma_n}$$

with constant

$$\gamma_n = \frac{n\epsilon}{1 - n\epsilon}$$

and the Skeel condition numbers

$$\mathrm{cond}(A,x) = \frac{\big\| \, |A^{-1}|\,|A|\,|x| \, \big\|_\infty}{\|x\|_\infty}$$

and

$$\mathrm{cond}(A) = \big\| \, |A^{-1}|\,|A| \, \big\|_\infty,$$

where |A| means that the modulus is taken componentwise for every element of A.
Second, we also have backward error bounds, which assert that the computed solution x̂ = A\b is an exact solution to a slightly perturbed problem (A+δA)x̂ = b+δb. Sometimes results are also known if only A or only b is assumed to be perturbed.
For the triangular system, we have the backward error bound (Higham, Theorem 8.3)
$$|\delta A| \le \gamma_n |A|,$$

where only the matrix A is perturbed.
Ordinarily, the forward error bound is useless in practice since the exact solution x is not known. However, thanks to Julia's BigFloats, we can approximate the exact solution with the one returned by big(A) \ big(b).
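As a quick sanity check that the BigFloat solve is accurate enough to stand in for the exact solution, compare the residuals of the Float64 and BigFloat solves of the same system (this check is illustrative only):

#Compare residuals of the Float64 and BigFloat solves
A = convert(Matrix{Float64}, triu(randn(n,n)))
@show norm(A*(A\b) - b) #Float64 residual, on the order of machine epsilon
@show norm(big(A)*(big(A)\big(b)) - big(b)) #BigFloat residual, many orders of magnitude smaller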
#Define the Skeel condition numbers
cond_skeel(A::AbstractMatrix) = norm(abs(inv(A))*abs(A), Inf)
cond_skeel(A::AbstractMatrix, x::AbstractVector) = norm(abs(inv(A))*abs(A)*abs(x), Inf)/norm(x, Inf)
cond_skeel (generic function with 2 methods)
t=10^4
n=10
T=Float64
b=randn(n)
c=zeros(t) #log10 of magic numbers
r=zeros(t) #ratio of lhs/rhs in inequality
ε=eps(real(one(T)))
γ = n*ε/(1-n*ε) #γ_n
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x̂ = A \ b #we called this x earlier
    c[i] = max(log10(abs(norm(A*x̂)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
    bigA = big(A)
    x = bigA \ b #x is now the exact solution
    lhs = norm(x̂-x,Inf)/norm(x,Inf)
    rhs = cond_skeel(bigA, x)*γ/(1-cond_skeel(bigA)*γ)
    r[i] = lhs/rhs
end
plot(x=c, y=r, Guide.XLabel("log10(c)"), Guide.YLabel("lhs/rhs"))
There doesn't seem to be much correlation between the originally coded test and the test using the forward error bound. But at least the forward error bound test seems to be robust; every point has lhs <= rhs.
x, y=hist(r, 100)
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("lhs/rhs"), Guide.YLabel("Density"),
xintercept=[1.0], Geom.vline(color="orange"))
mean(r), std(r)/sqrt(t-1) #mean with its standard error
(8.240937806544264e-5,3.067119378454114e-6)
The inequality seems to be fairly loose, but at least nothing violates it.
From the implicit definition of the perturbed matrix δA in

$$(A + \delta A)\hat{x} = b,$$

we can rearrange this to get

$$\delta A\,\hat{x} = b - A\hat{x}.$$

Taking the norm gives an inequality on the residual:

$$\|b - A\hat{x}\| = \|\delta A\,\hat{x}\| = \big\| \, |\delta A\,\hat{x}| \, \big\| \le \gamma_n \big\| \, |A|\,|\hat{x}| \, \big\| \le \gamma_n \|A\| \, \|\hat{x}\|.$$
t=10^4
n=10
T=Float64
b=randn(n)
c=zeros(t) #log10 of magic numbers
r=zeros(t) #ratio of lhs/rhs in inequality
ε=eps(real(one(T)))
γ = n*ε/(1-n*ε) #γ_n
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x = A \ b
    c[i] = max(log10(abs(norm(A*x)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
    resid = norm(abs(b - A*x))
    rhs = γ * norm(A) * norm(x)
    r[i] = resid/rhs
end
plot(x=c, y=r, Guide.XLabel("log10(c)"), Guide.YLabel("lhs/rhs"))
The original Julia test doesn't seem to correlate with the test derived from the backward error result either. But at least no sampled matrix violates the backward error bound so far.
x, y = hist(r, 25)
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("lhs/rhs"), Guide.YLabel("Density"))
mean(r), std(r)/sqrt(t-1)
(0.005644636805875966,4.95668601105437e-5)
This Issue Note describes two replacement tests for the existing test of the linear solve with a triangular matrix. Numerical experiments indicate that the new tests are more robust and exhibit no spurious failures when applied to a moderate number (10^4) of randomly sampled matrices.
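For concreteness, here is a sketch of how the two bounds could be phrased as assertions; the exact form and any safety factors would need review before being adopted (the variable names follow the cells above):

#Sketch of the proposed replacement tests
using Base.Test

A = convert(Matrix{Float64}, triu(randn(n,n)))
x̂ = A \ b
γ = n*ε/(1-n*ε)

#Backward error test: the residual is bounded by γ_n‖A‖‖x̂‖
@test norm(b - A*x̂) <= γ*norm(A)*norm(x̂)

#Forward error test: compare against the BigFloat solution as a proxy for the exact x
bigA = big(A)
x = bigA \ b
@test norm(x̂ - x, Inf)/norm(x, Inf) <= cond_skeel(bigA, x)*γ/(1 - cond_skeel(bigA)*γ)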