In this notebook I want to show how you can solve a least-squares problem without resorting to a direct application of the normal equations. If you find yourself calling np.linalg.inv somewhere in your code, it should ring a bell: explicit matrix inverses can almost always be avoided, and they should be avoided! Here's how.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets

A, b, coef = datasets.make_regression(n_samples=1000, n_features=1,
                                       n_informative=1, noise=10,
                                       coef=True, random_state=0)
plt.plot(A[:,0], b, '.')
(Scatter plot of the feature A[:,0] against the targets b.)
print(coef)
82.1903908408
The solution to the least-squares problem can be found through the direct application of the normal equations: \begin{align*} \mathbf{A}^\top\mathbf{A}\mathbf{x} = \mathbf{A}^\top\mathbf{b} \\ \mathbf{x} = (\mathbf{A}^\top\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{b} \end{align*}
where $\mathbf{A}^\top\mathbf{A}$ is symmetric. Let's try it:
coef_inv = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(b)
print(coef_inv)
[ 82.11934255]
But does that mean you should compute the inverse explicitly? No! Although the inverse appears in the normal equations, we can avoid computing it, and this is almost always the case in matrix computations: forming the explicit inverse is not only computationally costly, it is also numerically unstable!
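One way to see the instability: the condition number of $\mathbf{A}^\top\mathbf{A}$ is the square of that of $\mathbf{A}$, so any ill-conditioning in the design matrix gets amplified when we form the Gram matrix. A minimal sketch (the Vandermonde matrix below is just a made-up ill-conditioned example, not part of our regression data):
# Hypothetical ill-conditioned design matrix, used only to illustrate how the
# condition number squares when forming the Gram matrix A^T A.
M = np.vander(np.linspace(1, 2, 20), 6)
print(np.linalg.cond(M))           # condition number of M
print(np.linalg.cond(M.T.dot(M)))  # roughly cond(M)**2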
Given the QR factorization of our design matrix, we can find the least-squares solution as follows: \begin{align*} \mathbf{A}\mathbf{x} = \mathbf{b} \\ (\mathbf{Q}\mathbf{R})\mathbf{x} = \mathbf{b} \\ \mathbf{Q}^\top\mathbf{Q}\mathbf{R}\mathbf{x} = \mathbf{Q}^\top\mathbf{b} \\ \mathbf{R}\mathbf{x} = \mathbf{Q}^\top\mathbf{b} \end{align*}
where the last line follows from the fact that $\mathbf{Q}$ is orthogonal ($\mathbf{Q}^\top\mathbf{Q} = \mathbf{I}$). Since $\mathbf{R}$ is upper triangular, we can efficiently solve the last system of equations through backward substitution:
from scipy.linalg import solve_triangular
Q,R = np.linalg.qr(A)
solve_triangular(R, Q.T.dot(b))
array([ 82.11934255])
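For the curious, backward substitution is simple enough to write by hand; here is a minimal sketch of the idea (solve_triangular remains the better choice in practice):
def back_substitution(R, y):
    """Solve R x = y for an upper-triangular, non-singular R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # subtract the contribution of the components already solved for
        x[i] = (y[i] - R[i, i+1:].dot(x[i+1:])) / R[i, i]
    return x

print(back_substitution(R, Q.T.dot(b)))  # should match solve_triangular above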
The complexity of the QR factorization is $\frac{2}{3}n^3+n^2+\frac{1}{3}n-2=O(n^3)$.
What if the inverse of $\mathbf{A}$ does not exist (for example, when $\mathbf{A}$ is not square)? We can instead use the pseudo-inverse $\mathbf{A}^+$, obtained from the SVD:
\begin{equation} \mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^\top \end{equation}The pseudo-inverse is then given as: \begin{equation} \mathbf{A}^+ = \mathbf{V}\mathbf{D}^+\mathbf{U}^\top \end{equation}
The matrix $\mathbf{D}^+$ is the diagonal matrix containing the reciprocals of the non-zero singular values of $\mathbf{A}$, of which there are $r$, the rank of $\mathbf{A}$: \begin{equation} \mathbf{D}^+ = \mbox{diag}(1/\sigma_1, ..., 1/\sigma_r, 0, ..., 0) \end{equation}
Note that the singular values of $\mathbf{A}$ are the square roots of the eigenvalues of $\mathbf{A}^\top\mathbf{A}$. It can be shown that the least-squares solution can be obtained through: \begin{equation} \mathbf{x}^+ = \mathbf{A}^+\mathbf{b} \end{equation}
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Apinv = Vt.T.dot(np.diag(1./s)).dot(U.T)   # pseudo-inverse A+ = V D+ U^T
print(Apinv.dot(b))
[ 82.11934255]
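NumPy can also build the pseudo-inverse for us with numpy.linalg.pinv, which uses the same SVD-based construction. As a sanity check on the relation mentioned above, the squared singular values should match the eigenvalues of $\mathbf{A}^\top\mathbf{A}$:
print(np.linalg.pinv(A).dot(b))                                    # same least-squares solution
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(A.T.dot(A))))  # sigma_i^2 = eigenvalues of A^T A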
If $\mathbf{A}$ is full-rank, calling [numpy.linalg.solve](http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html) on the normal equations will invoke the LAPACK routine gesv, which, according to the cvxopt documentation, uses the LU factorization.
np.linalg.solve(A.T.dot(A), A.T.dot(b))
array([ 82.11934255])
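Since $\mathbf{A}^\top\mathbf{A}$ is symmetric and, when $\mathbf{A}$ has full column rank, positive definite, we could also exploit that structure with a Cholesky factorization instead of the generic LU behind gesv. A sketch using scipy.linalg (assuming scipy is available and $\mathbf{A}$ has full column rank):
from scipy.linalg import cho_factor, cho_solve

c, low = cho_factor(A.T.dot(A))          # Cholesky factorization of the SPD Gram matrix
print(cho_solve((c, low), A.T.dot(b)))   # solves A^T A x = A^T b, no explicit inverse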
Looking at the source code of numpy.linalg.lstsq, we see that the LAPACK routine dgelsd is invoked under the hood. dgelsd is based on the SVD method that we have seen above.
np.linalg.lstsq(A, b)[0]
array([ 82.11934255])
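As a final check, the QR, SVD, solve and lstsq routes should all agree up to floating-point error:
x_qr = solve_triangular(R, Q.T.dot(b))
x_svd = np.linalg.pinv(A).dot(b)
x_ne = np.linalg.solve(A.T.dot(A), A.T.dot(b))
x_ls = np.linalg.lstsq(A, b)[0]
print(np.allclose(x_qr, x_svd) and np.allclose(x_ne, x_ls) and np.allclose(x_qr, x_ne))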