Consider a ridge regression problem as follows:
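With the squared loss and squared L2 regularizer used in the code below, the objective can be written as

$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2}\left(w^\top x_i - y_i\right)^2 + \frac{\lambda}{2} \|w\|_2^2,$$

where $(x_i, y_i)$ are the observed samples and $\lambda > 0$ is the regularization coefficient (set to $10^{-4}$ in the example below).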
This problem is a special case of a more general family of problems called regularized empirical risk minimization, where the objective function is composed of two parts: a sum of per-sample loss terms and a regularization term.
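In this general form, the objective is

$$f(w) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{loss}\bigl(\mathrm{pred}(w; x_i),\, y_i\bigr) + r(w),$$

where $\mathrm{pred}$ is a prediction model, $\mathrm{loss}$ measures the discrepancy between a prediction and the corresponding response, and $r$ is the regularizer. Ridge regression corresponds to linear prediction, squared loss, and squared L2 regularization.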
Now, we show how to use the SGDOptim package to solve such a problem. First, we prepare some simulated data:
w = [3.0, -4.0, 5.0]; # the underlying model coefficients
3-element Array{Float64,1}:
  3.0
 -4.0
  5.0
n = 10000; X = randn(3, n); # generate 10000 sample features
3x10000 Array{Float64,2}:
  1.20754  -0.106163  1.2656   -0.195827  …   0.926012  1.33045    0.24129
  1.36097  -0.86756   1.05529   0.609302      1.11368  -1.57341   -1.165
 -1.05901   0.648764  0.67563   0.276843     -0.262188  0.425759  -0.052067
sig = 0.1; y = vec(w'X) + sig * randn(n); # generate the responses, adding some noise
10000-element Array{Float64,1}:
  -7.16593
   6.56201
   2.89612
  -1.69577
   4.10703
  11.8532
  14.9927
   3.83893
 -22.1367
 -10.1685
  -6.24833
  -3.17315
   5.49029
   ⋮
  10.8841
  11.0071
  -7.784
  12.7698
  -1.27293
  -2.37607
  -5.78557
  -8.26047
  -9.313
  -3.02949
  12.331
   5.12746
Now, let's try to estimate $w$ from the data. Start by loading the package:
using SGDOptim
The first step is to construct a risk model, which comprises a prediction model and a loss function.
rmodel = riskmodel(LinearPred(3), # use linear prediction x -> w'x, 3 is the input dimension
SqrLoss()) # use squared loss: loss(u, y) = (u - y)^2/2
EmpiricalRisks.SupervisedRiskModel{EmpiricalRisks.LinearPred,EmpiricalRisks.SqrLoss}(EmpiricalRisks.LinearPred(3),EmpiricalRisks.SqrLoss())
Now, we are ready to solve the problem:
w_e = sgd(rmodel,
zeros(3), # the initial guess
minibatch_seq(X, y, 10), # supply the data in mini-batches, each with 10 samples
reg = SqrL2Reg(1.0e-4), # add a squared L2 regularizer with coefficient 1.0e-4
lrate = t->1.0/(100.0 + t), # set the rule of learning rate
cbinterval = 100, # invoke the callback every 100 iterations
callback = simple_trace) # print the optimization trace in callback
Iter  100: avg.loss = 6.5156e-03
Iter  200: avg.loss = 4.5992e-03
Iter  300: avg.loss = 4.8039e-03
Iter  400: avg.loss = 8.1380e-03
Iter  500: avg.loss = 7.6406e-03
Iter  600: avg.loss = 1.0267e-02
Iter  700: avg.loss = 1.6335e-02
Iter  800: avg.loss = 1.3647e-02
Iter  900: avg.loss = 1.1578e-02
Iter 1000: avg.loss = 1.0762e-02
3-element Array{Float64,1}:
  3.00284
 -3.99607
  5.0021
Note that the 10000 samples were partitioned into 1000 mini-batches of size 10, so the optimization ran for 1000 iterations, each processing one mini-batch.
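To make the procedure concrete, here is a plain-Julia sketch of the kind of update `sgd` performs under this configuration: squared loss, a squared L2 regularizer, and the 1/(100 + t) learning-rate rule. This is an illustrative re-implementation, not SGDOptim's actual code; the function `sgd_sketch` and its keyword names are made up here, and the loss gradient is summed over the mini-batch (one common convention).

```julia
# Illustrative mini-batch SGD for the ridge objective above.
# NOTE: `sgd_sketch` is a hypothetical helper, not part of SGDOptim;
# it only mimics the kind of update the package performs.
function sgd_sketch(X::Matrix{Float64}, y::Vector{Float64};
                    batch::Int = 10,
                    lambda::Float64 = 1.0e-4,
                    lrate = t -> 1.0 / (100.0 + t))
    d, n = size(X)
    w = zeros(d)
    t = 0
    for first in 1:batch:n
        t += 1
        idx = first:min(first + batch - 1, n)
        Xb = X[:, idx]
        yb = y[idx]
        # gradient of the squared loss (u - y)^2 / 2, summed over the batch
        g = Xb * (Xb' * w - yb)
        # gradient of the squared L2 regularizer (lambda/2)*|w|^2
        g += lambda * w
        # decaying-learning-rate step
        w -= lrate(t) * g
    end
    return w
end
```

On data generated as above, this sketch drives the relative error of the estimate down to a small value, mirroring the behavior of the `sgd` call.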
Now let's compare the estimated solution with the ground-truth:
sumabs2(w_e - w) / sumabs2(w) # relative squared error; sumabs2 is the sum of squared entries
5.582303599714307e-7
The result looks quite accurate. We are done!