Dive into Deep Learning - 02: Training a Linear Regression Model
Define a linear model:

$$y = xW + B + \epsilon$$

where $x$ and $y$ are the input and output data, $W$ and $B$ are the parameters, and $\epsilon$ is random noise.
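In the batched matrix form that the code below implements (the shapes are taken from the generated data, and broadcasting $B$ across rows is an assumption consistent with the code):

$$Y = XW + B + E, \qquad X \in \mathbb{R}^{1000 \times 3},\; W \in \mathbb{R}^{3 \times 1},\; B \in \mathbb{R}^{1 \times 1},\; E \in \mathbb{R}^{1000 \times 1}.$$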
Now we use MXNet's random module to generate the features $x$ and labels $y$, and train this linear regression network with backpropagation; $W$ and $B$ are the network's weights.
from mxnet import autograd, nd
Generate the data

num_batches = 1000   # number of examples
num_inputs = 3

true_w = nd.array([[4.2, 2.0, -3.2]]).T   # shape (3, 1)
true_b = nd.array([[2.0]])                # shape (1, 1)

x = nd.random.normal(0, 1, shape=(num_batches, num_inputs))
epsilon = nd.random.normal(0, 0.01, shape=(num_batches, 1))
y = nd.dot(x, true_w) + true_b + epsilon  # add the noise term once

print(true_w.shape)
print(true_b.shape)
print(x.shape)
print(y.shape)
print(epsilon.shape)
print(x[0:10])
print(y[0:10])
(3, 1)
(1, 1)
(1000, 3)
(1000, 1)
(1000, 1)
[[ 1.1630787 0.4838046 0.29956347]
[ 0.15302546 -1.1688148 1.5580711 ]
[-0.5459446 -2.3556297 0.5414402 ]
[ 2.6785066 1.2546344 -0.54877394]
[-0.68106437 -0.13531584 0.37723127]
[ 0.41016445 0.5712682 -2.7579627 ]
[ 1.07628 -0.6141326 1.8307649 ]
[-1.1468065 0.05383795 -2.5074806 ]
[-0.59164983 0.8586049 -0.22794184]
[ 0.20131476 0.35005474 0.5360521 ]]
<NDArray 10x3 @cpu(0)>
[[ 6.9298854]
[-4.676832 ]
[-6.7715883]
[17.516018 ]
[-2.3353257]
[13.697228 ]
[-0.5571796]
[ 5.297142 ]
[ 1.9715714]
[ 1.8121778]]
<NDArray 10x1 @cpu(0)>
Read the data

Write a custom iterator that fetches the data one mini-batch at a time.
import random

iter_batch_size = 10

def data_iter():
    # Shuffle the indices so each pass sees the examples in a fresh random order.
    idx = list(range(num_batches))
    random.shuffle(idx)
    for i in range(0, num_batches, iter_batch_size):
        j = nd.array(idx[i : min(i + iter_batch_size, num_batches)])
        yield nd.take(x, j), nd.take(y, j)   # nd.take selects rows by index
# Print the first two mini-batches as a sanity check.
n = 0
for data, label in data_iter():
    print(data, label)
    n += 1
    if n >= 2:
        break
[[ 4.2505121e+00 -2.1137807e-03 -7.9776019e-01]
[ 4.9009416e-01 -1.1063820e+00 3.5573665e-02]
[-1.2949859e-01 -3.0296946e-02 -1.7266597e-01]
[-1.5107599e+00 -9.6534061e-01 5.4608238e-01]
[ 7.8113359e-01 -1.1420447e+00 -2.8238511e-01]
[ 2.6581082e-01 5.7875359e-01 -9.6763712e-01]
[ 5.7725173e-01 3.8847062e-01 -1.2530572e+00]
[ 1.1652315e+00 1.6189508e-01 -1.6221091e-01]
[-2.1707486e-01 -6.4814579e-01 9.0141118e-01]
[ 1.1282748e+00 2.5994456e+00 -2.0640564e+00]]
<NDArray 10x3 @cpu(0)>
[[22.389832 ]
[ 1.707745 ]
[ 1.9283633]
[-8.028843 ]
[ 3.9277847]
[ 7.373327 ]
[ 9.212286 ]
[ 7.764362 ]
[-3.1022215]
[18.526558 ]]
<NDArray 10x1 @cpu(0)>
[[ 0.1787789 -0.18420085 -0.08212578]
[-1.3121095 -0.04268014 -1.0699745 ]
[ 0.20131476 0.35005474 0.5360521 ]
[-1.365017 0.69103366 -0.4321104 ]
[-0.04626409 -1.0672387 -2.0273046 ]
[-0.6345202 -0.10353587 -1.3175181 ]
[ 0.7618796 -1.1695448 0.7909283 ]
[ 0.19799927 -0.10506796 -1.3348366 ]
[ 1.6388271 0.59673244 1.1476266 ]
[ 2.1935298 -0.5385921 -0.8611334 ]]
<NDArray 10x3 @cpu(0)>
[[ 2.6571708 ]
[-0.15536022]
[ 1.8121778 ]
[-0.964692 ]
[ 6.1714783 ]
[ 3.366683 ]
[ 0.34661472]
[ 6.8972564 ]
[ 6.4096384 ]
[12.895386 ]]
<NDArray 10x1 @cpu(0)>
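For comparison (not part of the original lesson), mxnet.gluon ships an equivalent batching utility. A minimal sketch, assuming mxnet.gluon is available in this environment:

from mxnet.gluon import data as gdata

# Wrap the arrays in a dataset and let DataLoader shuffle and batch them,
# mirroring what the hand-written data_iter above does.
dataset = gdata.ArrayDataset(x, y)
loader = gdata.DataLoader(dataset, batch_size=iter_batch_size, shuffle=True)
for data, label in loader:
    print(data.shape, label.shape)   # (10, 3) (10, 1)
    break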
Initialize the model parameters

w = nd.random.normal(shape=(num_inputs, 1))
b = nd.zeros((1,))
params = [w, b]

print(w)
print(b)
print(params)
[[-1.3058136]
[ 0.9344402]
[ 0.5380863]]
<NDArray 3x1 @cpu(0)>
[0.]
<NDArray 1 @cpu(0)>
[
[[-1.3058136]
[ 0.9344402]
[ 0.5380863]]
<NDArray 3x1 @cpu(0)>,
[0.]
<NDArray 1 @cpu(0)>]
Attach gradient buffers to the parameters, in preparation for computing gradients with respect to them later.
for param in params:
    param.attach_grad()
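As a minimal illustration of the attach_grad / record / backward workflow (a sketch, not part of the original lesson):

z = nd.array([1.0, 2.0, 3.0])
z.attach_grad()                 # allocate a gradient buffer for z
with autograd.record():         # record the forward computation
    f = (z ** 2).sum()          # f = sum_i z_i^2
f.backward()                    # fills z.grad with df/dz
print(z.grad)                   # [2. 4. 6.], i.e. 2 * z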
Define the model

def net(x):
    return nd.dot(x, w) + b
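The output below looks like a quick check of net on the last mini-batch from the iterator demo; the cell that produced it did not survive this export, but it was presumably along the lines of:

print(net(data))   # predictions from the still randomly initialized parameters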
[[-0.44976735]
 [ 1.0977497 ]
 [ 0.35266796]
 [ 2.1956747 ]
 [-2.0277233 ]
 [ 0.02287859]
 [-1.6621547 ]
 [-1.0749872 ]
 [-0.9648698 ]
 [-3.8309872 ]]
Define the loss function

def square_loss(yhat, y):
    # Reshape y to yhat's shape to avoid unintended broadcasting.
    return (yhat - y.reshape(yhat.shape)) ** 2
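A quick worked check of square_loss on a tiny hand-made batch (the values here are made up for illustration):

yhat = nd.array([[1.0], [2.0]])
ytrue = nd.array([0.5, 2.5])       # shape (2,), reshaped inside square_loss
print(square_loss(yhat, ytrue))    # [[0.25] [0.25]]: (1-0.5)^2 and (2-2.5)^2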
Define the optimization routine: stochastic gradient descent (SGD)

def sgd(params, learning_rate):
    for param in params:
        # In-place update ([:]) so the gradient buffer attached to param stays valid.
        param[:] = param - learning_rate * param.grad
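In equation form, each call to sgd applies, for every parameter $\theta$,

$$\theta \leftarrow \theta - \eta \, \nabla_\theta \ell,$$

where $\eta$ is learning_rate and $\ell$ is the mini-batch loss. Note that calling loss.backward() on a non-scalar sums the per-example losses first, so the gradient scales with the batch size, which is one reason a small learning rate such as 0.001 works well here.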
Train the model

epochs = 5
learning_rate = .001

for e in range(epochs):
    total_loss = 0
    for data, label in data_iter():
        with autograd.record():            # record the forward pass
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()                    # gradient of the summed batch loss
        sgd(params, learning_rate)         # update w and b in place
        total_loss += nd.sum(loss).asscalar()
    print("Epoch %d, average loss: %f" % (e, total_loss / num_batches))
Epoch 0, average loss: 12.393637
Epoch 1, average loss: 0.164320
Epoch 2, average loss: 0.002606
Epoch 3, average loss: 0.000418
Epoch 4, average loss: 0.000386
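To visualize the convergence, one could log the per-epoch average losses and plot them; a sketch using matplotlib (not in the original lesson), reusing the numbers printed above:

import matplotlib.pyplot as plt

losses = [12.393637, 0.164320, 0.002606, 0.000418, 0.000386]  # from the run above
plt.plot(range(len(losses)), losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("average loss")
plt.yscale("log")   # log scale makes the rapid early convergence visible
plt.show()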
After training, compare the learned parameters with the true ones.
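The comparison cell itself is missing from this export; judging from the two tuple printouts below, it presumably evaluated the pairs like so:

print((true_b, b))   # true bias vs. learned bias
print((true_w, w))   # true weights vs. learned weights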
(
[[2.]]
<NDArray 1x1 @cpu(0)>,
[2.0011895]
<NDArray 1 @cpu(0)>)
(
[[ 4.2]
[ 2. ]
[-3.2]]
<NDArray 3x1 @cpu(0)>,
[[ 4.1996527]
[ 2.000402 ]
[-3.199496 ]]
<NDArray 3x1 @cpu(0)>)
As you can see, the learned parameters are very close to the true ones, and the loss converges steadily as training proceeds.
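As an extra sanity check (not in the original lesson), linear regression also has a closed-form least-squares solution; a short sketch using NumPy, assuming it is installed:

import numpy as np

# Append a column of ones so the bias is estimated jointly with the weights.
Xn = np.concatenate([x.asnumpy(), np.ones((num_batches, 1))], axis=1)
theta, *_ = np.linalg.lstsq(Xn, y.asnumpy(), rcond=None)
print(theta)   # first three rows ≈ true_w, last row ≈ true_b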
Reference: Mu Li's Bilibili course "[MXNet/Gluon] 动手学深度学习第一课:从上手到多类分类" (Dive into Deep Learning, Lesson 1: From Getting Started to Multiclass Classification).