Dive into Deep Learning - 02 Training a Linear Regression Model

Define a linear model:

$$y = xW + B + \epsilon$$

where $x$ and $y$ are the input and output data, $W$ and $B$ are the parameters, and $\epsilon$ is random noise.
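Concretely, with the true parameters used to generate the data below, each example satisfies

$$y_i = 4.2\,x_{i1} + 2.0\,x_{i2} - 3.2\,x_{i3} + 2.0 + \epsilon_i$$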

Now we use MXNet's random module to generate the features $x$ and labels $y$, then train this linear regression network with backpropagation; $W$ and $B$ are the network's weights.

from mxnet import autograd, nd

Generate the data

num_batches = 1000  # number of examples to generate
num_inputs = 3      # number of features per example

true_w = nd.array([[4.2, 2.0, -3.2]]).T
true_b = nd.array([[2.0]])
x = nd.random.normal(0, 1, shape=(num_batches, num_inputs))
epsilon = nd.random.normal(0, 0.01, shape=(num_batches, 1)) # noise ~ N(0, 0.01^2): mean 0, standard deviation 0.01
y = nd.dot(x, true_w) + true_b + epsilon

print(true_w.shape)
print(true_b.shape)
print(x.shape)
print(y.shape)
print(epsilon.shape)

print(x[0:10])
print(y[0:10])
(3, 1)
(1, 1)
(1000, 3)
(1000, 1)
(1000, 1)

[[ 1.1630787   0.4838046   0.29956347]
 [ 0.15302546 -1.1688148   1.5580711 ]
 [-0.5459446  -2.3556297   0.5414402 ]
 [ 2.6785066   1.2546344  -0.54877394]
 [-0.68106437 -0.13531584  0.37723127]
 [ 0.41016445  0.5712682  -2.7579627 ]
 [ 1.07628    -0.6141326   1.8307649 ]
 [-1.1468065   0.05383795 -2.5074806 ]
 [-0.59164983  0.8586049  -0.22794184]
 [ 0.20131476  0.35005474  0.5360521 ]]
<NDArray 10x3 @cpu(0)>

[[ 6.9298854]
 [-4.676832 ]
 [-6.7715883]
 [17.516018 ]
 [-2.3353257]
 [13.697228 ]
 [-0.5571796]
 [ 5.297142 ]
 [ 1.9715714]
 [ 1.8121778]]
<NDArray 10x1 @cpu(0)>

Read the data

Write a custom iterator that fetches the data one batch at a time.

import random

iter_batch_size = 10

def data_iter():
    # produce a random index sequence
    idx = list(range(num_batches))  # index sequence: 0, 1, 2, ..., 999
    random.shuffle(idx)             # shuffle the indices

    # take 10 random examples at a time
    for i in range(0, num_batches, iter_batch_size):  # without shuffling this would yield 0, 10, 20, ..., 990
        j = nd.array(idx[i : min(i + iter_batch_size, num_batches)])  # index slices: [0:10], [10:20], ...
        yield nd.take(x, j), nd.take(y, j)  # return one batch of features and labels

# Test the iterator: fetch two batches
n = 0
for data, label in data_iter():
    print(data, label)
    n += 1
    if n >= 2:
        break
[[ 4.2505121e+00 -2.1137807e-03 -7.9776019e-01]
 [ 4.9009416e-01 -1.1063820e+00  3.5573665e-02]
 [-1.2949859e-01 -3.0296946e-02 -1.7266597e-01]
 [-1.5107599e+00 -9.6534061e-01  5.4608238e-01]
 [ 7.8113359e-01 -1.1420447e+00 -2.8238511e-01]
 [ 2.6581082e-01  5.7875359e-01 -9.6763712e-01]
 [ 5.7725173e-01  3.8847062e-01 -1.2530572e+00]
 [ 1.1652315e+00  1.6189508e-01 -1.6221091e-01]
 [-2.1707486e-01 -6.4814579e-01  9.0141118e-01]
 [ 1.1282748e+00  2.5994456e+00 -2.0640564e+00]]
<NDArray 10x3 @cpu(0)> 
[[22.389832 ]
 [ 1.707745 ]
 [ 1.9283633]
 [-8.028843 ]
 [ 3.9277847]
 [ 7.373327 ]
 [ 9.212286 ]
 [ 7.764362 ]
 [-3.1022215]
 [18.526558 ]]
<NDArray 10x1 @cpu(0)>

[[ 0.1787789  -0.18420085 -0.08212578]
 [-1.3121095  -0.04268014 -1.0699745 ]
 [ 0.20131476  0.35005474  0.5360521 ]
 [-1.365017    0.69103366 -0.4321104 ]
 [-0.04626409 -1.0672387  -2.0273046 ]
 [-0.6345202  -0.10353587 -1.3175181 ]
 [ 0.7618796  -1.1695448   0.7909283 ]
 [ 0.19799927 -0.10506796 -1.3348366 ]
 [ 1.6388271   0.59673244  1.1476266 ]
 [ 2.1935298  -0.5385921  -0.8611334 ]]
<NDArray 10x3 @cpu(0)> 
[[ 2.6571708 ]
 [-0.15536022]
 [ 1.8121778 ]
 [-0.964692  ]
 [ 6.1714783 ]
 [ 3.366683  ]
 [ 0.34661472]
 [ 6.8972564 ]
 [ 6.4096384 ]
 [12.895386  ]]
<NDArray 10x1 @cpu(0)>
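As an aside, the same shuffled batching is available from MXNet's built-in data utilities; a minimal sketch using gluon's ArrayDataset and DataLoader, equivalent to data_iter() above (assuming the gluon data API is available in your MXNet version):

from mxnet.gluon.data import ArrayDataset, DataLoader

dataset = ArrayDataset(x, y)  # pair features with labels
loader = DataLoader(dataset, batch_size=iter_batch_size, shuffle=True)
for data, label in loader:    # yields shuffled 10-example batches, like data_iter()
    print(data.shape, label.shape)
    break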

Initialize the model parameters

w = nd.random.normal(shape=(num_inputs, 1))
b = nd.zeros((1,))
params = [w, b]
print(w)
print(b)
print(params)
[[-1.3058136]
 [ 0.9344402]
 [ 0.5380863]]
<NDArray 3x1 @cpu(0)>

[0.]
<NDArray 1 @cpu(0)>
[
[[-1.3058136]
 [ 0.9344402]
 [ 0.5380863]]
<NDArray 3x1 @cpu(0)>, 
[0.]
<NDArray 1 @cpu(0)>]

Attach gradient buffers to the parameters, in preparation for computing gradients later.

for param in params:
    param.attach_grad()
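attach_grad allocates a gradient buffer on each parameter; a later backward() call fills those buffers with the gradients of whatever computation was recorded. A toy illustration (hypothetical variable z, unrelated to the model):

z = nd.array([1.0, 2.0, 3.0])
z.attach_grad()              # allocate a buffer for the gradient
with autograd.record():      # record the computation graph
    out = (z * z).sum()      # out = z1^2 + z2^2 + z3^2
out.backward()               # fill z.grad with d(out)/dz = 2z
print(z.grad)                # [2. 4. 6.]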

Define the model

def net(x):
    return nd.dot(x, w) + b  # y_hat, the model's prediction

# Try the model on the last batch left over from the iterator test above
net(data)


[[-0.44976735]
 [ 1.0977497 ]
 [ 0.35266796]
 [ 2.1956747 ]
 [-2.0277233 ]
 [ 0.02287859]
 [-1.6621547 ]
 [-1.0749872 ]
 [-0.9648698 ]
 [-3.8309872 ]]
<NDArray 10x1 @cpu(0)>

Define the loss function

def square_loss(yhat, y):
    # reshape y to avoid accidental automatic broadcasting
    return (yhat - y.reshape(yhat.shape)) ** 2
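The reshape is not cosmetic: if y ever arrives with shape (10,) while yhat has shape (10, 1), the subtraction broadcasts to a (10, 10) matrix and the loss is silently wrong. A quick demonstration with toy shapes:

a = nd.ones((3, 1))
b = nd.ones((3,))
print((a - b).shape)                    # (3, 3): broadcasting, not what we want
print((a - b.reshape(a.shape)).shape)   # (3, 1): elementwise, as intended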

Define the optimization strategy: stochastic gradient descent (SGD)

def sgd(params, learning_rate):
    for param in params:
        param[:] = param - learning_rate * param.grad
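Note the slice assignment param[:] = ...: it writes the new values into the existing NDArray, so the gradient buffer attached earlier stays attached and w and b themselves are updated. Writing param = param - ... would instead rebind the local name to a brand-new array and leave the real parameters untouched. A sketch of the difference with a toy parameter p:

p = nd.array([1.0, 2.0])
p.attach_grad()
p[:] = p - 0.1  # in-place: same NDArray object, grad buffer preserved
p = p - 0.1     # rebinds the name to a new NDArray; anything else holding
                # the old array (e.g. the params list) would not see this update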

Train the model

epochs = 5
learning_rate = .001
for e in range(epochs):
    total_loss = 0
    for data, label in data_iter():
        # one training step: forward pass, loss, backward pass, update
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        sgd(params, learning_rate)

        total_loss += nd.sum(loss).asscalar()
    print("Epoch %d, average loss: %f" % (e, total_loss / num_batches))
Epoch 0, average loss: 12.393637
Epoch 1, average loss: 0.164320
Epoch 2, average loss: 0.002606
Epoch 3, average loss: 0.000418
Epoch 4, average loss: 0.000386

After training, compare the learned parameters with the true ones.

true_b, b
(
 [[2.]]
 <NDArray 1x1 @cpu(0)>,

 [2.0011895]
 <NDArray 1 @cpu(0)>)
true_w, w
(
 [[ 4.2]
  [ 2. ]
  [-3.2]]
 <NDArray 3x1 @cpu(0)>,

 [[ 4.1996527]
  [ 2.000402 ]
  [-3.199496 ]]
 <NDArray 3x1 @cpu(0)>)

The learned parameters are very close to the true ones, and the loss converges steadily as training proceeds.
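As an extra sanity check (not part of the original course code), the same parameters can be recovered in closed form, since ordinary least squares has an analytic solution; a sketch using NumPy's lstsq on the generated data:

import numpy as np

X = np.hstack([x.asnumpy(), np.ones((num_batches, 1))])  # append a ones column for the bias
theta, *_ = np.linalg.lstsq(X, y.asnumpy(), rcond=None)  # minimize ||X theta - y||^2
print(theta[:3])  # should be close to true_w
print(theta[3])   # should be close to true_b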

References

Mu Li's course on Bilibili: [MXNet/Gluon] Dive into Deep Learning, Lesson 1: From Getting Started to Multiclass Classification