Linear regression produces continuous outputs, whereas classification problems require discrete outputs; softmax solves this effectively by turning the outputs into a probability distribution over classes.
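For reference, given logits $o_1, \ldots, o_q$ for $q$ classes, softmax maps them to a probability distribution:

$$\hat{y}_i = \mathrm{softmax}(o)_i = \frac{\exp(o_i)}{\sum_{j=1}^{q} \exp(o_j)}, \qquad \hat{y}_i > 0, \quad \sum_{i=1}^{q} \hat{y}_i = 1.$$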

Implementing softmax regression from scratch

%matplotlib inline
import d2lzh as d2l
from mxnet import autograd, nd

Reading the dataset

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
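A quick shape check on one minibatch can confirm what the iterator yields; this is an illustrative sketch, assuming the usual d2lzh loader that returns image tensors of shape (batch_size, 1, 28, 28) and a label vector per batch:

for X, y in train_iter:
    print(X.shape, y.shape)  # expected: (256, 1, 28, 28) (256,)
    break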

Initializing model parameters

Since each image is 28 × 28 pixels, the input dimension is 784, and the output has 10 classes.

num_inputs = 784
num_outputs = 10

W = nd.random.normal(scale=0.01, shape=(num_inputs, num_outputs))
b = nd.zeros(num_outputs)

# Allocate memory for the parameters' gradients
W.attach_grad()
b.attach_grad()
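attach_grad allocates the gradient buffers that autograd.record and backward will fill. A minimal sanity check (illustrative only; loss_demo is a made-up scalar computation):

with autograd.record():
    loss_demo = (nd.dot(nd.ones((1, num_inputs)), W) + b).sum()
loss_demo.backward()
print(W.grad.shape, b.grad.shape)  # (784, 10) (10,)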

Implementing the softmax operation

# First, look at how a matrix is summed along each axis
X = nd.array([[1, 2, 3], [4, 5, 6]])
X.sum(axis=0, keepdims=True), X.sum(axis=1, keepdims=True)
(
 [[5. 7. 9.]]
 <NDArray 1x3 @cpu(0)>,

 [[ 6.]
  [15.]]
 <NDArray 2x1 @cpu(0)>)
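Broadcasting then lets the (2, 1) array of row sums divide the (2, 3) matrix elementwise, which is exactly the normalization step softmax needs:

X / X.sum(axis=1, keepdims=True)  # each row now sums to 1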

Convert the values in each row into probabilities, so that each row sums to 1:

def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(axis=1, keepdims=True)
    return X_exp / partition
X = nd.random.normal(shape=(2, 5))
X_prob = softmax(X)
X_prob, X_prob.sum(axis=1)
(
 [[0.21324193 0.33961776 0.1239742  0.27106097 0.05210521]
  [0.11462264 0.3461234  0.19401033 0.29583326 0.04941036]]
 <NDArray 2x5 @cpu(0)>,

 [1.0000001 1.       ]
 <NDArray 2 @cpu(0)>)
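Note that X.exp() can overflow for large logits. A common remedy, not used in the original, is to subtract each row's maximum before exponentiating; this leaves the result unchanged because softmax is shift-invariant. A minimal sketch:

def stable_softmax(X):
    # exp(x - m) / sum(exp(x - m)) == exp(x) / sum(exp(x)), so the
    # output is identical, but the exponentials can no longer overflow
    X_shifted = X - X.max(axis=1, keepdims=True)
    X_exp = X_shifted.exp()
    return X_exp / X_exp.sum(axis=1, keepdims=True)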

Defining the model, the loss function, and classification accuracy

def net(X):
    return softmax(nd.dot(X.reshape(-1, num_inputs), W) + b)
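A quick sanity check of net on a fake batch (illustrative; X_demo is a made-up input):

X_demo = nd.random.normal(shape=(2, 1, 28, 28))
print(net(X_demo).shape)        # (2, 10)
print(net(X_demo).sum(axis=1))  # each row sums to (approximately) 1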
nd.pick selects, for each row of y_hat, the element whose index is given by the corresponding label in y:
y_hat = nd.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = nd.array([0, 2], dtype='int32')
nd.pick(y_hat, y)
[0.1 0.5]
<NDArray 2 @cpu(0)>
The cross-entropy loss is then just the negative log of these picked probabilities:
def cross_entropy(y_hat, y):
    return -nd.pick(y_hat, y).log()
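Applied to the y_hat and y above, the loss picks out the predicted probability of the true class and takes its negative log, so the two losses are -log(0.1) ≈ 2.303 and -log(0.5) ≈ 0.693:

cross_entropy(y_hat, y)  # approximately [2.3026 0.6931]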
def accuracy(y_hat, y):
    return (y_hat.argmax(axis=1) == y.astype('float32')).mean().asscalar()
accuracy(y_hat, y)
0.5

Here only the second example is predicted correctly (argmax 2 matches label 2), so the accuracy is 0.5.
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        y = y.astype('float32')
        acc_sum += (net(X).argmax(axis=1) == y).sum().asscalar()
        n += y.size
    return acc_sum / n
evaluate_accuracy(test_iter, net)
0.0925

With randomly initialized parameters, the model is essentially guessing among 10 classes, so the accuracy is close to 0.1.

Training the model

num_epochs, lr = 10, 0.05

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, trainer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y).sum()
            l.backward()
            if trainer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                trainer.step(batch_size)  # used in the section on the concise implementation of softmax regression
            y = y.astype('float32')
            train_l_sum += l.asscalar()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().asscalar()
            n += y.size
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
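When no trainer is passed, train_ch3 falls back on d2l.sgd; for reference, the book's minibatch SGD is essentially the following in-place update:

def sgd(params, lr, batch_size):
    for param in params:
        # p := p - lr * grad(p) / batch_size, updated in place
        param[:] = param - lr * param.grad / batch_size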

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size,
          [W, b], lr)
epoch 1, loss 0.3821, train acc 0.868, test acc 0.860
epoch 2, loss 0.3823, train acc 0.868, test acc 0.857
epoch 3, loss 0.3820, train acc 0.868, test acc 0.858
epoch 4, loss 0.3819, train acc 0.869, test acc 0.858
epoch 5, loss 0.3817, train acc 0.868, test acc 0.858
epoch 6, loss 0.3817, train acc 0.868, test acc 0.857
epoch 7, loss 0.3813, train acc 0.868, test acc 0.860
epoch 8, loss 0.3813, train acc 0.868, test acc 0.858
epoch 9, loss 0.3812, train acc 0.868, test acc 0.857
epoch 10, loss 0.3813, train acc 0.868, test acc 0.859

Prediction

In each title, the first line is the true label and the second line is the predicted label; the image itself appears below.

for X, y in test_iter:
    break

true_labels = d2l.get_fashion_mnist_labels(y.asnumpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1).asnumpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])
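To inspect failures rather than just the first nine images, one could list the misclassified examples in this batch; an illustrative sketch using numpy on top of the code above:

import numpy as np

preds = net(X).argmax(axis=1).asnumpy()
wrong = np.nonzero(preds != y.asnumpy())[0]
print('misclassified indices in this batch:', wrong[:10])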