Run this notebook online:\ |Binder| or Colab: |Colab|

.. |Binder| image:: https://mybinder.org/badge_logo.svg
   :target: https://mybinder.org/v2/gh/deepjavalibrary/d2l-java/master?filepath=chapter_multilayer-perceptrons/mlp-scratch.ipynb
.. |Colab| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/deepjavalibrary/d2l-java/blob/colab/chapter_multilayer-perceptrons/mlp-scratch.ipynb

.. _sec_mlp_scratch:

Implementation of Multilayer Perceptrons from Scratch
=====================================================


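Now that we have characterized multilayer perceptrons (MLPs) mathematically, let us try to implement one ourselves. To compare against the results we previously obtained with softmax regression (:numref:`sec_softmax_scratch`), we will continue to work with the Fashion-MNIST image classification dataset (:numref:`sec_fashion_mnist`).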

.. code:: java

    %load ../utils/djl-imports
    %load ../utils/plot-utils
    %load ../utils/DataPoints.java
    %load ../utils/Training.java
    %load ../utils/Accumulator.java

.. code:: java

    import ai.djl.basicdataset.cv.classification.*;
    import org.apache.commons.lang3.ArrayUtils;

.. code:: java

    int batchSize = 256;
    
    FashionMnist trainIter = FashionMnist.builder()
            .optUsage(Dataset.Usage.TRAIN)
            .setSampling(batchSize, true)
            .optLimit(Long.getLong("DATASET_LIMIT", Long.MAX_VALUE))
            .build();
    
    FashionMnist testIter = FashionMnist.builder()
            .optUsage(Dataset.Usage.TEST)
            .setSampling(batchSize, true)
            .optLimit(Long.getLong("DATASET_LIMIT", Long.MAX_VALUE))
            .build();
                                
    trainIter.prepare();
    testIter.prepare();

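Initializing Model Parameters
-----------------------------

Recall that each image in Fashion-MNIST consists of :math:`28 \times 28 = 784` grayscale pixel values, and that the images are divided into 10 classes. Ignoring the spatial structure among the pixels, we can treat this as a simple classification dataset with 784 input features and 10 classes. To begin, we will implement an MLP with a single hidden layer containing 256 hidden units. Note that we can regard both the number of hidden layers and the number of hidden units as hyperparameters. Typically, we choose layer widths that are powers of 2, which tends to be computationally efficient because of how memory is allocated and addressed in hardware.

We represent our parameters with several ``NDArray``\ s. Note that for every layer we must keep track of one weight matrix and one bias vector. As before, we allocate memory for the gradients of the loss with respect to these parameters.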

.. code:: java

    int numInputs = 784;
    int numOutputs = 10;
    int numHiddens = 256;
    
    NDManager manager = NDManager.newBaseManager();
    
    NDArray W1 = manager.randomNormal(0, 0.01f, new Shape(numInputs, numHiddens), DataType.FLOAT32);
    NDArray b1 = manager.zeros(new Shape(numHiddens));
    NDArray W2 = manager.randomNormal(0, 0.01f, new Shape(numHiddens, numOutputs), DataType.FLOAT32);
    NDArray b2 = manager.zeros(new Shape(numOutputs));
    
    NDList params = new NDList(W1, b1, W2, b2);
    
    for (NDArray param : params) {
        param.setRequiresGradient(true);
    }
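
To get a feel for the size of this model, the following plain-Java sketch counts the trainable parameters implied by the shapes above. The class ``MlpParameterCount`` and its ``count`` method are illustrative names of our own; they are not part of DJL or of the utilities loaded earlier.

.. code:: java

    // Hypothetical helper (not part of DJL): counts the parameters of an
    // MLP with one hidden layer, given its layer sizes.
    class MlpParameterCount {
        static long count(int numInputs, int numHiddens, int numOutputs) {
            long hidden = (long) numInputs * numHiddens + numHiddens;   // W1 and b1
            long output = (long) numHiddens * numOutputs + numOutputs;  // W2 and b2
            return hidden + output;
        }

        public static void main(String[] args) {
            System.out.println(MlpParameterCount.count(784, 256, 10)); // 203530
        }
    }

With the sizes used here, the hidden layer accounts for 200,960 parameters and the output layer for 2,570, giving 203,530 in total.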

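Activation Function
-------------------

To make sure we know how everything works, we will implement the ReLU activation ourselves using the maximum function rather than invoking the built-in ``relu`` function directly.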

.. code:: java

    public NDArray relu(NDArray X){
        return X.maximum(0f);
    }
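
The elementwise behavior of ``X.maximum(0f)`` can be mimicked on a plain ``float[]``. The sketch below is only an illustration under a hypothetical class name, ``ReluSketch``; it does not use DJL and is not needed by the rest of this section.

.. code:: java

    // Illustrative only: ReLU on a plain float array, without DJL.
    class ReluSketch {
        // Returns a new array holding max(x[i], 0) for every element
        static float[] relu(float[] x) {
            float[] out = new float[x.length];
            for (int i = 0; i < x.length; i++) {
                out[i] = Math.max(x[i], 0f);
            }
            return out;
        }

        public static void main(String[] args) {
            float[] x = {-2f, -0.5f, 0f, 1.5f};
            // Negative entries are clipped to zero, positive ones pass through
            System.out.println(java.util.Arrays.toString(relu(x))); // [0.0, 0.0, 0.0, 1.5]
        }
    }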

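Model
-----

Because we are disregarding spatial structure, we ``reshape`` each two-dimensional image into a flat vector of length ``numInputs``. We can then implement our model with just a few lines of code.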

.. code:: java

    public NDArray net(NDArray X) {
        X = X.reshape(new Shape(-1, numInputs));
        NDArray H = relu(X.dot(W1).add(b1));
        return H.dot(W2).add(b2);
    }
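
To make the shapes concrete, here is a minimal plain-Java sketch of the same two-layer computation on ``double[][]`` arrays: an affine map, a ReLU, and a second affine map. The class ``MlpForwardSketch``, its methods, and the tiny weights in ``main`` are made up for illustration and are not part of DJL.

.. code:: java

    // Illustrative only: forward pass of a one-hidden-layer MLP, without DJL.
    class MlpForwardSketch {
        // Computes X * W + b, broadcasting the bias b across the rows of X
        static double[][] affine(double[][] x, double[][] w, double[] b) {
            int n = x.length, d = w.length, m = b.length;
            double[][] out = new double[n][m];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < m; j++) {
                    double sum = b[j];
                    for (int k = 0; k < d; k++) {
                        sum += x[i][k] * w[k][j];
                    }
                    out[i][j] = sum;
                }
            }
            return out;
        }

        // Elementwise max(x, 0), applied in place
        static double[][] relu(double[][] x) {
            for (double[] row : x) {
                for (int j = 0; j < row.length; j++) {
                    row[j] = Math.max(row[j], 0.0);
                }
            }
            return x;
        }

        // Hidden layer with ReLU, followed by a linear output layer
        static double[][] net(double[][] x, double[][] w1, double[] b1,
                              double[][] w2, double[] b2) {
            double[][] h = relu(affine(x, w1, b1));
            return affine(h, w2, b2);
        }

        public static void main(String[] args) {
            double[][] x = {{1.0, -1.0}};              // 1 example, 2 features
            double[][] w1 = {{1.0, 0.0}, {0.0, 1.0}};  // 2 inputs -> 2 hidden units
            double[] b1 = {0.0, 0.5};
            double[][] w2 = {{2.0}, {3.0}};            // 2 hidden units -> 1 output
            double[] b2 = {0.1};
            System.out.println(net(x, w1, b1, w2, b2)[0][0]);
        }
    }

In this toy example the hidden pre-activations are :math:`(1, -0.5)`, the ReLU maps them to :math:`(1, 0)`, and the output is :math:`0.1 + 2 \cdot 1 + 3 \cdot 0 = 2.1` up to floating-point rounding.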

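Loss Function
-------------

To ensure numerical stability, and because we already implemented the softmax function from scratch (:numref:`sec_softmax_scratch`), we use the built-in function from the high-level API to compute the softmax and the cross-entropy loss. Recall our earlier discussion of these intricacies in :numref:`subsec_softmax-implementation-revisited`. We encourage the interested reader to examine the source code of ``SoftmaxCrossEntropyLoss``, the class that ``Loss.softmaxCrossEntropyLoss()`` instantiates, to deepen their knowledge of the implementation details.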

.. code:: java

    Loss loss = Loss.softmaxCrossEntropyLoss();
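
The numerical-stability concern can be illustrated without DJL. The sketch below computes the cross-entropy of a single example directly from its logits as :math:`\log \sum_j \exp(o_j) - o_y`, subtracting the largest logit before exponentiating (the log-sum-exp trick). It is a plain-Java illustration under our own naming (``StableCrossEntropy``), not the code that DJL actually runs.

.. code:: java

    // Illustrative only: numerically stable cross-entropy from raw logits.
    class StableCrossEntropy {
        // log(sum_j exp(o_j)), computed after subtracting max(o) to avoid overflow
        static double logSumExp(double[] logits) {
            double max = Double.NEGATIVE_INFINITY;
            for (double o : logits) {
                max = Math.max(max, o);
            }
            double sum = 0.0;
            for (double o : logits) {
                sum += Math.exp(o - max);
            }
            return max + Math.log(sum);
        }

        // Cross-entropy loss -log softmax(o)[label] for one example
        static double loss(double[] logits, int label) {
            return logSumExp(logits) - logits[label];
        }

        public static void main(String[] args) {
            // A naive Math.exp(1000.0) overflows to Infinity; this stays finite
            System.out.println(loss(new double[]{1000.0, 0.0}, 1));
            // Two equal logits give a loss of log(2)
            System.out.println(loss(new double[]{0.0, 0.0}, 0));
        }
    }

Computing the softmax probabilities first and taking their logarithm afterwards would produce ``NaN`` or ``Infinity`` for the first input, which is why the two steps are fused.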

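Training
--------

Fortunately, the training loop for MLPs is exactly the same as for softmax regression. We can train the model with code similar to that in Chapter 3 (see :numref:`sec_softmax_scratch`), setting the number of epochs to 10 and the learning rate to 0.5.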

.. code:: java

    int numEpochs = Integer.getInteger("MAX_EPOCH", 10);
    float lr = 0.5f;
    
    double[] trainLoss = new double[numEpochs];
    double[] trainAccuracy = new double[numEpochs];
    double[] testAccuracy = new double[numEpochs];
    double[] epochCount = new double[numEpochs];

To evaluate the learned model, we apply it to the test data at the end of every epoch.

.. code:: java

    float epochLoss = 0f;
    float accuracyVal = 0f;
    
    for (int epoch = 1; epoch <= numEpochs; epoch++) {
        
            System.out.print("Running epoch " + epoch + "...... ");
            // Iterate over dataset
            for (Batch batch : trainIter.getData(manager)) {
    
                NDArray X = batch.getData().head();
                NDArray y = batch.getLabels().head();
    
                try(GradientCollector gc = Engine.getInstance().newGradientCollector()) {
                    NDArray yHat = net(X); // net function call
    
                    NDArray lossValue = loss.evaluate(new NDList(y), new NDList(yHat));
                    NDArray l = lossValue.mul(batchSize);
                    
                    accuracyVal += Training.accuracy(yHat, y);
                    epochLoss += l.sum().getFloat();
                    
                    gc.backward(l); // gradient calculation
                }
                
                batch.close();
                Training.sgd(params, lr, batchSize); // updater
            }
        
            trainLoss[epoch-1] = epochLoss/trainIter.size();
            trainAccuracy[epoch-1] = accuracyVal/trainIter.size();
    
            epochLoss = 0f;
            accuracyVal = 0f;    
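            // Evaluate on the test set; no gradients are recorded here
            for (Batch batch : testIter.getData(manager)) {
    
                NDArray X = batch.getData().head();
                NDArray y = batch.getLabels().head();
    
                NDArray yHat = net(X); // net function call
                accuracyVal += Training.accuracy(yHat, y);
                batch.close(); // release the batch's memory, as in the training loop
            }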
        
            testAccuracy[epoch-1] = accuracyVal/testIter.size();
            epochCount[epoch-1] = epoch;
            accuracyVal = 0f;
            System.out.println("Finished epoch " + epoch);
    }
    
    System.out.println("Finished training!");


.. parsed-literal::
    :class: output

    Running epoch 1...... Finished epoch 1
    Running epoch 2...... Finished epoch 2
    Running epoch 3...... Finished epoch 3
    Running epoch 4...... Finished epoch 4
    Running epoch 5...... Finished epoch 5
    Running epoch 6...... Finished epoch 6
    Running epoch 7...... Finished epoch 7
    Running epoch 8...... Finished epoch 8
    Running epoch 9...... Finished epoch 9
    Running epoch 10...... Finished epoch 10
    Finished training!
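
The parameter update in the loop above is delegated to ``Training.sgd(params, lr, batchSize)``. Its source is not reproduced in this section, so the following plain-Java sketch assumes that it applies the usual minibatch rule, param ← param − ``lr`` · grad / ``batchSize``. The class ``SgdSketch`` is a hypothetical stand-in that works on ``float[]`` instead of ``NDArray``.

.. code:: java

    // Illustrative only: one minibatch SGD step on plain arrays
    // (assumed to mirror what Training.sgd does on NDArrays).
    class SgdSketch {
        // In-place update: param <- param - lr * grad / batchSize
        static void step(float[] params, float[] grads, float lr, int batchSize) {
            for (int i = 0; i < params.length; i++) {
                params[i] -= lr * grads[i] / batchSize;
            }
        }

        public static void main(String[] args) {
            float[] params = {1.0f, -2.0f};
            float[] grads = {4.0f, -8.0f};   // gradients summed over a batch of 4
            step(params, grads, 0.5f, 4);
            System.out.println(java.util.Arrays.toString(params)); // [0.5, -1.0]
        }
    }

Under this assumption, multiplying the loss by ``batchSize`` before calling ``backward`` and dividing by ``batchSize`` inside the update cancel out, so each step uses the gradient of the average loss over the batch.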


.. code:: java

    String[] lossLabel = new String[trainLoss.length + testAccuracy.length + trainAccuracy.length];
    
    Arrays.fill(lossLabel, 0, trainLoss.length, "train loss");
    Arrays.fill(lossLabel, trainLoss.length, trainLoss.length + trainAccuracy.length, "train acc");
    Arrays.fill(lossLabel, trainLoss.length + trainAccuracy.length,
                    trainLoss.length + testAccuracy.length + trainAccuracy.length, "test acc");
    
    Table data = Table.create("Data").addColumns(
        DoubleColumn.create("epochCount", ArrayUtils.addAll(epochCount, ArrayUtils.addAll(epochCount, epochCount))),
        DoubleColumn.create("loss", ArrayUtils.addAll(trainLoss, ArrayUtils.addAll(trainAccuracy, testAccuracy))),
        StringColumn.create("lossLabel", lossLabel)
    );
    
    render(LinePlot.create("", data, "epochCount", "loss", "lossLabel"),"text/html");


.. raw:: html

    <img id="c594a59e11b441cfa9b7f7ee9c1fad9f_img"></img>
    <div id="c594a59e11b441cfa9b7f7ee9c1fad9f"></div>
    <script>require(['https://cdn.plot.ly/plotly-1.57.0.min.js'], Plotly => {
    var target_c594a59e11b441cfa9b7f7ee9c1fad9f = document.getElementById('c594a59e11b441cfa9b7f7ee9c1fad9f');
    var layout = {
        height: 600,
        width: 800,
        showlegend: true,
        xaxis: {
        title: 'epochCount',
        },
    
        yaxis: {
        title: 'loss',
        },
    
    };
    
    var trace0 =
    {
    x: ["1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","10.0"],
    y: ["0.8057160973548889","0.48186174035072327","0.4258103370666504","0.3915179669857025","0.3733295500278473","0.35088902711868286","0.3367491066455841","0.3280346989631653","0.31841790676116943","0.3089645504951477"],
    showlegend: true,
    mode: 'lines',
    xaxis: 'x',
    yaxis: 'y',
    type: 'scatter',
    name: 'train loss',
    };
    var trace1 =
    {
    x: ["1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","10.0"],
    y: ["0.7028499841690063","0.8218500018119812","0.8430500030517578","0.856083333492279","0.860966682434082","0.8711000084877014","0.8748000264167786","0.8780500292778015","0.8817499876022339","0.8858166933059692"],
    showlegend: true,
    mode: 'lines',
    xaxis: 'x',
    yaxis: 'y',
    type: 'scatter',
    name: 'train acc',
    };
    var trace2 =
    {
    x: ["1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","10.0"],
    y: ["0.7519000172615051","0.8256000280380249","0.8061000108718872","0.8435999751091003","0.8504999876022339","0.8504999876022339","0.8363000154495239","0.8518000245094299","0.8109999895095825","0.8697999715805054"],
    showlegend: true,
    mode: 'lines',
    xaxis: 'x',
    yaxis: 'y',
    type: 'scatter',
    name: 'test acc',
    };
    
    
    var data = [ trace0, trace1, trace2];
    Plotly.newPlot(target_c594a59e11b441cfa9b7f7ee9c1fad9f, data, layout);
    })</script>


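Summary
-------

-  We saw that implementing a simple MLP is easy, even when done manually.
-  However, with a large number of layers, implementing MLPs from scratch becomes cumbersome (e.g., naming and keeping track of the model's parameters).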

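Exercises
---------

1. Change the value of the hyperparameter ``numHiddens`` while keeping all other parameters fixed, and see how this hyperparameter influences the results. Determine the best value of this hyperparameter.
2. Try adding more hidden layers and see how this affects the results.
3. How does changing the learning rate alter the results? Keeping the model architecture and the other hyperparameters (including the number of epochs) fixed, what learning rate gives the best results?
4. What is the best result you can get by optimizing over all the hyperparameters (learning rate, number of epochs, number of hidden layers, number of hidden units per layer) jointly?
5. Describe why it is more challenging to deal with multiple hyperparameters.
6. What is the smartest strategy you can think of for structuring a search over multiple hyperparameters?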