
# 6.3. Padding and Stride

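For an input of shape $n_h\times n_w$ and a convolution kernel of shape $k_h\times k_w$, the output shape is

$$(n_h-k_h+1) \times (n_w-k_w+1). \tag{6.3.1}$$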

## 6.3.1. Padding

Fig. 6.3.1 Two-dimensional cross-correlation with padding. The shaded portions are the input and kernel array elements used by the first output element: $$0\times0+0\times1+0\times2+0\times3=0$$.
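
The computation described in this caption can be sketched in plain Java. The input and kernel values below are assumptions, chosen to be consistent with the arithmetic in the captions (the figure itself is not reproduced here), and `zeroPad` and `corr2d` are hypothetical helper names rather than DJL APIs.

```java
import java.util.Arrays;

class PaddedCrossCorrelation {
    // Surround x with padH rows of zeros above and below,
    // and padW columns of zeros on the left and right.
    static int[][] zeroPad(int[][] x, int padH, int padW) {
        int h = x.length, w = x[0].length;
        int[][] padded = new int[h + 2 * padH][w + 2 * padW]; // zero-initialized
        for (int i = 0; i < h; i++) {
            System.arraycopy(x[i], 0, padded[i + padH], padW, w);
        }
        return padded;
    }

    // Plain two-dimensional cross-correlation with stride 1 and no padding.
    static int[][] corr2d(int[][] x, int[][] k) {
        int kh = k.length, kw = k[0].length;
        int oh = x.length - kh + 1, ow = x[0].length - kw + 1;
        int[][] y = new int[oh][ow];
        for (int i = 0; i < oh; i++) {
            for (int j = 0; j < ow; j++) {
                int sum = 0;
                for (int a = 0; a < kh; a++) {
                    for (int b = 0; b < kw; b++) {
                        sum += x[i + a][j + b] * k[a][b];
                    }
                }
                y[i][j] = sum;
            }
        }
        return y;
    }

    public static void main(String[] args) {
        int[][] x = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}}; // assumed input
        int[][] k = {{0, 1}, {2, 3}};                  // assumed kernel
        int[][] y = corr2d(zeroPad(x, 1, 1), k);
        // prints [[0, 3, 8, 4], [9, 19, 25, 10], [21, 37, 43, 16], [6, 7, 8, 0]]
        System.out.println(Arrays.deepToString(y));
    }
}
```

With one row or column of zeros on every side, the $3\times3$ input grows to $5\times5$ and the output to $4\times4$; its first element is the $0$ computed in the caption.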

If a total of $p_h$ rows and $p_w$ columns of padding are added, the output shape becomes

$$(n_h-k_h+p_h+1)\times(n_w-k_w+p_w+1). \tag{6.3.2}$$
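
As a short worked step, reading $p_h$ and $p_w$ in (6.3.2) as the total number of padded rows and columns, the padding that keeps the output the same size as the input follows directly. The numbers in the last line ($n_h=8$, $k_h=3$, one row of padding on each side) are an illustrative choice.

$$
\begin{aligned}
n_h - k_h + p_h + 1 = n_h \quad &\Longleftrightarrow \quad p_h = k_h - 1, \\
n_w - k_w + p_w + 1 = n_w \quad &\Longleftrightarrow \quad p_w = k_w - 1, \\
n_h = 8,\; k_h = 3,\; p_h = 2: \quad & 8 - 3 + 2 + 1 = 8.
\end{aligned}
$$

When $k_h$ is odd, $p_h = k_h - 1$ is even, so the padding can be split equally between the top and the bottom.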

%load ../utils/djl-imports

NDManager manager = NDManager.newBaseManager();
NDArray X = manager.randomUniform(0f, 1.0f, new Shape(1, 1, 8, 8));

// Note that 1 row or column is padded on each side, so a total of 2 rows
// and 2 columns are added
Block block = Conv2d.builder()
        .setKernelShape(new Shape(3, 3))
        .optPadding(new Shape(1, 1))
        .setFilters(1)
        .build();

TrainingConfig config = new DefaultTrainingConfig(Loss.l2Loss());
Model model = Model.newInstance("conv2D");
model.setBlock(block);

Trainer trainer = model.newTrainer(config);
trainer.initialize(X.getShape());

NDArray yHat = trainer.forward(new NDList(X)).singletonOrThrow();
// Exclude the first two dimensions, which do not interest us: batch and
// channel
System.out.println(yHat.getShape().slice(2));

(8, 8)


When the height and width of the convolution kernel are different, we can make the output and input have the same height and width by setting different padding numbers for height and width.

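// Here we use a convolution kernel with a height of 5 and a width of 3. The
// padding on each side of the height and width is 2 and 1, respectively

block = Conv2d.builder()
        .setKernelShape(new Shape(5, 3))
        .optPadding(new Shape(2, 1))
        .setFilters(1)
        .build();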

model.setBlock(block);

trainer = model.newTrainer(config);
trainer.initialize(X.getShape());

yHat = trainer.forward(new NDList(X)).singletonOrThrow();
System.out.println(yHat.getShape().slice(2));

(8, 8)


## 6.3.2. Stride

Fig. 6.3.2 Cross-correlation with strides of 3 and 2 for height and width respectively. The shaded portions are the output elements and the input and kernel array elements used in their computation: $$0\times0+0\times1+1\times2+2\times3=8$$, $$0\times0+6\times1+0\times2+0\times3=6$$.
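
The strided computation in this caption can be sketched in plain Java as follows. The padded $5\times5$ input and the $2\times2$ kernel are assumptions chosen to match the caption's arithmetic, and `corr2dStrided` is a hypothetical helper name, not a DJL API.

```java
import java.util.Arrays;

class StridedCrossCorrelation {
    // Cross-correlation in which the window moves strideH rows down
    // and strideW columns right between consecutive outputs.
    static int[][] corr2dStrided(int[][] x, int[][] k, int strideH, int strideW) {
        int kh = k.length, kw = k[0].length;
        int oh = (x.length - kh) / strideH + 1;
        int ow = (x[0].length - kw) / strideW + 1;
        int[][] y = new int[oh][ow];
        for (int i = 0; i < oh; i++) {
            for (int j = 0; j < ow; j++) {
                int sum = 0;
                for (int a = 0; a < kh; a++) {
                    for (int b = 0; b < kw; b++) {
                        // The window for output (i, j) starts at (i * strideH, j * strideW).
                        sum += x[i * strideH + a][j * strideW + b] * k[a][b];
                    }
                }
                y[i][j] = sum;
            }
        }
        return y;
    }

    public static void main(String[] args) {
        // Assumed input: values 0..8 in a 3x3 grid, already zero-padded by 1 on every side.
        int[][] padded = {
            {0, 0, 0, 0, 0},
            {0, 0, 1, 2, 0},
            {0, 3, 4, 5, 0},
            {0, 6, 7, 8, 0},
            {0, 0, 0, 0, 0}
        };
        int[][] k = {{0, 1}, {2, 3}}; // assumed kernel
        // prints [[0, 8], [6, 8]]
        System.out.println(Arrays.deepToString(corr2dStrided(padded, k, 3, 2)));
    }
}
```

The entries $8$ (top right) and $6$ (bottom left) are the two sums written out in the caption.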

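When the stride for the height is $s_h$ and the stride for the width is $s_w$, the output shape is

$$\lfloor(n_h-k_h+p_h+s_h)/s_h\rfloor \times \lfloor(n_w-k_w+p_w+s_w)/s_w\rfloor. \tag{6.3.3}$$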
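
A minimal sketch of the shape formula (6.3.3) as a helper for one dimension. `outputSize` is a hypothetical name introduced for illustration; `p` is the total padding along the dimension (both sides added together), and the sample arguments are illustrative.

```java
class ConvOutputShape {
    // floor((n - k + p + s) / s) for one dimension, where n is the input size,
    // k the kernel size, p the total padding and s the stride.
    static int outputSize(int n, int k, int p, int s) {
        return Math.floorDiv(n - k + p + s, s);
    }

    public static void main(String[] args) {
        // 8-wide input, 3-wide kernel, 1 padded on each side (p = 2), stride 1
        System.out.println(outputSize(8, 3, 2, 1)); // prints 8
        // same input, kernel and padding, stride 2
        System.out.println(outputSize(8, 3, 2, 2)); // prints 4
        // 8-tall input, 5-tall kernel, 2 padded on each side (p = 4), stride 1
        System.out.println(outputSize(8, 5, 4, 1)); // prints 8
    }
}
```

Setting `s` to 1 recovers (6.3.2), so one helper covers both the padded and the strided cases.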

block = Conv2d.builder()
        .setKernelShape(new Shape(3, 3))
        .optPadding(new Shape(1, 1))
        .optStride(new Shape(2, 2))
        .setFilters(1)
        .build();

model.setBlock(block);

trainer = model.newTrainer(config);
trainer.initialize(X.getShape());

yHat = trainer.forward(new NDList(X)).singletonOrThrow();
System.out.println(yHat.getShape().slice(2));

(4, 4)


block = Conv2d.builder()
        .setKernelShape(new Shape(3, 5))
        .optPadding(new Shape(0, 1))
        .optStride(new Shape(3, 4))
        .setFilters(1)
        .build();

model.setBlock(block);

trainer = model.newTrainer(config);
trainer.initialize(X.getShape());

yHat = trainer.forward(new NDList(X)).singletonOrThrow();
System.out.println(yHat.getShape().slice(2));

(2, 2)


## 6.3.3. Summary

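• Padding can increase the height and width of the output. This is often used to give the output the same height and width as the input.

• Stride can reduce the resolution of the output, for example by reducing the height and width of the output to only $$1/n$$ of the height and width of the input ($$n$$ is an integer greater than $$1$$).

• Padding and stride can be used to adjust the dimensionality of the data effectively.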

## 6.3.4. Exercises

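1. For the last example in this section, use the shape calculation formula to compute the output shape and check whether it is consistent with the experimental result.

2. Try other padding and stride combinations in the experiments in this section.

3. For audio signals, what does a stride of $$2$$ correspond to?

4. What are the computational benefits of a stride larger than $$1$$?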