
# 2.4. Differentiation


## 2.4.1. Derivatives and Differentiation

(2.4.1)$f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h},$

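import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.DataType;
import java.util.function.Function;

NDManager manager = NDManager.newBaseManager();
// f(x) = 3x^2 - 4x
Function<Double, Double> f = x -> (3 * Math.pow(x, 2) - 4 * x);

// Difference quotient (f(x + h) - f(x)) / h, which approximates f'(x) for small h.
public Double numericalLim(Function<Double, Double> f, double x, double h) {
    return (f.apply(x + h) - f.apply(x)) / h;
}

// Shrink h by a factor of 10 each step and print the quotient at x = 1.
double h = 0.1;
for (int i = 0; i < 5; i++) {
    System.out.println("h=" + String.format("%.5f", h) + ", numerical limit="
            + String.format("%.5f", numericalLim(f, 1, h)));
    h *= 0.1;
}

h=0.10000, numerical limit=2.30000
h=0.01000, numerical limit=2.03000
h=0.00100, numerical limit=2.00300
h=0.00010, numerical limit=2.00030
h=0.00001, numerical limit=2.00003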
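
The printed values can be checked by hand. For $$f(x) = 3x^2 - 4x$$ at $$x = 1$$, where $$f(1) = -1$$, the difference quotient simplifies exactly:

$\frac{f(1+h) - f(1)}{h} = \frac{3(1+h)^2 - 4(1+h) + 1}{h} = \frac{2h + 3h^2}{h} = 2 + 3h,$

so $$h = 0.1$$ gives $$2.3$$, $$h = 0.01$$ gives $$2.03$$, and the quotient tends to $$f'(1) = 2$$ as $$h \rightarrow 0$$.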

(2.4.2)$f'(x) = y' = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) = Df(x) = D_x f(x),$

• $$DC = 0$$ ($$C$$ is a constant)

• $$Dx^n = nx^{n-1}$$ (the power rule; $$n$$ is any real number)

• $$De^x = e^x$$

• $$D\ln(x) = 1/x$$

(2.4.3)$\frac{d}{dx} [Cf(x)] = C \frac{d}{dx} f(x),$

(2.4.4)$\frac{d}{dx} [f(x) + g(x)] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x),$

(2.4.5)$\frac{d}{dx} [f(x)g(x)] = f(x) \frac{d}{dx} [g(x)] + g(x) \frac{d}{dx} [f(x)],$

(2.4.6)$\frac{d}{dx} \left[\frac{f(x)}{g(x)}\right] = \frac{g(x) \frac{d}{dx} [f(x)] - f(x) \frac{d}{dx} [g(x)]}{[g(x)]^2}.$
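
The product and quotient rules above can be sanity-checked numerically. The following is a minimal sketch in plain Java, not part of the original notebook: the class `DerivativeRulesCheck`, the helper `centralDiff`, and the choice of $$f(x) = x^2$$ and $$g(x) = e^x$$ are all assumptions made for illustration. It compares a central-difference estimate of the derivative with the value the rules predict.

```java
import java.util.function.DoubleUnaryOperator;

class DerivativeRulesCheck {
    // Central-difference estimate of g'(x); an illustrative helper, not from the text.
    static double centralDiff(DoubleUnaryOperator g, double x, double h) {
        return (g.applyAsDouble(x + h) - g.applyAsDouble(x - h)) / (2 * h);
    }

    // Product rule with f(x) = x^2 and g(x) = e^x:
    // d/dx [f g] = f * g' + g * f' = x^2 e^x + e^x * 2x.
    static double productRule(double x) {
        return x * x * Math.exp(x) + Math.exp(x) * 2 * x;
    }

    // Quotient rule with the same f and g:
    // d/dx [f / g] = (g * f' - f * g') / g^2.
    static double quotientRule(double x) {
        double ex = Math.exp(x);
        return (ex * 2 * x - x * x * ex) / (ex * ex);
    }

    public static void main(String[] args) {
        double x = 1.5;
        double h = 1e-5;
        double numProduct = centralDiff(t -> t * t * Math.exp(t), x, h);
        double numQuotient = centralDiff(t -> t * t / Math.exp(t), x, h);
        System.out.printf("product:  numerical=%.6f, rule=%.6f%n", numProduct, productRule(x));
        System.out.printf("quotient: numerical=%.6f, rule=%.6f%n", numQuotient, quotientRule(x));
    }
}
```

The two columns should agree to several decimal places; a central difference is used here rather than the one-sided quotient of (2.4.1) only because its truncation error shrinks faster with $$h$$.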

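import tech.tablesaw.plotly.components.Axis;
import tech.tablesaw.plotly.components.Figure;
import tech.tablesaw.plotly.components.Layout;
import tech.tablesaw.plotly.traces.ScatterTrace;

// Plots a curve y(x) together with a second line (here, a tangent) over the same x values.
public Figure plotLineAndSegment(double[] x, double[] y, double[] segment,
                                 String trace1Name, String trace2Name,
                                 String xLabel, String yLabel,
                                 int width, int height) {
    // First trace: the function itself.
    ScatterTrace trace = ScatterTrace.builder(x, y)
            .mode(ScatterTrace.Mode.LINE)
            .name(trace1Name)
            .build();

    // Second trace: the line segment drawn against the same x values.
    ScatterTrace trace2 = ScatterTrace.builder(x, segment)
            .mode(ScatterTrace.Mode.LINE)
            .name(trace2Name)
            .build();

    Layout layout = Layout.builder()
            .height(height)
            .width(width)
            .showLegend(true)
            .xAxis(Axis.builder().title(xLabel).build())
            .yAxis(Axis.builder().title(yLabel).build())
            .build();

    return new Figure(layout, trace, trace2);
}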

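// Sample x in [0, 3) with step 0.1.
NDArray X = manager.arange(0f, 3f, 0.1f, DataType.FLOAT64);
double[] x = X.toDoubleArray();

// Evaluate f(x) = 3x^2 - 4x at every sample point.
double[] fx = new double[x.length];
for (int i = 0; i < x.length; i++) {
    fx[i] = f.apply(x[i]);
}

// Tangent line at x = 1: slope f'(1) = 2 through the point (1, f(1)) = (1, -1), so y = 2x - 3.
double[] fg = new double[x.length];
for (int i = 0; i < x.length; i++) {
    fg[i] = 2 * x[i] - 3;
}

// Left without a trailing semicolon, as in the original notebook cell,
// so the kernel evaluates the call as the cell's result.
plotLineAndSegment(x, fx, fg, "f(x)", "Tangent line(x=1)", "x", "f(x)", 700, 500)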

## 2.4.2. Partial Derivatives

Let $$y = f(x_1, x_2, \ldots, x_n)$$ be a function of $$n$$ variables. The partial derivative of $$y$$ with respect to its $$i$$-th parameter $$x_i$$ is:

(2.4.7)$\frac{\partial y}{\partial x_i} = \lim_{h \rightarrow 0} \frac{f(x_1, \ldots, x_{i-1}, x_i+h, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}.$

(2.4.8)$\frac{\partial y}{\partial x_i} = \frac{\partial f}{\partial x_i} = f_{x_i} = f_i = D_i f = D_{x_i} f.$
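
Definition (2.4.7) translates directly into code: perturb one coordinate by a small $$h$$ and hold the others fixed. The sketch below is an illustration in plain Java rather than code from the original notebook; the class `PartialDerivativeSketch`, the helper `partial`, and the example function $$f(x_1, x_2) = x_1^2 x_2 + x_2^3$$ are assumptions chosen for the demonstration.

```java
import java.util.function.ToDoubleFunction;

class PartialDerivativeSketch {
    // Forward-difference estimate of the partial derivative of f with respect to x[i].
    // Note: i is 0-based here, while the text numbers the variables from 1.
    static double partial(ToDoubleFunction<double[]> f, double[] x, int i, double h) {
        double[] shifted = x.clone();   // copy so the caller's point is not modified
        shifted[i] += h;                // perturb only the i-th coordinate
        return (f.applyAsDouble(shifted) - f.applyAsDouble(x)) / h;
    }

    // Example function chosen for illustration: f(x1, x2) = x1^2 * x2 + x2^3.
    static double example(double[] x) {
        return x[0] * x[0] * x[1] + Math.pow(x[1], 3);
    }

    public static void main(String[] args) {
        double[] point = {1.0, 2.0};
        double h = 1e-6;
        // Analytic values at (1, 2): df/dx1 = 2*x1*x2 = 4 and df/dx2 = x1^2 + 3*x2^2 = 13.
        System.out.printf("df/dx1 ~ %.4f%n", partial(PartialDerivativeSketch::example, point, 0, h));
        System.out.printf("df/dx2 ~ %.4f%n", partial(PartialDerivativeSketch::example, point, 1, h));
    }
}
```

Treating every variable except $$x_i$$ as a constant is exactly what the copy-and-perturb step does, which is why the estimates land close to the analytic values noted in the comment.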

## 2.4.3. Gradients

(2.4.9)$\nabla_{\mathbf{x}} f(\mathbf{x}) = \bigg[\frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2}, \ldots, \frac{\partial f(\mathbf{x})}{\partial x_n}\bigg]^\top,$

• For all $$\mathbf{A} \in \mathbb{R}^{m \times n}$$, $$\nabla_{\mathbf{x}} \mathbf{A} \mathbf{x} = \mathbf{A}^\top$$

• For all $$\mathbf{A} \in \mathbb{R}^{n \times m}$$, $$\nabla_{\mathbf{x}} \mathbf{x}^\top \mathbf{A} = \mathbf{A}$$

• For all $$\mathbf{A} \in \mathbb{R}^{n \times n}$$, $$\nabla_{\mathbf{x}} \mathbf{x}^\top \mathbf{A} \mathbf{x} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{x}$$

• $$\nabla_{\mathbf{x}} \|\mathbf{x} \|^2 = \nabla_{\mathbf{x}} \mathbf{x}^\top \mathbf{x} = 2\mathbf{x}$$
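
The quadratic-form identity $$\nabla_{\mathbf{x}} \mathbf{x}^\top \mathbf{A} \mathbf{x} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{x}$$ can be checked by assembling the gradient of (2.4.9) one partial derivative at a time. The following is a hedged sketch in plain Java arrays, not the notebook's own code; the class `GradientIdentityCheck`, its method names, and the sample matrix and vector are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

class GradientIdentityCheck {
    // Numerical gradient: one central difference per coordinate.
    static double[] numericalGradient(ToDoubleFunction<double[]> f, double[] x, double h) {
        double[] grad = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double[] plus = x.clone();
            double[] minus = x.clone();
            plus[i] += h;
            minus[i] -= h;
            grad[i] = (f.applyAsDouble(plus) - f.applyAsDouble(minus)) / (2 * h);
        }
        return grad;
    }

    // Scalar value of the quadratic form x^T A x.
    static double quadraticForm(double[][] a, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x.length; j++) {
                sum += x[i] * a[i][j] * x[j];
            }
        }
        return sum;
    }

    // Closed-form gradient (A + A^T) x from the identity in the text.
    static double[] closedFormGradient(double[][] a, double[] x) {
        double[] grad = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x.length; j++) {
                grad[i] += (a[i][j] + a[j][i]) * x[j];
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};   // deliberately non-symmetric
        double[] x = {1, -2};
        double[] numeric = numericalGradient(v -> quadraticForm(a, v), x, 1e-5);
        System.out.println("numerical:   " + Arrays.toString(numeric));
        System.out.println("closed form: " + Arrays.toString(closedFormGradient(a, x)));
    }
}
```

A non-symmetric $$\mathbf{A}$$ is used on purpose: for a symmetric matrix the result reduces to $$2\mathbf{A}\mathbf{x}$$, which would hide the role of the transpose.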

## 2.4.4. Chain Rule

(2.4.10)$\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}.$

(2.4.11)$\frac{dy}{dx_i} = \frac{dy}{du_1} \frac{du_1}{dx_i} + \frac{dy}{du_2} \frac{du_2}{dx_i} + \cdots + \frac{dy}{du_m} \frac{du_m}{dx_i}$
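
To make (2.4.11) concrete, the sketch below evaluates the sum over intermediate variables for a small example and compares it with a numerical derivative of the composite function. It is an illustration in plain Java, not code from the original notebook: the class `ChainRuleSketch` and the choice $$u_1 = x_1^2$$, $$u_2 = x_1 + x_2$$, $$y = u_1 u_2$$ are assumptions.

```java
class ChainRuleSketch {
    // Intermediate variables (illustrative choice): u1 = x1^2, u2 = x1 + x2.
    static double u1(double x1, double x2) { return x1 * x1; }
    static double u2(double x1, double x2) { return x1 + x2; }

    // Outer function: y = u1 * u2.
    static double y(double x1, double x2) { return u1(x1, x2) * u2(x1, x2); }

    // Chain rule: dy/dx_i = dy/du1 * du1/dx_i + dy/du2 * du2/dx_i.
    static double[] chainRuleGradient(double x1, double x2) {
        double dyDu1 = u2(x1, x2);          // d(u1 * u2)/du1 = u2
        double dyDu2 = u1(x1, x2);          // d(u1 * u2)/du2 = u1
        double du1Dx1 = 2 * x1, du1Dx2 = 0.0;
        double du2Dx1 = 1.0,    du2Dx2 = 1.0;
        return new double[] {
            dyDu1 * du1Dx1 + dyDu2 * du2Dx1,
            dyDu1 * du1Dx2 + dyDu2 * du2Dx2
        };
    }

    // Central differences applied directly to the composite function y(x1, x2).
    static double[] numericalGradient(double x1, double x2, double h) {
        return new double[] {
            (y(x1 + h, x2) - y(x1 - h, x2)) / (2 * h),
            (y(x1, x2 + h) - y(x1, x2 - h)) / (2 * h)
        };
    }

    public static void main(String[] args) {
        double[] chain = chainRuleGradient(2.0, 3.0);
        double[] numeric = numericalGradient(2.0, 3.0, 1e-5);
        System.out.printf("chain rule: [%.5f, %.5f]%n", chain[0], chain[1]);
        System.out.printf("numerical:  [%.5f, %.5f]%n", numeric[0], numeric[1]);
    }
}
```

At $$(x_1, x_2) = (2, 3)$$ the chain-rule terms are $$u_2 \cdot 2x_1 + u_1 \cdot 1 = 5 \cdot 4 + 4 = 24$$ for $$x_1$$ and $$u_2 \cdot 0 + u_1 \cdot 1 = 4$$ for $$x_2$$, which matches differentiating $$y = x_1^3 + x_1^2 x_2$$ directly.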

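## 2.4.5. Summary

• Differential calculus and integral calculus are the two branches of calculus; the former applies to the optimization problems that are ubiquitous in deep learning.

• A derivative can be interpreted as the instantaneous rate of change of a function with respect to its variable. It is also the slope of the tangent line to the function's curve.

• A gradient is a vector whose components are the partial derivatives of a multivariate function with respect to all of its variables.

• The chain rule lets us differentiate composite functions.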

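## 2.4.6. Exercises

1. Plot the function $$y = f(x) = x^3 - \frac{1}{x}$$ and its tangent line at $$x = 1$$.

2. Find the gradient of the function $$f(\mathbf{x}) = 3x_1^2 + 5e^{x_2}$$.

3. What is the gradient of the function $$f(\mathbf{x}) = \|\mathbf{x}\|_2$$?

4. Can you write out the chain rule for the case where $$u = f(x, y, z)$$ with $$x = x(a, b)$$, $$y = y(a, b)$$, and $$z = z(a, b)$$?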