A Derivation of Taylor's Theorem

In my video How to Extend the Sum of Any* Function, I spent a decent amount of time building up a tower of increasingly nested sums that looked similar to this:

$\begin{align*} & \Delta^0 f(x_0)\\ +\ &\sum_{k_1=0}^{x-1} \Delta^1 f(x_0)\\ +\ &\sum_{k_1=0}^{x-1} \sum_{k_2=0}^{k_1-1} \Delta^2 f(x_0)\\ +\ &\sum_{k_1=0}^{x-1} \sum_{k_2=0}^{k_1-1} \sum_{k_3=0}^{k_2-1} \Delta^3 f(x_0)\\ \vdots\ \ &\\ +\ &\underbrace{\sum_{k_1=0}^{x-1} \sum_{k_2=0}^{k_1-1} \cdots \sum_{k_n=0}^{k_{n-1}-1}}_{n\text{ sums}} \Delta^{n} f(x_0). \end{align*}$

I then showed that each line of this expression is simply equal to a binomial coefficient, so the whole thing reduces to

$\sum_{k=0}^n \binom xk \Delta^k f(x_0).$

If you made it to the end of the video, you’ll know the plot twist: This is the discrete version of a Taylor polynomial! Notice the similarity:

$\sum_{k=0}^n \binom xk \Delta^k f(x_0) \qquad \qquad \sum_{k=0}^n \frac{x^n}{n!} f^{(k)}(x_0).$

In this post I’ll show that if we apply the same line of reasoning to integrals, we end up with a derivation of Taylor polynomials, including Lagrange’s form of the error term. Then we’ll see that Taylor’s theorem also applies to complex functions, though with a looser error bound.

Taylor’s Theorem for Real Functions

We’ll assume that $f: \mathbb{R} \to \mathbb{R}$ is $n+1$ times differentiable and that $f^{(n+1)}$ is continuous. This continuity is not required for the theorem to hold, but it makes the derivation easier. Also for brevity we’ll be centering the Taylor polynomials at $0$ (so specifically we’re looking at Maclaurin series), and we’ll always consider $x$ to be positive. It’s easy to modify the result to remove these constraints – in fact we do that and more in the next section.

In my video, we arrived at that tower of nested sums by repeatedly applying the fundamental theorem of discrete calculus:

$f(x_0 + a) = f(x_0) + \sum_{k=0}^{a-1} \Delta f(x_0 + k).$

Here we’ll do the same thing, but with the plain old fundamental theorem of calculus:

$f(x) = f(0) + \int_0^x f'(t) dt. \tag{1}$

To start, let’s apply the fundamental theorem of calculus $(1)$ to the $f'(t)$ in its own definition:

$\begin{aligned} f(x) &= f(0) + \int_0^x \left( f'(0) + \int_0^{t_1} f''(t_2) dt_2 \right) dt_1\\ &= f(0) + \int_0^x f'(0) dt_1 + \int_0^x \int_0^{t_1} f''(t_2) dt_2 dt_1 \end{aligned}$

We’ve introduced a double integral. Let’s see what happens when we again apply $(1)$ to the $f''(t_2)$ in this expression:

$\begin{aligned} f(x) &= f(0) + \int_0^x f'(0) dt_1 + \int_0^x \int_0^{t_1} \left(f''(0) + \int_0^{x_2} f'''(t_3) dt_3 \right) dt_2 dt_1\\ &= f(0) + \int_0^x f'(0) dt_1 + \int_0^x \int_0^{t_1} f''(0) dt_2 dt_1 + \int_0^x \int_0^{t_1} \int_0^{x_2} f'''(t_3) dt_3 dt_2 dt_1 \end{aligned}$

Now we’ve added a triple integral. And we can keep going! We can apply $(1)$ to the $f'''(t_3)$ to introduce a quadruple integral, and so on. After repeating the process $n$ times, we’ll have

$\begin{aligned} f(x) =\ &f(0)\\ +\ &\int_0^x f'(0) dt_1\\ +\ &\int_0^x \int_0^{t_1} f''(0) dt_2 dt_1\\ +\ &\int_0^x \int_0^{t_1} \int_0^{t_2} f'''(0) dt_3 dt_2 dt_1\\ \vdots\ \ &\\ +\ &\underbrace{\int_0^x \int_0^{t_1} \cdots \int_0^{t_{n-1}}}_{n\text{ integrals}} f^{(n)}(0) dt_n \cdots dt_2 dt_1 \\ +\ &\underbrace{\int_0^x \int_0^{t_1} \cdots \int_0^{t_{n}}}_{n+1\text{ integrals}} f^{(n+1)}(t_{n+1}) dt_{n+1} \cdots dt_2 dt_1. \end{aligned}$

Apart from the last line, this is the same as the tower of nested sums at the beginning of this post. Except, of course, all the sums have been replaced with integrals.

Note that, again with the exception of the last line, the integrands are all constants. This makes the nested integrals extremely easy to compute. I’ll leave it to you to verify that

$\underbrace{\int_0^x \int_0^{t_1} \cdots \int_0^{t_{n-1}}}_{n\text{ integrals}} f^{(n)}(0) dt_n \cdots dt_2 dt_1 = \frac{x^n}{n!} f^{(n)}(0).$

Therefore our tower of nested integrals is just

$\begin{equation} \tag{2} \begin{split} f(x) = f(0) + xf'(0) + \frac{x^2}2 f''(0) + \cdots + \frac{x^n}{n!}f^{(n)}(0) \qquad\quad\ \ \\ +\ \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}}\ f^{(n+1)}(t_{n+1})\ dt_{n+1} \cdots dt_2 dt_1. \end{split} \end{equation}$

The top of the right-hand side is the degree $n$ Taylor polynomial of $f$ , centered at $x=0$ . Let’s denote this by $T_n$ :

$T_n(x) = f(0) + xf'(0) + \frac{x^2}2 f''(0) + \cdots + \frac{x^n}{n!}f^{(n)}(0).$

By subtracting $T_n$ from both sides of $(2)$ we obtain

$f(x) - T_n(x) = \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}}\ f^{(n+1)}(t_{n+1})\ dt_{n+1} \cdots dt_2 dt_1. \tag{3}$

So the difference between $f(x)$ and $T_n(x)$ is this last remaining nested integral. In other words, the nested integral is the error in the approximation of $f$ by Taylor polynomials. All that’s left to do is to bound that error term.

Fortunately, we’ve assumed that $f^{(n+1)}$ is continuous. Therefore, by the extreme value theorem, it will have a maximum on $[0, x]$ , so

$\begin{aligned} f(x) - T_n(x) &\leq \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}}\ \max_{s \in [0, x]}f^{(n+1)}(s)\ dt_{n+1} \cdots dt_2 dt_1 \\ &\leq \frac{x^{n+1}}{(n+1)!} \max_{s \in [0, x]}f^{(n+1)}(s). \end{aligned}$

We can also do the same thing with the function’s minimum, after which we will have both lower and upper bounds for the Taylor remainder:

$\frac{x^{n+1}}{(n+1)!} \min_{s \in [0, x]}f^{(n+1)}(s) \leq f(x) - T_n(x) \leq \frac{x^{n+1}}{(n+1)!} \max_{s \in [0, x]}f^{(n+1)}(s).$

Finally, again by the continuity of $f^{(n+1)}$ , the intermediate value theorem tells us that there exists some number $\xi \in [0, x]$ for which

$f(x) - T_n(x) = \frac{x^{n+1}}{(n+1)!} f^{(n+1)}(\xi).$

We have just derived the Lagrange form of the Taylor remainder!

Taylor’s theorem with Lagrange error term:
Suppose $f$ is a real function and $f^{(n+1)}$ is continuous. Then there exists a number $\xi \in [0,x]$ such that

$f(x) = T_n(x) + \frac{x^{n+1}}{(n+1)!} f^{(n+1)}(\xi).$

Bonus: Taylor’s Theorem for Complex Functions

Everything up to and including $(3)$ is perfectly valid even if the output of $f$ is complex. However, the later steps relied on $f$ being real-valued, so the Lagrange form of the remainder is only valid for $f : \mathbb{R} \to \mathbb{R}$ . But we can still find a useful bound for complex functions.

To start, suppose that $f$ is a complex-valued function of a real variable. Take the modulus of both sides of $(3)$ to see that

$\begin{align} |f(x) - T_n(x)| &= \left|\underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}} f^{(n+1)}(t_{n+1}) dt_{n+1} \cdots dt_2 dt_1 \right| \notag\\ &\leq \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}} \left| f^{(n+1)}(t_{n+1})\right| dt_{n+1} \cdots dt_2 dt_1 \notag\\ &\leq \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}} \sup_{s \in [0,x]} \left| f^{(n+1)}(s)\right| dt_{n+1} \cdots dt_2 dt_1 \notag\\ &= \frac{x^{n+1}}{(n+1)!} \sup_{s \in [0,x]} \left| f^{(n+1)}(s)\right|. \tag{4} \end{align}$

With this, we have a bound of the Taylor remainder for $f : \mathbb{R} \to \mathbb{C}$ . It turns out we can also apply this bound for $f : \mathbb{C} \to \mathbb{C}$ . To be precise,

Taylor’s theorem for complex functions:
Suppose $f$ is a holomorphic function on an open set $\Omega \subseteq \mathbb{C}$ . Let $L$ be the line segment connecting two points $z_0$ and $z$ . If $L \subseteq \Omega$ , then

$|f(z) - T_n(z)| \leq \frac{|z - z_0|^{n+1}}{(n+1)!} \sup_{\zeta \in L} \left| f^{(n+1)}(\zeta) \right|,$

where $T_n$ is the degree $n$ Taylor polynomial of $f$ centered at $z_0$ .

To prove this, we just have to translate, rotate, and scale the complex plane to move $L$ to the real axis so that we can use $(4)$ . To this end, note that the change of variables $t \mapsto z_0 + t(z - z_0)$ maps the interval $[0,1]$ to $L$ . Define $g : [0,1] \to \mathbb{C}$ by

$g(t) = f(z_0 + t(z-z_0)),$

and let $S_n$ denote the degree $n$ Taylor polynomial of $g$ centered at 0. Clearly $g(1) = f(z)$ and

$\begin{aligned} S_n(1) &= \sum_{k=0}^n \frac{g^{(k)}(0)}{k!}\\ &= \sum_{k=0}^n \frac{f^{(k)}(z_0)}{k!} (z - z_0)^k\\ &= T_n(z). \end{aligned}$

Apply $(4)$ to $|g(1) - S_n(1)|$ to see that

$\begin{aligned} |f(z) - T_n(z)| &= |g(1) - S_n(1)|\\ &\leq \frac{1^{n+1}}{(n+1)!} \sup_{s \in [0,1]} \left| g^{(n+1)}(s)\right|\\ &= \frac{1}{(n+1)!} \sup_{s \in [0,1]} |z-z_0|^{n+1} \left| f^{(n+1)}(z_0 + s(z - z_0))\right|\\ &= \frac{|z-z_0|^{n+1}}{(n+1)!} \sup_{\zeta \in L} \left| f^{(n+1)}(\zeta)\right|. \end{aligned}$

$\square$

Strangely, people tend not to mention this form of the theorem for complex functions. Taylor’s theorem is almost always stated strictly in terms of functions from $\mathbb{R}$ to $\mathbb{R}$ . It seems to be an accepted fact that other theorems from complex analysis yield stronger results. For example, Wikipedia claims that the usefulness of Taylor’s theorem is “dwarfed” by other general results in complex analysis. But unless I am missing something, this is not universally true.

There are cases where a function is large, but its $(n+1)$ st derivative is small. For example, suppose that we want to determine the accuracy of Taylor polynomials for $f(z) = z^{3.5}$ centered very far from the origin. Since $f^{(4)}(z)$ is small when $z$ is far from the origin, Taylor’s theorem tells us that a degree 3 Taylor approximation will have a very small error. This result does not follow from the Cauchy inequalities or any obvious corollary of Cauchy’s integral formula, though these results are sometimes said to be more powerful than Taylor’s theorem.

Previous: Higher-Order Monotonicity

Next: Proof of the Fractional Sum Definition of the Zeta Function