A Derivation of Taylor's Theorem

In my video How to Extend the Sum of Any* Function, I spent a decent amount of time building up a tower of increasingly nested sums that looked similar to this:

Δ0f(x0)+ k1=0x1Δ1f(x0)+ k1=0x1k2=0k11Δ2f(x0)+ k1=0x1k2=0k11k3=0k21Δ3f(x0)  + k1=0x1k2=0k11kn=0kn11n sumsΔnf(x0).\begin{align*} & \Delta^0 f(x_0)\\ +\ &\sum_{k_1=0}^{x-1} \Delta^1 f(x_0)\\ +\ &\sum_{k_1=0}^{x-1} \sum_{k_2=0}^{k_1-1} \Delta^2 f(x_0)\\ +\ &\sum_{k_1=0}^{x-1} \sum_{k_2=0}^{k_1-1} \sum_{k_3=0}^{k_2-1} \Delta^3 f(x_0)\\ \vdots\ \ &\\ +\ &\underbrace{\sum_{k_1=0}^{x-1} \sum_{k_2=0}^{k_1-1} \cdots \sum_{k_n=0}^{k_{n-1}-1}}_{n\text{ sums}} \Delta^{n} f(x_0). \end{align*}

I then showed that each line of this expression is simply equal to a binomial coefficient, so the whole thing reduces to

k=0n(xk)Δkf(x0).\sum_{k=0}^n \binom xk \Delta^k f(x_0).

If you made it to the end of the video, you’ll know the plot twist: This is the discrete version of a Taylor polynomial! Notice the similarity:

k=0n(xk)Δkf(x0)k=0nxnn!f(k)(x0).\sum_{k=0}^n \binom xk \Delta^k f(x_0) \qquad \qquad \sum_{k=0}^n \frac{x^n}{n!} f^{(k)}(x_0).

In this post I’ll show that if we apply the same line of reasoning to integrals, we end up with a derivation of Taylor polynomials, including Lagrange’s form of the error term. Then we’ll see that Taylor’s theorem also applies to complex functions, though with a looser error bound.

Taylor’s Theorem for Real Functions

We’ll assume that f:RRf: \mathbb{R} \to \mathbb{R} is n+1n+1 times differentiable and that f(n+1)f^{(n+1)} is continuous. This continuity is not required for the theorem to hold, but it makes the derivation easier. Also for brevity we’ll be centering the Taylor polynomials at 00 (so specifically we’re looking at Maclaurin series), and we’ll always consider xx to be positive. It’s easy to modify the result to remove these constraints – in fact we do that and more in the next section.

In my video, we arrived at that tower of nested sums by repeatedly applying the fundamental theorem of discrete calculus:

f(x0+a)=f(x0)+k=0a1Δf(x0+k).f(x_0 + a) = f(x_0) + \sum_{k=0}^{a-1} \Delta f(x_0 + k).

Here we’ll do the same thing, but with the plain old fundamental theorem of calculus:

f(x)=f(0)+0xf(t)dt.(1)f(x) = f(0) + \int_0^x f'(t) dt. \tag{1}

To start, let’s apply the fundamental theorem of calculus (1)(1) to the f(t)f'(t) in its own definition:

f(x)=f(0)+0x(f(0)+0t1f(t2)dt2)dt1=f(0)+0xf(0)dt1+0x0t1f(t2)dt2dt1\begin{aligned} f(x) &= f(0) + \int_0^x \left( f'(0) + \int_0^{t_1} f''(t_2) dt_2 \right) dt_1\\ &= f(0) + \int_0^x f'(0) dt_1 + \int_0^x \int_0^{t_1} f''(t_2) dt_2 dt_1 \end{aligned}

We’ve introduced a double integral. Let’s see what happens when we again apply (1)(1) to the f(t2)f''(t_2) in this expression:

f(x)=f(0)+0xf(0)dt1+0x0t1(f(0)+0x2f(t3)dt3)dt2dt1=f(0)+0xf(0)dt1+0x0t1f(0)dt2dt1+0x0t10x2f(t3)dt3dt2dt1\begin{aligned} f(x) &= f(0) + \int_0^x f'(0) dt_1 + \int_0^x \int_0^{t_1} \left(f''(0) + \int_0^{x_2} f'''(t_3) dt_3 \right) dt_2 dt_1\\ &= f(0) + \int_0^x f'(0) dt_1 + \int_0^x \int_0^{t_1} f''(0) dt_2 dt_1 + \int_0^x \int_0^{t_1} \int_0^{x_2} f'''(t_3) dt_3 dt_2 dt_1 \end{aligned}

Now we’ve added a triple integral. And we can keep going! We can apply (1)(1) to the f(t3)f'''(t_3) to introduce a quadruple integral, and so on. After repeating the process nn times, we’ll have

f(x)= f(0)+ 0xf(0)dt1+ 0x0t1f(0)dt2dt1+ 0x0t10t2f(0)dt3dt2dt1  + 0x0t10tn1n integralsf(n)(0)dtndt2dt1+ 0x0t10tnn+1 integralsf(n+1)(tn+1)dtn+1dt2dt1.\begin{aligned} f(x) =\ &f(0)\\ +\ &\int_0^x f'(0) dt_1\\ +\ &\int_0^x \int_0^{t_1} f''(0) dt_2 dt_1\\ +\ &\int_0^x \int_0^{t_1} \int_0^{t_2} f'''(0) dt_3 dt_2 dt_1\\ \vdots\ \ &\\ +\ &\underbrace{\int_0^x \int_0^{t_1} \cdots \int_0^{t_{n-1}}}_{n\text{ integrals}} f^{(n)}(0) dt_n \cdots dt_2 dt_1 \\ +\ &\underbrace{\int_0^x \int_0^{t_1} \cdots \int_0^{t_{n}}}_{n+1\text{ integrals}} f^{(n+1)}(t_{n+1}) dt_{n+1} \cdots dt_2 dt_1. \end{aligned}

Apart from the last line, this is the same as the tower of nested sums at the beginning of this post. Except, of course, all the sums have been replaced with integrals.

Note that, again with the exception of the last line, the integrands are all constants. This makes the nested integrals extremely easy to compute. I’ll leave it to you to verify that

0x0t10tn1n integralsf(n)(0)dtndt2dt1=xnn!f(n)(0).\underbrace{\int_0^x \int_0^{t_1} \cdots \int_0^{t_{n-1}}}_{n\text{ integrals}} f^{(n)}(0) dt_n \cdots dt_2 dt_1 = \frac{x^n}{n!} f^{(n)}(0).

Therefore our tower of nested integrals is just

f(x)=f(0)+xf(0)+x22f(0)++xnn!f(n)(0)  + 0x0t10tnn+1 integrals f(n+1)(tn+1) dtn+1dt2dt1.\begin{equation} \tag{2} \begin{split} f(x) = f(0) + xf'(0) + \frac{x^2}2 f''(0) + \cdots + \frac{x^n}{n!}f^{(n)}(0) \qquad\quad\ \ \\ +\ \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}}\ f^{(n+1)}(t_{n+1})\ dt_{n+1} \cdots dt_2 dt_1. \end{split} \end{equation}

The top of the right-hand side is the degree nn Taylor polynomial of ff, centered at x=0x=0. Let’s denote this by TnT_n:

Tn(x)=f(0)+xf(0)+x22f(0)++xnn!f(n)(0).T_n(x) = f(0) + xf'(0) + \frac{x^2}2 f''(0) + \cdots + \frac{x^n}{n!}f^{(n)}(0).

By subtracting TnT_n from both sides of (2)(2) we obtain

f(x)Tn(x)=0x0t10tnn+1 integrals f(n+1)(tn+1) dtn+1dt2dt1.(3)f(x) - T_n(x) = \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}}\ f^{(n+1)}(t_{n+1})\ dt_{n+1} \cdots dt_2 dt_1. \tag{3}

So the difference between f(x)f(x) and Tn(x)T_n(x) is this last remaining nested integral. In other words, the nested integral is the error in the approximation of ff by Taylor polynomials. All that’s left to do is to bound that error term.

Fortunately, we’ve assumed that f(n+1)f^{(n+1)} is continuous. Therefore, by the extreme value theorem, it will have a maximum on [0,x][0, x], so

f(x)Tn(x)0x0t10tnn+1 integrals maxs[0,x]f(n+1)(s) dtn+1dt2dt1xn+1(n+1)!maxs[0,x]f(n+1)(s).\begin{aligned} f(x) - T_n(x) &\leq \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}}\ \max_{s \in [0, x]}f^{(n+1)}(s)\ dt_{n+1} \cdots dt_2 dt_1 \\ &\leq \frac{x^{n+1}}{(n+1)!} \max_{s \in [0, x]}f^{(n+1)}(s). \end{aligned}

We can also do the same thing with the function’s minimum, after which we will have both lower and upper bounds for the Taylor remainder:

xn+1(n+1)!mins[0,x]f(n+1)(s)f(x)Tn(x)xn+1(n+1)!maxs[0,x]f(n+1)(s).\frac{x^{n+1}}{(n+1)!} \min_{s \in [0, x]}f^{(n+1)}(s) \leq f(x) - T_n(x) \leq \frac{x^{n+1}}{(n+1)!} \max_{s \in [0, x]}f^{(n+1)}(s).

Finally, again by the continuity of f(n+1)f^{(n+1)}, the intermediate value theorem tells us that there exists some number ξ[0,x]\xi \in [0, x] for which

f(x)Tn(x)=xn+1(n+1)!f(n+1)(ξ).f(x) - T_n(x) = \frac{x^{n+1}}{(n+1)!} f^{(n+1)}(\xi).

We have just derived the Lagrange form of the Taylor remainder!

Taylor’s theorem with Lagrange error term:
Suppose ff is a real function and f(n+1)f^{(n+1)} is continuous. Then there exists a number ξ[0,x]\xi \in [0,x] such that

f(x)=Tn(x)+xn+1(n+1)!f(n+1)(ξ).f(x) = T_n(x) + \frac{x^{n+1}}{(n+1)!} f^{(n+1)}(\xi).

Bonus: Taylor’s Theorem for Complex Functions

Everything up to and including (3)(3) is perfectly valid even if the output of ff is complex. However, the later steps relied on ff being real-valued, so the Lagrange form of the remainder is only valid for f:RRf : \mathbb{R} \to \mathbb{R}. But we can still find a useful bound for complex functions.

To start, suppose that ff is a complex-valued function of a real variable. Take the modulus of both sides of (3)(3) to see that

f(x)Tn(x)=0x0t10tnn+1 integralsf(n+1)(tn+1)dtn+1dt2dt10x0t10tnn+1 integralsf(n+1)(tn+1)dtn+1dt2dt10x0t10tnn+1 integralssups[0,x]f(n+1)(s)dtn+1dt2dt1=xn+1(n+1)!sups[0,x]f(n+1)(s).\begin{align} |f(x) - T_n(x)| &= \left|\underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}} f^{(n+1)}(t_{n+1}) dt_{n+1} \cdots dt_2 dt_1 \right| \notag\\ &\leq \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}} \left| f^{(n+1)}(t_{n+1})\right| dt_{n+1} \cdots dt_2 dt_1 \notag\\ &\leq \underbrace{\int_0^x \int_0^{t_1}\cdots\int_0^{t_n}}_{n+1\text{ integrals}} \sup_{s \in [0,x]} \left| f^{(n+1)}(s)\right| dt_{n+1} \cdots dt_2 dt_1 \notag\\ &= \frac{x^{n+1}}{(n+1)!} \sup_{s \in [0,x]} \left| f^{(n+1)}(s)\right|. \tag{4} \end{align}

With this, we have a bound of the Taylor remainder for f:RCf : \mathbb{R} \to \mathbb{C}. It turns out we can also apply this bound for f:CCf : \mathbb{C} \to \mathbb{C}. To be precise,

Taylor’s theorem for complex functions:
Suppose ff is a holomorphic function on an open set ΩC\Omega \subseteq \mathbb{C}. Let LL be the line segment connecting two points z0z_0 and zz. If LΩL \subseteq \Omega, then

f(z)Tn(z)zz0n+1(n+1)!supζLf(n+1)(ζ),|f(z) - T_n(z)| \leq \frac{|z - z_0|^{n+1}}{(n+1)!} \sup_{\zeta \in L} \left| f^{(n+1)}(\zeta) \right|,

where TnT_n is the degree nn Taylor polynomial of ff centered at z0z_0.

To prove this, we just have to translate, rotate, and scale the complex plane to move LL to the real axis so that we can use (4)(4). To this end, note that the change of variables tz0+t(zz0)t \mapsto z_0 + t(z - z_0) maps the interval [0,1][0,1] to LL. Define g:[0,1]Cg : [0,1] \to \mathbb{C} by

g(t)=f(z0+t(zz0)),g(t) = f(z_0 + t(z-z_0)),

and let SnS_n denote the degree nn Taylor polynomial of gg centered at 0. Clearly g(1)=f(z)g(1) = f(z) and

Sn(1)=k=0ng(k)(0)k!=k=0nf(k)(z0)k!(zz0)k=Tn(z).\begin{aligned} S_n(1) &= \sum_{k=0}^n \frac{g^{(k)}(0)}{k!}\\ &= \sum_{k=0}^n \frac{f^{(k)}(z_0)}{k!} (z - z_0)^k\\ &= T_n(z). \end{aligned}

Apply (4)(4) to g(1)Sn(1)|g(1) - S_n(1)| to see that

f(z)Tn(z)=g(1)Sn(1)1n+1(n+1)!sups[0,1]g(n+1)(s)=1(n+1)!sups[0,1]zz0n+1f(n+1)(z0+s(zz0))=zz0n+1(n+1)!supζLf(n+1)(ζ).\begin{aligned} |f(z) - T_n(z)| &= |g(1) - S_n(1)|\\ &\leq \frac{1^{n+1}}{(n+1)!} \sup_{s \in [0,1]} \left| g^{(n+1)}(s)\right|\\ &= \frac{1}{(n+1)!} \sup_{s \in [0,1]} |z-z_0|^{n+1} \left| f^{(n+1)}(z_0 + s(z - z_0))\right|\\ &= \frac{|z-z_0|^{n+1}}{(n+1)!} \sup_{\zeta \in L} \left| f^{(n+1)}(\zeta)\right|. \end{aligned}

\square

Strangely, people tend not to mention this form of the theorem for complex functions. Taylor’s theorem is almost always stated strictly in terms of functions from R\mathbb{R} to R\mathbb{R}. It seems to be an accepted fact that other theorems from complex analysis yield stronger results. For example, Wikipedia claims that the usefulness of Taylor’s theorem is “dwarfed” by other general results in complex analysis. But unless I am missing something, this is not universally true.

There are cases where a function is large, but its (n+1)(n+1)st derivative is small. For example, suppose that we want to determine the accuracy of Taylor polynomials for f(z)=z3.5f(z) = z^{3.5} centered very far from the origin. Since f(4)(z)f^{(4)}(z) is small when zz is far from the origin, Taylor’s theorem tells us that a degree 3 Taylor approximation will have a very small error. This result does not follow from the Cauchy inequalities or any obvious corollary of Cauchy’s integral formula, though these results are sometimes said to be more powerful than Taylor’s theorem.