Higher-Order Monotonicity

Remember in calculus class, when you learned that a function $f$ is “increasing” if $f'(x) \geq 0$, and that it’s “concave up” (or convex) if $f''(x) \geq 0$? What about all the higher-order cases, where $f'''(x) \geq 0$, and beyond? Apparently nobody cared enough to name those cases.

Well, let’s name them! If $f^{(n)}(x) \geq 0$, then we will say that $f$ is $n$-increasing (or positive $n$-monotone). So, nonnegative functions are $0$-increasing, while “increasing” functions are $1$-increasing, and convex functions are $2$-increasing.

Although we just defined these using derivatives, $1$-increasing and $2$-increasing (convex) functions can be defined more generally, without any requirement of differentiability.

In this post, we will see a different way of looking at these definitions which elegantly extends to any order of monotonicity.

Rethinking Convexity

When we turn the familiar definition of convexity, that the function lies below every line segment connecting two points on its graph, into a formula, we get something like this:

Convexity (Definition 1):
A function is $2$-increasing, or convex, if for every triple $a < b < c$,

$$f(b) \leq f(a) + (b-a) \frac{f(c) - f(a)}{c-a}.$$

The right-hand side of this formula is the height of the line segment connecting $(a, f(a))$ to $(c, f(c))$ at the $x$-coordinate of $b$, so this is a correct interpretation of the function lying under the line segment. Here’s an example of a convex function, where you can play around with the three points.

Intuitively, a function should be convex if it “curves up”, and never curves down. I think the above definition captures this in a clever way. If the function lies below the line segment connecting two points, it makes sense that it ultimately has to “curve up” as it goes from one end to another. You may object that it could squiggle up and down as much as it wants under the line segment, but that worry is quelled by the fact that the function has to lie under every line segment. If it ever curved down, you could find a line segment that the function would rise above, so it wouldn’t be convex as per the definition.
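Definition 1 can also be checked numerically by brute force over all triples from a sample grid. This is just an illustrative sketch; the test functions, grid, and tolerance are my own choices, not anything from the original text.

```python
# Brute-force check of Definition 1: f(b) <= chord from (a, f(a)) to (c, f(c))
# for every triple a < b < c drawn from a finite sample grid.
def is_convex_by_chords(f, points):
    pts = sorted(points)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            for k in range(j + 1, len(pts)):
                a, b, c = pts[i], pts[j], pts[k]
                # Height of the chord at x = b.
                chord = f(a) + (b - a) * (f(c) - f(a)) / (c - a)
                if f(b) > chord + 1e-12:  # small tolerance for float error
                    return False
    return True

xs = [x / 10 for x in range(-30, 31)]
print(is_convex_by_chords(lambda x: x * x, xs))   # x^2 is convex: True
print(is_convex_by_chords(lambda x: x ** 3, xs))  # x^3 curves down for x < 0: False
```

Of course, passing this finite check does not prove convexity; it only fails to find a counterexample on the grid.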

But despite this definition’s visual clarity, it feels a little… asymmetrical. The numbers $a$ and $c$ represent the endpoints of the line segment, and $b$ is the point at which we check that the function lies below the line segment. It just feels odd to me that the three points play different roles. It seems like there should be some symmetrical way to phrase this where $a$, $b$, and $c$ all play the same part, regardless of which is between the other two.

But more importantly, this definition seems too specialized. It is not obvious from this definition that $2$-monotonicity is a “sequel” to $1$-monotonicity, and it is not clear at all how it could be modified to capture $3$-monotonicity. The whole “line segment” idea is specific to convexity and convexity alone. Fortunately, there’s an equivalent way to define convexity that avoids this issue.

Forget Lines. Use Parabolas!

The reason for this definition’s asymmetry is that it involves three points, but a line segment is determined by two points. Thus one point must be the odd one out, playing a different role than the other two. But while a line is determined by two points, a parabola is determined by three.

So let’s think about the unique parabola that passes through $(a, f(a))$, $(b, f(b))$, and $(c, f(c))$. Here is an interactive demonstration:

No matter what you do, you can never get the parabola to curve down. This is why the function is convex: Every parabola passing through three of its points is convex.

You can convince yourself that this is indeed equivalent to the line-based definition. If the middle point lies under the line segment connecting the other two points, then the parabola must “curve up” to touch all three points. Conversely, if the parabola curves up, the middle point will always lie under the line connecting the other two points.

So this gives us a new definition of convexity:

Convexity (Definition 2):
A function $f$ is $2$-increasing, or convex, if for every choice of three distinct numbers $a$, $b$, and $c$, the leading coefficient of the unique parabola passing through $(a, f(a))$, $(b, f(b))$, and $(c, f(c))$ is nonnegative.

This definition is symmetrical! We did not have to specify the order of $a$, $b$, and $c$; all three numbers play the same role.
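Definition 2 is also easy to compute: fitting a degree-$2$ polynomial through three points and reading off its leading coefficient. Here is a small sketch using NumPy (the example function and points are my own choices):

```python
import numpy as np

# Definition 2 in code: the leading coefficient of the unique parabola
# through three points of f. np.polyfit returns coefficients with the
# highest-degree term first, so coeffs[0] is the leading coefficient.
def parabola_leading_coeff(f, a, b, c):
    coeffs = np.polyfit([a, b, c], [f(a), f(b), f(c)], deg=2)
    return coeffs[0]

# For f(x) = x^2, the interpolating parabola is x^2 itself,
# so the leading coefficient is 1 regardless of the points chosen.
print(parabola_leading_coeff(lambda x: x * x, -1.0, 0.5, 2.0))
```

Note that the order of the three points passed to `np.polyfit` doesn’t matter, which mirrors the symmetry of the definition.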

Extending the Definition to All Orders

Symmetry isn’t the only perk of this new definition. It’s also easy to extend to any order! We said that a function was $2$-increasing if for any $3$ points, the degree $2$ polynomial passing through all those points has a nonnegative leading coefficient. Well, let’s adapt this to the other orders.

$\mathbf{0}$-Increasing (Nonnegative):
A function $f$ is $0$-increasing (nonnegative) if for every number $a$, the leading coefficient of the unique degree $0$ polynomial passing through $(a, f(a))$ is nonnegative.

This is trivially true. A degree $0$ polynomial is a constant, and the constant which passes through $(a, f(a))$ is obviously just $f(a)$. So this constant is nonnegative if and only if $f$ is nonnegative.

What about $1$-increasing functions?

$\mathbf{1}$-Increasing:
A function $f$ is $1$-increasing (or just increasing) if for every choice of two distinct numbers $a$ and $b$, the leading coefficient of the unique degree $1$ polynomial passing through $(a, f(a))$ and $(b, f(b))$ is nonnegative.

What do you know, it works! A degree $1$ polynomial is just a line, and a line is increasing whenever its leading coefficient (its slope) is nonnegative. Also, a line segment is increasing if and only if its left endpoint is below its right endpoint. Clearly this matches the condition that $f(a) \leq f(b)$ whenever $a < b$.

So continuing this pattern, the general definition is clear!

$\mathbf{n}$-Increasing:
A function $f$ is $n$-increasing if for every choice of distinct numbers $x_0, x_1, \dots, x_n$, the leading coefficient of the unique degree $n$ polynomial passing through $(x_i, f(x_i))$ for all $i = 0, 1, \dots, n$ is nonnegative.
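The general definition translates directly into a brute-force numerical check: fit a degree-$n$ polynomial through every choice of $n+1$ grid points and inspect the leading coefficient. A sketch (the function $e^x$, grid, and tolerance are my own illustrative choices):

```python
import numpy as np
from itertools import combinations
from math import exp

# Brute-force check of the general definition on a finite grid:
# f is n-increasing if every degree-n interpolating polynomial through
# n+1 of its points has a nonnegative leading coefficient.
def is_n_increasing(f, n, grid, tol=1e-9):
    for xs in combinations(grid, n + 1):
        lead = np.polyfit(xs, [f(x) for x in xs], deg=n)[0]
        if lead < -tol:
            return False
    return True

# exp has every derivative equal to exp, which is positive,
# so it should pass the check at every order.
grid = [i / 2 for i in range(-4, 5)]
for n in range(4):
    print(n, is_n_increasing(exp, n, grid))  # True for each n
```

Again, this only searches a finite grid, so it can refute $n$-monotonicity but never prove it.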

Divided Differences

Alright, we have a derivative-free definition of all orders of monotonicity. But it relies on us being able to find the polynomial that passes through a collection of points. There are a couple of ways of doing this, the simplest of which is Lagrange interpolation. But there’s another way, discovered by Newton, that relies on a concept called divided differences. I will denote divided differences by the name of a function, followed by one or several numbers in square brackets. They are defined recursively as follows:

$$\begin{aligned} f[x_0] &= f(x_0),\\ f[x_0, x_1, \dots, x_n] &= \frac{f[x_1, x_2, \dots, x_n] - f[x_0, x_1, \dots, x_{n-1}]}{x_n - x_0}. \end{aligned}$$

For example,

$$f[a, b] = \frac{f(b) - f(a)}{b - a},$$

$$f[a, b, c] = \frac{\frac{f(c) - f(b)}{c - b} - \frac{f(b) - f(a)}{b - a}}{c - a}.$$
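The recursive definition above can be transcribed almost verbatim into code. A minimal sketch (the example values are my own; a production implementation would use the classic tabular method instead of naive recursion):

```python
# Divided differences, straight from the recursive definition:
# f[x0] = f(x0), and
# f[x0..xn] = (f[x1..xn] - f[x0..x(n-1)]) / (xn - x0).
def divided_difference(f, xs):
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:])
            - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

f = lambda x: x * x
print(divided_difference(f, [1.0, 3.0]))       # f[1,3] = (9 - 1)/2 = 4.0
print(divided_difference(f, [0.0, 1.0, 3.0]))  # f[0,1,3] = (4 - 1)/3 = 1.0
```

Note that $f[0, 1, 3] = 1$ is exactly the leading coefficient of $x^2$, foreshadowing Newton’s observation below.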

I won’t go into the details here, but Newton found that $f[x_0, x_1, \dots, x_n]$ is equal to the leading coefficient of the polynomial passing through $f$ at all the points $x_0, x_1, \dots, x_n$. This gives us a nice notation for our definition:

$\mathbf{n}$-Increasing (With Divided Differences):
A function $f$ is $n$-increasing if for every choice of distinct numbers $x_0, x_1, \dots, x_n$,

$$f[x_0, x_1, \dots, x_n] \geq 0.$$
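In code, this form of the definition needs nothing but the recursive divided difference. Here is a sketch of a brute-force checker over a finite grid, with $x^3$ as a test case of my own choosing ($x^3$ is $3$-increasing everywhere, but not $2$-increasing, i.e. not convex, on the negatives):

```python
from itertools import combinations

# Recursive divided difference, as defined earlier in the post.
def divided_difference(f, xs):
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:])
            - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

# f is n-increasing (on this grid) if every divided difference
# over n+1 grid points is nonnegative.
def is_n_increasing(f, n, grid, tol=1e-9):
    return all(divided_difference(f, list(xs)) >= -tol
               for xs in combinations(sorted(grid), n + 1))

grid = [i / 2 for i in range(-6, 7)]
cube = lambda x: x ** 3
print(is_n_increasing(cube, 3, grid))  # True: (x^3)''' = 6 >= 0
print(is_n_increasing(cube, 2, grid))  # False: x^3 curves down for x < 0
```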

Does This Match the Derivative Definition?

We started by defining a function $f$ as $n$-increasing if $f^{(n)}(x) \geq 0$. Then we found a more general definition using divided differences which does not require $f$ to be differentiable. But are these two definitions compatible?

Yes, they are! If $f$ is $n$-times differentiable on an interval, then the two definitions are equivalent on that interval. Let’s prove it.

First, assume that $f^{(n)}$ is nonnegative. There is a generalization of the mean value theorem, called the mean value theorem for divided differences, which states that there will always exist some number $\xi$ between the minimum and maximum of $x_0, x_1, \dots, x_n$ such that

$$f[x_0, x_1, \dots, x_n] = \frac{f^{(n)}(\xi)}{n!}.$$
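This theorem is easy to verify numerically on an example of my own choosing: for $f(x) = x^3$, the third derivative is the constant $6$, so every four-point divided difference should equal $6/3! = 1$, no matter which points we pick.

```python
# Checking the mean value theorem for divided differences on f(x) = x^3:
# f''' is constantly 6, so f[x0, x1, x2, x3] = f'''(xi)/3! = 1 for ANY
# choice of four distinct points.
def divided_difference(f, xs):
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:])
            - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

cube = lambda x: x ** 3
print(divided_difference(cube, [-2.0, 0.5, 1.0, 7.0]))  # ~1.0, up to float error
```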

It follows that if $f^{(n)}$ is always nonnegative, then all divided differences of $f$ involving $n+1$ points will be nonnegative.

For the converse, assume that every divided difference of $f$ involving $n+1$ points is nonnegative. We can use the recursive definition of divided differences, along with their linearity, to show that

$$f'[x_1, \dots, x_n] = \lim_{h \to 0} \frac{f[x_1+h, x_2+h, \dots, x_n+h] - f[x_1, x_2, \dots, x_n]}{h}$$

$$\begin{align} = \lim_{h \to 0} \bigg(&\frac{f[x_1+h, x_2+h, \dots, x_n+h] - f[x_1, x_2+h, \dots, x_n+h]}{h}\\ &+\frac{f[x_1, x_2+h, \dots, x_n+h] - f[x_1, x_2, \dots, x_n+h]}{h} \nonumber\\ &+\quad \vdots \nonumber\\ &+\frac{f[x_1, x_2, \dots, x_n+h] - f[x_1, x_2, \dots, x_n]}{h} \bigg) \nonumber\\ = \lim_{h \to 0} \big(& f[x_1+h, x_2+h, \dots, x_n+h, x_1]\\ &+ f[x_1, x_2+h, \dots, x_n+h, x_2] \nonumber\\ &+ \quad \vdots \nonumber\\ &+ f[x_1, x_2, \dots, x_n+h, x_n]\big) \nonumber\\ \geq 0. \quad\ & \end{align}$$

In $(1)$, we build a telescoping sum by removing one “$+h$” at a time. In $(2)$, we use the definition of divided differences to see that each difference quotient is a divided difference of $n+1$ points. Then, by assumption, every such divided difference is nonnegative, so the inequality $(3)$ holds.

We’ve shown that all divided differences of $f'$ involving $n$ points are nonnegative. By repeating this process, we can eventually see that every divided difference of $f^{(n)}$ involving $1$ point is nonnegative. But divided differences involving $1$ point are defined as the function itself, so $f^{(n)}(x)$ is always nonnegative. This completes the proof.

Final Notes

This definition of higher-order monotonicity was first used by Eberhard Hopf and Tiberiu Popoviciu. (I have not read their original papers, but they are referenced in the article “A new look at Popoviciu’s concept of convexity for functions of two variables” by Sorin G. Gal and Constantin P. Niculescu.)

I would love to learn more about the relationship between $n$-monotonicity and differentiability. It is known that, on open sets, $1$-monotone functions have left-sided and right-sided limits everywhere, and $2$-monotone functions are left and right differentiable everywhere (and therefore continuous). I wonder if these observations can be extended to higher orders. Are $3$-monotone functions on open sets necessarily differentiable? Are they left and right $2$-differentiable? Does the pattern continue? I haven’t been able to prove any of this, but I would like to look into it more. The answers may be in the papers I mentioned in the previous paragraph. Maybe one day I will find the time to pull out Google Translate and read them.