Higher-Order Monotonicity

Remember in calculus class, when you learned that a function $f$ is “increasing” if $f'(x) \geq 0$, and that it’s “concave up” (or convex) if $f''(x) \geq 0$? What about all the higher-order cases, where $f'''(x) \geq 0$, and beyond? Apparently nobody cared enough to name those cases.

Well, let’s name them! If $f^{(n)}(x) \geq 0$, then we will say that $f$ is $n$-increasing (or positive $n$-monotone). So, nonnegative functions are $0$-increasing, while “increasing” functions are $1$-increasing, and convex functions are $2$-increasing.

Although we just defined these using derivatives, $1$-increasing and $2$-increasing (convex) functions can be defined more generally, without any requirement of differentiability.

In this post, we will see a different way of looking at these definitions which elegantly extends to any order of monotonicity.

Rethinking Convexity

When we turn the familiar definition of convexity, that the function lies below every line segment connecting two points on its graph, into a formula, we get something like this:

Convexity (Definition 1):
A function is $2$-increasing, or convex, if for every triple $a < b < c$,

$$f(b) \leq f(a) + (b-a) \frac{f(c) - f(a)}{c-a}.$$

The right-hand side of this formula is the height of the line segment connecting $(a, f(a))$ to $(c, f(c))$ at the $x$-coordinate of $b$, so this is a correct interpretation of the function lying under the line segment. Here’s an example of a convex function, where you can play around with the three points.

Intuitively, a function should be convex if it “curves up”, and never curves down. I think the above definition captures this in a clever way. If the function lies below the line segment connecting two points, it makes sense that it ultimately has to “curve up” as it goes from one end to another. You may object that it could squiggle up and down as much as it wants under the line segment, but that worry is quelled by the fact that the function has to lie under every line segment. If it ever curved down, you could find a line segment that the function would rise above, so it wouldn’t be convex as per the definition.
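Definition 1 can also be checked numerically by brute force over all triples from a sample grid. This is just an illustrative sketch; the test functions, grid, and tolerance are my own choices, not anything from the original text.

```python
# Brute-force check of Definition 1: f(b) <= chord from (a, f(a)) to (c, f(c))
# for every triple a < b < c drawn from a finite sample grid.
def is_convex_by_chords(f, points):
    pts = sorted(points)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            for k in range(j + 1, len(pts)):
                a, b, c = pts[i], pts[j], pts[k]
                # Height of the chord at x = b.
                chord = f(a) + (b - a) * (f(c) - f(a)) / (c - a)
                if f(b) > chord + 1e-12:  # small tolerance for float error
                    return False
    return True

xs = [x / 10 for x in range(-30, 31)]
print(is_convex_by_chords(lambda x: x * x, xs))   # x^2 is convex: True
print(is_convex_by_chords(lambda x: x ** 3, xs))  # x^3 curves down for x < 0: False
```

Of course, passing this finite check does not prove convexity; it only fails to find a counterexample on the grid.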

But despite this definition’s visual clarity, it feels a little… asymmetrical. The numbers $a$ and $c$ represent the endpoints of the line segment, and $b$ is the point at which we check that the function lies below the line segment. It just feels odd to me that the three points play different roles. It seems like there should be some symmetrical way to phrase this where $a$, $b$, and $c$ all play the same part, regardless of which is between the other two.

But more importantly, this definition seems too specialized. It is not obvious from this definition that $2$-monotonicity is a “sequel” to $1$-monotonicity, and it is not clear at all how it could be modified to capture $3$-monotonicity. The whole “line segment” idea is specific to convexity and convexity alone. Fortunately, there’s an equivalent way to define convexity that avoids this issue.

Forget Lines. Use Parabolas!

The reason for this definition’s asymmetry is that it involves three points, but a line segment is determined by two points. Thus one point must be the odd one out, playing a different role than the other two. But while a line is determined by two points, a parabola is determined by three.

So let’s think about the unique parabola that passes through $(a, f(a))$, $(b, f(b))$, and $(c, f(c))$. Here is an interactive demonstration:

No matter what you do, you can never get the parabola to curve down. This is why the function is convex: Every parabola passing through three of its points is convex.

You can convince yourself that this is indeed equivalent to the line-based definition. If the middle point lies under the line segment connecting the other two points, then the parabola must “curve up” to touch all three points. Conversely, if the parabola curves up, the middle point will always lie under the line connecting the other two points.

So this gives us a new definition of convexity:

Convexity (Definition 2):
A function $f$ is $2$-increasing, or convex, if for every choice of three distinct numbers $a$, $b$, and $c$, the leading coefficient of the unique parabola passing through $(a, f(a))$, $(b, f(b))$, and $(c, f(c))$ is nonnegative.

This definition is symmetrical! We did not have to specify the order of $a$, $b$, and $c$; all three numbers play the same role.
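Definition 2 is also easy to compute: fitting a degree-$2$ polynomial through three points and reading off its leading coefficient. Here is a small sketch using NumPy (the example function and points are my own choices):

```python
import numpy as np

# Definition 2 in code: the leading coefficient of the unique parabola
# through three points of f. np.polyfit returns coefficients with the
# highest-degree term first, so coeffs[0] is the leading coefficient.
def parabola_leading_coeff(f, a, b, c):
    coeffs = np.polyfit([a, b, c], [f(a), f(b), f(c)], deg=2)
    return coeffs[0]

# For f(x) = x^2, the interpolating parabola is x^2 itself,
# so the leading coefficient is 1 regardless of the points chosen.
print(parabola_leading_coeff(lambda x: x * x, -1.0, 0.5, 2.0))
```

Note that the order of the three points passed to `np.polyfit` doesn’t matter, which mirrors the symmetry of the definition.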

Extending the Definition to All Orders

Symmetry isn’t the only perk of this new definition. It’s also easy to extend to any order! We said that a function was $2$-increasing if for any $3$ points, the degree $2$ polynomial passing through all those points has a nonnegative leading coefficient. Well, let’s adapt this to the other orders.

$\mathbf{0}$-Increasing (Nonnegative):
A function $f$ is $0$-increasing (nonnegative) if for every number $a$, the leading coefficient of the unique degree $0$ polynomial passing through $(a, f(a))$ is nonnegative.

This is trivially true. A degree $0$ polynomial is a constant, and the constant which passes through $(a, f(a))$ is obviously just $f(a)$. So this constant is nonnegative if and only if $f$ is nonnegative.

What about $1$-increasing functions?

$\mathbf{1}$-Increasing:
A function $f$ is $1$-increasing (or just increasing) if for every choice of two distinct numbers $a$ and $b$, the leading coefficient of the unique degree $1$ polynomial passing through $(a, f(a))$ and $(b, f(b))$ is nonnegative.

What do you know, it works! A degree $1$ polynomial is just a line, and a line is increasing whenever its leading coefficient (its slope) is nonnegative. Also, a line segment is increasing if and only if its left endpoint is below its right endpoint. Clearly this matches the condition that $f(a) \leq f(b)$ whenever $a < b$.

So continuing this pattern, the general definition is clear!

$\mathbf{n}$-Increasing:
A function $f$ is $n$-increasing if for every choice of distinct numbers $x_0, x_1, \dots, x_n$, the leading coefficient of the unique degree $n$ polynomial passing through $(x_i, f(x_i))$ for all $i = 0, 1, \dots, n$ is nonnegative.
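The general definition translates directly into a brute-force numerical check: fit a degree-$n$ polynomial through every choice of $n+1$ grid points and inspect the leading coefficient. A sketch (the function $e^x$, grid, and tolerance are my own illustrative choices):

```python
import numpy as np
from itertools import combinations
from math import exp

# Brute-force check of the general definition on a finite grid:
# f is n-increasing if every degree-n interpolating polynomial through
# n+1 of its points has a nonnegative leading coefficient.
def is_n_increasing(f, n, grid, tol=1e-9):
    for xs in combinations(grid, n + 1):
        lead = np.polyfit(xs, [f(x) for x in xs], deg=n)[0]
        if lead < -tol:
            return False
    return True

# exp has every derivative equal to exp, which is positive,
# so it should pass the check at every order.
grid = [i / 2 for i in range(-4, 5)]
for n in range(4):
    print(n, is_n_increasing(exp, n, grid))  # True for each n
```

Again, this only searches a finite grid, so it can refute $n$-monotonicity but never prove it.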

Divided Differences

Alright, we have a derivative-free definition of all orders of monotonicity. But it relies on us being able to find the polynomial that passes through a collection of points. There are a couple of ways of doing this, the simplest of which is Lagrange interpolation. But there’s another way, discovered by Newton, that relies on a concept called divided differences. I will denote divided differences by the name of a function, followed by one or several numbers in square brackets. They are defined recursively as follows:

$$\begin{aligned} f[x_0] &= f(x_0),\\ f[x_0, x_1, \dots, x_n] &= \frac{f[x_1, x_2, \dots, x_n] - f[x_0, x_1, \dots, x_{n-1}]}{x_n - x_0}. \end{aligned}$$

For example,

$$f[a, b] = \frac{f(b) - f(a)}{b - a},$$

$$f[a, b, c] = \frac{\frac{f(c) - f(b)}{c - b} - \frac{f(b) - f(a)}{b - a}}{c - a}.$$
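The recursive definition above can be transcribed almost verbatim into code. A minimal sketch (the example values are my own; a production implementation would use the classic tabular method instead of naive recursion):

```python
# Divided differences, straight from the recursive definition:
# f[x0] = f(x0), and
# f[x0..xn] = (f[x1..xn] - f[x0..x(n-1)]) / (xn - x0).
def divided_difference(f, xs):
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:])
            - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

f = lambda x: x * x
print(divided_difference(f, [1.0, 3.0]))       # f[1,3] = (9 - 1)/2 = 4.0
print(divided_difference(f, [0.0, 1.0, 3.0]))  # f[0,1,3] = (4 - 1)/3 = 1.0
```

Note that $f[0, 1, 3] = 1$ is exactly the leading coefficient of $x^2$, foreshadowing Newton’s observation below.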

I won’t go into the details here, but Newton found that $f[x_0, x_1, \dots, x_n]$ is equal to the leading coefficient of the polynomial passing through $f$ at all the points $x_0, x_1, \dots, x_n$. This gives us a nice notation for our definition:

$\mathbf{n}$-Increasing (With Divided Differences):
A function $f$ is $n$-increasing if for every choice of distinct numbers $x_0, x_1, \dots, x_n$,

$$f[x_0, x_1, \dots, x_n] \geq 0.$$
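In code, this form of the definition needs nothing but the recursive divided difference. Here is a sketch of a brute-force checker over a finite grid, with $x^3$ as a test case of my own choosing ($x^3$ is $3$-increasing everywhere, but not $2$-increasing, i.e. not convex, on the negatives):

```python
from itertools import combinations

# Recursive divided difference, as defined earlier in the post.
def divided_difference(f, xs):
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:])
            - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

# f is n-increasing (on this grid) if every divided difference
# over n+1 grid points is nonnegative.
def is_n_increasing(f, n, grid, tol=1e-9):
    return all(divided_difference(f, list(xs)) >= -tol
               for xs in combinations(sorted(grid), n + 1))

grid = [i / 2 for i in range(-6, 7)]
cube = lambda x: x ** 3
print(is_n_increasing(cube, 3, grid))  # True: (x^3)''' = 6 >= 0
print(is_n_increasing(cube, 2, grid))  # False: x^3 curves down for x < 0
```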

Does This Match the Derivative Definition?

We started by defining a function $f$ as $n$-increasing if $f^{(n)}(x) \geq 0$. Then we found a more general definition using divided differences which does not require $f$ to be differentiable. But are these two definitions compatible?

Yes, they are! If $f$ is $n$-times differentiable on an interval, then the two definitions are equivalent on that interval. Let’s prove it.

First, assume that $f^{(n)}$ is nonnegative. There is a generalization of the mean value theorem, called the mean value theorem for divided differences, which states that there will always exist some number $\xi$ between the minimum and maximum of $x_0, x_1, \dots, x_n$ such that

$$f[x_0, x_1, \dots, x_n] = \frac{f^{(n)}(\xi)}{n!}.$$
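This theorem is easy to verify numerically on an example of my own choosing: for $f(x) = x^3$, the third derivative is the constant $6$, so every four-point divided difference should equal $6/3! = 1$, no matter which points we pick.

```python
# Checking the mean value theorem for divided differences on f(x) = x^3:
# f''' is constantly 6, so f[x0, x1, x2, x3] = f'''(xi)/3! = 1 for ANY
# choice of four distinct points.
def divided_difference(f, xs):
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:])
            - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

cube = lambda x: x ** 3
print(divided_difference(cube, [-2.0, 0.5, 1.0, 7.0]))  # ~1.0, up to float error
```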

It follows that if $f^{(n)}$ is always nonnegative, then all divided differences of $f$ involving $n+1$ points will be nonnegative.

For the converse, assume that every divided difference of $f$ involving $n+1$ points is nonnegative. We can use the recursive definition of divided differences, along with their linearity, to show that

$$f'[x_1, \dots, x_n] = \lim_{h \to 0} \frac{f[x_1+h, x_2+h, \dots, x_n+h] - f[x_1, x_2, \dots, x_n]}{h}$$

$$\begin{align} = \lim_{h \to 0} \bigg(&\frac{f[x_1+h, x_2+h, \dots, x_n+h] - f[x_1, x_2+h, \dots, x_n+h]}{h}\\ &+\frac{f[x_1, x_2+h, \dots, x_n+h] - f[x_1, x_2, \dots, x_n+h]}{h} \nonumber\\ &+\quad \vdots \nonumber\\ &+\frac{f[x_1, x_2, \dots, x_n+h] - f[x_1, x_2, \dots, x_n]}{h} \bigg) \nonumber\\ = \lim_{h \to 0} \big(& f[x_1+h, x_2+h, \dots, x_n+h, x_1]\\ &+ f[x_1, x_2+h, \dots, x_n+h, x_2] \nonumber\\ &+ \quad \vdots \nonumber\\ &+ f[x_1, x_2, \dots, x_n+h, x_n]\big) \nonumber\\ \geq 0. \quad\ & \end{align}$$

In $(1)$, we build a telescoping sum by removing one “$+h$” at a time. In $(2)$, we use the definition of divided differences to see that each difference quotient is a divided difference of $n+1$ points. Then, by assumption, every such divided difference is nonnegative, so the inequality $(3)$ holds.

We’ve shown that all divided differences of $f'$ involving $n$ points are nonnegative. By repeating this process, we can eventually see that every divided difference of $f^{(n)}$ involving $1$ point is nonnegative. But divided differences involving $1$ point are defined as the function itself, so $f^{(n)}(x)$ is always nonnegative. This completes the proof.

Final Notes

This definition of higher-order monotonicity was first used by Eberhard Hopf and Tiberiu Popoviciu. (I have not read their original papers, but they are referenced in the article “A new look at Popoviciu’s concept of convexity for functions of two variables” by Sorin G. Gal and Constantin P. Niculescu.)

I would love to learn more about the relationship between $n$-monotonicity and differentiability. It is known that, on open sets, $1$-monotone functions have left-sided and right-sided limits everywhere, and $2$-monotone functions are left and right differentiable everywhere (and therefore continuous). I wonder if these observations can be extended to higher orders. Are $3$-monotone functions on open sets necessarily differentiable? Are they left and right $2$-differentiable? Does the pattern continue? I haven’t been able to prove any of this, but I would like to look into it more. The answers may be in the papers I mentioned in the previous paragraph. Maybe one day I will find the time to pull out Google Translate and read them.