A Rigorous Derivation of the Bubble Sort Curve

Update: Before I wrote this post, a variation of the result had been proven independently by Krzysztof Bisewski, H. M. Jansen, and Yoni Nazarathy. Their paper “Nonparametric Testing via Partial Sorting” uses the bubble sort curve to define a generalization of the Kolmogorov-Smirnov test.

This post follows my video, The Bubble Sort Curve. You’ll probably want to watch that first. If not, you should at least be familiar with bubble sort.

My video answers question, “What is the shape formed by these diagrams?” Normally for a problem this silly, I would be perfectly happy with a intuitive but non-rigorous argument like the one in my video. But for some reason I couldn’t rest until I had found a precise, mathematical way to state (and then solve) the problem. After spending more time on this than I care to admit, I found a true proof and wrote it down.

If you want to see all the details, you can read it here.

But I don’t necessarily expect you to enjoy reading through pages of proofs, and all the technical details might obscure the intuition behind the main ideas. So I’ve summarized the method in this blog post. Full disclosure: This math goes a bit beyond my education level, so I don’t know the standard methods and terminologies for this kind of problem. I may revisit this once I’ve learned more, but for now, let’s get to it!

A Summary of the Solution

Diagrams as Sets of Points

In my video, the diagrams are bar plots. This is the most common visualization, but it has a couple of downsides when compared to the scatter plot representation. First, points are just plain simpler than bars. Second, the scatter plot gives a more “complete” representation of the shape. Bar plots make visible the shapes formed by the top of the diagrams, but they give the false impression that, under this topmost contour, the shape is solid. On the other hand, scatter plots have no vertical bias, so they could potentially form all kinds of interesting two-dimensional shapes. Here are the bar and scatter plots of a partially sorted list.

A comparison of the bar plot and the scatter plot representations

Alright, so maybe the scatter plot’s shape isn’t that much more interesting in this case. But it does more clearly show that the curved part has body, while the sorted part on the right is just a one dimensional line. So with the scatter plot we see more than just a curve. We see a two dimensional shape which changes over time. This means that the shape of the diagram will be a connected set rather than a real-valued function. For the example above, it looks like this:

A two dimensional representation of the shape approximated by bubble sort's scatter plots

Anyway, we’ll normalize the diagrams just as we did in the video: 1×1×11 \times 1 \times 1. One unit wide, one unit tall, and one unit of time. Then we can consider the diagrams to be finite sets of points in the unit cube, [0,1]3[0, 1]^3. To get into the nitty-gritty, if we have a list of NN real numbers in [0,1][0, 1], we define the diagram to be the set

D={(nN,H(n,I),IN):n,I{0,1,,N1}},D = \left\{ \left(\frac nN, H(n, I), \frac IN\right) : n, I \in \{0, 1, \cdots, N - 1\} \right\},

where H(n,I)H(n, I) is the height of the nnth item after II iterations have been applied.

If we really want to have fun, we can interpret the time dimension as a third spacial dimension. This allows us to see the entire duration of the algorithm in one 3D object.

A three-dimensional visualization of a scatter plot diagram

Admittedly, it’s a bit cluttered when projected to your 2D screen. To help with that, I’ve added a gradient and some stripes perpendicular to the time direction, which lets you see the curves. Though this visualization is mostly for fun, it might help once we get to the definition, since we will be treating the time dimension no differently from the spacial dimensions.

Uniformly Generated Lists

In my video, all the lists were generated by randomly shuffling the list of the first NN natural numbers. But I don’t see any reason to consider this to be the one true method of generating lists. Maybe instead we could throw NN darts at [0,1][0, 1] and list the numbers we hit. Will bubble sort still make the same shape? Probably. But it’s clear that not every method of generating lists will work, so there must be some criterion for curve-forming ways of generating lists.

We’ll call this criterion uniformity. A method of generating lists is uniform if in the initial state of the list, the points are spread uniformly over the unit square. Simple enough! But what precisely does it mean for the points to be spread uniformly?

Let’s say we have a list of NN items, and we’ve drawn some rectangle RR inside the unit square. Let kk denote the number of points which are inside of RR in the initial state of the list. Our method of generating lists is uniform if

kNpArea(R)asN\frac kN \overset p\to \text{Area}(R) \quad \text{as} \quad N \to \infty

for every possible rectangle RR. The symbol (p)\left(\overset p\to\right) represents convergence in probability. This means that, though we can’t expect k/Nk/N to perfectly equal the rectangle’s area for any particular list, it becomes more and more likely to be closer and closer to the area as NN increases.

Why does this definition make sense? Well, the unit square has area 11, so a rectangle of area AA takes up A/1A/1 of the unit square. So if the points are uniformly distributed across the unit square, we expect that proportionally A/1A/1 of them should be inside the rectangle.

I won’t spell out the proofs here, but you can rest assured that the common ways of generating lists are indeed uniform.

Arbitrarily Likely/Unlikely Statements

Update: I have since learned that this is a basic concept from probability theory: asymptotically almost surely.

The diagrams approach the shape as the size of the lists increases to infinity. But it doesn’t make any sense to talk about sorting an infinite list. Instead, we can reason about probabilities involving plain old finite lists of length NN, and then take the limit of these probabilities as NN \to \infty.

In particular, we are interested in statements which are pretty much guaranteed to be true for really big lists. Let’s say ϕ\phi is a statement about a list. Not a statement about a particular list, but a statement which may be true for some lists and false for others. We will say that ϕ\phi is arbitrarily likely if

limNP(Given a list of N items, ϕ is true)=1.\lim_{N \to \infty} P(\text{Given a list of $N$ items, $\phi$ is true}) = 1.

On the other hand, a statement ϕ\phi is arbitrarily unlikely if

limNP(Given a list of N items, ϕ is true)=0.\lim_{N \to \infty} P(\text{Given a list of $N$ items, $\phi$ is true}) = 0.

These definitions give us the flexibility to talk about the shape of the diagrams more realistically. We can’t say that the diagrams always form the same shape because there will always be rare pathological shuffles which don’t. But if those bad shuffles are arbitrarily unlikely, we can legitimately ignore them.

The definition of uniformity above can be translated into an arbitrarily likely statement. For a uniform method of generating lists, for any rectangle in the unit square, it is arbitrarily likely that k/Nk/N is arbitrarily close to the area of the rectangle.

The Definition

Alright, we can finally define what it means to say that the diagrams approach a shape. We will classify each point of the unit cube as either included or excluded.

A point X0[0,1]3\mathbf{X}_0 \in [0, 1]^3 is included if for all ε>0\varepsilon > 0, it is arbitrarily likely that there is a point XD\mathbf{X} \in D such that XX0<ε|\mathbf{X} - \mathbf{X}_0| < \varepsilon.

On the other hand, a point X0[0,1]3\mathbf{X}_0 \in [0, 1]^3 is excluded if there is an ε>0\varepsilon > 0 such that it is arbitrarily unlikely that there is a point XD\mathbf{X} \in D such that XX0<ε|\mathbf{X} - \mathbf{X}_0| < \varepsilon.

Basically, an included point will pretty much always be extremely close to a point from the diagram. An excluded point will pretty much never be extremely close to a point from the diagram.

Finally, if every point in [0,1]3[0, 1]^3 is either included or excluded, then we say that the diagrams approach the set of included points.

Note that the definition starts with an “if”. Right now, we don’t know if bubble sort forms any shape at all by this definition, since we haven’t proven that every point of the unit square is either included or excluded. In fact, there are some algorithms which do not approach any shape as per this definition. For example, the shape of quicksort has different proportions each time, so most points are neither included nor excluded. But, as we’ll see, bubble sort does not have that problem.

Shearing the Diagram

Here’s a list of 10 natural numbers. Take a look at the way the items move over the first couple of iterations.

The first three states of a list of 10 integers

Notice how, although many items move left, nothing ever moves more than one space to the left in a single iteration. In this way, bubble sort almost behaves nicely. It would be great if items never moved to the left, because then we would only have one direction of movement to worry about.

Well, we can make that be the case! Just shift the entire list one space to the right after each iteration.

The same list as above, but shifted one space to the right after each iteration

Of course, this modifies the 3D diagram. But it’s just a shear transformation; each point (x,y,t)(x, y, t) moves to (x+t,y,t)(x + t, y, t). Since this mapping is continuous, it preserves included and excluded points. Therefore if we can find which points are included and excluded in this sheared diagram, we can just undo the shear, mapping the included and excluded points back onto the ordinary diagram.

Here’s the visual difference between the regular diagram and the sheared version.

A list at three different values of time, represented by an ordinary diagram and a sheared diagram

As you can see, the only difference is that the curved part sticks to the right side of the unit square now, rather than the left side. (We’ll ignore that the sorted section begins to escape from the square. Our focus here is the curved part.) This may seem like an unimportant change, but stick with me here.

Drawing Boxes

The secret to proving the bubble sort curve is to draw boxes over the 2D time-cross sections of the sheared diagram. Then, count the number of points in each box. We will call this number the box’s value. Finally, observe how the boxes’ values change as we run iterations. This diagram shows the change in value of three boxes over an iteration:

Three boxes displaying their values before and after an iteration

One box’s value increased by 11, one box’s value decreased by 11, and one box’s value stayed the same. It turns out these are the only possible ways the value can change in a single iteration. In fact, it can be proved that the change in a box’s value is dictated by three simple rules.

Take another look at the boxes above, and you can see that their values do indeed change according to these rules. This is why it’s so important to shear the diagram. If we drew boxes over the original diagram, we would not be able to find any simple rules (believe me, I’ve tried). The proof for these rules is not particularly glorious; it involves iterating through all the configurations of points at certain cardinal directions, and analyzing the behavior using theorems which didn’t make the cut for this blog post.

Now here’s what’s so great about these rules. We can partition the unit square into a grid of boxes and count the number of points in each box at the initial state of the list. Then we can forget about the points of the diagram, and instead just apply these box rules for each iteration! The values of the boxes will change exactly as they would if we were actually running bubble sort.

Here is a program which has the box rules programmed into it. Every “pixel” represents a box, and the pixel’s brightness corresponds to its value. Each pixel starts with the same value. It forms a perfect bubble sort curve!

Two Facts About Boxes

It turns out that we do not need an entire grid of boxes. We just need a couple specific types of boxes. First, boxes which touch the northwest corner of the unit square, which I will denote with BB'. Second, boxes in the southeast corner of these northwest boxes, which I will denote with BB.

A small box B which shares its southeast corner with a larger box B'. B' shares its northwest corner with the unit square.

Since it’s impossible for any points to be north, west, or northwest of BB', the box will lose one point each iteration until it is empty. Let’s let kk represent the value of BB' in the initial state of the list. Since we’re working with uniformly generated lists, it is arbitrarily likely that kN\frac kN becomes arbitrarily close to Area(B)\text{Area}(B').

Now, with the way we’ve normalized the diagrams, we will have performed II iterations at time t=INt = \frac IN. Therefore if Area(B)<t\text{Area}(B') < t, it is arbitrarily likely that kN<IN\frac kN < \frac IN. Multiplying both sides by NN, we can see that it is arbitrarily likely that I>kI > k. That is, the number of iterations we’ve performed is greater than the initial value of BB'. This gives us our first fact:

Another fact which is a little less obvious is that of all the points which leave BB', the ones in its southeast corner are the last to go. That is, BB is not empty until BB' is empty. So by following similar reasoning, we can arrive at the second fact:

Finding the shape of the sheared diagram

These two facts allow us to find the included and excluded points of the sheared diagram. Let’s start with the excluded points.

The Excluded Points

Let’s say that a point X0=(x0,y0,t0)\mathbf{X}_0 = (x_0, y_0, t_0) satisfies x0(1y0)<t0x_0(1 - y_0) < t_0. This puts it above the curve defined by x(1y)=t0x(1 - y) = t_0, as shown below.

A diagram representing the rectangle that contains an excluded point

We can make the box BB' just barely big enough to contain X0\mathbf{X}_0. This box’s width is slightly more than x0x_0 and its height is slightly more than 1y01 - y_0. Therefore its area is just a bit greater than x0(1y0)x_0(1 - y_0). But we have a bit of wiggle room to make it small enough that its area will remain smaller than t0t_0.

But then, by the first rule above, it is arbitrarily likely that BB' is empty before time t0t_0. Thus, there is some padding around X0\mathbf{X}_0 which is arbitrarily unlikely to contain any points from the diagram. Therefore X0\mathbf{X}_0 is excluded!

The Included Points

On the other hand, if x0(1y0)t0x_0(1 - y_0) \geq t_0, then X0\mathbf{X}_0 is below (or on) the curve, and the area of BB' is definitely greater than t0t_0.

A diagram representing the rectangles that contain an included point

But now we can add a box BB in the southeast corner of BB' which contains X0\mathbf{X_0}. The second fact about boxes tells us that it is arbitrarily likely that at time t0t_0, the box BB will not yet be empty. Therefore, we can adjust the dimensions of the boxes so that BB becomes arbitrarily small, and it will always be arbitrarily likely to contain at least one point from the diagram. Thus, it is arbitrarily likely that a point from the diagram is arbitrarily close to X0\mathbf{X}_0, so X0\mathbf{X}_0 is included!

Wrapping it Up

We’ve found that, after shearing the diagram, each point X=(x,y,t)\mathbf{X} = (x, y, t) in the unit cube is included if x(1y)tx(1 - y) \geq t, and excluded otherwise. So now we just have to undo the shear. Since the shear mapped (x,y,t)(x, y, t) to (x+t,y,t)(x + t, y, t), the inclusion and exclusion of points for the non-sheared diagram is really determined by whether

(x+t)(1y)t. (x + t)(1 - y) \geq t.

This can be rearranged into

yxx+t,y \leq \frac x{x + t},

which should look pretty familiar if you’ve seen my video. And that’s the gist of the proof!

Of course, this is only the curved portion of the diagram. The diagonal line still has to be dealt with. But that’s not very exciting, so I’ll leave that out. Once we work through all the details, we get as our final answer that the shape of bubble sort is

{(x,y,t)[0,1]3:yxx+t, x1t}{(x,x,t):x,t[0,1]}\left\{\left(x, y, t\right) \in [0, 1]^3 : y \leq \frac x{x + t},\ x \leq 1 - t\right\} \cup \{(x, x, t) : x, t \in [0, 1]\}

… which is a mess of symbols. But it’s just a symbolic representation of the shape.

Here’s something more visual – a 2D representation with a slider for time.

And for fun, here’s a 3D representation to go along with the 3D point diagram earlier.