The Density of Bubble Sort Plots

In my previous post, I shared a precise definition of what it means for a sorting algorithm’s scatter plot to approach a shape. I then showed that, as per this definition, bubble sort really does approach the curve derived in my video. Loosely speaking, the shape of a sorting algorithm was defined to be the set of points that is almost always “filled up” by the points in the algorithm’s scatter plots. But this definition ignores an interesting detail: how densely are different parts of the shape filled?

In this post, we’ll answer that question for bubble sort. We’ll be referring to the topics discussed in the other post, so you might want to read it if it feels like I pull something out of thin air.

What am I talking about?

Here’s a scatter plot of bubble sort, with the points categorized into three colors.

A bubble sort scatter plot, color-coded by point density

Clearly, the sorted section (green) is the most dense; the points are so tightly packed that it looks like a solid line. On the other hand, the interior of the unsorted section (blue) is the least dense. But the interesting part is the curve (red). The red points are way more densely packed than the blue points, but by how much exactly? How can we measure this?

Well to be honest, I don’t know the standard mathematical formulation of this question. I only know the most basic rudiments of probability theory, and the tools to tackle this problem are beyond any class I’ve taken. But if you’ll bear with me as I reinvent the wheel, I think you’ll agree that my answer is reasonable enough.

Unlike in my previous post, I think it is natural to consider the diagrams to be 2D when talking about the point density. We’ll consider the diagrams to have a width and height of 1; they will be contained in the unit square [0,1]2[0, 1]^2. We will also let t[0,1]t \in [0, 1] represent the progress of the algorithm.

Defining Density

The idea is not too complex. Draw a little disc over the diagram. Find the proportion of the diagram’s points which are inside the disc. Divide this by the disc’s area to get its density. Then, by finding the densities of smaller and smaller discs, we can zero in on the density at a specific point.

To be more specific, we can let rr be the radius of the disc, let NN be the number of point in the diagram, and let kk be the number of diagram points which are inside the disc. The density is then

kπr2N.\frac k{\pi r^2 N}.

In reality, kk would be a random variable and we would need to take some kind of probabilistic limit as NN \to \infty. But I don’t want to get too complicated, so let’s not worry about the technicalities.

This definition actually has a big problem: Some parts of bubble sort’s diagrams will be infinitely dense! This is because we’re assuming that the points are spread out over a 2D region, like they are in the blue section of the diagram above. But the red and green points are tightly compressed into a 1D curve. Dividing by the disc’s area only works when k/Nk/N is proportional to its area.

But in these one-dimensional sections, we’ll find that k/Nk/N is proportional to the disc’s diameter. So the fix is simple: for the red and green points, use the disc’s diameter instead of its area. Then the density will be

k2rN.\frac k{2rN}.

For each part of the diagram, we’ll use the appropriate definition of density, keeping in mind that every diameter-based density is “infinitely greater” than every area-based density.

The Easy Parts: Green and Blue

Let’s start with the sorted section (the green points). Since this is clearly a one-dimensional line, we will use the one-dimensional version of density. So, picking some point on the line, draw a disc of radius rr. For a list of NN items, the horizontal distance between the points is 1/N1/N. Since the points are all lined up at a 45°45\degree angle, they only intersect the disc over a width of rsin(45°)=r2r\cdot\sin(45\degree) = r\sqrt2.

A disc drawn over the sorted part of the diagram

Therefore kk, the number of items in the disc, will be about Nr2Nr\sqrt2. So the density is

k2rN=rN22rN=22.\frac k{2rN} = \frac{rN\sqrt2}{2rN} = \frac{\sqrt2}2.

Since this doesn’t depend on rr, it will remain unchanged as r0r \to 0, so 2/2\sqrt2/2 is our final answer for the density of the sorted section.

The blue points are also pretty easy. These are the points which are lower than some point before them. I won’t go into detail here, but if you’re familiar with bubble sort, you can reason that these points move precisely one space to the left each iteration as a higher point passes over them. But since these blue points all move one space to the left — no more, no less — they don’t move relative to each other. This means that the density of the blue section will not change from what it was in the initial state of the list!

An important fact from my previous post is that, in the initial state of the list, the points are spread uniformly over the unit square. That is, if you draw some shape in the unit square, then the proportion of the diagrams points which are inside the shape will converge to the shape’s area as you consider larger and larger lists. So, letting the shape be our circle of radius rr, we find that the density is

kN1πr2=πr21πr2=1.\frac kN \cdot \frac1{\pi r^2} = \pi r^2 \cdot \frac1{\pi r^2} = 1.

(Remember that for the blue part we used the two-dimensional density, but for the green part we used the one-dimensional density. Even though the number we got for the blue part, 11, is greater than the number we got for the green part, 2/2\sqrt2/2, the green part is still infinitely more dense than the blue part.)

The Fun Part: The Red Curve

Here’s where we really start needing some ideas from my previous post. We’re going to shift the diagram to the right by the amount of time that has passed. That is, we’ll move every point (x,y)(x, y) to (x+t,y)(x + t, y). This makes it so that no points ever move left as the algorithm runs. But more importantly, it allows us to draw boxes over the diagram, and the numbers of items in the boxes will change according to some simple rules. These rules are listed in here, but for our purposes we’ll only need one consequence of them:

No points can leave a box unless there are no points above it.

Two boxes, B and C, where C is directly above B

In other words, if we have two boxes BB and CC like this, no points will be able to leave BB until CC is completely empty.

With this in mind, let’s draw some boxes over the curve:

Three boxes aligned over the curve

I chose these boxes specifically so that two corners of box BB lie on the curve. Let’s think about this situation for a minute. Since the curved region shrinks toward the bottom right corner as tt increases, it was passing through CC until this exact moment. This means that, up until now, CC was not empty, so no points could have left BB.

Also, AA is empty at this point in time. Since the points only move laterally, all of the points from AA must have moved into BB. But they can’t have left BB because CCs former non-emptiness disallowed it. So all the points which started in either AA or BB have been crammed into BB.

Now because of the initial uniformity of the points, the proportions of the points which started in AA and BB approach the respective areas of AA and BB. Since all these points have moved into BB, we have that

Number of points in BN=Area(A)+Area(B).\frac{\text{Number of points in $B$}}N = \text{Area}(A) + \text{Area(B)}.

With this fact under our belt, we can make BB smaller and smaller, which allows us to zero in on the point density of the curve.

Getting Differential

We’ll pick some point (x,y)(x, y) on the curve, and draw a disc with tiny radius rr centered there. We can then construct the boxes AA, BB, and CC as above, with BB inscribed in the disc.

A small disc centered on the curve, with a zoomed view showing the alignment of the three boxes

Now, let’s give names to some of the dimensions in this diagram. We’ll call BB’s width dxdx, we’ll call its height dydy, and we’ll its diagonal dsds.

The rectangle B with sides labeled dx and dy, and diagonal labeled ds

Since the red diagonal line is just a really zoomed-in picture of the curve, these values are the curve’s differentials. This red segment is also a diameter of the disc, so

2r=ds=dx2+dy2.2r = ds = \sqrt{dx^2 + dy^2}.

Now, we can start thinking about the density. Let’s again let kk represent the number of points in the disc. We can see that

kN=Number of points in BN+O(r2).\frac kN = \frac{\text{Number of points in $B$}}N + O(r^2).

The O(r2)O(r^2) comes from the points that are in the disc, but not in BB. These points are all blue, so we already saw that their contribution is proportional to r2r^2, evident by the fact that the area-based density definition gave a finite value. These points will turn out to be negligible when we use the diameter-based definition of density, so we can just use big O notation and not worry about the specifics.

Applying the result of the previous section, we have

kN=Area(A)+Area(B)+O(r2).\frac kN = \text{Area}(A) + \text{Area}(B) + O(r^2).

But what are these areas? Well, both boxes have a height of dydy, the width of AA is xdx/2x - dx/2, and the width of BB is dxdx. So, keeping in mind that both dxdx and dydy are O(r)O(r), we have

Area(A)+Area(B)=(xdx2)dy+dxdy=xdy+12dxdy=xdy+O(r2).\begin{aligned} \text{Area}(A) + \text{Area}(B) &= \left(x - \frac{dx}2\right)dy + dx dy\\ &= xdy + \frac12 dx dy\\ &= xdy + O(r^2). \end{aligned}

This means that the density of the disc is

k2rN=xdy+O(r2)2r=xdyds+O(r).\frac k{2rN} = \frac{xdy + O(r^2)}{2r} = x\frac{dy}{ds} + O(r).

As r0r \to 0, the O(r)O(r) part will vanish, so we can just forget about it. Now we just need to find a more explicit expression for

xdyds.x \frac{dy}{ds}.

Solving the Equation

As shown in my previous post, the curve is given by the equation

x(1y)=t.x(1 - y) = t.

Fixing tt, we can differentiate using the product rule:

(1y)dxxdy=0.(1 - y)dx - xdy = 0.

We can rearrange this to get

dxdy=x1y.\frac {dx} {dy} = \frac x {1 - y}.

Now we can calculate the density:

xdyds=xdydx2+dy2=x(dxdy)2+1=xx2(1y)2+1=x(1y)x2+(1y)2=tx2+(1y)2.\begin{aligned} x \frac{dy}{ds} &= \frac{xdy}{\sqrt{dx^2 + dy^2}}\\[3ex] &= \frac x {\displaystyle \sqrt{\left(\frac{dx}{dy}\right)^2 + 1}}\\[7ex] &= \frac x {\displaystyle \sqrt{\frac{x^2}{(1 - y)^2} + 1}}\\[7ex] &= \frac {x(1 - y)} {\sqrt{x^2 + (1 - y)^2}}\\[3ex] &= \frac t {\sqrt{x^2 + (1 - y)^2}}. \end{aligned}

Don’t forget that we had shifted the diagram by (x,y)(x+t,y)(x, y) \mapsto (x + t, y). So taking that into consideration, we can see that the density of a point on the original curve is given by

t(x+t)2+(1y)2.\frac t {\sqrt{(x + t)^2 + (1 - y)^2}}.

And that’s it!

The left image below is a list of 25, ⁣60025,\!600 items at t=0.15t = 0.15. The darkness of each pixel is proportional to the number of points within a small distance of it. On the right is the theoretical curve, whose darkness set by the formula we just found.

A partially sorted list of 25600 items, with darkness representing density The ideal curve, with darkness set by the theoretical density formula

These images don’t have a lot of contrast, so it’s not super easy to compare them. But the fact that they appear to have similar gradients gives me some confidence that the solution is correct.

That’s All, Folks

I originally thought that this would be a really hard problem to solve, but surprisingly it wasn’t too bad (…for me at least. Whether my rambling is comprehensible to anyone else is a different story.) I do wish I knew the proper way to frame this question, and it’s still a bit strange that I needed two different definitions of density. If anyone knows more about this, feel free to reach out!

Anyway, that’s all for now. Hope this was interesting!