the unreasonable effectiveness of dithering

A long time ago, a beloved college professor of mine opened his lecture with a joke:

Three scientists are working late in the lab one night, and they get into a debate: what’s the greatest invention of all time?

“The wheel,” says one. “It’s the basis of all other mechanical engineering.”

“The microchip,” says the second. “It enabled the Information Age and revolutionized life as we know it.”

“Antibiotics,” says the third. “Untold billions of lives saved.”

The three continue debating this for some time. A janitor happens by, and overhearing the chat, he pipes up with, “The thermos.”

“The thermos?” says the first scientist. “What’s so great about a thermos?”

“Well,” says the janitor, as if it were obvious, “a thermos keeps hot things hot and cold things cold.”

“So?”

“So?” says the janitor. “How does it always know?”

My professor, a diehard Gibsonian psychologist, was teeing up a point about how we can explain much of human perception without ever invoking a cryptic interior process like “knowing”. A thermos doesn’t have to know or decide anything about its contents; it just obeys the laws of physics, as all things must.

All of this is, of course, my way of saying I want to talk a little bit about dithering, the process of making a small color palette look like a much bigger one. Let’s start with some pictures (click/tap any of the images in this post to go to a zoomable file).

It’s 1990. VGA and its decadent 256 colors are the cutting edge, but there are still plenty of computers being used in 16, 4, or even 2-color modes. To display anything like a photograph on your vintage Macintosh, with its 2-color display, you’ve got to borrow a trick from the pointillists. The image above shows the following:

  1. My original grayscale photo of Washington Tower.

  2. The simple thresholding approach: just figure out whether each gray value is closer to black or white, and go with that (a one-liner, sketched in code just after this list). It’s okay for certain drawings, but bad for anything with shades of gray, especially photos.

  3. Bayer ordered dithering, common on PCs of the era.

  4. Atkinson dithering, of the kind you might have seen on a Macintosh in 1984. It’s my personal favorite.

  5. Floyd-Steinberg dithering, also often seen on PCs.

  6. A drip effect I created by messing with the dithering kernel. More on this below.
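
The thresholding step in #2 is about as simple as image processing gets. A minimal sketch in plain JavaScript (not my actual p5 code; `gray` is assumed to be a flat array of 0–255 values):

```js
// Snap each gray value (0-255) to whichever of black or white is closer.
function threshold(gray) {
  return gray.map(v => (v < 128 ? 0 : 255));
}
```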

The effect is striking, even today. It’s hard to believe that there aren’t any shades of gray in the dithered images, especially on a modern high DPI display, which is why I included detail insets of the tower’s central window. That dithering works so well is a testament to ingenuity; that dithering works at all is a testament to evolution. Dithering exploits the way human vision works in ways so fundamental that they’re easy to take for granted. For example, the visual system prioritizes the low frequency information (big objects) in a scene over the high frequency information (small details). We perceive the gist first, with the details coming in milliseconds later—we literally see the tower before the pixels.1

Dithering always seemed like a magic trick to me as a kid. “How does it always know?” The computer can’t display shades of gray, so how does it know that the patch of gray sky needs to become so many dots of black or white? How does it work out the transitions? The question felt especially baffling for Bayer dithering, which really looked to my teenaged eyes like someone had meticulously layered a cross stitch pattern onto a photograph.

If you want a detailed explanation of how dithering algorithms work, here’s a great one (and another and another). I’m less interested in the math and more interested in the perception, but here’s a brief rundown.

Ordered dithering (top-right in the tower images above) looks like someone laid a pattern over the image because that’s essentially what’s happening. A small grid of numbers—the dithering kernel—is tiled across the original image to tweak its values up or down, and then the image’s modified values are matched to the available color palette. The image at left shows ordered dithering of a grayscale gradient with a 2×2 kernel, a 4×4, and an 8×8. The larger the kernel, the more dithering patterns are possible, though in practice the 8×8 is as far as you need to go. The darkest part of the gradient stays solid black, because none of the kernel’s modifications are enough to push those pixels toward white. But as the original image gets lighter, the kernel tips more and more pixels toward white. Ordered dithering can be done in parallel, because each pixel of the image is modified independently by exactly one entry of the kernel (though of course, parallel processing was a distant dream in the dithering era).
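
Here’s a minimal sketch of the idea, assuming a row-major array of 0–255 gray values (plain JavaScript with illustrative names, not my p5 code; the 4×4 Bayer matrix itself is the standard one):

```js
// 4x4 Bayer matrix: entries give the order in which pixels tip to white.
const BAYER_4 = [
  [ 0,  8,  2, 10],
  [12,  4, 14,  6],
  [ 3, 11,  1,  9],
  [15,  7, 13,  5],
];

function orderedDither(gray, w, h) {
  const out = new Uint8Array(w * h);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      // Tile the kernel across the image and scale its entry to 0-255.
      const t = ((BAYER_4[y % 4][x % 4] + 0.5) / 16) * 255;
      out[y * w + x] = gray[y * w + x] > t ? 255 : 0;
    }
  }
  return out;
}
```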

The three other images of the tower are examples of diffusion dithering, which is a very different approach. Diffusion algorithms move across the image one pixel at a time, left to right, top to bottom.2 The current pixel is changed to its closest match in the available palette. Then the difference between the old and new colors—the quantization error—is distributed to the neighboring pixels according to the weights specified in the kernel. Once a pixel is matched to the palette, it is never touched again, but before that happens, an image pixel might get adjusted by the kernel multiple times. In ordered dithering, each pixel is modified independently, but in diffusion dithering the fate of every pixel depends, at least a little, on what happened to the pixels that came before it.
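
In sketch form, the diffusion loop looks something like this, with the kernel passed in as a list of [dx, dy, weight] offsets (again, illustrative names rather than my actual code; the default here is Floyd-Steinberg’s well-documented weights):

```js
const FLOYD_STEINBERG = [
  [ 1, 0, 7 / 16],
  [-1, 1, 3 / 16],
  [ 0, 1, 5 / 16],
  [ 1, 1, 1 / 16],
];

function diffusionDither(gray, w, h, kernel = FLOYD_STEINBERG) {
  const buf = Float32Array.from(gray); // working copy; error accumulates here
  const out = new Uint8Array(w * h);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const old = buf[y * w + x];
      const snapped = old < 128 ? 0 : 255; // nearest of the 2-color palette
      out[y * w + x] = snapped;
      const err = old - snapped;           // quantization error
      for (const [dx, dy, wt] of kernel) {
        const nx = x + dx;
        const ny = y + dy;
        if (nx >= 0 && nx < w && ny >= 0 && ny < h) {
          buf[ny * w + nx] += err * wt;    // push the error downstream
        }
      }
    }
  }
  return out;
}
```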

The Atkinson and Floyd-Steinberg dithering techniques are the same diffusion approach with different kernels. Both have carefully chosen values such that, given a field of 50% gray, they’ll produce a checkerboard of black and white. That’s a good test for a kernel, but by no means the only criterion, as the image below shows (the kernels themselves are written out in code just after the list).

  1. The original image in 256 shades of gray. The inset shows what the kernel does to a 50% gray image at double magnification. Since there’s no kernel here, it’s just a gray square. Thrilling.

  2. Atkinson dithering. You might have noticed that Atkinson dithering creates a higher contrast image compared to the other algorithms. That’s because the Atkinson kernel only compensates for 75% of the quantization error, discarding the rest. Thus, bright areas tend to stay a little brighter, dark areas stay a little darker, and interestingly, the kernel’s effect on a 50% gray field produces a chunkier checkerboard than the other kernels.

  3. Floyd-Steinberg dithering. Applying this kernel to the 50% gray image produces a perfect checkerboard of alternating black and white pixels. The weights in the kernel add up to 1, so changes in brightness are fully compensated, though the weights themselves seem kind of arbitrary.

  4. A custom kernel that pushes the quantization error to the pixel immediately to the right. I’m compensating for all the quantization error, but my kernel is too simple, and the artifacts from it are obvious and ugly.

  5. A custom kernel that pushes the error to the pixel immediately below. Same idea as #4, just in a different direction. Same ugliness, although arguably worse for this image in particular, as you really lose definition in the staircase. My simple kernel is imposing a little too much of its own structure on the image, and its strong vertical component cancels out the horizontals of the steps.

  6. A custom kernel that distributes the quantization error evenly to three pixels surrounding the current one. It almost works. It even produces a checkerboard on the 50% gray image, just like Floyd-Steinberg. But it still doesn’t look great. The resulting image looks over-sharpened, and the sky has an odd stucco-like texture to it.
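
For concreteness, here are those kernels in the same [dx, dy, weight] format as the sketch above. Atkinson’s weights are well documented; the two single-pixel kernels follow directly from the descriptions, and the arrangement shown for the three-way split is illustrative:

```js
// The same diffusion loop, different kernels.
const ATKINSON = [
  // Six neighbors at 1/8 each: only 6/8 = 75% of the error is passed on.
  [ 1, 0, 1 / 8], [ 2, 0, 1 / 8],
  [-1, 1, 1 / 8], [ 0, 1, 1 / 8], [ 1, 1, 1 / 8],
  [ 0, 2, 1 / 8],
];
const PUSH_RIGHT = [[1, 0, 1]]; // kernel #4: all the error goes right
const PUSH_DOWN  = [[0, 1, 1]]; // kernel #5: all the error goes down
const THREE_WAY  = [            // kernel #6: an even three-way split
  [1, 0, 1 / 3], [0, 1, 1 / 3], [1, 1, 1 / 3],
];
```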

Human perception is a messy thing, cobbled together over millions of years of evolution, and evolution very rarely lines up with mathematical elegance. Just as there’s no way to divide a musical scale into mathematically tidy ratios that all sound good together, there’s no obvious recipe for a good dithering kernel. You just have to feel it out and make some educated guesses. My custom “drip” kernel (bottom-right in the first image) takes badness to the extreme. It uses a single negative weight, so rather than diffusing the error among the neighboring pixels, it passes the inverted error to the one pixel directly below the current one.
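
In the same format, the drip kernel amounts to something like this (the -1 shown here is illustrative; the exact weight is a knob worth playing with):

```js
// A "drip" kernel: a single negative weight pointed straight down, so each
// pixel's inverted error feeds the pixel below it and smears vertically.
const DRIP = [[0, 1, -1]];
```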

All this, and I haven’t even talked about color.

  1. My original color photograph of Washington Tower. It uses 61,607 unique colors in the modern sRGB color gamut (out of 16.8 million possible colors), or what we used to call “True Color”.

  2. An Atkinson dithered version in the default VGA 256-color palette. This image uses 166 unique colors, less than 1% of the original’s. Yet the only obvious differences are a bit of noise in the sky and some subtle color changes.

  3. An Atkinson dithered version in the EGA 16-color palette, though this image uses only 14 of them. I feel like I’m looking at an image out of Encarta.

  4. An Atkinson dithered version in the CGA cyan/magenta 4-color palette.

  5. An Atkinson dithered version in the CGA red/green 4-color palette.

  6. An Atkinson dithered version in the CGA cyan/red 4-color palette.

The cyan/red image looks incredible to me. Come on, tell me that’s not a work of art. It reminds me of the kinds of photos you get from infrared filters. Of course, the original CGA spec had a maximum resolution of 320×200 in 4-color mode, so it never would have been able to display an image this big, but we can dream.

These images also give us a sense of why VGA was such a big deal. Look at how close it is to the modern sRGB image, despite using just a fraction of the colors. VGA had a color gamut of 262,144 (64³) colors, but could only display 256 colors on the screen at a time. Its default palette was carefully chosen to represent the color spectrum at perceptually equal intervals, which of course does not mean the colors are spaced at numerically equal intervals. You can’t just write a few nested loops to generate the period-accurate 256 colors, though LLMs will try to tell you otherwise. The only way to replicate the palette accurately is to create your own lookup table.
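
Once you have that table, the matching step itself is easy. A sketch using plain squared distance in RGB space (fancier perceptual distance measures exist; `palette` is assumed to be an array of [r, g, b] triples, such as a hand-built VGA table):

```js
// Find the index of the palette color nearest to (r, g, b).
function nearestColor(r, g, b, palette) {
  let best = 0;
  let bestDist = Infinity;
  for (let i = 0; i < palette.length; i++) {
    const [pr, pg, pb] = palette[i];
    const d = (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2;
    if (d < bestDist) {
      bestDist = d;
      best = i;
    }
  }
  return best;
}
```

Used on its own, with no error diffusion at all, this nearest-color match is exactly the “best-matching color without dithering” comparison coming up below.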

Looking at the insets of the CGA images, you’ll notice that the tower is made mostly of colored pixels, despite two of the three CGA palettes having access to black and white. It’s not what I expected, and certainly not what I would have done if I were dithering by hand, but it’s what the algorithm dictates. Weirdest of all, it works, which is yet another example of how effective these algorithms are. The tower undeniably looks gray in contrast to the sky and the ground. And the red/green palette doesn’t even have a pure white to work with! The closest it has is a pure yellow. Rather than ruining the picture, this gives the whole image a yellowish cast. There are a couple of perceptual processes in play here: color constancy (we perceive colors stably despite changes in ambient illumination) and simultaneous color contrast (the perception of a color is affected by what’s around it).

How effective are the dithering algorithms at stretching those meager palettes? Here’s what you’d get if you just swapped in the best-matching color without dithering:

Funniest of all to me is that in the VGA-256 version, the palette’s generous 16 shades of gray render the tower quite realistically against a flat, badly reproduced sky. It’s amazing how much a little dithering will get you.

I’d like to talk about one more important perceptual phenomenon, but to do that I’ll need a different image. Here, look at these pumpkins:

From left to right: original, VGA-256, and EGA-16. The insets zoom in on the light pink pumpkin near the bottom of the image. In the dithered versions, the brightest part of the pumpkin is actually just pure white. Yet without the zoom (and even with it, honestly) those white pixels still look kind of pink. This is an example of neon color spreading, the phenomenon in which bright colors seem to leak into the surrounding area. Yet another perceptual process that dithering exploits, or maybe just reveals.

Depending on the image and the palette, sometimes you get unexpected effects. For example, here are two versions of Washington Tower, both dithered with the same algorithm and the same 2-color palette of dark blue (#2200aa) and pure yellow (#ffff00). The center image is based on the original color image, while the image on the right is based on the grayscale image. The grayscale dither looks a lot like the black and white versions up top, not much more to say there. But this blue-yellow palette is a bad match to the original color version and forces the algorithm into creating some odd effects. Notice how much darker the sky is, because although dark blue is pretty far off from the sky’s actual light blue color, it’s still a closer match in color space than pure yellow. Dithering darkens the sky and keeps accumulating error until the algorithm hits the edge of the tower, where there’s a very slight fringe of brighter pixels thanks to camera and compression artifacts. By this point, so much quantization error has accumulated that the algorithm dumps it all into those fringe pixels, creating a striking edge highlight. Lastly, notice the orange tree on the right side of the frame. The color-based dither makes it look much brighter, partly due to simultaneous contrast (because it’s surrounded by darker pixels), and partly because the original orange leaves are a closer match to pure yellow than the grayscale version.

And of course, once you’ve dithered an image down to a few colors, palette swaps are easy.
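
Once the image is stored as palette indices, a swap is just a new lookup table. A sketch (`bits` below is a hypothetical array of 0s and 1s from a 2-color dither):

```js
// Remap each pixel's palette index to a color from a new table.
// Works whether `indices` is a plain or typed array.
function swapPalette(indices, newPalette) {
  return Array.from(indices, i => newPalette[i]);
}

// e.g. recolor a 2-color dither into the blue/yellow pair from earlier:
const dusk = [[0x22, 0x00, 0xaa], [0xff, 0xff, 0x00]];
const recolored = swapPalette(bits, dusk);
```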

I’ll bet you thought I wasn’t going to get any more mileage out of that “drip” kernel.

Programming these various algorithms (in P5) was a great learning exercise, and as a former vision scientist, I of course find it fascinating to see the ways in which artistic techniques intuitively exploit the facts of our perceptual processes. For more on that, check out this talk on the intersection of art and vision science by the great Marge Livingstone. The heyday of dithering may be behind us, but I think it’s overdue for a comeback. I’d go so far as to say it’s an unreasonably effective technique. Born of a need to stretch the limits of early graphics hardware, dithering manages to transcend its practical origins and become an art form all its own.

  1. A study of experts indicates that experienced radiologists can guess whether a scan has an abnormality given as little as 250 milliseconds to look at it. Their accuracy isn’t fantastic, mind you, and you’d definitely want them to take a longer look, but it’s without a doubt above random chance. 

  2. Alternatively, diffusion algorithms can implement boustrophedon ordering: going left-to-right on the odd rows and right-to-left on the even rows (flipping the kernel accordingly, of course).