Computers and the Internet
Images are all around us, from application icons to animated GIFs to photos. Image files can take up a lot of space, so computers employ a range of algorithms to compress image files.
For the simplest of images, computers can use a compression algorithm called run-length encoding (RLE).
Before we explore image compression, let's see how we can represent an image in binary without any compression.
Here's a simple image, a 16x16 heart icon:
Let's zoom in and overlay a grid on top, so that it's easy to see exactly which pixels are red and which pixels are white:
The heart icon is made up of only two colors, red and white, so a computer could represent it in binary by mapping red pixels to
and white pixels to . This is called a bitmap, since it's mapping pixels to bits.
Using this method, the heart icon would be represented like so:
0001100000110000 0011110001111000 0111111011111100 1111111111111110 1111111111111110 1111111111111110 1111111111111110 1111111111111110 0111111111111100 0011111111111000 0001111111110000 0000111111100000 0000011111000000 0000001110000000 0000000100000000
Imagine that you had to read the bits above out to someone who was copying them down. After a while, you might say things like "five zeroes" instead of "zero zero zero zero zero". Well, the computer can do that too...
RLE compression algorithm
In run-length encoding, the computer replaces each row with numbers that say how many consecutive pixels are the same color, always starting with the number of white pixels.
For example, the first row contains 3 white pixels, 2 red pixels, 5 white pixels, 2 red pixels, then 4 white pixels:
This would be represented as follows:
The fourth row is interesting because it starts with a red pixel. Run-length encodings start with the number of white pixels, so this is how it'd be represented:
Check your understanding
Here's the second row of the bitmap for the heart icon:
How would that row be represented in RLE?
When a computer uses run-length encoding, it should be able to perfectly recreate the image from the compressed representation—and so should we, if we follow the computer's strategy.
Let's try it. Here's a representation of a black & white icon using RLE:
4, 9, 3 4, 7, 2, 1, 2 4, 7, 2, 1, 2 4, 9, 3 4, 7, 5 4, 7, 5 5, 5, 6 0, 15, 1 1, 13, 2
The first row has 4 white pixels, then 9 black pixels, then 3 white pixels. That looks like:
The next row has 4 white pixels, then 7 black, 2 white, 1 black, and 2 white. That looks like:
When we keep going, the final icon is a cup and saucer:
Check your understanding
The following is a compression of a 6x6 black and white icon, using RLE:
2,2,2 2,2,2 0,6 0,6 2,2,2 2,2,2
What mathematical symbol does that icon resemble?
We've claimed that run-length encoding can save us space when storing simple images—but how much space?
To find out, I wrote a program to encode black & white bitmaps with RLE. This table summarizes the results on three heart icons of increasing size:
|Image||Dimensions||Uncompressed||After RLE||Space savings|
Take a look at that final column, the space savings. Notice a pattern? We save much more space as the size increases since the runs are much longer.
What about images of the same size? This table summarizes the results of compressing three large icons with RLE:
|Image||Dimensions||Uncompressed||After RLE||Space savings|
Now you can see why I picked a heart icon as my example: it compresses very well, thanks to its many runs of black or white. RLE compression still halves the size of the other icons, but it doesn't save nearly as much space.
In fact, sometimes RLE can't save any space at all...
Limits of RLE
How about this 16x16 icon?
Let's zoom in, so we can visualize each pixel:
Every pixel in that icon is a different color, and there are no runs.
Run-length-encoding can't compress an image like that at all. That's an example that I made up just for this article (generated by this program), so it might not seem that common.
As it turns out, photographic images are similar to that icon—the real world is filled with details that interrupt runs.
The Khan Academy team page includes this lovely photo of a dog looking at a computer screen:
At normal resolution, it looks like there are blocks of similar color, like in the dog's fur or the grey in the computer screen.
Let's zoom in to the pixels:
Now you can see that even the seemingly simple computer screen is a vast array of similar-but-not-quite-the-same colors. A run-length-encoding of the pixels won't do much to reduce the file size.
Uses for RLE
RLE compression was a very popular technique when most computer images were icons with limited color palettes.
These days, our images are more complex and don't contain as many runs of the same color.
Where is RLE used today? Fax machines still use RLE for compressing the faxed documents, since they only need to represent black and white letters. JPEG images do use RLE during a final stage of compression, but they first use a more complex algorithm to compress the photographic details. We'll explore that more soon.
Want to join the conversation?
- Sooo...unless two or more consecutive pixels have the same exact colour...they cannot be compressed?(16 votes)
- Specifically with Run length Encoding compression algorithms, that is true. The article says that other compression methods exist.(38 votes)
- When RLE is used to replace 00001111 with (4,4), the "4"s still need to be converted to binary for storage, right? So in that case wouldn't you need to use 2 bytes of storage to represent the two "4"s (00000100,00000100)? I guess I'm not seeing where the storage space savings come from when the computer has to store the replacement symbol as binary code.(17 votes)
- That's a really good question.
Those are conceptual examples meant to show the general idea of run length encoding, in a way that makes sense to a human. Your computer needs more than 1 bit to represent a pixel and that's when the savings come in.(20 votes)
- Why are there 16 rows in the heart grid but only 15 bitmaps?(14 votes)
- If the average row is less than 8 pixels, how can it be compressed using numbers? Using the example in your article 0011110001111000 is 2,4,3,4,3. If any number/letter takes 8 bits, why wouldn't the compressed picture be 40 bits vs 16 bit for the uncompressed picture?(7 votes)
- The both of you are correct. The smaller the file is the less efficient the compression becomes. IF you look at the table with the compression ratio, you see that the compression ratio for a 16x16 bit image is just a little bit over 10%. At 8x8 bit you'd probably need more storage for the compressed image.
The same problem applies if the picture is complex, the more complex the image is the less efficient the RLE compression becomes. Or as you've figured out it might actually increase the size of the file.(16 votes)
- When you have more than 2 colors, and therefore need more than 1 binary digit to represent colors, do computers use hexadecimal numbers like (FF,FF,FF) as mentioned earlier in the course?(4 votes)
- seeing RLE numbers, how to know that pixel is white or black? no any hint for it(2 votes)
- If two or more of the color is, so should be how to compress?(2 votes)
- Say Red = R, Blue = Blue, Green = G and you have the picture
where every color stands for a pixel of that color, then you could encode it to
- How is it possible that objects of the same dimensions have different values after compression by RLE and different space savings. I don't quite understand it .(2 votes)
- RLE compresses consecutive pixels of the same color, meaning that if same-colored pixels are repeated after one another, they can be compressed into a single number.
In the red heart bitmap, rows 1-4 can be compressed to the following:
Row 1 --> 3, 2, 5, 2, 4
Row 2 --> 2, 4, 3, 4, 3
Row 3 --> 1, 6, 1, 6, 2
Row 4 --> 15, 1
Where the first column represents the number of repeated white pixels, the second column represents repeated red pixels, the third column white, and so on.
Now to answer your question:
Some images can be compressed more than other images of the same size depending on how much of the same pixels are repeated in a row.
For example the 128x128 heart image can be compressed from 16,384 pixels to 2,898 number values, while a 128x128 checker board wouldn't be compressed at all because none of the same pixels are repeated consecutively.
I hope this helped! :-)(3 votes)
- How would you identify more than two colors?(1 vote)
- The computer would probably have a map that shows all of the different colors and they order they appear in the compressed file.(4 votes)