
Try parallel computing yourself

Now that we've discussed parallel computing in theory, let's actually see it in action. Normally, when an engineer wants to run a parallel computing solution, they will use dedicated high-performance computers.
However, thanks to modern web technology, we can also do parallel computing in our browser. That's right, you can watch tasks run in parallel from the comfort of your own home or classroom. Ready?
Visit the link below to try a parallelized cat detection program:
👉🏽 Cat detection

Configuring the program

The goal of a parallel computing solution is to improve efficiency. To see what affects that efficiency, it's helpful to have parameters that we can change while observing the effects.
This program provides two parameters:
  • Number of worker threads: To execute tasks in parallel, this program uses a browser technology called web workers. The webpage detects how many threads
    your computer can run concurrently, based on what your hardware reports, and suggests using that many workers. However, it also lets you try fewer workers so that you can see the effect on the speedup.
  • Number of images: Generally, there's a bigger benefit from parallel processing on larger data sets, so the program defaults to processing the max number of images. If you'd like, you can ask it to process fewer images and observe the difference in performance.
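Here's a minimal sketch of how a program might pick those two parameters. It's written in Python rather than the browser JavaScript the webpage uses, and the function names and the clamping rules are my own assumptions, not the cat detector's actual code. `os.cpu_count()` plays the role of "what your hardware reports".

```python
import os


def choose_worker_count(requested=None):
    """Return a worker count between 1 and the reported logical CPU count."""
    suggested = os.cpu_count() or 1  # cpu_count() can return None
    if requested is None:
        return suggested  # default: use everything the hardware reports
    return max(1, min(requested, suggested))


def choose_image_count(requested, max_images):
    """Return an image count between 1 and the maximum available."""
    return max(1, min(requested, max_images))


print("suggested workers:", choose_worker_count())
print("images to process:", choose_image_count(10, max_images=44))
```

Clamping keeps a user from asking for more workers than the hardware reports, which mirrors the "try fewer workers" option described above.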

Monitoring the execution

Watching a parallel program execute is like watching a relay race. How long will the program take? Which worker will complete the most work the fastest? It's very exciting.
You can watch the workers progress in the chart on the webpage. The program starts off with a short setup, a sequential portion of initializing the images array and queuing up the tasks. Then the workers are off to the races!
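The pattern described here, a sequential setup that queues the tasks followed by workers pulling from the queue, can be sketched in Python. This is an illustration under my own assumptions (the names `run_workers` and `process` are hypothetical), not the webpage's code. Also note that threads in standard CPython share one interpreter lock, so this shows how the work gets distributed rather than a real speedup.

```python
import queue
import threading


def run_workers(tasks, num_workers, process):
    """Queue all tasks, then let each worker pull tasks until none remain."""
    task_queue = queue.Queue()
    for task in tasks:  # sequential setup: queue up the tasks
        task_queue.put(task)

    results = []
    completed = [0] * num_workers  # tasks finished per worker
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            result = process(task)
            with lock:  # protect the shared lists
                results.append(result)
                completed[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, completed


results, completed = run_workers(range(12), 4, lambda n: n * n)
print("tasks completed by each worker:", completed)
```

Because workers grab the next task whenever they're free, a faster worker naturally ends up completing more tasks, which is what makes the chart fun to watch.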
Animated GIF that displays progress of 4 web workers processing multiple images.
On many computers, you can also monitor your CPU activity at the same time so that you can see how your CPU is being utilized and how the work is spread across the cores of your CPU.
Here's what my laptop reports when the program runs with four workers:
Two side-by-side screenshots. The first screenshot is titled "CPU History" and shows four rows of bar graphs, where the graph starts off only taking around 30% of the area and then takes 100% of the area. The second screenshot is titled "Activity Monitor" and displays a list of processes with columns for "% CPU", "CPU Time", and "Threads". The first process is "Google Chrome Helper" and shows a "% CPU" of 294.6%.
Once the workers start going, the CPU history shows that usage shoots up to 100% across the 4 cores. The activity monitor shows that Chrome processes are using more than 320% of the CPU (each core has its own 100%) and system processes are using the rest.
When I'm just using my laptop to write an article, the activity monitor typically reports that most of the CPU is not being utilized. This parallelized program is definitely putting it to work.

Calculating speedup

Exactly how much more efficient is this program when it's run in parallel? Let's find out by calculating the speedup: the ratio of the time taken to run the program sequentially to the time taken to run the parallelized program. Since we have the option to try the program with varying numbers of parallel workers (as many as our hardware allows), we can calculate the speedup for each number of workers.
First, we run the program with the maximum number of images for each number of workers and record the duration each time.
Here are four runs from my laptop:
Workers | Duration (seconds)
Running the program sequentially is basically the same as running the program with a single worker, so we can calculate the speedup by dividing the first duration by each of the other durations.
Workers | Duration (seconds) | Speedup
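The division described above can be sketched in a few lines of Python. The durations in the example are made-up placeholders, not the measurements from my laptop, and the function name `speedups` is just an illustrative choice.

```python
def speedups(durations):
    """Map each worker count to its speedup relative to the 1-worker run.

    durations: dict of worker count -> measured duration in seconds.
    """
    baseline = durations[1]  # the 1-worker run stands in for the sequential run
    return {workers: baseline / seconds
            for workers, seconds in durations.items()}


# Hypothetical durations, for illustration only
example = {1: 40.0, 2: 25.0, 3: 20.0, 4: 20.0}
for workers, speedup in speedups(example).items():
    print(f"{workers} worker(s): {speedup:.2f}x")
```

To use it with your own results, replace the placeholder dictionary with the durations you recorded.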
We can also graph the speedup to visualize how it changes as the number of workers increases:
A graph with number of workers on the x-axis and speedup on the y-axis. Four points are plotted: [1, 1], [2, 1.64], [3, 1.87], [4, 1.95].
🔍 Try this from the computer you're using now. How do the results compare? If there are big differences, what do you think is responsible for those differences?

Factors that affect performance

My computer got close to a 2x speedup but nowhere near a 4x speedup, which is what we might have expected with 4 workers. Why not?
There are many factors that can affect the amount of time the computer takes to complete the program:


Even though my computer reports that it can run four threads concurrently, I discovered that my CPU only has two cores:
Screenshot of Apple system information screen with title of "Hardware Overview" and the following table:
Model Name: MacBook Pro
Model Identifier: MacBookPro14,2
Processor Name: Intel Core i5
Processor Speed: 3.1 GHz
Number of Processors: 1
Total Number of Cores: 2
Hardware details from my Apple laptop system overview
Those two cores use a technology called hyperthreading, however. Intel invented hyperthreading to enable a single CPU core to run two threads concurrently. Since Intel is a very popular manufacturer of CPUs, many personal computers now come with hyperthreaded CPUs.
Hyperthreading works well when two threads are doing different kinds of computation. For example, one task could be doing arithmetic operations while the other task is processing input. Those two tasks are utilizing different parts of the CPU and can be sped up by hyperthreading. However, if two tasks are running the same kind of instructions, they compete for the same parts of the core, so hyperthreading provides little or no speedup.
The fact that my laptop has only two (hyperthreaded) physical cores is the most likely explanation for why the speedup approaches two but never gets close to four.
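One rough way to check that explanation is to divide each speedup from the graph by the number of workers, and then divide the best speedup by the number of physical cores instead. "Efficiency" here is a common name for that ratio, not a term this article defines, and the helper below is just a sketch.

```python
def efficiency(speedup, units):
    """Speedup divided by the number of workers (or cores) behind it."""
    return speedup / units


# Speedups read off the graph earlier in the article
measured = {1: 1.0, 2: 1.64, 3: 1.87, 4: 1.95}

for workers, speedup in measured.items():
    print(f"{workers} worker(s): {efficiency(speedup, workers):.0%} of ideal")

# Judge the 4-worker run against 2 physical cores instead of 4 workers
print(f"per physical core: {efficiency(measured[4], 2):.1%}")
```

Measured against 4 workers, the 4-worker run reaches only about 49% of the ideal speedup (1.95 / 4). Measured against 2 physical cores, it reaches about 98% (1.95 / 2), which fits the two-core explanation.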
🔍 If you see similar behavior on your machine, do a little investigation to find out how many physical cores the CPU has.
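If you'd rather investigate with code, here's a hedged Python sketch that compares the logical CPU count with the physical core count. It assumes macOS (which offers the `sysctl -n hw.physicalcpu` command) or a Linux system whose `/proc/cpuinfo` lists `physical id` and `core id` fields. On anything else it gives up and returns `None`.

```python
import os
import platform
import subprocess


def logical_cpu_count():
    """Number of threads the hardware reports it can run concurrently."""
    return os.cpu_count() or 1


def physical_core_count():
    """Best-effort physical core count, or None if it can't be determined."""
    system = platform.system()
    try:
        if system == "Darwin":  # macOS
            out = subprocess.run(["sysctl", "-n", "hw.physicalcpu"],
                                 capture_output=True, text=True, check=True)
            return int(out.stdout.strip())
        if system == "Linux":
            cores = set()
            physical_id = core_id = None
            with open("/proc/cpuinfo") as cpuinfo:
                for line in cpuinfo:
                    key, _, value = line.partition(":")
                    key = key.strip()
                    if key == "physical id":
                        physical_id = value.strip()
                    elif key == "core id":
                        core_id = value.strip()
                    elif not line.strip():  # blank line ends one CPU entry
                        if core_id is not None:
                            cores.add((physical_id, core_id))
                        physical_id = core_id = None
            if core_id is not None:  # file may not end with a blank line
                cores.add((physical_id, core_id))
            return len(cores) or None
    except (OSError, ValueError, subprocess.CalledProcessError):
        pass
    return None  # unsupported system or missing information


print("logical CPUs:", logical_cpu_count())
print("physical cores:", physical_core_count())
```

If the logical count is double the physical count, the CPU is most likely running two threads per core, as mine does.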

Other CPU activity

When this program runs from a web browser on a computer, it's competing for CPU time with other processes.
Before I started the program on my laptop, the CPU was already running over 400 processes, a mix of system processes and user applications:
Screenshot from Apple Activity Monitor. The center shows an area chart titled "CPU Load", with one table displayed to its left and another to its right.
It might be confusing to hear that a computer with 2 cores can run over 400 processes at once. Most of the time, when a computer runs multiple processes "at once", it's actually switching rapidly between them, so quickly that the user doesn't notice. When a computer runs two processes truly in parallel, then it no longer needs to switch between them.
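To picture that rapid switching, here's a toy Python simulation of a single core taking turns among processes, giving each a short time slice. Real operating system schedulers are far more sophisticated. The round-robin rule, the process names, and the work units are all assumptions for illustration.

```python
from collections import deque


def round_robin(work, time_slice):
    """Simulate one core switching between processes.

    work: dict of process name -> units of work it needs.
    Returns the sequence of (name, units run) slices in execution order.
    """
    ready = deque(work.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:  # not finished: go to the back of the line
            ready.append((name, remaining - ran))
    return timeline


for name, ran in round_robin({"browser": 5, "editor": 2, "music": 3}, 2):
    print(f"{name} runs for {ran} unit(s)")
```

Each process makes progress a little at a time, so all of them appear to run "at once" even though only one is running at any moment.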
The program can't complete as quickly when the CPU is also executing instructions from other processes, but it's hard to know exactly how much the program's duration is affected. That uncertainty affects our speedup measurements, since the run with 4 workers might have been more or less affected by other CPU activity than the run with 1 worker.
🔍 For the most accurate measurements, quit as many other applications as possible and wait until your CPU monitor shows very low levels of activity. Then hit that button and see what happens when more of your computer's CPU resources are freed up to work on the program.
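Another way to cope with that uncertainty is to repeat each measurement several times and look at the spread. Here's a minimal Python sketch, where `time_runs` and `summarize` are hypothetical helper names and the timed task is a stand-in for real work:

```python
import statistics
import time


def time_runs(task, repeats=5):
    """Run task several times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Summarize repeated measurements so one noisy run stands out less."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Stand-in workload; replace with the task you want to measure
runs = time_runs(lambda: sum(n * n for n in range(100_000)))
print(summarize(runs))
```

A large gap between the minimum and maximum suggests that other CPU activity interfered with some of the runs.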

User interface updates

The webpage that runs this program includes many visual elements: the constantly updating chart, the images and their loading indicators, the status text. Whenever a webpage needs to update a visual element, the CPU is doing work to calculate the new pixels and render them to the screen. That additional work slows down the execution time.
As an experiment, I disabled the UI updates in the program and saw the duration go from 30 seconds to 22 seconds, a significant decrease.
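To put a number on that decrease, we can compute the percentage of time saved. The helper name below is just an illustrative choice.

```python
def percent_reduction(before, after):
    """Percentage by which a duration dropped from before to after."""
    return (before - after) / before * 100


saved = percent_reduction(30, 22)  # durations from the experiment above
print(f"Disabling UI updates saved about {saved:.1f}% of the run time")
```

Going from 30 seconds to 22 seconds is a reduction of roughly 27%.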
🔍 Try for yourself on this UI-less version of the cat detection program.

Improving the performance

Now that we've thoroughly explored the performance of this parallelized program, we have a better idea how to improve the performance. If we were running this program in a production environment, like for a company or research project, then we might make these changes:
  • Use hardware with as many physical CPU cores as possible. More physical cores means more tasks that can truly run in parallel.
  • Run the program on a dedicated machine, a computer that isn't running other user processes. It will still be running a few system processes to keep the operating system running, but nowhere near as many as a typical home computer runs.
  • Run the program from the command line, not a webpage. Eliminating the graphical user interface removes the need for any UI updates.
🤔 What other ideas do you have for improving the program?

🙋🏽🙋🏻‍♀️🙋🏿‍♂️Do you have any questions about this topic? We'd love to answer—just ask in the questions area below!

Want to join the conversation?

  • Big Daniel
    Is it possible for a computer to have 2 (or more) CPUs in it?
    (5 votes)
  • trungdo3224
    Why do I get different counts of cats found when I run the program? (With 1, 2, and 4 threads I got 20 cats found, but with 8 threads I got 22.)
    (1 vote)
    • Martin
      Every time you click the start processing button, it loads a new set of pictures, so it's possible that you simply got more cats in one run than in the other.
      Another possibility is a bug. If you overload your system, the program seems to start having trouble counting. I just managed to get 69 cats detected despite there being only 44 pictures.
      (6 votes)
  • layaz7717
    Why can you speed up different tasks using hyperthreading, but not identical tasks?
    (2 votes)
    • Dzaka H. Athif
      I guess that's because inside a physical core there are individual processors for different types of jobs. For example, there is a processor for arithmetic and a processor for displaying output. So, assuming there are 2 arithmetic tasks on one core (which has only one arithmetic processor), the core can only execute the tasks sequentially. Please correct me if I'm wrong.
      (4 votes)
  • Jingwen
    My MacBook Pro (M2 Max) reports that it can run 8 threads now, and the total number of cores is 12.
    So what's the main difference between 2 cores (4 threads) and 12 cores (8 threads)?
    (2 votes)
  • layaz7717
    How would someone set up a cat detector? That sounds very complex, given the wide variation in cat appearances and the similarities between many species (like domestic cats versus tigers).
    (1 vote)
    • Astro8333
      It's mostly done using AI. It does a lot of comparison. It will take an image and compare it to other images in its database. The way it compares is by taking sections of the image, such as the nose, and comparing them (pixel by pixel) to the other images, as the nose for cats is usually the same. The way it compares it to a lion is that it takes the background and compares it, and it will also take into account the size of the animal (in comparison to the background). It also looks at the different lion images in its database and will determine if it looks more like a cat or a lion.
      (1 vote)
  • hani issa
    Hyperthreading is cheating
    (0 votes)