### Course: AP®︎/College Computer Science Principles > Unit 4

Lesson 4: Parallel and distributed computing

# Distributed computing

When solving problems, we don't need to limit our solutions to running on a single computer. Instead, we can use distributed computing to distribute the problem across multiple networked computing devices.

### Distribution of parallel processes

Distributed computing is often used in tandem with parallel computing. Parallel computing on a single computer uses multiple processors to process tasks in parallel, whereas distributed parallel computing uses multiple computing devices to process those tasks.
Consider our example program that detects cats in images. In a distributed computing approach, a managing computer would send the image information to each of the worker computers, and each worker would report back its results.
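Here's one way the manager/worker pattern might look in code. This sketch uses Python's `concurrent.futures` with worker processes on a single machine as a stand-in for networked worker computers, and a toy `detect_cat` function (any filename containing "cat" counts as a cat photo) in place of a real image-analysis algorithm:

```python
from concurrent.futures import ProcessPoolExecutor

def detect_cat(image_name):
    # Stand-in for a real image-analysis algorithm: pretend any
    # filename containing "cat" is a cat photo.
    return image_name, "cat" in image_name

if __name__ == "__main__":
    images = ["cat_01.jpg", "dog_02.jpg", "cat_03.jpg", "bird_04.jpg"]
    # The manager sends one image to each worker and collects the
    # results as the workers report back.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for name, found in pool.map(detect_cat, images):
            print(name, "->", "cat!" if found else "no cat")
```

In a true distributed system, the workers would be separate computers and the manager would send the image data over the network, but the division of labor is the same.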

### Evaluating the performance

Distributed computing can improve the performance of many solutions by taking advantage of hundreds or thousands of computers running in parallel. We can measure the gains by calculating the speedup: the time taken by the sequential solution divided by the time taken by the distributed parallel solution. If a sequential solution takes $60$ minutes and a distributed solution takes $6$ minutes, the speedup is $10$.
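The speedup calculation is a single division, which we can express directly in code (the $60$-minute and $6$-minute figures come from the example above):

```python
def speedup(sequential_time, distributed_time):
    # Speedup = sequential time / distributed parallel time.
    return sequential_time / distributed_time

print(speedup(60, 6))  # → 10.0
```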
The performance of distributed solutions can also suffer from their distributed nature, however. The computers must communicate over the network, sending messages with input and output values. Every message sent back and forth takes some amount of time, and that time adds to the overall time of the solution. For a distributed computing solution to be worth the trouble, the time saved by distributing the operations must be greater than the time added by the communication overhead.
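We can make this trade-off concrete with a simplified model: the distributed time is the ideal parallel compute time plus the communication overhead. The numbers below (10 workers, 20 messages, 0.1 minutes per message) are hypothetical, chosen only to illustrate how overhead eats into the speedup:

```python
def distributed_time(sequential_time, num_workers, num_messages, time_per_message):
    # Ideal parallel compute time, assuming work splits evenly...
    compute = sequential_time / num_workers
    # ...plus the time added by sending messages back and forth.
    overhead = num_messages * time_per_message
    return compute + overhead

seq = 60.0  # minutes, from the example above
dist = distributed_time(seq, num_workers=10, num_messages=20, time_per_message=0.1)
print(dist)        # → 8.0 (6 minutes of compute + 2 minutes of overhead)
print(seq / dist)  # → 7.5 (speedup shrinks from the ideal 10 because of overhead)
```

If the overhead term ever grows larger than the time saved by splitting the work, the distributed solution is slower than the sequential one.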
In the simplest distributed computing architecture, the managing computer communicates directly with each worker.
In more complex architectures, worker nodes must communicate with other worker nodes. This is necessary when using distributed computing to train a deep learning network, for example.${}^{1}$
One way to reduce the communication time is to use cluster computing: co-located computers on a local network that all work on similar tasks. In a computer cluster, a message does not have to travel very far and, more importantly, does not have to travel over the public Internet.
Cluster computing has its own limitations; setting up a cluster requires physical space, hardware operations expertise, and of course, money to buy all the devices and networking infrastructure.
Fortunately, many companies now offer cloud computing services which give programmers everywhere access to managed clusters. The companies manage the hardware operations, provide tools to upload programs, and charge based on usage.

### Distribution of functionality

Another form of distributed computing is to use different computing devices to execute different pieces of functionality.
For example, imagine a zoo with an array of security cameras. Each security camera records video footage in a digital format. The cameras send their video data to a computer cluster located in the zoo headquarters, and that cluster runs video analysis algorithms to detect escaped animals. The cluster also sends the video data to a cloud computing server which analyzes terabytes of video data to discover historical trends.
Each computing device in this distributed network is working on a different piece of the problem, based on its strengths and weaknesses. The security cameras themselves don't have enough processing power to detect escaped animals or enough storage space for the other cameras' footage (which could help an algorithm track movement). The local cluster does have a decent amount of processing power and extra storage, so it can perform the urgent task of escaped animal detection. However, the cluster defers the task that requires the most processing and storage (but isn't as time sensitive) to the cloud computing server.
This form of distributed computing recognizes that the world is filled with a range of computing devices with varying capabilities, and ultimately, some problems are best solved by utilizing a network of those devices.
In fact, you're currently participating in a giant example of distributed computing: the web. Your computer is doing a lot of processing to read this website: sending HTTP requests to get the website data, interpreting the JavaScript that the website loads, and constantly updating the screen as you scroll the page. But our servers are also doing a lot of work while responding to your HTTP requests, plus we send data out to high-powered analytics servers for further processing.
Every application that uses the Internet is an example of distributed computing, but each application makes different decisions about how it distributes the computing. For another example, smart home assistants do a small amount of language processing locally to determine that you've asked them for help but then send your audio to high-powered servers to parse your full question.
The Internet enables distributed computing at a worldwide scale, both to distribute parallel computation and to distribute functionality. Computer scientists, programmers, and entrepreneurs are constantly discovering new ways to use distributed computing to take advantage of such a massive network of computers to solve problems.

## Want to join the conversation?

• What exactly is a cloud computing service, or more generally, "the cloud"?

Furthermore, are public Internet connections more dangerous than private ones (i.e. can people see what you are doing and should you avoid doing things like checking your bank account on public Internet)?

Thirdly, why don't clusters have to run on public Internet connections?

Lastly, if computers are far apart, can't they run on private Internet instead of public Internet, helping to rule out some of the security issues that come with long-distance distributed computing?

Thanks!
• 1) The cloud (to simplify greatly) represents a collection of computing resources accessed over the Internet. Instead of playing video games on a console, imagine users pressing keys; the keypresses are sent over the Internet to the cloud, which processes them and streams the resulting images back to the TV. This is the idea behind Google's Stadia and Microsoft xCloud. The "cloud" in this case is hardware (a gaming console) accessed over the Internet.
Similarly, watching movies over Netflix or Hulu is a cloud computing service, in which entertainment is consumed over the Internet instead of buying a movie from the store and playing it on a DVD player.

2) Public Internet connections can be more dangerous (see this link for more: https://www.khanacademy.org/computing/computers-and-internet/xcae6f4a7ff015e7d:online-data-security), but generally using HTTPS over the public Internet is just as secure as (if not more secure than) a private Internet connection.

3) Clusters are usually used internally at a company, so there is little need for them to have a public Internet connection; they only communicate locally. Computer networks are independent of the Internet (a collection of networks), which is why clusters can run without being tied to the Internet.

4) Absolutely. One way is using a VPN (also discussed in certain places in the above link). However, through the use of encryption and authentication, using the public Internet has become much safer than people might think.

Hope this helps!
• I have a question: can distributed computing be run on programs that don’t support parallel programming? Or can it only be used when certain steps of an algorithm must be performed simultaneously?