Redundancy and fault tolerance

In the Internet Protocol (IP), computers split messages into packets and those packets hop from router to router on the way to their destination:
Diagram of laptop computer sending packet to server computer. A network of 9 routers is shown between the laptop and the server, with various lines connecting them. There's a path from the laptop, through the routers, to the server, highlighted with green arrows.
What happens if a network path is no longer available, for example because a natural disaster physically destroyed it or a cybercriminal hijacked it? Is the packet doomed to never reach its destination?
Diagram with router on left and 3 routers on right. The left router has a line going to each of the right routers, and the lines are labeled 1, 2, and 3. The second line, labeled 2, is shown as cut-off halfway due to a fire.

Redundancy in routing

Fortunately, there are often many possible paths a packet can take to reach the same destination. The availability of multiple paths increases the redundancy of a network.
Consider this simplified network connecting routers in four major cities:
Diagram with four routers and four lines connecting them. Line goes from Oakland to Austin, line goes from Austin to New York, line goes from Austin to Tampa, line goes from New York to Tampa. The lines are bidirectional.
Lines are also described in this table:
From       To
Oakland    Austin
Austin     New York
Austin     Tampa
New York   Tampa
There are multiple paths from the Oakland router to the New York router.
The first and shortest path goes from Oakland to Austin to New York:
Diagram with four routers labeled Oakland, Austin, Tampa, and New York. There are four lines connecting them.
  • Line goes from Oakland to Austin
  • Line goes from Austin to New York
  • Line goes from Austin to Tampa
  • Line goes from New York to Tampa
The lines from Oakland to Austin and from Austin to New York are highlighted green and end in an arrow.
A slightly longer path goes from Oakland to Austin to Tampa to New York:
Diagram with four routers labeled Oakland, Austin, Tampa, and New York. There are four lines connecting them.
  • Line goes from Oakland to Austin
  • Line goes from Austin to New York
  • Line goes from Austin to Tampa
  • Line goes from New York to Tampa
The 3 lines from Oakland to Austin, Austin to Tampa, and Tampa to New York are highlighted green and end in an arrow.
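The two paths above can also be found mechanically. The following is a minimal sketch, assuming the four bidirectional links from the table; the names `LINKS`, `build_neighbors`, and `all_paths` are made up for this illustration and are not part of any networking library. It lists every loop-free path between two routers with a depth-first search.

```python
# Links from the table above; each link carries traffic in both directions.
LINKS = [
    ("Oakland", "Austin"),
    ("Austin", "New York"),
    ("Austin", "Tampa"),
    ("New York", "Tampa"),
]


def build_neighbors(links):
    """Map each router to the set of routers it is directly linked to."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def all_paths(links, start, goal):
    """Return every loop-free path from start to goal, fewest hops first."""
    neighbors = build_neighbors(links)
    found = []

    def extend(path):
        current = path[-1]
        if current == goal:
            found.append(path)
            return
        for nxt in sorted(neighbors.get(current, ())):
            if nxt not in path:  # never visit the same router twice
                extend(path + [nxt])

    extend([start])
    return sorted(found, key=len)


for path in all_paths(LINKS, "Oakland", "New York"):
    print(" -> ".join(path))
# Oakland -> Austin -> New York
# Oakland -> Austin -> Tampa -> New York
```

Real routers do not enumerate every path like this; the sketch only makes the idea of "multiple possible paths" concrete for a tiny network.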
Why is this redundancy so important? If the connection between the Austin and New York routers is no longer available, then there's still another way for the packet to reach its destination.
Diagram with four routers and four lines connecting them. Line goes from Oakland to Austin, line goes from Austin to Tampa, line goes from New York to Tampa. The lines are bidirectional. A partial line is shown from Austin to New York, but it is cut-off with a fire.
Lines are also described in this table:
From       To
Oakland    Austin
Austin     Tampa
New York   Tampa
The redundancy of the paths in the network increases the number of possible ways that a packet can reach its destination.
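To see the rerouting in action, here is a small sketch under stated assumptions: the network is the four-city example, a failure is modeled by simply deleting a link, and the path is chosen by breadth-first search (fewest hops). The helper names `shortest_path` and `without_link` are hypothetical, and real routers rely on routing protocols rather than this simplified search.

```python
from collections import deque

# The four bidirectional links of the example network.
LINKS = [
    ("Oakland", "Austin"),
    ("Austin", "New York"),
    ("Austin", "Tampa"),
    ("New York", "Tampa"),
]


def shortest_path(links, start, goal):
    """Breadth-first search: return a path with the fewest hops, or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous = {start: None}  # router -> router we arrived from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbors.get(current, []):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None  # no path is left


def without_link(links, a, b):
    """Return the links that remain after the a-b link fails."""
    return [link for link in links if set(link) != {a, b}]


print(shortest_path(LINKS, "Oakland", "New York"))
# ['Oakland', 'Austin', 'New York']

remaining = without_link(LINKS, "Austin", "New York")
print(shortest_path(remaining, "Oakland", "New York"))
# ['Oakland', 'Austin', 'Tampa', 'New York']
```

With the Austin to New York link gone, the search still finds the longer route through Tampa, which is exactly what the redundant link buys.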
Check your understanding
The ARPANET was the precursor to the Internet, the network where Internet technology was first tested out. It got started in 1969 with just four computers connected to each other.
This is a map of ARPANET in 1969:
Diagram with four routers and four lines connecting them. There are no arrows on any of the lines.
  • Line goes from Utah to SRI
  • Line goes from SRI to UCLA
  • Line goes from SRI to UCSB
  • Line goes from UCSB to UCLA
How many routes are there between Utah and UCLA?

Fault tolerance

A fault-tolerant system is one that can experience failure (or multiple failures) in its components, but still continue operating properly.
The Internet is a massive and complex system with millions of components that can break at any time—and many of those components do break. But as of 2020, nobody has managed to break the entire Internet.
A big contributor to the fault tolerance of the Internet is the redundancy in network routing paths.
Consider the number of undersea cables connecting the eastern side of the United States to the western side of Europe:
A map of undersea cables crossing the Atlantic Ocean. Shows more than 10 cables connecting the East Coast of the United States with various points in Europe.
If one of those cables is damaged, there are multiple other cables that can carry Internet traffic over the Atlantic Ocean.
Or, to put it another way, there is no single point of failure between the coasts. A single point of failure is a component in the system that will bring down the entire system if it fails. When we're trying to make sure a system is fault tolerant, we look for single points of failure and find ways to add redundancy at those points.
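One way to look for single points of failure is to remove each link in turn and check whether every router can still reach every other router. The sketch below applies that idea to the four-city network from the redundancy section. It is an illustration only: the function names are made up here, and it considers link failures, not router failures.

```python
# The four bidirectional links of the four-city example network.
LINKS = [
    ("Oakland", "Austin"),
    ("Austin", "New York"),
    ("Austin", "Tampa"),
    ("New York", "Tampa"),
]


def is_connected(routers, links):
    """Return True if every router can reach every other router."""
    routers = set(routers)
    if not routers:
        return True
    neighbors = {router: set() for router in routers}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    start = next(iter(routers))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == routers


def single_points_of_failure(links):
    """Return the links whose failure would split the network."""
    routers = {router for link in links for router in link}
    critical = []
    for i, link in enumerate(links):
        remaining = links[:i] + links[i + 1:]  # simulate this link failing
        if not is_connected(routers, remaining):
            critical.append(link)
    return critical


print(single_points_of_failure(LINKS))
# [('Oakland', 'Austin')]
```

In this small network, the Oakland to Austin link is the only one with no backup, so that is where added redundancy would help most.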
Now consider the meager number of undersea cables between these Polynesian islands in the South Pacific:
A map of undersea cables in the South Pacific, showing one cable connecting Fiji to Tonga and one cable connecting Cook Islands and French Polynesia.
If a cable is cut between Cook Islands and French Polynesia, how will that affect the Internet on those islands?
In some cases, a cable cut can bring down an entire country. In 2019, a ship anchor dragging along the sea floor cut the cable to Tonga, cutting off the country's Internet access for 11 days. [1]
It doesn't take much to cut a cable. In 2011, a grandmother in the country of Georgia accidentally damaged a cable with her shovel, resulting in all of Armenia losing Internet access for 5 hours. [2]
Cable cuts happen relatively frequently, "around every 3 days", according to networks analyst Stephan Beckert. [3] Most of the time, the average Internet user doesn't even notice a cut, and the cable gets repaired by one of many cable repair ships. [4] When we do notice a cable cut, that usually means there's a single point of failure and it's time to add redundancy to the system.
Why don't we start off with redundancy everywhere? As you might guess, it's expensive. The undersea cable that connects Tonga to Fiji was estimated to cost about $30 million, and that's a relatively short cable. [5] When Google installed a high-speed fiber optic cable between the US and Tokyo, it cost $300 million. [6]
When it's too expensive to duplicate a resource, it may be possible to find ways for the system to gracefully degrade in the face of failure. During the Tonga outage, satellite providers rushed to provide Internet access. [7] They may not have been able to provide the same speeds as the fiber cable connection, but any Internet connection is better than no Internet connection at all.
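Graceful degradation can be pictured as a preference-ordered list of connections, where the system falls back to the next option when the preferred one fails. The following is a loose sketch of that idea only; the `Connection` class, the `pick_connection` function, and the connection names are hypothetical and do not describe how Tonga's providers actually switched traffic.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    available: bool


def pick_connection(connections):
    """Return the first available connection in preference order, or None."""
    for connection in connections:
        if connection.available:
            return connection
    return None


# Preference order: the faster option first, the slower backup second.
connections = [
    Connection("fiber cable", available=True),
    Connection("satellite", available=True),
]

print(pick_connection(connections).name)  # fiber cable

connections[0].available = False  # simulate the cable being cut
print(pick_connection(connections).name)  # satellite
```

The service gets slower after the fallback, but it keeps working, which is the point of degrading gracefully instead of failing outright.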
🤔 Consider the fault tolerance of the infrastructure around you. How much redundancy is in the electrical system of your home or computer lab? Are there any single points of failure? What would be the least expensive way to increase the redundancy?
Check your understanding
The 1970 ARPANET was not very fault tolerant. With so few connections between nodes, a failure could easily disrupt the ARPANET.
Diagram with 5 routers and 5 lines connecting them. There are no arrows on any of the lines.
  • Line goes from Utah to SRI
  • Line goes from SRI to UCSB
  • Line goes from SRI to UCLA
  • Line goes from UCSB to UCLA
  • Line goes from UCLA to BBN
If a computer wanted to send a message from Utah to BBN, which connections definitely needed to stay available?
Choose all answers that apply:


🙋🏽🙋🏻‍♀️🙋🏿‍♂️Do you have any questions about this topic? We'd love to answer—just ask in the questions area below!

  • 𝕊𝕠𝕣𝕥𝕙𝕚𝕠𝕦𝕤
    The Check Your Understanding question asks, "If a computer wanted to send a message from Utah to BBN, which connections definitely needed to stay available?" It says to "Choose all answers that apply:". When I chose SRI~>UCLA it was marked as incorrect. How could a computer send a message from Utah to BBN if the connection from SRI~>UCLA is down? There are no other routes to get to BBN.
    (5 votes)
    • Shane McGookey
      If we eliminate the SRI -> UCLA route, it is still possible to get from Utah -> BBN. The remaining route is certainly less efficient, but it does exist. To lead you in the right direction, consider the route:

      Utah -> SRI -> UCSB -> ...

      What do you think would come next?
      (15 votes)
  • Michael Karabicak
    Let's say a war broke out between China and the USA. Is it more difficult to cut off the entire Internet connection between the USA and Europe, or to protect it from a possible attack?
    (4 votes)
    • anonymous
      Cutting off the entire Internet connection between the USA and Europe or protecting it from a possible attack would both involve complex and challenging tasks. Let's break down each scenario:

      Cutting Off Internet Connection:
      Completely severing the Internet connection between the USA and Europe would be a monumental undertaking. The global Internet infrastructure is highly interconnected, with numerous undersea cables, satellite links, and data centers that span continents. Disconnecting such a large and diverse network would require substantial effort and cooperation from various parties, including telecommunications companies, internet service providers, and government agencies.

      While it's theoretically possible to cut off certain connections or routes, doing so on a large scale would likely have significant economic, social, and technical implications. The global nature of the internet means that traffic can be rerouted and find alternative paths, making complete isolation challenging.

      Protecting Internet Connection from Attack:
      Protecting the Internet connection between the USA and Europe from a possible attack would also be a complex task. The internet is vulnerable to a variety of cyberattacks, ranging from Distributed Denial of Service (DDoS) attacks to more sophisticated cyber warfare techniques. To defend against these attacks, governments and private entities need robust cybersecurity measures in place.

      This would involve implementing advanced security protocols, firewalls, intrusion detection and prevention systems, and regular security audits. Collaboration between various cybersecurity agencies and organizations would be necessary to monitor and respond to potential threats effectively.

      In both scenarios, geopolitical, technical, economic, and social factors would play significant roles:

      Geopolitical Considerations: Cutting off or protecting internet connections can have far-reaching diplomatic and political implications. It's not just a technical decision; it involves negotiations, international agreements, and potentially violating principles of free information exchange.

      Technical Complexity: The global internet is incredibly complex, and making large-scale changes or ensuring its security demands deep technical knowledge and coordination.

      Economic Impact: Both cutting off the internet and protecting it can have severe economic consequences. The Internet is crucial for global trade, communication, and finance. Disrupting it or implementing stringent security measures could impact industries and economies.

      Social Considerations: The Internet is a vital tool for communication and information sharing. Disrupting or securing it could impact the daily lives of billions of people.

      Ultimately, the feasibility and difficulty of each scenario would depend on a wide range of factors, including the intent of the parties involved, the specific methods employed, and the global response to such actions.
      (11 votes)
  • Movy
    What does the article reference in this quote? "But as of 2020, nobody has managed to break the entire Internet."
    (1 vote)
  • Emmanuel N
    How can we improve redundancy in our community? What factors are responsible for fault redundancy?
    (4 votes)
    • Iron Programming
      Just my thoughts:
      If a community shared their network connections, then even if your house lost its connection, your neighbors' might still work. If there is only one line (say, from across the sea to your island), then you are more vulnerable, so having more possible options would of course increase the redundancy in your community.

      The main thing that can be done is to have as many lines as possible. If the network does crash, the best that can be hoped for is a quick fix. And like they said, "During the Tonga outage, satellite providers rushed to provide Internet access", so satellites can sometimes provide a slower, but still working, Internet connection.

      These were just a couple of my thoughts, but I'm a learning student as well.

      Hope this helps,
      - Convenient Colleague
      (7 votes)
  • anonymous
    Why are the white dots not interconnected in the image?
    (2 votes)
  • Mario 3000
    What happens if a power outage happens in the middle of the message going to the receiver? Would the message forget which computer to go to and someone else receives it?
    (2 votes)
  • pspri16
    What is the exact meaning of redundancy?
    (1 vote)
  • ava.tucker
    What if the safety critical system has a failure?
    (1 vote)
  • Jcim Grant
    Does fault tolerance play a big role in computer gaming?
    (0 votes)
  • David Hodgkins
    I never see how the original ARPANET was connected. Dedicated wiring, or phone lines, or satellite?
    (0 votes)