# Transmission Control Protocol (TCP)

The Transmission Control Protocol (TCP) is a transport protocol that is used on top of IP to ensure reliable transmission of packets.
TCP includes mechanisms to solve many of the problems that arise from packet-based messaging, such as lost, out-of-order, duplicate, and corrupted packets.
Since TCP is the protocol used most commonly on top of IP, the Internet protocol stack is sometimes referred to as TCP/IP.

## Packet format

When sending packets using TCP/IP, the data portion of each IP packet is formatted as a TCP segment.
Diagram of a TCP segment within an IP packet. The IP packet contains header and data sections. The IP data section is the TCP segment, which itself contains header and data sections.
Each TCP segment contains a header and data. The TCP header contains many more fields than the UDP header and can range in size from 20 to 60 bytes, depending on the size of the options field.
The TCP header shares some fields with the UDP header: source port number, destination port number, and checksum. To remember how those are used, review the UDP article.
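As a rough illustration of the format, the sketch below unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The field order follows the standard TCP header layout. The function name `parse_tcp_header` and the example port numbers and payload are made up for this example, and the checksum is left at zero rather than computed.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_flags >> 12   # top 4 bits: header length in 32-bit words
    header_len = data_offset * 4       # 5 to 15 words, i.e. 20 to 60 bytes
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": offset_flags & 0x3F,  # low 6 bits: URG, ACK, PSH, RST, SYN, FIN
        "window": window,
        "checksum": checksum,
        "payload": segment[header_len:],
    }

# Hand-built example segment (assumed values): port 49152 -> port 80,
# no options (data offset 5), PSH and ACK bits set, 5 bytes of data.
example = struct.pack("!HHIIHHHH", 49152, 80, 1, 0, (5 << 12) | 0x18, 65535, 0, 0) + b"hello"
fields = parse_tcp_header(example)
print(fields["src_port"], fields["dst_port"], fields["header_len"], fields["payload"])
```

A header with options would have a data offset larger than 5, and the payload would start correspondingly later in the segment.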

## From start to finish

Let's step through the process of transmitting a packet with TCP/IP.

### Step 1: Establish connection

When two computers want to send data to each other over TCP, they first need to establish a connection using a three-way handshake.
Diagram of two computers with arrows between.
• Arrow goes from Computer 1 to Computer 2 with "SYN" label.
• Arrow goes from Computer 2 to Computer 1 with "ACK SYN" label.
• Arrow goes from Computer 1 to Computer 2 with "ACK" label.
The first computer sends a packet with the SYN bit set to 1 (SYN = "synchronize?"). The second computer sends back a packet with the ACK bit set to 1 (ACK = "acknowledge!") plus the SYN bit set to 1. The first computer replies back with an ACK.
The SYN and ACK bits are both part of the TCP header:
A diagram of the TCP header with rows of fields. Each row is 32 bits long. The first row contains a 16-bit source port number and 16-bit destination port number. The second row contains a 32-bit sequence number. The third row contains a 32-bit acknowledgement number. The fourth row contains a 4-bit data offset number, 6 bits that are marked as reserved, 6 control bits (URG, ACK, PSH, RST, SYN, and FIN), and a 16-bit window size number. The fifth row contains a 16-bit checksum and 16-bit urgent pointer. The header ends with options and padding which can be of variable length.
The ACK and SYN bits are highlighted on the fourth row of the header.
The three packets involved in the three-way handshake do not typically include any data. Once the computers are done with the handshake, they're ready to send and receive packets containing actual data.
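The handshake can be sketched as a tiny exchange of flag bits. This is only a toy model: the bit values match the positions of the control bits in the TCP header, but the helper names `handshake_reply` and `flag_names` are invented here, and a real handshake also exchanges initial sequence numbers, which this sketch leaves out.

```python
# Control bits as they appear in the TCP header's flags field.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def handshake_reply(received_flags):
    """Return the flags a computer sends in response during the handshake."""
    if received_flags == SYN:
        return SYN | ACK          # "I heard you, and I want to synchronize too"
    if received_flags == (SYN | ACK):
        return ACK                # "I heard you as well"
    return None                   # handshake finished, nothing more to send

def flag_names(flags):
    """List the names of the SYN/ACK bits set in a flags value."""
    names = []
    if flags & SYN:
        names.append("SYN")
    if flags & ACK:
        names.append("ACK")
    return names

# Computer 1 opens with SYN; each side answers the other until done.
exchange = []
flags = SYN
while flags is not None:
    exchange.append(flag_names(flags))
    flags = handshake_reply(flags)

print(exchange)  # [['SYN'], ['SYN', 'ACK'], ['ACK']]
```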

### Step 2: Send packets of data

When a packet of data is sent over TCP, the recipient must always acknowledge what they received.
Diagram of two computers with arrows between.
• Arrow goes from Computer 1 to Computer 2 and shows a box of binary data and the label "Sequence #1".
• Arrow goes from Computer 2 to Computer 1 with "ACK" label.
The first computer sends a packet with data and a sequence number. The second computer acknowledges it by setting the ACK bit and increasing the acknowledgement number by the length of the received data, so the acknowledgement number always names the next sequence number the recipient expects.
The sequence and acknowledgement numbers are part of the TCP header:
The TCP header diagram shown in Step 1, this time with the 32-bit sequence and acknowledgement numbers (second and third rows) highlighted.
Those two numbers help the computers to keep track of which data was successfully received, which data was lost, and which data was accidentally sent twice.
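The arithmetic behind the acknowledgement number is small enough to write down. In this sketch, `ack_for` is a hypothetical helper and the 36-byte payloads are assumed values chosen for illustration. The modulo reflects that both numbers are 32-bit fields and wrap around.

```python
def ack_for(seq: int, data: bytes) -> int:
    """Acknowledgement number for a segment: the next sequence number expected."""
    return (seq + len(data)) % 2**32   # 32-bit field, so the value wraps around

first = ack_for(1, b"x" * 36)       # data occupies sequence numbers 1..36
second = ack_for(first, b"y" * 36)  # the next segment starts where the last ended

print(first, second)  # 37 73
```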

### Step 3: Close the connection

Either computer can close the connection when they no longer want to send or receive data.
Diagram of two computers with arrows between.
• Arrow goes from Computer 1 to Computer 2 with "FIN" label.
• Arrow goes from Computer 2 to Computer 1 with "ACK FIN" label.
• Arrow goes from Computer 1 to Computer 2 with "ACK" label.
A computer initiates closing the connection by sending a packet with the FIN bit set to 1 (FIN = finish). The other computer replies with an ACK and another FIN. After one more ACK from the initiating computer, the connection is closed.

## Detecting lost packets

TCP connections can detect lost packets using a timeout.
Diagram demonstrating re-transmission of a packet from one computer to another computer. Arrow goes from first computer to second computer and is labeled with "sequence #1" and a string of binary data. A stopwatch is shown in various stages after the arrow, first with 0 time passed, then half the time passed, then all time passed and in an alarm state. Then another arrow goes from the first computer to the second computer, labeled the same as the first.
After sending off a packet, the sender starts a timer and puts the packet in a retransmission queue. If the timer runs out and the sender has not yet received an ACK from the recipient, it sends the packet again.
The retransmission may lead to the recipient receiving duplicate packets, if a packet was not actually lost but just very slow to arrive or be acknowledged. If so, the recipient can simply discard duplicate packets. It's better to have the data twice than not at all!
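A minimal sketch of the timer and retransmission queue follows, using a simulated clock instead of real time. The `Sender` class, its method names, and the fixed one-second timeout are assumptions for this example. Real TCP implementations adjust the timeout based on measured round-trip times.

```python
class Sender:
    """Toy sender that resends any packet not acknowledged within `timeout`."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.unacked = {}    # retransmission queue: seq -> (data, time last sent)
        self.sent_log = []   # (time, seq) for every transmission, including resends

    def send(self, seq, data, now):
        self.unacked[seq] = (data, now)   # (re)start the timer for this packet
        self.sent_log.append((now, seq))

    def on_ack(self, ack_number):
        # Everything that ends before ack_number has been received.
        confirmed = [seq for seq, (data, _) in self.unacked.items()
                     if seq + len(data) <= ack_number]
        for seq in confirmed:
            del self.unacked[seq]

    def tick(self, now):
        # Resend every packet whose timer has run out.
        for seq, (data, sent_at) in list(self.unacked.items()):
            if now - sent_at >= self.timeout:
                self.send(seq, data, now)

sender = Sender(timeout=1.0)
sender.send(1, b"x" * 36, now=0.0)
sender.tick(now=0.5)   # timer still running, nothing happens
sender.tick(now=1.0)   # timer expired with no ACK, so the packet is sent again
sender.on_ack(37)      # the recipient confirms everything before 37
sender.tick(now=2.5)   # queue is empty, nothing to resend

print(sender.sent_log)  # [(0.0, 1), (1.0, 1)]
```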

## Handling out of order packets

TCP connections can detect out of order packets by using the sequence and acknowledgement numbers.
Diagram of two computers with arrows between.
• Arrow goes from Computer 1 to Computer 2 and shows a box of binary data with the label "Seq #1".
• Arrow goes from Computer 2 to Computer 1 with the label "Ack #37".
• Arrow goes from Computer 1 to Computer 2 and shows a box of binary data with the label "Seq #73".
• Arrow goes from Computer 2 to Computer 1 with the label "Ack #37".
When the recipient sees a higher sequence number than what they have acknowledged so far, they know that they are missing at least one packet in between. In the situation pictured above, the recipient sees a sequence number of #73 but expected a sequence number of #37. The recipient lets the sender know there's something amiss by sending a packet with an acknowledgement number set to the expected sequence number.
Sometimes the missing packet is simply taking a slower route through the Internet and it arrives soon after.
Diagram of TCP packets arriving out of order. Two computers are shown with arrows going back and forth, with their vertical location indicating the time of sending and arrival:
• An arrow labeled "Seq #1" starts from Computer 1 and ends soon after at Computer 2.
• An arrow labeled "Ack #37" starts from Computer 2 and ends soon after at Computer 1.
• An arrow labeled "Seq #37" starts from Computer 1 and doesn't end until much later at Computer 2.
• An arrow labeled "Seq #73" starts from Computer 1 and ends soon after at Computer 2 (before the arrow for "Seq #37").
• An arrow labeled "Ack #37" starts from Computer 2 and ends soon after at Computer 1 (before the arrow for "Seq #37").
Other times, the missing packet may actually be a lost packet and the sender must retransmit the packet.
Diagram of TCP packets arriving out of order. Two computers are shown with arrows going back and forth, with their vertical location indicating the time of sending and arrival:
• An arrow labeled "Seq #1" starts from Computer 1 and ends soon after at Computer 2.
• An arrow labeled "Ack #37" starts from Computer 2 and ends soon after at Computer 1.
• An arrow labeled "Seq #37" starts from Computer 1 and ends before reaching Computer 2, with an X indicating it was lost.
• An arrow labeled "Seq #73" starts from Computer 1 and ends soon after at Computer 2.
• An arrow labeled "Ack #37" starts from Computer 2 and ends soon after at Computer 1.
• An arrow labeled "Seq #37" starts from Computer 1 and ends soon after at Computer 2.
In both situations, the recipient has to deal with out of order packets. Fortunately, the recipient can use the sequence numbers to reassemble the packet data in the correct order.
A diagram of TCP data reassembly.
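The receiving side can be sketched as follows. The `Receiver` class is hypothetical and assumes that packets never partially overlap. It buffers data that arrives ahead of a gap, ignores duplicates, and always answers with the sequence number it expects next.

```python
class Receiver:
    """Toy receiver that reassembles data in sequence-number order."""

    def __init__(self, first_seq=1):
        self.expected = first_seq   # next sequence number we are waiting for
        self.buffer = {}            # seq -> data that arrived ahead of a gap
        self.delivered = b""        # data reassembled in the correct order

    def receive(self, seq, data):
        # Keep new data; ignore duplicates of anything delivered or buffered.
        if seq >= self.expected and seq not in self.buffer:
            self.buffer[seq] = data
        # Deliver as much contiguous data as is now available.
        while self.expected in self.buffer:
            chunk = self.buffer.pop(self.expected)
            self.delivered += chunk
            self.expected += len(chunk)
        return self.expected        # the acknowledgement number to send back

receiver = Receiver()
acks = [
    receiver.receive(1, b"a" * 36),    # in order
    receiver.receive(73, b"c" * 36),   # arrives early: the packet at 37 is missing
    receiver.receive(37, b"b" * 36),   # fills the gap, so both chunks are delivered
    receiver.receive(37, b"b" * 36),   # duplicate, discarded
]

print(acks)  # [37, 37, 109, 109]
```

The repeated `37` is the signal described above: the receiver keeps asking for the same sequence number until the gap is filled.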

## Want to join the conversation?

• When handling out-of-order packets, how does sending the expected acknowledgement number indicate to the sender that something is amiss? How would the sender know if it had to re-send the packet if it was lost?
• Imagine you want to send the letters of the alphabet to a friend over the Internet.

You send ('a', 1), ('b', 2), ('c', 3), one by one to your friend. The numbers are used in case the packets/messages arrive out of order.

Now, suppose the friend gets ('b', 2), but then ('d', 4). It's missing 'c' because it expects a continuous increase of numbers and 3 is missing. So your friend asks you to resend the letter at position 3 (this is the idea behind the expected acknowledgement number).

As mentioned in the article, it may be just that ('c', 3) is taking longer to arrive and so in that case, the sender sends a duplicate message, but duplicates are typically dropped by the receiver.

A helpful way to think about these numbers is that they synchronize the data so both parties have the same "view" of it.

Imagine if we didn't have a universal notion of time. Then, the sender's view of time would be different from the receiver's. Hence, they synchronize their "view" of time by communicating numbers.

Hope this helps!
• What does the article mean "setting the ACK bit and increasing the acknowledgement number by the length of the data received"?
• Say you want to send a message that's 32 bytes long.
So you check if the receiver is there. The receiver answers with ACK #1. That means something like "hullo I can hear you".
Now you send the message of 32 bytes and the receiver responds with ACK #33. That means "I got everything before number 33, so send me number 33 next" (the numbering started at 1 during the introduction).

Now if you send another message that's 100 bytes long, the receiver would respond with ACK #133, meaning "I got everything before number 133".

That way you can see if the connection works without the receiver sending you the entire message back and you having to read it.
• Hi. Following up on Carita's question below: how does the sender know that a packet is missing if the recipient only responds with "Ack [expected packet number]"? Is an Ack for a missing packet somehow different from an Ack for a received packet, so that it triggers the sender to resend the missing packet?
• Good question, this is a central concern in protocol development: how to deal with ambiguity.

A sender also keeps their sequence number to synchronize, so if they receive an Ack[packet_number] not matching their current sequence number, they can distinguish between a missing packet and sending new packets.

An Ack for a missing packet is not any different from an Ack for a received packet. It is the fact that the sender has their own current sequence number, which in some sense is their own "time", that enables them to distinguish.

Hope this helps!
• Hello,
I'm looking at the Transmission Control Protocol page (I'm trying to ask a question under that page, but all I'm seeing is the comments from the User Datagram Protocol page, so I'm not sure where this is actually going to post). It keeps mentioning a sequence number. What exactly is a sequence number, and how does it change as more data is sent? For example, in the last section, it has the numbers 1 and 37 for the sequence, and 73 and 37 for the acknowledgement. Where are these numbers coming from?
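• I believe these numbers mark where each packet's data sits in the overall stream, so the receiver can tell the order it was sent in. In the article's diagrams each packet appears to carry 36 bytes: the first starts at sequence number 1, the next at 37, and the one after that at 73. An acknowledgement of 37 means "send me number 37 next."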
• How can we tell whether we are using TCP or UDP?
• Wireshark is a free tool that enables you to inspect the Internet packets (UDP or TCP based) flowing in and out of your device. Here's a tutorial I used at some point to get started: (https://www2.cs.siu.edu/~cs441/lectures/Wireshark%20Tutorial.pdf)

Additionally, some well-known Internet activity defaults to a particular protocol. For instance, surfing the web generally uses TCP, whereas live streams often use UDP.

Hope this helps!
• Do the computers run TCP or UDP first? or do they happen at the same time?
• TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are two different protocols that run independently depending upon how a developer wishes to communicate network traffic.

For example, if you were developing an online video game where quick traffic and fast response were essential, you might be inclined to use UDP, which is unreliable and connectionless but very fast. If, however, you depended on reliability more than on speed, you would opt for TCP.

TCP and UDP are both transport layer protocols built upon the Internet Protocol (IP) which sits at the network layer. You make a choice between the two depending upon your needs, rather than them happening concurrently within one packet or one happening "first or second."