Transmission Control Protocol (TCP)
The Transmission Control Protocol (TCP) is a transport protocol that is used on top of IP to ensure reliable transmission of packets.
TCP includes mechanisms to solve many of the problems that arise from packet-based messaging, such as lost packets, out-of-order packets, duplicate packets, and corrupted packets.
Since TCP is the protocol used most commonly on top of IP, the Internet protocol stack is sometimes referred to as TCP/IP.
Packet format
When sending packets using TCP/IP, the data portion of each IP packet is formatted as a TCP segment.
Each TCP segment contains a header and data. The TCP header contains many more fields than the UDP header and can range in size from 20 to 60 bytes, depending on the size of the options field.
The TCP header shares some fields with the UDP header: source port number, destination port number, and checksum. To remember how those are used, review the UDP article.
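The fixed part of the header can be unpacked with Python's standard struct module. This is a minimal sketch that assumes the field layout from the TCP specification (RFC 9293). Some fields are not described in the article, such as the data offset, window size, and urgent pointer; those come from that spec. The sample segment is made up, and its checksum is read but not verified. The sketch also shows where the 20-to-60-byte range comes from. The 4-bit data offset counts 32-bit words, so its smallest legal value (5) means 20 bytes and its largest (15) means 60.

```python
import struct

# Fixed 20-byte part of the TCP header, in network byte order:
# source port, destination port, sequence number, acknowledgement number,
# data offset (top 4 bits) + reserved bits, flags, window size, checksum, urgent pointer.
TCP_FIXED = struct.Struct("!HHIIBBHHH")

def parse_tcp_header(segment: bytes) -> dict:
    """Split a raw TCP segment into header fields, options, and payload."""
    (src, dst, seq, ack, offset_byte, flags,
     window, checksum, urgent) = TCP_FIXED.unpack_from(segment)
    header_len = (offset_byte >> 4) * 4  # data offset counts 32-bit words
    if not 20 <= header_len <= 60:
        raise ValueError(f"invalid TCP header length: {header_len}")
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": flags,
        "window": window,
        "checksum": checksum,  # not verified in this sketch
        "urgent": urgent,
        "options": segment[TCP_FIXED.size:header_len],
        "data": segment[header_len:],
    }

# Hypothetical segment: port 50000 -> 80, no options (offset 5 = 20 bytes),
# ACK flag (0x10) set, checksum left as 0, carrying the payload b"hi".
sample = TCP_FIXED.pack(50000, 80, 1000, 2000, 5 << 4, 0x10, 65535, 0, 0) + b"hi"
fields = parse_tcp_header(sample)
print(fields["header_len"], fields["data"])  # 20 b'hi'
```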
From start to finish
Let's step through the process of transmitting a packet with TCP/IP.
Step 1: Establish connection
When two computers want to send data to each other over TCP, they first need to establish a connection using a three-way handshake.
The first computer sends a packet with the SYN bit set to 1 (SYN = "synchronize?"). The second computer sends back a packet with the ACK bit set to 1 (ACK = "acknowledge!") plus the SYN bit set to 1. The first computer replies with an ACK.
The SYN and ACK bits are both flags in the TCP header.
The three packets involved in the three-way handshake typically don't include any data. Once the handshake is done, the computers are ready to exchange packets containing actual data.
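Here is a minimal sketch of the handshake modeled only as flag bits. It assumes the standard bit positions from the TCP specification (FIN = 0x01, SYN = 0x02, ACK = 0x10). The computer names A and B are placeholders, and sequence numbers are left out so the example focuses on which bits each segment sets.

```python
# Flag bits as they sit in the TCP header's flags byte.
FIN, SYN, ACK = 0x01, 0x02, 0x10

def flag_names(flags: int) -> str:
    """Turn a flags byte into readable names, e.g. 0x12 -> 'SYN+ACK'."""
    return "+".join(name for name, bit in (("SYN", SYN), ("ACK", ACK), ("FIN", FIN))
                    if flags & bit)

# The three handshake segments as (sender, flags, payload); none carries data.
handshake = [
    ("A", SYN, b""),        # A: "synchronize?"
    ("B", SYN | ACK, b""),  # B: "acknowledge!" plus its own SYN
    ("A", ACK, b""),        # A: acknowledges B's SYN
]

for sender, flags, payload in handshake:
    print(sender, flag_names(flags), len(payload))
# A SYN 0
# B SYN+ACK 0
# A ACK 0
```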
Step 2: Send packets of data
When a packet of data is sent over TCP, the recipient must always acknowledge what they received.
The first computer sends a packet with data and a sequence number. The second computer acknowledges it by setting the ACK bit and increasing the acknowledgement number by the length of the received data.
The sequence and acknowledgement numbers are both fields in the TCP header.
Those two numbers help the computers to keep track of which data was successfully received, which data was lost, and which data was accidentally sent twice.
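This minimal sketch tracks both counters for a run of data segments, assuming every segment arrives. It uses a relative sequence number starting at 1, and the payload sizes are arbitrary. Each acknowledgement number is the sequence number plus the length of the data, which is the position of the next byte the recipient expects.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit counters that wrap around

def exchange(start_seq: int, payloads: list) -> list:
    """Return (sequence number, acknowledgement number) for each data segment."""
    log = []
    seq = start_seq
    for data in payloads:
        ack = (seq + len(data)) % SEQ_SPACE  # receiver: "send me byte `ack` next"
        log.append((seq, ack))
        seq = ack  # sender: the next segment starts where this one ended
    return log

print(exchange(1, [b"x" * 32, b"y" * 100]))  # [(1, 33), (33, 133)]
```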
Step 3: Close the connection
Either computer can close the connection when they no longer want to send or receive data.
A computer initiates closing the connection by sending a packet with the FIN bit set to 1 (FIN = finish). The other computer replies with an ACK and another FIN. After one more ACK from the initiating computer, the connection is closed.
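The closing exchange can be sketched the same way. This is a simplified model, not a full TCP state machine, and the computer names and sample payload are hypothetical. Besides the three segments described above, it shows a variant where the other computer still has data to finish sending. In that case it acknowledges the FIN right away and sends its own FIN later, so the ACK and FIN travel in separate segments.

```python
FIN, ACK = 0x01, 0x10

def close_connection(initiator: str, other: str, other_still_sending: bool = False) -> list:
    """List the (sender, flags, payload) segments when `initiator` closes the connection."""
    segments = [(initiator, FIN, b"")]                # "I'm finished sending."
    if other_still_sending:
        segments.append((other, ACK, b""))            # acknowledge the FIN now...
        segments.append((other, ACK, b"last bytes"))  # ...finish sending its data...
        segments.append((other, FIN | ACK, b""))      # ...then close its own side
    else:
        segments.append((other, FIN | ACK, b""))      # ACK and FIN in one segment
    segments.append((initiator, ACK, b""))            # acknowledge the other FIN
    return segments

print(close_connection("A", "B"))  # [('A', 1, b''), ('B', 17, b''), ('A', 16, b'')]
```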
Detecting lost packets
TCP connections can detect lost packets using a timeout.
After sending off a packet, the sender starts a timer and puts the packet in a retransmission queue. If the timer runs out and the sender has not yet received an ACK from the recipient, it sends the packet again.
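Here is a minimal sketch of that timeout logic. It is driven by an explicit clock value instead of real timers, so it runs instantly. The class and method names are hypothetical, and the fixed timeout is a simplification: real TCP stacks compute the retransmission timeout from measured round-trip times (RFC 6298).

```python
class RetransmitQueue:
    """Sender-side retransmission queue, driven by an explicit clock value `now`."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.pending = {}  # seq -> (data, deadline) for unacknowledged segments

    def send(self, seq: int, data: bytes, now: float) -> None:
        # Start the segment's timer and keep it until an ACK covers it.
        self.pending[seq] = (data, now + self.timeout)

    def on_ack(self, ack: int) -> None:
        # An ACK number covers every byte before it: drop segments ending at or before it.
        for seq in [s for s, (d, _) in self.pending.items() if s + len(d) <= ack]:
            del self.pending[seq]

    def expired(self, now: float) -> list:
        """Return sequence numbers to resend, restarting their timers."""
        due = [s for s, (_, deadline) in self.pending.items() if deadline <= now]
        for seq in due:
            data, _ = self.pending[seq]
            self.pending[seq] = (data, now + self.timeout)
        return sorted(due)

q = RetransmitQueue(timeout=1.0)
q.send(1, b"hello", now=0.0)  # bytes 1-5
q.send(6, b"world", now=0.1)  # bytes 6-10
q.on_ack(6)                   # first segment acknowledged
print(q.expired(now=1.5))     # [6]: the second segment timed out, so resend it
```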
The retransmission may lead to the recipient receiving duplicate packets, if a packet was not actually lost but just very slow to arrive or be acknowledged. If so, the recipient can simply discard duplicate packets. It's better to have the data twice than not at all!
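On the receiving side, discarding duplicates can be as simple as comparing each segment against the next byte the recipient expects. This sketch uses a hypothetical function name and sample data. A retransmission that partly overlaps data already received keeps only the new bytes. Segments arriving after a gap are simply dropped, leaving the sender to retransmit them.

```python
def deliver(segments: list, start_seq: int) -> bytes:
    """Receiver side: keep new bytes and discard bytes it already has.

    `segments` holds (sequence number, data) pairs in arrival order.
    """
    expected = start_seq  # sequence number of the next byte we need
    received = bytearray()
    for seq, data in segments:
        end = seq + len(data)
        if end <= expected:
            continue  # complete duplicate: every byte was already received
        if seq > expected:
            continue  # gap before this segment: drop it and wait for a resend
        received += data[expected - seq:]  # keep only the bytes we don't have yet
        expected = end
    return bytes(received)

# The second copy of "world" is a retransmission of a segment that was only slow.
print(deliver([(1, b"hello"), (6, b"world"), (6, b"world")], start_seq=1))
# b'helloworld'
```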
Handling out-of-order packets
TCP connections can detect out-of-order packets by using the sequence and acknowledgement numbers.
When the recipient sees a higher sequence number than what they have acknowledged so far, they know that they are missing at least one packet in between. For example, suppose the recipient sees a sequence number of #73 when it expected a sequence number of #37. The recipient lets the sender know something is amiss by sending a packet with the acknowledgement number set to the expected sequence number, #37.
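The receiver's choice of acknowledgement number can be written as a small function. The function name is hypothetical. The 36-byte segment length is an assumption inferred from the gap between #37 and #73 in the example, not something the example states. If a segment does not cover the next expected byte, the acknowledgement number stays where it was.

```python
def ack_for_segment(expected: int, seq: int, length: int) -> int:
    """Return the acknowledgement number the receiver sends for one segment.

    `expected` is the sequence number of the next byte the receiver needs.
    """
    if seq <= expected < seq + length:
        return seq + length  # segment covers the next needed byte: move past it
    return expected          # gap before the segment, or nothing new: repeat `expected`

print(ack_for_segment(expected=37, seq=73, length=36))  # 37: bytes 37-72 are missing
print(ack_for_segment(expected=37, seq=37, length=36))  # 73: in order, acknowledged
```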
Sometimes the missing packet is simply taking a slower route through the Internet and it arrives soon after.
Other times, the missing packet may actually be a lost packet and the sender must retransmit the packet.
In both situations, the recipient has to deal with out-of-order packets. Fortunately, the recipient can use the sequence numbers to reassemble the packet data in the correct order.
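Reassembly works by holding early segments in a buffer keyed by sequence number and releasing them once the gap before them is filled. This minimal sketch uses a hypothetical class name and sample payloads. It also assumes segments always arrive with the same boundaries they were sent with; a real receiver must also handle retransmissions that split or merge the data differently.

```python
class Reassembler:
    """Receiver-side buffer that puts out-of-order segments back in order."""

    def __init__(self, start_seq: int):
        self.expected = start_seq  # sequence number of the next byte we need
        self.buffer = {}           # seq -> data for segments that arrived early
        self.delivered = bytearray()

    def receive(self, seq: int, data: bytes) -> int:
        """Store a segment and return the acknowledgement number to send back."""
        if seq >= self.expected:
            self.buffer[seq] = data  # a duplicate just overwrites the same entry
        # Older duplicates (seq < expected) are ignored.
        # Deliver every buffered segment that now lines up with `expected`.
        while self.expected in self.buffer:
            chunk = self.buffer.pop(self.expected)
            self.delivered += chunk
            self.expected += len(chunk)
        return self.expected

r = Reassembler(start_seq=1)
print(r.receive(1, b"a" * 36))   # 37
print(r.receive(73, b"c" * 36))  # 37: bytes 37-72 are still missing
print(r.receive(37, b"b" * 36))  # 109: gap filled, both buffered segments delivered
```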
Want to join the conversation?
- When handling out-of-order packets, how does sending the expected acknowledgement number indicate to the sender that something is amiss? How would the sender know it had to resend the packet if it was lost?(7 votes)
- Imagine you want to send the letters of the alphabet to a friend over the Internet.
You send ('a', 1), ('b', 2), ('c', 3), one by one to your friend. The numbers are used in case the packets/messages arrive out of order.
Now, suppose your friend gets ('b', 2), but then ('d', 4). Your friend can tell 'c' is missing because the numbers should increase one at a time, and 3 was skipped. So your friend asks you to resend the letter at position 3 (this is the idea behind the expected acknowledgement number).
As mentioned in the article, ('c', 3) may just be taking longer to arrive. In that case, your resend creates a duplicate message, but the receiver typically drops duplicates.
A helpful way to think about these numbers is that they synchronize the data so both parties have the same "view" of it.
Imagine if we didn't have a universal notion of time. Then the sender's view of time would be different from the receiver's. Hence, they synchronize their "view" of time by communicating numbers.
Hope this helps!(32 votes)
- What does the article mean "setting the ACK bit and increasing the acknowledgement number by the length of the data received"?(5 votes)
- Say you want to send a message that's 32 bytes long.
So you check if the receiver is there. The receiver answers with ACK #1. That means something like "hullo, I can hear you".
Now you send the message of 32 bytes and the receiver responds with ACK #33. That means "I got your message, and the next byte I expect from you is number 33". (The numbering starts at 1 rather than 0 because the SYN from the introduction uses up one sequence number.)
Now if you send another message that's 100 bytes long, the receiver would respond with ACK #133, meaning "I've got everything up to byte 132; send byte 133 next".
That way you can see if the connection works without the receiver sending you the entire message back and you having to read it.(11 votes)
- Hi. Following up on Carita's question: how does the sender know that a packet is missing if the recipient only responds with "Ack [expected packet number]"? Is an Ack for a missing packet somehow different from an Ack for a received packet, so that it triggers the sender to resend the missing packet?(3 votes)
- Good question, this is a central concern in protocol development: how to deal with ambiguity.
A sender also keeps their sequence number to synchronize, so if they receive an Ack[packet_number] not matching their current sequence number, they can distinguish between a missing packet and sending new packets.
An Ack for a missing packet is not any different from an Ack for a received packet. It is the fact that the sender has their own current sequence number, which in some sense is their own "time", that enables them to distinguish.
Hope this helps!(6 votes)
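The sender's side of this can be sketched too. The sender compares each ACK with its own next sequence number. An ACK below that number means some bytes are not yet acknowledged, either because they are still in transit or because they were lost. This sketch also counts repeated identical ACKs, since a recipient that keeps asking for the same byte is signalling a gap. The threshold of three duplicate ACKs mirrors TCP's fast retransmit rule (RFC 5681); the class name and messages are hypothetical. The timeout described in the article remains the fallback.

```python
class AckTracker:
    """Sender-side view of incoming ACKs (a simplified sketch)."""

    def __init__(self, next_seq: int, dup_threshold: int = 3):
        self.next_seq = next_seq  # sequence number of the next new byte to send
        self.dup_threshold = dup_threshold
        self.last_ack = None
        self.duplicates = 0       # how many times last_ack has been repeated

    def on_ack(self, ack: int) -> str:
        if ack > self.next_seq:
            raise ValueError("ACK covers bytes that were never sent")
        if ack == self.last_ack:
            self.duplicates += 1
        else:
            self.last_ack, self.duplicates = ack, 0
        if ack == self.next_seq:
            return "all sent data acknowledged"
        if self.duplicates >= self.dup_threshold:
            return f"resend from byte {ack}"
        return f"bytes {ack}-{self.next_seq - 1} not yet acknowledged"

tracker = AckTracker(next_seq=109)  # bytes 1-108 have been sent
for _ in range(4):
    print(tracker.on_ack(37))
# The first three lines report bytes 37-108 as unacknowledged;
# the fourth identical ACK (third duplicate) prints "resend from byte 37".
```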
- Hello,
I'm looking at the Transmission Control Protocol page (I'm trying to ask a question under that page, but all I'm seeing is the comments from the User Datagram Protocol page, so I'm not sure where this is actually going to post). It keeps mentioning a sequence number. What exactly is a sequence number, and how does it change as more data is sent? For example, in the last section, it has the numbers 1 and 37 for the sequence, and 73 and 37 for the acknowledgement. Where are these numbers coming from?(4 votes)
- I believe that these numbers represent different packets and the order they were sent in. For example, you send 3 text messages and they're flagged as a sequence of messages 1, 2, and 3 in the order they were sent. (In TCP specifically, the numbers count bytes rather than whole messages, so each one marks where a segment's data starts in the stream.)(3 votes)
- How can we tell whether we are using TCP or UDP?(2 votes)
- Wireshark is a free tool that enables you to inspect the Internet packets (UDP or TCP based) flowing in and out of your device. Here's a tutorial I used at some point to get started: (https://www2.cs.siu.edu/~cs441/lectures/Wireshark%20Tutorial.pdf)
Additionally, some well-known Internet activity defaults to a particular protocol. For instance, surfing the web generally uses TCP, whereas real-time applications such as video calls and online games often use UDP.
Hope this helps!(5 votes)
- Do the computers run TCP or UDP first? or do they happen at the same time?(1 vote)
- TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are two different protocols that run independently depending upon how a developer wishes to communicate network traffic.
For example, if you were developing an online video game where quick traffic and fast response were essential, you might choose UDP, which is unreliable and connectionless but very fast. If, however, you depended more on reliability than on speed, you would opt for TCP.
TCP and UDP are both transport layer protocols built upon the Internet Protocol (IP) which sits at the network layer. You make a choice between the two depending upon your needs, rather than them happening concurrently within one packet or one happening "first or second."(3 votes)
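In a program, the choice between TCP and UDP is made once, when a socket is created. That also answers how to tell which one is in use. With the AF_INET address family, Python's standard socket module gives TCP for SOCK_STREAM and UDP for SOCK_DGRAM. This is a minimal sketch; the helper name transport_of is hypothetical.

```python
import socket

def transport_of(sock: socket.socket) -> str:
    """Report which transport protocol an AF_INET socket uses, based on its type."""
    if sock.type == socket.SOCK_STREAM:
        return "TCP"
    if sock.type == socket.SOCK_DGRAM:
        return "UDP"
    return "other"

# The protocol is fixed when each socket is created, not chosen per packet.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock, \
     socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_sock:
    print(transport_of(tcp_sock), transport_of(udp_sock))  # TCP UDP
```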
- What is meant by the term "offset" mentioned in the TCP segment illustration?(2 votes)
- What is meant by the term "window size" mentioned in the TCP segment in the illustrations of the above article?(2 votes)
- What is meant by the term "padding" in the TCP segment under the IP data in the illustrations of the above article?(2 votes)
- Why use the Transmission Control Protocol when it can lead to bigger problems than the ones it is meant to solve?(1 vote)
- TCP gives a reliable network connection, ensuring that all packets arrive (if possible) and are assembled in the correct order. Generally, these benefits outweigh its extra network usage which is why TCP is usually used instead of UDP or just IP.(2 votes)