Hacking 4 Ever: The TCP Level

Two separate protocols are involved in handling TCP/IP datagrams. TCP (the
"transmission control protocol") is responsible for breaking up the message into
datagrams, reassembling them at the other end, resending anything that gets lost, and
putting things back in the right order. IP (the "internet protocol") is responsible for
routing individual datagrams. It may seem as though TCP is doing all the work, and
in small networks that is true. However, in the Internet, simply getting a datagram to its
destination can be a complex job. A connection may require the datagram to go through
several networks at Rutgers, a serial line to the John von Neumann Supercomputer Center,
a couple of Ethernets there, a series of 56Kbaud phone lines to another NSFnet site, and
more Ethernets on another campus. Keeping track of the routes to all of the destinations
and handling incompatibilities among different transport media is a substantial task in
its own right.
Note that the interface between TCP and IP is fairly simple. TCP simply hands IP a
datagram with a destination. IP doesn't know how this datagram relates to any datagram
before it or after it. It may have occurred to you that something is missing here. We
have talked about Internet addresses, but not about how you keep track of multiple
connections to a given system. Clearly it isn't enough to get a datagram to the right
destination. TCP has to know which connection this datagram is part of.
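A minimal sketch of that bookkeeping in Python, assuming a connection is looked up by the remote end's address and port plus the local port number (port numbers are explained later in this section; a real TCP also takes the local address into account). The names `Connection`, `register`, and `demultiplex` are hypothetical, and the addresses come from the 192.0.2.0/24 documentation range.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical per-connection state; a real TCP keeps far more."""
    name: str
    received: list[bytes] = field(default_factory=list)


# Demultiplexing table: (remote address, remote port, local port) -> connection.
connections: dict[tuple[str, int, int], Connection] = {}


def register(remote_addr: str, remote_port: int, local_port: int, conn: Connection) -> None:
    connections[(remote_addr, remote_port, local_port)] = conn


def demultiplex(src_addr: str, src_port: int, dst_port: int, payload: bytes) -> Connection | None:
    """Hand an incoming datagram to the connection it belongs to, if any.

    For incoming data the sender's address and port identify the remote end,
    and the destination port is our local port.
    """
    conn = connections.get((src_addr, src_port, dst_port))
    if conn is not None:
        conn.received.append(payload)
    return conn


a = Connection("transfer A")
b = Connection("transfer B")
register("192.0.2.7", 2000, 1000, a)
register("192.0.2.7", 2001, 1001, b)
print(demultiplex("192.0.2.7", 2001, 1001, b"chunk").name)  # transfer B
```

Two transfers from the same remote machine stay separate because their port numbers differ.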
This task is referred to as "demultiplexing." In fact, there are several levels of
demultiplexing going on in TCP/IP. The information needed to do this demultiplexing is
contained in a series of "headers". A header is simply a few extra octets tacked onto the
beginning of a datagram by some protocol in order to keep track of it. It's a lot like
putting a letter into an envelope and putting an address on the outside of the envelope,
except that with modern networks it happens several times. It's as if you put the letter
into a little envelope, your secretary puts that into a somewhat bigger envelope, the
campus mail center puts that envelope into a still bigger one, and so on.
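The envelope analogy can be sketched in a few lines of Python. This is only an illustration: the bracketed labels are hypothetical stand-ins for real binary headers, and the choice of Ethernet as the outermost layer is an assumption. A real receiver works out where each header ends from length information in the header itself rather than by matching a label.

```python
from __future__ import annotations

# Hypothetical stand-ins for the headers added by each layer, innermost first.
LAYERS = [b"[TCP]", b"[IP]", b"[ETH]"]


def encapsulate(data: bytes, headers: list[bytes]) -> bytes:
    """Each layer tacks its own header onto the front of what it was handed."""
    for header in headers:
        data = header + data
    return data


def decapsulate(frame: bytes, headers: list[bytes]) -> bytes:
    """The receiver opens the envelopes in reverse order, outermost first."""
    for header in reversed(headers):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"letter", LAYERS)
print(frame)                       # b'[ETH][IP][TCP]letter'
print(decapsulate(frame, LAYERS))  # b'letter'
```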
Here is an overview of the headers that get stuck on a message that passes through a
typical TCP/IP network:
We start with a single data stream, say a file you are trying to send to some other
computer.
TCP breaks it up into manageable chunks. (In order to do this, TCP has to know how
large a datagram your network can handle. Actually, the TCPs at each end say how big a
datagram they can handle, and then they pick the smaller of the two sizes.)
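Here is a sketch of that chunking step, assuming each end has already announced the largest chunk it can handle (in real TCP this is the maximum segment size option exchanged when the connection opens). The function name `segment` and the sizes used are illustrative.

```python
from __future__ import annotations


def segment(stream: bytes, our_max: int, their_max: int) -> list[bytes]:
    """Split a byte stream into chunks no larger than either end can handle."""
    size = min(our_max, their_max)  # pick the smaller of the two sizes
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [stream[i:i + size] for i in range(0, len(stream), size)]


chunks = segment(b"x" * 1200, our_max=536, their_max=500)
print([len(c) for c in chunks])  # [500, 500, 200]
```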
TCP puts a header at the front of each datagram. This header actually contains at least 20
octets, but the most important fields are a source and destination "port number" and a
"sequence number". The port numbers are used to keep track of different conversations.
Suppose 3 different people are
transferring files. Your TCP might allocate port numbers 1000, 1001, and 1002 to these
transfers. When you are sending a datagram, this becomes the "source" port number,
since you are the source of the datagram. Of course the TCP at the other end has assigned
a port number of its own for the conversation. Your TCP has to know the port number
used by the other end as well. (It finds out when the connection starts, as we will explain
below.) It puts this in the "destination" port field. Of course if the other end sends a
datagram back to you, the source and destination port numbers will be reversed, since
then it will be the source and you will be the destination.
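To make the header concrete, here is a sketch that packs the fixed 20-octet part of a TCP header with Python's `struct` module, following the field layout in the TCP specification (RFC 793): source port, destination port, sequence number, acknowledgement number, data offset and control flags, window, checksum, and urgent pointer. Options are omitted, the remote port 2000 is made up, and `pack_header` and `reply_ports` are hypothetical helpers. `reply_ports` shows the reversal described above: whatever port the other end used as its source becomes our destination.

```python
from __future__ import annotations

import struct

# Fixed part of a TCP header in network byte order: 2+2+4+4+1+1+2+2+2 = 20 octets.
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def pack_header(src_port: int, dst_port: int, seq: int, ack: int = 0,
                flags: int = 0, window: int = 0, checksum: int = 0,
                urgent: int = 0) -> bytes:
    data_offset = 5  # header length in 32-bit words (5 * 4 = 20 octets, no options)
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           data_offset << 4,  # offset in the high 4 bits; reserved bits zero
                           flags,             # control bits (URG, ACK, PSH, RST, SYN, FIN)
                           window, checksum, urgent)


def reply_ports(received_header: bytes) -> tuple[int, int]:
    """Return (source, destination) ports to use when answering this header."""
    their_port, our_port = struct.unpack_from("!HH", received_header)
    return our_port, their_port


outgoing = pack_header(src_port=1000, dst_port=2000, seq=0)
print(len(outgoing))           # 20
incoming = pack_header(src_port=2000, dst_port=1000, seq=0)
print(reply_ports(incoming))   # (1000, 2000)
```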
Each datagram has a sequence number. This is used so that the other end can make sure
that it gets the datagrams in the right order, and that it hasn't missed any. (See the TCP
specification for details.) TCP numbers octets, not datagrams. So if there are 500 octets
of data in each datagram, the first datagram might be numbered 0, the second 500, the next
1000, the next 1500, etc.
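The octet numbering, and how the receiver uses it, can be sketched as follows. The helpers are hypothetical. The sketch starts at 0 to match the example above, whereas a real connection starts from an initial sequence number chosen when the connection is set up; `reassemble` also ignores wraparound of the 32-bit sequence field and assumes no duplicate datagrams.

```python
from __future__ import annotations


def sequence_numbers(lengths: list[int], initial_seq: int = 0) -> list[int]:
    """Number each datagram by the position of its first data octet."""
    seqs = []
    seq = initial_seq
    for length in lengths:
        seqs.append(seq)
        seq = (seq + length) % 2**32  # the sequence field is 32 bits wide
    return seqs


def reassemble(segments: list[tuple[int, bytes]], initial_seq: int = 0) -> bytes:
    """Put (sequence number, data) pairs back in order, failing on a gap."""
    expected = initial_seq
    out = bytearray()
    for seq, data in sorted(segments):
        if seq != expected:
            raise ValueError(f"missing data starting at octet {expected}")
        out += data
        expected += len(data)
    return bytes(out)


print(sequence_numbers([500, 500, 500, 500]))  # [0, 500, 1000, 1500]
```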
Finally, I will mention the Checksum. This is a number that is computed by adding up all
the octets in the datagram (more or less; see the TCP spec). The result is put in the
header. TCP at the other end computes the checksum again. If the two values disagree, the
datagram was damaged in transmission, and it is thrown away.
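The "more or less" hides a specific rule: TCP uses the Internet checksum, the 16-bit one's complement of the one's complement sum of the data taken 16 bits at a time (RFC 1071 describes how to compute it). Here is a sketch of that arithmetic. The real TCP checksum also covers a pseudo-header containing the IP addresses; this sketch checksums only the bytes it is given, and `internet_checksum` is a hypothetical name. The sample bytes are the worked example from RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


sent = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(sent)
print(hex(checksum))  # 0x220d

# The receiver recomputes the checksum; a mismatch means the data was damaged.
damaged = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF6])
print(internet_checksum(damaged) == checksum)  # False
```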
The window is used to control how much data can be in transit at any one time. The
receiver confirms that data has arrived by sending back an acknowledgement, but it is not
practical to wait for each datagram to be acknowledged before sending the next one. That
would slow things down too much. On the other hand, you can't just keep sending, or a fast
computer might overrun the capacity of a slow one to absorb data. Thus each end indicates
how much new data it is currently prepared to absorb by putting the number of octets in
its "Window" field. As the receiver takes in data, the amount of space left in its window
decreases. When it goes to zero, the sender has to stop. As the receiver processes the
data, it increases its window, indicating that it is ready to accept more data. Often the
same datagram can be used to acknowledge receipt of a set of data and to give permission
for additional new data (by an updated window).
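A toy simulation of this flow control, with hypothetical names. To keep it short, the sender reads the receiver's window directly, whereas real TCP learns it from the Window field of the datagrams the receiver sends back; refinements such as avoiding very small window updates are left out.

```python
class Receiver:
    """Hypothetical receiver with a fixed-size buffer."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.buffered = 0  # octets received but not yet processed

    @property
    def window(self) -> int:
        """Octets of new data the receiver is currently prepared to absorb."""
        return self.buffer_size - self.buffered

    def receive(self, n: int) -> None:
        if n > self.window:
            raise ValueError("sender overran the advertised window")
        self.buffered += n  # the window shrinks as data arrives

    def process(self, n: int) -> None:
        self.buffered -= min(n, self.buffered)  # the window opens up again


def send_what_fits(pending: int, receiver: Receiver, chunk: int) -> int:
    """Send chunks until the data runs out or the window reaches zero."""
    sent = 0
    while pending > 0 and receiver.window > 0:
        n = min(chunk, pending, receiver.window)
        receiver.receive(n)
        pending -= n
        sent += n
    return sent


r = Receiver(buffer_size=1000)
print(send_what_fits(3000, r, chunk=500), r.window)  # 1000 0
r.process(600)
print(r.window)                                      # 600
print(send_what_fits(2000, r, chunk=500), r.window)  # 600 0
```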
The "Urgent" field allows one end to tell the other to skip ahead in its processing to a
particular octet. This is often useful for handling asynchronous events, for example when
you type a control character or other command that interrupts output. The other fields are
beyond the scope of this document.
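A rough sketch of the receiving side, assuming (hypothetically) that the application simply discards buffered data that comes before the urgent point. Following RFC 793, the urgent pointer is treated as an offset from the sequence number of the datagram that carries it; RFC 1122 later disagreed by one octet about exactly where it points, so the precise boundary here is an assumption. Sequence-number wraparound is ignored, and `skip_to_urgent` is a hypothetical helper.

```python
def skip_to_urgent(buffer_start_seq: int, buffer: bytes,
                   seg_seq: int, urgent_pointer: int) -> bytes:
    """Drop buffered octets that precede the octet the urgent pointer marks."""
    urgent_seq = seg_seq + urgent_pointer  # offset from the datagram's sequence number
    skip = urgent_seq - buffer_start_seq   # how far into the buffer that octet lies
    skip = max(0, min(len(buffer), skip))  # clamp to the data actually buffered
    return buffer[skip:]


# Ten octets buffered, starting at sequence number 100; a datagram with sequence
# number 105 carries an urgent pointer of 3, marking octet 108.
print(skip_to_urgent(100, b"0123456789", seg_seq=105, urgent_pointer=3))  # b'89'
```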
