Hacking 4 Ever: Well-Known Sockets And The Applications Layer

So far, we have described how a stream of data is broken up into datagrams, sent to another computer, and put back together. However, something more is needed in order to accomplish anything useful. There has to be a way for you to open a connection to a specified computer, log into it, tell it what file you want, and control the transmission of the file. (If you have a different application in mind, e.g. computer mail, some analogous protocol is needed.) This is done by "application protocols".
The application protocols run "on top" of TCP/IP. That is, when they want to send a message, they give the message to TCP, and TCP makes sure it gets delivered to the other end. Because TCP and IP take care of all the networking details, the application protocols can treat a network connection as if it were a simple byte stream, like a terminal or phone line. Before going into more detail about application programs, we have to describe how you find an application.
Suppose you want to send a file to a computer whose Internet address is 128.6.4.7. To start the process, you need more than just the Internet address: you have to connect to the FTP server at the other end. In general, network programs are specialized for a specific set of tasks. Most systems have separate programs to handle file transfers, remote terminal logins, mail, etc. When you connect to 128.6.4.7, you have to specify that you want to talk to the FTP server. This is done by having "well-known sockets" for each server. Recall that TCP uses port numbers to keep track of individual conversations. User programs normally use more or less random port numbers. However, specific port numbers are assigned to the programs that sit waiting for requests.
For example, if you want to send a file, you will start a program called "ftp". It will open a connection using some random number, say 1234, for the port number on its end. However, it will specify port number 21 for the other end. This is the official port number for the FTP server. Note that there are two different programs involved. You run ftp on your side. This is a program designed to accept commands from your terminal and pass them on to the other end. The program that you talk to on the other machine is the FTP server. It is designed to accept commands from the network connection, rather than an interactive terminal. There is no need for your program to use a well-known socket number for itself, since nobody is trying to find it. However, the servers have to have well-known numbers, so that people can open connections to them and start sending them commands. The official port numbers for each program are given in "Assigned Numbers".
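To make the idea concrete, here is a minimal Python sketch of how a client might work out which remote port to use. The dictionary `WELL_KNOWN_TCP_PORTS` and the function `server_endpoint` are hypothetical names for illustration, and the table is only a three-entry excerpt of the assignments (FTP 21, Telnet 23, SMTP 25). A real program would more likely consult the system's services database (for example through Python's `socket.getservbyname`) or the full IANA registry.

```python
from __future__ import annotations

# Hypothetical three-entry excerpt of the well-known TCP port assignments.
WELL_KNOWN_TCP_PORTS = {
    "ftp": 21,     # FTP server (command connection)
    "telnet": 23,  # remote terminal login
    "smtp": 25,    # mail transfer
}


def server_endpoint(address: str, service: str) -> tuple[str, int]:
    """Return the (Internet address, TCP port) a client should connect to.

    The client only needs to know the host and the service it wants;
    the port comes from the well-known assignment for that service.
    """
    if service not in WELL_KNOWN_TCP_PORTS:
        raise ValueError(f"no well-known port recorded for {service!r}")
    return (address, WELL_KNOWN_TCP_PORTS[service])


print(server_endpoint("128.6.4.7", "ftp"))  # ('128.6.4.7', 21)
```

The client's own port does not appear in the table at all; as the text explains, nobody needs to find it, so it can be left to the network software to choose.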
Note that a connection is actually described by a set of 4 numbers: the Internet address at each end, and the TCP port number at each end. Every datagram has all four of those numbers in it. (The Internet addresses are in the IP header, and the TCP port numbers are in the TCP header.) In order to keep things straight, no two connections can have the same set of numbers. However, it is enough for any one number to be different. For example, it is perfectly possible for two different users on a machine to be sending files to the same other machine. This could result in two connections whose sets of four numbers differ in only one place.
Since the same machines are involved, the Internet addresses are the same. Since both users are doing file transfers, one end of each connection uses the well-known port number for FTP. The only thing that differs is the port number for the program that each user is running. That's enough of a difference. Generally, at least one end of the connection asks the network software to assign it a port number that is guaranteed to be unique. Normally, it's the user's end, since the server has to use a well-known number.
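The following Python sketch uses the standard `socket` module to reproduce the two-user scenario on a single machine over the loopback interface. As an assumption made so the demo runs without special privileges, the stand-in "server" binds to port 0, which asks the operating system for any free port, instead of FTP's port 21 (binding to ports below 1024 normally requires administrator rights). The helper `four_numbers` is a hypothetical name. Two clients connect without choosing a local port, and the printed four-number sets should differ only in the client-side port.

```python
import socket


def four_numbers(sock: socket.socket) -> tuple:
    """Return (local address, local port, remote address, remote port)."""
    local_ip, local_port = sock.getsockname()
    remote_ip, remote_port = sock.getpeername()
    return (local_ip, local_port, remote_ip, remote_port)


# Stand-in server: port 0 means "let the OS pick a free port" (a real
# FTP server would listen on its well-known port, 21).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_address = server.getsockname()

# Two "users" connect to the same server. Neither calls bind(), so the
# network software assigns each a unique local port during connect().
clients = [socket.create_connection(server_address) for _ in range(2)]
connections = [four_numbers(c) for c in clients]
for numbers in connections:
    print(numbers)

for c in clients:
    c.close()
server.close()
```

The uniqueness guarantee comes from the operating system, not from the program, which is why user programs can get away with "more or less random" port numbers.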
Now that we know how to open connections, let's get back to the application programs. As mentioned earlier, once TCP has opened a connection, we have something that might as well be a simple wire. All the hard parts are handled by TCP and IP. However, we still need some agreement as to what we send over this connection. In effect, this is simply an agreement on what set of commands the application will understand, and the format in which they are to be sent. Generally, what is sent is a combination of commands and data, and the two ends use context to tell them apart.
For example, the mail protocol works like this: your mail program opens a connection to the mail server at the other end. Your program gives it your machine's name, the sender of the message, and the recipients you want it sent to. It then sends a command saying that it is starting the message. At that point, the other end stops treating what it sees as commands and starts accepting the message. Your end then sends the text of the message. At the end of the message, a special mark is sent (a line consisting of a single dot). After that, both ends understand that your program is again sending commands. This is the simplest way to do things, and the one that most applications use.
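Here is a minimal Python sketch of what the client side of that mail exchange sends, using the command names from the SMTP specification (HELO, MAIL FROM, RCPT TO, DATA, QUIT). It is a deliberate simplification: a real client must read and check the server's numbered reply after each command, which this sketch omits, and the function name `smtp_client_lines` is hypothetical. It also shows a detail the text glosses over: if a line of the message itself begins with a dot, the sender doubles that dot ("dot-stuffing") so the receiver cannot mistake it for the end-of-message mark, and the receiver removes the extra dot again.

```python
from __future__ import annotations


def smtp_client_lines(helo_name: str, sender: str,
                      recipients: list[str], body: str) -> list[bytes]:
    """Return the byte strings a mail client sends, in order.

    Replies from the server are deliberately ignored in this sketch.
    """
    out = [f"HELO {helo_name}\r\n", f"MAIL FROM:<{sender}>\r\n"]
    out += [f"RCPT TO:<{rcpt}>\r\n" for rcpt in recipients]
    out.append("DATA\r\n")  # from here on the server treats lines as message text

    for line in body.splitlines():
        # Dot-stuffing: a leading "." is doubled so it cannot end the message early.
        if line.startswith("."):
            line = "." + line
        out.append(line + "\r\n")

    out.append(".\r\n")     # a line holding a single dot ends the message
    out.append("QUIT\r\n")  # back in command mode
    return [s.encode("ascii") for s in out]


for chunk in smtp_client_lines("client.example", "alice@example.com",
                               ["bob@example.org"], "Hi Bob,\n.signature\n"):
    print(chunk)
```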
File transfer is somewhat more complex. The file transfer protocol involves two different connections. It starts out just like mail. The user's program sends commands like "log me in as this user", "here is my password", "send me the file with this name". However, once the command to send data is sent, a second connection is opened for the data itself. It would certainly be possible to send the data on the same connection, as mail does. However, file transfers often take a long time. The designers of the file transfer protocol wanted to allow the user to continue issuing commands while the transfer is going on. For example, the user might make an inquiry, or abort the transfer. Thus the designers felt it was best to use a separate connection for the data and leave the original command connection for commands. (It is also possible to open command connections to two different computers and tell them to send a file from one to the other. In that case, the data couldn't go over the command connection.)
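The text does not say how the two ends agree on where the second (data) connection goes. In FTP as specified in RFC 959, one way is passive mode: the client sends the PASV command, and the server answers with a 227 reply that encodes an address and port as six comma-separated numbers, h1,h2,h3,h4,p1,p2, where the port is p1 * 256 + p2. The Python sketch below (the function name `parse_pasv_reply` is hypothetical) decodes such a reply; the client would then open the data connection to that address while the command connection stays open for further commands.

```python
from __future__ import annotations

import re

# Six comma-separated decimal numbers, e.g. "128,6,4,7,4,210".
_PASV_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Return the (address, port) for the data connection from a 227 reply."""
    match = _PASV_NUMBERS.search(reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(n) for n in match.groups())
    # The port is sent as two bytes, high-order byte first.
    return (f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2)


print(parse_pasv_reply("227 Entering Passive Mode (128,6,4,7,4,210)."))
# ('128.6.4.7', 1234)
```

The other mechanism in RFC 959, the PORT command, reverses the roles: the client names an address and port, and the server connects to it.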
Remote terminal connections use yet another mechanism. For remote logins, there is just one connection, and it normally carries data. When it is necessary to send a command (e.g. to set the terminal type or to change some mode), a special character is used to indicate that the next character is a command. If the user happens to type that special character as data, two of them are sent.
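In Telnet (RFC 854), that special character is the byte 255, called IAC ("Interpret As Command"). The Python sketch below, whose helper names `escape_data` and `negotiate` are hypothetical, shows both halves of the rule: data bytes equal to 255 are doubled, and a command is IAC followed by a command code. The negotiation values used here (DO = 253, and option 24 for terminal type from RFC 1091) come from the Telnet specifications; the helpers themselves are only illustrative.

```python
IAC = 255            # "Interpret As Command": marks the next byte as a command
DO = 253             # negotiation command: ask the other side to enable an option
TERMINAL_TYPE = 24   # Telnet option number for terminal type (RFC 1091)


def escape_data(data: bytes) -> bytes:
    """Double every IAC byte so the receiver treats it as ordinary data."""
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))


def negotiate(command: int, option: int) -> bytes:
    """Build a three-byte option negotiation: IAC <command> <option>."""
    return bytes([IAC, command, option])


# Ordinary typed text, a negotiation request, then a literal 255 data byte.
stream = escape_data(b"ls\r\n") + negotiate(DO, TERMINAL_TYPE) + escape_data(b"\xff")
print(stream)  # b'ls\r\n\xff\xfd\x18\xff\xff'
```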
We are not going to describe the application protocols in detail in this document. It's better to read the RFCs yourself. However, there are a couple of common conventions used by applications that will be described here. First, the common network representation: TCP/IP is intended to be usable on any computer. Unfortunately, not all computers agree on how data is represented. There are differences in character codes (ASCII vs. EBCDIC), in end-of-line conventions (carriage return, line feed, or a representation using counts), and in whether terminals expect characters to be sent individually or a line at a time. In order to allow computers of different kinds to communicate, each application protocol defines a standard representation.
Note that TCP and IP do not care about the representation. TCP simply sends octets. However, the programs at both ends have to agree on how the octets are to be interpreted. The RFC for each application specifies the standard representation for that application. Normally it is "net ASCII". This uses ASCII characters, with end of line denoted by a carriage return followed by a line feed. For remote login, there is also a definition of a "standard terminal", which turns out to be a half-duplex terminal with echoing happening on the local machine. Most applications also make provisions for the two computers to agree on other representations that they may find more convenient. For example, PDP-10s have 36-bit words, and there is a way for two PDP-10s to agree to send a 36-bit binary file. Similarly, two systems that prefer full-duplex terminal conversations can agree on that. However, each application has a standard representation, which every machine must support.
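A minimal Python sketch of converting between a machine's local text and net ASCII follows; the function names `to_net_ascii` and `from_net_ascii` are hypothetical. To illustrate the character-code problem, it uses Python's built-in `cp037` codec, one of the EBCDIC code pages, as a stand-in for an EBCDIC machine's local representation. It does not attempt the count-based line formats mentioned above.

```python
def to_net_ascii(text: str) -> bytes:
    """Convert local text to net ASCII: ASCII characters, CR LF line ends."""
    # Normalise CR LF, lone CR and lone LF to a single LF first, then emit CR LF.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "\r\n").encode("ascii")  # non-ASCII raises


def from_net_ascii(data: bytes) -> str:
    """Convert net ASCII back to text that ends each line with a bare LF."""
    return data.decode("ascii").replace("\r\n", "\n")


# An EBCDIC host: its local bytes are decoded with the EBCDIC codec first,
# so the bytes that go on the wire match what an ASCII host would send.
ebcdic_local = "HELLO\nWORLD\n".encode("cp037")
print(ebcdic_local[:1])                            # b'\xc8' (EBCDIC "H")
print(to_net_ascii(ebcdic_local.decode("cp037")))  # b'HELLO\r\nWORLD\r\n'
```

Both ends only ever need to translate between their own local form and the one standard form, rather than knowing every other machine's conventions.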
Keep in mind that it has become common practice for some corporations to change a service's port number on the server side. If your client software is not configured with the same port number, the connection will not succeed. We will discuss later in this text how you can perform port scanning against a host's IP address to see which ports are active.
