Virtual OS/2 International Consumer Education
VOICE Home Page: http://www.os2voice.org
March 2002



Understanding TCP/IP port numbers

By Peter Moylan © March 2002

Originally I wrote this article in the context of a specific problem: is it possible to make an FTP server work when it is behind a firewall? As a result, you'll find many references to FTP here. This should, however, also be of interest to people who have no particular interest in FTP, but who want a better understanding of how networking works.

Some comments on the jargon

If you're doing any sort of networking at all, you are probably running TCP/IP. This is not the only possible networking protocol. For sharing files around a LAN, for example, it is more common to use a protocol like NetBIOS rather than TCP/IP. In general, however, TCP/IP is probably the most popular way to do networking. If you want the well-known client-server applications like FTP, HTTP, SMTP, and so on, then TCP/IP is compulsory.

The Internet Protocol (IP) provides the basic transport of packets from one node to another. Each IP packet carries a header that contains, among other things, a source and destination address. These addresses are known as IP addresses.

Transmission Control Protocol (TCP) is a layer that sits on top of the IP layer. It adds some higher-level functions, but it depends on IP for the actual transfer of command and data packets. You can find out more about TCP and TCP/IP applications by typing the command 'TCPhelp' at an OS/2 command prompt.

IP addresses

Each node in a network is identified by an IP address, which is a 32-bit number. (The IP version 6 protocol turns this into a 128-bit number, but version 6 is not yet supported by the OS/2 version of TCP/IP.) In the simplest case, a computer has just one IP address, but it is possible - and becoming more common - for a computer to have more than one network interface, and therefore more than one IP address.
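To make the "32-bit number" point concrete, here is a small Python sketch that converts between the familiar dotted form and the underlying integer. The function names are my own, chosen for illustration; the conversion itself uses Python's standard ipaddress module.

```python
import ipaddress

def dotted_to_int(text: str) -> int:
    """Return the 32-bit integer behind a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(text))

def int_to_dotted(value: int) -> str:
    """Return the dotted form of a 32-bit IPv4 address."""
    return str(ipaddress.IPv4Address(value))

# Each of the four dotted fields is one byte of the 32-bit number.
print(dotted_to_int("127.0.0.1"))    # 2130706433
print(int_to_dotted(2130706433))     # 127.0.0.1
```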

The IP addresses of the form

      10.x.x.x                      (Class A)
      172.16.x.x to 172.31.x.x      (Class B)
      192.168.x.x                   (Class C)

are special, in that they are private to the local area network (LAN) in which they are used. For most LANs the Class C addresses would be the most appropriate, but the Class A and Class B ranges are available for very large LANs. The advantage of these addresses is that you can assign them to nodes in a LAN without having to go through the official process of having an IP address allocated to you. The disadvantage is that these addresses are invisible outside the LAN; that is, these nodes cannot communicate directly with the global Internet. The way around this disadvantage is to use a firewall that implements Network Address Translation (NAT). NAT will be discussed in a later section.
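As an illustration, the following Python sketch tests whether an address falls in one of the private ranges. It assumes the three ranges reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16); the function name is hypothetical.

```python
import ipaddress

# The three private ranges reserved by RFC 1918.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # Class A range
    ipaddress.ip_network("172.16.0.0/12"),    # Class B range
    ipaddress.ip_network("192.168.0.0/16"),   # Class C range
]

def is_lan_private(text: str) -> bool:
    """Return True if the address lies in one of the private ranges."""
    addr = ipaddress.ip_address(text)
    return any(addr in net for net in PRIVATE_NETWORKS)

print(is_lan_private("192.168.1.10"))   # True
print(is_lan_private("172.32.0.1"))     # False
```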

At any given time, a network node might have several connections in progress with various machines around the network. This means that we need a way of labelling the different connections, so that we can tell them apart. The method used is to assign port numbers to (our end of) the connections. The port number has nothing to do with hardware I/O ports. Instead, it is simply a numbering system to label the connections. A port number is an unsigned 16-bit number. One end of a connection is uniquely identified by the pair (IP address, port number). A connection is defined by two of these pairs, one for each end.

The software mechanism for setting this up is known as a "socket". You can think of the socket as a data structure that keeps track of the two IP addresses and two port numbers involved in the transfer, together with whatever other state information is needed (data buffers, byte counters, etc.).

When a socket is first created, we know only the IP address and port number at our own end. Once a connection is established, we get to find out the IP address and port number at the other end. Naturally, one of the two ends has to be the one to initiate the connection. The usual mechanism is that the server goes into a "listening" state where it waits for client connections, and then the client end actually establishes the connection.
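The listen-then-connect sequence can be tried out on a single machine with the loopback address. The Python sketch below is only an illustration: it plays both the server and the client in one program, and it binds to port 0, which asks the operating system to pick any free port. Once the connection exists, each end can report both (IP address, port number) pairs.

```python
import socket

# Server end: create a socket, bind it to our own address, and listen.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS choose a free port
server.listen(1)
server_addr = server.getsockname()   # the pair we know before any connection

# Client end: initiate the connection to the listening port.
client = socket.create_connection(server_addr)
conn, peer = server.accept()         # the server now learns the client's pair

local_end = client.getsockname()     # (IP address, port) at the client end
remote_end = client.getpeername()    # (IP address, port) at the server end
print("client end:", local_end)
print("server end:", remote_end)

for s in (client, conn, server):
    s.close()
```

The pair printed for the client end is the same pair the server receives from accept(), which is how the server finds out who has connected.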

Port numbers in the range 0 to 49151 are reserved for "server" ports. More precisely, those in the range 0 to 1023 are defined by official standards; those in the range 1024 to 49151 have a less official status, but they are still considered to be "registered ports" which are allocated to known applications. A list of these reserved port numbers can be found in the file MPTN\ETC\SERVICES.

Port numbers in the range 49152 to 65535 form a pool of "available" ports which are used whenever a new port is needed. Typically one of these ports is allocated for a short-term connection, and then deallocated once the operation is done.
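The three ranges can be summarised in a few lines of Python. This is just a restatement of the numbers given above; the function name and the labels it returns are my own.

```python
def port_class(port: int) -> str:
    """Classify a 16-bit port number using the ranges described above."""
    if not 0 <= port <= 65535:
        raise ValueError("a port number must fit in an unsigned 16-bit field")
    if port <= 1023:
        return "well known"      # defined by official standards
    if port <= 49151:
        return "registered"      # allocated to known applications
    return "available"           # pool used for short-term connections

print(port_class(21))       # well known
print(port_class(8080))     # registered
print(port_class(52165))    # available
```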

In a client/server protocol, the clients need to have some way of finding the servers. For this reason, servers always listen on "well known ports" which are reserved for this purpose. The FTP protocol uses two connection channels, one for commands and one for data. Consequently it has two well known ports: port 21 for the commands, and port 20 for the data. This, of course, is at the server end. At the client end, the client can use whatever ports it wants, and normally the client will choose its ports from the pool of available ports.

The FTP protocol allows for two kinds of data connection. In so-called "passive FTP" the data transfer is initiated from the client end. In non-passive FTP, also known as "port FTP", the transfer is initiated from the server end. In both cases the command connection uses port 21 at the server end. The difference between the two methods lies in the way the data ports are allocated.

Passive FTP

Passive FTP is initiated when the client issues a PASV command. (The effect of this lasts for just one file transfer. If there is a second file to be transferred, the client has to issue a second PASV command.) The server responds to this command with a line like
      227 Entering passive mode (127,0,0,1,203,197)

The first four of those numbers specify the IP address of the server. The remaining two specify a port number: the fifth number is the high-order byte and the sixth is the low-order byte, so the port in this example is 203 x 256 + 197 = 52165. In effect the PASV command is saying "please choose a data port, and tell me what port it is". The server chooses the port, normally from the big pool of available port numbers, and reports its number back to the client. The server then listens at that port for the client to initiate a data connection. Of course the client must also choose a data port at its own end; the server finds out which port it is after the data connection is established.
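Here is a minimal Python sketch of how a client might decode such a reply. The function name is hypothetical, and the sketch assumes the six numbers appear inside parentheses as in the example above. The last two numbers are the high and low bytes of the 16-bit port number.

```python
import re
from typing import Tuple

def parse_pasv_reply(line: str) -> Tuple[str, int]:
    """Extract (IP address, port) from a '227 Entering passive mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", line)
    if match is None:
        raise ValueError("no address found in reply: " + line)
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each field must fit in one byte")
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]   # high byte, then low byte
    return host, port

print(parse_pasv_reply("227 Entering passive mode (127,0,0,1,203,197)"))
# ('127.0.0.1', 52165)
```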

In passive FTP, the command channel and the data channel behave in the same way. The server listens at a known port. The client knows what the port number is, so it can initiate a connection to that port.

Port (non-passive) FTP

The earliest drafts of the FTP standards did not have any passive FTP. The "normal" way to transfer data, and even now the most common way, is to have the data connection established from the server end. (Exception: most web browsers, as distinct from standalone FTP clients, seem to use passive FTP exclusively. The standalone FTP clients usually give the user the choice.) For port FTP to work, the server has to know which port the client is listening on.

This is specified with a PORT command from the client. An example of this command is

       PORT 127,0,0,1,203,201

This specifies the IP address of the client, and the port number that the client wishes to use for the data transfer. That is, the client chooses a port number, it listens on that port, and the server makes a connection to that port. In both this and the PASV example, the IP address was 127.0.0.1. That's because I used the loopback connection on my computer to generate the examples. In practice, the IP address will be whatever address belongs to your own machine.
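Going the other way, a client has to encode its own address and chosen port into the six comma-separated numbers. The following Python sketch shows one way this could be done; the function name is my own invention.

```python
def format_port_command(host: str, port: int) -> str:
    """Build a PORT command from a dotted IPv4 address and a port number."""
    octets = host.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("expected a dotted IPv4 address: " + host)
    if not 0 <= port <= 65535:
        raise ValueError("a port number must fit in 16 bits")
    high, low = divmod(port, 256)          # split the port into two bytes
    return "PORT " + ",".join(octets + [str(high), str(low)])

print(format_port_command("127.0.0.1", 52169))
# PORT 127,0,0,1,203,201
```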

Meanwhile, the server must also choose a port number at its own end. By standard convention, this is almost always port 20.

Note that the client must always give either a PASV command or a PORT command before it starts an upload or download. Which of these it gives controls whether we use a passive or non-passive transfer.

The effect of a firewall

A firewall might or might not include the NAT option. In this section, I will ignore NAT, but we'll come back to it in the following section.

The function of a firewall is to allow some packets to pass through, while refusing to let others pass through. This is done by a set of rules created by the system administrator. The administrator has to decide which classes of traffic are legal.

I don't have much experience with firewalls, so I can't give an expert description of what happens here. I believe, however, that the rules are usually based on port numbers. Traffic to/from certain ports is allowed, while other ports are blocked. The rules would normally have to be asymmetric, in the sense that the rules for outgoing packets would be quite different from the rules for incoming packets.

Consider the case where an FTP client is behind a firewall, and is talking to a server that is not behind a firewall. A typical choice of firewall rules would make non-passive FTP illegal, because non-passive FTP requires the server - the machine outside the firewall - to initiate the data connection. A firewall is often set up in such a way that machines outside the firewall are not allowed to initiate a connection. In fact, this is a very large part of the motivation for adding passive FTP to the FTP standard. If the client is behind a firewall, then normally it should use only passive FTP.
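The reasoning can be captured in a toy model. The Python sketch below is not how any real firewall is configured; it simply assumes a rule set of the kind just described, where connections initiated from inside are allowed and connections initiated from outside are refused unless the destination port has been explicitly opened. All the names here are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class ConnectionAttempt:
    initiated_from: str    # "inside" or "outside" the firewall
    dest_port: int         # port at the end being connected to

def firewall_allows(attempt: ConnectionAttempt,
                    open_inbound_ports: FrozenSet[int] = frozenset()) -> bool:
    """Toy rule: inside may connect out; outside needs an opened port."""
    if attempt.initiated_from == "inside":
        return True
    return attempt.dest_port in open_inbound_ports

# Client behind the firewall, server outside.
passive_data = ConnectionAttempt("inside", 52165)   # client connects to server
port_data = ConnectionAttempt("outside", 52169)     # server connects to client

print(firewall_allows(passive_data))   # True
print(firewall_allows(port_data))      # False
```

With the same toy rule and the roles reversed (server inside, client outside), the command connection works only if port 21 is in the set of opened inbound ports, and a passive data connection is refused because its port is chosen afresh each time.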

Conversely, if the server is behind the firewall and the client is not, then passive FTP is likely to be blocked and port FTP is the only sensible option.

If the client is behind a firewall, and the server is behind a different firewall, then we are in trouble. The 'firewall' concept was not designed with this situation in mind. Servers should not be behind a firewall, except of course for servers that are supposed to be private to the LAN. If you have a firewall protecting your LAN, then you should normally put your public server applications on the same machine that is running the firewall software. If you do this, then technically the server is outside the firewall.

If, for any reason, you really have to have your server behind a firewall, then you had better be an expert in designing the firewall rules. You should read the preceding sections very carefully, to see which ports should be enabled in the firewall rules. Actually, that part is easy. The difficult part is to do this in such a way that you allow the FTP server to function, but without compromising the security of your LAN. If you make the rules too permissive, you might as well not have a firewall.

Firewalls implementing NAT

The best sort of firewall is one that offers the option of Network Address Translation (NAT). This is a feature that allows your machines inside the LAN to have their own IP addresses, but which lets them appear to have a different IP address as seen by the outside world. In the best case, all the machines in your LAN can share the same "external" IP address.

An IP data packet has a header that specifies, among other things, a source and destination IP address. That says who is sending the packet, and who is supposed to receive it. With the NAT feature in place, the firewall alters the source IP address, to make it appear as if the packet came from a different address. That is for outgoing traffic. For incoming traffic, the firewall intercepts the packets destined for the "fake" IP address, and sends them to the real intended recipient.

Consider the case where you have an FTP client behind the firewall, and an FTP server outside the firewall. When the client connects to the server, the server does not see the client's true IP address. Instead, it sees the address of the firewall. The server doesn't even know that a firewall is present. It simply interacts with that address, just as it would with any client. As far as the server is concerned, the client is the firewall machine. However, the firewall passes on the server's responses to the true client.

Similarly, if a server is behind a firewall then every client outside the firewall thinks that the server is at the same address as the firewall machine. The firewall modifies the addresses, and passes the traffic on to the true address of the server.

All of this would work well except for two little details. As we have seen in earlier examples, the PASV and PORT commands of the FTP protocol send IP addresses as data. These are the true IP addresses, not the addresses as altered by the NAT software. This can result in data being sent to the wrong address.

The logical solution to this problem would be for the NAT software to intercept the PASV and PORT commands, and alter the numbers in those lines. Some firewall software is smart enough to do this. Unfortunately, many firewalls are not able to make this adjustment.
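As a rough idea of what such an adjustment involves, here is a Python sketch that rewrites the address part of a PORT command on its way out through the firewall. It is an illustration only: the function name is hypothetical, the external address 203.0.113.5 is a made-up example, and a real NAT implementation may also have to replace the port number and keep a table of the translations it has made.

```python
import re

def rewrite_port_command(line: str, external_host: str) -> str:
    """Replace the private address in a PORT command with the external one."""
    match = re.fullmatch(r"PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", line.strip())
    if match is None:
        return line                      # not a PORT command: pass it unchanged
    external_octets = external_host.split(".")
    # Keep the two port bytes exactly as the client sent them.
    return "PORT " + ",".join(external_octets + [match.group(5), match.group(6)])

print(rewrite_port_command("PORT 192,168,1,10,203,201", "203.0.113.5"))
# PORT 203,0,113,5,203,201
```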

In the past, people didn't normally think in terms of putting an FTP server behind a firewall. Now that we have cable modems, ADSL, and various other ways to get high-speed data links, that option is becoming more common. To deal with this complication, it might be necessary for the FTP server to fake its response to the PASV command, by giving the address of the firewall rather than its own IP address.

Likewise, an FTP client that is behind a firewall should in principle adjust its PORT command parameters to allow for the firewall. (In the case of PORT, the problem must be solved at the client end rather than at the server end.) In practice, I have never heard of an FTP client that does this. This means that an FTP client that is behind a firewall must always use passive FTP.

References:

Peter's web site - http://eepjm.newcastle.edu.au

