Overview of KeepAlive
Any discussion of so-called "keep-alive" functionality must start by answering the question: "What does 'keep-alive' mean?" As one specification succinctly states:
A "keep-alive" mechanism periodically probes the other end of a connection when the connection is otherwise idle, even when there is no data to be sent.
Keepalive mechanisms appear in many different protocols, under various names. Protocols may be layered on top of other protocols; for example, HTTPS consists of HTTP layered over SSL/TLS, itself layered over TCP/IP. Each protocol layer may have its own form of keepalive functionality.
TCP KeepAlive
For TCP, the definitive specification for keepalive functionality is
RFC 1122, Section
4.2.3.6. While RFC 1122 includes a good discussion of why TCP keepalives are
meant to be off by default, in practice TCP keepalives do have value,
especially when dealing with network equipment (such as routers, firewalls,
NATs) between the client and the server; such equipment might terminate the
connection when the connection has been idle for too long. The use of TCP
keepalives can help to prevent such network equipment from breaking the
connection needlessly.
How TCP KeepAlive Works
OK, so you want to use the TCP keepalive functionality in your program. The question is "How exactly does the TCP keepalive feature work?" Good question. Answering it involves three numeric values: the idle time, the number of probes to send (the probe count), and the interval time between probes. Remember, though, that the SO_KEEPALIVE socket option must be enabled on the TCP socket in order for the TCP keepalive feature to be used.
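A minimal sketch of that step, using Python's standard `socket` module purely for illustration. Note that `SO_KEEPALIVE` is set at the socket level (`SOL_SOCKET`), not at the TCP level:

```python
import socket


def enable_keepalive(sock: socket.socket) -> bool:
    """Enable TCP keepalive on `sock`; return True if the option reads back as on."""
    # SO_KEEPALIVE is a socket-level (SOL_SOCKET) option, not a TCP-level one.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read it back: any nonzero value means keepalive is now enabled.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

Enabling the option this way leaves the idle time, probe count, and interval at the system-wide values unless they are tuned separately.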
First, let's assume that you have created a TCP connection, and have transferred data back and forth on that connection. All of the data have been transferred, but you have not closed the connection, so now it is idle. How long does that connection sit idle, with no data transferred at the TCP layer, before one end of the connection or the other starts to wonder whether the connection is still alive? This amount of idle time is the TCP keepalive idle time; RFC 1122 requires that it default to no less than two hours, and two hours is the usual default.
Our TCP connection has now been sitting idle for the amount of time given by the idle time value; what happens then? At this point, the end of the TCP connection with TCP keepalive enabled sends out a "probe". This probe is just a small TCP segment which requires a response (an ACK) from the other side. Once the probe has been sent, that end waits for the amount of time given by the interval time value. If nothing is heard back from the remote peer within the interval time, another probe is sent. This process repeats until either a) a response is received from the peer, or b) the probe count value has been reached.
Let us assume that our TCP connection was idle for so long that all of the TCP keepalive probes (up to the probe count) were sent, and still no response was received. What happens then? At this point, the TCP stack considers the connection broken and drops it. When the program on that end of the connection next tries to read or write data on the connection, the read/write attempt will fail.
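The probe sequence described above can be modeled as a toy simulation. This is not how any real TCP stack is written: the callback `peer_answers_probe` is a hypothetical stand-in for the network, and the model assumes a reply counts the moment a probe is answered.

```python
from typing import Callable


def keepalive_outcome(idle: int, count: int, interval: int,
                      peer_answers_probe: Callable[[int], bool]) -> tuple[str, int]:
    """Model one idle period of TCP keepalive.

    `peer_answers_probe(n)` (hypothetical) returns True if the peer
    acknowledges probe number n (1-based). Returns ("alive", t) if a probe is
    answered, or ("broken", t) if every probe goes unanswered, where t is the
    number of seconds since the connection went idle.
    """
    t = idle  # no probes are sent until the connection has been idle this long
    for probe in range(1, count + 1):
        # Send a probe, then wait up to `interval` seconds for a reply.
        if peer_answers_probe(probe):
            return ("alive", t)  # simplified: the reply is treated as immediate
        t += interval
    # Probe count exhausted with no reply: the connection is declared broken.
    return ("broken", t)
```

With the common defaults (7200 seconds idle, 9 probes, 75 seconds apart) and a silent peer, `keepalive_outcome(7200, 9, 75, lambda n: False)` returns `("broken", 7875)`; that is, the dead connection is detected roughly 2 hours, 11 minutes, and 15 seconds after it went idle.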
When To Use TCP KeepAlive
When the TCP client is connected directly to the TCP server, it usually does
not matter whether one end or the other uses TCP keepalive. As long as
one of them does, a broken connection can be detected.
client <-------------------------------> server
However, if the TCP client connects to the TCP server via proxies/routers/firewalls/NAT, the picture changes. When this happens (and it is the common scenario), both sides may need to use TCP keepalive to learn when their side of the proxied connection is broken:
client <-----------> NAT <-------------> server
In this situation, the path has two legs: one between the client and the NAT, and one between the NAT and the server (with a proxy, these are two separate TCP connections). Each leg may break independently of the other, which is why both ends of the connection (client and server) may need to use TCP keepalives. Use of TCP keepalives also helps here because when the router/firewall/NAT sees a TCP keepalive probe, it may (depending on the network equipment in question) reset any idle timers that were about to close the connection.
Why are TCP KeepAlives Useful for FTP?
"This is all very fascinating", you say, "but what does it have to do with
proftpd
and FTP?" If you have ever had an FTP download (or upload) take a very long time, only to have that transfer time out in the middle, then TCP keepalives may prevent the timeout.
Consider what happens for FTP transfers which take a long time (either due to very large file(s) being transferred, or a slow connection): you have one TCP connection for the control connection, and a separate TCP connection for the data transfer connection. All of the bytes are being transferred over the data connection, so that data connection is certainly not idle -- but while the data transfer is occurring, the control connection is idle! And let's assume that your FTP connections are going through some NAT device in between the client and the server. That NAT may not be very smart; it may not know that the two different TCP connections of your FTP session are related to each other; it only sees one idle TCP connection, and one busy TCP connection. If that FTP control connection is idle for too long, then the NAT may close it (in order to keep valuable space in its state tables available for TCP connections that actually need to transfer bytes). (Some NATs have been known to close TCP connections that have been idle for only 5 minutes.) The FTP server sees that the FTP control connection is closed, and aborts the data transfer. What a mess!
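If either the FTP server or the FTP client had used TCP keepalives on the control connection, then that NAT might have seen the TCP keepalive probes and not closed the idle control connection, provided that the keepalive idle time is shorter than the NAT's own idle timeout (the usual two-hour default is far longer than the five minutes some NATs allow). So how can we make sure that either the client or the server has TCP keepalives enabled?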
In proftpd-1.3.5rc1 and later, ProFTPD's SocketOptions directive supports a keepalive parameter for controlling whether the server uses TCP keepalives, e.g.:
    # Disable use of TCP keepalives
    SocketOptions keepalive off
    # Enable use of TCP keepalives (this is the default)
    SocketOptions keepalive on
In addition, on some Unix platforms, the SocketOptions directive's keepalive parameter can do finer-grained tuning of the TCP keepalive values:
    # Enable use of TCP keepalives, with the given idle/count/interval values
    SocketOptions keepalive 7200:9:75
In general, though, you should use the system-wide defaults unless you are running into data transfer timeout issues. If you are seeing timeouts, try using the keepalive parameter of SocketOptions to gradually reduce the idle time in small increments (e.g. 10-15 seconds); if that does not help, increase the count by 1 at a time (remember that each probe means extra data transfer); if that still does not help, increase the interval time. Do not reduce the interval time, since that is the amount of time to wait for the other end to respond before sending another probe. Waiting less time for the other end to respond means a greater chance of killing your TCP connection unnecessarily.
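For programs that manage their own sockets, the same three values map onto per-socket options on platforms that expose them (Linux does, via `TCP_KEEPIDLE`, `TCP_KEEPCNT`, and `TCP_KEEPINTVL`). The sketch below is a hypothetical helper, not part of ProFTPD, that accepts the same `idle:count:interval` string as `SocketOptions keepalive` and skips any option the running platform does not provide:

```python
import socket


def apply_keepalive_spec(sock: socket.socket, spec: str) -> dict[str, int]:
    """Enable keepalive on `sock` using an "idle:count:interval" string.

    Hypothetical helper mirroring the `SocketOptions keepalive 7200:9:75`
    syntax. Returns the values read back for each option that was set.
    """
    idle, count, interval = (int(part) for part in spec.split(":"))
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied: dict[str, int] = {}
    # These TCP-level options are not available everywhere, so only set the
    # ones this Python build exposes.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPCNT", count),
                        ("TCP_KEEPINTVL", interval)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied
```

Following the advice above, a first small idle-time reduction would look like `apply_keepalive_spec(sock, "7185:9:75")`, leaving the count and interval untouched.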
Not all TCP stacks let the application control the idle time after which the first probe is sent, the total number of probes sent, or the time between probes; many TCP stacks only allow enabling/disabling of TCP keepalive. If TCP keepalive is enabled on such a stack, the system-wide values are used; commonly these are 2 hours for the idle time, a count of 9 probes, and 75 seconds between probes.
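To see which system-wide values a Linux host will apply, you can read them from `/proc/sys/net/ipv4/`. This sketch is Linux-specific (other systems expose these settings through their own mechanisms) and returns `None` when the files are missing:

```python
from pathlib import Path
from typing import Optional

# Linux-only: system-wide keepalive settings live under /proc/sys/net/ipv4.
_PROC = Path("/proc/sys/net/ipv4")
_KNOBS = {
    "idle": "tcp_keepalive_time",      # seconds of idleness before the first probe
    "count": "tcp_keepalive_probes",   # unanswered probes before giving up
    "interval": "tcp_keepalive_intvl", # seconds between probes
}


def system_keepalive_defaults() -> Optional[dict[str, int]]:
    """Return the Linux system-wide keepalive settings, or None if unavailable."""
    try:
        return {key: int((_PROC / name).read_text().strip())
                for key, name in _KNOBS.items()}
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file contents.
        return None
```

On an untuned Linux system this typically yields `{'idle': 7200, 'count': 9, 'interval': 75}`.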
Since many platforms do not allow fine-grained tuning of TCP keepalive values, especially on a per-service basis, other means for checking whether the connection is still alive must be used. And that leads us to application-level keepalive mechanisms.
FTP KeepAlive
If tuning TCP keepalives does not keep your long-lasting data transfers from timing out, what can be done? Answer: use keepalive features at other layers in the protocol stack. FTP has its own ideas for doing keepalive checks, but they are not as elegant as those of TCP.
The main issue with FTP keepalives is that they are all initiated by the client. FTP uses a request/response model, and it does not allow the FTP server to send arbitrary unrequested data to the FTP client via the control connection. Fortunately, there are many FTP clients which implement some sort of FTP keepalive feature.
How FTP KeepAlive Works
Since the FTP server cannot do anything to test whether the FTP session is alive, the FTP client must do the tests. The easiest way to test whether an FTP session is alive is to send an FTP command. And fortunately, RFC 959, Section 4.1.3 defines the NOOP ("No Operation") command, whose sole purpose is to elicit an "OK" response from the FTP server. This makes the NOOP command the ideal way to test whether the FTP server is still alive and listening to the FTP client.
Some firewalls/routers know about this NOOP trick, though, and may filter out/drop that FTP command. FTP clients, then, have been known to resort to a number of other FTP commands for use as FTP keepalives, including:
NOOP
LIST
STAT
CWD
REST 0
PWD
TYPE A
Some FTP clients even choose a command at random from the above list, just to keep any interfering router/firewall/NAT guessing!
Sadly, some FTP servers cannot handle receiving an FTP command on the control connection while they are in the middle of transferring data on the data connection (proftpd can handle this). But that may not matter for the purposes of FTP keepalives; all that matters is that at the TCP level, the bytes were sent by the client and acknowledged by the server's TCP stack.
FTP Client-Specific KeepAlive Settings
Not every FTP client supports FTP keepalive functionality. If you want to try out FTP clients which do support FTP keepalives, you might look into the ftp:nop-interval setting for lftp, or the control-timeout setting for ncftp.
SSH KeepAlive
The SSH2 protocol (and SFTP, which runs over SSH2) is more complex than FTP, and thus has much better support for application-level keepalive functionality: either end of an SSH2 connection can send messages at any time.
How SSH KeepAlive Works
The usual mechanism used by SSH2 implementations for an SSH2-level keepalive check is to send either a CHANNEL_REQUEST or a GLOBAL_REQUEST message for a known unsupported command, and to request a response from the other side. It does not matter that the requested command is unsupported; all that matters is that the response comes back, signifying that the other end is still alive and listening. Both clients and servers use this technique.
In the case of ProFTPD's mod_sftp module, the way to configure SSH2 keepalives is the SFTPClientAlive directive. When configured, the mod_sftp module sends CHANNEL_REQUEST/GLOBAL_REQUEST messages for "[email protected]" in order to solicit a response from the connected client.
KeepAlive in Other Protocols
Many application protocols end up reinventing the keepalive feature in some way, usually as a "ping/pong" mechanism where a "ping" is sent every so often by one side, with a "pong" response expected from the other end of the connection.
HTTP
Most HTTP connections have no need of a keepalive mechanism, since HTTP connections are usually short-lived, and since data are usually flowing in one direction or the other on the connection (thus an HTTP connection is usually not idle long enough to warrant a keepalive feature). HTTP long polling (see RFC 6202) is an exception, and for HTTP long polling connections, use of TCP keepalives may be needed. But the HTTP protocol itself does not define a way for either end to send data arbitrarily across the connection for the purpose of determining whether the connection is still alive. (HTTP keepalive refers to a different concept, i.e. telling the server not to close the connection after sending its response, so that the connection can be reused, thus "kept alive".)
LDAP
For long-lived LDAP connections, keepalive functionality can be implemented by using the Abandon operation, as described here. The idea is to have the client send a request that it knows the server will ignore/discard; the act of transmitting the request over the connection keeps any intermediaries on the network (router/NAT/firewall) from closing an "idle" connection prematurely.
WebSocket
The WebSocket protocol, defined in RFC 6455, does need a keepalive mechanism, since it establishes a long-lived connection. Accordingly, the RFC defines ping/pong messages; see Section 5.5.2 (PING) and Section 5.5.3 (PONG).
Additional Reading