Things Routers Do: Network Address Translation

WiFi is fairly ubiquitous in 2015. In most of the nonprofessional contexts in which we use it, it’s provided by a small box that’s plugged into mains power and an Ethernet cable, usually with an antenna or two sticking out of it. I’ve heard these boxes called all kinds of things - hotspots, middleboxes, edge routers, home routers, NAT devices, gateways, and probably a few more I’ve forgotten; there are surely more I haven’t heard. “Router” is the word I hear and use most often myself, despite the unfortunate overlap with a more specific meaning (a device with multiple network links, capable of sending traffic between them). There are an awful lot of things these boxes do which aren’t implied by “router”!

One such thing is a really essential networking operation that nearly every one of us uses multiple times daily – network address translation, also known as masquerading. This is most of the work your WiFi hotspot or home router does to share one Internet connection with multiple devices. Here’s a quick summary of how these devices usually work (lots of other configurations are possible, but this is by far the most common I’m aware of):

the NAT device has (at least) two network interfaces, one of which is usually WiFi; the other is usually a link to a DSL, cable, satellite, cellular, or Ethernet network
the NAT device has a publicly assigned, routable, unique address on its non-WiFi interface
the NAT device has a static, private IP address on another interface (usually a WiFi interface). The network attached to that interface isn’t publicly routable, or reachable from the greater Internet by any means other than the NAT device’s public address
other devices on the same WiFi network as the NAT device are configured to send network traffic through the NAT device’s WiFi interface; the NAT device does a transformation on the traffic and sends it through its other (e.g. cellular, Ethernet, DSL, cable, satellite) interface
return traffic from the Internet comes into the NAT device’s non-WiFi interface; the NAT device does a transformation on the traffic and sends it to the appropriate WiFi device
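
To make "private" a little more concrete: for IPv4, the reserved private ranges are the three blocks listed in RFC 1918. Here's a minimal sketch, using Python's standard `ipaddress` module, of how one might check whether an address falls in one of them. The helper name `is_rfc1918` and the sample addresses are my own inventions for illustration, not anything a real router exposes.

```python
import ipaddress

# The three IPv4 blocks reserved for private networks by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address is inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Hypothetical sample addresses: three private, one public.
    for candidate in ["192.168.1.10", "10.0.0.1", "172.16.5.4", "8.8.8.8"]:
        kind = "private" if is_rfc1918(candidate) else "public"
        print(f"{candidate} is {kind}")
```

A device on the WiFi side would typically hold an address for which this check is true, while the NAT device's non-WiFi interface would hold one for which it is false.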

The router’s WiFi network is usually in the reserved private address space, meaning that hosts on the Internet can’t directly send traffic to devices that are on this network – the best they can do is send traffic to the non-WiFi interface of the NAT device, and trust the NAT device to make sure it gets to the right host on the private WiFi network. Sending traffic to a nice website via the greater Internet (a.k.a. “the cloud”) looks like this:

+========+ private +==============+ public +=========+ public +===============+
| laptop |-------->|  NAT device  |------->|  cloud  |------->| the-toast.net |
+========+         +==============+        +=========+        +===============+

and the return traffic for replies looks like this:

+========+ private +==============+ public +=========+ public +===============+
| laptop |<--------|  NAT device  |<-------|  cloud  |<-------| the-toast.net |
+========+         +==============+        +=========+        +===============+

Assuming no general network brokenness, your laptop has no problem making a request to the-toast.net, a host on the public Internet. Since your laptop is on a network that’s not publicly addressable, the-toast.net can’t directly make a request back to your laptop. In order to receive a response, something about your request has to indicate to the-toast.net that it should send its response to the NAT device, to be forwarded to your laptop.

Instead of forcing your laptop to know details about the NAT device’s connection to the Internet, the NAT device just automatically rewrites outbound requests to look like they’re coming from its own public interface. While the full picture of the connection looks like the drawing above, all the-toast.net sees is this:

+==============+ public +=========+ public +===============+
|  NAT device  |------->|  cloud  |------->| the-toast.net |
+==============+        +=========+        +===============+

with a return path like this:

+==============+ public +=========+ public +===============+
|  NAT device  |<-------|  cloud  |<-------| the-toast.net |
+==============+        +=========+        +===============+

The NAT device takes care of the first step in outgoing requests (rewriting requests from your laptop to look like they came from the NAT device) and the last step in incoming replies (rewriting replies so they go to your laptop rather than the NAT device).
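
Those two rewriting steps can be sketched in a few lines of Python. This is a toy model under heavy assumptions: there is exactly one private host, packets are reduced to a source, a destination, and a payload, and every address below is a made-up placeholder (the `203.0.113.x` and `198.51.100.x` blocks are reserved for documentation). The names `Packet`, `rewrite_outbound`, and `rewrite_inbound` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str


NAT_PUBLIC = "203.0.113.7"   # placeholder public address of the NAT device
LAPTOP = "192.168.1.10"      # placeholder private address of the laptop
SERVER = "198.51.100.20"     # placeholder standing in for the-toast.net


def rewrite_outbound(packet: Packet) -> Packet:
    """First step of an outgoing request: make it appear to come from the NAT device."""
    return replace(packet, src=NAT_PUBLIC)


def rewrite_inbound(packet: Packet, private_host: str) -> Packet:
    """Last step of an incoming reply: redirect it to the private host."""
    return replace(packet, dst=private_host)


def demo() -> Packet:
    request = Packet(src=LAPTOP, dst=SERVER, payload="GET /")
    on_the_wire = rewrite_outbound(request)
    # The server only ever sees the NAT device's address, so it replies there.
    reply = Packet(src=SERVER, dst=on_the_wire.src, payload="200 OK")
    return rewrite_inbound(reply, LAPTOP)


if __name__ == "__main__":
    print(demo())
```

Note that `rewrite_inbound` has to be told which private host to deliver to; with a single laptop on the network, that's trivially known.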

There’s a multiplexing problem in the operations I’ve described – there are (potentially) many WiFi clients sharing the private network with the NAT device, but the NAT device has only one interface to the outside world, and one address on which to receive traffic. If the NAT device only rewrites the addresses for traffic, it will have trouble disambiguating where return traffic should go. Consider the (quite common!) situation where more than one host in a WiFi network is browsing the same website:

+========+ private +==============+ public +=========+ public +===============+
| marcy's|-------->|  NAT device  |------->|  cloud  |------->| the-toast.net |
| laptop |      -->|              |        |         |        |               |
+========+      |  +==============+        +=========+        +===============+
                |
+========+      |
| jane's |-------
| laptop |
+========+

Imagine Marcy asks the-toast.net for the article at /category/humor/, and Jane simultaneously asks the-toast.net for the article at /category/history/. In order to properly deliver Jane and Marcy’s responses without mixing them up, the NAT device needs to do more than just rewrite the destinations and sources of traffic. The NAT device needs to:

tag outgoing requests in some way that will retain the tag in incoming replies (so it knows how to rewrite traffic)
maintain a map of tags to private network hosts (so it knows what to rewrite traffic to)

Most NAT devices do this by manipulating an additional piece of information in a connection request: the transport-layer port number. Both UDP (the not-guaranteed-reliable transmission protocol used for name lookups) and TCP (the reliable, in-order transmission protocol used for web traffic) have a multiplexing facility of their own in the form of port numbers, which are always specified alongside the address of a node. (This explanation ignores ICMP and other non-TCP/UDP protocols for simplicity.) For example, a connection request like

public-computer.com on tcp port 5126 wants to connect to the-toast.net on tcp port 80

will generate a reply like

the-toast.net on tcp port 80 replies to public-computer.com on tcp port 5126

The port number for the server (in this case, the-toast.net) is set in advance per-service, by convention - port 80 for unencrypted websites, port 443 for encrypted websites, and many more. The port number for the client (in this example, public-computer.com) is chosen more or less arbitrarily by the client (usually by its operating system), and only needs to be unique over the set of connections on the computer.
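
If you'd like to watch a client port being chosen, here's a small sketch using Python's standard `socket` module. It opens a TCP connection over the loopback interface and reports the ports on each end. One caveat: to avoid needing privileges or colliding with a real service, this sketch lets the operating system pick the server's port as well (by binding to port 0), which is not how a real web server behaves. The function name `observe_ports` is hypothetical.

```python
import socket


def observe_ports():
    """Connect a client to a local listener; return (server_port, client_port, port_seen_by_server)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the OS for any free port; a real service would use a fixed one.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_port = server.getsockname()[1]

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # The client never names a source port; the OS assigns one on connect.
            client.connect(("127.0.0.1", server_port))
            client_port = client.getsockname()[1]

            # The server learns the client's source port from the connection itself.
            conn, peer = server.accept()
            conn.close()

    return server_port, client_port, peer[1]


if __name__ == "__main__":
    server_port, client_port, seen = observe_ports()
    print(f"server listened on port {server_port}")
    print(f"client was assigned source port {client_port}")
    print(f"server saw the client connecting from port {seen}")
```

The last two numbers should match: the source port travels with the request, which is how the server knows where to address its reply.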

The source port nicely fills the requirement of being an arbitrary tag in an outgoing request that will be retained in an incoming reply! The NAT device can maintain a table of mappings like this:

marcy's laptop on tcp port 5126 to the-toast.net on port 80 -> nat-device on tcp port 60000 to the-toast.net on port 80
the-toast.net on tcp port 80 to nat-device on tcp port 60000 -> the-toast.net on tcp port 80 to marcy's laptop on tcp port 5126

Then, even if two hosts in the WiFi network both try to initiate a connection that looks like this:

me on tcp port 5126 to the-toast.net on port 80

the return traffic can be disambiguated, since the NAT device chooses which source port to map the connection to and can enforce uniqueness there.
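
Here's a minimal sketch of such a table in Python, assuming the NAT device simply hands out public ports in increasing order starting from 60000. The class name `NatTable`, its method names, and all the addresses are hypothetical. A real implementation would also need to expire old entries and cope with running out of ports, both of which this sketch ignores.

```python
from itertools import count


class NatTable:
    """Toy port-translating NAT table; ignores entry expiry and port exhaustion."""

    def __init__(self, public_ip, first_port=60000):
        self.public_ip = public_ip
        self._next_port = count(first_port)
        # (proto, private_ip, private_port, remote_ip, remote_port) -> public_port
        self._outbound = {}
        # (proto, public_port, remote_ip, remote_port) -> (private_ip, private_port)
        self._inbound = {}

    def translate_outbound(self, proto, src_ip, src_port, dst_ip, dst_port):
        """Return the rewritten (src_ip, src_port, dst_ip, dst_port) for an outgoing packet."""
        key = (proto, src_ip, src_port, dst_ip, dst_port)
        if key not in self._outbound:
            public_port = next(self._next_port)  # the NAT device enforces uniqueness here
            self._outbound[key] = public_port
            self._inbound[(proto, public_port, dst_ip, dst_port)] = (src_ip, src_port)
        return (self.public_ip, self._outbound[key], dst_ip, dst_port)

    def translate_inbound(self, proto, src_ip, src_port, dst_port):
        """Return the private (ip, port) a reply should go to, or None if there is no entry."""
        return self._inbound.get((proto, dst_port, src_ip, src_port))


if __name__ == "__main__":
    nat = NatTable("203.0.113.7")
    toast = "198.51.100.20"  # placeholder address for the-toast.net
    # Marcy and Jane both happen to pick source port 5126.
    print(nat.translate_outbound("tcp", "192.168.1.10", 5126, toast, 80))
    print(nat.translate_outbound("tcp", "192.168.1.11", 5126, toast, 80))
    # Replies arrive on distinct public ports, so they can be told apart.
    print(nat.translate_inbound("tcp", toast, 80, 60000))
    print(nat.translate_inbound("tcp", toast, 80, 60001))
```

Keying the inbound map on the remote address and port as well as the public port is one design choice among several; a simpler table could key on the public port alone.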

What about traffic that looks like replies from public servers, but for which the NAT device doesn’t have a table entry? There are a number of things a device can do for such traffic:

look in a preconfigured table of incoming mappings (usually called “port forwards”) for a prearranged destination – e.g., “send all traffic for port 6667 to the local machine called irc”.
send all traffic without matching entries to a specific host (often called the “DMZ host” in the router’s configuration)
attempt to respond to the traffic with a service the NAT device itself is running
inform the sender that the traffic was refused
ignore the traffic completely
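
The options above could be combined into a single decision function along these lines. This is only a sketch: the order in which the options are tried, the `Action` names, and the shape of the configuration arguments are all assumptions of mine, and real devices differ in how they prioritize these choices.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"   # send on to a host in the private network
    LOCAL = "local"       # hand to a service on the NAT device itself
    REJECT = "reject"     # tell the sender the traffic was refused
    DROP = "drop"         # ignore the traffic completely


def handle_unmatched(proto, dst_port, port_forwards,
                     dmz_host=None, local_services=frozenset(), reject=False):
    """Decide what to do with inbound traffic that has no NAT table entry.

    port_forwards maps (proto, port) -> (private_ip, private_port).
    Returns (Action, target), where target is None unless forwarding.
    """
    key = (proto, dst_port)
    if key in port_forwards:
        return Action.FORWARD, port_forwards[key]
    if key in local_services:
        return Action.LOCAL, None
    if dmz_host is not None:
        return Action.FORWARD, (dmz_host, dst_port)
    return (Action.REJECT if reject else Action.DROP), None


if __name__ == "__main__":
    # Hypothetical config: forward IRC traffic to the machine called "irc".
    forwards = {("tcp", 6667): ("192.168.1.20", 6667)}
    print(handle_unmatched("tcp", 6667, forwards))
    print(handle_unmatched("tcp", 8080, forwards, dmz_host="192.168.1.30"))
    print(handle_unmatched("udp", 9999, forwards))
```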

Most NAT devices provide some facility to configure which of the above they do on match failures, in addition to some parameters for the private network. Many such devices are running on top of an embedded Linux platform, where you can get NAT set up (assuming you’re already set up to route packets from one interface to another) with one simple invocation on the command line:

sudo iptables -I POSTROUTING -t nat -j MASQUERADE

(In the more modern nftables, things are a bit more complicated, but not by much. Unfortunately, I don’t know of many home router products that have made the jump to nftables from iptables.)

The NAT module is in a position of some considerable power over the user’s outbound connections, interposed as it is between the private network and all hosts on the public Internet. If you want to verify that the behavior of iptables or nftables is as you expect, you have a long row to hoe – you need to inspect the binaries that modify the rules governing how the kernel should handle incoming traffic (i.e., the iptables or nftables commands themselves), the kernel modules for NAT, the kernel code for the iptables or nftables subsystems themselves, and probably a few more. If you don’t already know anything about kernel programming, you’ll probably find the task nontrivial.

Instead, wouldn’t it be much more fun to write our own?