But if all we need to do is NAT traffic, why not just build something that only knows how to NAT traffic? I’ve spent a lot of time building networked applications on top of (and with) the full network stack provided by the MirageOS library OS, but we can also build lower-level applications with fundamentally the same programming tactics and tools we use to write, for example, DNS resolvers.
Building A Typical Stack From Scratch
Let’s have a look at the ethif-v4 example in the mirage-skeleton example repository. This example unikernel shows how to build a network stack “by hand” from a bunch of different functors, starting from a physical device (provided by config.ml at build time, representing either a Xen backend if you configure with mirage configure --xen or a Unix tuntap backend if you build with mirage configure --unix). I’ve reproduced the network setup bits from the most recent version as of this writing and annotated them a bit:
WiFi is fairly ubiquitous in 2015. In most of the nonprofessional contexts in which we use it, it’s provided by a small box that’s plugged into mains power and an Ethernet cable, usually with an antenna or two sticking out of it. I’ve heard these boxes called all kinds of things – hotspots, middleboxes, edge routers, home routers, NAT devices, gateways, and probably a few more I’ve forgotten; there are surely more I haven’t heard. “Router” is the word I hear and use most often myself, despite the unfortunate overlap with a more specific meaning (a device with multiple network links, capable of sending traffic between them). There are an awful lot of things these boxes do which aren’t implied by “router”!
One such thing is a really essential networking operation that nearly every one of us uses multiple times daily — network address translation, also known as masquerading. This is most of the work your WiFi hotspot or home router does to share one Internet connection with multiple devices. Here’s a quick summary of how they usually work (lots of other configurations are possible, but this is by far the most common I’m aware of):
the NAT device has (at least) two network interfaces, one of which is usually WiFi; the other is usually a link to a DSL, cable, satellite, cellular, or Ethernet network
the NAT device has a publicly assigned, routable, unique address on its non-WiFi interface
the NAT device has a static, private IP address on another interface (usually a WiFi interface). This network isn’t publicly routable, or reachable from the greater Internet by any means other than the NAT device’s public address
other devices on the same WiFi network as the NAT device are configured to send network traffic through the NAT device’s WiFi interface; the NAT device does a transformation on the traffic and sends it through its other (e.g. cellular, Ethernet, DSL, cable, satellite) interface
return traffic from the Internet comes into the NAT device’s non-WiFi interface; the NAT device does a transformation on the traffic and sends it to the appropriate WiFi device
The router’s WiFi network is usually in the reserved private address space, meaning that hosts on the Internet can’t directly send traffic to devices that are on this network — the best they can do is send traffic to the non-WiFi interface of the NAT device, and trust the NAT device to make sure it gets to the right host on the private WiFi network. Sending traffic to a nice website via the greater Internet (a.k.a. “the cloud”) looks like this:
Assuming no general network brokenness, your laptop has no problem making a request to the-toast.net, a host on the public Internet. Since your laptop is on a network that’s not publicly addressable, the-toast.net can’t directly make a request back to your laptop. In order to receive a response, something about your request has to indicate to the-toast.net that it should send its response to the NAT device, to be forwarded to your laptop.
Instead of forcing your laptop to know details about the NAT device’s connection to the Internet, the NAT device just automatically rewrites outbound requests to look like they’re coming from its own public interface. While the full picture of the connection looks like the drawing above, all the-toast.net sees is this:
+==============+ public  +=========+ public  +===============+
|  NAT device  |-------->|  cloud  |-------->| the-toast.net |
+==============+         +=========+         +===============+
with a return path like this:
+==============+ public  +=========+ public  +===============+
|  NAT device  |<--------|  cloud  |<--------| the-toast.net |
+==============+         +=========+         +===============+
The NAT device takes care of the first step in outgoing requests (rewriting requests from your laptop to look like they came from the NAT device) and the last step in incoming replies (rewriting replies so they go to your laptop rather than the NAT device).
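In rough Python pseudocode, those two rewrites might look something like the following (the addresses are invented for illustration, and a real NAT implementation of course operates on actual packet headers rather than dictionaries):

```python
# Sketch of the two rewrites, using only IP addresses (port handling
# comes later).  All addresses here are made-up documentation addresses.
PRIVATE_LAPTOP = "192.168.1.10"  # the laptop on the WiFi network
NAT_PUBLIC = "203.0.113.5"       # the NAT device's public interface

def rewrite_outbound(packet):
    """Make an outgoing request look like it came from the NAT device."""
    if packet["src"] == PRIVATE_LAPTOP:
        return {**packet, "src": NAT_PUBLIC}
    return packet

def rewrite_inbound(packet):
    """Send a reply addressed to the NAT device on to the laptop."""
    if packet["dst"] == NAT_PUBLIC:
        return {**packet, "dst": PRIVATE_LAPTOP}
    return packet

request = {"src": PRIVATE_LAPTOP, "dst": "198.51.100.7"}  # to the-toast.net
reply = {"src": "198.51.100.7", "dst": NAT_PUBLIC}        # the response
print(rewrite_outbound(request))  # source becomes the public address
print(rewrite_inbound(reply))     # destination becomes the laptop again
```

This address-only version is deliberately naive: it can only ever serve one private host.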
There’s a multiplexing problem in the operations I’ve described — there are (potentially) many WiFi clients sharing the private network with the NAT device, but the NAT device has only one interface to the outside world, and one address on which to receive traffic. If the NAT device only rewrites the addresses of traffic, it will have trouble disambiguating where return traffic should go. Consider the (quite common!) situation where more than one host in a WiFi network is browsing the same website:
Imagine Marcy asks the-toast.net for the article at /category/humor/, and Jane simultaneously asks the-toast.net for the article at /category/history/. In order to properly deliver Jane and Marcy’s responses without mixing them up, the NAT device needs to do more than just rewrite the destinations and sources of traffic. The NAT device needs to:
tag outgoing requests in some way that will retain the tag in incoming replies (so it knows how to rewrite traffic)
maintain a map of tags to private network hosts (so it knows what to rewrite traffic to)
Most NAT devices do this by manipulating an additional piece of information in a connection request: the transport-layer port number. Both UDP (the not-guaranteed-reliable transmission protocol used for name lookups) and TCP (the guaranteed, in-order transmission protocol used for web traffic) have a multiplexing facility of their own in the form of port numbers, which are always specified alongside the address of a node. (This explanation ignores ICMP and other non-TCP/UDP protocols for simplicity.)
A connection request like

public-computer.com on tcp port 5126 wants to connect to the-toast.net on tcp port 80
will generate a reply like
the-toast.net on tcp port 80 replies to public-computer.com on tcp port 5126
The port number for the server (in this case, the-toast.net) is set in advance per-service – port 80 for non-encrypted websites, port 443 for encrypted websites, and many more. The port number for the client (in this example, public-computer.com) is chosen randomly by the client, and only needs to be unique over the set of connections on the computer.
The source port nicely fills the requirement of being an arbitrary tag in an outgoing request that will be retained in an incoming reply! The NAT device can maintain a table of mappings like this:
marcy's laptop on tcp port 5126 to the-toast.net on port 80 -> nat-device on tcp port 60000 to the-toast.net on port 80
the-toast.net on tcp port 80 to nat-device on tcp port 60000 -> the-toast.net on tcp port 80 to marcy's laptop on tcp port 5126
Then, even if two hosts in the WiFi network both try to initiate a connection that looks like this:
me on tcp port 5126 to the-toast.net on port 80
the return traffic can be disambiguated, since the NAT device chooses which source port to map the connection to and can enforce uniqueness there.
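Here’s a rough Python sketch of such a table (the addresses, the port numbers, and the starting port 60000 are all invented for illustration; a real implementation would also expire old entries, handle port exhaustion, cope with non-TCP/UDP traffic, and so on):

```python
# Sketch of a port-based NAT translation table.
NAT_PUBLIC = "203.0.113.5"  # the NAT device's public address (made up)

class NatTable:
    def __init__(self):
        self.next_port = 60000  # where we start allocating public source ports
        self.outbound = {}      # (src, sport, dst, dport) -> public source port
        self.inbound = {}       # public source port -> (src, sport)

    def translate_out(self, src, sport, dst, dport):
        """Rewrite an outgoing connection to come from our public address,
        tagging it with a source port we know to be unique."""
        key = (src, sport, dst, dport)
        if key not in self.outbound:
            public_sport = self.next_port
            self.next_port += 1
            self.outbound[key] = public_sport
            self.inbound[public_sport] = (src, sport)
        return (NAT_PUBLIC, self.outbound[key], dst, dport)

    def translate_in(self, dport):
        """Find the private host for a reply sent to one of our public ports."""
        return self.inbound.get(dport)  # None if no entry matches

table = NatTable()
# Marcy and Jane both happen to use source port 5126 to reach the same server:
marcy = table.translate_out("192.168.1.10", 5126, "198.51.100.7", 80)
jane = table.translate_out("192.168.1.11", 5126, "198.51.100.7", 80)
print(marcy, jane)                   # distinct public source ports
print(table.translate_in(marcy[1]))  # Marcy's laptop and her original port
```

Since the device itself allocates the public-facing source port, it can guarantee that each active connection maps to a distinct port — exactly the uniqueness the return path needs.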
What about traffic that looks like replies from public servers, but for which the NAT device doesn’t have a table entry? There are a number of things a device can do for such traffic:
look in a preconfigured table of incoming mappings (usually called “port forwards”) for a prearranged destination — e.g., “send all traffic for port 6667 to the local machine called irc”
send all traffic without matching entries to a specific host (often called the “DMZ host” in the router’s configuration)
attempt to respond to the traffic with a service the NAT device itself is running
inform the responder that the traffic was refused
ignore the traffic completely
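As a hypothetical sketch, the fallback choices above might be dispatched like this (the policy names, the port-forward table, and the addresses are all made up for illustration):

```python
# Sketch of handling inbound traffic with no matching NAT table entry.
PORT_FORWARDS = {6667: "192.168.1.20"}  # e.g. send IRC traffic to "irc"
DMZ_HOST = "192.168.1.99"               # catch-all host, if configured

def handle_unmatched(dport, policy):
    """Decide what to do with unmatched inbound traffic for port dport."""
    if dport in PORT_FORWARDS:        # a preconfigured "port forward"
        return ("forward", PORT_FORWARDS[dport])
    if policy == "dmz":               # hand everything else to the DMZ host
        return ("forward", DMZ_HOST)
    if policy == "local":             # try a service on the NAT device itself
        return ("local", dport)
    if policy == "reject":            # tell the sender it was refused
        return ("reject", None)
    return ("drop", None)             # ignore the traffic completely

print(handle_unmatched(6667, "drop"))  # matches the port-forward table
print(handle_unmatched(22, "dmz"))     # falls through to the DMZ host
```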
Most NAT devices provide some facility to configure which of the above it does on match failures, in addition to some parameters for the private network. Many such devices are running on top of an embedded Linux platform, where you can get NAT set up (assuming you’re already set up to route packets from one interface to another) with one simple invocation on the command line:
sudo iptables -I POSTROUTING -t nat -j MASQUERADE
(In the more modern nftables, things are a bit more complicated, but not by much. Unfortunately, I don’t know of many home router products that have made the jump to nftables from iptables.)
The NAT module is in a position of considerable power over the user’s outbound connections, interposed as it is between the private network and all hosts on the public Internet. If you want to verify that the behavior of iptables or nftables is as you expect, you have a long row to hoe — you need to inspect the binaries that modify the rules governing how the kernel handles incoming traffic (i.e., the iptables or nftables commands themselves), the kernel modules for NAT translation, the kernel code for the iptables or nftables subsystems, and probably a few more things besides. If you don’t already know anything about kernel programming, you’ll probably find the transition nontrivial.
My first interesting job was as a student systems administrator for a fairly heterogeneous group of UNIX servers. For the first many months, I was essentially a clever interface to an array of search engines. I came to have a great appreciation for the common phenomenon of a detailed solution to a very specific problem, laid out beautifully in the personal site of someone I’d never met. I answered a lot of “how on Earth did you figure that out?” with “somebody on the Internet wrote about it”.
I wasn’t courageous enough to run my own blog back then. I didn’t think I had anything to offer, although in retrospect the work I was doing was pretty cool (I had my hands in a few large Linux cluster projects, for example). These little personal blogs I made great use of were written by named people, with their credentials often right on the front page. I didn’t have any credentials, and nobody knew my name; what business did I have claiming to have any answers to anything?
Years later, I used “some random” in my site’s name because I don’t think who I am matters at all. I used “idiot” because I don’t think my credentials matter at all either, and I don’t want anyone else to stay silent because they think these things matter and they don’t have them. There’s nothing that gives me any more permission to write than any other person on the planet, including you.
I started using somerandomidiot.com for both my e-mail and my web presence in early 2014, moving away from yomimono.org, the domain I’d been using since 2006. (If you know a yomimono from elsewhere on the Internet, it’s likely to be me.) There is quite the difference in reaction between these domain names.
Dictating my @yomimono.org email address over the telephone required me to memorize a few variations on the NATO alphabet, and even when dictated correctly was easily typo’d. I got used to people reading my address back as “your user name at…” and then the sound of giving it the ol’ college try and then abandoning the enterprise as hopeless. On the other hand, most English speakers know how to both hear and say “some random idiot”; I’ve never had to repeat it more than once.
In-person conversations were less frustrating, but other English speakers hearing my previous domain name were usually fairly befuddled. On hearing “at some random idiot dot com,” though, most people smile or laugh a bit, which I enjoy. I haven’t yet had to present it in any kind of formal setting where I’m attempting to get people to take me seriously. I wonder whether totally-credible-trustworthy-human.com is available?
For reasons that don’t need exploring at this juncture, I decided to start reading through a bunch of papers on virtualization, and I thought I’d force myself to actually do it by publicly committing to blogging about them.
First on deck is Disco: Running Commodity Operating Systems on Scalable Multiprocessors, a paper from 1997 that itself “brings back an idea popular in the 1970s” — run a small virtualization layer between hardware and multiple virtual machines (referred to in the paper as a virtual machine monitor; “hypervisor” in more modern parlance). Disco was aimed at allowing software to take advantage of new hardware innovations without requiring huge changes in the operating system. I can speculate on a few reasons this paper’s first in the list:
if you have a systems background, most of it is intelligible with some brow-furrowing
it goes into a useful level of detail on the actual work of intercepting, rewriting, and optimizing guest operating systems’ access to hardware resources
the authors went on to found VMware, a massively successful virtualization company
I read the paper intending to summarize it for this blog, but I got completely distracted by the paper’s motivation, which I found both interesting and unexpected.
I tried to have a short summary of my work ready for this conference. I generally told people something like “I did a bunch of testing on the Mirage network stack,” which, while true, is not the most self-aggrandizing way to sum up what I’ve been up to this summer. “I leveraged existing solutions to provide a systematic process for randomized exploration of the potentially underspecified parser inputs,” I could’ve said (which wouldn’t have gotten me very far at this conference, but would fit in great at some others I’ve been to). Or maybe something like “I found a bunch of bugs in variable-length option parsing code with fuzz testing,” which is extremely accurate but also makes people’s eyes glaze over about halfway through.
Of course, you can just check my permanent record. That record does elide this blog, however, which is not an insignificant product of work:
How to Set the Evil Bit – generating, understanding, and using a TCP options-parsing bug to crash Mirage unikernels; an explanation of the underlying bounds checking problem that makes this bug potentially lead to data leakage
Well, I didn’t really make anything cool and new. I want to do that. I want to be able to make VPN tunnels from a unikernel running on an embedded ARM machine to a unikernel running somewhere in the public cloud and shove all of my traffic over them. I want to make a webapp that serves you the worst possible next Tetris piece given a certain game state. I want to replace my Linux Git backup host with a unikernel that squirrels everything away into an Irmin store. I want to figure out a patch for the bug @frioux reported earlier this week, at the very least!
In short, OPW round 8 is over, but I’m not done playing here yet.
Looks Like Fun; Can I Try?
OPW Round 9 has an application deadline of October 22, so there’s still loads of time to get in on that. For excellent advice on applying to and participating in OPW, check out Lita Cho’s roundup, and if you still want more, there’s a whole pile of great blogs aggregated at the Women in Free Software Planeteria. Many other interns are also posting retrospectives this week, so it’s a fine time to go learn from other people’s mistakes.
In that spirit, here are some mistakes I made:
not bothering my mentors enough
not contacting the other OPW intern on my project
not asking enough questions on the mailing list and IRC
choosing a project that required a lot of domain knowledge I already had, so I wasn’t forced to learn as much as I would have if I’d chosen something else
beginning to address some ongoing mental health issues at the end of the round; I likely would’ve found many things to be easier had I done so earlier
A short note: someone very bravely began a conversation about mental health in the Hacker School community earlier this year. This conversation led me to two blog posts that have been extremely helpful, and perhaps you will find them to be helpful too:
Julia Evans, prolific blogger and rad person, gave me several kind comments on the “Why I Unikernel” posts (security, self-hosting). She also asked, quite reasonably, whether I’d written a high-level summary of how I host my blog from a unikernel. “No, but I should,” I said, and unlike most times I say I should do something, I actually did it.
Here’s the very-high-level overview:
use brain to generate content that some human, somewhere, might want to read (hardest step)
write all that stuff in Markdown
use Octopress to generate a static site from that Markdown
use Mirage to build a unikernel with the blog content
upload the unikernel to an EC2 instance running Linux
build a new EC2 instance from the uploaded unikernel
make sure the newly generated instance looks like my website with the new content
shut down the Linux host that made the new EC2 instance
make somerandomidiot.com point to the new EC2 instance
kill the EC2 instance which previously served somerandomidiot.com