git (and its distributed version control system friends
darcs) have some great properties. Not only do you get a full history of changes on objects stored in them, you can get comments on changes, as well as branching and merging, which lets you do intermediate changes without messing up state for other entities which want to work with the repository.
That’s all pretty cool. I actually want that for some of my data structures, come to think of it. Say, for example, a boring ol’ key-value store which can be updated from a few different threads – in this case, a cache that stores values it gets from the network and the querying/timeout code around it. It would be nice if each thread could make a new branch, make its changes, then merge them into the primary branch once it’s done.
It turns out you can totally do that with Irmin, “the database that never forgets”! I did (and am still doing) a bit of work on sticking a modified version of the MirageOS address resolution protocol code’s data structures into Irmin:
$ git log --all --decorate --oneline --graph
* 68216f3 (HEAD, primary, expire_1429688434.645130) Arp.tick: updating to age out old entries
* ec10c9a entry added: 192.168.3.1 -> 02:50:2a:16:6d:01
* 6446cef entry added: 10.20.254.2 -> 02:50:2a:16:6d:01
* 81cfa43 entry added: 10.50.20.22 -> 02:50:2a:16:6d:01
* 4e1e1c7 Arp.tick: merge expiry branch
| * cd787a0 (expire_1429688374.601896) Arp.tick: updating to age out old entries
* | 8df2ef7 entry added: 10.23.10.1 -> 02:50:2a:16:6d:01
* 8d11bba Arp.create: Initial empty cache
When last we spoke, I left you with a teaser about writing your own NAT implementation.
iptables (and friends
pf, to be a little less partisan and outdated) provide the interfaces to the kernel modules that implement NAT in many widely-used routers. If we wanted to implement our own in a traditional OS, we’d have to either take a big dive into kernel programming or find a way to manipulate packets at the Ethernet layer in userspace.
But if all we need to do is NAT traffic, why not just build something that only knows how to NAT traffic? I’ve looked at building networked applications on top of (and with) the full network stack provided by the MirageOS library OS a lot, but we can also build lower-level applications with fundamentally the same programming tactics and tools we use to write, for example, DNS resolvers.
Building A Typical Stack From Scratch
Let’s have a look at the
ethif-v4 example in the mirage-skeleton example repository. This example unikernel shows how to build a network stack “by hand” from a bunch of different functors, starting from a physical device (provided by
config.ml at build time, representing either a Xen backend if you configure with
mirage configure --xen or a Unix tuntap backend if you build with
mirage configure --unix). I’ve reproduced the network setup bits from the most recent version as of now and annotated them a bit:
WiFi is fairly ubiquitous in 2015. In most of the nonprofessional contexts in which we use it, it’s provided by a small box that’s plugged into mains power and an Ethernet cable, usually with an antenna or two sticking out of it. I’ve heard these boxes called all kinds of things - hotspots, middleboxes, edge routers, home routers, NAT devices, gateways, and probably a few more I’ve forgotten; there are surely more I haven’t heard.
My first interesting job was as a student systems administrator for a fairly heterogenous group of UNIX servers. For the first many months, I was essentially a clever interface to an array of search engines. I came to have a great appreciation for the common phenomenon of a detailed solution to a very specific problem, laid out beautifully in the personal site of someone I’d never met. I answered a lot of “how on Earth did you figure that out?
For reasons that don’t need exploring at this juncture, I decided to start reading through a bunch of papers on virtualization, and I thought I’d force myself to actually do it by publicly committing to blogging about them.
First on deck is Disco: Running Commodity Operating Systems on Scalable Multiprocessors, a paper from 1997 that itself “brings back an idea popular in the 1970s” – run a small virtualization layer between hardware and multiple virtual machines (referred to in the paper as a virtual machine monitor; “hypervisor” in more modern parlance). Disco was aimed at allowing software to take advantage of new hardware innovations without requiring huge changes in the operating system. I can speculate on a few reasons this paper’s first in the list:
- if you have a systems background, most of it is intelligible with some brow-furrowing
- it goes into a useful level of detail on the actual work of intercepting, rewriting, and optimizing host operating systems’ access to hardware resources
- the authors went on to found VMware, a massively successful virtualization company
I read the paper intending to summarize it for this blog, but I got completely distracted by the paper’s motivation, which I found both interesting and unexpected.