Mirage

In honor of the transnational strike on Amazon this week, here are instructions for moving your AWS unikernels to a cloud that used to claim it wasn’t evil. You might also be interested in establishing a picket line for your packets.

This blog originally ran on Amazon EC2. Since early 2017, it’s been running on a different tech behemoth’s massive public cloud. The deployment process is considerably easier and faster on this alternative public cloud – I first saw it as a live demo given by Michael Bright and immediately knew I wanted to replace my AWS pipeline with it. My AWS unikernel deployments required a secondary Linux host for building AMIs from a kernel image and usually took around 20 minutes from start to finish; GCP deployments can be done from my development host and take around 90 seconds.

MirageOS is a collection of libraries and a system for assembling them into unikernels. What happens if you want to make changes to those libraries and test them with a new unikernel?

Say, for example, I have a static website (like this blog) that I build in MirageOS. I want to make some changes to the TCP implementation against which the blog is built. In order to do that, I need to do all the following:

figure out which module to change
figure out which package provides that module
get the source for that package and instruct the package manager to use it instead of the release
make changes
reinstall the package with your changes
rebuild the unikernel completely
see whether changes had the desired effect

Here’s a quick primer on how.

As a result of great encouragement from colleagues and friends, I gave a few talks in September.

Persistent Networking with Irmin and MirageOS, which I gave at the OCaml Workshop, is a talk on sticking a persistent database into various levels of the network stack. (It includes demonstrations from What a Distributed, Version-Controlled ARP Cache Gets You, as well as an Irmin-ified NAT device that I haven’t yet written up here.) The slides for my OCaml Workshop talk are also available.

git (and its distributed version control system friends hg and darcs) have some great properties. Not only do you get a full history of changes on objects stored in them, you can get comments on changes, as well as branching and merging, which lets you do intermediate changes without messing up state for other entities which want to work with the repository.

That’s all pretty cool. I actually want that for some of my data structures, come to think of it. Say, for example, a boring ol’ key-value store which can be updated from a few different threads – in this case, a cache that stores values it gets from the network and the querying/timeout code around it. It would be nice if each thread could make a new branch, make its changes, then merge them into the primary branch once it’s done.

It turns out you can totally do that with Irmin, “the database that never forgets”! I did (and am still doing) a bit of work on sticking a modified version of the MirageOS address resolution protocol code’s data structures into Irmin:

$ git log --all --decorate --oneline --graph
* 68216f3 (HEAD, primary, expire_1429688434.645130) Arp.tick: updating to age out old entries
* ec10c9a entry added: 192.168.3.1 -> 02:50:2a:16:6d:01
* 6446cef entry added: 10.20.254.2 -> 02:50:2a:16:6d:01
* 81cfa43 entry added: 10.50.20.22 -> 02:50:2a:16:6d:01
*   4e1e1c7 Arp.tick: merge expiry branch
|\  
| * cd787a0 (expire_1429688374.601896) Arp.tick: updating to age out old entries
* | 8df2ef7 entry added: 10.23.10.1 -> 02:50:2a:16:6d:01
|/  
* 8d11bba Arp.create: Initial empty cache

When last we spoke, I left you with a teaser about writing your own NAT implementation. iptables (and friends nftables and pf, to be a little less partisan and outdated) provide the interfaces to the kernel modules that implement NAT in many widely-used routers. If we wanted to implement our own in a traditional OS, we’d have to either take a big dive into kernel programming or find a way to manipulate packets at the Ethernet layer in userspace.

But if all we need to do is NAT traffic, why not just build something that only knows how to NAT traffic? I’ve looked at building networked applications on top of (and with) the full network stack provided by the MirageOS library OS a lot, but we can also build lower-level applications with fundamentally the same programming tactics and tools we use to write, for example, DNS resolvers.

Building A Typical Stack From Scratch

Let’s have a look at the ethif-v4 example in the mirage-skeleton example repository. This example unikernel shows how to build a network stack “by hand” from a bunch of different functors, starting from a physical device (provided by config.ml at build time, representing either a Xen backend if you configure with mirage configure --xen or a Unix tuntap backend if you build with mirage configure --unix). I’ve reproduced the network setup bits from the most recent version as of now and annotated them a bit:

Julia Evans, prolific blogger and rad person, gave me several kind comments on the “Why I Unikernel” posts (security, self-hosting). She also asked, quite reasonably, whether I’d written a high-level summary of how I host my blog from a unikernel. “No, but I should,” I said, and unlike most times I say I should do something, I actually did it.

Here’s the very-high-level overview:

use brain to generate content that some human, somewhere, might want to read (hardest step)
write all that stuff in Markdown
use Octopress to generate a static site from that Markdown
use Mirage to build a unikernel with the blog content
upload the unikernel to an EC2 instance running Linux
build a new EC2 instance from the uploaded unikernel
make sure that newly generated instance looks like my website with new content
shut down the Linux host that made the new EC2 instance
make somerandomidiot.com point to the new EC2 instance
kill the EC2 instance which previously served somerandomidiot.com

And below, one can find the gory details.

Having a machine capable of executing arbitrary instructions on the public Internet is a responsibility, and it’s a fairly heavy one to assume just to run a blog. Some people solve this by letting someone else take care of it – GitHub, Tumblr, or Medium, for example. I’m not so keen on that solution for a number of reasons, almost none of which are Internet-old-person crankery.

First, and most emotionally: as dumb as my thoughts are, they’re mine. Not GitHub’s or Medium’s or any other group’s. Most entities on the web don’t host user content out of the goodness of their heart; they’re getting something out of it, and it’s likely that they’re getting more out of it than the user is. I’m reminded of the old MetaFilter maxim: “If you’re not paying for it, you’re not the consumer, you’re the product.” Either someone’s making money off of you now or they plan to do it later. I don’t want to encourage that kind of behavior. I just want to write things that people can read about how to make stuff work.

Before I started this blog, I had started a few others at my other domain (now moribund). Despite repeated attempts, I never could resign myself to doing systems administration for a web server that executed dynamic code, like that which powers WordPress or Drupal; I’d install such a framework, begin locking the site down, realize that I’d spent a lot of time reassuring myself that the site was secure without believing it for a second, then delete the framework and revert the frontpage to an index.html rather like what’s present there now. Particularly ambitious iterations would get a post or two published before this cycle completed, now long-vanished.

It’s Northern Hemisphere summer right now, and in Wisconsin we’re having one of the loveliest ones I can remember. Today the temperature is hovering right at pleasant, there are high clouds blowing across the sky, the breeze is soothing, and birds are singing all over the place. It is not, in short, programming weather. It is sit-outside, read-a-novel, do-nothing weather.

Sunbeams stream through the leaves of a large tree, beneath which is a bicycle. — Yes, this sort of thing.

We don’t often let our programs slack off, even when we let ourselves take a peaceful day. I got to wondering (staring off into space, watching the shadows cast by sun-dappled leaves) what the most trivial, do-nothing Mirage project would look like, and how it could be constructed with a minimum of activity and a maximum of understanding.

[] dothraki@iBook:~$ mkdir trivial
[] dothraki@iBook:~$ cd trivial/
[] dothraki@iBook:~/trivial$ ls -alh
total 16K
drwxrwxr-x   2 dothraki dothraki 4.0K Jul 23 13:17 .
drwxr-xr-x 161 dothraki dothraki  12K Jul 23 13:17 ..
[] dothraki@iBook:~/trivial$ mirage configure --xen
[ERROR]      No configuration file config.ml found.
You'll need to create one to let Mirage know what to do.

Okay, we’ll have to do at least one thing to make this work. Mirage uses config.ml to programmatically generate a Makefile and main.ml when you invoke mirage --configure. main.ml uses instructions from config.ml to satisfy module types representing driver requirements for your application, then begins running the threads you requested that it run. That all sounds an awful lot like work; maybe we can get away with not asking for anything.

[] dothraki@iBook:~/trivial$ touch config.ml
[] dothraki@iBook:~/trivial$ mirage configure --xen
Mirage      Using scanned config file: config.ml
Mirage      Processing: /home/dothraki/trivial/config.ml
Mirage      => rm -rf /home/dothraki/trivial/_build/config.*
Mirage      => cd /home/dothraki/trivial && ocamlbuild -use-ocamlfind -tags annot,bin_annot -pkg mirage config.cmxs
empty       Using configuration: /home/dothraki/trivial/config.ml
empty       0 jobs []
empty       => ocamlfind printconf path
empty       Generating: main.ml
empty       Now run 'make depend' to install the package dependencies for this unikernel.
[] dothraki@iBook:~/trivial$ ls
_build  config.ml  empty.xl  log  main.ml  Makefile

That seems like a great start! Maybe we can trivially achieve our dream of doing nothing.

[] dothraki@iBook:~/trivial$ make depend
opam install mirage-xen --verbose
[NOTE] Package mirage-xen is already installed (current version is 1.1.1).

Resting on our laurels. Excellent. (In keeping with the lazy theme of this post, I’ll elide the make depend step from future examples, but if you’re playing along at home you may discover that you need to run it when you introduce new complexity in pursuit of perfect non-action.)

[] dothraki@iBook:~/trivial$ make
ocamlbuild -classic-display -use-ocamlfind -pkgs lwt.syntax,mirage-types.lwt -tags "syntax(camlp4o),annot,bin_annot,strict_sequence,principal" -cflag -g -lflags -g,-linkpkg,-dontlink,unix main.native.o
ocamlfind ocamldep -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -modules main.ml > main.ml.depends
ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
+ ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
File "main.ml", line 8, characters 2-13:
Error: Unbound module OS
Command exited with code 2.
make: *** [main.native.o] Error 10
[] dothraki@iBook:~/trivial$

Oh, bother.

Our mission: fuzzing TCP options from scapy.

Our target: the echo service from mirage-tcpip/examples/services.ml.

Outcome: a revision on a widely-used OCaml dependency, gleeful murder and resurrection of several EC2 instances, something to brag to my mom about, a look at a case worse than failure, and great justice.

Mirage

Ditch That AWS Build Host

A Quick Guide to Quick Changes in MirageOS

OCaml Workshop and Strange Loop Talks

What a Distributed, Version-Controlled ARP Cache Gets You

Let's Play Network Address Translation: The Home Game

Building A Typical Stack From Scratch

I Am Unikernel (And So Can You!)

My Content is Mine: Why I Unikernel, Part 2

Attack Surface: Why I Unikernel, Part 1

Doing Nothing in Mirage

Yes, this sort of thing.

How to Set the Evil Bit