I’ve been talking about network address translation here for a while, including instructions on building your own NAT device with MirageOS. The library behind those posts, mirage-nat, went on to back talex5’s unikernel firewall for QubesOS, but was unreleased and essentially unmaintained between late 2015 and early 2017. At the March 2017 MirageOS hack retreat in Marrakesh, talex5 convinced me to do some much-needed maintenance on this library. After having let it age between March and October, I was persuaded to release a version with the hippest latest build system last week.
Crowbar is a tool that combines afl-persistent’s instrumentation with quickcheck-like property-based testing. afl-fuzz is a great tool for detecting crashes, but Crowbar helps us go a step farther and automatically discover inputs which cause our program to no longer have the properties we expect it to have. For reasons that don’t need exploring at this juncture, I first thought to apply Crowbar to charrua-client, a library which implements the DHCP state machine from a client perspective.
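The shape of the technique is worth seeing on its own. Below is a toy property-based tester in plain OCaml — this is *not* Crowbar's API, just a sketch of the idea it automates (Crowbar additionally uses afl's instrumentation to steer input generation); all the names here are made up for illustration:

```ocaml
(* A toy property-based tester, sketching the idea behind Crowbar:
   random inputs plus a property, rather than hand-picked test cases. *)
let reverse_twice_is_id (l : int list) =
  List.rev (List.rev l) = l

(* Generate a random int list of length up to [n]. *)
let random_list n =
  List.init (Random.int (n + 1)) (fun _ -> Random.int 1000)

(* Run the property on [trials] random inputs; return the first
   counterexample found, if any. *)
let check_property ?(trials = 1000) prop =
  let rec go i =
    if i >= trials then None
    else
      let input = random_list 10 in
      if prop input then go (i + 1) else Some input
  in
  go 0

let () =
  match check_property reverse_twice_is_id with
  | None -> print_endline "property held on all trials"
  | Some l -> Printf.printf "counterexample of length %d\n" (List.length l)
```

When a counterexample turns up, you get a concrete input that violates the property — much more actionable than a crash with no context.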
Say, for example, I have a static website (like this blog) that I build in MirageOS. I want to make some changes to the TCP implementation against which the blog is built. In order to do that, I need to do all the following:
- figure out which module to change
- figure out which package provides that module
- get the source for that package and instruct the package manager to use it instead of the release
- make changes
- reinstall the package with the changes
- rebuild the unikernel completely
- see whether changes had the desired effect
Here’s a quick primer on how.
As a result of great encouragement from colleagues and friends, I gave a few talks in September. Persistent Networking with Irmin and MirageOS, delivered at the OCaml Workshop, is a talk on sticking a persistent database into various levels of the network stack. (It includes demonstrations from What a Distributed, Version-Controlled ARP Cache Gets You, as well as an Irmin-ified NAT device that I haven’t yet written up here.)
Most instructions on how to get started with OCaml packages now advise the user to get started with opam, which is excellent advice. Getting up and running with opam is pretty easy, but I wasn’t sure where to go from there when I wanted to modify other people’s packages and use the modifications in my environment. I wish I’d realized that the documentation for making packages has a lot of applicable advice for that use case, as well as the apparent target (making your own packages from scratch).
git (and its distributed version control system friends, like darcs) have some great properties. Not only do you get a full history of changes on objects stored in them, you can get comments on changes, as well as branching and merging, which lets you do intermediate changes without messing up state for other entities which want to work with the repository.
That’s all pretty cool. I actually want that for some of my data structures, come to think of it. Say, for example, a boring ol’ key-value store which can be updated from a few different threads – in this case, a cache that stores values it gets from the network and the querying/timeout code around it. It would be nice if each thread could make a new branch, make its changes, then merge them into the primary branch once it’s done.
It turns out you can totally do that with Irmin, “the database that never forgets”! I did (and am still doing) a bit of work on sticking a modified version of the MirageOS address resolution protocol code’s data structures into Irmin:
```
$ git log --all --decorate --oneline --graph
* 68216f3 (HEAD, primary, expire_1429688434.645130) Arp.tick: updating to age out old entries
* ec10c9a entry added: 192.168.3.1 -> 02:50:2a:16:6d:01
* 6446cef entry added: 10.20.254.2 -> 02:50:2a:16:6d:01
* 81cfa43 entry added: 10.50.20.22 -> 02:50:2a:16:6d:01
*   4e1e1c7 Arp.tick: merge expiry branch
|\
| * cd787a0 (expire_1429688374.601896) Arp.tick: updating to age out old entries
* | 8df2ef7 entry added: 10.23.10.1 -> 02:50:2a:16:6d:01
|/
* 8d11bba Arp.create: Initial empty cache
```
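The branch-then-merge workflow in that log can be sketched with a toy in-memory store. This is an illustration of the idea only — not Irmin's actual API — and `Store`, `branch`, and `merge` are names I've made up here:

```ocaml
(* A toy branchable key-value store: each branch is an immutable map,
   and merging folds a branch's entries back into the primary,
   preferring the branch's values on conflict. *)
module Store = Map.Make (String)

(* Starting a branch is just taking the current snapshot. *)
let branch primary = primary

(* Merge: entries from the side branch win on conflict. *)
let merge ~primary ~branch =
  Store.union (fun _key _old fresh -> Some fresh) primary branch

let () =
  let primary = Store.(empty |> add "10.0.0.1" "02:50:2a:16:6d:01") in
  (* a thread branches, records a new entry, then merges back *)
  let b = branch primary |> Store.add "10.0.0.2" "02:50:2a:16:6d:02" in
  let primary = merge ~primary ~branch:b in
  assert (Store.cardinal primary = 2)
```

Irmin gives you the same shape of operation, plus the persistence, history, and user-definable merge functions that immutable maps alone don't.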
When last we spoke, I left you with a teaser about writing your own NAT implementation.
iptables (and friends like pf, to be a little less partisan and outdated) provide the interfaces to the kernel modules that implement NAT in many widely-used routers. If we wanted to implement our own in a traditional OS, we’d have to either take a big dive into kernel programming or find a way to manipulate packets at the Ethernet layer in userspace.
But if all we need to do is NAT traffic, why not just build something that only knows how to NAT traffic? I’ve looked at building networked applications on top of (and with) the full network stack provided by the MirageOS library OS a lot, but we can also build lower-level applications with fundamentally the same programming tactics and tools we use to write, for example, DNS resolvers.
Building A Typical Stack From Scratch
Let’s have a look at the `ethif-v4` example in the mirage-skeleton example repository. This example unikernel shows how to build a network stack “by hand” from a bunch of different functors, starting from a physical device (provided by `config.ml` at build time, representing either a Xen backend if you configure with `mirage configure --xen` or a Unix tuntap backend if you build with `mirage configure --unix`). I’ve reproduced the network setup bits from the most recent version as of now and annotated them a bit:
Julia Evans, prolific blogger and rad person, gave me several kind comments on the “Why I Unikernel” posts (security, self-hosting). She also asked, quite reasonably, whether I’d written a high-level summary of how I host my blog from a unikernel. “No, but I should,” I said, and unlike most times I say I should do something, I actually did it.
Here’s the very-high-level overview:
- use brain to generate content that some human, somewhere, might want to read (hardest step)
- write all that stuff in Markdown
- use Octopress to generate a static site from that Markdown
- use Mirage to build a unikernel with the blog content
- upload the unikernel to an EC2 instance running Linux
- build a new EC2 instance from the uploaded unikernel
- make sure that newly generated instance looks like my website with new content
- shut down the Linux host that made the new EC2 instance
- make somerandomidiot.com point to the new EC2 instance
- kill the EC2 instance which previously served somerandomidiot.com
And below, one can find the gory details.
Having a machine capable of executing arbitrary instructions on the public Internet is a responsibility, and it’s a fairly heavy one to assume just to run a blog. Some people solve this by letting someone else take care of it – GitHub, Tumblr, or Medium, for example. I’m not so keen on that solution for a number of reasons, almost none of which are Internet-old-person crankery. First, and most emotionally: as dumb as my thoughts are, they’re mine.
Before I started this blog, I had started a few others at my other domain (now moribund). Despite repeated attempts, I never could resign myself to doing systems administration for a web server that executed dynamic code, like that which powers WordPress or Drupal; I’d install such a framework, begin locking the site down, realize that I’d spent a lot of time reassuring myself that the site was secure without believing it for a second, then delete the framework and revert the frontpage to an index.
It’s Northern Hemisphere summer right now, and in Wisconsin we’re having one of the loveliest ones I can remember. Today the temperature is hovering right at pleasant, there are high clouds blowing across the sky, the breeze is soothing, and birds are singing all over the place. It is not, in short, programming weather. It is sit-outside, read-a-novel, do-nothing weather.
We don’t often let our programs slack off, even when we let ourselves take a peaceful day. I got to wondering (staring off into space, watching the shadows cast by sun-dappled leaves) what the most trivial, do-nothing Mirage project would look like, and how it could be constructed with a minimum of activity and a maximum of understanding.
```
dothraki@iBook:~$ mkdir trivial
dothraki@iBook:~$ cd trivial/
dothraki@iBook:~/trivial$ ls -alh
total 16K
drwxrwxr-x   2 dothraki dothraki 4.0K Jul 23 13:17 .
drwxr-xr-x 161 dothraki dothraki  12K Jul 23 13:17 ..
dothraki@iBook:~/trivial$ mirage configure --xen
[ERROR] No configuration file config.ml found.
        You'll need to create one to let Mirage know what to do.
```
Okay, we’ll have to do at least one thing to make this work. Mirage uses `config.ml` to programmatically generate a `main.ml` when you invoke `mirage configure`. `main.ml` uses instructions from `config.ml` to satisfy module types representing driver requirements for your application, then begins running the threads you requested that it run. That all sounds an awful lot like work; maybe we can get away with not asking for anything.
```
dothraki@iBook:~/trivial$ touch config.ml
dothraki@iBook:~/trivial$ mirage configure --xen
Mirage Using scanned config file: config.ml
Mirage Processing: /home/dothraki/trivial/config.ml
Mirage => rm -rf /home/dothraki/trivial/_build/config.*
Mirage => cd /home/dothraki/trivial && ocamlbuild -use-ocamlfind -tags annot,bin_annot -pkg mirage config.cmxs
empty Using configuration: /home/dothraki/trivial/config.ml
empty 0 jobs
empty => ocamlfind printconf path
empty Generating: main.ml
empty Now run 'make depend' to install the package dependencies for this unikernel.
dothraki@iBook:~/trivial$ ls
_build  config.ml  empty.xl  log  main.ml  Makefile
```
That seems like a great start! Maybe we can trivially achieve our dream of doing nothing.
```
dothraki@iBook:~/trivial$ make depend
opam install mirage-xen --verbose
[NOTE] Package mirage-xen is already installed (current version is 1.1.1).
```
Resting on our laurels. Excellent. (In keeping with the lazy theme of this post, I’ll elide the `make depend` step from future examples, but if you’re playing along at home you may discover that you need to run it when you introduce new complexity in pursuit of perfect non-action.)
```
dothraki@iBook:~/trivial$ make
ocamlbuild -classic-display -use-ocamlfind -pkgs lwt.syntax,mirage-types.lwt -tags "syntax(camlp4o),annot,bin_annot,strict_sequence,principal" -cflag -g -lflags -g,-linkpkg,-dontlink,unix main.native.o
ocamlfind ocamldep -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -modules main.ml > main.ml.depends
ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
+ ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
File "main.ml", line 8, characters 2-13:
Error: Unbound module OS
Command exited with code 2.
make: *** [main.native.o] Error 10
dothraki@iBook:~/trivial$
```
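For reference, a non-empty `config.ml` of that era looked something like the sketch below. This is an assumption about the Mirage 1.x configuration API rather than a tested example; `register` names the unikernel and lists the jobs it should run — here, none at all:

```ocaml
(* A minimal config.ml sketch for Mirage 1.x (illustrative, untested):
   register the unikernel under a name, with an empty list of jobs. *)
open Mirage

let () = register "trivial" []
```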
Our mission: fuzzing TCP options.
Our target: the echo service.
Outcome: a revision on a widely-used OCaml dependency, gleeful murder and resurrection of several EC2 instances, something to brag to my mom about, a look at a case worse than failure, and great justice.
What Even Is TCP Anyway
Here’s the lazy way of explaining it: TCP is the abstraction layer that allows you to pretend that network communication works in a logical, orderly, reliable fashion when you’re writing an application. Reading data and having it always be in the order it was sent? TCP. Being able to know whether a connection is open or closed? TCP. Knowing the difference between data coming from two separate processes on the same remote host? TCP. (There are other ways to get these guarantees, but the vast majority of Internet traffic that needs them gets them via TCP.)
On a less abstract level, TCP is a header (one of several!) that your operating system slaps on your network traffic before shipping it over the wire, on the way to its final destination. For damn near all the information on TCP you can shake a stick at, you can consult RFC 793 directly. The header summary, most relevant for our exploration, is reproduced below:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
Everything here is a fixed-length field except for `options`, `padding`, and `data`, all of which are optional. `data` is up to the application, when it’s present (and is also frequently referred to as the `payload`). When you loaded this web page, TCP packets were sent from my server at somerandomidiot.com to your computer, and the contents of the `data` field were these very words that you’re reading right now. TCP is data-agnostic; it only cares that your payload arrives intact, not what’s in it.

`Options`, on the other hand, are very much TCP’s concern.
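To make the fixed-length fields concrete, here's a small OCaml sketch that pulls a few of them out of a raw header. This is a hand-rolled parser for illustration, not mirage-tcpip's actual code; the offsets come straight from the diagram above:

```ocaml
(* Extract a few TCP header fields from raw bytes (illustrative only).
   Multi-byte fields on the wire are big-endian. *)
let u8 b i = Char.code (Bytes.get b i)
let u16 b i = (u8 b i lsl 8) lor u8 b (i + 1)

let src_port b = u16 b 0
let dst_port b = u16 b 2

(* Data offset: the high 4 bits of byte 12, counted in 32-bit words.
   Anything between [data_offset b] and the payload is options. *)
let data_offset b = (u8 b 12 lsr 4) * 4

let () =
  (* a minimal 20-byte header: src port 80, dst port 49152, offset 5 *)
  let b = Bytes.make 20 '\000' in
  Bytes.set b 1 (Char.chr 80);      (* source port = 80 *)
  Bytes.set b 2 (Char.chr 0xC0);    (* destination port = 0xC000 = 49152 *)
  Bytes.set b 12 (Char.chr 0x50);   (* data offset = 5 words = 20 bytes *)
  assert (src_port b = 80);
  assert (dst_port b = 49152);
  assert (data_offset b = 20)
```

A data offset greater than 20 bytes is precisely the signal that options are present — which is why that nibble matters so much when fuzzing them.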
Looking into some of the results from last week’s fuzzing session, I noticed something interesting:
```
$ tcpdump -r experimenting_with_pathoc.pcap 'src host 192.168.2.24 and tcp[13] & 1 != 0'
reading from file experimenting_with_pathoc.pcap, link-type EN10MB (Ethernet)
$
```
Let’s translate that into human.
- `tcpdump -r experimenting_with_pathoc.pcap`: use tcpdump to read an existing packet trace named `experimenting_with_pathoc.pcap`.
- `src host 192.168.2.24`: show me only packets that were sent by `192.168.2.24`, which is the IP address of a running unikernel that’s serving web pages on port 80.
- `tcp[13] & 1 != 0`: of the packets sent by `192.168.2.24`, show me only those where the least significant bit of byte 13 of the TCP header is not zero. Byte 13 of the TCP header is designated for flags relevant to how the packet should be processed by the TCP state machine, and the least significant bit corresponds to the `FIN` flag, used to initiate graceful connection closures.

All together, “show me all the packets sent by 192.168.2.24 which initiated a graceful connection closure.”
tcpdump helpfully shows us… all zero such packets in the trace.
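The bit-twiddling in that filter is easy to mirror in OCaml; here's a sketch, with the field offsets taken from RFC 793 rather than from any particular library's API:

```ocaml
(* The TCP flags live in the byte at offset 13 of the header;
   FIN is the least significant bit (RFC 793). *)
let flags_byte header = Char.code (Bytes.get header 13)
let is_fin header = flags_byte header land 0x01 <> 0

let () =
  let h = Bytes.make 20 '\000' in
  assert (not (is_fin h));
  Bytes.set h 13 (Char.chr 0x11);   (* FIN + ACK, a typical close *)
  assert (is_fin h)
```

tcpdump's `tcp[13] & 1 != 0` is exactly `is_fin` applied to every captured packet.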
This isn’t necessarily wrong for a webserver implementing HTTP/1.1, which defaults to persistent connections:
> 8.1.2 Overall Operation
>
> A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.
>
> – RFC 2616
So let’s make something that will try to initiate a connection closure.
Officially, my job for the summer is to help shore up the network stack in Mirage, in part by running the current code through its paces, and in part through implementing some new functionality. This first week, I continued some work I started at the end of Hacker School - figuring out how to fuzz some strange (and not-so-strange) corners, and how to wrangle the data I got out of doing so.
Fuzz What Now?
Let’s step back. Way, way, way back.
If you’re a computer program, and you have some data that you care about, your data is likely in some kind of structure reflecting an underlying order to that data. Objects are a common way to organize this stuff; dictionaries, hashmaps, lists, arrays, trees, the list goes on. That’s all well and good when your program is running, keeping all this stuff in memory. But it happens depressingly often that you need to dump this stuff to permanent storage, or express it in some way to some other program or another computer, or represent it on the screen because something awful has happened, and you can’t just say “memory address 0x52413abd, memory address 0x52413cda, memory address 0x52413ea2” - these things are meaningless outside the context of the current run of that program.
So we have serialization, the high-level concept for the jillion different ways to take that data and put it in a string, or a binary data format, so something else can read that string and reassemble the structure of the data. That’s deserialization, which implies parsing; parsing is a pretty big deal.
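As a tiny example of that round trip, here's a made-up textual format for a list of ints, with its serializer and parser (names and format are invented for illustration):

```ocaml
(* A toy serialization format: ints joined by commas. The whole point
   of the pair is the round trip: deserialize (serialize x) = x. *)
let serialize (l : int list) =
  String.concat "," (List.map string_of_int l)

let deserialize (s : string) : int list =
  if s = "" then []
  else List.map int_of_string (String.split_on_char ',' s)

let () =
  let original = [3; 1; 4; 1; 5] in
  assert (deserialize (serialize original) = original)
```

Note how much trust `deserialize` places in its input: hand it `"3,1,banana"` and `int_of_string` raises. That fragility is exactly what the rest of this post is about.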
When the data you’re attempting to assemble into a structure is as you expect it and everything is correct, parsing’s no problem. But it frequently happens that everything is not as you expect it, for any number of reasons - the programmer who made the program that made the message made a mistake; the programmer who made the program that reads the message made a mistake; the programs reading and writing the message are using different versions of the specification in the first place; the specification wasn’t specific about whether the third byte’s range from 0 to 5 was inclusive or exclusive and each programmer made a different decision; both programs agree, but the message was corrupted in transit; the message was corrupted in transit, and one program has implemented a different corruption recovery algorithm than the other… I’ll stop now, but I could keep going for a long time.
There are a lot of bad messages out there. It’s hard to make your parser do the right thing when it receives an arbitrary bad message. It can be hard to even know that your parser does the wrong thing when it receives an arbitrary bad message - if you thought of a certain kind of bad message to use in testing, of course you fixed your code to deal with it; you thought of it! But there are almost certainly loads more bad messages out there than the ones you thought of - both by chance, and by design.
If humans can’t make enough bad messages, maybe computers can. Randomly generating a whole mess of bad messages, sending them to your program, and seeing what happens is called fuzz testing, and it’s awesome.
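A caveman version of the idea fits in a few lines of OCaml — random bytes thrown at a deliberately fragile parser, standing in for a real fuzzer like afl-fuzz (the parser here is invented for the demonstration):

```ocaml
(* A deliberately fragile "parser": it assumes every message is at
   least 4 bytes long. [String.sub] raises on shorter input. *)
let parse (msg : string) = String.sub msg 0 4

(* Throw random messages of random lengths at it; count the crashes. *)
let fuzz trials =
  let crashes = ref 0 in
  for _ = 1 to trials do
    let len = Random.int 8 in
    let msg = String.init len (fun _ -> Char.chr (Random.int 256)) in
    match parse msg with
    | _ -> ()
    | exception Invalid_argument _ -> incr crashes
  done;
  !crashes

let () =
  Printf.printf "%d crashes out of 1000 random messages\n" (fuzz 1000)
```

Real fuzzers are far smarter about *which* inputs to try next, but the feedback loop — generate, feed, observe — is the same.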
```
$ ec2-get-console-output --region the-best-region i-0123abcd
2014-04-03T16:42:58+0000 Xen Minimal OS!
```
In that time, I’ve done some stuff:
- submitted a successful pull request to Mirage
- made a permanent home for Secret Project Glow Cloud’s code
- successfully applied to the Outreach Program for Women to work on Mirage some more
- broke and fixed my OCaml development environment repeatedly
- sang in public in front of other Hacker Schoolers
- looked at a lot of neat objects and buildings
- started work on a fuzzing framework to scratch my own itch for testing network clients
I figured it was time to tell you about some of it, but first I did some other stuff:
- upgraded some packages on my build machine
- broke the build on my blog
- learned about how Mirage makefiles are generated by trying to get mine working again
You’d rather hear about all of that, right?
> “This is a pretty strange piece of code, and it may take a few moments of thought to figure out what’s going on.”
>
> – Real World OCaml
A few weeks ago, fellow Hacker Schooler Chen Lin and I were trying to solve a simple graph problem in Haskell. I was all ready to charge forward with something quite like the Java implementation I learned back in undergrad, but my fellow Hacker Schooler had some hesitation around whether this kind of structure would work in Haskell.
After a little bit of Googling, I found out that the canonical solution in Haskell involves something intriguingly dubbed tying the knot. I stared blankly at this HaskellWiki page with my fellow Hacker Schooler, trying to understand it quickly enough to have a useful conversation about it, and failed. We threw a couple of other ideas around and then decided to both pursue other projects. I moved on, Chen moved on, and I’m not sure either of us thought much about it…
…until yesterday, when I ran into tying the knot again. This time, it was hiding deep within (of all things!) the chapter on imperative programming in Real World OCaml, and I was unhurried and determined. “Abstract concept, I am going to understand you so hard,” I thought, jaw set.
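The smallest self-contained demonstration of the trick I know is a cyclic list built with `let rec` — the value refers to itself before it's "finished" being defined:

```ocaml
(* Tying the knot: [ones_and_twos] is a cyclic list whose tail points
   back at its own head -- no mutation, no option-typed "null" links. *)
let rec ones_and_twos = 1 :: 2 :: ones_and_twos

let () =
  (* walking the cycle: the two elements repeat forever *)
  assert (List.nth ones_and_twos 0 = 1);
  assert (List.nth ones_and_twos 5 = 2);
  assert (List.nth ones_and_twos 100 = 1)
```

OCaml permits this because the right-hand side is a statically constructible value; the compiler allocates the cells and then patches the final tail pointer back to the first cell — the "knot" getting tied.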
When last we left our hero, I was struggling valiantly to get a Mirage unikernel version of this blog running on Amazon EC2. All unikernels built and shipped off to EC2 would begin booting, but never become pingable or reachable on TCP port 80.
`ec2-get-console-output` on any instance running a Mirage unikernel would show the beginning stages of a DHCP transaction, then the disappointing `RX exn Invalid_argument("String.sub")`, then… silence.
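That exception is what `String.sub` raises when asked for more bytes than a string contains — exactly what happens when a parser assumes a fixed-size field in a shorter-than-expected packet. A sketch of the failure mode (hypothetical offsets, not the real DHCP client code):

```ocaml
(* A parser that assumes its input always carries a 4-byte field at
   offset 2 -- fine for well-formed packets, fatal for short ones. *)
let read_field (packet : string) = String.sub packet 2 4

let () =
  (* long enough: works *)
  assert (read_field "abcdefgh" = "cdef");
  (* a truncated packet: String.sub raises Invalid_argument, and in a
     unikernel an uncaught exception takes down the whole VM *)
  match read_field "ab" with
  | _ -> assert false
  | exception Invalid_argument _ -> print_endline "RX exn, just like EC2"
```

The console message above was the unikernel-wide version of that last branch.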
When all you had for many years was a hammer, stuff is still going to look an awful lot like nails to you, even if it’s pretty distinctly screw-shaped. I wanted to take a packet trace of this transaction pretty badly. I could do three things that were almost like this:
- get a packet trace of another machine getting a DHCP lease on EC2
- get a packet trace of a unikernel getting a DHCP lease on my local Xen server
- print out an awful lot of diagnostic data from the EC2 unikernel and read it from the console
Trying to draw some conclusions from the first option above led me down the wrong path for about a day or so. I did manage to cause the DHCP client to fail on my local Xen server by sending a DHCP reply packet with no `server-identifier` set, using `scapy` and some hackery to cause the `xid` to always match:
I left off last time telling you about getting Mirage to not work. I’m still working hard to get this blog – yes, this one you’re reading now – up and running as a unikernel on EC2.
It became clear to me last week that I needed to fork my own instance of the `mirage-tcpip` repository and compile my kernels with it, if I were to make any progress in debugging the DHCP problems I was having. A few naive attempts to monkey with the version of `mirage-tcpip` downloaded by `opam` weren’t successful, so I set about figuring out how actual OCaml developers develop in OCaml with `opam`.

First stop: the opam documentation on doing tricky things. This is a little short of a step-by-step “do this, dorp” guide, unfortunately; here’s what I end up doing, and it sorta seems to work.
A week or so ago, I heard about the Mirage project, a library OS that builds tiny virtual machines which run a given application on top of Xen and do nothing else. I was intrigued, started working through the excellent intro documentation, and got to the point where I wanted to replace my ho-hum statically-compiled blog hosted from Ubuntu LTS with a unikernel that would serve my static site and nothing more.
There are excellent instructions on doing this with a Jekyll site on Amir Chaudhry’s blog. Octopress, which I use to generate this site, is built on top of Jekyll, and I only had a few extra goodies to throw in before I was able to make a unikernel that would run my blog with a few `rake` invocations. After getting the first unikernel up and running via Xen on my laptop, I entertained myself by throwing a few `nmap` commands at it; I was particularly curious to see whether my unikernel knew what to do with UDP packets:
```
$ sudo nmap -sO 192.168.2.13
Starting Nmap 6.40 ( http://nmap.org ) at 2014-03-14 23:26 EDT
Nmap scan report for 192.168.2.13
Host is up (0.00037s latency).
Not shown: 254 open|filtered protocols
PROTOCOL STATE SERVICE
1        open  icmp
6        open  tcp
MAC Address: 00:16:3E:53:E0:1B (Xensource)
Nmap done: 1 IP address (1 host up) scanned in 17.72 seconds
```
Hee hee hee.