I’ve been talking about network address translation here for a while, including instructions on building your own NAT device with MirageOS. The library behind those posts, mirage-nat, went on to back talex5’s unikernel firewall for QubesOS, but was unreleased and essentially unmaintained between late 2015 and early 2017. At the March 2017 MirageOS hack retreat in Marrakesh, talex5 convinced me to do some much-needed maintenance on this library. After having let it age between March and October, I was persuaded to release a version with the hippest latest build system last week.
Crowbar is a tool that combines afl-persistent’s instrumentation with quickcheck-like property-based testing. afl-fuzz is a great tool for detecting crashes, but Crowbar helps us go a step farther and automatically discover inputs which cause our program to no longer have the properties we expect it to have. For reasons that don’t need exploring at this juncture, I first thought to apply Crowbar to charrua-client, a library which implements the DHCP state machine from a client perspective.
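To make "properties we expect it to have" concrete, here is a minimal sketch of the idea. It is not Crowbar's API (Crowbar is an OCaml library driven by afl-fuzz's instrumentation); it is a stdlib-only Python illustration that throws random inputs at a function and reports the first one that breaks a stated property. Every name in it, including the deliberately buggy `strip_padding_buggy`, is hypothetical.

```python
import random

def find_counterexample(prop, gen, trials=1000, seed=0):
    """Return the first generated input for which prop is False, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None

def gen_bytes(rng):
    # Small alphabet and short lengths, so zero bytes show up often.
    return bytes(rng.randrange(4) for _ in range(rng.randrange(8)))

def strip_padding_buggy(data):
    # Hypothetical function under test: meant to drop trailing zero bytes,
    # but it drops every zero byte instead.
    return data.replace(b"\x00", b"")

def strip_padding_fixed(data):
    return data.rstrip(b"\x00")

def is_prefix_property(strip):
    # Property: stripping padding may only shorten the input from the right,
    # so the result must be a prefix of the original input.
    return lambda data: data.startswith(strip(data))

# The buggy version should yield some input with a zero byte before a nonzero one.
print(find_counterexample(is_prefix_property(strip_padding_buggy), gen_bytes))
# The fixed version satisfies the property for every input, so this prints None.
print(find_counterexample(is_prefix_property(strip_padding_fixed), gen_bytes))
```

A real property-based tester chooses inputs far more cleverly than a seeded random generator does. In Crowbar's case that steering comes from afl's coverage feedback, but the shape of the test (a generator plus a property) is the same.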
Most instructions on how to get started with OCaml packages now advise the user to get started with opam, which is excellent advice. Getting up and running with opam is pretty easy, but I wasn’t sure where to go from there when I wanted to modify other people’s packages and use the modifications in my environment. I wish I’d realized that the documentation for making packages has a lot of applicable advice for that use case, as well as the apparent target (making your own packages from scratch).
It’s Northern Hemisphere summer right now, and in Wisconsin we’re having one of the loveliest ones I can remember. Today the temperature is hovering right at pleasant, there are high clouds blowing across the sky, the breeze is soothing, and birds are singing all over the place. It is not, in short, programming weather. It is sit-outside, read-a-novel, do-nothing weather.
We don’t often let our programs slack off, even when we let ourselves take a peaceful day. I got to wondering (staring off into space, watching the shadows cast by sun-dappled leaves) what the most trivial, do-nothing Mirage project would look like, and how it could be constructed with a minimum of activity and a maximum of understanding.
    dothraki@iBook:~$ mkdir trivial
    dothraki@iBook:~$ cd trivial/
    dothraki@iBook:~/trivial$ ls -alh
    total 16K
    drwxrwxr-x   2 dothraki dothraki 4.0K Jul 23 13:17 .
    drwxr-xr-x 161 dothraki dothraki  12K Jul 23 13:17 ..
    dothraki@iBook:~/trivial$ mirage configure --xen
    [ERROR] No configuration file config.ml found.
    You'll need to create one to let Mirage know what to do.
Okay, we’ll have to do at least one thing to make this work. Mirage uses config.ml to programmatically generate a main.ml when you invoke mirage configure. main.ml uses instructions from config.ml to satisfy module types representing driver requirements for your application, then begins running the threads you requested that it run. That all sounds an awful lot like work; maybe we can get away with not asking for anything.
    dothraki@iBook:~/trivial$ touch config.ml
    dothraki@iBook:~/trivial$ mirage configure --xen
    Mirage Using scanned config file: config.ml
    Mirage Processing: /home/dothraki/trivial/config.ml
    Mirage => rm -rf /home/dothraki/trivial/_build/config.*
    Mirage => cd /home/dothraki/trivial && ocamlbuild -use-ocamlfind -tags annot,bin_annot -pkg mirage config.cmxs
    empty Using configuration: /home/dothraki/trivial/config.ml
    empty 0 jobs
    empty => ocamlfind printconf path
    empty Generating: main.ml
    empty Now run 'make depend' to install the package dependencies for this unikernel.
    dothraki@iBook:~/trivial$ ls
    _build config.ml empty.xl log main.ml Makefile
That seems like a great start! Maybe we can trivially achieve our dream of doing nothing.
    dothraki@iBook:~/trivial$ make depend
    opam install mirage-xen --verbose
    [NOTE] Package mirage-xen is already installed (current version is 1.1.1).
Resting on our laurels. Excellent. (In keeping with the lazy theme of this post, I’ll elide the make depend step from future examples, but if you’re playing along at home you may discover that you need to run it when you introduce new complexity in pursuit of perfect non-action.)
    dothraki@iBook:~/trivial$ make
    ocamlbuild -classic-display -use-ocamlfind -pkgs lwt.syntax,mirage-types.lwt -tags "syntax(camlp4o),annot,bin_annot,strict_sequence,principal" -cflag -g -lflags -g,-linkpkg,-dontlink,unix main.native.o
    ocamlfind ocamldep -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -modules main.ml > main.ml.depends
    ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
    + ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
    File "main.ml", line 8, characters 2-13:
    Error: Unbound module OS
    Command exited with code 2.
    make: *** [main.native.o] Error 10
    dothraki@iBook:~/trivial$
Our mission: fuzzing TCP options from
Our target: the
echo service from
Outcome: a revision on a widely-used OCaml dependency, gleeful murder and resurrection of several EC2 instances, something to brag to my mom about, a look at a case worse than failure, and great justice.
What Even Is TCP Anyway
Here’s the lazy way of explaining it: TCP is the abstraction layer that allows you to pretend that network communication works in a logical, orderly, reliable fashion when you’re writing an application. Reading data and having it always be in the order it was sent? TCP. Being able to know whether a connection is open or closed? TCP. Knowing the difference between data coming from two separate processes on the same remote host? TCP. (There are other ways to get these guarantees, but the vast majority of Internet traffic that needs them gets them via TCP.)
On a less abstract level, TCP is a header (one of several!) that your operating system slaps on your network traffic before shipping it over the wire, on the way to its final destination. For damn near all the information on TCP you can shake a stick at, you can consult RFC 793 directly. The header summary, most relevant for our exploration, is reproduced below:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Everything here is a fixed-length field except for Options, Padding, and data, all of which are optional.
Data is up to the application, when it’s present (and is also frequently referred to as payload). When you loaded this web page, TCP packets were sent from my server at somerandomidiot.com to your computer, and the contents of the data field were these very words that you’re reading right now. TCP is data-agnostic; it only cares that your payload arrives intact, not what’s in it.
Options, on the other hand, are very much TCP’s concern.
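To see how the fixed fields, the options, and the data fit together, here is a minimal sketch in Python that follows the RFC 793 layout shown above. The function names are my own for illustration and have nothing to do with how any particular TCP stack is written. The Data Offset field says how many 32-bit words the header occupies; anything past the first 20 bytes of the header is options (plus padding), and everything after the header is data. Each option other than End of Option List (kind 0) and No-Operation (kind 1) carries its own length byte, which is exactly the sort of thing a fuzzer likes to lie about.

```python
import struct

def parse_tcp_segment(segment: bytes) -> dict:
    """Split a raw TCP segment into fixed header fields, options, and data."""
    if len(segment) < 20:
        raise ValueError("shorter than the 20-byte fixed TCP header")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    # Data Offset is the top 4 bits, counted in 32-bit words.
    header_len = (offset_flags >> 12) * 4
    if header_len < 20 or header_len > len(segment):
        raise ValueError("bad data offset")
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "flags": offset_flags & 0x3F,  # URG, ACK, PSH, RST, SYN, FIN bits
        "window": window, "checksum": checksum, "urgent": urgent,
        "options": segment[20:header_len],
        "data": segment[header_len:],
    }

def parse_options(raw: bytes) -> list:
    """Walk the options area, returning (kind, value) pairs."""
    options, i = [], 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:        # End of Option List
            break
        if kind == 1:        # No-Operation, a single byte
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]  # counts the kind and length bytes too
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options

# A hand-built SYN segment: data offset 6 (24-byte header), one MSS option
# (kind 2, length 4, value 1460), followed by two bytes of payload.
syn = struct.pack("!HHIIHHHH", 12345, 80, 1, 0, (6 << 12) | 0x02, 65535, 0, 0)
syn += bytes([2, 4, 0x05, 0xB4]) + b"hi"

parsed = parse_tcp_segment(syn)
print(parse_options(parsed["options"]))  # [(2, b'\x05\xb4')]
print(parsed["data"])                    # b'hi'
```

The sketch does not verify the checksum, and it rejects malformed options by raising ValueError rather than trying to recover.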
“This is a pretty strange piece of code, and it may take a few moments of thought to figure out what’s going on.”
– Real World OCaml
A few weeks ago, fellow Hacker Schooler Chen Lin and I were trying to solve a simple graph problem in Haskell. I was all ready to charge forward with something quite like the Java implementation I learned back in undergrad, but my fellow Hacker Schooler had some hesitation around whether this kind of structure would work in Haskell.
After a little bit of Googling, I found out that the canonical solution in Haskell involves something intriguingly dubbed tying the knot. I stared blankly at this HaskellWiki page with my fellow Hacker Schooler, trying to understand it quickly enough to have a useful conversation about it, and failed. We threw a couple of other ideas around and then decided to both pursue other projects. I moved on, Chen moved on, and I’m not sure either of us thought much about it…
…until yesterday, when I ran into tying the knot again. This time, it was hiding deep within (of all things!) the chapter on imperative programming in Real World OCaml, and I was unhurried and determined. “Abstract concept, I am going to understand you so hard,” I thought, jaw set.
When last we left our hero, I was struggling valiantly to get a Mirage unikernel version of this blog running on Amazon EC2. All unikernels built and shipped off to EC2 would begin booting, but never become pingable or reachable on TCP port 80. ec2-get-console-output on any instance running a Mirage unikernel would show the beginning stages of a DHCP transaction, then the disappointing RX exn Invalid_argument("String.sub"), then… silence.
When all you had for many years was a hammer, stuff is still going to look an awful lot like nails to you, even if it’s pretty distinctly screw-shaped. I wanted to take a packet trace of this transaction pretty badly. I could do three things that were almost like this:
- get a packet trace of another machine getting a DHCP lease on EC2
- get a packet trace of a unikernel getting a DHCP lease on my local Xen server
- print out an awful lot of diagnostic data from the EC2 unikernel and read it from the console
Trying to draw some conclusions from the first option above led me down the wrong path for about a day or so. I did manage to cause the DHCP client to fail on my local Xen server by sending a DHCP reply packet with no server-identifier set, using scapy and some hackery to cause the xid to always match:
I left off last time telling you about getting Mirage to not work. I’m still working hard to get this blog – yes, this one you’re reading now – up and running as a unikernel on EC2.
It became clear to me last week that I needed to fork my own instance of the mirage-tcpip repository and compile my kernels with it, if I were to make any progress in debugging the DHCP problems I was having. A few naive attempts to monkey with the version of mirage-tcpip downloaded by opam weren’t successful, so I set about figuring out how actual OCaml developers develop in OCaml with opam.
First stop: the opam documentation on doing tricky things. This is a little short of a step-by-step “do this, dorp” guide, unfortunately; here’s what I ended up doing, and it sorta seems to work.
A week or so ago, I heard about the Mirage project, a library OS project that makes tiny virtual machines running on top of Xen to run a given application, and do nothing else. I was intrigued, and started working through the excellent intro documentation, and got to the point where I wanted to replace my ho-hum statically-compiled blog hosted from Ubuntu LTS with a unikernel that would serve my static site and do nothing else.
There are excellent instructions on doing this with a Jekyll site on Amir Chaudhry’s blog. Octopress, which I use to generate this site, is built on top of Jekyll, and I only had a few extra goodies to throw in before I was able to make a unikernel that would run my blog with a few rake invocations. After getting the first unikernel up and running via Xen on my laptop, I entertained myself by throwing a few nmap commands at it; I was particularly curious to see whether my unikernel knew what to do with UDP packets:
    sudo nmap -sO 192.168.2.13

    Starting Nmap 6.40 ( http://nmap.org ) at 2014-03-14 23:26 EDT
    Nmap scan report for 192.168.2.13
    Host is up (0.00037s latency).
    Not shown: 254 open|filtered protocols
    PROTOCOL STATE SERVICE
    1        open  icmp
    6        open  tcp
    MAC Address: 00:16:3E:53:E0:1B (Xensource)

    Nmap done: 1 IP address (1 host up) scanned in 17.72 seconds
Hee hee hee.
Robot Finds Kitten
Sometime way back in the past, a human who wanted me to feel joy introduced me to Robot Finds Kitten, a Zen simulation which is pretty close to exactly what it says on the tin. There are already quite a lot of ports of the original POSIX implementation, but none of them were written in Elm. Obviously this is a problem that needs fixing.
I got a really wonderful introduction to Elm when Evan Czaplicki came to Hacker School in our second week. We got a slightly adapted version of this talk from StrangeLoop 2013, which moved me to make a browser game (something I’ve never wanted to do at any previous point in life). The language seemed elegant and expressive, for lack of less cliched words, and I thought it might be relatively simple to make a succinct Robot Finds Kitten clone.
I promised a lot of people that I would let them know how Hacker School is. It’s difficult for me to answer this question (although Julia Evans, a previous batch process, has done a fantastic job), both because it feels so early in the batch and because Hacker School is a lot of things. I also promised a lot of people that I would let them know how New York is, and that’s a little easier, so I’ll start there and then move on.
I got my acceptance notification for the winter 2014 batch of Hacker School on January 3rd, six weeks ago. Right after being accepted, I wrote a bit in the same directory where I’d saved my application answers: