Programming

NAT your own packets

I’ve been talking about network address translation here for a while, including instructions on building your own NAT device with MirageOS. The library behind those posts, mirage-nat, went on to back talex5’s unikernel firewall for QubesOS, but was unreleased and essentially unmaintained between late 2015 and early 2017. At the March 2017 MirageOS hack retreat in Marrakesh, talex5 convinced me to do some much-needed maintenance on this library. After having let it age between March and October, I was persuaded to release a version with the hippest latest build system last week.

Crowbar Your Favorite Library for Fun and Bugfixes

Crowbar is a tool that combines afl-persistent’s instrumentation with quickcheck-like property-based testing. afl-fuzz is a great tool for detecting crashes, but Crowbar helps us go a step farther and automatically discover inputs which cause our program to no longer have the properties we expect it to have. For reasons that don’t need exploring at this juncture, I first thought to apply Crowbar to charrua-client, a library which implements the DHCP state machine from a client perspective.
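To make "properties we expect it to have" concrete, here is a minimal sketch of property-based testing in plain Python. It is not Crowbar: Crowbar is an OCaml library whose input generation is guided by AFL instrumentation, while this sketch just draws random inputs. The toy `encode`/`decode` pair and the `check_property` helper are hypothetical names made up for illustration.

```python
import random

def encode(data: bytes) -> bytes:
    """Toy length-prefixed encoding: one length byte, then the data."""
    if len(data) > 255:
        raise ValueError("data too long for a one-byte length prefix")
    return bytes([len(data)]) + data

def decode(blob: bytes) -> bytes:
    """Inverse of encode; rejects blobs whose length byte does not match."""
    if not blob or blob[0] != len(blob) - 1:
        raise ValueError("malformed blob")
    return blob[1:]

def check_property(prop, gen, runs=1000, seed=0):
    """Return the first generated input for which prop is false, or None."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None

def random_bytes(rng):
    """Generate a byte string of random length (0 to 63) and random content."""
    return bytes(rng.randrange(256) for _ in range(rng.randrange(0, 64)))

def roundtrips(data):
    """The property under test: decoding an encoded value gives it back."""
    return decode(encode(data)) == data

counterexample = check_property(roundtrips, random_bytes)
print("counterexample:", counterexample)
```

A crash-only fuzzer would be satisfied as long as `decode` never raised on these inputs; the property check also catches a `decode` that returns the wrong bytes without complaint.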

Fun with Opam: Advice to my Past Self

Most instructions on how to get started with OCaml packages now advise the user to get started with opam, which is excellent advice. Getting up and running with opam is pretty easy, but I wasn’t sure where to go from there when I wanted to modify other people’s packages and use the modifications in my environment. I wish I’d realized that the documentation for making packages has a lot of applicable advice for that use case, as well as the apparent target (making your own packages from scratch).

Doing Nothing in Mirage

It’s Northern Hemisphere summer right now, and in Wisconsin we’re having one of the loveliest ones I can remember. Today the temperature is hovering right at pleasant, there are high clouds blowing across the sky, the breeze is soothing, and birds are singing all over the place. It is not, in short, programming weather. It is sit-outside, read-a-novel, do-nothing weather.

Sunbeams stream through the leaves of a large tree, beneath which is a bicycle.

Yes, this sort of thing.

We don’t often let our programs slack off, even when we let ourselves take a peaceful day. I got to wondering (staring off into space, watching the shadows cast by sun-dappled leaves) what the most trivial, do-nothing Mirage project would look like, and how it could be constructed with a minimum of activity and a maximum of understanding.

[] dothraki@iBook:~$ mkdir trivial
[] dothraki@iBook:~$ cd trivial/
[] dothraki@iBook:~/trivial$ ls -alh
total 16K
drwxrwxr-x   2 dothraki dothraki 4.0K Jul 23 13:17 .
drwxr-xr-x 161 dothraki dothraki  12K Jul 23 13:17 ..
[] dothraki@iBook:~/trivial$ mirage configure --xen
[ERROR]      No configuration file config.ml found.
You'll need to create one to let Mirage know what to do.

Okay, we’ll have to do at least one thing to make this work. Mirage uses config.ml to programmatically generate a Makefile and main.ml when you invoke mirage configure. main.ml uses instructions from config.ml to satisfy module types representing driver requirements for your application, then begins running the threads you requested that it run. That all sounds an awful lot like work; maybe we can get away with not asking for anything.

[] dothraki@iBook:~/trivial$ touch config.ml
[] dothraki@iBook:~/trivial$ mirage configure --xen
Mirage      Using scanned config file: config.ml
Mirage      Processing: /home/dothraki/trivial/config.ml
Mirage      => rm -rf /home/dothraki/trivial/_build/config.*
Mirage      => cd /home/dothraki/trivial && ocamlbuild -use-ocamlfind -tags annot,bin_annot -pkg mirage config.cmxs
empty       Using configuration: /home/dothraki/trivial/config.ml
empty       0 jobs []
empty       => ocamlfind printconf path
empty       Generating: main.ml
empty       Now run 'make depend' to install the package dependencies for this unikernel.
[] dothraki@iBook:~/trivial$ ls
_build  config.ml  empty.xl  log  main.ml  Makefile

That seems like a great start! Maybe we can trivially achieve our dream of doing nothing.

[] dothraki@iBook:~/trivial$ make depend
opam install mirage-xen --verbose
[NOTE] Package mirage-xen is already installed (current version is 1.1.1).

Resting on our laurels. Excellent. (In keeping with the lazy theme of this post, I’ll elide the make depend step from future examples, but if you’re playing along at home you may discover that you need to run it when you introduce new complexity in pursuit of perfect non-action.)

[] dothraki@iBook:~/trivial$ make
ocamlbuild -classic-display -use-ocamlfind -pkgs lwt.syntax,mirage-types.lwt -tags "syntax(camlp4o),annot,bin_annot,strict_sequence,principal" -cflag -g -lflags -g,-linkpkg,-dontlink,unix main.native.o
ocamlfind ocamldep -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -modules main.ml > main.ml.depends
ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
+ ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml
File "main.ml", line 8, characters 2-13:
Error: Unbound module OS
Command exited with code 2.
make: *** [main.native.o] Error 10
[] dothraki@iBook:~/trivial$ 

Oh, bother.

How to Set the Evil Bit

Our mission: fuzzing TCP options from scapy.

Our target: the echo service from mirage-tcpip/examples/services.ml.

Outcome: a revision on a widely-used OCaml dependency, gleeful murder and resurrection of several EC2 instances, something to brag to my mom about, a look at a case worse than failure, and great justice.

Parsers Optional

Friends, I have spoken to you of TCP and of fuzzing. Next I will speak to you of both, but today, I will speak to you of TCP options. If you’re here for the pwnage, sit tight; it’s coming.

What Even Is TCP Anyway

Here’s the lazy way of explaining it: TCP is the abstraction layer that allows you to pretend that network communication works in a logical, orderly, reliable fashion when you’re writing an application. Reading data and having it always be in the order it was sent? TCP. Being able to know whether a connection is open or closed? TCP. Knowing the difference between data coming from two separate processes on the same remote host? TCP. (There are other ways to get these guarantees, but the vast majority of Internet traffic that needs them gets them via TCP.)

On a less abstract level, TCP is a header (one of several!) that your operating system slaps on your network traffic before shipping it over the wire, on the way to its final destination. For damn near all the information on TCP you can shake a stick at, you can consult RFC 793 directly. The header summary, most relevant for our exploration, is reproduced below:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Everything here is a fixed-length field except for Options, Padding, and data, all of which are optional. Data is up to the application, when it’s present (and is also frequently referred to as payload). When you loaded this web page, TCP packets were sent from my server at somerandomidiot.com to your computer, and the contents of the data field were these very words that you’re reading right now. TCP is data-agnostic; it only cares that your payload arrives intact, not what’s in it.

Options, on the other hand, are very much TCP’s concern.
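As a rough illustration of the layout in the RFC 793 diagram, here is a small Python sketch that splits a raw TCP segment into its fixed fields, its options bytes, and its payload. The function name `parse_tcp_segment` and the hand-built sample segment are made up for this example; the sketch does not verify the checksum or interpret individual options.

```python
import struct

# Fixed part of the TCP header from RFC 793: 20 bytes, network byte order.
# Ports (16 bits each), sequence and acknowledgment numbers (32 bits each),
# then data offset/reserved/flags, window, checksum, urgent pointer (16 each).
TCP_FIXED = struct.Struct("!HHIIHHHH")

def parse_tcp_segment(segment: bytes) -> dict:
    """Split a raw TCP segment into fixed fields, options bytes, and payload."""
    if len(segment) < TCP_FIXED.size:
        raise ValueError("segment shorter than the 20-byte fixed header")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = TCP_FIXED.unpack_from(segment)
    data_offset = offset_flags >> 12      # header length in 32-bit words
    header_len = data_offset * 4
    if data_offset < 5 or header_len > len(segment):
        raise ValueError("data offset is inconsistent with segment length")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "flags": offset_flags & 0x3F,     # URG, ACK, PSH, RST, SYN, FIN bits
        "window": window,
        "checksum": checksum,             # carried along, not verified here
        "urgent": urgent,
        "options": segment[TCP_FIXED.size:header_len],  # includes any padding
        "payload": segment[header_len:],
    }

# A hand-built SYN segment with one option (maximum segment size = 1460).
# Data offset is 6 words: 20 bytes of fixed header plus 4 bytes of options.
mss_option = bytes([2, 4]) + struct.pack("!H", 1460)
header = TCP_FIXED.pack(12345, 80, 1000, 0, (6 << 12) | 0x02, 65535, 0, 0)
parsed = parse_tcp_segment(header + mss_option + b"hello")
print(parsed["options"].hex(), parsed["payload"])
```

The data offset is the only thing telling the receiver where the options end and the payload begins, which is part of why options get so much attention from the parser.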

Tying the Knot

“This is a pretty strange piece of code, and it may take a few moments of thought to figure out what’s going on.”

– Real World OCaml

A few weeks ago, fellow Hacker Schooler Chen Lin and I were trying to solve a simple graph problem in Haskell. I was all ready to charge forward with something quite like the Java implementation I learned back in undergrad, but my fellow Hacker Schooler had some hesitation around whether this kind of structure would work in Haskell.

After a little bit of Googling, I found out that the canonical solution in Haskell involves something intriguingly dubbed tying the knot. I stared blankly at this HaskellWiki page with my fellow Hacker Schooler, trying to understand it quickly enough to have a useful conversation about it, and failed. We threw a couple of other ideas around and then decided to both pursue other projects. I moved on, Chen moved on, and I’m not sure either of us thought much about it…

…until yesterday, when I ran into tying the knot again. This time, it was hiding deep within (of all things!) the chapter on imperative programming in Real World OCaml, and I was unhurried and determined. “Abstract concept, I am going to understand you so hard,” I thought, jaw set.

Arriving At the Mirage

When last we left our hero, I was struggling valiantly to get a Mirage unikernel version of this blog running on Amazon EC2. All unikernels built and shipped off to EC2 would begin booting, but never become pingable or reachable on TCP port 80. ec2-get-console-output on any instance running a Mirage unikernel would show the beginning stages of a DHCP transaction, then the disappointing RX exn Invalid_argument("String.sub"), then… silence.

When all you had for many years was a hammer, stuff is still going to look an awful lot like nails to you, even if it’s pretty distinctly screw-shaped. I wanted to take a packet trace of this transaction pretty badly. I could do three things that were almost like this:

  • get a packet trace of another machine getting a DHCP lease on EC2
  • get a packet trace of a unikernel getting a DHCP lease on my local Xen server
  • print out an awful lot of diagnostic data from the EC2 unikernel and read it from the console

Trying to draw some conclusions from the first option above led me down the wrong path for about a day or so. I did manage to cause the DHCP client to fail on my local Xen server by sending a DHCP reply packet with no server-identifier set, using scapy and some hackery to cause the xid to always match:

Advancing Toward the Mirage

I left off last time telling you about getting Mirage to not work. I’m still working hard to get this blog – yes, this one you’re reading now – up and running as a unikernel on EC2.

It became clear to me last week that I needed to fork my own instance of the mirage-tcpip repository and compile my kernels with it, if I were to make any progress in debugging the DHCP problems I was having. A few naive attempts to monkey with the version of mirage-tcpip downloaded by opam weren’t successful, so I set about to figure out how actual OCaml developers develop in OCaml with opam.

First stop: the opam documentation on doing tricky things. This is a little short of a step-by-step “do this, dorp” guide, unfortunately; here’s what I ended up doing, and it sorta seems to work.

It's a mirage! (Or, how to shave a yak.)

A week or so ago, I heard about the Mirage project, a library OS project that makes tiny virtual machines running on top of Xen to run a given application, and do nothing else. I was intrigued, and started working through the excellent intro documentation, and got to the point where I wanted to replace my ho-hum statically-compiled blog hosted from Ubuntu LTS with a unikernel that would serve my static site and do nothing else.

There are excellent instructions on doing this with a Jekyll site on Amir Chaudhry’s blog. Octopress, which I use to generate this site, is built on top of Jekyll, and I only had a few extra goodies to throw in before I was able to make a unikernel that would run my blog with a few rake invocations. After getting the first unikernel up and running via Xen on my laptop, I entertained myself by throwing a few nmap commands at it; I was particularly curious to see whether my unikernel knew what to do with UDP packets:

sudo nmap -sO 192.168.2.13

Starting Nmap 6.40 ( http://nmap.org ) at 2014-03-14 23:26 EDT
Nmap scan report for 192.168.2.13
Host is up (0.00037s latency).
Not shown: 254 open|filtered protocols
PROTOCOL STATE SERVICE
1        open  icmp
6        open  tcp
MAC Address: 00:16:3E:53:E0:1B (Xensource)

Nmap done: 1 IP address (1 host up) scanned in 17.72 seconds

Hee hee hee.

Finding Kitten

Robot Finds Kitten

Sometime way back in the past, a human who wanted me to feel joy introduced me to Robot Finds Kitten, a Zen simulation which is pretty close to exactly what it says on the tin. There are already quite a lot of ports of the original POSIX implementation, but none of them were written in Elm. Obviously this is a problem that needs fixing.

Before I get into the gory details of learning Elm via robots, I should tell you that my implementation is available for free play, and you can also go look at the source code.

Elm

I got a really wonderful introduction to Elm when Evan Czaplicki came to Hacker School in our second week. We got a slightly adapted version of this talk from StrangeLoop 2013, which moved me to make a browser game (something I’ve never wanted to do at any previous point in life). The language seemed elegant and expressive, for lack of less cliched words, and I thought it might be relatively simple to make a succinct Robot Finds Kitten clone.

Sometimes it's no fun to be right.

I promised a lot of people that I would let them know how Hacker School is. It’s difficult for me to answer this question (although Julia Evans, a previous batch process, has done a fantastic job), both because it feels so early in the batch and because Hacker School is a lot of things. I also promised a lot of people that I would let them know how New York is, and that’s a little easier, so I’ll start there and then move on.