It's a mirage! (Or, how to shave a yak.)

A week or so ago, I heard about the Mirage project, a library OS that builds tiny virtual machines which run on top of Xen, run a given application, and do nothing else. I was intrigued and started working through the excellent intro documentation, and soon got to the point where I wanted to replace my ho-hum statically compiled blog hosted from Ubuntu LTS with a unikernel that would serve my static site and do nothing else.

There are excellent instructions on doing this with a Jekyll site on Amir Chaudhry’s blog. Octopress, which I use to generate this site, is built on top of Jekyll, and I only had a few extra goodies to throw in before I was able to make a unikernel that would run my blog with a few rake invocations. After getting the first unikernel up and running via Xen on my laptop, I entertained myself by throwing a few nmap commands at it; I was particularly curious to see whether my unikernel knew what to do with UDP packets:

sudo nmap -sO 192.168.2.13

Starting Nmap 6.40 ( http://nmap.org ) at 2014-03-14 23:26 EDT
Nmap scan report for 192.168.2.13
Host is up (0.00037s latency).
Not shown: 254 open|filtered protocols
PROTOCOL STATE SERVICE
1        open  icmp
6        open  tcp
MAC Address: 00:16:3E:53:E0:1B (Xensource)

Nmap done: 1 IP address (1 host up) scanned in 17.72 seconds

Hee hee hee.

I spent a little time automating my build process for my blog, and then I spent waaaaay too much time trying to make a nice way to deploy the image automatically to somewhere out on the cloud. Running the generated image locally on Xen is extremely easy - xm create -c image.xl and we’re good - but I’m not interested in running a publicly accessible hypervisor; I’m interested in using someone else’s.

At first blush, Amazon EC2 seems like it would be the best choice – it’s built on top of Xen, it offers a free tier that would easily serve my needs for computational power, and I even already had experience using it. Unfortunately, a series of dependencies makes EC2 suboptimal for this purpose:

- The EC2 Free Tier is 750 hours of usage per month on a t1.micro instance.
- To use any EC2 instance, one must make an Amazon Machine Image (AMI) for Xen to boot; one can’t just upload a kernel.
- There are two types of storage available for an instance: instance-backed storage and the Elastic Block Store (EBS).
- EC2 t1.micro instances must use EBS for their filesystems; while instance types which use instance-backed storage are available, the cheapest one costs $0.06/hour vs. the t1.micro’s $0.02/hour.
- It’s only possible to make an AMI which uses EBS from a running EC2 instance.

Therefore, in order to run a Mirage kernel on a t1.micro instance, one must do all of the following:

1. spin up an instance running a traditional Linux OS image
2. create an EBS block to attach to the Linux instance
3. attach the EBS block to the Linux instance
4. make a bootable filesystem on the EBS block
5. copy the Mirage kernel to the EBS block over the network
6. make a workable Grub configuration on the EBS block
7. snapshot the EBS block
8. associate the EBS block snapshot with an Amazon Kernel Image that boots PV-GRUB, which gives you an Amazon Machine Image backed by an Elastic Block Store
9. finally, fire up your AMI from the previous step.
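The Grub step is the one most likely to look mysterious, so here is a minimal sketch of it. PV-GRUB reads a GRUB-legacy menu.lst from /boot/grub on the volume. The function name write_menu_lst, the kernel filename mirage-os.gz, and the `root (hd0)` line (which assumes an unpartitioned volume) are illustrative assumptions, not something taken from the Mirage scripts.

```shell
#!/usr/bin/env bash
# Sketch: write a GRUB-legacy menu.lst for PV-GRUB onto a mounted volume.
# write_menu_lst, the mount point, and the kernel filename are hypothetical.
write_menu_lst() {
  local mount_point=$1 kernel_name=$2
  mkdir -p "${mount_point}/boot/grub"
  # A single boot entry that loads the unikernel directly; no initrd, no kernel arguments.
  cat > "${mount_point}/boot/grub/menu.lst" <<EOF
default 0
timeout 1
title Mirage
  root (hd0)
  kernel /boot/${kernel_name}
EOF
}

# Stage into a scratch directory here instead of a real EBS mount.
scratch=$(mktemp -d)
write_menu_lst "$scratch" "mirage-os.gz"
cat "${scratch}/boot/grub/menu.lst"
```

On the real build host the first argument would be wherever the EBS block is mounted, and the kernel would need to be copied to /boot on that same filesystem.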

This process is already documented on the Mirage website, and it’s even partially scripted via Mirari. Most steps can be done programmatically. But it’s a really nasty, involved, convoluted process, especially if you want to develop your unikernel locally but test in the cloud; having a test-deploy cycle of tens of minutes is ruinous for someone as distractible as myself.

Since it’s cheaper to run two t1.micro instances than one m1.small (a maximum of $0.04/hr vs. a steady $0.06/hr, respectively), I thought the best way to solve this might be to maintain a build host for kernels which would be a t1.micro instance itself. One could even keep the build host stopped when kernels weren’t being tested or actively developed, minimizing the attack surface presented by running a remote server.
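The arithmetic behind that comparison, as a quick sketch. The hourly prices are the 2014 figures quoted in this post; the function names are made up, and the free tier is assumed to be counted per month.

```shell
#!/usr/bin/env bash
# Hourly prices in whole cents, as quoted in this post (2014 pricing).
T1_MICRO_CENTS=2
M1_SMALL_CENTS=6
FREE_TIER_HOURS=750   # free t1.micro hours, assumed to be a monthly allowance

# Worst-case hourly cost, in cents, of running N t1.micro instances at once.
micro_cost_cents() {
  local count=$1
  echo $(( count * T1_MICRO_CENTS ))
}

# Instance-hours used in a 31-day month by N always-on instances.
monthly_hours() {
  local count=$1
  echo $(( count * 24 * 31 ))
}

echo "two t1.micro: $(micro_cost_cents 2) cents/hr; one m1.small: ${M1_SMALL_CENTS} cents/hr"
echo "one always-on t1.micro uses $(monthly_hours 1) of ${FREE_TIER_HOURS} free hours"
```

The 4 cents/hr is a ceiling rather than a steady rate: one always-on t1.micro fits inside the free allowance, and the build host only costs anything while it is actually running.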

After launching a t1.micro EBS-backed instance running the newest Ubuntu hotness, I tag it in the admin console as role=host, so I’ll be able to distinguish it from my generated kernels running on other instances when interacting with EC2 programmatically. I write a little squib to filter the ec2-describe-instances output down to that instance and pull out its public hostname:

ec2-describe-instances --region us-west-2 -F tag:role=host | grep '^INSTANCE' | cut -f4
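To see why grep and cut are enough here: the tool prints tab-separated records, one per line, and the public hostname is the fourth field of the INSTANCE record. The sketch below runs the same filter over a canned sample. The sample is hypothetical and truncated (made-up IDs and hostnames, fewer columns than the real tool prints), and public_hostname is just an illustrative name.

```shell
#!/usr/bin/env bash
# Keep only INSTANCE records and print their 4th tab-separated field.
public_hostname() {
  grep '^INSTANCE' | cut -f4
}

# Hypothetical, shortened sample shaped like ec2-describe-instances output.
sample=$(printf 'RESERVATION\tr-00000000\t000000000000\tdefault\nINSTANCE\ti-00000000\tami-00000000\tec2-203-0-113-10.us-west-2.compute.amazonaws.com\tip-172-31-0-10.us-west-2.compute.internal\trunning\nTAG\tinstance\ti-00000000\trole\thost\n')

host=$(printf '%s\n' "$sample" | public_hostname)
echo "$host"
```

If more than one instance carries the role=host tag, this prints one hostname per line, so anything that feeds the result to ssh should check that it got exactly one.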

and ssh in with the chosen keypair. Once I’m into the server, I have some traditional sysadminning to do to get the machine up to date. (This includes, humorously enough, a kernel update.) I go to install the packages I need for programmatic manipulation of volumes and snapshots and I get a nasty surprise:

E: Package 'ec2-ami-tools' has no installation candidate
E: Package 'ec2-api-tools' has no installation candidate

Hrmph! Fine, I’ll install them from source, like an animal. I also copy over some auth information, which I’ll admit makes me real itchy to do. I have to install an entire JVM just to get the tools working - especially galling, since I know they’re little more than an interface to a REST API, and underneath all of that crud is something that could probably be replaced with a very small shell script. After wrangling the EC2_HOME and JAVA_HOME environment variables into the right shape, I’m able to run commands like ec2-describe-instances from my host instance. Yaaaay.
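A small preflight check would have saved some of that wrangling. This is a sketch only: check_ec2_env is a made-up helper, and it assumes the usual layout where the tools live under $EC2_HOME/bin and the JVM binary under $JAVA_HOME/bin.

```shell
#!/usr/bin/env bash
# Sketch: confirm both variables point at directories holding the expected binaries.
# check_ec2_env is a hypothetical helper, not part of the EC2 tools.
check_ec2_env() {
  local status=0
  if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
    echo "JAVA_HOME does not contain bin/java: '${JAVA_HOME:-}'" >&2
    status=1
  fi
  if [ ! -x "${EC2_HOME:-}/bin/ec2-describe-instances" ]; then
    echo "EC2_HOME does not contain bin/ec2-describe-instances: '${EC2_HOME:-}'" >&2
    status=1
  fi
  return "$status"
}
```

Calling it at the top of a deploy script (for example `check_ec2_env || exit 1`) turns a confusing Java stack trace into a one-line complaint about which variable is wrong.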

Next, we need an EBS block from which to take snapshots. Creation and deletion of EBS volumes is free, but keeping them around costs money in quantity, so I decide to programmatically create and delete an EBS volume for each invocation of the script. If the latency on this operation is too terrible, I can easily rewrite the script to use a persistent block.
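The create-use-delete pattern might look roughly like this. It is a sketch under assumptions: with_scratch_volume is a made-up name, the create command is assumed to print a tab-separated VOLUME record with the volume ID in the second field, and the command names are held in variables so the flow can be dry-run with stand-ins. It does not handle being interrupted partway through.

```shell
#!/usr/bin/env bash
# Overridable command names, so the flow can be exercised without AWS credentials.
CREATE_VOLUME=${CREATE_VOLUME:-ec2-create-volume}
DELETE_VOLUME=${DELETE_VOLUME:-ec2-delete-volume}

# Create a volume, run the given command with the volume ID appended as its
# last argument, then delete the volume whether or not that command succeeded.
with_scratch_volume() {
  local region=$1 zone=$2 size_gb=$3
  shift 3
  local volume_id status
  # Assumption: the create output is a tab-separated VOLUME line, ID in field 2.
  volume_id=$("$CREATE_VOLUME" --region "$region" --availability-zone "$zone" --size "$size_gb" \
    | grep '^VOLUME' | cut -f2)
  if [ -z "$volume_id" ]; then
    echo "volume creation failed" >&2
    return 1
  fi
  status=0
  "$@" "$volume_id" || status=$?
  "$DELETE_VOLUME" --region "$region" "$volume_id" >/dev/null
  return "$status"
}
```

A call such as `with_scratch_volume us-west-2 us-west-2a 1 build_image` would hand the new volume ID to a (hypothetical) build_image function and clean up afterwards. Switching to a persistent block later only means replacing this wrapper, not the build step itself.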

After a moderate amount of scripting, I have my first automatically generated AMI up and running! …well, for some values of “running”. According to ec2-get-console-output, the Mirage kernel is running:

[] me@my-computer:~$ ec2-get-console-output --region us-west-2 i-abcd0123
i-abcd0123
2014-03-14T23:32:57+0000
Xen Minimal OS!
  start_info: 0xac4000(VA)
    nr_pages: 0x26700
  shared_inf: 0x7df30000(MA)

…

close blk: backend at /local/domain/0/backend/vbd/2188/2049
kernel.c: Mirage OS!
kernel.c:   start_info: 0x1905000(VA)

…

DHCP: start discovery

Sending DHCP broadcast len 552
Dropping ipv6
Dropping ipv6
DHCP: input ciaddr 0.0.0.0 yiaddr 172.31.11.10 siaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 
????z sname  file 

DHCP: offer received: 172.31.11.10

…but nobody’s home on the public IP, and it fails Amazon’s reachability tests. I double-check the security group settings, and it should be reachable. Hm, what’s going on here? I flail around trying reboots and redeploys, but unsurprisingly nothing changes. Looking more closely, it becomes clear that the kernel isn’t completing the DHCP process when running on EC2; it should be requesting the IP (in this case, 172.31.11.10) that the DHCP server offered it, and then binding to that IP when the server confirms that it’s OK.
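For reference, a successful DHCP lease is a four-message exchange: the client broadcasts DISCOVER, the server answers with OFFER, the client sends REQUEST for the offered address, and the server confirms with ACK. The sketch below only encodes that ordering and reports the first message that hasn't been seen; it is a toy for reasoning about the log, not how mirage-tcpip implements DHCP, and first_missing_step is a made-up name.

```shell
#!/usr/bin/env bash
# The four messages of a successful DHCP lease, in order (RFC 2131).
DHCP_STEPS="DISCOVER OFFER REQUEST ACK"

# Given the messages observed so far as arguments, print the first expected
# message that is missing, or BOUND if all four were seen.
first_missing_step() {
  local step seen found
  for step in $DHCP_STEPS; do
    found=no
    for seen in "$@"; do
      if [ "$seen" = "$step" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$step"
      return 0
    fi
  done
  echo "BOUND"
}

# The console log shows discovery and an offer, and nothing after that.
first_missing_step DISCOVER OFFER   # prints REQUEST
```

So the unikernel is stalling exactly halfway: it hears the offer but never gets as far as asking for the address.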

I dig down pretty deep into the IP/UDP/BOOTP/DHCP stack in the mirage-tcpip module without finding any solid answers. Figuring that there must be some difference in the nitty-gritty details of how my local copy of dnsmasq answers DHCP lease requests, versus how Amazon’s answering them for EC2 servers, I run some packet captures hoping something will jump out at me as obviously different. I don’t come up with much. I proceed to trying to reproduce the remote behavior locally with scapy, and that’s where I’m at right now.

In the process of working with Mirage, I’ve learned a new package manager and written code in three languages that I’d never touched before, and I don’t even have it working yet! I’d call that some pretty good yak shaving, wouldn’t you?