When last we spoke, I left you with a teaser about writing your own NAT implementation.
iptables (and friends such as pf, to be a little less partisan and outdated) provide the interfaces to the kernel modules that implement NAT in many widely-used routers. If we wanted to implement our own in a traditional OS, we’d have to either take a big dive into kernel programming or find a way to manipulate packets at the Ethernet layer in userspace.
But if all we need to do is NAT traffic, why not just build something that only knows how to NAT traffic? I’ve looked at building networked applications on top of (and with) the full network stack provided by the MirageOS library OS a lot, but we can also build lower-level applications with fundamentally the same programming tactics and tools we use to write, for example, DNS resolvers.
Building A Typical Stack From Scratch
Let’s have a look at the ethif-v4 example in the mirage-skeleton example repository. This example unikernel shows how to build a network stack “by hand” from a bunch of different functors, starting from a physical device (provided by config.ml at build time, representing either a Xen backend if you configure with mirage configure --xen or a Unix tuntap backend if you build with mirage configure --unix). I’ve reproduced the network setup bits from the most recent version as of now and annotated them a bit:
    module Main (C: CONSOLE) (N: NETWORK) (Clock : V1.CLOCK) = struct
      (* N, a module of type NETWORK (defined in module V1_LWT from
         mirage-types), is the building point for the rest of our stack.
         Modules E, I, U, and T provide functions like [write], which take
         a record of the type matching the module (e.g., E.write needs an
         E.t argument) along with some information to write and generate a
         reasonable set of headers of the appropriate layer before calling
         a lower-level [write] function. *)
      module E = Ethif.Make(N)
      module I = Ipv4.Make(E)
      module U = Udp.Make(I)

      (* Ethernet, Ipv4, and UDP don't need outside timers or randomness,
         just an underlying implementation to listen from and write to,
         but TCP does *)
      module T = Tcp.Flow.Make(I)(OS.Time)(Clock)(Random)

      (* DHCP also needs timers and randomness *)
      module D = Dhcp_clientv4.Make(C)(OS.Time)(Random)(U)

      let or_error c name fn t =
        fn t >>= function
        | `Error e -> fail (Failure ("Error starting " ^ name))
        | `Ok t -> return t

      let start c net _ =
        (* net is of type N.t *)
        or_error c "Ethif" E.connect net >>= fun e ->
        (* e is of type Ethif.t, on which we can call ethernet-level
           listen and write *)
        or_error c "Ipv4" I.connect e >>= fun i ->
        (* we can manually set IP options here for interface i, in addition
           to overwriting them (potentially) with DHCP below *)
        I.set_ip i (Ipaddr.V4.of_string_exn "10.0.0.2") >>= fun () ->
        I.set_ip_netmask i (Ipaddr.V4.of_string_exn "255.255.255.0") >>= fun () ->
        I.set_ip_gateways i [Ipaddr.V4.of_string_exn "10.0.0.1"] >>= fun () ->
        or_error c "UDPv4" U.connect i >>= fun udp ->
        let dhcp, offers = D.create c (N.mac net) udp in
        or_error c "TCPv4" T.connect i >>= fun tcp ->
        (* main body of code continues... *)
The code doesn’t do much once it’s built the stack – just prints lines to the console when various types of traffic are received – so I’ve elided that portion from the reproduction here. If we wanted to work with an Ethif.t (a type representing the Ethernet layer communications on that interface), an I.t (the IP layer), or even the raw physical device passed to the start function with the name of net, we can do that just as we can work with
Working with Multiple Network Interfaces
Working with two interfaces rather than one is fairly similar. A nice minimal example, working right down on the netif layer, is the netif-forward example unikernel, also in mirage-skeleton. The config.ml for this unikernel defines two interfaces, and unikernel.ml provides a module Main functorized over two modules of type NETWORK. There’s no expectation that these are necessarily the same type of physical interface, just that they both know how to satisfy the basic operations required of a network device.
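To make the "functorized over two modules" idea concrete, here is a minimal sketch in plain OCaml with no MirageOS dependencies. The NETWORK signature below is a toy stand-in I made up for illustration, not the one from mirage-types, and Buffer_dev, List_dev, and Forward are hypothetical names.

```ocaml
(* A deliberately tiny stand-in for a network device signature.
   This is NOT the NETWORK signature from mirage-types. *)
module type NETWORK = sig
  type t
  val name : string
  val write : t -> string -> unit
end

(* Two unrelated toy "devices" that both satisfy NETWORK. *)
module Buffer_dev = struct
  type t = Buffer.t
  let name = "buffer"
  let write t frame = Buffer.add_string t frame
end

module List_dev = struct
  type t = string list ref
  let name = "list"
  let write t frame = t := frame :: !t
end

(* Forward relies only on the NETWORK signature, so N1 and N2 are free
   to be completely different implementations. *)
module Forward (N1 : NETWORK) (N2 : NETWORK) = struct
  let describe = N1.name ^ " -> " ^ N2.name

  (* Write the same frame to both devices. *)
  let broadcast (n1 : N1.t) (n2 : N2.t) frame =
    N1.write n1 frame;
    N2.write n2 frame
end

module F = Forward (Buffer_dev) (List_dev)

let () =
  let b = Buffer.create 16 and l = ref [] in
  F.broadcast b l "frame-1";
  print_endline F.describe
```

The functor body never learns what N1.t or N2.t actually are, which is the same property that lets a real unikernel mix different kinds of interface.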
Instead of building something on top of the provided netifs, netif-forward (as of the latest revision) works with them directly – it takes packets from the first interface (n1, of type N1.t), queues them, and then sends them out the second interface (n2, of type N2.t) as quickly as it can.
    (* packets_in and packets_waiting are Int32 refs defined outside
       this excerpt *)
    module Main (C: CONSOLE)(N1: NETWORK)(N2: NETWORK) = struct

      let (in_queue, in_push) = Lwt_stream.create ()
      let (out_queue, out_push) = Lwt_stream.create ()

      let listen nf =
        let hw_addr = Macaddr.to_string (N1.mac nf) in
        let _ = printf "listening on the interface with mac address '%s' \n%!" hw_addr in
        N1.listen nf (fun frame -> return (in_push (Some frame)))

      let update_packet_count () =
        let _ = packets_in := Int32.succ !packets_in in
        let _ = packets_waiting := Int32.succ !packets_waiting in
        if (Int32.logand !packets_in 0xfl) = 0l then
          let _ = printf "packets (in = %ld) (not forwarded = %ld)"
              !packets_in !packets_waiting in
          print_endline ""

      let start console n1 n2 =
        let forward_thread nf =
          (* move frames from the input queue to the output queue *)
          while_lwt true do
            lwt _ = Lwt_stream.next in_queue >>= fun frame ->
              return (out_push (Some frame)) in
            return (update_packet_count ())
          done
          <?> (
            (* drain the output queue onto the second interface *)
            while_lwt true do
              lwt frame = Lwt_stream.next out_queue in
              let _ = packets_waiting := Int32.pred !packets_waiting in
              N2.write nf frame
            done
          )
        in
        (listen n1) <?> (forward_thread n2)
        >> return (print_endline "terminated.")

    end
Building a NAT Library and Unikernel
For our NAT implementation, we need to be able to:
- make reference to the publicly-routable IP address on the Internet-facing interface
- generate new and unique port numbers to use to disambiguate traffic from different hosts on the private network side
- keep a table mapping private-network connections to their public-network analogs
- add new entries to the table based on new connection attempts
- alter Ethernet, IP, TCP, and UDP headers of incoming and outgoing packets:
  - replace IP addresses and ports according to table entries
  - recalculate checksums on IP and transport layers after making other mutations
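The table and port-allocation requirements above can be sketched in plain OCaml. Everything here (the proto and table types, create, allocate_port, translate_out, translate_in, and the choice to start allocating at port 1024) is a hypothetical illustration rather than the mirage-nat API. It also keys entries only on the private endpoint, where a fuller implementation would probably include the remote endpoint as well.

```ocaml
(* Hypothetical types for this sketch; a real NAT library may differ. *)
type proto = Tcp | Udp
type endpoint = string * int  (* dotted-quad IP address, port *)

type table = {
  public_ip : string;                            (* Internet-facing address *)
  mutable next_port : int;                       (* next candidate public port *)
  outbound : (proto * endpoint, int) Hashtbl.t;  (* private endpoint -> public port *)
  inbound : (proto * int, endpoint) Hashtbl.t;   (* public port -> private endpoint *)
}

let create public_ip =
  { public_ip; next_port = 1024;
    outbound = Hashtbl.create 64; inbound = Hashtbl.create 64 }

(* Find an unused public port, scanning at most one lap of 1024..65535. *)
let allocate_port t proto =
  let range = 65536 - 1024 in
  let next port = if port = 65535 then 1024 else port + 1 in
  let rec go tries port =
    if tries >= range then None
    else if Hashtbl.mem t.inbound (proto, port) then go (tries + 1) (next port)
    else begin
      t.next_port <- next port;
      Some port
    end
  in
  go 0 t.next_port

(* Public port for a private endpoint, adding a table entry if it is new. *)
let translate_out t proto src =
  match Hashtbl.find_opt t.outbound (proto, src) with
  | Some port -> Some port
  | None ->
    match allocate_port t proto with
    | None -> None
    | Some port ->
      Hashtbl.replace t.outbound (proto, src) port;
      Hashtbl.replace t.inbound (proto, port) src;
      Some port

(* Private endpoint for traffic arriving on a public port, if any. *)
let translate_in t proto port = Hashtbl.find_opt t.inbound (proto, port)
```

Keeping two hash tables costs some memory but makes lookups in both directions constant-time, which matters because every forwarded packet needs one.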
Since there’s nothing privileged about any of the data structures we’re using, or the memory we’re accessing, it’s relatively straightforward to pull the packet-transformation and inspection code out into a simple library that does the following:
- decomposes incoming packets into either a tuple of the relevant layers or None
- pulls relevant information for NAT decision-making (Ethernet layer ethertype, IP-layer source and destination address and protocol, transport-layer port numbers) out of packet layers
- given an existing NAT table and an incoming packet, either rewrites the packet according to the rules in the table or returns None
- given an existing NAT table and an incoming packet, along with an IP address and port number, creates a new NAT table rule for the packet using the IP address and port number provided
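As a rough illustration of the first few operations in that list, here is a sketch that works on a raw Ethernet frame held in Bytes. The names info, ipv4, decompose, checksum, and rewrite_source are my own for this example and are not taken from mirage-nat. It handles only untagged Ethernet carrying IPv4 with TCP or UDP, and it recomputes only the IPv4 header checksum; the transport-layer checksum, which also covers a pseudo-header containing the addresses, is left out to keep the sketch short.

```ocaml
(* Fields a NAT needs from an Ethernet frame carrying IPv4 + TCP/UDP. *)
type info = {
  proto : int;           (* 6 = TCP, 17 = UDP *)
  src : int32; dst : int32;
  sport : int; dport : int;
  ip_off : int;          (* offset of the IPv4 header in the frame *)
  l4_off : int;          (* offset of the transport header *)
}

(* Build an IPv4 address as a big-endian int32 from its four octets. *)
let ipv4 a b c d =
  Int32.logor (Int32.shift_left (Int32.of_int a) 24)
    (Int32.of_int ((b lsl 16) lor (c lsl 8) lor d))

(* Return Some info for IPv4 TCP/UDP frames, None for anything else. *)
let decompose frame =
  let len = Bytes.length frame in
  if len < 14 + 20 || Bytes.get_uint16_be frame 12 <> 0x0800 then None
  else
    let ip_off = 14 in
    let vihl = Bytes.get_uint8 frame ip_off in
    let ihl = (vihl land 0x0f) * 4 in
    let proto = Bytes.get_uint8 frame (ip_off + 9) in
    let l4_off = ip_off + ihl in
    if vihl lsr 4 <> 4 || ihl < 20 || (proto <> 6 && proto <> 17)
       || len < l4_off + 4
    then None
    else Some {
      proto; ip_off; l4_off;
      src = Bytes.get_int32_be frame (ip_off + 12);
      dst = Bytes.get_int32_be frame (ip_off + 16);
      sport = Bytes.get_uint16_be frame l4_off;
      dport = Bytes.get_uint16_be frame (l4_off + 2);
    }

(* RFC 1071 ones'-complement checksum over [len] bytes starting at [off]. *)
let checksum buf off len =
  let sum = ref 0 and i = ref 0 in
  while !i + 1 < len do
    sum := !sum + Bytes.get_uint16_be buf (off + !i);
    i := !i + 2
  done;
  if !i < len then sum := !sum + (Bytes.get_uint8 buf (off + !i) lsl 8);
  while !sum lsr 16 <> 0 do
    sum := (!sum land 0xffff) + (!sum lsr 16)
  done;
  (lnot !sum) land 0xffff

(* Rewrite source address and port in place, then recompute the IPv4
   header checksum. The TCP/UDP checksum is NOT updated here. *)
let rewrite_source frame info ~new_src ~new_sport =
  Bytes.set_int32_be frame (info.ip_off + 12) new_src;
  Bytes.set_uint16_be frame info.l4_off new_sport;
  Bytes.set_uint16_be frame (info.ip_off + 10) 0;
  let ihl = info.l4_off - info.ip_off in
  Bytes.set_uint16_be frame (info.ip_off + 10) (checksum frame info.ip_off ihl)
```

Returning an option from decompose mirrors the "tuple of layers or None" shape described above: anything the NAT doesn't understand simply falls out as None and can be dropped or passed along untouched.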
Along with a library that provides basic CRUD operations on the table itself, this is enough to get Internet browsing working through a NATting unikernel with not much code at all. If you’d like to try it out, here are some instructions on setting up a Xen machine to NAT via mirage-nat. The instructions given are for a CubieBoard2 or CubieTruck, but any machine running Xen with multiple network interfaces (or even virtual bridges, if you wish to NAT nonphysical devices) can run the NATting unikernel.
Some Comments on Limitations of the Implementation
This is not enough for stable, or even reasonably secure, Internet browsing through a NATting unikernel, largely because there’s no nice facility for removing table entries. This has two important consequences:
- the NAT table will grow until it consumes all available memory and the NAT device crashes. This mimics the behavior of many commercial implementations (memory exhaustion due to NAT table size is a common reason you need to restart your home router), but in this case that isn’t a feature.
- the NAT table will allow servers which previously replied to requests to send new traffic to the host which made the original request. In other words, if a client made an unencrypted HTTP request to the-toast.net, downloaded a webpage, and then closed the connection three days ago, the NAT device has no way of knowing that the-toast.net shouldn’t be sending responses now. This is particularly bad in the case of UDP, which has fewer protocol-level safeguards against state-violating traffic.
There’s nothing about the MirageOS architecture that imposes these limitations – code which times out and maintains state is already implemented in MirageOS. Ideally, we’d want to make use of the state machine logic for TCP connections already included in the mirage-tcpip library, so we could continue to use the power of our library OS architecture to avoid duplicating this code. We’d be stuck writing our own UDP “connection” expiry logic no matter what, since UDP is a connectionless protocol, although we could provide that as a library as well – perhaps a firewalling unikernel might be able to use this code in the future?
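For a sense of what the UDP side of that expiry logic might look like, here is a small sketch in plain OCaml. The entry type, the 60-second udp_timeout, and the add, lookup, and expire functions are all assumptions made for illustration; none of them come from mirage-nat or mirage-tcpip, and the caller is assumed to supply the current time as a float in seconds.

```ocaml
(* Hypothetical expiring table: public port -> private endpoint. *)
type entry = { target : string * int; mutable last_seen : float }
type table = (int, entry) Hashtbl.t

(* Idle time after which a UDP mapping is considered dead.
   60 seconds is an arbitrary value chosen for this sketch. *)
let udp_timeout = 60.0

let add (t : table) ~now port target =
  Hashtbl.replace t port { target; last_seen = now }

(* Look up a mapping; live entries get their timestamp refreshed,
   stale ones are removed and treated as absent. *)
let lookup (t : table) ~now port =
  match Hashtbl.find_opt t port with
  | Some e when now -. e.last_seen <= udp_timeout ->
    e.last_seen <- now;
    Some e.target
  | Some _ -> Hashtbl.remove t port; None
  | None -> None

(* Drop every entry idle for longer than the timeout and return how
   many were removed. Stale keys are collected first so the table is
   not modified while it is being folded over. *)
let expire (t : table) ~now =
  let stale =
    Hashtbl.fold
      (fun port e acc ->
         if now -. e.last_seen > udp_timeout then port :: acc else acc)
      t []
  in
  List.iter (Hashtbl.remove t) stale;
  List.length stale
```

Calling expire from a periodic timer would bound the table's growth, and the check inside lookup means a stale mapping is never honoured even between sweeps.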
Some of the research leading to these results has received funding from the European Union’s Seventh Framework Programme FP7/2007-2013 under the UCN project, grant agreement no 611001.