What a Distributed, Version-Controlled ARP Cache Gets You

git (and its distributed version control system friends hg and darcs) has some great properties. Not only do you get a full history of changes on the objects stored in it, you also get comments on those changes, as well as branching and merging, which let you make intermediate changes without messing up state for other entities that want to work with the repository.

That’s all pretty cool. I actually want that for some of my data structures, come to think of it. Say, for example, a boring ol’ key-value store which can be updated from a few different threads – in this case, a cache that stores values it gets from the network, updated by the querying/timeout code around it. It would be nice if each thread could make a new branch, make its changes, then merge them into the primary branch once it’s done.
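
To sketch what I mean, here's the kind of signature such a store might have. This is completely hypothetical; none of these names are Irmin's actual API, they just illustrate the branch-then-merge workflow:

(* A hypothetical signature for a versioned key-value store; the names
   here are illustrative, not Irmin's. *)
module type VERSIONED_KV = sig
  type t
  (* each thread starts its own branch from the current state... *)
  val branch : t -> name:string -> t
  (* ...makes its changes there, each with a commit message... *)
  val update : t -> key:string -> value:string -> message:string -> t
  (* ...and merges back into the primary branch when it's done *)
  val merge : t -> into:t -> [ `Ok of t | `Conflict of string ]
end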

It turns out you can totally do that with Irmin, “the database that never forgets”! I did (and am still doing) a bit of work on sticking a modified version of the MirageOS address resolution protocol code’s data structures into Irmin:

$ git log --all --decorate --oneline --graph
* 68216f3 (HEAD, primary, expire_1429688434.645130) Arp.tick: updating to age out old entries
* ec10c9a entry added: 192.168.3.1 -> 02:50:2a:16:6d:01
* 6446cef entry added: 10.20.254.2 -> 02:50:2a:16:6d:01
* 81cfa43 entry added: 10.50.20.22 -> 02:50:2a:16:6d:01
*   4e1e1c7 Arp.tick: merge expiry branch
|\  
| * cd787a0 (expire_1429688374.601896) Arp.tick: updating to age out old entries
* | 8df2ef7 entry added: 10.23.10.1 -> 02:50:2a:16:6d:01
|/  
* 8d11bba Arp.create: Initial empty cache

A Bit on ARP

ARP (“Address Resolution Protocol”) is an oft-neglected part of how IPv4 and Ethernet (and garden-variety WiFi, which in this context pretends to be Ethernet) get glued together. In lieu of a full explanation of ARP, you can imagine it as a mapping between IPv4 addresses (e.g. 192.168.42.11) and link-level addresses (e.g. 00:16:3e:c0:ff:ee). Link-level addresses are assigned per hardware device. A machine can have multiple IP addresses per link-level address; it will have only one link-level address per network hardware device.

A machine attempting to resolve an IPv4 address on the local network (including the gateway through which it sends traffic for non-local networks) sends an ARP request; if a reply is heard, the entry is stored in the ARP cache. Some hosts, when claiming new IP addresses, send broadcast reply messages without having heard a request. These messages are often called “gratuitous ARPs”, and should also result in a cache entry when heard. An ARP implementation also needs to emit replies when it hears queries for IPv4 addresses it is using.

A few timeouts are also required. Entries in the cache need to be aged out after a while, since IPv4 addresses are frequently reassigned on a network and an entry from several days ago may no longer be valid. An ARP implementation also needs to give up on resolving an address if no intelligible reply message is received within a reasonable amount of time.
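
To make those two cache operations concrete, here's a minimal sketch over an immutable map. This is illustrative only: the function names and the shape of the timeout handling are mine, not the implementation's.

module M = Map.Make(Ipaddr.V4)

(* hearing a reply (or a gratuitous ARP) for ip -> mac refreshes the
   cache; each entry carries an absolute expiry time in epoch seconds *)
let learn ~now ~arp_timeout cache ip mac =
  M.add ip (now +. arp_timeout, mac) cache

(* a periodic sweep drops every mapping whose expiry time has passed *)
let age_out ~now cache =
  M.filter (fun _ip (expiry, _mac) -> expiry > now) cache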

ARP in Mirage

With a library OS implemented modularly, you’d think it would be possible to swap out a given ARP implementation for another that exposes the same API with similar semantics. Unfortunately, in the current state of MirageOS, this isn’t cleanly possible with the existing tcpip – the ARP implementation isn’t specified when building a stack; rather, it’s baked into the Ipv4 module.
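
To give a sense of what “the same API” could look like, here's a hypothetical module type for a pluggable ARP layer. The input and set_ips operations correspond to functions that show up in the test code later in this post; query and the exact return types are guesses for the sake of the sketch.

(* Hypothetical: the shape a swappable ARP module might take. *)
module type ARP = sig
  type t
  (* resolve an address, sending requests on the wire if necessary *)
  val query : t -> Ipaddr.V4.t -> Macaddr.t Lwt.t
  (* handle an incoming ARP frame: update the cache, answer requests *)
  val input : t -> Cstruct.t -> unit Lwt.t
  (* claim addresses for this host, emitting gratuitous ARPs *)
  val set_ips : t -> Ipaddr.V4.t list -> unit Lwt.t
end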

Since Irmin also doesn’t have a nice story for running inside Xen unikernels yet, I decided to step outside of MirageOS unikernels to do some initial mucking around with Irmin-ified ARP caches within a Unix-only test environment.

Storing the ARP Cache

The cache in the existing implementation is built around the standard library’s Hashtbl.t, which is mutable. With Irmin, mutability is managed entirely by the DVCS in which the structure is stored, so it’s better to use an immutable data structure. Since what we’re implementing is still just a simple key-value store, and moreover one keyed on something orderable, let’s flip the code over to use a Map with keys of type Ipaddr.V4.t and values of the type (Entry.t) used by the previous ARP implementation. Entry.t is a variant: either Pending of Lwt.u * Lwt.t, for addresses which the implementation is currently attempting to look up, or Confirmed of float * Macaddr.t, for addresses which have been mapped, carrying an expiry time and a link-level address.
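
In code, the structure just described looks roughly like this (a sketch: the Macaddr.t type parameters on the Pending arm are my addition, and the real definitions live in the irmin-arp repository):

module M = Map.Make(Ipaddr.V4)

module Entry = struct
  type t =
    | Pending of Macaddr.t Lwt.u * Macaddr.t Lwt.t
      (* a lookup in flight: the wakener to fire when a reply arrives,
         and the thread a caller can wait on *)
    | Confirmed of float * Macaddr.t
      (* an absolute expiry time and the resolved link-level address *)
end

type t = Entry.t M.t  (* the whole cache is one immutable map *)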

Changes

In order to use this structure with Irmin, I had to define the functions required by the Tc.S0 module type; doing that meant first implementing them for the underlying Ipaddr.V4.t and Entry.t types. I wrote test code for all of this using alcotest, to try to keep implementation mistakes to a minimum.
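
For a flavor of those tests, here's a hypothetical alcotest case (using OUnit assertions, as the test code later in this post does). It assumes the Entry module from the sketch above also provides Tc.S0's serialization functions, among them to_json, of_json, and equal, and checks that an entry survives a JSON round-trip:

let entry_json_roundtrip () =
  let mac = Macaddr.of_string_exn "02:50:2a:16:6d:01" in
  let entry = Entry.Confirmed (1429688434.0, mac) in
  (* to_json/of_json and equal are among the functions Tc.S0 requires *)
  OUnit.assert_bool "of_json inverts to_json"
    (Entry.equal entry (Entry.of_json (Entry.to_json entry)))

let () =
  Alcotest.run "irmin-arp" [
    "serializers", [ "entry JSON round-trip", `Quick, entry_json_roundtrip ];
  ]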

Tests for the interface to the ARP table type at the level where Irmin is wrapping calls were more interesting. I initially wrote the test code to use the in-memory database version of Irmin, but soon realized that an on-disk database would be more useful: with it, I could see the results of each commit in the history, so that if a test was failing, I might have some idea why.
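
Concretely, switching to the on-disk store is just a matter of which modules get passed to Irmin; these lines mirror the module definitions in the test code below, with the root directory name being my own example:

(* the same modules the test code below uses, pointed at a filesystem
   git store whose history we can inspect with plain git afterwards *)
module T = Table.Make(Irmin.Path.String_list)
module Irmin_backend_fs = Irmin_unix.Irmin_git.FS
module I_fs = Irmin.Basic(Irmin_backend_fs)(T)

let config = Irmin_git.config ~root:"/tmp/irmin_arp_tests" ()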

Commit History as Test Artifact

Here’s an example from debugging ARP entry expiry, where the expected behavior is that one thread will clone the cache, remove entries whose expiry time is before the current time, and merge the pruned map back into the primary branch. All changes occur to the same file (or node, more correctly), namely the root node, represented in the filesystem as a file named __root__. The commit history shows:

  • primary branch: an initial commit of an empty map
  • new branch “entries_aged_out”: a commit which adds an entry to the cache
  • new branch “expire_1429…860”: a commit which makes no change (although the comment and branch name show that this is the entry expiry thread, operating on the empty map it finds at the HEAD of the primary branch)
  • a merge of branches “entries_aged_out” and “expire_1429…860” into the primary branch
  • new branch “expire_1429…584”: a commit which removes the only entry in the cache
  • a merge of branch “expire_1429…584” into the primary branch

    * a8e9140 (HEAD, primary, expire_1429688434.984584) Arp.tick: updating to age out old entries
    *   0e233ec entries_aged_out: merge seeded ARP entry
    |\  
    | * 17717e4 (expire_1429688374.939860) Arp.tick: updating to age out old entries
    * | 0fa385b (entries_aged_out) entries_aged_out: seed cache entry
    |/  
    * d1c42a3 Arp.create: Initial empty cache
    

With the exception of the explicit seeding of the ARP cache in the “entries_aged_out” thread, each action is an expected side effect of the ARP cache’s operation, conducted in the background after a cache is initialized via its create function. Yet by examining the commit record of the cache, we can verify that the operations happened, or didn’t happen, as expected. We can even programmatically manipulate the cache with Irmin to write test code asserting that, for example, merging a branch created by an expiry thread never resulted in an entry being added to the cache – a possible outcome of a race condition between expiry and update threads. A sketch of that kind of assertion is below.
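
Here's a sketch of that last assertion, comparing the map before and after a merge. The map module is as in the earlier sketches, and the function name is hypothetical:

module M = Map.Make(Ipaddr.V4)  (* as in the earlier sketches *)

(* expiry can only remove entries, so after merging an expiry branch,
   every key in the merged map must already have been present before *)
let assert_expiry_added_nothing ~before ~after =
  M.iter (fun ip _entry ->
      OUnit.assert_bool
        (Printf.sprintf "expiry merge added an entry for %s"
           (Ipaddr.V4.to_string ip))
        (M.mem ip before))
    after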

Testing Side Effects

The ARP code in Mirage isn’t just responsible for manipulating the cache in response to incoming messages and the passage of time. It’s also responsible for providing its own messages to the network, to ensure that other hosts can map any IP addresses that the stack claims to their underlying link-level addresses. These operations don’t touch the cache, so our souped-up data structure won’t help us – if we want to ensure that they’re happening, and happening correctly, we need to do something else.

Enter mirage-vnetif, a software networking bridge that lets us send packets without sending packets. With mirage-vnetif, we can define one “backend” (analogous to a switch) for multiple “network interfaces” to plug into. The network interfaces fulfill the module type V1_LWT.NETWORK defined by mirage-types (a more complete explanation of the modular, functorized stack is available on the MirageOS blog), so stacks can be built on top of them – including Ethernet and ARP layers. Using virtual network interfaces, we can assert from within a test framework like alcotest that certain operations result in packets going out on the network, and that those packets look the way we expect.

In the test that follows, we set up two virtual network interfaces, a speaker and a listener, and build an Ethernet and ARP layer on top of each. We then instruct the speaker’s ARP layer to set an IPv4 address for which it should answer probe requests. According to the documentation for ARP, this should result in an outgoing gratuitous ARP for that IP, mapping it to the speaker’s link-layer address. With the listener interface on the same backend, we should be able to hear via its ARP-layer listener whether the speaker sent a gratuitous ARP that could be understood.

The code below is excerpted and slightly edited for readability from a larger set of tests.

let input_single_reply () =
  (* some module definitions: 
   module B = Basic_backend.Make
   module T = Table.Make(Irmin.Path.String_list)
   module Irmin_storer_fs = Irmin_git
   module Irmin_backend_fs = Irmin_unix.Irmin_git.FS
   module I_fs = Irmin.Basic(Irmin_backend_fs)(T)
   module A_fs = Irmin_arp.Arp.Make(E)(Clock)(OS.Time)(Irmin_backend_fs)
   *)
  (* use on-disk git fs for cache so we can read it back and check it ourselves *)
  let root = root ^ "/input_single_reply" in
  (* set some configuration parameters for Irmin's use *)
  let listen_config = Irmin_storer_fs.config ~root:(root ^ "/listener") () in
  let backend = B.create () in (* create a software bridge *)
  (* get_arp is a convenience function that sets up an ARP stack on a 
     given network interface, and returns the relevant layer records *)
  get_arp ~backend ~root:(root ^ "/listener") () >>= 
  fun (backend, listen_netif, listen_ethif, listen_arp) ->
  get_arp ~backend ~root:(root ^ "/speaker") () >>=
  fun (backend, speak_netif, speak_ethif, speak_arp) ->
  (* run a listener which will call listen_arp if a packet is received,
     and then shut down the listener's underlying interface.  
     shutting down the interface stops the test from unnecessarily 
     waiting for more packets.  then, call code which should result in 
     a GARP from one side (speak_arp) and make sure a corresponding
     cache entry appears on the other side (listen_arp). *)
  timeout_or ~timeout:0.1
    ~msg:"Nothing received by listen_netif when trying to do single reply test"
    listen_netif (fun () -> A_fs.set_ips speak_arp [ first_ip ] >>= fun _a -> Lwt.return_unit)
    (fun () -> fun buf -> A_fs.input listen_arp buf >>= fun () -> V.disconnect listen_netif)
  >>= fun () ->
  (* load our own representation of the ARP cache of the listener *)
  let store = Irmin.basic (module Irmin_backend_fs) (module T) in
  Irmin.create store listen_config Irmin_unix.task >>= fun store ->
  (* retrieve the cache from its storage location,
     which is always T.Path.empty  *)
  Irmin.read_exn (store "readback of map") T.Path.empty >>= fun map ->
  try
    let open Entry in
    match T.find first_ip map with
    | Confirmed (_time, entry) ->
      (* an entry was found, so make sure it's mapping
         to the correct link-level address *)
      OUnit.assert_equal ~printer:Macaddr.to_string
        entry (V.mac speak_netif);
      Lwt.return_unit
    | Pending _ ->
      OUnit.assert_failure
        "Pending entry for an entry that had a GARP emitted on the same vnetif backend"
  with
  | Not_found ->
    OUnit.assert_failure
      "Expected cache entry not found in listener cache map, as read back from Irmin"

Test Results

A nice side effect of programming against a DVCS API is that test results can also easily be put in the DVCS store. If you use a filesystem store for testing, rather than an in-memory one, the results will persist across runs. You could even conceive of including the history of test results in your main code repository, where each commit would be immediately followed by a branch with a series of commits reflecting the test status of the branch. (I enabled this on the irmin-arp repository for a while, but it turned out to be a bit annoying in my particular workflow; however, I can imagine this being useful for other people, projects, or levels of confidence.)

$ git log --oneline --format="%h (%ad) %s" |grep "merge_conflicts_difft"|head -1
6a812a6 (Wed Apr 22 07:39:27 2015 +0000) merge_conflicts_difft_nodes: test succeeded; removing data
$ git log --oneline --format="%h (%ad) %s" |grep "merge_conflicts_difft"|tail -1
fef5d73 (Thu Apr 9 15:45:32 2015 +0000) merge_conflicts_difft_nodes: beginning new test
$ git log --oneline --format="%h (%ad) %s" |grep "merge_conflicts_difft"|wc -l
1296

So?

So there are some obvious problems here:

  • the implementation is not complete
  • the implementation is not usable in a unikernel (yet)
  • when the implementation is complete, it will be annoying to use with tcpip without patches
  • since Irmin “never forgets”, the memory use of the cache will grow without bound

The first three of those are “just work on it more” issues. The last is a more general known problem in using Irmin, which I’m not qualified to solve, but maybe you are?

Acknowledgements

Some of the research leading to these results has received funding from the European Union’s Seventh Framework Programme FP7/2007-2013 under the UCN project, grant agreement no 611001.