Chamelon: MVP persistent block storage for MirageOS
TL;DR: I wrote a key-value store for MirageOS backed by block storage. It’s called chamelon, it’s based on LittleFS, and if you’re brave, you can use it to store data. Examples are available: a URL shortener and an OAuth2 authorization server.
In English: I couldn’t save or load files before, and now I can. Wowzers!
Motivation
I’ve spent a long time avoiding needing persistent writeable storage in my MirageOS unikernels. This blog, for example, doesn’t serve files from a filesystem - it uses crunch to make an in-memory key-value store for the unikernel out of files from the system on which the unikernel was built. Since the blog doesn’t generate any information while running that it needs to store, that’s plenty fine. Many other “needs” for storage can be served by a remote database server, rather than a local filesystem, and in fact should be for any nontrivial data storage.
Unfortunately, there’s an inconvenient middle ground, where we need to store a small amount of crucial data which we’d prefer to keep as close to the vest as possible. For reasons that do not need explaining at this juncture, I recently found myself unavoidably on that middle ground. To fill the gap between `block` and `kv`, there is a released FAT library for MirageOS, which I found unsatisfactory based on previous use and wasn’t interested in improving, and an unreleased filesystem called wodan, which I wasn’t confident would ever see release.
In short, it seemed my choices were to commit to maintaining a storage layer someone else had initially written, or maintain a storage layer I’d done the initial work on myself. I’ve done a lot of work on MirageOS that sounds like “maintain something someone else initially wrote”, and since resigning from the core team, I’ve been enjoying not doing that anymore. I decided to continue not doing that, and start my own filesystem implementation. At least I would get to make all the design mistakes in this one!
Design
Speaking of design, I’m not a storage expert. We wrote an FFS implementation in my undergrad Operating Systems course, and that’s about the most complicated storage thing I’ve ever been involved in. I did, however, have an idea of the traits I wanted in a storage system:
* possible to implement `mirage-kv`’s hierarchical dictionaries on top of (in practice, a directory-based filesystem)
* tolerates arbitrary poweroff without data loss or corruption
* supports very small underlying storage (< 1 MB, but ideally smaller)
Like any underinformed person, I set about searching the web with stuff like “easy small fault-tolerant filesystem”. After an hour of depressing results, I came across the LittleFS project. In its own words,
> At a high level, LittleFS is a block based filesystem that uses small logs to store metadata and larger copy-on-write (COW) structures to store file data.
Although the design had a lot of nice attributes, I’ll admit that the following quote from DESIGN.md is what hooked me:
> This relies on the super secret computer science hack where you can pretend any algorithmic complexity is O(1) by bounding the input.
That’s the kind of design philosophy I can get behind. `chamelon`, which the Internet informs me is French for “little camel” and definitely not a swear or anything, is only not named `ocaml-littlefs` because I was afraid that would be insulting to LittleFS.
Implementation
`chamelon` started off as a read-only tool for interpreting LittleFS filesystems, based on the on-disk structure documentation in SPEC.md. LittleFS’s reference implementation came with several useful tools for diagnosing my implementation’s various failures in interpreting on-disk structures, and the FUSE implementation for LittleFS also went a long way toward generating test cases for a reasonable OCaml implementation.
There are a few properties of LittleFS that make sense for the original intended environment (microcontrollers directly addressing flash devices) that I chose not to include for `chamelon`:
- wear-leveling: for a unikernel, we expect to get a virtual block device managed by the hypervisor via its own device driver, which presumably handles these details.
- write fault detection: as above. However, we implement CRC32 checksumming on commits as in the LittleFS spec so as to retain interoperability; we just don’t check the CRC32 after write. (A sketch of the checksum computation follows this list.)
- the singly-linked list threaded through the directory tree for constant-RAM traversal: the additional complexity isn’t justified for any of my use cases.
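
As a concrete illustration, checksumming a commit’s bytes in OCaml might look like the following. Whether chamelon actually uses the checkseum library, and whether `Checkseum.Crc32` matches littlefs’s CRC-32 variant bit-for-bit, are assumptions on my part; treat this as the shape of the operation rather than chamelon’s real code.

```ocaml
(* Sketch: compute a CRC32 over the serialized bytes of a commit before
   writing it, so littlefs tools reading the disk can verify it.
   Assumption: checkseum's Crc32 is compatible with littlefs's CRC-32. *)
let crc_of_commit (commit_bytes : string) : Optint.t =
  Checkseum.Crc32.digest_string commit_bytes 0
    (String.length commit_bytes) Checkseum.Crc32.default
```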
Additionally, since we’re not implementing a POSIX filesystem layer but rather the MirageOS `KV.RW` module type, there’s a lot we don’t need to do. Prominently, we don’t care about `mv` operations at all, let alone atomic `mv` operations, which removes the motivation for a lot of global state manipulation in LittleFS.
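
To see why `mv` never comes up, here is roughly the writable half of that module type, abridged and paraphrased from the mirage-kv of the time; exact types vary between mirage-kv releases, so consult the real signature before relying on this. The point is that writers only get `set`, `remove`, and `batch`; there is no rename.

```ocaml
(* Abridged paraphrase of Mirage_kv.RW (mirage-kv 4.x era), not verbatim. *)
module type RW = sig
  include Mirage_kv.RO

  type write_error
  val pp_write_error : Format.formatter -> write_error -> unit

  (* Writers can only set a whole value, remove one, or batch such
     operations; there is no rename/mv to make atomic. *)
  val set : t -> key -> string -> (unit, write_error) result Lwt.t
  val remove : t -> key -> (unit, write_error) result Lwt.t
  val batch : t -> ?retries:int -> (t -> 'a Lwt.t) -> 'a Lwt.t
end
```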
LittleFS does not specify any on-disk format for timestamps, let alone a modification or creation time. Implementing `mirage-kv` requires some way of recording modification time, so `chamelon` implements this via LittleFS’s “user attributes”.
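
For illustration only: a MirageOS POSIX clock reports time as a (days, picoseconds) pair, and such a pair could be packed into a user attribute something like this. The 12-byte little-endian layout here is hypothetical, not chamelon’s actual on-disk encoding.

```ocaml
(* Hypothetical encoding of a modification time into a user attribute:
   4 bytes of days since the epoch, then 8 bytes of picoseconds within
   the day, both little-endian. Not chamelon's real layout. *)
let mtime_to_attr (d, ps) =
  let buf = Bytes.create 12 in
  Bytes.set_int32_le buf 0 (Int32.of_int d);
  Bytes.set_int64_le buf 4 ps;
  Bytes.to_string buf

let attr_to_mtime s =
  if String.length s < 12 then None
  else
    let buf = Bytes.of_string s in
    Some (Int32.to_int (Bytes.get_int32_le buf 0),
          Bytes.get_int64_le buf 4)
```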
Testing
`chamelon` has been lightly fuzz-tested for graceful failure when initially connecting to a filesystem, and to ensure that formatting a block device always results in a clean filesystem.
It has also been lightly benchmarked to make sure baseline performance when searching directories is not worse than O(n).
A suite of manually created tests is available for LittleFS structure handling as well as the MirageOS `KV.RW` module implementation.
Most of `chamelon`’s critical problems have been discovered by the good old-fashioned practice of testing in prod. They are overwhelmingly related to the `split` operation, which allows multiple metadata blocks to represent a single directory; future testing, both manual and automated, is likely to focus on this area.
Strengths
Table Stakes
It’s trivial to wire into a MirageOS unikernel. It works for the thing I wanted to use it for.
It’s Small
The smallest possible `littlefs` filesystem is, indeed, very little, a property `chamelon` inherits. It can handle block devices smaller than many hypervisors are ready to provide.
Weaknesses
Performance, performance, performance
Lists of items within a directory are unordered, so any operation on a directory (including finding a file within it) is O(n) in the number of items in the directory.
The `mirage-kv` interface has a few functions (`last_modified`, `digest`) which are recursive over the contents of a dictionary (in `chamelon`, this means we have to apply them to all directory contents). `chamelon` currently has no optimizations for these operations, although they could be implemented via user attributes. However, doing so would sacrifice some interoperability with `littlefs` tools, which wouldn’t update those attributes when writing to the filesystem.
Space Usage
This implementation is not suitable for use as a large filesystem. Addressing is limited to 32-bit block indices, and currently the solo5 block implementation supports a maximum block size of 512 bytes, for a maximum addressable filesystem of 65536 MiB. Block devices of larger size can be supplied, but only 64 GiB will be usable by `chamelon`.
`littlefs` is most efficient in its space usage with many small (< 1/4 block size) files in a broad hierarchy (i.e. most directories have many files or other directories in them). It is least space-efficient with many files larger than 1/4 block size but smaller than 1 block size, and with deeply nested directory structures; with 512-byte blocks, for example, a 200-byte file is too big to inline in directory metadata and so consumes a full block of its own. Combined with the duplication of every block’s metadata, this can result in lots of wasted space in worst-case scenarios.
So Release It, You Hypocrite
OK! `chamelon` (and `chamelon-unix`, a suite of command-line tools) are available in opam as of midday on 2022/03/04, thanks to lots of help from the opam maintainer team.
The `mirage` front-end tool doesn’t know about `chamelon`, so in order to use it you’ll have to ask for a block device in your `config.ml` and then apply `Kv.Make` to the block device in your unikernel’s `start` function. See the examples at the top of this page for more details.
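
To make that concrete, here’s a wiring sketch. Everything beyond `Kv.Make` itself is an educated guess at the shape of things: the package name, the functor’s parameters, and especially `Kv.connect`’s arguments and return type are assumptions on my part, so defer to the linked examples for the real invocation.

```ocaml
(* config.ml: minimal sketch for the mirage tool circa early 2022.
   "storage.img" is a placeholder backing file for the block device,
   and the chamelon package name is an assumption. *)
open Mirage

let block = block_of_file "storage.img"

let main =
  foreign "Unikernel.Main"
    ~packages:[ package "chamelon" ]
    (block @-> pclock @-> job)

let () = register "kv_example" [ main $ block $ default_posix_clock ]
```

```ocaml
(* unikernel.ml: Kv.connect's labelled argument and its direct return of
   the connected filesystem are assumptions for illustration only. *)
open Lwt.Infix

module Main (Block : Mirage_block.S) (Pclock : Mirage_clock.PCLOCK) = struct
  module Kv = Kv.Make (Block) (Pclock)

  let start block _pclock =
    Kv.connect ~program_block_size:16 block >>= fun kv ->
    (* Write a value, then read it back. *)
    Kv.set kv (Mirage_kv.Key.v "greeting") "hello from chamelon" >>= function
    | Error e -> Lwt.fail_with (Format.asprintf "%a" Kv.pp_write_error e)
    | Ok () ->
      Kv.get kv (Mirage_kv.Key.v "greeting") >|= function
      | Ok v -> print_endline v
      | Error e -> Format.printf "%a@." Kv.pp_error e
end
```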
Happy saving!