Whacking the Bitcoin Piñata

@yomimono or @hannesm surely know if people have tried crowbar on the BTC Piñata. – @kensan@mastodon.social

tl;dr - yes, and it seems that ocaml-x509 is not trivially easy to trick.


The Bitcoin Piñata

In 2015 David Kaloper-Mersinjak and Hannes Mehnert released ocaml-tls, an implementation of TLS (formerly known as SSL) written fully in OCaml. A full writeup of the stack is available in their Usenix Security 2015 paper, and as a series of blog posts on mirage.io. To accompany the release they also deployed a fully-automated bug bounty for the security stack – the bitcoin piñata.

The piñata will establish TLS connections only with endpoints presenting a certificate signed by its own, undisclosed certificate authority, but allows an attacker to easily listen to the encrypted traffic. The piñata always sends the same plaintext in such a connection: the private key to a wallet containing approximately 10 bitcoin. If the attacker can decrypt the ciphertext, or trick the piñata into negotiating a TLS connection with another host and disclosing the key, the information (and therefore the money) is theirs.


Crowbar is a library for writing tests. It combines a property-based API (like QuickCheck) with a coverage-driven generator of test cases (like the fuzzer American Fuzzy Lop). Crowbar tries to find counterexamples to stated properties by prioritizing the generation of test cases which touch more code. It is very good at finding counterexamples.

Testing ocaml-x509

TLS connections are usually authenticated via X509 certificates. ocaml-tls uses ocaml-x509 for this purpose, which is written as a standalone library. There is a clear separation of concerns between ocaml-x509 and ocaml-tls, and a straightforward API for certificate operations in ocaml-x509; both features help tremendously in writing tests for certificate handling.

Crowbar Your Favorite Library for Fun and Bugfixes

Crowbar is a tool that combines afl-persistent’s instrumentation with quickcheck-like property-based testing. afl-fuzz is a great tool for detecting crashes, but Crowbar helps us go a step farther and automatically discover inputs which cause our program to no longer have the properties we expect it to have. For reasons that don’t need exploring at this juncture, I first thought to apply Crowbar to charrua-client, a library which implements the DHCP state machine from a client perspective.

Parsers Optional

Friends, I have spoken to you of TCP and of fuzzing. Next I will speak to you of both, but today, I will speak to you of TCP options. If you’re here for the pwnage, sit tight; it’s coming.

What Even Is TCP Anyway

Here’s the lazy way of explaining it: TCP is the abstraction layer that allows you to pretend that network communication works in a logical, orderly, reliable fashion when you’re writing an application. Reading data and having it always be in the order it was sent? TCP. Being able to know whether a connection is open or closed? TCP. Knowing the difference between data coming from two separate processes on the same remote host? TCP. (There are other ways to get these guarantees, but the vast majority of Internet traffic that needs them gets them via TCP.)

On a less abstract level, TCP is a header (one of several!) that your operating system slaps on your network traffic before shipping it over the wire, on the way to its final destination. For damn near all the information on TCP you can shake a stick at, you can consult RFC 793 directly. The header summary, most relevant for our exploration, is reproduced below:

0                   1                   2                   3   
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
|          Source Port          |       Destination Port        |
|                        Sequence Number                        |
|                    Acknowledgment Number                      |
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
|           Checksum            |         Urgent Pointer        |
|                    Options                    |    Padding    |
|                             data                              |

Everything here is a fixed-length field except for Options, Padding, and data, all of which are optional. Data is up to the application, when it’s present (and is also frequently referred to as payload). When you loaded this web page, TCP packets were sent from my server at somerandomidiot.com to your computer, and the contents of the data field were these very words that you’re reading right now. TCP is data-agnostic; it only cares that your payload arrives intact, not what’s in it.

Options, on the other hand, are very much TCP’s concern.

Throwing Some Fuzzy Dice

I mentioned a while ago that the Mirage project agreed to have me on board, through the OPW internship project, for the summer. We started on Monday, and I’ve already had a lot of fun!

Officially, my job for the summer is to help shore up the network stack in Mirage, in part by running the current code through its paces, and in part through implementing some new functionality. This first week, I continued some work I started at the end of Hacker School - figuring out how to fuzz some strange (and not-so-strange) corners, and how to wrangle the data I got out of doing so.

Fuzz What Now?

Let’s step back. Way, way, way back.

If you’re a computer program, and you have some data that you care about, your data is likely in some kind of structure reflecting an underlying order to that data. Objects are a common way to organize this stuff; dictionaries, hashmaps, lists, arrays, trees, the list goes on. That’s all well and good when your program is running, keeping all this stuff in memory. But it happens depressingly often that you need to dump this stuff to permanent storage, or express it in some way to some other program or another computer, or represent it on the screen because something awful has happened, and you can’t just say “memory address 0x52413abd, memory address 0x52413cda, memory address 0x52413ea2” - these things are meaningless outside the context of the current run of that program.

So we have serialization, the high-level concept for the jillion different ways to take that data and put it in a string, or a binary data format, so something else can read that string and reassemble the structure of the data. That’s deserialization, which implies parsing; parsing is a pretty big deal.

When the data you’re attempting to assemble into a structure is as you expect it and everything is correct, parsing’s no problem. But it frequently happens that everything is not as you expect it, for any number of reasons - the programmer who made the program that made the message made a mistake; the programmer who made the program that reads the message made a mistake; the programs reading and writing the message are using different versions of the specification in the first place; the specification wasn’t specific about whether the third byte’s range from 0 to 5 was inclusive or exclusive and each programmer made a different decision; both programs agree, but the message was corrupted in transit; the message was corrupted in transit, and one program has implemented a different corruption recovery algorithm than the other… I’ll stop now, but I could keep going for a long time.

There are a lot of bad messages out there. It’s hard to make your parser do the right thing when it receives an arbitrary bad message. It can be hard to even know that your parser does the wrong thing when it receives an arbitrary bad message - if you thought of a certain kind of bad message to use in testing, of course you fixed your code to deal with it; you thought of it! But there are almost certainly loads more bad messages out there than the ones you thought of - both by chance, and by design.

If humans can’t make enough bad messages, maybe computers can. Randomly generating a whole mess of bad messages, sending them to your program, and seeing what happens is called fuzz testing, and it’s awesome.