Temporal Hex Dump

After building some hardware to trace and inject data on the Nintendo DSi’s RAM bus, it became obvious pretty fast that there’s a lot of data there, and (as far as I know) no good tools for analyzing these sorts of logs.

The RAM tracer has already given us a lot of insight into how the DSi works by virtue of letting us inspect the boot process, the inter-processor communication, and most of the code that runs on the system. But all of that knowledge comes in an indirect way, from using the RAM tracer as a platform to run other experiments. I’ve been interested in figuring out whether there’s a way to use the RAM trace itself to help understand a system’s dynamic behaviour.

The RAM is on a packet-oriented bus, so it would make sense to have a tool that looks kind of like a packet-based protocol analyzer. Think Wireshark, but for memory.

But there are also a lot of complex patterns that show up over time. As the DS loads a file, or initializes itself, or renders frame after frame of a UI, there are obvious patterns that emerge. So it also might make sense to have a visual tool, like vusb-analyzer.

Unfortunately, both of these approaches ignore the spatial organization of memory. The bus is a stream of packets that say ‘read’ or ‘write’, but the contents of RAM as a whole is more like a file that’s changing over time. Like in a version control system.

So the tool I’ve been imagining is kind of a hybrid of these. It would have a graphical timeline that helps you visually navigate through large datasets and identify timing patterns. It would have a packet-by-packet listing of the reads and writes. And most importantly, it would be a hex dump tool. But instead of showing a hex dump of a static file, it would be a two-dimensional hex dump. The hex dump shows space, but you can also scrub forward or backward in time, and watch the hex dump change. The hex dump could be annotated with colors, to show which data is about to change, or which data recently changed. You could right click on a byte, and see hyperlinks to the memory transactions that are responsible for that byte’s previous and next values.

As far as I know, nobody’s written a tool like this. So I have no idea how useful it will actually be for reverse engineering or performance optimization, but it seems like a promising experiment at least. So far I’ve been working on an indexing and caching infrastructure to make it possible to interactively browse these huge memory dumps, and I’ve been working on the visual timeline widget. Here’s a quick screencast:

The top section shows read/write/zero activity binned by address, with each vertical pixel representing about 64 kB. The horizontal axis is time, with continuous zooming. The bottom section of the graph shows bandwidth, color-coded according to read/write/zero. Blue pixels are reads, reds are write, and orange is a write of a zero byte.

This log file is about a gigabyte of raw data, or about 2 minutes of wallclock time. It shows the Opera browser on the Nintendo DSi loading a very large web page, then crashing. You can see its heap growing, and you can watch the memory access patterns of code, data, and inter-processor communication.

There’s a lot of room for improvement, but I’m optimistic that this will be at least a useful tool for understanding the DSi, and maybe even a more generally applicable tool for reverse engineering and optimization.

As usual, the source is in svn if anyone’s interested. It’s implemented with C++, wxWidgets, sqlite3, and Boost. I’ve only tested it on Linux, but it “should” be portable.

Virtual USB Analyzer

From late 2005 to early 2007, I worked on the USB virtualization stack at VMware. We ran into all sorts of gnarly bugs, many of which were very hard to reproduce or which required access to esoteric or expensive hardware. To help with debugging problems both internally and with customers in the field, we added logging support to our virtual USB stack. Starting with VMware Workstation 5.5, if you set the right hidden config option we’d start dumping the contents of all USB packets to a log. It was a USB sniffer (like USB Snoopy), but built into the virtual hardware.

To make it easier to analyze the resulting logs, I started working on a GUI tool that could navigate through these giant log files. This tool proved to be really useful within our team at VMware, and we’d often ask customers on the beta forums to generate log files that we could analyze. I called this tool vusb-analyzer.

Well, it’s been a while, but I’m proud to now have the opportunity to release vusb-analyzer as open source software under the MIT license. This isn’t just a code dump- I removed the tool from our internal repository today, and all future development will occur in the open, in a Subversion repository on Source Forge.

Currently, vusb-analyzer is most useful for analyzing logs captured by VMware products. Indeed, this is a convenient way to debug USB drivers. You can attach your USB device to VMware Workstation, VMware Fusion, or the free-as-in-beer VMware Player, and it can transparently save a plaintext log of all USB packets that pass through. Debugging a driver in a VM is really convenient, and I find it quite useful, but I understand it’s not for everybody. If you already have a favorite USB sniffer tool, it’s easy to extend vusb-analyzer with a log format plugin so it can read other formats. We already did this once, to add support for the logs generated by our favorite hardware USB analyzers.

I hope vusb-analyzer turns out to be useful for the open source community, particularly to those who are working on Linux device drivers. To get started, visit the project’s web site:


Besides the usual introduction and download links, there is also a detailed tutorial with lots of shiny screenshots 🙂

Introducing Metalkit

Metalkit is another of my random side-projects. It’s a very simple library for writing programs that run on IA32 (x86) machines on the bare metal. It isn’t an operating system, but it does contain some of the low-level pieces you might use to create one.

I created it partly for fun and for the challenge, and partly to use as a framework for low-level hardware testing at work. It is open source, released under an MIT-style license.

Features currently include:

  • A 512-byte bootloader that works either as a floppy disk MBR or a GNU Multiboot image. When you build a program with Metalkit, the same binary image can be used either as a raw floppy disk image or as a “kernel” image in GRUB. This makes it easy to use your programs on virtual machines (VMware, QEMU), emulators (Bochs), or real machines.
  • Basic PCI bus support. You can scan for PCI devices, find out what resources (I/O ports, memory, IRQs) they have, and poke at their configuration registers.
  • VGA text mode.
  • A very tiny zlib-compatible decompressor, the “puff” reference implementation of DEFLATE.
  • Low-level support for the PIT timer.
  • A small, efficient, and powerful interrupt subsystem. ISR trampolines are assembled at runtime, saving space in the binary. Any ISR can execute the equivalent of a longjmp(3) on return, making simple thread context-switching very easy. Includes basic PIC interrupt routing. Includes default fault handlers which dump CPU registers and the stack any time an unhandled fault occurs.

Metalkit could be useful for educational purposes, because programs written with Metalkit are extremely small and self-contained. This example is a complete Metalkit program which lists all devices on the PCI bus:

#include "types.h"
#include "vgatext.h"
#include "pci.h"
#include "intr.h"

    PCIScanState busScan = {};


    VGAText_WriteString("Scanning for PCI devices:\n\n");

    while (PCI_ScanBus(&busScan)) {
        VGAText_Format(" %2x:%2x.%1x  %4x:%4x\n",
        busScan.addr.bus, busScan.addr.device,
        busScan.addr.function, busScan.vendorId,


    return 0;

This example compiles to a 2962-byte image, and uses only about 1500 lines of library code. This is great for educational purposes, because it is practical to understand the purpose of every byte in that compiled image– and when this example is running, that’s the only code running on your computer.

Another example included with the source is a simple pre-emptive thread scheduler implemented in 152 lines of C. Metalkit itself doesn’t know anything about threads or multitasking, but it’s possible to use Metalkit’s interrupt trampoline as a thread context switch. This example creates two busy-looping threads. Each thread prints its name, and the “Task 2” thread also increments a counter. The example switches threads round-robin style on every timer interrupt. Here’s the tiny example running in Bochs:

If you want to play with Metalkit, all you need is an x86-compatible PC and a copy of the GNU toolchain (GCC and Binutils). Source code is now at https://github.com/scanlime/metalkit.
Also, if you’re interested in OS development or just hacking on the bare metal, the OSDev.org Wiki is an invaluable resource.