Tag Archives: assembly


S/PDIF Digital Audio on a Microcontroller

A few years ago, I implemented an S/PDIF encoder object for the Parallax Propeller. When I first wrote this object, I wrote only a very terse blog post on the subject. I rather like the simplicity and effectiveness of this project, so I thought I’d write a more detailed explanation for anyone who’s curious about the gritty details.

This is a recent video by Nick at Gadget Gangster where he takes the S/PDIF object for a test drive. [via Oldbitcollector]

The source code is open, under an MIT-style license. If you’re a fellow Propeller fan, it’s pretty easy to use this code to give your next sound project a digital output. If not, read on… perhaps you will be inspired to try exploring digital audio on a different microcontroller platform!

Digital Audio Primer

Starting from the very basics… what is S/PDIF, and why would we even want to generate it directly from a microcontroller?

Microcontroller audio projects are getting more and more popular, especially as legions of Arduino hackers build DIY drum machines, noise makers, 8-bit synthesizers, and so on. Many of these bit-bang low-fi audio in software. Some of them use external analog synthesizers, MP3 decoders, or other support ICs. If you wanted high-fidelity audio, though, your options get more limited. Some microcontrollers (like the Propeller) can perform PWM at a high enough frequency to produce reasonable audio quality. But this is still no match for an external DAC, much less a high-quality external DAC. So, if you’re really trying to produce higher quality audio without a lot of extra fuss or expense, it makes some sense to let someone else do the job.

If you have a hi-fi stereo receiver, you already have an external DAC and a good way to communicate with it. Nearly all consumer audio receivers now include digital audio inputs based on the Sony/Phillips Digital Interconnect Format (S/PDIF) standard. This consumer standard is actually a variant of the professional AES3 standard. Electrically, this is a high speed unidirectional serial link with a clock that runs at a high multiple of the audio sample rate. The physical transport can be a low-voltage signal over 75-ohm coax, or it could be optical. Optical interconnects (with TOSLINK connectors) are especially common, and to transmit these signals all you need is an LED.

For every audio sample. this digital signal transmits all of the bits in the sample, as well as some control information. Also, quite importantly in fact, it transmits the timing of these samples. The DAC synchronizes its conversion cycles to the time-of-arrival of each sample that comes over the digital bitstream. So, the analog timing characteristics inherent in this digital signal can also influence the resulting analog signal.

Sound nerds tend to get fabulously stressed out over jitter and wander and so on— names for different kinds of deviations from optimal bit timings. It’s good to keep in mind that, by nature, S/PDIF is a much more real-timey sort of signal than your average serial data link. But unless you’re the sort of hopeless audiophile who spends more on your amplifier power cables than I’d spend on a car, you probably shouldn’t get too bent out of shape over a few nanoseconds here or there.

S/PDIF Data Format

S/PDIF uses a serial signal clocked at 64x the audio sample rate. So, for 48 kHz audio, we need a serial signal at a whopping 3.072 megabits per second! However, the receiver doesn’t just need the bits, it also needs a clock. Since we only have one electrical or optical signal to work with, both the clock and data have to be recoverable from this one signal. S/PDIF does this using Biphase Mark encoding, which is a close relative of Manchester code. Because of this coding, we actually have to transmit on a clock rate which is 2x the bit rate. So, again for 48 kHz audio, we need a transmit clock of 6.144 MHz.

But wait, why 64x the sample rate? Even if we’re transmitting stereo audio at 24 bits per sample, that’s only 48 bits. Where do the other bits go?

Some of them are used by S/PDIF for synchronization purposes, and some are used for a low-frequency signaling channel which can transmit status words at a rate much slower than the audio sample rate, and some are effectively useless to us, reserved for obsolete or infrequently-used standards. The most primitive grouping of bits understood by S/PDIF is a 32-bit subframe which encodes one sample for one audio channel:

So many bits… what does it all mean???

  • The preamble identifies the type of subframe, as we’ll see below. It is the only part of the stream which is not biphase-mark encoded. It is the only place where we’ll see a run of three clock cycles with no bit transitions, so this allows the receiver to uniquely identify the preamble within the received bitstream.
  • Each subframe includes 24 bits of audio data, transmitted LSB-first and biphase-mark encoded. The low 4 bits of this stream may be used for other purposes, depending on which standard you’re reading.
  • The Valid bit indicates that this subframe contains valid sample data, and it is okay to output. In practice this bit isn’t really useful, since with the advent of Dolby compressed audio over S/PDIF, the receiver has a lot more work to do in order to determine if the data is valid uncompressed audio.
  • The User and Control bits are both part of a lower-bandwidth serial stream that we’ll see later.
  • And finally, each subframe has a Parity bit to help detect single-bit errors.

The User and Control bits on each channel collectively form four low-bandwidth serial channels, each running at a rate of one bit per sample. In S/PDIF, the User bit is unused, and the Control bits on each channel transmit a 192-bit Channel Status word. This word is fully transmitted once per block where a block is defined as a group of 192 frames beginning with a Z preamble.

In the professional AES3 protocol, there is a lot of data packed into this status word. But S/PDIF uses it for very little. In fact, only 13 of these bits are used at all, and in practice there isn’t really anything useful in this word. AES3 encodes an exact sample rate here, but in S/PDIF the only indication of sample rate is the clock recovered from the S/PDIF bitstream itself.

Biphase Mark Code

Wow, so far this looks pretty easy. Well, except for the high bit rate, and the picky timing. But what about this biphase mark code?

There are a few different ways to think of biphase mark encoding. If you’re familiar with Frequency Shift Keying (FSK) modulation, it might make sense to think of BMC as a particular form of FSK. A string of ones would be encoded as a square wave at a frequency equal to the original bit rate. A string of zeros would be a square wave at half that frequency. Put another way, you can think of BMC in terms of bit transitions. A zero bit is encoded by a transition followed by a non-transition, whereas a one bit turns into two transitions.

This demystifies BMC a bit… but why do it at all? Well, like any protocol which has to travel over some kind of analog physical media, very low-frequency signals (down to and including DC) can be troublesome. Let’s say we’re using an optical TOSLINK cable to transmit S/PDIF, and we have two theoretical bitstreams. One of them always transmits “one” bits, the other always transmits “zero” bits. The first bitstream means the transmitter’s LED is always on, and the second means the LED is always off. How does the receiver tell these two streams apart?

At first it seems obvious. The “one” stream is brighter than the “zero” stream. But actually, this might not be true. Maybe the first stream has a very dim light or a long cable. Maybe the second stream has a light leak around the receiver. It’s unreliable to rely on any absolute amount of light to discriminate ones from zeros, and in fact it’s not that hard to imagine situations where one system’s zero is brighter than another system’s one.

Similar problems exist in many kinds of analog transmission problems. Radio receivers, for example, need to deal with a very wide range of signal strengths. Unbalanced coaxial cables, such as S/PDIF over copper, can face similar problems. The receiver circuit in each of these cases needs to employ some kind of automatic gain control (AGC). AGC circuits track the average power level of the received signal, and “center” the one/zero discrimination threshold around this value. It’s a simplification, but AGC circuits can also be thought of as high-pass filters, since they subtract the unknown DC bias in the received signal.

Since DC signals are removed by the receiver, we can’t use them to carry any useful data. Those hypothetical all-zero or all-one bitstreams would be a disaster, since the receiver would continuously detect a signal level equal to the average. Any tiny amount of electrical noise would be detected as a one or zero.

This is where BMC helps. We can use a simple SciPy simulation to plot unencoded and encoded bitstreams in the frequency domain:

Now it’s easy to see that BMC is in fact shifting the signal up in the frequency domain. It needs twice the bandwidth now, but the center frequency is now near the bit rate, and we no longer have any signal at DC. Hooray, no more grumpy receiver AGC.

Microcontroller Implementation

Typically if you were generating an S/PDIF signal, it would be sane to use an FPGA or an ASIC. In silicon. But this article is about breaking the mold and doing it in pure software. Why? For fun, and maybe also to lower the barrier to entry on digital audio. There are a few challenges to overcome, though:

  1. Need to have enough CPU left over to generate the audio signal in the first place
  2. Very high bit rate for a software implementation
  3. Strict bit timing, at an unusual frequency
  4. Biphase mark encoding is not parallelizable

My platform of choice for this project was the multi-core Parallax Propeller, since it’s simple and hobbyist-friendly yet it also has features which directly address these challenges. The XMOS XCore, another parallel microcontroller, would also be a fabulous choice. It may also be possible to implement S/PDIF on a sufficiently fast single-core microcontroller. Unfortunately, an 8-bit µC like the AVR used in the popular Arduino board probably wouldn’t be fast enough.

Edit: Actually, perhaps it would be doable on the Arduino after all… you would just need a less common crystal frequency. To generate an S/PDIF signal with a 32 KHz sample rate, for example, you could run the AVR at 16.384 MHz. The encoded bitstream clock would need to be 4.096 MHz (32k * 64 * 2), and you can program the AVR’s SPI master to transmit at up to half the main oscillator frequency. So you could run the AVR at 8.192 MHz or 16.384 MHz. At the latter frequency, you would have 32 instructions for every 8 bits of encoded bitstream data. That should be enough to do the encoding in an ISR and have a little time left over for applications…

A multi-core microcontroller makes challenge (1) a piece of cake. On the Propeller, one of the eight CPU cores can be dedicated to S/PDIF encoding. The other seven are available for application code, sound streaming or synthesis, and for other I/O devices.

Challenges (2) and (3) can be mitigated if we have a little bit of help from hardware. If we were using traditional bit-banging, and toggling I/O pins in code, a very fast processor would be needed. Even if the encoding and output could be done in four instructions per bit, just the S/PDIF encoding would require a little over 24 MIPS of processing power. Annoyingly, the CPU clock would have to be run at a multiple of the audio bit rate. You would have no way to use a separate clock. But if we had some hardware to shift out bits at the right time, the CPU can spend that time doing other tasks. Many microcontrollers have an SPI port that may be able to do the job. The XCore actually has special-purpose shift register hardware just to help with high-speed I/O tasks like this. And the Propeller has something close enough— a “video generator” that can be configured as a latch and shift register. The Propeller’s video generator can be clocked by a PLL that we program to synthesize the audio bitstream clock.

Challenge (4) means we need at least a small amount of code which runs serially for every bit in the audio bitstream. The problem is similar to calculating parity. Every input bit affects all subsequent output bits. In my implementation, I just use the fastest unrolled loop I can to perform the biphase mark encoding in two instructions per bit. Here’s an excerpt from the meatiest part of the BMC implementation:

              ' Load the preamble. The preamble is not biphase encoded,
              ' but it is subject to being inverted if the previous cell
              ' was a 1. This step is omitted for the second half (second
              ' 32 cells) of a subframe.
              ' In biphase encoding, every bit unconditionally begins with
              ' one transition. We can add these transitions too, in the same
              ' operation.
              ' The masks below select all cells in the biphase register that are
              ' output after the bit we're currently encoding. Any time we
              ' XOR the biphase register with the mask, we're creating a
              ' transition on all future bits. The mask starts at the first
              ' odd numbered non-preamble bit.
              xor     biphase, preamble
              ' To actually biphase encode our input data, we'll insert
              ' additional transitions every time there's a 1 bit in our input.
              ' For the first half of the subframe, we're processing 12 bits
              ' of subframe data. (16, minus the 4-bit preamble)
              ' The loop is unrolled, since this is very speed-critical. At
              ' 48 KHz, we have less than three instructions per bit!
              rcr     subframe, #1 wc     ' Extract the next LSB from the subframe
    if_nc     xor     biphase, mask_4     ' Insert a transition only for '1' bits.
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_5
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_6
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_7
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_8
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_9
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_10
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_11
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_12
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_13
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_14
              rcr     subframe, #1 wc
    if_nc     xor     biphase, mask_15
              waitvid palette, biphase    ' Output the first half of this subframe
' <snip>
' S/PDIF preambles. These are ordered LSB-first, ready for loading into
' 'biphase' before encoding the rest of a subframe.
' These are the preamble encodings that occur after a '0' bit. After a '1'
' bit, these preambles are inverted.
' All odd-numbered unused bits must be '1', so we can insert the fixed
' transitions in the same operation.
preamble_b              long    %010101010101010101010101_00010111
preamble_m              long    %010101010101010101010101_01000111
preamble_w              long    %010101010101010101010101_00100111
' For speed, we precalculate all XOR masks.
mask_0                  long    %11111111111111111111111111111110
mask_1                  long    %11111111111111111111111111111000
mask_2                  long    %11111111111111111111111111100000
mask_3                  long    %11111111111111111111111110000000
mask_4                  long    %11111111111111111111111000000000
mask_5                  long    %11111111111111111111100000000000
mask_6                  long    %11111111111111111110000000000000
mask_7                  long    %11111111111111111000000000000000
mask_8                  long    %11111111111111100000000000000000
mask_9                  long    %11111111111110000000000000000000
mask_10                 long    %11111111111000000000000000000000
mask_11                 long    %11111111100000000000000000000000
mask_12                 long    %11111110000000000000000000000000
mask_13                 long    %11111000000000000000000000000000
mask_14                 long    %11100000000000000000000000000000
mask_15                 long    %10000000000000000000000000000000

This code really does most of the work. The waitvid instruction waits until the video generator has buffer space available for another 32-bit word, but the video generator is actually clocking out data continuously, without any gaps. Since the biphase mark encoder’s output for one subframe is 64 bits, we split the subframe into two halves and process them each as above. The first half is special, though, since the preamble is not biphase mark coded.

Edit: I should mention that in most cases it probably makes more sense to use a 4-bit or 8-bit lookup table to do the BMC encoding. This approach seemed to make sense on the Propeller. However, for example, the S/PDIF library by XMOS uses a 4-bit table to do the conversion.

Using the SpdifOut Object

To use this SpdifOut object in your own Propeller project, you’ll need another cog to supply data to the S/PDIF cog. The object can receive sound samples one long at a time from hub memory, or you can set up a FIFO buffer for transferring data in more of a bursty fashion. In fact, transferring samples one-at-a-time is really the same thing as creating a one-entry FIFO buffer.

This is a complete example which plays uncompressed audio from an SD card:

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000
  SPDIF_PIN     = 22
  SD_CARD_PIN   = 0
  sd : "fsrw"
  spdif : "spdifOut"
  BUFFER_SIZE = 128     ' Must be a power of two
  long bufA[BUFFER_SIZE]
  long bufB[BUFFER_SIZE]
PUB main | f, c
  sd.popen(string("audio.wav"), "r")
  spdif.setBuffer(@bufA, BUFFER_SIZE * 2)
    ' Wait until the driver is using bufB, then read bufA
    repeat until spdif.getCount & BUFFER_SIZE
    sd.pread(@bufA, BUFFER_SIZE * 4)
    ' Now the opposite...
    repeat while spdif.getCount & BUFFER_SIZE
    sd.pread(@bufB, BUFFER_SIZE * 4)

Since we read from the SD card in large blocks, this code uses a double-buffering scheme. While we’re reading one block from the SD card, the other block is being played by the spdifOut module. To represent these two buffers as a FIFO for spdifOut, we just place them consecutively in memory. We can tell which buffer spdifOut is currently playing by looking at the low bits of its played-sample count.

A note about the WAV header: Any modern S/PDIF receiver will actually mute the received audio for a fraction of a second, while it detects whether the bitstream is using Dolby Digital compression. Older receivers without this feature would be in danger of damaging the speakers or amplifier if anyone mistakenly sent them a compressed bitstream they couldn’t handle. Assuming your receiver has this feature, there’s nothing to worry about. If you do have a receiver which starts playing the very first sample you get, you’ll need to be much more careful about the initial conditions. For example, you won’t want to start the S/PDIF cog until the buffer has some valid data in it.

If you’re writing an audio synthesizer, instead of producing big blocks of data, you’re probably producing samples one-at-a-time. This is a very simple sawtooth-wave synthesizer written in assembly. It uses a single long as its buffer, just enough to hold one signed 16-bit sample for each of the two stereo channels. A little bit of Spin code controls the synthesizer cog’s frequency in order to play a short riff:

  _clkmode     = xtal1 + pll16x
  _xinfreq     = 5_000_000
  SPDIF_PIN    = 22
  BPM          = 120     ' Tempo
  ATTENUATION  = 4       ' Power of two
  SAMPLE_FP    = $17C6F  ' Sample rate, fixed point ($100000000 / 44100)
  spdif : "spdifOut"
  long buffer
  long countPtr
  long rate
PUB main | songPtr
  count_addr := spdif.getCountAddr
  rate_addr := @rate
  spdif.setBuffer(buffer_addr := @buffer, 1)
  cognew(@synth, 0)
    songPtr := @song
    repeat while rate := WORD[songPtr] * SAMPLE_FP
      songPtr += 2
      waitcnt(cnt + clkfreq*60/BPM)
        org 0
synth   rdlong   t1, count_addr            ' Wait for the sample count to change
        cmp      t1, spdif_count wz
  if_z  jmp      #synth
        mov      spdif_count, t1
        ' This is a sawtooth-wave synthesizer. "rate" determines the
        ' current tone frequency, and the high bits of "accumulator"
        ' are used to generate a signed 16-bit audio sample.
        rdlong   t1, rate_addr             ' Load wave rate
        add      accumulator, t1           ' Update sawtooth wave
        mov      t1, accumulator           ' Chop off low bits...
        shr      t1, #(16 + ATTENUATION)   '   and decrease the output volume some
        sub      t1, midpoint              ' Convert unsigned to signed samples
        and      t1, cFFFF                 ' Truncate to 16-bit signed
        mov      t2, t1
        shl      t1, #16                   ' Copy right channel to left
        or       t2, t1
        wrlong   t2, buffer_addr           ' Write the next sample now!
        jmp      #synth
count_addr    long  0
buffer_addr   long  0
rate_addr     long  0
spdif_count   long  0
accumulator   long  0
midpoint      long  $8000 >> ATTENUATION   ' Offset to sample midpoint
cFFFF         long  $FFFF
t1            res   1
t2            res   1
song          word  440, 330, 392, 294, 330, 392, 330, 392, 0

Future Work

The full source code is available on the Object Exchange or in my Subversion repository. I enjoyed giving one of my favorite microcontrollers a new kind of output device, and I’m looking forward to seeing what others come up with for the Propeller as well as for other microcontrollers.


A Binary Patch for Robot Odyssey

Robot Odyssey is one of the games that I have the fondest childhood memories of. It’s both a high-quality educational game, and a gentle (but very challenging) introduction to digital logic.

There’s a Wikipedia article on the game. There’s also DroidQuest which is a Java-based clone of Robot Odyssey. The DroidQuest site also contains some good info on Robot Odyssey itself, including the only walkthrough I’ve ever seen.

So, I recently got inspired to try playing through Robot Odyssey again. As a kid, I never managed to beat the game. For a long time, it was nearly impossible to run it on a modern machine. It required a 5.25″ disk drive due to the ancient copy protection, it has CGA graphics, assumes you’re using an IBM XT keyboard, and all of the timing is based on the 4.77 MHZ CPU frequency of the original XT.

Thankfully there’s DOSBox, a really high quality emulator that can run old games like this quite faithfully. I started trying to play Robot Odyssey on DOSBox, but there were still two big problems:

  • Copy Protection.Robot Odyssey checks to make sure you’re running from the original 5.25″ disks, which have a “flaky bit” on them. If the flaky bit isn’t detected, the game will still load but your soldering iron doesn’t work!
  • Inconsistent speed.DOSBox is really good at slowing down the CPU, but this isn’t an exact science. Some things that were really really slow on the original XT (like writing to the CGA card) are fairly fast on DOSBox, and other things are comparatively too slow. This means you’re constantly futzing with the speed of DOSBox’s CPU emulation, depending on what level you’re in, how many robots are on the screen, etc.

The Patch:

So, I decided to solve these problems (and a few others) by binary patching the game itself. Since there are a bunch of user-tweakable knobs, I figured the best way to distribute this patch was as a Python script which patches the original binaries. You can grab the script from:

If you’re interested in the technical details of how the patch works, the source is pretty well commented. I won’t bore you with that here. This is a list of the patcher’s features:

  • Disables copy protection.This is necessary to run the game on any modern machine, even assuming you have the original disks.
  • Installs a frame rate limiter.Instead of adjusting the CPU speed, this is a real and fairly accurate frame rate limiter. You can specify a desired frame rate on the command line when applying the patch. By default it runs at 8 FPS, which feels about right based on memory. (I don’t have an IBM XT handy for checking what the speed is supposed to be…)
  • Halts the CPU when idle.When the frame rate limiter is sleeping, it yields the CPU. This will help a lot if you’re running the game in a multitasking environment or a virtual machine.
  • Compatibility with Windows XP’s built-in DOS emulation.You need the “-p” flag for this, and the frame rate limiter won’t be as precise- but the game will be playable just by double-clicking it in Windows!
  • “Fast” mode.This is an optional feature, enabled with the “-f” flag, which speeds up the game when keyboard input is waiting. This makes it feel a lot more responsive, and makes it faster to navigate around the level. You can also hold down any repeating key as a very simple “turbo” button.
  • Keyboard compatibility patch.Normally, Robot Odyssey is totally unplayable on any computer without a numeric keypad, including laptops, due to a bug in its keyboard handler. If you enable the “-k” flag, the patcher will rewrite the game’s keyboard mapper to be fully compatible with AT keyboards. This also removes the need to play with Caps Lock on.


To use the patch, you’ll need:

  • Python
  • NASM, a spiffy assembler
  • Original Robot Odyssey binaries.Make sure the binaries you have aren’t already patched or cracked in any way. I won’t distribute these myself (so don’t ask!) but there are numerous abandonware sites on the web which should have this game. I’m not sure if multiple revisions of this game were produced. This patcher tries to be pretty lenient, but I’ve only tested it with one version. For reference, these are the SHA-1 hashes from my copy of Robot Odyssey:
    756a92e6647a105695ac61e374fd2e9edbe8d935  GAME.EXE
    692a9bb5caca7827eb933cc3e88efef4812b30c5  LAB.EXE
    360e983c090c95c99e39a7ebdb9d6649b537d75f  MENU2.EXE
    a6293df401a3d4b8b516aa6a832b9dd07f782a39  MENU.EXE
    12df28e9c3998714feaa81b99542687fc36f792f  PLAY.EXE
    bb7b45761d84ddbf0a9e561c3c3603c7f65fd36d  SETUP.EXE
    e4a1e59665595ef84fe7ff45474bcb62c382b68d  TUT.EXE
  • Something that can run DOS games with CGA graphics! This could be a PC booted into DOS, a Windows machine, DosBOX…

Before you apply the patch, make backup copies of all your game binaries:

micah@carrot:~/robot$ mkdir original
micah@carrot:~/robot$ cp *.EXE original/
micah@carrot:~/robot$ ls original/

Now apply the patch to each binary. Each section of the game (Robotropolis, the Innovation Lab, and the Tutorials) have their own separate EXE file, each of which has a separate copy of the game engine. You can use the same or different settings for each.

For example, to patch all binaries with default frame rate, and with the keyboard patch enabled:

micah@carrot:~/robot$ python robot_odyssey_patcher.py original/GAME.EXE GAME.EXE -k
Found copy protection. Disabling...
Found blitter loop. Patching...
Found keyboard mapper. Patching...
Saving comment at 0x1a4d0
micah@carrot:~/robot$ python robot_odyssey_patcher.py original/TUT.EXE TUT.EXE -k
Copy protection not found.
Found blitter loop. Patching...
Found keyboard mapper. Patching...
Saving comment at 0x11380
micah@carrot:~/robot$ python robot_odyssey_patcher.py original/LAB.EXE LAB.EXE -k
Found copy protection. Disabling...
Found blitter loop. Patching...
Found keyboard mapper. Patching...
Saving comment at 0x152a0

Now run PLAY.EXE in Windows, DOSBox, etc. You should see the game running at a steady 8 FPS, and the non-numpad arrow keys should work.

Experiment with the options! The “-h” option gives you a full list of the available setitngs. For example, if you want the game to run a bit faster, you might add “-f -r 10“. This will run the game at 10 FPS, and speed it up when there’s keyboard input. Remember to add “-p” if you’re running in the Windows DOS emulation.

This patcher may also work for games other than Robot Odyssey, which were based on the same engine. For example, Gertrude’s Secrets and Rocky’s Boots. You may have to leave off the “-k” option, since these games don’t necessarily use the same keyboard mapping as Robot Odyssey.

Enjoy exploring Robotropolis!


Real mode to protected mode inside the timer ISR

This rocks:


It’s the insane little trampoline I wrote last year in order to make real-mode BIOS calls from my toy protected mode OS, Metalkit. It’s full of all kinds of awesome and scary things.

So today, I just had occasion to try making a BIOS call from inside the timer interrupt, and it works! (Both in a VMware VM and on the physical laptop I tried it on.) Woohoo!

So now I have this spiffy little app that tests VESA BIOS palette manipulation:


Here’s a screenshot of it running in VMware and QEMU. It doesn’t work correctly in QEMU. Not sure why yet- it could just be that their VESA BIOS doesn’t support command 0x09.

Metalkit VBE Palette demo

If you want to try it yourself (on a VM or a physical machine), here’s a 4 kB precompiled binary. You can either use it as a floppy disk image or a GRUB multiboot image.

(Yes, this is a great example of the sort of dorky thing that gets me excited on a regular basis 😉


Introducing Metalkit

Metalkit is another of my random side-projects. It’s a very simple library for writing programs that run on IA32 (x86) machines on the bare metal. It isn’t an operating system, but it does contain some of the low-level pieces you might use to create one.

I created it partly for fun and for the challenge, and partly to use as a framework for low-level hardware testing at work. It is open source, released under an MIT-style license.

Features currently include:

  • A 512-byte bootloader that works either as a floppy disk MBR or a GNU Multiboot image. When you build a program with Metalkit, the same binary image can be used either as a raw floppy disk image or as a “kernel” image in GRUB. This makes it easy to use your programs on virtual machines (VMware, QEMU), emulators (Bochs), or real machines.
  • Basic PCI bus support. You can scan for PCI devices, find out what resources (I/O ports, memory, IRQs) they have, and poke at their configuration registers.
  • VGA text mode.
  • A very tiny zlib-compatible decompressor, the “puff” reference implementation of DEFLATE.
  • Low-level support for the PIT timer.
  • A small, efficient, and powerful interrupt subsystem. ISR trampolines are assembled at runtime, saving space in the binary. Any ISR can execute the equivalent of a longjmp(3) on return, making simple thread context-switching very easy. Includes basic PIC interrupt routing. Includes default fault handlers which dump CPU registers and the stack any time an unhandled fault occurs.

Metalkit could be useful for educational purposes, because programs written with Metalkit are extremely small and self-contained. This example is a complete Metalkit program which lists all devices on the PCI bus:

#include "types.h"
#include "vgatext.h"
#include "pci.h"
#include "intr.h"
    PCIScanState busScan = {};
    VGAText_WriteString("Scanning for PCI devices:\n\n");
    while (PCI_ScanBus(&busScan)) {
        VGAText_Format(" %2x:%2x.%1x  %4x:%4x\n",
        busScan.addr.bus, busScan.addr.device,
        busScan.addr.function, busScan.vendorId,
    return 0;

This example compiles to a 2962-byte image, and uses only about 1500 lines of library code. This is great for educational purposes, because it is practical to understand the purpose of every byte in that compiled image– and when this example is running, that’s the only code running on your computer.

Another example included with the source is a simple pre-emptive thread scheduler implemented in 152 lines of C. Metalkit itself doesn’t know anything about threads or multitasking, but it’s possible to use Metalkit’s interrupt trampoline as a thread context switch. This example creates two busy-looping threads. Each thread prints its name, and the “Task 2” thread also increments a counter. The example switches threads round-robin style on every timer interrupt. Here’s the tiny example running in Bochs:

If you want to play with Metalkit, all you need is an x86-compatible PC and a copy of the GNU toolchain (GCC and Binutils). You also probably want Subversion so you can check out the Metalkit source from http://svn.navi.cx/misc/trunk/metalkit/.
Also, if you’re interested in OS development or just hacking on the bare metal, the OSDev.org Wiki is an invaluable resource.