Surely I’m not the only one?

Wait a minute, Doc. Are you telling me that you built a time machine.. out of a Saratoga?

The way I see it, if you’re gonna build a time machine into a plane, why not do it with some style?

Uh, does it run on regular unleaded gasoline?

Unfortunately no, it requires something with a little more kick- plutonium.

Random Update

Well, it’s been a while since I’ve updated. Nothing on its own recently has inspired me much to write, but I have some smallish things to mention.

I have a new laptop on the way! My 700 MHz Pentium III with 192MB of RAM was just getting too clunky for day-to-day use. I recently ordered a Thinkpad T43 with 2GHz CPU, 1GB RAM, and SXGA+ 14.1″ display. It shipped Monday, and I can’t wait to start using a Real Computer.

At work, things were stagnating a bit for the past few days. I have three fairly large changes all blocking on code reviews, and no really solid projects in progress. Well, today that changed- I got some new hardware, and began on a really exciting Top Sekret project. If it goes well, you’ll find out eventually.

Life? Not bad at all… I had a really enjoyable three-day weekend. David, Jen, and Kendra are visiting during their spring break, in less than a month. Plans for that week-long adventure are starting to gel, and I can’t wait.

[Update: The Top Sekret project in this post would eventually become VMware Fusion 1.0. I was working on USB at the time, and I did the original port of VMware’s USB stack to Mac OS.]

CIA Updates, Cappuccino in The City

Database v7

Well, it’s been a busy past couple days, but I have some great news to report with regard to CIA. The new database backend I’ve been hacking at for the past couple weeks is now in place, and the performance is great.

So, to back up a bit… CIA uses MySQL. It has for almost 2 years now. There are many things CIA does that SQL works great for: storing the smaller bits of metadata on each project/author, tracking the association graphs, and quickly pulling up overview statistics for the site’s front page. SQL does not, however, work well for storing the actual commit messages. I’ll save you the boring details of why, but suffice it to say that this has been bogging CIA down in nearly every way possible. The stats_messages table was eating about 2.5 GB of disk space, the indices were causing mysqld to use most of my RAM, it was using unacceptable amounts of my server’s limited I/O bandwidth, and there was no efficient way to clean out old messages. CIA needed a new storage option.

The new v7 database schema CIA is on now has completely abandoned the stats_messages table. I developed a custom file format for the messages. Each stats target has a file on disk with two ring buffers. A fixed size (currently 4 kB) index buffer provides IDs for recent messages. These IDs are actually offsets into a second larger ring buffer. This second buffer starts out empty, but can grow up to a fixed maximum, currently 256 kB. At the seek offset corresponding to each valid message ID, the content buffer contains a frame marker that lets it detect invalid or expired IDs. The messages themselves, rather than being stored as normal XML, are encoded as a binary representation of SAX events. This makes them quick to load, and lets the format maintain commonly used strings in a dictionary. This provides about the same compression ratio as gzip’ed XML, but it doesn’t have the backup problems that an already-compressed database would present.
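To make the two-buffer idea concrete, here is a minimal in-memory sketch. It is not CIA's actual file format: the names `MessageRing`, `MAGIC`, and `HEADER` are made up for illustration, the content buffer is a fixed size rather than growing, the payloads are opaque bytes rather than encoded SAX events, and the frame marker stores the message's own ID so a stale ID can be recognized. IDs here are running byte offsets that wrap onto the buffer with a modulo.

```python
import struct

# Hypothetical frame marker: magic bytes, the frame's own ID, payload length.
MAGIC = b"\xc1\xa7"
HEADER = struct.Struct("<2sQI")


class MessageRing:
    """Toy version of an index ring buffer pointing into a content ring buffer."""

    def __init__(self, index_slots=8, content_size=256):
        self.index = [None] * index_slots   # ring of recent message IDs
        self.count = 0                      # total messages ever appended
        self.content = bytearray(content_size)
        self.written = 0                    # total bytes ever written

    def _put(self, pos, data):
        size = len(self.content)
        for i, byte in enumerate(data):
            self.content[(pos + i) % size] = byte

    def _get(self, pos, n):
        size = len(self.content)
        return bytes(self.content[(pos + i) % size] for i in range(n))

    def append(self, payload):
        """Store a message and return its ID (a running offset)."""
        msg_id = self.written
        frame = HEADER.pack(MAGIC, msg_id, len(payload)) + payload
        if len(frame) > len(self.content):
            raise ValueError("message too large for the content buffer")
        self._put(msg_id, frame)
        self.written += len(frame)
        self.index[self.count % len(self.index)] = msg_id
        self.count += 1
        return msg_id

    def read(self, msg_id):
        """Return the payload, or None if the ID is invalid or expired."""
        magic, stored_id, length = HEADER.unpack(self._get(msg_id, HEADER.size))
        if magic != MAGIC or stored_id != msg_id:
            return None  # the frame was overwritten, or never started here
        if length > len(self.content) - HEADER.size:
            return None
        return self._get(msg_id + HEADER.size, length)

    def recent(self):
        """Payloads of still-valid messages, newest first."""
        slots = len(self.index)
        found = []
        for back in range(1, min(self.count, slots) + 1):
            payload = self.read(self.index[(self.count - back) % slots])
            if payload is not None:
                found.append(payload)
        return found
```

Because writes always move forward, a newer message that wraps around clobbers an older frame's marker before it reaches the older payload, so checking the marker is enough to reject an expired ID (barring an unlikely coincidence in the overwritten bytes).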

I spent Saturday night (in a hack session with Christian and Paul) finalizing the database backend itself and writing a converter tool. Sunday, the converter ran.. and it ran.. and it eventually finished Monday morning after converting over 1.5 million commit messages! After doing a dump/load to shrink the InnoDB tablespace, I had successfully replaced a 2.5 GB impossibly slow SQL table with a zippy collection of small ringbuffer files, totalling under 500 MB. CPU usage with the new backend is about the same, but I/O load is much lower.

Google AdSense

If you look way at the bottom-left of any CIA page, you’ll now see a familiar block of text advertisements. Consider this a trial run. CIA’s web layout is currently such that there really isn’t a great place for advertisements, so they’re really kind of in the middle of nowhere at the moment.

Yes, this marks CIA’s descent into the depths of capitalistic hell, I’m sure. The reality is that CIA costs a lot to run. I’ve been paying $40/month for its current hosting. We’ve had a lot of great donations, which I am really grateful for. Now that I’m out of college and have a real job, I can pay for the server without begging for $10 here and $5 there.

Still, the ads are an experiment. CIA receives a huge amount of web traffic. Granted, most of this is generated by RSS aggregators and web spiders, but even the comparatively small human-generated slice of the pie is pretty big. If it’s possible to support CIA’s hosting just by adding some unobtrusive text ads, I think this would be beneficial for everybody involved.

Ideally, once I have time to tweak CIA’s layout, there will be a better place for the ads. If they do well enough, I might even be able to get a real dedicated server rather than this Linode box. At the rate CIA is growing, I’m going to need one sooner or later.

I want to emphasize that I will only be placing text ads on the site, and only where they won’t interfere with the display of the site’s real content. Site usability is still my #1 concern, but I need to plan for a future that includes steadily rising server requirements.

Future Work

So, what’s next on the plate for CIA?

Well, database v7 is actually just a small piece of a larger database overhaul I have planned for CIA. I’d like to eventually get rid of MySQL. As nice as MySQL is, I’m really not benefitting from the use of a client/server database. My utopian database v8 format would look something like this:

  • Messages themselves stored as an indexed ringbuffer of SAX events, same as v7
  • Smaller metadata items stored either in SQLite or a very simple special-purpose format
  • Larger metadata items (like images) stored in individual files
  • Counters replaced with fidtool. Fast Interval Databases (FIDs) are another backend format I’ve been working on. They store event timestamps, and efficiently perform interval queries against them. This would let me finally replace the “today”, “yesterday”, “this week”, and such with sliding intervals. For example, with fidtool CIA could keep a running total of commits received in the last 24 hours. I also have some cool plans for activity graphs with Google-Maps-like AJAX scrolling and zooming.
  • Catalog data cached in SQLite. This should eliminate the uber-queries that are currently necessary to index all the children of a particular stats target. This should finally let me re-enable the project and author index pages.
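As a rough illustration of the kind of interval query described for the counters above, here is a small sketch that keeps timestamps sorted in memory and answers "how many events in this window" with two binary searches. This is not fidtool's on-disk format or API; the `IntervalCounter` class and its method names are invented for the example.

```python
import bisect


class IntervalCounter:
    """Sorted list of event timestamps with fast interval counts."""

    def __init__(self):
        self.stamps = []

    def add(self, timestamp):
        # Keep the list sorted so queries can use binary search.
        bisect.insort(self.stamps, timestamp)

    def count(self, start, end):
        """Number of events with start <= timestamp < end."""
        lo = bisect.bisect_left(self.stamps, start)
        hi = bisect.bisect_left(self.stamps, end)
        return hi - lo


DAY = 24 * 60 * 60

commits = IntervalCounter()
for stamp in (0, 100, 90_000, 100_000):
    commits.add(stamp)

now = 100_001
last_24_hours = commits.count(now - DAY, now)
```

A sliding "last 24 hours" total is then just a query whose window moves with the clock, rather than a counter that has to be reset at midnight.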

Well, except for a few new capabilities database v8 will offer, it’s mostly optimization work. Since database v7 seems to have the server running smoothly again, there are a lot of other projects I’d like to poke at:

  • A web frontend for maintaining rulesets and requesting capabilities (finally!)
  • Replacing metadata keys, which exist only for historical reasons really, with a traditional username/password login
  • Further splitting the monolithic server process. I’d like to eventually be able to run the web frontend and XML-RPC server as separate processes, and even to run them directly from Apache using mod_python. This should improve server response times, and potentially make maintenance easier.
  • Web site layout tweaks. CIA has grown up a lot since the current layout was designed, and it would be nice to revisit it to improve usability. Particularly, I’d like to make it scale better to large numbers of projects, authors, and stored messages.
  • I’ve toyed with the idea of replacing CIA’s little documentation system with an external wiki. As it is, the doc browser is really a lot like a read-only wiki in the way it formats documents. With a real wiki, users could more easily contribute their own client scripts, tips and tricks, suggestions, complaints, and documentation.
  • On a similar note, would it be possible to ditch the majority of CIA’s web frontend and somehow embed it into a wiki or some other content framework?
  • Web 2.0 tags, ‘cuz they’re trendy. To really make CIA web-2.0-compliant, I need a page that lists all projects where font size is proportional to activity.

I’m sure I’ve forgotten a few things.. I’ve had a lot of crazy ideas swimming around in my head recently. If I’m not careful, a few of them might get implemented.

Weekend

Thursday night, David flew into town for a job interview with VMware. We don’t know yet whether he got the job, but we’re quite hopeful. Either way, we managed to finish off his day-long interview with the traditional video games and dinner. More video games at my apartment later that night, then Saturday morning we wandered San Francisco a bit with Scott and Carrie. We tried a couple coffee shops in North Beach, and I ended up with some muchly yummy cappuccino and biscotti.

His flight back home from SJC was at 5:00 Saturday, so we didn’t have a huge amount of time to spend here. Still, it was a lot more exploration than I got to do when I flew in for my interview earlier this year. It was goodbye for only a week or so. Wednesday I’m coming back to Colorado for Thanksgiving, and I plan on spending the weekend in Boulder. Yay!

Fighting entropy

I’m still continuing the gradual process of acquiring furniture and dispersing any large piles of junk from my living room. I finally got to pick up my coffee table yesterday, which I ordered about two weeks ago. I also found a cabinet to hold my server and A/V equipment. It’s garage-grade furniture, but still fairly nice. It’s not something I mind taking a hole saw to so I can run cables 😉

So, I obviously still need a TV stand. I’m waiting until I’m a little less disgusted with their price and their suboptimal dimensions. Don’t let the pictures fool you too much, there’s still a good-sized junk pile behind the camera.

Let the pixels flow

My posts have been pretty light on the photos recently- this is mostly just because I haven’t been taking a whole lot of pictures, but even when I had plenty of pictures they were safely quarantined behind my gallery rather than flowing freely over my blog and all who aggregate it.

I admit, this photo has been slightly retouched. It was necessary to carefully adjust the gamma and color levels on this image to accurately capture reality, as my poor camera is just not capable of registering the sheer depth of this house’s color. I would not be surprised in the least if this house inspired the entire visual style of Edward Scissorhands.

I crossed paths with an older man just after taking this picture. He mentioned to me that the couple responsible for building this lovely little monstrosity died within just a few months of each other, and the house is on sale for a mere $1.3M or so. Of course, nobody’s buying it because it sits just north of the intersection of Sunnyvale Ave. and El Camino Real- not a particularly quiet road.

In other news, I’m at the Intel Developer Forum today, yesterday, and tomorrow. VMware had some extra tickets, and apparently my job isn’t time-critical enough that anyone thought twice about sending me. I apologize for taking almost no pictures of this event. There’s been a lot of cool hardware and a lot of slides with timing diagrams and artists’ conceptions of CPU architectures. Compared to SIGGRAPH, it’s quite visually dull. My only photo to report at the moment is of the expo floor, during their geekiness-contest that mostly involved a time trial at assembling and configuring various computers using Intel ™ Technologies ™.

The highlight of my experience yesterday was the pair of sessions on Wireless USB. I’ve heard mention of the WUSB standard for a while now, but this was the first time I’ve heard details on the architecture. It’s great seeing them preserve nearly all software compatibility with USB, while changing some fundamental aspects of the protocol to keep radio link utilization high. Best of all, there was a whole section on WUSB on the expo floor featuring a lot of very preliminary but functional hardware. I can’t wait for my $50 wireless EZUSB development kit in a few years.

I spent most of my time today in virtualization sessions. For those of you who haven’t been following their latest hype, Intel’s next generation of processors includes an extension called VT-x which basically handles virtualization of the CPU itself in hardware. This means that instead of doing lots of very complicated dynamic binary translation, like VMware does traditionally, the hardware is engineered to place the virtual machine monitor software in a completely new privilege level and fairly quickly swap out the entire CPU state when switching VMs.

It’s neat stuff, no doubt, but it’s only a small piece of the pie. It’s just a little annoying to see nearly all of Intel’s demo machines running VMware, but very little mention of the nontrivial tasks the VMM still has to perform using software like ours. Even with hardware support for CPU virtualization, the VMM needs to virtualize memory and some if not all I/O devices. Intel has plans for virtualization beyond just the CPU itself, but it will be a long long time before that comes close to the level of support VMware already gives in software.

Ah well. I’m curious what the virtual machine monitor folks at VMware have been doing with all this. I’d like to say I have a lot of inside information that I can’t share with you, but I really don’t know much at this point that didn’t become public at IDF today. I’ve been over in I/O-emulation-land where little of this is going to matter for at least a couple years.

I’ll procrastinate after I pick your browser up off the floor

I’ve been making slow progress on packing today- got all my books boxed up, along with many of my less fragile electro-widgets and such. This type of behaviour leads to procrastination, naturally.

I’ve been running the Deer Park Alpha 2 release of Firefox for a couple days. It does seem to be faster at DHTML, though I don’t have any of the really heavy-duty JavaScript I wrote for Destiny M&M handy for benchmarking purposes. The coolest features destined to end up in Firefox 1.1, from my point of view, are SVG support and the “canvas” element.

Canvas is a very misunderstood HTML extension. It’s a new element that Apple invented mostly to make it easier to implement Dashboard. That part of the story is a little silly, and results in a lot of SVG advocacy and a lot of potential users suggesting to Apple places where they might shove their nonstandard hacks.

Well, it turns out that Canvas is indeed a standard- or at least a standard draft. Furthermore, it’s been implemented by the Gecko team, and happily runs in Deer Park. If you read the API, you notice that Canvas and SVG are really solutions to two completely different problems. SVG is a very complicated and very featureful scene graph built on XML, whereas Canvas looks more like a minimal vector drawing and raster compositing library for JavaScript. Canvas uses a simple immediate-mode interface for rendering to a bitmap, which makes it ideal for games or special effects, or for client-side image manipulation applications.

Canvas is so cool I had to abuse it. A while back I tried to render the Peter de Jong map in JavaScript, basically making a very slow and crippled but very portable version of Fyre. Anything scene-graph-like, such as your usual DHTML tactics, would be disgustingly slow and memory-intensive. I ended up using Pnglets, a JavaScript library that encodes PNG images entirely client-side. This worked, but was only slightly less disgustingly slow.

Anyway, the result of porting this little demo to Canvas was pretty neat. It’s still no speed demon, but it’s very impressive compared to the Pnglets version. It’s fast enough to be somewhat interactive, and it has at least basic compositing of the variety that Fyre had before we discovered histogram rendering. If you have Deer Park, Firefox 1.1, or a recent version of Safari you should be able to run the demo yourself.
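For anyone curious what the demo actually computes, here is a small sketch of the Peter de Jong map with hit counts accumulated per pixel. It is written in Python rather than the demo's JavaScript, and it is not the demo's or Fyre's code: the function name, the grid size, and the parameter values are illustrative assumptions.

```python
import math


def de_jong_histogram(a, b, c, d, iterations=100_000, size=64):
    """Iterate the Peter de Jong map and count hits in a size x size grid."""
    hist = [[0] * size for _ in range(size)]
    x = y = 0.0
    for _ in range(iterations):
        # The map: x' = sin(a*y) - cos(b*x), y' = sin(c*x) - cos(d*y)
        x, y = (math.sin(a * y) - math.cos(b * x),
                math.sin(c * x) - math.cos(d * y))
        # Both coordinates stay within [-2, 2]; scale them onto the grid.
        col = min(size - 1, int((x + 2.0) / 4.0 * size))
        row = min(size - 1, int((y + 2.0) / 4.0 * size))
        hist[row][col] += 1
    return hist


# Arbitrary example parameters, chosen only for illustration.
hist = de_jong_histogram(1.4, -2.3, 2.4, -2.1, iterations=1000, size=16)
```

Turning the counts into an image is then a matter of mapping each cell's count to a brightness; an immediate-mode API like Canvas can instead just plot each point as it is computed.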