I was debugging a Bluetooth-related problem on a Windows 7 machine recently, and I found another great example of why sometimes you get what you pay for, even when buying something as nominally standardized and homogeneous as a Bluetooth adapter. It so happens that this machine was using one of these $5 adapters with the fake antenna:
That was the first thing I noticed about these adapters. The second was that each and every one of them had the same address, 11:11:11:11:11:11. Someone didn’t feel like paying for a MAC OUI.. or an EEPROM.
Anyway, back to the present… I had been noticing that sometimes when sending a file to my cell phone from a Windows 7 machine, it would time out. It didn’t happen every time, but it seemed to time out more often than not. I dig a little deeper, and it looks like the file transfer application (fsquirt.exe) is never even getting a chance to open a connection to the phone! It is timing out far before that even happens. What’s the deal?
To make sense of this, it helps to understand a bit more about how the Bluetooth stack works in Windows 7. First a big caveat: I do not work at Microsoft, nor do I have any confidential information about the inner workings of Windows. All I have, really, are informed guesses based on my observations and perhaps a little reverse engineering to satisfy my curiosity.
The main actors here are:
- fsquirt.exe — The GUI wizard which walks you through sending a file to a Bluetooth device. It lets you choose a device, then it sends the file via the standard OBEX protocol.
- bthprops.cpl — Technically this is a control panel, but for historical reasons it also implements the public API for Microsoft’s Bluetooth stack. This includes entry points for enumerating attached radios, inquiring for nearby devices, and setting up service discovery records for the local machine. Applications can link to this as a DLL.
- fdbth.dll — Microsoft’s applications, like fsquirt, actually don’t use the public APIs from bthprops.cpl to inquire for nearby Bluetooth devices. Those public APIs are synchronous: they don’t return until the Bluetooth inquiry is totally finished. It is of course much more user-friendly to display Bluetooth devices as they are discovered, and Microsoft’s apps do this by using an undocumented Function Discovery provider.
- bthserv.dll — Some operations in bthprops.cpl go directly to the bthport driver using device ioctls, but there is a system-wide service which maintains a database with the state of all known Bluetooth devices. It’s also responsible for maintaining local SDP records. The APIs in bthprops.cpl communicate with bthserv.dll via local RPC calls.
- bthport.sys — This is the kernel-mode driver which does most of the work. It receives remote name requests, inquiries, and many other operations in the form of ioctls from user-space code.
The important fact I learned while investigating this bug: Inside bthport.sys, there is a global FIFO queue of actions which contact remote devices. If anyone asks to read a device’s name, to perform an inquiry, or to establish an ACL connection with a remote device, it goes in this queue. From Microsoft’s perspective this probably simplifies handling unreliable Bluetooth radios quite a lot, since we never rely on the radio to handle more than one of these operations at a time. Additionally, Inquiry operations are rather heavy-weight, and depending on your link manager settings, you may not be able to do anything else while they’re in progress.
So, back to the hang. When I traced the communications between the USB dongle and the Windows 7 Bluetooth stack, I found this:
The setting: fsquirt has just finished the wizard pane where we’re looking for nearby devices. It notifies fdbth that it no longer needs to keep looking. But in its haste to prettify the UI as much as possible, it has already started asking for human-readable names for all of the Bluetooth devices within earshot. Those requests go into bthport’s queue. The fsquirt app wants to establish an ACL connection to my phone already, but it’s waiting in line behind one of these name requests.
In case you don’t have the Bluetooth spec memorized (I sure don’t), here’s a play-by-play of the USB log:
- OUT 05 04… — The PC asks the Bluetooth dongle to get a human-readable device name for one of the nearby devices.
- IN 0F… — Command is in-progress.
- (Over 20 seconds elapse)
- IN 03 0B 08… — Connection request finished, error code 08 (Timed out)
But wait! We never asked the device to create a connection! We asked it to read the other device’s name! This looks like a pretty grievous firmware bug in this decrepit little dongle. As we see below, Windows is pretty confused too. My “waiting” and “timed out” notes indicate the state of the fsquirt wizard. The timestamps aren’t saved in this log, but fsquirt is only willing to wait 10 seconds or so before giving up. It’s waiting on bthport, and bthport is waiting on any response, even a timeout, from the Bluetooth adapter.
And it keeps waiting, for 70 seconds. Why so long? It should never take that long for a Bluetooth device that’s actually present to respond. Well, actually it isn’t waiting on the remote device. It’s waiting on the radio itself. The radio is programmed to time out after a much shorter interval specified by Windows, in this case about 20 seconds. Windows is relying on the radio to implement this timeout. What we’re seeing here is an even longer failsafe timeout, to catch errors with the local radio. Like the one we just experienced.
When Windows finally gives up on waiting for our buggy radio to respond correctly to the remote name timeout, we can see it issue the next command. This time it’s the Create Connection command that fsquirt had been waiting on. But fsquirt is already long gone, and the user is already frustrated.
If we’re lucky, the user throws the buggy Bluetooth dongle out the window, not their PC.