(Throughout Advent I’m sharing some hints as to how web developers can make my life as a speech recognition user easier.)
It’s probably fairly obvious how dictating text into, say, a mail client works, and even how voice commands to invoke menu items and keystrokes fit together. Let’s look at how voice control of the pointer works in Dragon. There are two different modes of operation, although they can be mingled, plus commands for clicking buttons, and for pressing and releasing them, which is what enables dragging.
The first is the Mouse Grid: start by dividing the screen into a 3x3 grid, let the user select a grid square then centre the pointer in that square and subdivide into another 3x3 grid. Repeat until the user has moved the pointer to where they need it. You do this by saying things like “Mouse Grid / 1 / 3 / 7 / Close Mouse Grid”. (If you have multiple monitors, the first digit you say selects the monitor.)
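The subdivision described above can be sketched in a few lines. This is only an illustration of the idea, not Dragon's implementation: the names `GridRect`, `narrowGrid` and `cellCentre` are made up, cells are assumed to be numbered left to right and top to bottom (1 is top-left, 9 is bottom-right), and the multi-monitor digit is not modelled.

```typescript
// A rectangle in screen pixels (hypothetical type for this sketch).
interface GridRect { x: number; y: number; width: number; height: number; }

// Narrow `area` by a sequence of spoken digits, one 3x3 subdivision per digit.
// Assumption: cells are numbered left to right, top to bottom.
function narrowGrid(area: GridRect, digits: number[]): GridRect {
  let cell = area;
  for (const d of digits) {
    if (d !== Math.floor(d) || d < 1 || d > 9) {
      throw new RangeError("grid digit must be an integer from 1 to 9");
    }
    const col = (d - 1) % 3;             // 0, 1 or 2 from the left
    const row = Math.floor((d - 1) / 3); // 0, 1 or 2 from the top
    const width = cell.width / 3;
    const height = cell.height / 3;
    cell = { x: cell.x + col * width, y: cell.y + row * height, width, height };
  }
  return cell;
}

// The pointer is placed at the centre of the selected cell.
function cellCentre(r: GridRect): { x: number; y: number } {
  return { x: r.x + r.width / 2, y: r.y + r.height / 2 };
}

// "Mouse Grid / 1 / 3 / 7" on an illustrative 2700 x 2700 pixel area.
const exampleCell = narrowGrid({ x: 0, y: 0, width: 2700, height: 2700 }, [1, 3, 7]);
const examplePointer = cellCentre(exampleCell); // { x: 650, y: 250 }
```

Each digit divides the cell width by three, so on a 1920-pixel-wide screen three digits leave a cell roughly 71 pixels wide (1920 / 27). A target narrower than the current cell may need another digit to land on.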
The other is relative movement, by saying things like “Move Mouse Up And Left 3 Centimetres”. (You can also use inches, and “points”. None of them is a particularly obvious unit when staring at the screen; I tend to use centimetres.) If you aren’t sure about the distance you can go a bit more interactive, with “Move Mouse Up And Left”, watching it move, maybe saying “faster” and “slower” to control the speed, and finally “stop” when you’ve reached your destination.
My rule of thumb is to use relative for short or repeated movements (because going back and forth by a fixed amount is easy once you know what that distance is), and Mouse Grid for everything else.
Okay, fine. It’s a little cumbersome, but it works. What can we find to make it worse, so you can not do that and generally stop bugging me?
Remember Fitts’ Law: wider, closer things are faster to acquire. Closer doesn’t really apply here (except that longer distances are harder to guess, so relative movement is more likely to miss). Wider gives a margin for error, which applies with both relative and Mouse Grid movements, suggesting Fitts’ Law will apply here. I haven’t actually checked for research confirming that, but anecdotally it feels right based on my experience.
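For reference, one common way of writing Fitts’ Law is the Shannon formulation below. The symbols are the conventional ones rather than anything defined in this post: MT is the time to acquire the target, D is the distance to it, W is its width along the direction of travel, and a and b are constants fitted for a particular device and user.

```latex
MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)
```

Making W larger or D smaller shrinks the logarithm, which is the “wider, closer” part.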
So how can you confound Fitts’ Law, beyond making your buttons tiny? By masking how tiny your buttons are. Here are two links. Which is bigger?
The left one is div > a with padding on the div. The right one is div > a with padding on the a. They look the same. They aren’t. Again, with an outline on the anchors:
(Apologies for the shoddy CSS; I didn’t have the patience to think it through any better.)
This affects everyone, but mouse users are close to where they need to be and can just slide around to find the hitzone. This kind of “looking with the mouse”, watching for the cursor to change or similar, isn’t practical for voice users; speculatively moving a bit and trying again (“Move Mouse Right / Stop / Mouse Click”) is much more expensive. Trying again with Mouse Grid is even worse.
Simple rule: hitzones should fill their visible area.
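To put rough numbers on the difference between the two links, here is a small sketch. Everything in it is an assumption for illustration: the `Size` type, the function names, the 60 x 16 pixel text box and the 12 pixels of padding are invented, and it assumes the anchor is the only clickable element and that its clickable box is its text plus its own padding.

```typescript
// Sizes in CSS pixels. All numbers here are made up for illustration.
interface Size { width: number; height: number; }

// Padding on the anchor: the padding is part of the anchor's box,
// so the whole visible box responds to a click.
function hitAreaPaddingOnAnchor(text: Size, padding: number): number {
  return (text.width + 2 * padding) * (text.height + 2 * padding);
}

// Padding on the wrapping div: only the anchor's text box is clickable,
// and the padding around it is dead space.
function hitAreaPaddingOnDiv(text: Size): number {
  return text.width * text.height;
}

const linkText: Size = { width: 60, height: 16 };
const paddedAnchorArea = hitAreaPaddingOnAnchor(linkText, 12); // 84 * 40 = 3360
const paddedDivArea = hitAreaPaddingOnDiv(linkText);           // 60 * 16 = 960
```

With these made-up sizes, the link padded on the div is clickable over less than a third of the area it appears to cover.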
The formula at the top is for the Accot-Zhai steering law, a path-following generalisation of Fitts’ Law which should model using the Mouse Grid well.
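For reference, the steering law is usually written in the form below. The symbols are the conventional ones, not taken from this post: T is the time to travel along the path C, W(s) is the width of the path at position s, and a and b are fitted constants.

```latex
T = a + b \int_{C} \frac{ds}{W(s)}
```

The narrower the path at any point along it, the more that stretch of the path adds to the total time.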