(Throughout Advent I’m sharing some hints as to how web developers can make my life as a speech recognition user easier.)
I wrote yesterday about how moving the pointer by voice works; it’s a bit cumbersome, so forcing voice users to go that route isn’t great. Here’s another example.
Modal windows are an old desktop interface method requiring a specific interaction (such as confirming an operation) before continuing with normal application use. More generally, user interface modes are different states of the interface in which it responds differently to the same input (mouse clicks, keystrokes or whatever). Web apps are increasingly using modal techniques; done poorly, they are a pain for voice users.
Two examples. The first: Twitter. If you click on a Twitter username, you get a modal sheet about that user. Clicking outside it, or on the close button, dismisses it. So does pressing escape. Twitter also has a “new Tweet” button at the top right that opens an almost identical sheet, except that there escape does nothing. By voice, pressing a key takes about a second, which is far quicker than steering the pointer to a close button (see the problems with controlling the pointer discussed yesterday).
Second example: Google Reader. This has pretty good keyboard shortcuts, so I spend most of my time in it navigating by macros, saying things like “next news story”, “view original” and so on. The trouble with voice recognition is that it’s imperfect, so occasionally it will hear one thing and recognize it as something else, which ends up as dictation, i.e. basically just random keystrokes. (I could shift out into something called Command Mode, which won’t generate normal dictation; however, I usually only think about that once something has gone wrong.)
Three keystrokes in particular do big things in Reader: “u” and “f” change the screen layout (and can be repeated to switch back), and “e” starts the “email this story” feature, from which there is no return (unless you tab to the cancel button, or move the pointer to click on it). The escape key apparently does nothing while this feature is running; having it cancel the email would make things much easier. (Also, the email interface is inline, rather than floating over the rest of Reader as Twitter’s sheets do, making it less obvious that the feature is modal. That only confused me the first time, however.)
Simple rule: for interface modes, provide keyboard shortcuts.
Simpler rule: support the escape key.
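To make that concrete, here is a minimal sketch in TypeScript of one way a page could honour escape for every mode it opens. It is not how Twitter or Reader actually work; `ModeStack`, `Mode` and `handleKey` are names I made up for illustration. The idea is simply that anything modal registers a way to close itself, and escape always closes whichever mode was opened last.

```typescript
// Hypothetical names throughout: Mode, ModeStack and handleKey are
// illustrative only, not part of any real library.
interface Mode {
  name: string;
  close: () => void; // whatever dismisses this sheet, dialog or inline form
}

class ModeStack {
  private modes: Mode[] = [];

  // Call this whenever the interface enters a mode.
  open(mode: Mode): void {
    this.modes.push(mode);
  }

  // Returns true if the key was used to close a mode, false otherwise.
  handleKey(key: string): boolean {
    if (key !== "Escape") return false;
    const top = this.modes.pop();
    if (top === undefined) return false; // nothing open, nothing to do
    top.close();
    return true;
  }

  get depth(): number {
    return this.modes.length;
  }
}

// In a browser, the wiring might look roughly like this (assumption):
// const modes = new ModeStack();
// document.addEventListener("keydown", (e) => {
//   if (modes.handleKey(e.key)) e.preventDefault();
// });
```

Keeping this in one place means a new modal feature gets escape support just by registering itself, rather than each feature having to remember to listen for the key.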
Note this isn’t just for web apps; sites like Mother Jones have taken to using a floating layer to ask for donations. While it isn’t strictly modal (the site behind scrolls and links still work), since it obscures part of the content it’s effectively the same issue. Escape should close these kinds of things as well (or they should disappear automatically after a while – as advertising often does).
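For floating layers like these, both options can be combined. The sketch below is again only an illustration under my own assumptions: `makeDismissibleOverlay` is a made-up helper, the caller supplies the `hide` function and the timeout, and the scheduler is passed in so that a page can hand it the browser's `setTimeout`.

```typescript
// Hypothetical helper: the name, parameters and shape are assumptions
// for illustration, not taken from any real site or library.
type Scheduler = (fn: () => void, ms: number) => unknown;

function makeDismissibleOverlay(
  hide: () => void,    // whatever removes the floating layer from view
  timeoutMs: number,   // how long to wait before hiding it automatically
  schedule: Scheduler, // e.g. (fn, ms) => setTimeout(fn, ms) in a browser
) {
  let dismissed = false;

  const dismiss = (): void => {
    if (dismissed) return; // escape followed by the timer must not hide twice
    dismissed = true;
    hide();
  };

  // Disappear automatically after a while...
  schedule(dismiss, timeoutMs);

  return {
    // ...or straight away when escape is pressed.
    onKey(key: string): void {
      if (key === "Escape") dismiss();
    },
    isDismissed: (): boolean => dismissed,
  };
}

// Browser wiring might look roughly like this (assumption):
// const overlay = makeDismissibleOverlay(
//   () => layer.remove(), 15000, (fn, ms) => setTimeout(fn, ms));
// document.addEventListener("keydown", (e) => overlay.onKey(e.key));
```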