Recent diary entries

Accessibility Advent: show jump links on focus

(Throughout Advent I’m sharing some hints as to how web developers can make my life as a speech recognition user easier.)

Lots of sites provide a “jump to content” link designed for screen readers and the like; there often may be other jump links, to navigation, user tools and so on. Most people will then hide the links visually, typically by positioning them off the viewport, say using { position: absolute; left: -9999px; }.

A couple of sites go further, and show the links again when they are focused. Here that is on Mother Jones; hit tab a couple of times and watch the top right corner.

Here’s why I like it: although you can see the focused anchor target in the status bar in most browsers, it may not be visible depending on configuration, and more importantly you can style it to fit in with your site. Some people won’t look at the status bar at all, and most people are going to be more attentive to your content – within the viewport – than to the browser’s chrome.

The approach Mother Jones uses can be implemented in CSS, but we can go further with a tiny bit of JavaScript and show the entire set of jump links when any one is focused, using a single line of jQuery or a few more lines of plain JavaScript. Then you can do more sophisticated styling, including pushing the entire site content down, which sounds like it goes against yesterday’s advice but which I’d be okay with because tabbing is a keyboard operation, so you’re not going to confuse someone in the middle of a mouse operation by doing this. It’s also obvious, unmissable – and that’s helpful.
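Here is a minimal sketch of the “show every jump link when any one is focused” idea in plain JavaScript. The container id `jump-links` and the class name `is-visible` are assumptions for illustration; the stylesheet would need a rule that brings the container back into the viewport when that class is present.

```javascript
// Reveal the whole block of jump links while any link inside it has focus.
// `container` is the element wrapping the links; `visibleClass` is a
// hypothetical class name that your CSS uses to undo the off-screen hiding.
function revealJumpLinksOnFocus(container, visibleClass = "is-visible") {
  // focusin/focusout bubble (focus/blur do not), so one pair of listeners
  // on the container covers every link inside it.
  container.addEventListener("focusin", () => {
    container.classList.add(visibleClass);
  });
  container.addEventListener("focusout", (event) => {
    // relatedTarget is the element receiving focus next, or null.
    // Keep the links visible while focus stays inside the container.
    if (!container.contains(event.relatedTarget)) {
      container.classList.remove(visibleClass);
    }
  });
}

// In a browser (the id is an assumption):
// revealJumpLinksOnFocus(document.getElementById("jump-links"));
```

Because the class is only toggled by focus moving in or out, nothing changes for someone who never tabs through the page.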

Why does this matter? I tend to use Glee Box to navigate links by voice, but sometimes it doesn’t work, or there are too many links with the same anchor text, or I want to read a long article opening links as I go. Making it easy to skip focus past your header and navigation is just as useful for me as for the other potential audiences of those jump links.

Dragon Naturally Speaking on Windows allows you to speak voice links natively, so you don’t require a plug-in like Glee Box; however the same issues can arise, particularly many links on a page (Dragon will only respond to a certain number, to avoid taking too long and getting in the user’s way) and the desire to open links while reading long articles.

Accessibility Advent: don't jiggle the layout

Another problem caused principally by the relative difficulties of driving the pointer by voice compared to with a mouse or tablet: please don’t let elements of the layout move.

It’s surprising how many reasons people think they have for moving elements around the page. Advertising banner stripes that push down the entire page, but are not of the same height from page to page; emphasizing buttons on rollover in a way that changes the size and so forces the entire navigation bar to lay out differently. Things that slide out on rollover, and then require you to move away before they disappear, to get to something alongside the original trigger area.

These are really two different problems. The first is when, once a page is loaded and rendered, moving the pointer around on that page will move things around. This may be due to poor implementation (if you apply a bold style to text buttons on rollover, for instance, you need to do something to stop them from changing size), but increasingly people are deliberately opening menus and utility panels on rollover. If you have a series of them, you need to be very careful that panels you open don’t conceal the hit zones to open other panels – or, maybe, open them only on click. Even worse, there are sites that slide down a notification bar – “you have received the using this site for 10 minutes badge!” – right over the top of the navigation bar.

In both cases, a mouse user can backtrack the pointer quite easily to let the pop-up close; if a Dragon user is in the middle of using MouseGrid, at best they can use relative mouse movement to duck the pointer out of the way and back again, without closing the MouseGrid. However, you have also disrupted the process of using MouseGrid, which is to iteratively move the pointer closer to the thing you want to click on – if the visible context you are using to guide this process changes underneath you, it’s easy to lose track of what’s going on.

The other problem is when moving through a series of pages on a single site, having set the mouse pointer at the right place to advance by one page (such as with a piece of longform journalism split across several pages). Everything is fine here unless the vertical position of whatever we have to click on changes page by page. This can either happen if story navigation is only beneath the text (since the text is unlikely to fill exactly the same vertical height on each page), or – as I’ve seen a lot recently – if there is unrelated content above the story navigation somewhere, such as a large image banner for other content on the site, and that is not a standard height. (Say what you like about Internet advertising, it does at least have standard sizes.) There is also the possibility of elements moving horizontally, although this is rarer – the only example I can think of is where I want to hit “next” in the pagination controls, but they are centered or left aligned and do not contain every page (so they may show 1-2-3-4-5-Next on the first page, but 2-3-4-5-6-Next on the second page; with proportional fonts, the “next” link will move around).

It is, however, generally okay to move things around when I do something deliberate, either by clicking or by a keystroke. There’s a clever trick I’ve seen once or twice recently which does just this specifically for users of some assistive technologies (and possibly also keyboard), which I’ll write about tomorrow.

Accessibility Advent: think carefully before hiding possibilities

I wanted to write today about how you should generally mark draggable targets. I hinted in a previous article that voice users can perform drag actions; it sounds something like “Mouse Grid / 1 / 3 / 7 / Press Mouse / Close Mouse Grid / Mouse Grid / 9 / 4 / 3 / Release Mouse / Close Mouse Grid”, so it’s a little lengthy but not terribly cumbersome providing your drag targets are fairly big.

In my notes I’d written “Gmail does this well”. I looked today. No it does not.

To add attachments in Gmail you either drag them into the compose area, which makes logical sense but isn’t particularly discoverable, or in the old compose interface you drag them to the line that contains the “Attach a file” option, which is kind of semi-guessable if you know that Google like implementing draggable targets for files (you can drag files into Google Image Search as well; again, there’s no hint on the page that this is possible). As previously discussed, as a voice user I’m unlikely to speculatively move the mouse around in case you happen to have implemented a useful feature. (However I don’t strictly need to, as there’s a traditional button interface to adding attachments in Gmail; also, Dragon Naturally Speaking on Windows is supposed to have better built-in support for Gmail than Dragon Dictate for Mac.)

The more general issue here is one of discoverability, which is a long-standing interface design problem, sometimes thought of as a tradeoff between the interface being too busy and helping the user learn how to use it. There’s an interesting article discussing clutter versus discoverability by John D. Cook (which I’d recommend checking out even if you don’t care about this because it’s got a picture of a nice sailing vessel at the top).

My traditional position on this has been that simple features should be highly discoverable, but complex or expert features can be hidden and require reading documentation to find them. By this guideline, Gmail is actually doing the right thing — attaching files to an email is a rarely-used feature, and in any case there’s an entirely discoverable interface option to do it. (There isn’t a keyboard shortcut for it, however, which is a shame; it means that expert mouse users get better support but expert keyboard users do not.)

However if I had started using Gmail only by voice, there’s a good chance I would never have discovered that I could drag and drop attachments, which even with the MouseGrid dance is often faster than having to navigate the file system through the “open” dialog (which tends to involve a lot of tabbing backwards and forwards between different controls to get where you want to go).

To take full advantage of all web technologies these days requires considerable development time, so – particularly for startups – there’s a good chance you won’t have got to all the details yet. That means I’ll probably assume you haven’t done sophisticated things such as dragging files into your web application, unless you make it obvious. On the other hand, for rare operations there’s a decent argument for not shouting about it.

Simple advice: there isn’t any here. You’re going to have to think, and think carefully. (It’s an interface issue, there’s really no getting away without thinking.)

Accessibility Advent: don't punish errors

All users make errors; depending on tiredness, concentration, complexity of task, and so on people can make input errors using keyboard, mouse, voice or whatever. I find that my input error rate is significantly higher by voice, sometimes due to ways the voice system doesn’t quite match up with the way an application works, sometimes due to plain recognition errors where Dragon thinks I said something else, and sometimes due to user error (I think the most common one for me is to dictate text while in Spelling Mode, which means Dragon spins for a while trying to figure out what letters I said and then blasts line noise into whatever has focus).

I’m okay with making errors; that’s just going to happen. But please, please, please don’t punish me for those errors. Here’s one thing you should avoid: one form with multiple different things I have to get right.

I came across this while registering for the Santander online banking system, which manages to break almost every rule I could consider writing to make voice users’ lives easier. One large, scrolling form, rather than several sequential ones, requires concentration for longer; I can feel myself getting tired with having to navigate around a large form trying to get all the inputs right. What’s worse than that is that a single form tends to get validated all at once, and either passes or fails. So something like this is bad:

In an earlier step I’d had to input a temporary ID code and pass phrase, which was bad enough (because they were all numbers, and triggered the leading space problem and the US phone number formatting problem as previously discussed). To then have to create two new ones, which of course had requirements such as length and use of punctuation characters, and repeat them, was a huge effort. (Even creating them in a temporary document separate from the web browser and copying them in, which is what I did eventually, is a pain, because switching back and forth to select two different pieces of information is time consuming and runs the risk that you don’t actually copy the second one, pasting the first into all boxes.)

And they’d made it even worse, of course: if you got anything wrong, even the correct values weren’t carried onto the form again. (This was also true of entering the temporary ID and pass phrase, if I remember correctly.) So I’d get the ID code right, twice, but mess up getting the confirmation pass phrase, and then have to do the whole lot again. I got it wrong I think three times in total. If this had been split into two separate steps, setting the ID code and pass phrase on different screens, it would have been much easier.

By the time I’d finished registering I was sufficiently frustrated I just transferred most of my money out of Santander into an account with a bank that has a somewhat more usable online banking interface. If it’d been for anything where I didn’t already have a vested interest, I’d have given up.

So please don’t punish errors; any error I make I should be able to correct independently of any other information I give you. (Another example: if you build one form for shipping information and payment, don’t clear the shipping address if I put in a credit card number that doesn’t validate, and vice versa.)
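A rough sketch of what “correct independently” could look like in code: check each field on its own, and always hand back everything the user entered so the form can be re-filled. The field names `shippingAddress` and `cardNumber` and both validators are hypothetical, and the card check is deliberately simplistic.

```javascript
// Validate each field on its own and always return what the user entered,
// so a failure in one field never wipes out the others.
function validateIndependently(values, validators) {
  const errors = {};
  for (const [field, check] of Object.entries(validators)) {
    const raw = values[field] === undefined ? "" : String(values[field]);
    const message = check(raw);
    if (message) {
      errors[field] = message;
    }
  }
  return {
    ok: Object.keys(errors).length === 0,
    values: { ...values }, // echoed back so the form can be re-filled
    errors,
  };
}

// Hypothetical validators for a combined shipping-and-payment form.
const checkoutValidators = {
  shippingAddress: (v) => (v.trim() === "" ? "Please enter an address." : null),
  // Deliberately simplistic: digits only, 13 to 19 of them, spaces ignored.
  cardNumber: (v) =>
    /^\d{13,19}$/.test(v.replace(/\s+/g, "")) ? null : "That card number looks wrong.",
};

const result = validateIndependently(
  { shippingAddress: "1 Example Street", cardNumber: "1234" },
  checkoutValidators
);
// result.errors has an entry only for cardNumber, and
// result.values.shippingAddress still holds the address that was typed.
```

The point of returning `values` alongside `errors` is that whatever renders the form next has no excuse for blanking the fields that were fine.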

Also, consider very carefully whether you actually need confirmation of anything the user inputs. For passwords, you should never have to (password reset to the email address on file will work); I’d argue that for banking registration codes you don’t have to either, because I can always get you to send temporary ones out by post. Email addresses, however, probably are important to confirm; anecdotally I saw a fair number of typos in Artfinder signups while I was there (as well as a fair number of deliberate bad emails, but you’re not trying to avoid them by having them input twice).

Accessibility Advent: preserve expected paging behavior

I want to introduce you to two of my favorite keys: page up, and page down. Press one and you advance one window-full in your web browser, press the other to go back and reread.

This may seem obvious, even superior of me to point it out. But the number of sites that mess this up is impressive. Here’s one: Pando Daily. I like what Pando’s doing, but when I read one of their articles here’s what it sounds like: “Next page. Up arrow. Up arrow,” then a long pause before repeating.

Why? I blame it on television.

Television, particularly US broadcast, uses a technique called a lower third to provide more information from the broadcast network, these days often including advertisements for upcoming programs while you’re watching another. It obscures a chunk of what you’re trying to see, with variable benefits (some studies show utility in summarizing the current news story in the lower third, for instance – although increasingly if there’s a news story there, it’s a different one to whatever the presenter is talking about).

Lower thirds have leapt into the web space with – perhaps surprisingly, given the radically different technologies – the same problem. (On the web, they’re mostly being used for navigation, social links and – inevitably – advertising.) If you slap something over a scrollable area, paging that area will effectively skip the content originally underneath your floating layer:

[Illustration: a scrollable column of lines numbered 1 to 8, with a fixed “Lower third” layer floating over the bottom of it.]

This isn’t a problem when using the scroll wheel on a mouse or graphics tablet, or two finger scrolling on an Apple trackpad, or one finger scrolling on a touch device. But all of those are less efficient than just hitting a single key.

What can we do? Dragon has commands to advance by only half a page, but that’s not a big improvement because I don’t know where the half page boundary is, so I read to the bottom, half page advance – then I have to find my place again. Full-page advance and starting again at the top of the window is easier, and it’s a wash (but with lower cognitive load) if I have to scroll back a bit each time.

The floating layer could be moved out of the way of the content – to the side, say. That’s not always possible, depending on design.

Or – radical thought – web developers could make just the content area scrollable, so the fixed layer doesn’t obscure it:

[Illustration: the same lines numbered 1 to 8 in their own scrollable area, with the “Lower third” fixed beneath that area rather than over it.]

This results in two scrollable areas (content and page), but Mac OS will now hide the scrollbars if you have a suitable scrolling device attached (or you can hide them always), so it’s less messy than it used to be. Anyone know if Windows 8 has gone in that direction too?

Accessibility Advent: beware focus events

You know that thing with web Twitter where you hit “reply” on a Tweet and the compose box slides out? Yeah, I hate that.

This is a Dragon-specific issue. The way that Dragon’s dictation system works is that it keeps a “cached” view of what you’re editing inside Dragon, to avoid hitting the accessibility layer of the operating system all the time (it also enables some editing features that wouldn’t otherwise be possible). As you dictate, it updates its cache and issues commands to the application to update the real copy at the same time. The trouble is, under a range of common situations these two views get out of sync, necessitating a special command to force Dragon to update the cache. Worse, this (along with what I assume are some other bugs or problems between Dragon and the various applications you want to use) results in errors in the “real” copy in the application – spurious characters, usually.

Dragon’s response to this, as of this year’s release, is the Express Editor, which you can open anywhere to get Dragon’s full dictation and editing facilities without the inter-application problems, and then transfer the text from the Express Editor back into whatever you were using before – in this case a web browser with Twitter open.

That’s where things go wrong with Twitter replies, because when the browser window lost focus (so the Express Editor could grab it) Twitter helpfully collapsed the reply box to a single line – the same as if you’d clicked off it – and in so doing lost the focus on the input element. When the Express Editor subsequently tries to transfer its text back in, the focus is nowhere useful and you lose the text entirely.

Twitter have clearly designed this feature with care – once you start typing your reply, for instance, the input won’t collapse and lose focus. So this particular issue would be unlikely to occur to anyone who didn’t actually use Dragon.

What could Twitter do to fix this? I think that what’s happening is that they’re doing the collapse/unfocus on the blur event from the text area input. This is easier than binding a click handler to the document and making clicks outside the composition area perform the collapse/unfocus. The only differences I can think of are with switching away from the tab or application, which is what causes my problem, and if a browser plug-in takes focus within the same tab. I can’t think of a situation where a plug-in that grabs focus triggering composition area collapse would be a problem, but I also can’t think of a reason why not doing it would be a problem either.
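The click-handler alternative might look something like the following. This is a sketch of the general technique, not Twitter’s code; `composeArea`, `collapse` and the `reply-box` id are all hypothetical names.

```javascript
// Collapse the composition area only when the user clicks somewhere else
// in the page, rather than whenever the text area fires a blur event.
function collapseOnOutsideClick(doc, composeArea, collapse) {
  doc.addEventListener("click", (event) => {
    // Clicks inside the composition area leave it open.
    if (composeArea.contains(event.target)) {
      return;
    }
    collapse();
  });
}

// In a browser (element id and class name are assumptions):
// const box = document.getElementById("reply-box");
// collapseOnOutsideClick(document, box, () => box.classList.add("collapsed"));
```

Switching to another tab or application produces no click in the page, so with this approach the box stays open and keeps its place while focus is elsewhere.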

The advice here? Blur events happen for reasons you may have forgotten about. Think carefully about how your site should behave when that happens.

Accessibility Advent: give input elements sensible name attributes

Autocomplete is my friend; things like email addresses are a pain to dictate. I could create a “word” in the Dragon vocabulary for each one I use, but it is vastly easier these days to let my web browser take care of it.

This does require that you play along, however. Use expected names for form inputs wherever possible. (They are scoped by the form itself, so collisions for the fields most likely to benefit from autocompletion won’t be a problem in most situations.) While we’re here, quit asking me to confirm my email address – because that second field usually doesn’t autocomplete.

What do you mean your inputs don’t have name attributes? Facepalm.

There’s a service called Launchrock that lets you take prelaunch interest by collecting emails. It’s simple. It’s cute. It doesn’t work.

Here’s what a keyboard user experiences: an email box plus some shiny information about whatever forthcoming service they are signing up for. Tap tap tap, email entered. Next field: choose a password. (It seems to support other fields, such as name, as well.)

Here’s what I get: the same email box and information. Great; email boxes I can just get the first character into, then up pops the autocomplete list and I select the one I want, which will usually be at the top.

“Press the key Juliet.”

Odd. The “j” is there, but no drop-down. Okay, maybe it triggers on two characters…

“Press the key Alpha.”

Still nothing. At this point I tend to give up. Seriously. Your site hasn’t launched, and when you do it will probably have Twitter or Facebook sign in, so there’s rarely much advantage in getting in early – and someone is bound to tell me about it again. If you’re really lucky, I’ll remember I’d heard of you before. And… I’m gone.

Launchrock’s developers are being clever, progressively revealing fields on a single page using JavaScript. Presumably at some point this gets submitted via Ajax. Apparently they forgot to build the non-JavaScript version first, or even at all; if they had, the inputs would have names, autocomplete would work and I’d be happy.

Please don’t do this. Build it right. (Apparently you can customize Launchrock’s HTML. Do so.)
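If you want a quick way to check your own pages, something like this rough audit could be pasted into a browser console. The function name is made up for illustration, and it only reports fields with no name at all; it can’t tell you whether the names you do have are ones a browser’s autocomplete will recognise.

```javascript
// List form fields under `root` that have no usable name attribute.
function inputsMissingName(root) {
  const fields = Array.from(root.querySelectorAll("input, select, textarea"));
  return fields.filter((el) => {
    const name = el.getAttribute("name");
    return name === null || name.trim() === "";
  });
}

// In a browser console:
// inputsMissingName(document).forEach((el) => console.warn("No name attribute:", el));
```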

Accessibility Advent: support the escape key

I wrote yesterday about how moving the pointer by voice works; it’s a bit cumbersome, so forcing voice users to go that route isn’t great. Here’s another example.

Modal windows are an old desktop interface method requiring a specific interaction (such as confirming an operation) before continuing with normal application use. More generally, user interface modes are different states of the interface where it will respond differently to the same input (mouse clicks, keystrokes or whatever). Web apps are increasingly using modal techniques; done poorly they are a pain for voice users.

Two examples. The first: Twitter. If you click on a Twitter username, you get a modal sheet about that user. Clicking outside, or on the close button, dismisses it. So does pressing escape. Twitter also has a “new Tweet” button top right that is almost the same – except that escape does nothing. By voice, pressing a key takes about a second – compare to the problems discussed yesterday with controlling the pointer.

Second example: Google Reader. This has pretty good keyboard shortcuts, so I spend most of my time in it navigating by macros, saying things like “next news story”, “view original” and so on. The trouble with voice recognition is that it’s imperfect, so occasionally it will hear one thing and recognize it as something else, which ends up as dictation, i.e. basically just random keystrokes. (I could shift out into something called Command Mode, which won’t generate normal dictation; however I usually only think about that once something has gone wrong.)

Three keystrokes in particular do big things in Reader: “u” and “f” change the screen layout (and can be repeated to switch back), and “e” starts the “email this story” feature, from which there is no return (unless you tab to the cancel button, or move the pointer to click on it). The escape key apparently isn’t used while this feature is running, and would make things much easier. (Also, the email interface is inline, rather than floating over the rest of Reader as Twitter does things, making it less obvious that the feature is modal. That only confused me the first time, however.)

Simple rule: for interface modes, provide keyboard shortcuts.

Simpler rule: support the escape key.
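Supporting escape is not much code. A minimal sketch, assuming you already have some `close` function that dismisses the modal sheet or floating layer (both names here are placeholders):

```javascript
// Call `close` when the escape key is pressed while the mode is active.
// Returns a function that removes the listener again once the mode ends.
function closeOnEscape(target, close) {
  const handler = (event) => {
    // "Esc" is the value some older browsers report for the same key.
    if (event.key === "Escape" || event.key === "Esc") {
      close();
    }
  };
  target.addEventListener("keydown", handler);
  return () => target.removeEventListener("keydown", handler);
}

// In a browser, when opening a modal (closeModal is hypothetical):
// const stopListening = closeOnEscape(document, closeModal);
// ...and call stopListening() when the modal has been dismissed.
```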

Note this isn’t just for web apps; sites like Mother Jones have taken to using a floating layer to ask for donations. While it isn’t strictly modal (the site behind scrolls and links still work), since it obscures part of the content it’s effectively the same issue. Escape should close these kinds of things as well (or they should disappear automatically after a while – as advertising often does).

Accessibility Advent: hitzones should fill their visible area

$$T = a + b \int_{C} \frac{ds}{W(s)}$$

It’s probably fairly obvious how dictating text into, say, a mail client works, and even how voice commands to invoke menu items and keystrokes fit together. Let’s look at how voice control of the pointer works in Dragon. There are two different modes of operation, although they can be mingled, plus commands for clicking buttons, and pressing and releasing which enable dragging.

The first is the Mouse Grid: start by dividing the screen into a 3x3 grid, let the user select a grid square then centre the pointer in that square and subdivide into another 3x3 grid. Repeat until the user has moved the pointer to where they need it. You do this by saying things like “Mouse Grid / 1 / 3 / 7 / Close Mouse Grid”. (If you have multiple monitors, the first digit you say selects the monitor.)
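To make the subdivision concrete, here is a small sketch that works out where the pointer ends up for a spoken sequence of digits. It assumes a single monitor and that the squares are numbered 1 to 9 from left to right, top to bottom, like a phone keypad; the function name is made up for illustration.

```javascript
// Work out where the pointer lands after a Mouse Grid digit sequence.
// Assumes squares are numbered 1-9, left to right then top to bottom.
function mouseGridTarget(width, height, digits) {
  let x = 0;
  let y = 0;
  let cellWidth = width;
  let cellHeight = height;
  for (const digit of digits) {
    if (!Number.isInteger(digit) || digit < 1 || digit > 9) {
      throw new RangeError("Mouse Grid digits must be whole numbers from 1 to 9");
    }
    const column = (digit - 1) % 3;
    const row = Math.floor((digit - 1) / 3);
    // Each digit shrinks the active area to one ninth of its previous size.
    cellWidth /= 3;
    cellHeight /= 3;
    x += column * cellWidth;
    y += row * cellHeight;
  }
  // The pointer is centred in the finally selected square.
  return { x: x + cellWidth / 2, y: y + cellHeight / 2, cellWidth, cellHeight };
}

const target = mouseGridTarget(729, 729, [1, 3, 7]);
// target is { x: 175.5, y: 67.5, cellWidth: 27, cellHeight: 27 }
```

On that 729 pixel square, three digits narrow things down to a square 27 pixels across, so anything smaller than that is likely to need a fourth digit or a relative nudge.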

The other is relative movement, by saying things like “Move Mouse Up And Left 3 Centimetres”. (You can also use inches, and “points”. None of them is a particularly obvious unit when staring at the screen; I tend to use centimetres.) If you aren’t sure about the distance you can go a bit more interactive, with “Move Mouse Up And Left”, watching it move, maybe saying “faster” and “slower” to control the speed, and finally “stop” when you’ve reached your destination.

My rule of thumb is to use relative for short or repeated movements (because going back and forth by a fixed amount is easy once you know what that distance is), and Mouse Grid for everything else.

Okay, fine. It’s a little cumbersome, but it works. What can we find to make it worse, so you can not do that and generally stop bugging me?

Remember Fitts’ Law: wider, closer things are faster to acquire. Closer doesn’t really apply here (except that longer distances are harder to guess, so relative movement is more likely to miss). Wider gives a margin for error, which applies with both relative and Mouse Grid movements, suggesting Fitts’ Law will apply here; although I haven’t actually checked for research confirming that, anecdotally it feels right based on my experience.

So how can you confound Fitts’ Law, beyond making your buttons tiny? By masking how tiny your buttons are. Here are two links. Which is bigger?

Click me!
Click me!

The left one is div > a with padding on the div. The right one is div > a with padding on the a. They look the same. They aren’t. Again, with an outline on the anchors:

Click me!
Click me!

(Apologies for the shoddy CSS; I didn’t have the patience to think it through any better.)

This affects everyone, but mouse users are close to where they need to be and can just slide around to find the hitzone. This kind of “looking with the mouse”, watching for the cursor to change or similar, isn’t practical for voice users; speculatively moving a bit and trying again (“Move Mouse Right / Stop / Mouse Click”) is much more expensive. Trying again with Mouse Grid is even worse.

Simple rule: hitzones should fill their visible area.

The formula at the top is for the Accot-Zhai steering law, a path-following generalisation of Fitts’ Law which should model using the Mouse Grid well.
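For reference, here is how the two laws relate, written out. This uses the common Shannon formulation of Fitts’ Law, with D as the distance to the target and W as its width; a and b are constants fitted from measurements, and they take different values in each law. For a straight path of length A with constant width W, the steering law integral collapses to a simple ratio.

```latex
% Fitts' Law (Shannon formulation): time to acquire a target of width W at distance D
T = a + b \log_2\!\left(1 + \frac{D}{W}\right)

% Steering law: time to follow a path C whose width at position s is W(s)
T = a + b \int_{C} \frac{ds}{W(s)}

% Special case: straight path of length A with constant width W
T = a + b \, \frac{A}{W}
```

In both, a wider target or path means less time, which is the part that matters for hitzones.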

Accessibility Advent: strip leading spaces

There are any number of strange effects of using speech recognition to drive a computer, but the one that affects almost everything I do – and that I forget most often that I have to take account of – is leading spaces. Dragon doesn’t always notice when you’ve moved between input elements, and when it doesn’t it assumes any dictation continues the previous sentence. This results in a leading space in the new input.

You can imagine all sorts of situations where this is a pain, but the one that annoys me most is when I have to put in credit card or banking details. (It can also be a problem with CAPTCHA, although usually less than figuring out what the messed up letters are in the first place.)

I’ll note in passing here that, yes, this is a software problem at my end: it should be possible for Dragon to get this right. Indeed if I remember correctly, Dragon Naturally Speaking on Windows doesn’t have this problem, and for that matter if I used Safari on Mac there are commands that can help navigate text inputs while telling Dragon what’s going on. I dislike Safari sufficiently that the alternative would be to go back to Windows, where Dragon Naturally Speaking provides better integration with Internet Explorer.

The leading space issue doesn’t just crop up with web forms – Glee Box itself suffers from it, for one, and sometimes the 1Password unlock screen gets a rogue space as well. I can say “no space” before a word to prevent the leading space, but I don’t always remember. I’d love it if web forms just stripped leading spaces so this never bit me, but it’s particularly annoying when it interacts with naïve validation.

Leading spaces should not be considered invalid in numeric fields. Most string to number conversion functions will happily skip them, and so should your validator, be it a regular expression or something more esoteric. This is equally true whether your validation runs on the backend or front-end (and, sidebar, it should run on both); if the latter then there’s another problem, which banking sites are particularly prone to run foul of.
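A validator that tolerates the stray space can be very small. This sketch (the function name is made up) returns the digits as a string, so leading zeros survive, or `null` if what is left after trimming isn’t purely numeric.

```javascript
// Accept a numeric field even if it arrives with surrounding whitespace.
// Returns the digits as a string, or null if the input is not numeric.
function parseNumericField(raw) {
  const cleaned = raw.trim(); // drops leading (and trailing) whitespace
  return /^\d+$/.test(cleaned) ? cleaned : null;
}

// parseNumericField(" 1234") gives "1234"
// parseNumericField("12a4") gives null
```

The same few lines work equally well in a browser or in a JavaScript backend, which helps when the check needs to run in both places.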

Long numbers are hard to type in accurately (they’re easier to dictate, as it happens). One way of making things easier is to split them into shorter sequences – for instance, your credit card probably has four groups of four digits on the front instead of just one sixteen digit string. This is fine.

Similarly, to allow people to verify that they typed a number in correctly, some sites use multiple input boxes, building the single number in code somewhere. I haven’t seen this often with credit cards, but it gets used for software registration numbers a lot, and some banks split up the authentication codes coming in from PIN-based token generators, such as Barclays’ PINsentry.

As a web developer you see a requirement for two boxes each taking four digits, so you set the maximum length (maxlength) of each input to four.

[1234] [5678]

Then I come along with my leading space problem and this happens:

[ 123] [5678]

Actually it’s usually worse than that, because you’re clever so you advance the focus once there are four characters in the first input:

[ 123] [4567]

Then some other clever developer adds client-side validation to the form. One of two things happens, depending on where the focus is left. Most commonly, the focus seems to end up on a help link explaining what went wrong:

[ 123] [] ERROR!

If I’m lucky I can then tab back to the first input and try again. It’s around this point that I want to punch someone. (For reasons I don’t understand, inputting numbers in this stop-start staccato fashion is considerably more stressful than dictating prose and then correcting it.)

So what should happen? With client-side validation, kill the leading space when it appears. Without (and this is going to upset some people), set the input maximum length to one more than you’re looking for (five, in this case) – and then deal with this properly in the backend so an input like the following still works:

[12345] [678]
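On the backend, “dealing with this properly” could be as simple as joining the boxes before checking anything, instead of validating each box against its own length. A sketch, with a hypothetical function name and the eight-digit length from the example:

```javascript
// Combine the values of several input boxes into one code.
// Whitespace is removed from each part first, then the total is checked,
// so it doesn't matter which box the digits ended up in.
function combineSplitCode(parts, expectedLength) {
  const joined = parts.map((part) => part.replace(/\s+/g, "")).join("");
  if (!/^\d+$/.test(joined) || joined.length !== expectedLength) {
    return null;
  }
  return joined;
}

// combineSplitCode(["12345", "678"], 8) gives "12345678"
// combineSplitCode([" 1234", "5678"], 8) gives "12345678"
```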

Using client-side validation here is a Very Good Thing, clearly.

But wait, it gets worse. Say you read the above and decided one big input is better (as most people do for credit card numbers, for instance). You allow the slightly larger input maximum length, strip any leading space, and then reuse that code when building an e-banking setup system, where an authentication code gets mailed through the post to confirm the user is who they say they are. You parameterize your widget based on code length. Two years later someone rolls it out for codes the same length as US phone numbers. Dragon helpfully formats my code as a phone number:


So you should probably strip internal hyphens, and make the input maximum length even longer – in fact, I’m beginning to wonder if exact input maximum lengths are ever useful.
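A sketch of that more forgiving clean-up, for a single-box code. The function name is hypothetical, and stripping parentheses as well as hyphens and spaces is my assumption about what phone-style formatting might add.

```javascript
// Reduce a dictated code to bare digits before checking it.
// Removes whitespace, hyphens and parentheses (the last is an assumption
// about phone-number-style formatting), then checks what remains.
function normalizeAuthCode(raw, expectedLength) {
  const digits = raw.replace(/[\s\-()]/g, "");
  if (!/^\d+$/.test(digits) || digits.length !== expectedLength) {
    return null;
  }
  return digits;
}

// normalizeAuthCode(" 555-123-4567", 10) gives "5551234567"
```

With this in place the input’s maximum length only needs to be generous rather than exact.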