Voice UI: Exploring the invisible new frontier
The growth of smartphones, Wi-Fi and Bluetooth has changed many human behaviors, and in some cases, created entirely new ones. It’s amazing to think that once ‘chatting’ on your phone meant using your voice, as opposed to text, LOLs and emojis. Or, before smartphones, selfies didn’t really exist, and neither did duck face. Sounds like a lifetime ago, doesn’t it? It was actually only a decade for most people.
We find it hard to remember the days when phones were attached to walls, and couldn’t play Angry Birds, because our daily lives rely on a world of interoperable screen-based digital tools and services. Email, Office 365, Google Docs, social media, ecommerce marketplaces, music streaming, on-demand entertainment, and affordable connected smart-home gadgets. We take the digitalization of everything for granted, as we take our digital lives and devices with us, everywhere.
In this super-connected, hyper-personalized, mega-mobile world, digital voice bridges one of the last major usability gaps in our ubiquitous screen-based digital lives: The previously unmet human need to do two things at once, while possessing only one pair of hands. And only two opposable thumbs. And an expensive gadget you don’t want to drop in the bath or smear with pasta sauce, or worse, lend to your toddler to do the same.
Following a recipe on an iPad is challenging because cooking gets your hands inconveniently messy; dimming the lights when you’re sipping a drink, curled-up watching a movie on the sofa is inconvenient; swapping Spotify playlists while you’re in the bath risks splashing expensive home electronics; turning on the heating without getting out of your nice warm bed is useful; playing a lullaby while you’re feeding a sleepy baby, or checking the latest sports scores while you’re getting the kids ready for school… there are thousands of moments when ‘just saying it’ is clearly more convenient than grabbing your phone, laptop, tablet or reaching for a control panel, a knob on a dial, or a switch on the wall.
Establishing a branded presence in this new digital voice channel is – for brands – as important as smartphone apps and websites. However, there’s a problem. The majority of digital voice interactions tend to be simple things that barely tap into the potential of digital assistants and smart speakers. They certainly can’t do what a well-designed app, or web widget, can do. Yet. So, here are some important design insights to help create smarter apps for this invisible new frontier…
Design voice UI for communities, households and workplaces, not just individuals
The sheer utility of digital voice (if you can speak, you can use it) means we’ve never had devices quite like this in our homes before. Unlike most computing products (smartphones and PCs) these are one-to-many, shared devices. However, unlike more familiar one-to-many home devices (radios and TVs) they are configured around individual IDs, and personal preferences.
This strange combination means the things our voice assistants can do – shopping, music, news, information, entertainment – aren’t fixed. They vary from one identical looking device to another, or one identical sounding interface to another, depending on who is doing the asking. This sets up some unique ‘communal’ design challenges…
- Voice devices get used by the whole household, from young children through to senior citizens, which means the same digital voice app might meet different needs for different users depending on the time of day, or the mix of people interacting around the voice device. It’s important to study the full demographic spread of your target location to really get a feel for the user needs digital voice can solve.
- Where the voice device is placed in the home affects the kind of applications that will perform best. High traffic multi-user spaces like kitchens or living rooms are noisy compared to bedrooms and bathrooms, so the functionality of your voice app may need to change depending on where it’s most likely to be used. Noisy environments increase error rates in voice tasks, so a greater focus on error handling and prevention (through conversation design) might be necessary for a recipe app, but not for a bedroom sleep and relaxation app.
- Users interact with voice devices alone and in groups. Designing for single users represents a different set of user requirements from an app for a group of mixed ages, listening and playing together in a shared space. Typical examples are shopping or paid entertainment apps that are linked to a specific user payment account, but used by the whole family, potentially leading to unwanted purchases (without proper user checks and safeguards).
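As a rough illustration of the ‘user checks and safeguards’ mentioned in the last point, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the PurchaseRequest fields, the idea of a spoken confirmation code, and the no_code_limit threshold are hypothetical and do not describe any particular voice platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PurchaseRequest:
    speaker: Optional[str]  # voice profile the device matched, or None if unrecognised
    item: str
    price: float


def purchase_decision(request, account_holder, confirm_code,
                      spoken_code=None, no_code_limit=5.00):
    """Return 'approve', 'ask_for_code' or 'decline' (hypothetical policy)."""
    # Small purchases by the recognised account holder go straight through.
    if request.speaker == account_holder and request.price <= no_code_limit:
        return "approve"
    # Anyone else, or any larger purchase, has to say the confirmation code.
    if spoken_code is None:
        return "ask_for_code"
    return "approve" if spoken_code == confirm_code else "decline"
```

With a policy like this, a child asking for a paid game on a parent’s account would be asked for the code rather than charged immediately. The right threshold, and whether a spoken code is acceptable in a shared room at all, are exactly the kind of questions household research should answer.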
The combination of very broad demographics, active and passive use, solo users and groups, plus the differences in background noise, acoustics and device placement between different rooms and settings, means it’s imperative to conduct deep design research before designing a voice app. Typically, this means a combination of different fieldwork approaches to get the best results:
- Ethnographic fieldwork: This means observing customers using digital voice at home, in their cars, at work… anywhere there’s a need to design or improve a customer experience. It is a way to record the real-world context of a voice experience, considering physical factors like room acoustics; logical factors like using trigger words and commands; and emotional factors that affect user experiences, because talking to a voice-assistant encompasses many different user modes and moods (e.g. elbow deep in making a soufflé, relaxing with a good book, doing your math homework, rushing to go to work, and so on). Ethnography provides designers with a 360-degree insight into the relationship between the voice user interface (VUI) and whatever else the user is doing at the same time, capturing the voice-assistant user needs of the whole household from younger children to teens, parents and grandparents.
- User diary studies: Diaries are particularly useful because voice-assistants are often performing background tasks (like playing music) so users tend not to pay much attention to the interaction because they’re usually focused on doing something else at the same time. Diary studies help reveal how the ‘on in the background’ device fits into the ebb and flow of household life.
- Voice-assistant apps: Most voice-assistants have companion apps that provide a history of their conversations and responses, which gives a vital insight into how successful – and unsuccessful – user interactions have been. The data from these records can complement ethnographic observations and diary studies to help design better conversational responses, and refine the linguistic interactions required for executing commands.
Design, test, refine, repeat…
Creating conversational interfaces can be complex, because on a screen, a click is a click regardless of whose hand is on the mouse. However, we use language in much more variable ways. Depending on how old you are, where you are, and who you are talking to, you will use different vocabulary and sentence construction (children, adults, teens, talking to a friend, a waiter, your boss, your elderly grandmother, a police officer, a celebrity, your spouse, your date, talking in a loud room, a quiet bedroom, in the shower etc.)
This means voice systems are constantly challenged by the need to process natural language, disambiguate terms, and identify which words convey categories of data, quantities, operations and intentions. E.g. Is that four candles or fork handles? Do you want to get, buy, shop or order your fork handles, or four candles?
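To make the two problems in that example concrete (several verbs meaning one intention, and two phrases that sound alike), here is a deliberately naive Python sketch. The PURCHASE_VERBS and SOUND_ALIKES tables and the interpret function are hypothetical; real voice platforms use trained language models rather than lookup tables like these.

```python
import re

# Hypothetical: different verbs that all signal the same "purchase" intention.
PURCHASE_VERBS = {"get", "buy", "shop", "order"}

# Hypothetical: phrases known to be confused, mapped to their sound-alike.
SOUND_ALIKES = {
    "four candles": "fork handles",
    "fork handles": "four candles",
}


def interpret(utterance):
    """Map an utterance to an intent, and flag an item that needs clarifying."""
    words = re.findall(r"[a-z']+", utterance.lower())
    intent = "purchase" if PURCHASE_VERBS & set(words) else "unknown"
    text = " ".join(words)
    for phrase, alternative in SOUND_ALIKES.items():
        if phrase in text:
            question = f"Did you mean {phrase} or {alternative}?"
            return {"intent": intent, "item": phrase, "clarify": question}
    return {"intent": intent, "item": None, "clarify": None}
```

Calling interpret("I'd like to order four candles") gives a purchase intent plus a clarifying question, instead of silently guessing which item was meant.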
Understanding human vocal complexity is challenging for voice systems, but it is also essential to create good voice interfaces. Creating good voice scripts (the interactive responses, prompts, ad hoc comments and use of terminology) is an iterative design process, with scripting teams continually refining interface scripts to make them more effective so they can better adapt to our variable speech patterns, colloquialisms, dialects and so on. This is having an interesting side effect on the design industry, changing the balance of studio skills to include linguistics.
UX practices are usually weighted towards psychology and anthropology – because they are traditionally working with graphical user interfaces (GUIs) and interactions around physical objects – but the arrival of mainstream digital voice means now UX teams require linguists and writers to facilitate effective VUI design. As the voice channel grows in today’s omnichannel environment, this means every GUI and UX team will need to add a linguist to become a complementary VUI and VX (voice experience) team as well.
● Not every task translates easily into voice. For example, it’s easy to say “set timer for 5 minutes” but editing that timer to change the duration takes a lot more steps and command words than simply canceling the first timer, and setting a second one. Users quickly learn familiar controls like play, stop, skip and repeat, but the need to remember trigger words and sequences of commands needs thorough testing to get the balance right.
● Guiding users to give information in structured ways is a useful technique for more complex tasks, for example:
User: “Is the new Ant Man movie playing in Ipswich tomorrow?”
Device: “Checking cinema listings. Did you mean Ant Man and The Wasp?”
Device: “What day would you like to see it?”
Device: “Did you mean Friday, 13th of July?”
Device: “Listing performance times for Friday, 13th July in Ipswich, say restart if you would like to change the date or location of the cinema…”
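The guided exchange above resembles what is often called slot filling: the device keeps asking until it has every piece of information the task needs. A minimal Python sketch follows, assuming three hypothetical slots (film, location, date) and a ‘restart’ command that clears the date and location, as in the example. The slot names and most of the prompts are invented for illustration.

```python
from typing import Optional

# Hypothetical slots a cinema-listing task needs, in the order they are asked.
REQUIRED_SLOTS = ["film", "location", "date"]

PROMPTS = {
    "film": "Which film would you like to see?",
    "location": "Which town or cinema?",
    "date": "What day would you like to see it?",
}


def next_prompt(slots: dict) -> Optional[str]:
    """Return the question for the first missing slot, or None when all are filled."""
    for name in REQUIRED_SLOTS:
        if not slots.get(name):
            return PROMPTS[name]
    return None


def apply_answer(slots: dict, name: str, value: str) -> dict:
    """Record an answer; saying 'restart' clears the date and location instead."""
    updated = dict(slots)
    if value.strip().lower() == "restart":
        updated.pop("date", None)
        updated.pop("location", None)
    else:
        updated[name] = value.strip()
    return updated
```

Starting from {"film": "Ant Man and The Wasp", "location": "Ipswich"}, next_prompt asks for the day. Once a date is stored it returns None, which is the signal to read out the listings.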
Around 80% of your voice user interface development time will be writing, not coding. This area is where the voice companion app histories become invaluable, because they show the linguistic forms users find most effective, and the ones that lead to task abandonment. For each interaction, it’s important to prototype a script and see how easily users learn the commands, and complete the tasks you’re designing for, then adapt your voice interface to match.
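One simple way to mine those histories is to count how often each phrasing ends in a completed task. The Python sketch below assumes the history has already been exported as (phrasing, completed) pairs. That format is hypothetical, as is the completion_rates function, and companion apps do not necessarily offer such an export.

```python
from collections import defaultdict


def completion_rates(history):
    """Rank phrasings by completion rate, worst first.

    `history` is an iterable of (phrasing, completed) pairs, where
    `completed` is True if the user finished the task.
    """
    totals = defaultdict(lambda: [0, 0])  # phrasing -> [completed, attempts]
    for phrasing, completed in history:
        totals[phrasing][1] += 1
        if completed:
            totals[phrasing][0] += 1
    rates = {p: done / attempts for p, (done, attempts) in totals.items()}
    return sorted(rates.items(), key=lambda item: item[1])
```

The phrasings at the top of the resulting list are the ones most often abandoned, which makes them the first candidates for rewriting in the next script prototype.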
Keep on exploring…
Digital voices and AI assistants might very well become technologies we can’t remember living without, like smartphones. Consider the smart-speaker’s stellar sales figures; the sheer number of connected home devices on sale and being launched every day from thermostats to hairbrushes; and the almost limitless hands-free use cases – like checking emails in the bathroom – that are still waiting to be discovered. It’s a whole new world of opportunities for forward-thinking brands.
Whatever comes next, the future for digital voice looks bright, and in many respects we’re just scratching the surface of where it will go. Brands and OEMs are still building out the functionality to connect the Alexa in your living room with your local pizza delivery customer services, or integrate your shopping list with your local supermarket for frictionless click and collect, but these innovations will come.
In fact, as with today’s streaming music, social media, selfies and emojis, we’ll probably find it hard to remember the days before digital voice was everywhere and we could talk to every brand in our lives, through frictionless voice channel services. Maybe. But, regardless of what the future holds, one thing we’ll never forget is the importance of design research, prototyping and continual improvement when it comes to developing killer apps in new customer channels.