Voice Assistant Speed Dating: I like you as a friend but…

What happens when a room full of anthropologists, UX researchers, designers and behavioural scientists has lunch with a bunch of AI voice assistants? Okay, that sounds like the set-up to a joke, but it actually happened this week at the labs. One of the perks of working in Europe’s largest dedicated UX research facility is that we’re never short of interesting tech to play with, and with various digital voice channel projects on the go at Sutherland, it was only a matter of time before our regular team lunch turned into an informal conversational design experiment.

Off the back of our lunch session, we’ve decided to conduct a more structured experiment with Amazon, Google and Apple’s latest offerings in the coming weeks (watch this space for the write-up), but this impromptu lunch date gave us a chance to capture some first impressions and gut reactions to these fast-selling home gadgets. The aim was simple: to get a feel for the conversational design elements that make a voice assistant feel ‘likeable’. Right now, Amazon has around 70% market share of the smart speaker business, so you’d expect Alexa to be miles ahead in the likeability stakes… but is it?

Conversational styles…

If you’re not familiar with the Amazon Echo, Google Home or Apple HomePod, you might assume that – much like competing makes of smartphone, PC or tablet – the differences between their interfaces are fairly superficial. And you’d be right up to a point, but with AI voice assistants, the differences feel more substantial. This is because we unconsciously anthropomorphise talking things: we don’t really feel that emotional about the interactive qualities of a button, a click or a swipe in a graphical interface, but the responses of voice interfaces feel much more like an interpersonal exchange. We asked the devices the same questions and compared the responses, and what emerged was a very interesting (but nuanced) variation in the emotional reactions they provoked.

Alexa felt the most naturalistic, while the Google Assistant and Siri felt more robotic in tone. It was a great example of how copywriting is the user interface design of voice interactions. Just as fonts, colours and graphics help define the different emotional reactions we have to corporate websites vs. games vs. social media vs. shopping apps (etc.), so the use of words in a voice interface script defines the emotional reactions we have to the voice assistant. On that score, Alexa’s tone felt the most ‘friendly’ and Google’s the most businesslike.

Interestingly though, Alexa’s fluency and tone weren’t automatically seen as winning traits by the team. After all, these devices aren’t there to make friends; they’re designed to perform tasks. On that score, some people preferred the direct responses and greater precision of the Google and Apple assistants over Alexa’s smoother chattiness. Others felt Alexa’s responses were more accessible – and memorable – because of their more natural-sounding conversational rhythms. It’s a variable user experience depending on your personal preference, in the same way that some people like a chatty waiter while others just want to order their lunch as quickly and efficiently as possible.

Amazon Alexa

There’s no clear winner in terms of style because, just as we use different modes of speech depending on why, when and where we’re talking with other humans, a more formal or more chatty assistant response can work equally well at different times. There’s no one-size-fits-all when it comes to designing an AI assistant’s conversational style; it’s all about context.

Conversing, verbalizing and utility

It’s important to note this wasn’t a planned test, so we hadn’t enabled all the Alexa skills necessary for it to respond fully to every question. However, that fact in itself made Siri and the Google Assistant more likeable in one crucial respect: they feel more useful out of the box.

Discovering ‘skills’ (the Alexa equivalent of an app) is a major friction point for users. For example: if you ask the Google Assistant how to make an espresso martini, it responds immediately with a step-by-step process to talk you through it; ask Alexa and she first tells you to enable a recipe skill, which you can do verbally – but in our case, she then couldn’t find a recipe for an espresso martini and suggested BBQ Australian spare ribs instead. Which is… er… what?

Okay, so there’s clearly an easier experience with the less chatty but more functional Google approach to voice assistance. Even so, as the Google Assistant took us through the cocktail-making process it didn’t quite work seamlessly, because there was no mention of precise quantities for each ingredient. The pacing and intonation of the recipe also sounded off – a bit like screen reader software – making us replay steps to catch the precise instructions. That didn’t feel as accessible as Alexa, whose step-by-step BBQ rib recipe was easier to grasp on first hearing. But it wasn’t a cocktail, so… fail.

A Google Home mini

Again, this illustrates that there’s no one-size-fits-all approach to designing effective conversational UI. The need to set up functions adds complexity to the Alexa process and seems to run counter to digital voice’s promise of instant-access functionality. That friction is compounded by discovering that even when the skill is enabled, the assistant doesn’t necessarily have the answer you want. On the other hand, while the instant response and accuracy of Google and Apple’s smart speakers feel more on-target for your instant-hit functional expectations, the somewhat garbled pacing of the web-sourced answer adds its own kind of frustration later in the process. The former problem feels like it’s lacking functionality, the latter like it’s lacking finesse. Neither is quite the droid you’re looking for… yet.

So what are you looking for in an ideal voice partner?

There’s always a sense in their answers that Google and Siri are verbalizing web pages (and the Google Assistant often adds citations for almost everything, accentuating the fact that it’s reading search results to you), whereas Alexa feels more like it’s digested the information and is relaying it to you the way a human would. (Alexa does add citations in the companion app, so you can check sources, but that’s not always clear from its spoken answers.) At times there was also significant overlap in the responses we received, where choosing one assistant over another became purely a matter of which voice you preferred.

Maybe we can categorize them like this: in terms of who you’d like to come and live in your home, Alexa inches ahead in the voice assistant dating stakes. On the other hand, Alexa appears to have a higher frequency of “Sorry, I don’t know the answer” type responses. Conversely, the Google Assistant and Siri have an answer for pretty much every question; they also appear to be slightly better at interpreting the intention of your question, and offer slightly more precise responses when they answer. That makes them feel more useful as functional assistants – more like a work colleague than Alexa’s home-buddy vibe.

Image of Siri in use on an iPhone

When we run a more structured test in a few weeks, we’ll dig into these fascinating devices to get a keener sense of their ability to process natural language and deliver on the promise of their functionality. In the meantime, did our highly unscientific, emotional Voice Assistant Speed Date reach any conclusions? Yes. It turns out that, just like in real life, we find different qualities attractive in different personalities, and nobody is perfect.

The Labs Team

Sutherland Labs