GIVING VOICE TO A ROBOT
I sat down with The Verge to discuss our design philosophy.
Anki Robotics, 2018-2019
While Anki's "Wall-E"-esque Cozmo robot was built for kids, it was also highly successful with adults. However, adults wanted an even more useful, higher-tech companion. Our challenge was to create a robot for all ages.
Vector is a robot who both helps out and hangs out. His central feature is reacting to the world, including speech. We set out to create intuitive, useful, and delightful interactions with both voice and physical modalities.
As a Senior Designer, I led development of Vector's VUI (Voice User Interface) and related human-robot interactions.
Expectations and opportunities
We interviewed our target demographic: early adopters of technology. We found that Vector’s engaging character and his ability to help were equally important to how they imagined a robot sidekick fitting into their lives.
We looked at other voice assistants and social robots, analyzing their personalities and functionality. We researched the most used voice commands to understand which ones users would expect.
Jibo, Alexa, and Pepper all provide different styles of voice interaction.
Based on our users' needs and expectations, we brainstormed a list of possible voice commands and interactions.
This list was curated by asking:
Which commands will users miss if they’re not included?
Which commands are requirements for Vector’s core features?
Which commands could Vector do uniquely well due to his character, mobility, or computer vision?
Brainstorming and story mapping of Vector's voice commands.
Voice interaction principles
I created guiding principles as we built out Vector's voice and physical interactions. These were based on stakeholder interviews, user insights, cross-team collaboration, and competitive analysis.
1. Intuitive Interactions
When I say a voice command, Vector responds in the way I expect.
2. Engaging Character
When I talk to Vector, he feels lifelike. Interacting with Vector is part of my daily routine.
Voice commands are easily discoverable in the Vector app. Expected commands either work or fail elegantly.
4. Useful Feedback
If my voice command fails, I know why and how to succeed next time. If an interaction fails and it wasn't my fault, Vector's reaction is clear and endearing.
Storyboards by the character design team.
Designing the "wake word" interaction
The development of Vector's wake word ("Hey, Vector") interaction started with a massive problem to solve. Vector's mic is right next to his gears and speakers. If he reacts to let a user know he has heard his name, the noise he generates drowns out the user's voice command.
1. Test Design
I held a brainstorming session with our technical and creative teams.
We decided to test 3 wake word responses for usability and accuracy:
Prototype A: No reaction
Prototype B: Small reaction (sound but no turn)
Prototype C: Big reaction (sound and turn)
2. User Testing
During user tests, Prototype C (big reaction) was the most usable. Gear noise masked almost 300 milliseconds of the user's command, but fortunately our voice system was still able to match intents with words missing.
3. Wake Word Interaction Design
Prototype C (big reaction) was the winner, and we went to work implementing a big, clear reaction to the wake word.
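To illustrate how an intent can still be matched when the first word or two is lost to gear noise, here is a minimal Python sketch. It is not Vector's actual voice system: the intent names, training phrases, `match_intent` function, and 0.5 threshold are all hypothetical, and simple word overlap stands in for whatever matching the production system used.

```python
import re

# Hypothetical training phrases per intent (illustration only, not Anki's data).
TRAINING_PHRASES = {
    "set_timer": ["set a timer for five minutes", "timer for five minutes"],
    "take_photo": ["take a photo of me", "take a picture"],
    "weather": ["what's the weather", "do i need an umbrella"],
}


def tokenize(text):
    """Lowercase the text and return its set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, threshold=0.5):
    """Return the intent whose training phrase shares the most words with the
    utterance (Jaccard overlap), or None if nothing clears the threshold."""
    heard = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for intent, phrases in TRAINING_PHRASES.items():
        for phrase in phrases:
            expected = tokenize(phrase)
            union = heard | expected
            score = len(heard & expected) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

For example, `match_intent("a timer for five minutes")` still returns `"set_timer"` even though the word "set" was never heard, while an unrelated phrase such as "play some jazz" returns `None`.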
VUI (Voice User Interface)
Vector's voice commands
Working across the design team, I led efforts to define and maintain our voice commands and variations. Here are some of the commands I co-wrote and optimized.
User-facing voice command: "Hey Vector, I have a question.", "[How far away is the moon?]"
Variations: Question, I need to ask you something, Knowledge graph, Answer this
User-facing voice command: "Hey Vector, set a timer for [5 minutes]"
Entity: [time-duration], fail on more than 1 hour
Variations: Timer for [5 minutes], Let me know when [5 minutes] is done, Set an alarm for [5 minutes]
User-facing voice command: "Hey Vector, let's play blackjack."
Variations: Let's play cards, Cards, Want to be dealer?, Play a card game
User-facing voice command: "Hey Vector, take a photo [of me]."
Entity: [none] take picture of anything, [me] = find 1 face or fail, [us] = find 2+ faces or fail
Variations: Take a picture, Snapshot, Photo album, Snap an image
User-facing voice command: "Hey Vector, what's the weather [in San Francisco]?"
Entity: [location-international], if [none] use user's self-reported location or fail
Variations: What's the temperature [in San Francisco], Is it snowing?, Do I need an umbrella?, Should I put on my jacket?
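The entity rules listed above for the timer, photo, and weather commands can be sketched as small validation functions. This is a hedged Python illustration, not the shipped logic: the function names and parameters are hypothetical, and where the rules are ambiguous (for example, whether "[me]" means exactly one face or at least one), the assumption is called out in a comment.

```python
from typing import Optional

# "fail on more than 1 hour"
MAX_TIMER_SECONDS = 60 * 60


def validate_timer(duration_seconds: int) -> bool:
    """Accept timers up to and including one hour.
    Assumption: zero or negative durations also fail."""
    return 0 < duration_seconds <= MAX_TIMER_SECONDS


def can_take_photo(subject: Optional[str], faces_found: int) -> bool:
    """Apply the photo entity rules to the number of faces detected."""
    if subject is None:
        return True              # [none]: take a picture of anything
    if subject == "me":
        # Assumption: "find 1 face" is read as "at least one face";
        # the rule could equally mean exactly one.
        return faces_found >= 1
    if subject == "us":
        return faces_found >= 2  # [us]: find 2+ faces or fail
    return False                 # unknown entity value: fail


def resolve_weather_location(spoken: Optional[str],
                             self_reported: Optional[str]) -> Optional[str]:
    """Prefer the spoken location, fall back to the user's self-reported
    location, and return None (fail) when neither is available."""
    return spoken or self_reported
```

Keeping each rule in its own function makes the failure cases explicit, which matters here because every failure needs its own clear reaction from Vector.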
Storyboards by the character design team.
Q&A voice command
The Q&A voice command is an example of a VUI design I defined, collaborating with another UX designer, character design, animation, robotics engineering, and cloud engineering. The command lets a user ask an open-ended question.
The Q&A voice command was complex because it needed to account for several success and error responses, as well as integrate with two third-party services, Google Dialogflow and Houndify Knowledge Base.
VUI Flow for Q&A voice command.
Voice System Optimization
Our voice system was built in Google Dialogflow, where we trained intents (think: the thing the user wants) and specified entities (think: information we need to gather). I also onboarded and partnered with a contract team to localize/culturalize commands for the UK and Australia.
Google's Dialogflow was used to optimize our VUI system.
The app helps users discover what they can say. We conducted user testing to find the most usable language. For example, the most usable title for this feature was Q&A Mode. Other options I explored included: I Have a Question, Question, Information, Knowledge Graph, Database, Search, and Ask Vector.
App design was led by the UX/UI design team, and I contributed UX writing for features.
The Vector app helps users to discover features and use voice to interact.
Final Voice Interaction
Here is an example of the final user experience for the Q&A voice command.