
Let Voice rule your world


Monday, October 15, 2018, 4 min read

Here are five tips for building effective voice applications.

In just a few years, smartphones have revolutionised the way we access information and collaborate with each other. Touchscreens and gestures have dethroned the good old keyboard and mouse as our interface of choice.

With a subtle combination of gestures and swipes, we can today discover and navigate the digital world in ways we couldn’t have a few years back.

Really, our experience of the digital world is directly related to our ability to manoeuvre through it.

The fast-evolving frameworks of Conversational Computing (CC) may be the next quantum leap in Human-Machine Interface (HMI). Conversational Computing holds the promise of making the user experience even more natural, personal, and intuitive. Alexa, Siri, Google Assistant, and Cortana are the forerunners in building CC frameworks.

Here are five handy tips for building innovative voice applications on any platform.

The redundancy test

It may be tempting to convert an existing web or mobile app into a voice skill. But it is wise to first evaluate whether smart speakers (Alexa, Google Home) are the best choice for your use case. Needless to say, in some scenarios users prefer the native app to get “the job done” instead of a smart speaker. As a rule of thumb, general-purpose queries about weather, trivia, or breaking news are best handled by the underlying platform itself and may not warrant a separate app.

For some use cases, the user may be more comfortable with a visual (touchscreen) interface than with voice. A typical example is a financial transaction that involves multiple layers of authentication with passwords.

Also, multimodal devices like the Echo Show and Echo Spot are slowly catching on. These augment the “voice-first” experience with a GUI, providing a more interactive and intuitive experience.

Define the scope of your application

A mobile or web app has inherent limitations on what the user can do: they can navigate and click only the options defined by the application. The visual design sets expectations and defines the scope. A voice application has no such cues. Often, the user may deviate from the intended workflow and drive the conversation in an unanticipated direction. The onus of educating the user about the scope and capabilities of the application rests with the developer. A good voice skill states its scope and limitations upfront.

For example, a logistics app, when invoked, can welcome the user by saying:

“Hi, I can help you move your luggage anywhere in Karnataka within two days. Would you like to know more or schedule a pickup?”

Note how the statement educates the user about what the app can do and sets expectations about service timelines. It also leads the conversation into two specific threads.
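To make this concrete, here is a minimal sketch in Python of a launch handler that states the skill’s scope upfront. It builds the raw response JSON that Alexa expects rather than using any particular SDK, and the wording and function name are illustrative, not part of any real skill.

# Hypothetical launch handler for the logistics skill described above.
# Returns the raw JSON structure Alexa expects; an SDK would normally wrap this.

def handle_launch_request():
    welcome = ("Hi, I can help you move your luggage anywhere in Karnataka "
               "within two days. Would you like to know more, or schedule a pickup?")
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": welcome},
            # The re-prompt keeps the session open and restates the two threads.
            "reprompt": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "You can say 'tell me more' or 'schedule a pickup'."
                }
            },
            "shouldEndSession": False
        }
    }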

Conversations and not commands

The simplest way to get a voice skill to execute a task is to gather all the input parameters at once. But this approach may come at the cost of user experience. Take the case of an application that books a cab. It would be a developer’s delight if the user could say the following without error:

“Hey, book me a hatchback that can pick me up from Mayo Hall, MG Road, Bangalore at 9:45 AM tomorrow and drop me at the West Gate of Lalbagh, Bangalore.”

But it would be a user’s nightmare to remember the complex syntax and cram all the mandatory parameters (inputs) into one sentence. Clearly, users want to have a conversation, not utter tongue-twisting commands.

So the programmer needs to leverage multi-turn conversations to handle this: gather the parameters over multiple utterances, explain the choices at each step, and accommodate customisation.
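A minimal sketch of that multi-turn flow for the cab-booking example follows. The slot names and prompts are illustrative; in a real Alexa skill you would usually let the platform’s dialog model elicit the slots, but the underlying idea is the same.

import string

# Parameters to collect, in the order we ask for them (illustrative names).
REQUIRED_SLOTS = ["car_type", "pickup_location", "pickup_time", "drop_location"]

PROMPTS = {
    "car_type": "What kind of car would you like - a hatchback, a sedan, or an SUV?",
    "pickup_location": "Where should the cab pick you up?",
    "pickup_time": "What time do you want to be picked up?",
    "drop_location": "And where are you headed?",
}

def next_prompt(collected):
    """Given the slots gathered so far, return the next question or a confirmation."""
    for slot in REQUIRED_SLOTS:
        if slot not in collected:
            return PROMPTS[slot]
    return ("Booking a {car_type} from {pickup_location} to {drop_location} "
            "at {pickup_time}. Shall I confirm?").format(**collected)

# Example: the user has only given the car type so far,
# so the skill asks for the pickup location next.
print(next_prompt({"car_type": "hatchback"}))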

Emotions – your secret weapon

Alexa supports Speech Synthesis Markup Language (SSML), an important aspect of Conversational Computing systems. A good developer exploits SSML and Alexa’s speechcon features to add fluctuations in pitch, contour, intonation, tone, stress, and rhythm to the conversation.

This expressiveness is unavailable in visual HMIs and can make a world of difference to the user experience.
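As a small sketch, the snippet below builds an SSML response using standard tags (prosody, emphasis, break) plus an Alexa speechcon interjection; the wording and the helper name are illustrative, and the exact speechcons available depend on the locale.

# Sketch of an SSML output-speech object using prosody, emphasis, a pause,
# and an Alexa speechcon. The phrasing is illustrative.

def build_ssml_confirmation():
    ssml = (
        "<speak>"
        '<say-as interpret-as="interjection">all righty!</say-as> '
        '<break time="300ms"/>'
        'Your cab is booked for <emphasis level="moderate">nine forty-five</emphasis> '
        "tomorrow morning. "
        '<prosody rate="95%" pitch="+5%">Have a great trip!</prosody>'
        "</speak>"
    )
    # This object slots into the "outputSpeech" field of an Alexa response.
    return {"type": "SSML", "ssml": ssml}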

Error-handling

Get innovative. A generic catch-all phrase for error scenarios will bore the user very fast; they will tire of hearing the same message several times. Have an array of responses, mix them up, and contextualise each one. If a user deviates from the workflow, restate their options. Repeat the parameters they have already shared and prompt for what is still missing.
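One way to sketch that idea: rotate through a pool of contextual re-prompts instead of repeating a single catch-all error message. The wording and helper names below are illustrative, and the templates assume the cab-booking parameters from earlier.

import random

# A pool of varied fallback responses. Each one restates what the skill
# already knows and nudges the user back into the workflow, instead of
# a flat "Sorry, I didn't get that."
FALLBACKS = [
    "Sorry, I didn't catch that. You asked for a {car_type}; where should it pick you up?",
    "Hmm, I'm not sure I followed. I have a {car_type} ready - just tell me the pickup point.",
    "Let's try again. Your {car_type} is waiting; which address should I send it to?",
]

def fallback_response(collected):
    """Pick a varied, contextual re-prompt that repeats the known parameters."""
    template = random.choice(FALLBACKS)
    return template.format(**collected)

# Example usage with one parameter already gathered.
print(fallback_response({"car_type": "hatchback"}))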

Remember, the key to building a blockbuster voice app is to make the workflow more intuitive, pre-empt conversational deviations, and handle errors as humanly as possible.