Voice User Interface Design - Tips and Techniques

 

Michael Smith, III, Senior Voice User Interface Designer
Intervoice, Inc.

Applies to:
    Microsoft® Speech Technologies

Summary: Learn tips and techniques based on first-hand experience in successfully designing highly usable speech applications. For technology providers seeking to design products that are both easy to implement and powerful enough to serve all development requirements, matching the right tool and technology to specific customer needs continues to be a widespread business challenge. As tools for speech application development become increasingly prevalent, it will be critical for developers to learn to design user-friendly applications using these tools. (4 printed pages)

Contents

Introduction
Organization Before Implementation
Prompt Clarity
What if There's a Problem?
So Many Things to Say
Information Playback
Final Words

Introduction

With the tools, technologies and standards (such as SALT, or Speech Application Language Tags) we have available today, and with the ability to "plug in" the right pieces, developing speech-enabled applications would appear, at least at first glance, to be as easy as Web page creation. However, in spite of the many excellent tools available today, Web pages that are difficult to understand and hard to use still abound.

The Web world grew at an unprecedented pace, leading to the seemingly overnight availability of a plethora of development tools and options. It is likely to be the same with speech-enabled applications. But the ability to implement a new and avant-garde technique does not necessarily result in a usable application.

Star Trek scenarios are not yet a reality, and current users can't work in such environments. Any technique that is used today to design a speech application must be grounded in a solid grasp of human factors. Developers must organize speech applications around how users in today's real world think—without forcing them to think too much.

Where can you find tips and techniques for creating good voice-enabled interfaces? In partnership with Microsoft® Speech Technologies, Intervoice presented a Webcast on May 20, 2003 that explored techniques for designing the voice-user interface, a crucial aspect of successful speech-enabled applications.

The May 20 Webcast built upon the foundation laid in a previous Intervoice Webcast delivered by Dr. Susan Hura and Microsoft Speech Technologies in January 2003. Hura's Webcast, "Heuristics: Lessons in the Art of Automated Conversation," discussed factors that impact usability, such as application functionality, style and persona. Both Webcasts are available for replay.

Organization Before Implementation

It is quite difficult to drive in a foreign city without a map, even when you know your destination. If you attempt to write application prompts before organizing the application's functionality, you will face a comparable difficulty. First, construct a road map of an existing application by evaluating current usage. Data for this analysis may be available for collection via the Web or a touch-tone application.

However, remember that few Web pages or touch-tone applications can be "lifted" from these environments to the speech environment without modification. This is not to say that a speech application developer should disregard principles of design related to those interfaces. The developer simply needs to keep these principles in their proper perspective. Since each interface is different, each must be designed, implemented and tested with equal thoroughness.

Limiting "current interfaces" to strictly automated solutions will limit the accuracy of the collected data. Because call center agents interact daily with users, they can provide a wealth of information and insight into users' behaviors, needs, request language and even common emotional states. To tap into this knowledge, hold a focus group with agents. Then evaluate this interaction further by listening to recorded user-to-agent phone calls.
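As a rough illustration of building a road map from usage data, the sketch below ranks the tasks callers request by how often they appear. The log format (one task label per call) and the function name `rank_tasks` are assumptions made for this example; real data from a Web site, touch-tone system or agent call recordings would need to be labelled first.

```python
from collections import Counter

def rank_tasks(call_log, top_n=3):
    """Return the top_n most frequent tasks with their share of all calls."""
    counts = Counter(call_log)
    total = sum(counts.values())
    return [(task, count / total) for task, count in counts.most_common(top_n)]

# Hypothetical sample: one entry per call, labelled with the task requested.
sample_log = ["balance", "transfer", "balance", "agent",
              "balance", "transfer", "hours", "balance"]

ranked = rank_tasks(sample_log)
for task, share in ranked:
    print(f"{task}: {share:.0%} of calls")
```

A ranking like this only suggests which tasks deserve the most prominent place in the design; it does not replace the focus groups and call listening described above.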

Prompt Clarity

For users, listening is a difficult task. Do not assume that users will remove themselves from the world around them and listen intently to every word of the prompts you write. Seek to define the environment, age and state of mind of typical users, and then let this guide the style of your prompts.

For example, a locomotive conductor works in an entirely different environment from an office worker. Likewise, a balance inquiry system for a bank will have a completely different sense of urgency compared to a city's emergency line. These dramatic examples demonstrate that a "one prompt fits all" philosophy would serve many callers poorly.

By using different sentence structures, pausing at strategic points, and perhaps most important of all, by employing the right persona and voice talent, you will be able to develop prompts that truly facilitate listening and comprehension. Relying on application prompts recorded by an amateur or "in-house" non-professional often produces poor results.

What if There's a Problem?

Let's assume you have done your homework, you know your application's users, you've selected the right voice talent and (hopefully) designed the perfect system. But what if actual usage proves that your application is not perfect after all?

Have you considered potential application exceptions? Exceptions must be planned for, and handling them well can even make your application more enjoyable for users. Try to simulate what we do in everyday conversation when we ask others to clarify a remark.

For the best results, base your speech-enabled application on accepted human conversational methods. The application must be able to clarify options, provide decision assistance, and confirm a user's selection. In addition, the methods you choose must be appropriate to the user situation. For example, the clarification needed for comprehension of an options menu would be different from that required for completion of a transaction such as buying shares of stock.
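One way to make confirmation fit the situation is to choose a strategy from both the recognizer's confidence and the cost of a mistake. The following is a minimal sketch only: the function name, the 0-to-1 confidence scale and the 0.4 and 0.8 thresholds are assumptions for illustration, not values from any particular speech platform.

```python
def confirmation_strategy(confidence, high_stakes):
    """Pick how to confirm what the caller said.

    confidence  -- recognizer score, assumed to lie between 0 and 1
    high_stakes -- True for transactions such as buying shares of stock
    """
    if confidence < 0.4:
        # Too uncertain to guess: ask the caller to say it again.
        return "reprompt"
    if high_stakes or confidence < 0.8:
        # Ask a yes/no question, e.g. "You want to buy 100 shares. Is that right?"
        return "explicit"
    # Acknowledge and move on, e.g. "Okay, account balances."
    return "implicit"

print(confirmation_strategy(0.95, high_stakes=False))  # menu choice
print(confirmation_strategy(0.95, high_stakes=True))   # stock purchase
```

In this sketch a well-recognized menu choice is simply acknowledged, while a transaction is always confirmed explicitly, however confident the recognizer is.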

So Many Things to Say

Listening is a difficult task at best. It can become impossible when too many options are provided at once. This may mean that a user cannot intuitively reach a decision. To avoid this problem, limit menu options to just a few clear choices. Although this can make application organization challenging, few truly usable applications present a user with a large list of options. You can, however, enable a user to request items not presented on the menu. But speech-enabled recognition of options is different from the presentation of options in prompts, and should be treated differently.
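The distinction between what is spoken and what is recognized can be sketched as follows. The prompt names only a few choices, while the set of accepted phrases covers every task. The names `build_menu` and `spoken_list`, the limit of three spoken options and the sample task list are all hypothetical, and the input is assumed to be non-empty and ordered by popularity.

```python
MAX_SPOKEN_OPTIONS = 3  # assumed limit; tune it through usability testing

def spoken_list(items):
    """Join items the way a person would read a short list aloud."""
    if len(items) <= 2:
        return " or ".join(items)
    return ", ".join(items[:-1]) + ", or " + items[-1]

def build_menu(tasks_by_popularity, max_spoken=MAX_SPOKEN_OPTIONS):
    """Return (prompt, grammar): few options spoken, all options accepted."""
    spoken = tasks_by_popularity[:max_spoken]
    prompt = "You can say " + spoken_list(spoken) + "."
    grammar = set(tasks_by_popularity)  # every task is still recognized
    return prompt, grammar

tasks = ["balances", "transfers", "payments", "branch hours", "order checks"]
prompt, grammar = build_menu(tasks)
print(prompt)
print("order checks" in grammar)
```

A caller who already knows the system can say "order checks" directly, even though the prompt never lists it.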

The pitfall of application prompts that say too much is not limited to menu options or tasks, but also extends to the design of helpful suggestions. For example, do not design your speech applications to automatically tell a user they can say "help", "repeat", or "go back" before the user is likely to need this information. Remember that a user may miss some options and forget others the moment a new piece of information is provided. Therefore, present detailed options to the user only when a need has been demonstrated. This will allow the user to more easily keep pace with what can be accomplished and how to accomplish it.
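Offering detail only after a need has been demonstrated is often done with prompts that escalate each time the caller is not understood. The sketch below assumes a hypothetical account-selection question and a simple count of failed attempts; the prompt wording and the name `next_prompt` are illustrative only.

```python
# Prompts ordered from briefest to most detailed (hypothetical wording).
PROMPTS = [
    "Which account?",
    "Sorry, which account would you like? For example, say 'checking'.",
    "You can say 'checking' or 'savings'. "
    "You can also say 'help', 'repeat', or 'go back'.",
]

def next_prompt(failed_attempts):
    """Return a more detailed prompt each time the caller is not understood."""
    level = max(0, min(failed_attempts, len(PROMPTS) - 1))
    return PROMPTS[level]

for attempts in range(4):
    print(attempts, next_prompt(attempts))
```

The first prompt stays short for callers who know what they want, and the navigation commands are mentioned only once the caller has shown signs of difficulty.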

Information Playback

An application contains much more than menus or items for users to select. Users of most applications want a fast route to specific information that is important to them. The developer must therefore take care to ensure that this information is easily accessed, heard and understood.

A Web page can effectively display large amounts of information, allowing users to choose what is important and how long to spend on the page. But in the speech environment, you should not try to present too much information (such as the balances of fifteen mutual funds) too rapidly. If you design your application without taking into consideration that the user is hearing and possibly writing down the information, you may cause user frustration or even anger.

This information-overload concept applies not only to the speed of information delivery, but also to the design of content and its presentation. For example, if you crowd together information including a mutual fund name, a balance and an "as of" date for the sake of speed, the user may be unable to hear and understand the information. It will also be very difficult for the user to write it down.
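To make the pacing idea concrete, the sketch below turns a list of fund records into a playback plan that separates the name, balance and "as of" date with short pauses and stops after every few funds to let the caller catch up. The pause lengths, the group size of three, the step format and the sample data are all assumptions for illustration, and rendering the plan as audio is left to whatever speech platform is in use.

```python
def playback_plan(funds, pause_within=0.4, pause_between=1.0, group_size=3):
    """Build a list of ("say" | "pause" | "ask", value) steps for playback.

    funds is a list of (name, balance, as_of) tuples of already-formatted text.
    """
    steps = []
    for i, (name, balance, as_of) in enumerate(funds, start=1):
        steps.append(("say", name))
        steps.append(("pause", pause_within))
        steps.append(("say", f"Balance: {balance}"))
        steps.append(("pause", pause_within))
        steps.append(("say", f"As of {as_of}"))
        if i < len(funds):
            if i % group_size == 0:
                # Hand control back to the caller instead of reading on.
                steps.append(("ask", "Say 'continue' to hear more, or 'repeat'."))
            else:
                steps.append(("pause", pause_between))
    return steps

# Hypothetical sample data.
funds = [
    ("Fund A", "1,250 dollars", "May 1"),
    ("Fund B", "310 dollars", "May 1"),
    ("Fund C", "4,800 dollars", "May 1"),
    ("Fund D", "95 dollars", "May 1"),
]
steps = playback_plan(funds)
for kind, value in steps:
    print(kind, value)
```

Keeping each item as its own phrase gives a caller who is writing things down a natural point to catch up, which a single run-on sentence does not.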

Final Words

By carefully listening to your client and their users, you will have the best chance of designing a highly usable and intuitive speech-enabled application. You will succeed in promoting user satisfaction by skillfully combining many factors, including the design, prompts, personality and functionality of an application. Once your application has been designed, you can improve it further by conducting usability tests with actual users. Usability testing often validates some of your development choices while providing insight into desirable modifications to the design of your application. If you use these techniques, your end result will be an easy-to-use speech-enabled application that gives users a pleasant experience and enhances their relationship with the business providing the service.