Speech Application SDK Tutorial

These tutorial topics demonstrate how to build a simple voice-only application using the tools in the Microsoft Speech Application SDK Version 1.1 (SASDK) in Microsoft Visual Studio .NET 2003. You will build a Start page of an imaginary pizza ordering service, for use by a telephony Speech Application Language Tags (SALT) client.

Goals of the Tutorial

The tutorial demonstrates the basic structure and data organization of a simple, voice-only speech application for a telephony environment. It shows the use of the SASDK tools in Visual Studio to create grammars, to record prompts, to create a dialogue with callers, and to test the application under development.

The tutorial does not cover all the requirements for a production speech application. For instance, it does not cover all the error-checking and confirmation strategies that production applications require, such as dealing with mumbled or unrecognized speech, setting recognition or confirmation thresholds, providing the alternative of touch-tone input, or providing error-recovery paths. It does not deal with the many facets of the audio characteristics of prompts, such as increasing or decreasing the pauses or silence at the beginning and ending of prompts, or smoothing concatenations of prompts to get a more pleasing response. It does not provide a way to display, print or store the received and confirmed results, nor does it provide a method to identify a particular session or caller.

The tutorial does not cover multilanguage applications, desktop applications, or multimodal applications with visual components. For multimodal applications, see the sample Using Tap-and-talk.

Structure of the Tutorial Application

The tutorial application is designed for a pizza take-out service with a menu limited to a few sizes. The tutorial uses a simple system-initiative dialogue structure to guide the caller through the order. The caller must provide a telephone number to use as identification when picking up the pizza. The application gathers the information that the caller supplies, repeats it back to the caller, and requests confirmation. A pizza delivery service would be more interesting, but recognizing addresses requires a relatively complicated grammar and more complicated confirmations, which are outside the scope of this simple tools tutorial.

Design of the Tutorial Application

The tutorial has the following general dialogue structure.

[Figure: flow chart of the tutorial's dialogue structure]

The following is a sample dialogue in the tutorial application.

SYSTEM: Welcome to Tony's Pizza. Order a pizza now and we'll have it waiting when you arrive.
Say Cancel at any time to cancel this order. [pause]
SYSTEM: We have small, medium and large sizes.
What size would you like?
CALLER: I'd like a large pizza, please.
SYSTEM: Please say your telephone number, area code first.
CALLER: It's four two five five five five oh nine zero zero.
SYSTEM: You ordered a large pizza and your phone number is
four two five five five five oh nine zero zero. Is that correct?
CALLER: Yes.
SYSTEM: Thanks for your order.
Give us your phone number when you arrive and we'll give you your pizza.
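
The sample dialogue can be approximated outside the SASDK as a small system-initiative loop: the system asks one question at a time, re-asks until it understands the answer, and confirms at the end. The following Python sketch is illustrative only and is not SASDK code. All names in it (run_dialogue, ask, parse_size, parse_phone, parse_yes_no) are hypothetical, the keyword matching stands in for real speech grammars, and restarting the questions when the caller answers "no" is an assumption rather than something the tutorial specifies.

```python
# Hypothetical sketch of the tutorial's system-initiative dialogue; not SASDK code.
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
DIGIT_WORDS = {name: str(i) for i, name in enumerate(DIGIT_NAMES)}
DIGIT_WORDS["oh"] = "0"  # callers often say "oh" for zero
SIZES = ("small", "medium", "large")


class OrderCancelled(Exception):
    """Raised when the caller says Cancel."""


def words_of(utterance):
    """Lowercase the utterance and split it into words, dropping , and . marks."""
    return utterance.lower().replace(",", " ").replace(".", " ").split()


def parse_size(utterance):
    """Return the first pizza size mentioned, or None if there is none."""
    for word in words_of(utterance):
        if word in SIZES:
            return word
    return None


def parse_phone(utterance):
    """Collect spoken digits; return a 10-digit string, or None."""
    digits = "".join(DIGIT_WORDS[w] for w in words_of(utterance) if w in DIGIT_WORDS)
    return digits if len(digits) == 10 else None


def parse_yes_no(utterance):
    """Return True for yes, False for no, None if neither was heard."""
    words = words_of(utterance)
    if "yes" in words:
        return True
    if "no" in words:
        return False
    return None


def ask(prompt, parse, listen, speak):
    """Repeat one question until the answer can be interpreted."""
    while True:
        speak(prompt)
        heard = listen()
        if "cancel" in heard.lower():
            raise OrderCancelled()
        value = parse(heard)
        if value is not None:
            return value


def run_dialogue(listen, speak):
    """Run the whole order; return the confirmed order, or None if cancelled."""
    speak("Welcome to Tony's Pizza. Order a pizza now and we'll have it "
          "waiting when you arrive. Say Cancel at any time to cancel this order.")
    try:
        while True:
            size = ask("We have small, medium and large sizes. "
                       "What size would you like?", parse_size, listen, speak)
            phone = ask("Please say your telephone number, area code first.",
                        parse_phone, listen, speak)
            readback = " ".join(DIGIT_NAMES[int(d)] for d in phone)
            confirmed = ask(f"You ordered a {size} pizza and your phone number "
                            f"is {readback}. Is that correct?",
                            parse_yes_no, listen, speak)
            if confirmed:
                speak("Thanks for your order. Give us your phone number "
                      "when you arrive and we'll give you your pizza.")
                return {"size": size, "phone": phone}
            # Assumption: on "no", start the questions over.
    except OrderCancelled:
        return None
```

To try the sketch from a console, call run_dialogue(input, print) and type the caller's responses. In the tutorial itself, the equivalent behavior comes from speech controls, grammars, and prompts created with the SASDK tools rather than from hand-written loops.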

Additional Sources of Information

The SASDK contains a number of sample and reference applications that cover some of the tutorial topics in more depth, and cover other topics not included in the tutorial. Some of the samples are relatively simple. Two reference applications, the Contacts Reference Application and Banking Alerts, are more extensive, and contain features required for production applications. The SASDK also has a grammar library containing many rulesets for common scenarios. The tutorial has links where appropriate to the samples or the grammars.

Before Beginning the Tutorial

Before beginning the tutorial, there are some important steps to take:

  • Configure the microphone used to respond to prompts in the tutorial.
  • If running under Windows Server 2003, add base URLs for the tutorial to the Trusted Sites for Microsoft Internet Explorer.

To configure the microphone

Some of the prompts in the tutorial application allow speech input before the prompt finishes; that is, the control begins listening for input before the prompt output is complete. This is called bargein: the caller can barge in on the prompt. If the microphone is not configured properly, even breathing or ambient noise can register as a response to the control; that is, the control behaves as if the caller has barged in on the prompt. From the point of view of the application, this usually results in an unrecognized phrase. Because the tutorial does not have all the recovery mechanisms that a production application would have, it cannot cope well with this behavior.

The configuration procedure differs between operating systems. The following are general steps for microphone configuration.

  1. From Control Panel, open Speech.
  2. Under Speech Recognition, select the option to configure the microphone, and follow the subsequent instructions.

Tip  If you configure your microphone with a speaking voice that is too quiet (low in volume), your microphone easily picks up ambient noises and perhaps even the output from the headphones. In the tutorial, if it seems like you don't get a chance to respond to the prompts, you can reconfigure your microphone with a louder speaking voice.
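
The effect described in the Tip can be illustrated with a toy model. The Python sketch below assumes that microphone configuration chooses a gain that brings the calibration voice up to a fixed target level, and that any amplified input above a fixed threshold counts as bargein. Both assumptions, and every name and number in the code, are hypothetical simplifications for illustration; they do not describe how the speech engine actually detects speech.

```python
# Toy model only: all constants and functions here are hypothetical.
TARGET_LEVEL = 0.5        # level the calibration step tries to reach (assumed)
BARGE_IN_THRESHOLD = 0.2  # amplified level treated as speech (assumed)


def calibrated_gain(calibration_voice_level):
    """Gain that scales the calibration voice up to TARGET_LEVEL."""
    return TARGET_LEVEL / calibration_voice_level


def is_barge_in(input_level, gain):
    """True if the amplified input is loud enough to interrupt the prompt."""
    return input_level * gain >= BARGE_IN_THRESHOLD


AMBIENT_NOISE = 0.02  # quiet room noise, on the same arbitrary scale

normal_gain = calibrated_gain(0.5)   # configured with a normal speaking voice
quiet_gain = calibrated_gain(0.04)   # configured with a very quiet voice

print(is_barge_in(AMBIENT_NOISE, normal_gain))  # noise stays below the threshold
print(is_barge_in(AMBIENT_NOISE, quiet_gain))   # noise now crosses the threshold
```

In this model, a quiet calibration voice yields a high gain, so the same room noise is amplified past the threshold and interrupts the prompt. Reconfiguring with a louder voice lowers the gain again.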

To add URLs to Trusted Sites for Windows Server 2003

Note  When Internet Explorer Enhanced Security Configuration is enabled on your server, the security settings for all Internet sites are set to High. If you trust a Web page and need it to be functional, you can add that page to the Trusted Sites zone in Internet Explorer.

  1. Start Microsoft Internet Explorer.

  2. From the Tools menu, select Internet Options.

  3. In the Internet Options dialog box, select the Security tab.

  4. Select Trusted Sites.

  5. Click Sites....

  6. In the Trusted Sites dialog box, enter the following URLs in the Add this Web site to the zone edit box, one after the other, and click Add after each URL:

    *://localhost
    *://YourMachineName*

    The asterisk at the start of the URL permits any protocol (such as http, https or ftp), and the asterisk at the end of the computer name permits any domain specifications for the computer.

  7. Click OK in each dialog box to accept the changes.
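
The two wildcard entries can be pictured as patterns matched against the protocol and host name of each URL. The Python sketch below uses shell-style matching from the standard fnmatch module as a stand-in; the exact matching rules that Internet Explorer applies may differ. The function name is_trusted and the machine name pizza01 are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# "pizza01" is a hypothetical stand-in for YourMachineName.
TRUSTED_PATTERNS = ["*://localhost", "*://pizza01*"]


def is_trusted(url, patterns=TRUSTED_PATTERNS):
    """Check the protocol and host of a URL against the wildcard patterns.

    Only scheme://host is compared, so the path of the page does not matter.
    """
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.hostname or ''}"  # both are lowercased
    return any(fnmatchcase(origin, pattern.lower()) for pattern in patterns)


print(is_trusted("http://localhost/Tutorial/Default.aspx"))
print(is_trusted("https://pizza01.corp.example.com/Tutorial/"))
print(is_trusted("http://example.com/"))
```

Under these assumptions, the first two URLs match a pattern and the third does not: the leading asterisk accepts either protocol, and the trailing asterisk accepts the domain suffix after the machine name.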

Beginning the Tutorial

The procedures for creating the pizza ordering service application build on each other. The sequence in which you perform the procedures is therefore important. The topics are listed here in the suggested order of use.

  • Creating a Speech Project
    Specify the project characteristics with the Speech Web Application Project Wizard.
  • Creating Grammars
    Create a speech grammar for the application with Speech Grammar Editor.
  • Creating the Dialogue Framework
    Insert controls to speak to the caller and grammars to understand the caller's speech with Speech Control Editor.
  • Creating Prompts
    Create a prompt database for the application with Speech Prompt Editor.
  • Adding Semantic Information
    Add semantic information to the application, specifying the semantic information each rule returns and binding that to semantic items.
  • Confirming Responses
    Use a QA control with a prompt select function to repeat the information gathered to the caller and confirm that information, and set up a Command control.
  • Debugging the Tutorial Application
    Ensure full recorded prompt coverage with the Validate Solution tool in Speech Prompt Editor, simulate a connection to a telephony system with Telephony Application Simulator, and monitor and manipulate speech data, events and errors with Speech Debugging Console.