Telephony 101: What Every Web Developer Should Know about the Voice Network

 

Intel Corporation

July 2003

Applies to:
    Microsoft® Speech Technologies

Summary: The voice industry is quickly moving toward Web-based solutions that are easier to develop, deploy and support. Learn about telephony concepts that will provide a general understanding of how voice applications are built. (4 printed pages)

Contents

Introduction
Speech-Enabled Media Server
Network Transport: Circuit and Packet Networks
Network Topology: Connections Are Everything
Signaling and Call Control
Linking the PBX and Data Networks: CTI Links and CSTA
Design Considerations
Conclusion and Next Steps: Where to Go from Here

Introduction

Speech markup languages such as Speech Application Language Tags (SALT) provide a convenient, industry-accepted way to build voice user interfaces to Web applications. The proliferation of these languages is accelerating the voice industry's evolution towards a more open, Web-centric model in which solutions are easier to develop, deploy and support.

For those with a background in Web application development, building voice applications requires a general understanding of telephony concepts. This article introduces these concepts. The reader is also encouraged to view the April 1, 2003 MSDN® Webcast entitled Telephony for Web Developers (Telephony 101), where this material is covered in greater detail.

Speech-Enabled Media Server

In a SALT-based speech solution, the application logic resides on the Web server. The Web server issues SALT commands that are interpreted by a media server. In this application scenario, the media server provides the following functionality:

  • Interpretation of the SALT commands
  • Speech processing (speech recognition, speech synthesis)
  • A physical connection to the voice network
  • Call control (establishing, managing, and terminating the voice connection path)

Because it is the interface between the telephone network and the Web infrastructure, the media server is sometimes referred to as a telephony server or a voice gateway.

A look inside a SALT-based media server reveals the following hardware and software components:

  • Telephony board(s) providing a physical connection to the voice network and supporting signaling protocols used to communicate with the network.
  • Media processing resources that run recognition and synthesis functions. (Note that in higher density systems, the speech recognition, speech synthesis, and telephony server may be deployed on separate physical platforms.)
  • A middleware platform that provides SALT language interpretation and maintains state information of calls, telephony ports, and speech resources.
  • An industry-standard computing platform that houses the above components.
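
The state tracking performed by the middleware platform can be sketched roughly as follows. This is a minimal illustration, not the design of any actual product: the names MediaServerState, Port, answer, and hang_up are hypothetical, and the sketch assumes that each call occupies one telephony port and one speech recognizer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Port:
    """One telephony port on the media server (hypothetical model)."""
    port_id: int
    call_id: Optional[str] = None  # None while the port is idle


@dataclass
class MediaServerState:
    """Tracks calls, telephony ports, and speech resources (assumed layout)."""
    ports: Dict[int, Port] = field(default_factory=dict)
    free_recognizers: int = 0

    def answer(self, call_id: str) -> Optional[int]:
        """Bind a new call to an idle port and reserve one recognizer.

        Returns the port id, or None if no port or recognizer is free.
        """
        if self.free_recognizers == 0:
            return None
        for port in self.ports.values():
            if port.call_id is None:
                port.call_id = call_id
                self.free_recognizers -= 1
                return port.port_id
        return None

    def hang_up(self, call_id: str) -> None:
        """Release the port and the recognizer held by a call."""
        for port in self.ports.values():
            if port.call_id == call_id:
                port.call_id = None
                self.free_recognizers += 1
                return


state = MediaServerState(ports={1: Port(1), 2: Port(2)}, free_recognizers=1)
first = state.answer("call-a")   # takes port 1 and the only recognizer
second = state.answer("call-b")  # refused: no recognizer is free
```

The point of the sketch is that a free port alone is not enough to accept a call; speech resources are a separate pool that the middleware has to account for.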

This open, decentralized model is rapidly replacing the traditional interactive voice response system (IVR) architecture, which is typically closed, proprietary, and more vertically integrated.

Network Transport: Circuit and Packet Networks

Voice is transported over circuit-switched or packet-switched networks. Most of today's enterprise voice networks use circuit-switched transport. When deciding on a system architecture, developers must consider the network into which the media server will be deployed. Most legacy private branch exchanges (PBXs) only support circuit connectivity.

Packet networks hold tremendous promise for voice networks. The number of IP-based PBXs and IP contact centers is increasing. As this trend continues, voice-over-IP will become a more common transport mechanism for speech solutions.

Network Topology: Connections Are Everything

An interactive voice response (IVR) system deployed in the enterprise can connect directly to the phone network or sit behind an enterprise switch such as a PBX. In the latter case, the media server is connected to the line-side interface of the PBX. (The PBX's connections to the public network are known as "trunk" connections.) To the PBX, the media server looks like another station set and must communicate in the same way as other station sets.

The physical connection to the speech server can be analog or digital. Analog connections are deployed using interfaces similar to what you might have in your home. Digital, circuit-switched networks use a time-division multiplexing (TDM) scheme where a single voice channel occupies 64 kbps of bandwidth. This is known as a DS0. In North America, 24 voice channels are multiplexed into a single 1.544 Mbps bit stream known as a T-1 or DS1.
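
The T-1 figures above can be checked with a little arithmetic. The sketch below assumes standard 8 kHz, 8-bit voice sampling for the DS0 and adds the 8 kbps of T-1 framing overhead (one framing bit per frame, 8,000 frames per second), a detail the paragraph above does not mention. The constant and function names are illustrative.

```python
SAMPLES_PER_SECOND = 8000   # voice is sampled 8,000 times per second
BITS_PER_SAMPLE = 8         # each sample is encoded in 8 bits
T1_CHANNELS = 24            # voice channels multiplexed into one T-1
T1_FRAMING_BPS = 8000       # one framing bit per frame, 8,000 frames per second


def ds0_rate_bps() -> int:
    """Bit rate of a single voice channel (DS0)."""
    return SAMPLES_PER_SECOND * BITS_PER_SAMPLE


def t1_rate_bps() -> int:
    """Line rate of a T-1: 24 DS0 channels plus framing overhead."""
    return T1_CHANNELS * ds0_rate_bps() + T1_FRAMING_BPS


print(ds0_rate_bps())  # 64000
print(t1_rate_bps())   # 1544000
```

The 24 voice channels account for 1.536 Mbps; the remaining 8 kbps is framing, which is why the T-1 rate is 1.544 Mbps rather than an exact multiple of 64 kbps.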

In both cases, the network connection is provided by telephony boards installed in industry-standard computing platforms. This "open" computer telephony model, introduced almost 20 years ago by companies such as Dialogic (now part of Intel), marked the first step in the transition away from proprietary telecommunications hardware. Today's boards come in a variety of densities, ranging from 4 channels up to 96 channels (4 x T-1) in a single PCI slot. It is now possible to achieve densities exceeding 196 voice channels in a single 1U server!

Signaling and Call Control

Call control describes the collection of functions responsible for establishing, maintaining and terminating calls. Common examples include dialing and transfer. To support call control functionality, media servers must have an accurate and reliable way to communicate with the network. This is done using signaling protocols. At your home, signaling is accomplished "in-band" using the tones generated by the handset. This is known as channel-associated signaling (CAS) because the signaling is transmitted in the same channel as the voice. Tones are also used for in-band signaling on digital connections.
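
As an illustration of in-band tone signaling, the sketch below assumes the handset tones in question are standard touch-tone (DTMF) digits, in which each key is sent as the sum of one low-frequency and one high-frequency sine wave. The frequency grid is the standard DTMF assignment; the function names, the 50 ms duration, and the 8 kHz sample rate are choices made for this example.

```python
import math

# Standard DTMF grid: each key combines one row tone and one column tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple:
    """Return the (low, high) tone pair in Hz for a single keypad character."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            col_index = row.find(key)
            if col_index != -1:
                return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_ms: int = 50, sample_rate: int = 8000) -> list:
    """Synthesize the two summed sine waves as floats between -1.0 and 1.0."""
    low, high = dtmf_frequencies(key)
    count = sample_rate * duration_ms // 1000
    return [
        0.5 * (math.sin(2 * math.pi * low * n / sample_rate)
               + math.sin(2 * math.pi * high * n / sample_rate))
        for n in range(count)
    ]


samples = dtmf_samples("5")
```

Because these samples travel in the same channel as the caller's voice, the receiving side has to detect the tone pair in the audio stream itself, which is one reason in-band signaling is harder to make reliable.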

Alternatively, call control can be implemented using digital, message-based protocols. These are known as "out-of-band" signaling protocols because the signaling is communicated on a separate channel from the voice. Out-of-band signaling is more reliable and scalable than in-band signaling. For example, in ISDN a single 64 kbps signaling channel can support 23 voice channels. Examples of out-of-band signaling protocols include SS7-ISUP, Q.931, SIP and H.323, as well as a variety of proprietary protocols created by PBX vendors.
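
The ISDN example can be pictured as one T-1 span carrying 23 voice channels and one shared signaling channel. The sketch below is a simplified model under stated assumptions, not a protocol implementation: the class names are hypothetical, placing the signaling channel in slot 24 is an assumption of the example, and the message kinds are only loosely modeled on Q.931 message names with no real encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalingMessage:
    """A call control message carried on the signaling channel (simplified)."""
    kind: str             # for example "SETUP" or "DISCONNECT"
    voice_channel: int    # the voice channel this message refers to
    called_number: str = ""


class PrimaryRateSpan:
    """One span: 23 voice channels share a single signaling channel."""
    VOICE_CHANNELS = range(1, 24)   # channels 1 through 23
    SIGNALING_CHANNEL = 24          # assumed position of the signaling channel

    def __init__(self) -> None:
        self.in_use = set()
        self.signaling_log = []     # everything sent on the signaling channel

    def place_call(self, called_number: str) -> int:
        """Claim a free voice channel and announce the call out-of-band."""
        for channel in self.VOICE_CHANNELS:
            if channel not in self.in_use:
                self.in_use.add(channel)
                self.signaling_log.append(
                    SignalingMessage("SETUP", channel, called_number))
                return channel
        raise RuntimeError("all voice channels are busy")

    def release(self, channel: int) -> None:
        """Free a voice channel and signal the disconnect out-of-band."""
        self.in_use.discard(channel)
        self.signaling_log.append(SignalingMessage("DISCONNECT", channel))


span = PrimaryRateSpan()
channels = [span.place_call("5551234") for _ in range(23)]
```

No call control information ever appears in the voice channels themselves; every message names the voice channel it applies to, which is how one signaling channel can serve all 23.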

A media server must communicate with the PBX to which it is connected. This requires compatible protocols on the media server and the PBX. (A PBX often supports multiple protocols simultaneously.) How can widespread interoperability be achieved when such a wide variety of standard and proprietary protocols exist? This has been an industry challenge for some time. Today you can purchase telephony boards that support a variety of standard and proprietary communications protocols.

Linking the PBX and Data Networks: CTI Links and CSTA

Computer telephony integration (CTI) links allow PC-based telephony applications to integrate with proprietary PBXs. A CTI link is a data link that connects a CT server (typically an industry-standard server running CT server software) to a PBX and to the media server connected to it. Information about a caller, such as an account number and PIN, can be collected by a media server and sent through the PBX's CTI link to the CT server. The caller's account information is then retrieved from the data network and displayed on an agent's screen (known as a screen pop). At the same time, the application can instruct the PBX to transfer the call from the media server to a customer service agent.
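
The screen pop flow described above might be sketched as follows. Everything here is hypothetical: CTServer, on_caller_data, and the in-memory accounts dictionary stand in for CT server software, the CTI link, and the data network, and the sketch records the screen pop and the transfer request rather than driving a real PBX.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CTServer:
    """Hypothetical CT server that receives caller data over the CTI link."""
    accounts: Dict[str, dict]                        # stands in for the data network
    screen_pops: List[Tuple[str, dict]] = field(default_factory=list)
    transfers: List[Tuple[str, str]] = field(default_factory=list)

    def on_caller_data(self, call_id: str, account_number: str, agent: str) -> None:
        """Handle data the media server collected from the caller."""
        # Retrieve the caller's record from the data network.
        record = self.accounts.get(account_number, {})
        # Screen pop: show the record on the agent's desktop.
        self.screen_pops.append((agent, record))
        # Ask the PBX to move the call from the media server to the agent.
        self.transfers.append((call_id, agent))


ct_server = CTServer(accounts={"1001": {"name": "A. Caller", "balance": 250}})
ct_server.on_caller_data(call_id="call-17", account_number="1001", agent="agent-4")
```

The voice path and the data path stay separate throughout: the call itself moves through the PBX, while the account record travels over the data network and arrives at the agent's screen alongside it.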

Most PBXs today offer a CTI link. Some vendors implement proprietary protocols. There is also a standard CTI message set, known as CSTA, which was created by the European Computer Manufacturers Association (ECMA). Companies like Intel offer CSTA-based server software that is used to implement the CTI link.

First-Party and Third-Party Call Control

First-party and third-party call control are commonly used to describe the relationship between the application and the call. In first-party call control, the application is also a talking party on the call. This implies a direct connection between the caller and the application. In third-party call control, the application is not necessarily a talking party. Using third-party call control, an application can simultaneously monitor several calls.

Many of the more sophisticated customer service applications require third-party call control. For example, a call may be received by the media server and transferred to an agent, after which the caller needs further help. The application must maintain state information on the call and remember where the caller is in the interaction (account information, password, and so on).
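
A third-party application of this kind can be pictured as an observer that receives events for many calls without being a talking party on any of them. The sketch below is a simplified illustration; the CallMonitor class and the event names "delivered", "transferred", and "cleared" are hypothetical and do not come from any particular CTI message set.

```python
from collections import defaultdict


class CallMonitor:
    """Hypothetical third-party observer that tracks state for several calls."""

    def __init__(self) -> None:
        self.history = defaultdict(list)   # call id -> events seen so far
        self.context = defaultdict(dict)   # call id -> data collected so far

    def on_event(self, call_id: str, event: str, **collected) -> None:
        """Record an event reported by the switch, plus any caller data."""
        self.history[call_id].append(event)
        self.context[call_id].update(collected)

    def active_calls(self) -> list:
        """Calls whose most recent event is not a clear."""
        return [call_id for call_id, events in self.history.items()
                if events[-1] != "cleared"]


monitor = CallMonitor()
monitor.on_event("call-1", "delivered", account="1001")
monitor.on_event("call-2", "delivered")
monitor.on_event("call-1", "transferred", pin_verified=True)
monitor.on_event("call-2", "cleared")
```

After the transfer, the monitor still holds the account number and verification status for call-1, so the caller does not have to be asked for them again.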

Design Considerations

Enterprises considering speech deployment will already have an established voice communications infrastructure. One of the first considerations when designing a platform is how it will work with this infrastructure. Base-level requirements, such as the number of voice ports and the amount of speech resources, must be implemented on a platform that has the proper interfaces to connect to and communicate with PBXs, automatic call distributors (ACDs), voice mail systems, fax servers, email servers and other network elements.
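
A first-pass sizing of the telephony interfaces can be sketched as below. The function name and the defaults are assumptions for illustration: 24 voice channels per T-1 span (23 if one channel carries ISDN signaling) and four spans per board, matching the 96-channel boards mentioned earlier. A real design would also need to size speech resources and allow for peak call volume.

```python
def size_telephony_hardware(voice_ports: int,
                            channels_per_span: int = 24,
                            spans_per_board: int = 4) -> dict:
    """Estimate the T-1 spans and telephony boards needed for a port count."""
    if voice_ports < 0 or channels_per_span <= 0 or spans_per_board <= 0:
        raise ValueError("counts must be non-negative and capacities positive")
    # Ceiling division using integers only.
    spans = -(-voice_ports // channels_per_span)
    boards = -(-spans // spans_per_board)
    return {"spans": spans, "boards": boards}


with_in_band = size_telephony_hardware(96)
with_isdn = size_telephony_hardware(96, channels_per_span=23)
```

With in-band signaling, 96 ports fit on four spans and one board. With ISDN, the same 96 ports need a fifth span and therefore a second board, because each span gives up one channel to signaling.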

Conclusion and Next Steps: Where to Go from Here

This article has provided a cursory look at the telephony concepts with which developers need to be familiar. A more complete overview is provided in the April 1, 2003 Webcast.

Intel Communications Systems Products

Intel®, the world's largest chipmaker, is also a leading manufacturer of computer, networking and communications products. Intel communications systems products offer developers, service providers, resellers, and communications system owners what they need to succeed in the new world of converged voice and data communications. This includes a broad family of building blocks, a global network of solutions providers, and comprehensive support and consulting services. Ranging from boards to server software, Intel building blocks meet the converged communications needs of environments as diverse as enterprise organizations and service providers. These building blocks include voice, fax, conferencing, and speech technologies; telephone and IP network interfaces; PBX integration products; carrier-class, board systems-level products; and more. Intel communications building blocks enable new, converged Web services including Internet voice browsing.

Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

Copyright © 2003 Intel Corporation.