Joseph Hofstader
November 2007
Summary: Being
an architect on a CaaS solution over the last few years has brought some
interesting perspective into the unique challenges of the development and
deployment of these applications. As I mention throughout this article, CaaS
applications require additional considerations past hosted multi-tenant
solution development, whose design challenges tend to reside in the
orchestration layer and the security services. The SIP services layer requires
knowledge unique to communications solutions, and the lack of an understanding
of the patterns required to implement these solutions can hinder the effort.
Contents
Converged Communications
Session Initiation Protocol
IP Multimedia Subsystem
CaaS Reference Architecture
Thoughts on CaaS
About the Author
Reference
Internet-based communications are not a new
phenomenon. For many years, we have sent video, voice, and data across the
public Internet to correspond with others free of charge or at a low cost. Paradoxically,
mainstream adoption of communications services over the public Internet has
been relatively slow. With the cost of service dropping and lingering concerns
about the quality and reliability of these largely unregulated communications
systems, a vast majority of companies and individuals communicate over the same
circuit-switched network that has existed since the invention of the telephone.
It is important to address a common
misperception about IP communications. This misperception is that IP
communications always take place over the public Internet. For many years, IP
has been used for voice communications on dedicated circuits or local area
networks. One example is the routing of long distance phone calls over
dedicated circuits in a telecommunications carrier network. Another example of
IP communications that does not use the public Internet is the IP-based telephone
system used for enterprise telephony. In other instances, such as some software-based
“phone” systems running on a personal computer (PC), IP communications does
take place over the public Internet.
In the case of enterprise communications,
IP communications is largely provided through customer-premise Private Branch Exchange
(PBX) systems implementing a signaling protocol that runs over IP: H.323 or
Session Initiation Protocol (SIP). These IP-based PBX systems offer benefits to
companies, such as the ability to run a corporate telephony system on the same
network used for data. The convergence of voice and data over the corporate
network reduces the costs of installing and operating separate networks for
voice and data. Another benefit is the reduction of administrative costs
realized by allowing users of the telephone system to relocate their telephones
within the network without the aid of an administrator. Unfortunately, these
systems are expensive to purchase, configure, and manage; this makes them
costly for a small- to medium-sized business (SMB) to implement.
As a result, there is an underserved market
for providing cost-effective corporate telephony to SMBs. Over the last few
years, many incumbent enterprise communications providers have conducted
efforts to offer corporate communications in a hosted environment, similar to
the environment that corporate back-office applications, such as Microsoft
Exchange, are hosted in. Communications applications deployed in this
environment are increasingly referred to as Communications as a Service (CaaS).
CaaS builds on the basic foundation of Software as a Service (SaaS), with some
requirements unique to communications applications.
This article is intended for solution
architects interested in learning about CaaS. For the sake of keeping this
article focused on CaaS, there is an assumption that the reader will have, at a
minimum, a cursory understanding of SaaS and an awareness of the basic concepts
implicit in architecting SaaS solutions, such as using metadata services to
support a single-instance multi-tenant deployment and implementing a federated
security model. The article begins with a brief discussion about converged
communications. I will then discuss SIP, covering the concepts supported by the
protocol that is becoming standard in communications software. After the introduction
to SIP, I will discuss the IP Multimedia Subsystem (IMS), which adds additional
possibilities for communications applications by introducing a solution for
fixed-mobile convergence (FMC). I will then provide a conceptual architecture
of a CaaS solution. I will conclude with some thoughts based on my experience
as an architect on a CaaS solution.
Converged Communications
Over the last decade, converged communications
emerged as a popular phrase in the telecommunications industry. Although the
phrase has different definitions depending on the context in which it is used,
one theme runs consistently through all contexts: multiple services (video, voice,
and data) accessible over multiple devices (wire-line telephone, mobile or
smart phone, and PC, to mention a few).
As more services become available over
multiple devices, the line between software applications and communications
applications begins to blur. Many of the traits traditionally associated with
communications applications, such as event-driven connections established
between endpoints (point-to-point, conference, multicast) using communications
protocols, will also become requirements of business or entertainment software.
The aforementioned requirements create a
paradigm shift for architects of software solutions. Architects of CaaS
solutions will be required to understand the networks and protocols that will
be used to access the CaaS solution. As with any new technology, until frameworks
and tools are created to abstract the details of the network environment,
application developers will also need to understand the protocols that will be
used within their applications.
Session Initiation Protocol
SIP is the signaling protocol used by the
early entrants in the CaaS space. SIP is an application layer protocol that
allows a SIP-based application to be plugged into a SIP infrastructure without
concern for the underlying transport, such as wireless or wire-line. The SIP
infrastructure provides an environment for creating event-driven applications
that contact the end user on the device of their preference. In other words,
applications using SIP can communicate with end users wherever they are, on
whatever device they specify, using multiple forms of media (video, voice, and
data).
Context-based Software
Even though the initial implementations of
SIP have been created by companies focused on corporate communications, there
is a lot of buzz around the development of SIP applications that bear a closer
resemblance to Internet applications than to traditional telephony. The
Internet offers many examples, from applications that stream sports
highlights to a cell phone to SIP implementations in emergency services
scenarios. Herein lies the true potential of SIP: the ability to create
event-driven software solutions that communicate with end users wherever they
choose to be contacted.
There is an endless number of scenarios in
which people may want to be notified, or to notify others, after an event occurs. Imagine
receiving a notification, wherever you are, about possible traffic problems on
your route to work, along with alternate routes based on your current
location. Or imagine registering at a hotel and having
customized sports highlights appear on the television when you turn it
on. Context-based software implementing SIP can make these solutions a
reality.
SIP Messaging
As mentioned earlier, SIP is an application-level
protocol that creates a media session between two or more user agents (UAs), which are
SIP-enabled endpoints. SIP is a “rendezvous” protocol, which extends it past
signaling and allows a UA to communicate with other UAs in ways that traditional
telephony cannot. SIP allows a UA to request information about or provide
information to another UA. A UA may range from a SIP phone to a multi-media
device, like a laptop or a smart phone.
A SIP request is used to establish a dialog between two
UAs. A SIP response is a three-digit numerical code, where the first
digit indicates the class of the response (for example, a success response would
be a 2xx response code). A SIP transaction is a SIP request and the
final SIP response to that request. Figure 1 shows the establishment of a
Real-Time Transport Protocol (RTP) session within a SIP transaction.
Figure 1. SIP transaction
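The response classes described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any SIP library: the function names `response_class` and `is_final` are hypothetical, and the class names follow the six classes defined by the SIP specification (RFC 3261).

```python
# Response classes keyed by the first digit of the status code (RFC 3261).
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """Return the class name for a three-digit SIP status code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a valid SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1xx responses are provisional; every other class completes the transaction."""
    return response_class(status_code) != "Provisional"
```

For example, `response_class(180)` returns `"Provisional"` (the phone is ringing, so the transaction is still open), while `is_final(200)` returns `True`.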
A SIP message
is similar to an e-mail message, with a request Uniform Resource Identifier
(URI) as the destination and a body in Multipurpose Internet Mail Extensions
(MIME) format. The header of a SIP message contains information used by the SIP
application. The SIP message body supports multiple content types, from plain
text and HTML to presence and conferencing state information. To create a
dialog, SIP uses the Session Description Protocol (SDP), with one party making
an SDP offer and the other replying with an SDP answer.
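To make the e-mail analogy concrete, the following sketch assembles and parses a minimal INVITE whose body carries an SDP session description (the "offer" in SDP's offer/answer terminology). It is an illustration only: the helper names `build_invite` and `parse_message`, the addresses, and the tag, branch, and Call-ID values are all made up, and a real stack would generate and validate far more than this.

```python
# Hypothetical SDP body describing one audio stream (all values are examples).
SDP_OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 client.example.com",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"


def build_invite(caller: str, callee: str, sdp_body: str) -> str:
    """Assemble a minimal INVITE: start line, headers, blank line, MIME body."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",          # the request URI is the destination
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        "Call-ID: a84b4c76e66710@client.example.com",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Type: application/sdp",         # MIME type of the body
        f"Content-Length: {len(sdp_body.encode('utf-8'))}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + sdp_body


def parse_message(raw: str):
    """Split a SIP message into its start line, a header dict, and the body."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```

Parsing the output of `build_invite("alice@example.com", "bob@example.org", SDP_OFFER)` yields the start line, a dictionary of headers, and the SDP body, much as a mail client separates an envelope from its content.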
SIP Infrastructure
SIP contains an infrastructure far larger
than the UAs that are establishing the session. Devices in this infrastructure
may include SIP proxy servers, SIP redirect servers, and SIP registrar
servers. It is important to note that all of these devices are logical,
meaning that they do not need to be physically deployed as separate processing
elements.
The SIP proxy server receives a SIP request
and forwards the request. A common usage for the SIP proxy server would be for
Network Address Translation (NAT) or firewall access control. The SIP redirect server
receives a SIP request and performs a query, returning the results to the
requesting UA. The SIP registrar server allows a user to associate a UA with his
or her identity.
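The difference between the three logical servers can be sketched with a shared in-memory location table. Everything here is an assumption made for illustration: the `LocationService` class, the `redirect` and `proxy` functions, and the `send` callback are hypothetical names, and the status codes used (302 and 404) are the standard SIP "Moved Temporarily" and "Not Found" responses.

```python
class LocationService:
    """In-memory table of bindings: user identity -> contact addresses."""

    def __init__(self):
        self._bindings = {}

    def register(self, identity: str, contact: str) -> None:
        # Registrar role: associate a UA's contact address with a user identity.
        contacts = self._bindings.setdefault(identity, [])
        if contact not in contacts:
            contacts.append(contact)

    def lookup(self, identity: str):
        return list(self._bindings.get(identity, []))


def redirect(location: LocationService, request_uri: str):
    """Redirect role: return the query results to the requesting UA."""
    contacts = location.lookup(request_uri)
    if not contacts:
        return 404, []
    return 302, contacts


def proxy(location: LocationService, request_uri: str, send):
    """Proxy role: forward the request instead of answering the requester."""
    contacts = location.lookup(request_uri)
    if not contacts:
        return 404
    return send(contacts[0])
```

The design point is that the redirect server hands the decision back to the requesting UA, whereas the proxy stays in the signaling path and forwards the request itself.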
SIP URIs
In SIP, a UA has a URI associated with it
in order to be located during a SIP transaction. A SIP interaction may contain
more than one URI. The use of multiple URIs supports context-based solutions by
decoupling the user from the UA that SIP messages are routed to. SIP solutions
can use resource lists as a way of supporting group operations, such as
conferencing.
There are three types of SIP URIs: user, device,
and service. The user URI is the identity of the user in the communications
system. The user URI is the address of record (AOR) for the end user. The device
URI is a single SIP UA instance. A device can be temporarily associated
with a user. For incoming addressing, SIP binds a device URI to the AOR. The service
URI represents a many-to-many SIP interaction, such as a conferencing
session.
SIP Interactions
Figure 2 shows a SIP interaction that
establishes an RTP session between two UAs in separate domains.
Figure 2. Establishing an intra-domain RTP session
Figure 2 shows a UA establishing a dialog
by sending an INVITE request to another UA. The INVITE request contains the URI
of the user who is being contacted by the request. The proxy servers route the
request between domains and to the UAs. After the media session is established,
the proxy servers can be taken out of the interaction. It is the responsibility
of the architect of the SIP solution to decide which SIP devices,
other than the UAs, need to exist in the media session path.
Common SIP Interaction Patterns
Figure 2 shows a basic SIP interaction
involving two UAs and two proxy servers. This interaction pattern can be used
for UA-to-UA VoIP communications. To support advanced communications-based
services, SIP also supports a few other key interaction patterns: peer-to-peer,
multicast, and conferencing.
Peer-to-peer SIP interactions do not
involve any intermediaries; by using dynamic DNS, a UA can use a SIP URI
without registering with a SIP server. Multicasting involves a request being
sent to a URI with any registrar available to respond to the request. Conferencing
involves the establishment of a focus, to which each participant establishes a
dialog. The focus logically groups the set of dialogs to provide a conferencing
service.
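The conferencing pattern can be illustrated with a small sketch in which a focus keeps one dialog per participant and treats the set as a single conference. The `Focus` class, its method names, and the dialog identifiers are hypothetical; a real focus would also handle media mixing and conference state notifications.

```python
class Focus:
    """Conference focus: one dialog per participant, grouped under one URI."""

    def __init__(self, conference_uri: str):
        self.conference_uri = conference_uri
        self._dialogs = {}   # participant URI -> dialog identifier
        self._next_id = 1

    def join(self, participant_uri: str) -> str:
        # Each participant establishes its own dialog with the focus.
        if participant_uri not in self._dialogs:
            self._dialogs[participant_uri] = f"dialog-{self._next_id}"
            self._next_id += 1
        return self._dialogs[participant_uri]

    def leave(self, participant_uri: str) -> None:
        self._dialogs.pop(participant_uri, None)

    def roster(self):
        return sorted(self._dialogs)

    def fan_out(self, sender_uri: str, payload: str):
        """Relay a payload from one participant to every other dialog."""
        return {
            dialog: payload
            for uri, dialog in self._dialogs.items()
            if uri != sender_uri
        }
```

Because every participant talks only to the focus, adding or removing a participant never requires the other UAs to renegotiate with each other.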
SIP Events
SIP events allow a SIP element to subscribe
to events at another SIP element. Using this subscribe/notify pattern, a SIP UA
can be notified when a state change occurs in the remote application. The
concept of reactive (event-driven) applications is nothing new, but the ability
to route events to a UA based on presence information is a powerful
communications mechanism.
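The subscribe/notify pattern can be sketched as follows. The `Notifier` class and its callbacks are hypothetical stand-ins for SIP elements exchanging SUBSCRIBE and NOTIFY requests; the one behavior borrowed from the SIP events framework is that a new subscriber is immediately told the current state.

```python
class Notifier:
    """A SIP element whose state other elements can subscribe to."""

    def __init__(self, initial_state: str):
        self._state = initial_state
        self._subscribers = []

    def subscribe(self, on_notify) -> None:
        # Stand-in for SUBSCRIBE: remember the subscriber, then send the
        # current state right away, as the initial NOTIFY does.
        self._subscribers.append(on_notify)
        on_notify(self._state)

    def set_state(self, new_state: str) -> None:
        # Stand-in for NOTIFY: tell every subscriber about a state change.
        if new_state == self._state:
            return
        self._state = new_state
        for on_notify in self._subscribers:
            on_notify(new_state)
```

A subscriber that passes `received.append` as its callback would see the initial state followed by each subsequent change, and nothing when the state is set to its current value.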
Presence
Presence is a critical component of
context-based software. Presence is such a significant concept that it has been
referred to as the “dial tone of the 21st century.” Presence brings
context to communications. Instant messaging (IM) services use presence to
route messages to the correct computer. If you are logged into an IM account on
a computer and then log on to another computer and sign into your account, your
messages will be routed to the computer you are currently using based on
presence information. The main functionality that is implemented to support
presence is known as “rendezvous,” which allows somebody to contact someone
using an AOR without knowing how they are attached to the network.
There are a number of architectural
patterns for systems that use presence. The peer-to-peer presence architecture
pattern has each UA subscribe directly to the UA from which it wants to receive
events. The alternative to the peer-to-peer pattern is
implemented by deploying a presence server as a broker, which handles all
subscriptions and notifications from UAs.
A common pattern used to architect SIP-based
systems is the Session Initiation Protocol for Instant
Messaging and Presence Leveraging Extensions (SIMPLE) architecture pattern. This
architecture pattern is based on open standards, and it contains a number of
roles among different processing elements. As with the broker architecture,
SIMPLE contains a centralized presence server that controls all subscriptions. Another
component in the SIMPLE architecture is the presentity, or presence
entity. The presentity is a person that has presence that can be described, for
example “busy” in an instant messaging program. This architecture also
contains an entity known as a watcher, which is an endpoint that
subscribes to presence changes. Another role is the presence agent (PA),
which notifies the watcher of state changes of the presence of a resource. The
final role is the presence user agent (PUA), which is a program that
knows the current presence of the presentity.
The following diagram shows all of the software
elements in the SIMPLE architecture and describes how they interrelate.
Figure 3. SIMPLE architecture roles
To be able to publish presence information
across domain boundaries, an edge-proxy is added to the architecture. In this
scenario, the presence servers in a domain are the presence agent to the users
in the other domain and are notified of any changes in the state of the users.
Figure 4. Publishing presence information across domains
When a user has the ability to be
associated with multiple user agents, there is always a chance of conflict. When
publishing presence information, the normal rule is that the last UA to publish
presence determines the presence reported for the AOR.
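The "last publisher wins" rule can be sketched as a small presence store keyed by AOR. The `PresenceStore` class and the `Publication` record are hypothetical, and a simple counter stands in for the arrival order of publications; real presence servers may apply richer composition policies.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    ua: str        # device URI of the publishing user agent
    status: str    # for example "available" or "busy"
    sequence: int  # arrival order, used to pick the latest publication


class PresenceStore:
    """Keeps, per AOR, only the most recently published presence."""

    def __init__(self):
        self._latest = {}
        self._counter = 0

    def publish(self, aor: str, ua: str, status: str) -> None:
        self._counter += 1
        # A later publication simply replaces the earlier one for this AOR.
        self._latest[aor] = Publication(ua, status, self._counter)

    def current(self, aor: str):
        return self._latest.get(aor)
```

If a desk phone publishes "available" and the same user's laptop then publishes "busy", `current` reports the laptop's "busy" state for that AOR.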
IP Multimedia Subsystem
Along with understanding SIP, architecting
a CaaS solution requires an understanding of the network context to which the
solution will be deployed. In the section about SIP, two deployment scenarios
were presented: an intra-domain scenario and a cross-domain scenario. However,
that section did not address how to extend CaaS capabilities to the mobile
network and public switched telephone network (PSTN). IMS is a network
component that may have an impact on how communications solutions are
architected. IMS is the element of Third Generation (3G) networks that
facilitates the convergence of the Internet and cellular networks.
Considering wireless networks already offer
forms of Internet communications, like text messaging and web browsing, it is
reasonable to question the necessity of IMS. IMS adds value to the mobile
network in two key areas:
- Quality of Service (QoS) can be
guaranteed in a 3G packet-switched network, improving the quality of the
media session.
- Services can be integrated into a network
environment, facilitated by IMS-defined standard interfaces used by
service developers.
These standard interfaces create a “plug
and play” environment for communications services, such as a voicemail service.
IMS is based on a very detailed
specification, on which volumes have been written. The intent of this article is
to describe only the high-level concepts of IMS that are required to
understand the potential impact on the CaaS architecture.
IMS Architecture
IMS contains the following functions within
its infrastructure:
- Home Subscriber Server(s) (HSS) database
- Call Session Control Function (CSCF) SIP
server(s)
- Application Server(s) (AS)
- Media Resource Functions (MRF)
- Breakout Gateway Control Functions (BGCF)
- PSTN Gateway(s)
Each of these functions supports standard
interfaces and can be deployed as separate nodes or physically co-located on
the same node.
The HSS is the repository for user related
information. User information stored in the HSS includes security information
(authentication and authorization); user profile information, including
services the user is subscribed to; and user location information.
The CSCF processes SIP signals. There are
three types of CSCFs:
- Proxy (P-CSCF). The P-CSCF is the first point of contact between the user endpoint
and the IMS network. The basic functions of the P-CSCF focus on
security and message compression/decompression.
- Interrogating Call Session Control
Function (I-CSCF). The purpose of the I-CSCF is to
retrieve user information and route the SIP request to the appropriate
destination.
- Serving Call Session Control Function
(S-CSCF). The S-CSCF is a SIP server that
performs session control. The S-CSCF maintains a binding between the user
URI and device URI.
The AS is a SIP entity that hosts and
executes services. Much of what the AS does is mediate SIP messages and
interact with the IMS infrastructure. The AS also acts as a host for
applications that interface using SIP. This application hosting functionality
is the layer that proxies the SIP call to the CaaS application.
IMS Message Flow
Figure 5 shows the message flow through the
infrastructure. In this example, there is a home network and a visited network
to illustrate the main concepts in the IMS infrastructure. This is similar to a
cellular network in which the visited network would be a network roamed into and
the home network would be the cellular service provider. First, the user equipment (UE) sends a SIP
message to the P-CSCF in the visited network. The P-CSCF then passes the SIP
request to the I-CSCF, which retrieves user information from the HSS and passes
the request to the appropriate destination, in this case the S-CSCF. The S-CSCF
inspects the SIP message and interfaces with the HSS to retrieve user
information. The S-CSCF determines if the SIP message should be passed to the
AS or to a SIP proxy.
Figure 5 shows the S-CSCF passing the
message to the AS. There are a number of roles the AS can fulfill: a SIP proxy,
a SIP UA, or a concatenation of two SIP UAs, known as a back-to-back user agent
(B2BUA). The AS then sends a SIP message back to the S-CSCF for routing.
Figure 5. IMS message flow
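The message flow described above can be traced with a short sketch. The HSS is modeled as a plain dictionary, and the record layout, host name, and the `trace_request` function are assumptions made for illustration; the real interfaces between these functions are defined by the IMS specifications.

```python
# Hypothetical HSS content: which S-CSCF serves a user and which services
# the user is subscribed to.
HSS_RECORDS = {
    "sip:alice@home.example": {
        "s_cscf": "scscf1.home.example",
        "services": {"voicemail"},
    },
}


def trace_request(user_uri: str, requested_service: str, hss: dict):
    """Return the ordered list of IMS functions a SIP request passes through."""
    # The UE contacts the P-CSCF, which passes the request to the I-CSCF.
    hops = ["UE", "P-CSCF", "I-CSCF"]

    # The I-CSCF retrieves user information from the HSS to find the destination.
    profile = hss.get(user_uri)
    if profile is None:
        raise LookupError(f"no HSS record for {user_uri}")
    s_cscf = f"S-CSCF ({profile['s_cscf']})"
    hops.append(s_cscf)

    # The S-CSCF uses the user's profile to choose between the AS and a SIP proxy.
    if requested_service in profile["services"]:
        hops.append("AS")
        hops.append(s_cscf)  # the AS sends the message back to the S-CSCF for routing
    else:
        hops.append("SIP proxy")
    return hops
```

Tracing a request for a subscribed service ends with the hop back from the AS to the S-CSCF, while a request for any other service is handed to a SIP proxy.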
CaaS Reference Architecture
Architecting a CaaS solution is a complex task: it combines the inherent
complexities of architecting a multi-tenant solution with the additional
requirements of interacting with a communications network across domain
boundaries. Because there are many good
references on architecting multi-tenant solutions, this section will focus on
the requirements unique to CaaS solutions.
Requirements for Communications Solutions
One key differentiator between
communications applications and line-of-business (LOB) applications is the
high priority given to a few non-functional requirements, such as 99.999%
availability (commonly referred to as 5-9s), high performance (sub-second
response times for method invocations), a highly secured environment,
interfaces to monitor service usage, and the ability to configure and
administer services.
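To put the availability requirement in perspective, the following sketch converts an availability percentage into the downtime it permits. The function name is hypothetical, and the calculation assumes a 365-day year.

```python
def allowed_downtime_minutes(availability_percent: float, days: int = 365) -> float:
    """Minutes of downtime permitted over the period at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * days * 24 * 60
```

At 99.999% availability, the budget is roughly 5.26 minutes of downtime per year; at 99.9%, it is about 525.6 minutes, or almost nine hours.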
Another characteristic of communications
applications is their mission-critical nature and their ubiquitous deployment
within an organization. This characteristic makes it nearly impossible to have
control of the environment in which a solution will be deployed. This creates
dependencies that are somewhat unique to communications applications, such as being
required to interface with an existing PBX system to interoperate with desktop
phones or a network dependency in the environment in which the solution will be
deployed.
Another variable inherent in hosted
solutions, including CaaS applications, is the availability of operations support
systems (OSS) in the deployment environment. Each deployment of the solution
can, and most likely will, have investments in different OSS products. The deployment
environment makes architecting a solution dynamic, and decisions need to be
made about the different protocols and interaction patterns that will be used
to integrate the solution into the hosted environment.
Conceptual Architecture for CaaS Solution
The following diagram contains the
conceptual architecture for a CaaS solution. The colors in the diagram
highlight different components of the solution:
- Blue represents the CaaS application
itself.
- Yellow represents components that may
have to be provided by the creator of the CaaS solution.
- Red represents components that will be in
place in the service provider environment.
As stated earlier, the deployment
environment will influence the requirements for the components that are not
part of the CaaS application itself.
Figure 6. CaaS reference architecture
It is important to note that the conceptual
architecture is high level and each component will be decomposed into specific
services to fulfill the solution requirements. Another important thing to note
is that the conceptual diagram does not imply any physical implementation.
CaaS Application
The CaaS application is the set of services
that fulfill a communications function, such as IP telephony. Because this set
of services contains the business value of the CaaS solution, this is the part
of the architecture that most development organizations will focus on.
The CaaS application is implemented using a
layered architecture. A layered architecture abstracts different parts of the
application at different levels allowing different layers to be reused in
different contexts; for example, the communications service layer, if correctly
decomposed, can be customized in different ways to fulfill specific
requirements for different customers. In the conceptual architecture, the CaaS
application consists of the following layers:
- Management interface layer
- Communications service orchestration layer
- Communications service layer
This layering is purely logical, because
different components may run in the same process as other components. The next
sections describe each of these layers in more detail.
Management Interface Layer
Every communications application requires
the ability to have the implementer of the solution (service provider,
enterprise, or hosting provider) perform the tasks related to the following
functions on the solution: fault, configuration, accounting, performance, and
security (FCAPS). This layer provides interfaces for those functions.
Different functions of the components in
the management interface layer will interface with external applications in
different fashions. Some interfaces may be implemented with a request/response
message exchange pattern (MEP) exposed as a Web service; other interfaces may
report events by sending Simple Network Management Protocol (SNMP) traps.
Current and emerging technical trends will dictate the method and protocol of
the interface.
The following table describes the
components contained in the management interface layer of the CaaS application.
| Component | Description |
| Configuration interface | These interfaces are for the configuration of the hardware and software that support the CaaS application. They permit the configuration of the physical environment, such as hardware and networking, and of the logical environment, such as the services that are running on the communications solution. |
| Billing interface | These interfaces provide billing information to the service provider. A billing interface may be event-driven, providing a data stream, or polled regularly for formatted files. |
| Reporting interface | These interfaces provide information about faults and performance. Depending on the content, information can be sent as events, such as SNMP traps, or polled by the consumer of the data. |
Communications Service Orchestration Layer
The communications service orchestration
layer orchestrates the communications services in the communications service
layer. The orchestration layer provides a level of agility in the CaaS solution
by allowing the autonomous services in the communications service layer to be
composed into customized communications processes based on customer
requirements. The communications service orchestration layer helps provide a
customized communications experience in a multi-tenant environment.
Customization is often a requirement of corporate communications solutions.
| Component | Description |
| Communications orchestration | The communications orchestrations compose communications services to provide a communications function. For example, an Automatic Call Distribution (ACD) application may have specific routing requirements based on the keypad input of the user, such as “Dial 0 to reach the operator.” These routing requirements can be implemented as an orchestration: (1) answer call, (2) prompt for input, (3) process input, and (4) route call to appropriate destination. |
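The four-step ACD orchestration in the table can be sketched as an ordered list of service functions applied to a call. All names here are hypothetical: the step functions stand in for autonomous communications services, and the `TENANT_A` dictionary stands in for the per-tenant metadata that a multi-tenant deployment would use to customize routing.

```python
def answer_call(call: dict) -> dict:
    call["state"] = "answered"
    return call


def prompt_for_input(call: dict) -> dict:
    # The prompt text comes from the tenant's configuration.
    call["prompt"] = call["tenant"]["prompt"]
    return call


def process_input(call: dict) -> dict:
    # Map the keypad digits to a destination, falling back to the default route.
    routes = call["tenant"]["routes"]
    call["selection"] = routes.get(call["digits"], routes["default"])
    return call


def route_call(call: dict) -> dict:
    call["destination"] = call["selection"]
    call["state"] = "routed"
    return call


# The orchestration is only the ordering of the services; each step is reusable.
ACD_ORCHESTRATION = [answer_call, prompt_for_input, process_input, route_call]


def run_orchestration(steps, call: dict) -> dict:
    for step in steps:
        call = step(call)
    return call


# Hypothetical per-tenant configuration.
TENANT_A = {
    "prompt": "Dial 0 to reach the operator.",
    "routes": {
        "0": "sip:operator@tenant-a.example",
        "default": "sip:voicemail@tenant-a.example",
    },
}
```

Running the orchestration with `{"tenant": TENANT_A, "digits": "0"}` routes the call to the operator address. A different tenant can reuse the same services with its own routing table, or a different ordering of steps, without any change to the services themselves.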
Communications Service Layer
The communications service layer contains
autonomous services that perform communication tasks; for example, the routing
logic of a “forward call” function would be implemented in this layer of the
CaaS solution. There are no specific technical requirements of this layer.
Services can be implemented as class libraries or they can even be an entire
communications system, such as a PBX or a voicemail system.
| Component | Description |
| Communications service | The services that perform a set of communications tasks, such as call routing or voicemail. |
SIP Services
The SIP services provide the ability for
the CaaS solution to communicate using SIP. These services understand the
capabilities of SIP, such as how to route a call to a user based on presence
information and how to change the state of a user based on SIP events. These
services also contain IMS services that interface with the HSS and the S-CSCF.
In initial CaaS implementations, the solution architect will most likely have a
high degree of influence over the choice of provider for the SIP services.
Management Application
The CaaS management application is
optional, depending on the needs of the service provider. The management
application can supplement the CaaS application by providing additional
services that are typically contained in the business support systems (BSS) and
operations support systems (OSS). These services may include the management of
the physical environment (servers, processes and networking), such as taking
remedial action for faults; mapping the logical model (customers and services)
to the physical environment, such as Customer X’s service is running on server
1234; and providing billing information.
Security Services
The CaaS security services may also be part
of the existing service provider environment. In a CaaS environment, there is a
need to federate security from the customer’s location to the service provider.
If a federated security system is not part of the service provider environment,
the provider of the CaaS application may have to add these capabilities to
their solution.
Directory Services
Similar to the security services, federated
directory services are a requirement of a CaaS solution. If a service provider
does not have federated directory services, the provider of the CaaS
application may have to add these services to its CaaS solution.
Business Support Systems/Operations Support Systems
BSSs/OSSs are carrier-grade systems that
automate a service provider’s business processes and operations. The functions of
these systems include, among others, billing, provisioning, and fault
management. BSS/OSS systems are almost always part of a service provider
environment and CaaS architects will have very little influence over these
systems. CaaS architects will have to define interfaces for their solutions to
interoperate with the BSS/OSS. There are standards bodies, such as the
TeleManagement Forum (TMF), that define standards for BSS/OSS interoperability.
IMS Infrastructure
The IMS infrastructure will be supplied by
the communications service provider. As with the network infrastructure
for a Web site, CaaS architects will have no influence over the vendor of IMS
equipment and will have to adhere to the standards that are part of the IMS
specification.
Remote Domain
The remote domain in the conceptual
architecture illustrates branch office or enterprise-to-enterprise (E2E) SIP
communications. Early CaaS implementations may be able to restrict branch
office or E2E communications to enterprises using SIP services from a single
vendor. As the implementation of Internet-based communications evolves, the influence
of a CaaS architect over the SIP servers at a remote domain will diminish.
Thoughts on CaaS
Being an architect on a CaaS solution over
the last few years has brought some interesting perspective into the unique
challenges of the development and deployment of these applications. As I
mention throughout this article, CaaS applications require additional
considerations past hosted multi-tenant solution development, whose design
challenges tend to reside in the orchestration layer and the security services.
The SIP services layer requires knowledge unique to communications solutions,
and the lack of an understanding of the patterns required to implement these solutions
can hinder the effort.
Another unique consideration of CaaS
solutions is the implementation environment. Communications service providers
all have unique environments and different criteria for qualifying and
deploying a solution into their environments. Getting a CaaS solution qualified
for deployment may take a significant amount of time after the software itself
has been developed. As mentioned in the description of the conceptual
architecture, parts of the application may be required for some deployments and
not others, such as security services and the management application. These
services will have to be developed and tested as part of the initial
implementation, even though they may not be part of the initial deployments.
After those words of caution, you may
wonder whether developing a CaaS solution is worth it. From a financial
standpoint, it will be. According to a worldwide forecast by Gartner, Inc., “CaaS is projected to total $251.9 million in 2007, a 37.6% increase
from last year. The market is expected to total $2.3 billion in 2011,
representing a compound annual growth rate at more than 105% for the period.”
Along with the potential financial
windfall, architecting CaaS solutions should appeal to the most fervent of
alpha geeks. Beyond the chance to define a solution that implements a pure
service-oriented architecture (SOA), the technologies involved in
architecting a CaaS solution (hardware, networking, and software) should
pique our technical interests for quite some time. The potential for CaaS
applications that permit multiple services (voice, video, and data) over
multiple devices (mobile, wire-line, and computer) is boundless and will
surely produce some killer applications in the near future.
About the Author
Joseph Hofstader is an Architect/Evangelist
for Microsoft’s Communications Sector of North America. Joseph has spent his
career architecting, designing, and developing applications in the
telecommunications industry. Joseph has spent the majority of the last few
years as an architect on a solution that would now be classified as CaaS.
Reference
Gartner Forecasts Worldwide
Communications-as-a-Service Revenue to Total $252 Million in 2007; September, 2007. http://www.gartner.com/it/page.jsp?id=518407&format=print