by Larry Clarkin and Josh Holmes
Summary: A mashup
is a technique for building applications that combine data from multiple
sources to create an integrated experience. Many mashups available today are
hosted as sites on the Internet, providing visual representations of publicly
available data. This article describes the history and architecture of mashups,
and explores how you can create mashups for use in your enterprise. We also
impart some wisdom gained from projects with customers and systems integrators
who have implemented mashups for the enterprise.
Contents
History of Mashups
Architecture of a Prototypical Mashup
Key Success Factors
Risks
Conclusion
Resources
About the Authors
History of Mashups
Mashups have gained popularity within the last
few years, swept along with the momentum around Web 2.0. Early mashups took
data from sources such as Craigslist (http://www.craigslist.org) and combined
them with mapping services or photo services to create visualizations of the
data (for example, http://housingmaps.com). Many of these early mashups were
consumer-focused, although recently there has been growing interest in, and
acceptance of, mashups in the enterprise. Organizations are starting to realize
that they can put their well-defined services that do discrete bits of business
logic together with other existing services, internal or external to the organization,
to provide new and interesting views on the data.
As techniques for creating mashups have matured,
we are starting to see companies build business models around mashups. For the
real estate market in the United States, both Redfin (http://www.redfin.com)
and Zillow (http://www.zillow.com) use large amounts of public and private real
estate data (from sources such as county record offices and the Multiple
Listing Service), combined with internal “value added” services, the result of
which is displayed to the user on a map (using Microsoft’s Virtual Earth and
Google Maps, respectively). There are many possible types of information that
you could add to a real estate site: other similar listings, information on
local schools, local hospitals, recent crime rates, classifieds for job
postings, and more.
Architecture of a Prototypical Mashup
Although there is great variation in the user
interface and the sources of data for many mashups, we can still derive common
architectural patterns that they all share. For example, all mashups are
RESTful in nature (they conform to the Representational State Transfer
principles). Figure 1 shows an architectural rendering of a typical mashup.
Figure 1: The architecture of a typical mashup application
Data
The core element of any mashup is the data being
aggregated and presented to the user. Although the above diagram depicts the
source of the data as a database, the concept of a mashup does not require a
database that is local to the mashup software or the client. The data can
strictly come from Web services where data is serialized to XML or JSON (this
is the most common pattern in Internet-based mashups). There are architectural
trade-offs to be made between storing the primary data in a local data store and
accessing the data with every request. As mashups move from being
Internet-based applications to being internal to the enterprise, they tend to depend
less on external data stores.
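One way to soften the trade-off between a local store and per-request access is a short-lived local cache. The following is only a sketch under stated assumptions: `fetch` is a hypothetical callable standing in for any remote data source, and the time-to-live is an arbitrary value that each mashup would have to choose for itself.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Keep remote results locally for a limited time (illustrative only)."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical call to the remote source
        self._ttl = ttl_seconds      # how long a local copy counts as fresh
        self._clock = clock          # injectable so the sketch can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # fresh enough: skip the remote call
        value = self._fetch(key)     # stale or missing: go back to the source
        self._store[key] = (now, value)
        return value
```

A short time-to-live behaves much like accessing the source on every request, while a long one behaves more like a local data store.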
RSS feeds
RSS (Really Simple Syndication) feeds are
a common source of primary or supplemental data for mashups. RSS feeds are easy
to consume as they are XML documents, and many libraries exist to manipulate
the feeds. The format and specification for RSS are well documented and
understood with only a few variations from version to version. The
extensibility of RSS is also well known, as demonstrated by the number of
extensions in use today, such as adding attachments to the feeds, creative
commons licensing information and location information.
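To illustrate how little code is needed to consume a feed, here is a minimal sketch that reads item titles and links from an RSS 2.0 document using only the Python standard library. The sample feed and the function name `read_items` are made up for this example; a production mashup would more likely rely on a dedicated feed library and would need to handle malformed feeds.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up feed content, shaped like an RSS 2.0 document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Service alerts</title>
    <item>
      <title>Recall notice</title>
      <link>http://example.com/alerts/1</link>
    </item>
    <item>
      <title>New parts catalog</title>
      <link>http://example.com/alerts/2</link>
    </item>
  </channel>
</rss>"""


def read_items(rss_text: str) -> List[Dict[str, str]]:
    """Return the title and link of every item in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items
```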
Web services
Mashups also commonly include calls to Web
services. Both WSDL-based Web services and
REST-based Web services are common, with some services exposing both styles. Web services
can be used to provide additional data or to transform the data being
mashed up. For a map-based mashup, the data may contain only street addresses,
and a call to a WSDL or REST-based Web service may be required to translate each
street address to a longitude/latitude coordinate for the map.
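The address-to-coordinate step can be sketched as a small enrichment function. Everything named here is an assumption for illustration: `geocode` is a hypothetical callable wrapping whichever geocoding service is chosen, and `fake_geocode` with its lookup table is a stand-in with invented data so that the sketch runs without a network call.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)
Geocoder = Callable[[str], Optional[Coordinate]]


def add_coordinates(records: Iterable[Dict[str, str]],
                    geocode: Geocoder) -> List[Dict[str, object]]:
    """Return copies of the records enriched with map coordinates."""
    enriched = []
    for record in records:
        point = geocode(record["address"])   # one service call per address
        if point is None:
            continue                         # skip addresses that cannot be resolved
        latitude, longitude = point
        enriched.append({**record, "latitude": latitude, "longitude": longitude})
    return enriched


# Stand-in for a real WSDL- or REST-based geocoding service (made-up data).
_FAKE_LOOKUP = {"1 Main St, Springfield": (39.80, -89.64)}


def fake_geocode(address: str) -> Optional[Coordinate]:
    return _FAKE_LOOKUP.get(address)
```

Passing the geocoder in as a parameter keeps the mashup logic independent of whether the underlying service is WSDL-based or REST-based.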
Platform services
Figure 2 depicts a special class of services
that are used to create mashups. We are calling these platform services because
they provide functionality beyond the typical request/response model of
traditional Web services. A typical example of this is the mapping service
provided by Virtual Earth. Virtual Earth includes an entire array of
server-side and client-side processing capabilities, as well as the “services
in the cloud.” We see the emergence of cloud-based building block services that
begin to create value. For example, the Amazon S3 service offers storage “in
the cloud”; this makes it easier to expose any static data by uploading it to a
hosted storage provider. Microsoft’s BizTalk Services is a platform service
that provides a different capability – the ability to relay communications from
the Internet across a corporate firewall, thus exposing internal services for
consumption by business partners or third parties building their own enterprise
mashups. Relayed communication as provided by BizTalk Services is also useful
within a single enterprise that has many business units or numerous network
segments. An Internet-based communications relay can eliminate physical network
topology as a communications obstacle.
Figure 2: Using BizTalk Services as a platform service to relay information
Mashup applications
Thus far we have identified many of the types of
services that can be used to create mashups, but we have not addressed the
importance of the software that creates and delivers the mashup experience.
Think of the mashup application as a combination of middle-tier services and
some lightweight business logic. For Internet-based mashups, the software is
usually written using Web technologies (like PHP or ASP.NET), but we are
starting to see the line between server processing and client application blur
with the emergence of Rich Internet Applications (RIAs). RIAs are applications
that run inside the browser with rich functionality similar to that of many
desktop applications. These typically do not require a client-side installation
beyond a generic plug-in such as Adobe Flash or Microsoft Silverlight.
Client application
The client application is how the mashup is
delivered and presented to the user. For public Internet mashups the most
common client application is a Web browser that receives HTML and JavaScript
delivered from a Web server over HTTP. However, we have started to see
mashups being delivered with RIA platforms as well. In this model, the client
can provide more visual richness and can even provide some of the mashup
processing on the client side.
Future Direction of Mashups
With early versions of mashups, much of the
implementation was tedious and time consuming. Many of these used
server-side processing (often with PHP or Perl) and painstaking client-side
scripting in JavaScript to create the mashup experience.
It was common for the person creating the mashups to create custom code to
parse the XML return sets that they received from their data sources.
As time has passed and the development process
has matured, a lot of the tedious coding has been replaced by frameworks and
better codification of standards. Custom scripts on the server side are
starting to be replaced by standardized libraries that will automatically
generate the required client-side script. We are also seeing standardization in
the message formats. An example of this is the GeoRSS extension to the RSS
standard, which allows you to specify the longitude and latitude related
to the items in the feed. All three major mapping service providers (Google,
Microsoft, and Yahoo) support GeoRSS, which means mashups that use this RSS
extension require almost no coding.
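For cases where a feed has to be read by hand, the following sketch extracts locations from items that carry a `georss:point` element. It assumes the GeoRSS-Simple encoding, in which a point is written as latitude followed by longitude, separated by a space; the sample feed and the function name `read_points` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Made-up feed: one item with a location, one without.
SAMPLE = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Service centers</title>
    <item>
      <title>Downtown service center</title>
      <georss:point>42.28 -83.74</georss:point>
    </item>
    <item>
      <title>Item without a location</title>
    </item>
  </channel>
</rss>"""


def read_points(feed_text: str) -> List[Tuple[str, float, float]]:
    """Return (title, latitude, longitude) for each item with a georss:point."""
    root = ET.fromstring(feed_text)
    points = []
    for item in root.findall("./channel/item"):
        raw = item.findtext("georss:point", default=None, namespaces=GEORSS_NS)
        if raw is None:
            continue                      # item carries no location
        lat_text, lon_text = raw.split()  # GeoRSS-Simple order: latitude first
        points.append((item.findtext("title", default=""),
                       float(lat_text), float(lon_text)))
    return points
```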
The creation of mashups was once only the domain
of the developer, but there is a movement to put the ability to create mashups
directly into the hands of the end customer. As the frameworks to create
mashups are becoming simpler to use and the message formats are becoming more
standardized, the next logical step is to build tools that can create mashups.
Some of these tools will be targeted at the end consumer of the mashups. Pipes
by Yahoo and Popfly by Microsoft are examples of frameworks and tools for
allowing users to create their own mashups.
We are seeing an increase in the importance of
common schema and metadata in the development of mashups. We described in the
previous section how the common schema of RSS feeds makes it easy to
incorporate them into mashups. The same principles will need to be applied to
other types of data as well (as robust as RSS is, we cannot model all of our
data into that format). We are already seeing the emergence of other standard
schemas, such as KML (Keyhole Markup Language) to describe geospatial data.
Even more interesting will be Microformats, a promising framework for
delivering semantic meaning which can easily be read by software such as
mashups.
Mashups in the Enterprise
A great way of exploring mashups in the
enterprise is with an example. Let’s imagine that you are an application
architect for a call center system that receives calls about warranty and parts
service. Using the phone number of the caller, we could display the records for
that user, including a purchase history. This interesting application has
already been implemented today in most call centers. But what if, in addition
to looking up the customer information, we plotted the phone number on a map
using a publicly available service and also displayed a list of local service
centers or parts suppliers for our products overlaid on the map? With this data
in hand, we might be able to answer the customer’s questions in seconds. What
if we also looked up the current weather conditions in that area or the local
sports teams and their recent games for a conversation starter or filler on
long running calls?
Using this example, Figure 3 shows a mock
customer service mashup that could be used by a representative when they
receive a call. Combining data from public services (such as weather and news)
with internal sources of data (a customer service alerts blog and a database of
service locations), this application combines the mashup elements with the
accessibility of a portal.
Figure 3: An example mashup for our call center
As shown in Figure 3, the embedded mashup puts a
tremendous amount of ready information at the call service agent’s fingertips
– ranging from the top service items so that they can answer questions quickly
to reverse lookups on the phone number to get conversation starters. This type
of mashup is strictly internal but provides tremendous value on the initial
call with any client.
There are several key elements to making this
mashup successful. First and foremost, it is contextual to the task at hand.
Second, it prioritizes the information in the order in which it is most likely to
be useful. You have to know who you are talking to, what products they have
registered, what the top service items are for those products, and then start
trying to answer questions that they may have such as where the local service
centers are for the customer’s location. Third, once we have the customer
information and locale, each of the panes of content is standalone and does not
require interaction with the other parts. This allows gathering and aggregation
of the data for each of the parts to be performed asynchronously.
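Because the panes are independent, they can be loaded concurrently. The sketch below uses Python's `asyncio` to show the idea; the pane functions, the use of a postal code as the locale, and the "unavailable" placeholder are all assumptions made for this example, and the short sleeps stand in for real service calls.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Pane = Callable[[str], Awaitable[str]]


async def weather_pane(postal_code: str) -> str:
    await asyncio.sleep(0.01)            # stands in for a remote call
    return f"weather for {postal_code}"


async def service_centers_pane(postal_code: str) -> str:
    await asyncio.sleep(0.01)            # stands in for a database lookup
    return f"service centers near {postal_code}"


async def load_panes(postal_code: str, panes: Dict[str, Pane]) -> Dict[str, str]:
    """Fetch every pane concurrently; a failed pane does not block the others."""
    names = list(panes)
    results = await asyncio.gather(
        *(panes[name](postal_code) for name in names),
        return_exceptions=True,          # collect errors instead of raising
    )
    return {
        name: ("unavailable" if isinstance(result, Exception) else result)
        for name, result in zip(names, results)
    }


def build_screen(postal_code: str) -> Dict[str, str]:
    panes = {"weather": weather_pane, "service_centers": service_centers_pane}
    return asyncio.run(load_panes(postal_code, panes))
```

With this shape, the total wait is roughly that of the slowest pane rather than the sum of all of them.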
Realizing Return on Investment (ROI) is a common
concern for Service-Oriented Architecture (SOA) deployments. Many organizations
find it difficult to justify the upfront investment for creating services for
different functionality when it would be easier to create a single application.
Enterprise mashups are a great example of how an investment in SOA can
provide great value. (See the sidebar “Mashups deliver ROI for IDV Solutions.”)
Leverage the services you have built
Immediate return for a SOA can be realized when
organizations start mixing and matching these services for new and exciting
purposes. It can be exciting to start leveraging services or applications in
ways that couldn’t have been imagined when they were written. (See the sidebar,
“Quicken Loans leverages mashups for fast results.”)
Leverage the services others have built
It’s important to realize that you can’t own all
of the information in the world, and there’s a fairly high return on investment
when you can simply leverage someone else’s hard work instead of reinventing
that particular wheel.
Build services that others may leverage
Another opportunity for enterprises, as mashups
become more and more mainstream, is to build services that can easily be
consumed by mashup applications. Going back to the example we cited, the
customer service representative can make her customer happy by providing nearby
store locations. But imagine if the stores themselves exposed their inventory
and product availability to the mashup. Now the service representative can
provide even more detailed and valuable information to the customer on the
phone. That kind of service would be valuable for the customer, the call
center, as well as the store itself.
Platform agile
As discussed earlier, many mashups have been
created for delivery on the standards-based Web platform (HTML and JavaScript).
This is not a limitation of mashups themselves, but merely the standard way of
delivering applications on the Internet. As we see mashups move into the
enterprise, we will see a growing number of mashups built on RIA platforms
(like Adobe’s Flash and Microsoft’s Silverlight) and even the emergence of full
rich desktop mashups built on Windows Presentation Foundation. Full 3-D
rendering on a rich client platform can increase the visual appeal of the
mashup. Enterprise mashups can take full advantage of these richer platforms as
many will have greater control of the desktops.
Figure 4: Quicken Loans leverages mashups for fast results
Quick turnaround
If you are leveraging existing Web services and
robust platform services, the development time of mashups can be measured in
hours or days, rather than weeks or months. The quick turnaround time for
mashups can change the way that IT departments interact with the users in their
organizations. Joint development and multiple iterations of the applications
being delivered in a short time can become more realistic goals.
Rich visualization of the data for your users
Perhaps the greatest benefit achieved from
mashups is the visualization that they create for the users. Data that is
visual in nature is far easier to understand and can create greater meaning for
the customer. This is not limited to the mapping examples that we have
discussed so far. Other techniques such as heat maps and tree maps can also be
created as data visualizations in mashups.
Key Success Factors
Look for utilitarian services to consume (simple before complex)
The more things that a given service does, the harder it is to
mash it in with other services. The right services do one thing and do that one
thing well. For example, a service that returns people data with all of their
addresses, loans, cars, trains, planes and automobiles, may be offering too
much data. You might want to create a subset service that returns just name and
address to mash with, as it’s a waste of bandwidth and processing to pull data
that you are not going to use.
Keep mashups read-only
Mashups are agile views into the data that they
present. They are not an all-powerful editing surface capable of editing any
and all data thrown at them. If someone wants to edit that data, they should go
back to the application that created the data to do the editing. The mechanism
for doing this should be obvious as well. This will dramatically reduce the
complexity (and improve the agility) of your mashup applications.
Data freshness matters
Business decisions will be made based on
enterprise mashups. This is dangerous when pulling data from multiple sources
as the data might be stale. As with any report that you want to make decisions
on, it’s important to have the timestamp for when the data was last refreshed.
That will greatly improve the clarity and usefulness of your mashed data, and
the confidence with which the decisions can be made.
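One simple way to keep freshness visible is to carry a refresh timestamp alongside each piece of mashed data. This is a sketch only: the `SourcedData` type, its field names, and the staleness threshold are hypothetical and would need to be adapted to the sources involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, List


@dataclass(frozen=True)
class SourcedData:
    source: str             # label for the feed or service (hypothetical)
    payload: Any            # whatever that source returned
    refreshed_at: datetime  # when the payload was last pulled, in UTC


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def stale_sources(parts: List[SourcedData], max_age: timedelta,
                  now: datetime) -> List[str]:
    """Name the sources whose data is older than max_age."""
    return [p.source for p in parts if now - p.refreshed_at > max_age]
```

The mashup can then display `refreshed_at` next to each pane and flag any source that `stale_sources` reports.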
Don’t try to solve enterprise data problems
As mashups move into the enterprise, it is
inevitable that you will run into data fragmentation and normalization issues
that are a result of years of legacy application development. For instance, a
mashup of sales by geographical area might seem like a logical one to create
for your general sales manager. But what if the data is stored in three
different systems by product line and each has different data structures and
business rules governing them? Our advice is to find a better candidate for
mashup technologies and let a more traditional project (such as a systems
consolidation effort) resolve the more complex problems.
Understand authentication and authorization issues
Authentication and authorization can be huge
impediments to doing a mashup of several different services if they all require
authentication with different schemes. For example, issues can arise if you
have one service that requires username and password and another that requires
an X509 certificate. This is not insurmountable, but can be a large roadblock
that has to be overcome. There are several strategies for attacking this
problem. You can avoid services that require authentication or that require
authentication via methods that you don’t support. While you can get started
this way, the reality is that there are services that you need to leverage that
you can’t get to without authentication. You could try to force others, or pay
them, to offer the type of authentication that you are comfortable with. The
reality, however, is that your application will most likely have to support
service authentication using many different mechanisms, which all need to be
taken into consideration.
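Supporting several mechanisms is easier when each one sits behind a common interface. The sketch below is one possible shape, not a prescribed design: the service names, credentials, and certificate path are placeholders, HTTP Basic is assumed as the username-and-password scheme, and `CallSettings` is a hypothetical container that the code making the actual call would consume.

```python
import base64
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class CallSettings:
    headers: Dict[str, str]
    client_cert_path: Optional[str] = None  # X.509 certificate file, if needed


class AuthScheme(Protocol):
    def apply(self, settings: CallSettings) -> CallSettings: ...


@dataclass
class BasicAuth:
    username: str
    password: str

    def apply(self, settings: CallSettings) -> CallSettings:
        raw = f"{self.username}:{self.password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        headers = {**settings.headers, "Authorization": f"Basic {token}"}
        return CallSettings(headers, settings.client_cert_path)


@dataclass
class ClientCertificate:
    cert_path: str

    def apply(self, settings: CallSettings) -> CallSettings:
        return CallSettings(dict(settings.headers), self.cert_path)


# Hypothetical registry: service name -> the scheme that service demands.
SCHEMES: Dict[str, AuthScheme] = {
    "customer_records": BasicAuth("agent", "secret"),
    "partner_inventory": ClientCertificate("/path/to/client.pem"),
}


def settings_for(service: str) -> CallSettings:
    """Build call settings for a service, applying its scheme if it has one."""
    base = CallSettings(headers={"Accept": "application/json"})
    scheme = SCHEMES.get(service)
    return scheme.apply(base) if scheme else base
```

Adding support for another mechanism then means adding one more class and one registry entry, without touching the code that assembles the mashup.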
Risks
When implementing mashups, four areas of risk
should be considered:
Dependency on services
One of the major risks in creating enterprise
mashups is when you create a dependency on services that are external to your
company (such as the “services in the cloud”). The terms of the service
agreements should be investigated before the dependency is created. For
example, some services require that the software using the service be a public
facing Internet site; this might occur when the service has an ad-based revenue
model. The terms of service may also be subject to change in some cases in ways
that could be detrimental to your use of that service. To mitigate this
concern, look for service providers that have a model that fits your usage.
Loss of data fidelity
Loss of fidelity in the data being displayed is
another key risk. As data is visualized, there is a tendency to make the data
fit the confines of the presentation surface. There will be a natural tendency
to not visualize small amounts of data or to group data into larger collections
in order to conserve space on the presentation surface. This has the potential
to “warp” the end user’s view of the data.
Figure 5: Mashups deliver ROI for IDV Solutions
Politics
Politics can also be a hurdle when creating
mashups. If you didn’t create the service, then it might not do
exactly what you want, and it might take too long to get the originator of the
service to change it to suit your needs. This Not-Invented-Here (NIH) mentality is
fatal to mashups. This also manifests itself in trust. If you don’t trust the
provider of the service, then you will not rely on that service in your
mission-critical application.
Uncontrolled consumerization
Consumer technologies are increasingly being
used inside the enterprise without the awareness or governance of corporate IT,
according to a recent report from Gartner, Inc. Consumer-facing tools are
tightly focused on the creation of a mashup and the visualization, so it can be
very easy to create the initial mashup, but longer term maintenance of the
mashup is not taken into account. Also consider how dangerous it would be to
have an end user upload corporate data into a public mashup tool like Pipes or
Popfly.
There are several ways to mitigate these risks.
First, for internal or external services, put into place a Service Level
Agreement (SLA) that clearly describes responsibilities of both parties,
response time for change requests, uptime requirements, bandwidth restrictions,
and all other relevant details. Second, lay out the possible fallback
requirements in your application when there is a failure calling a given
service. For some services, it might be acceptable to just not show that data;
with other services, you could cache data that you can fall back on anytime
there’s a failure. You might need a secondary service lined up as a backup to
call in case something goes horribly wrong.
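The fallback options described above can be sketched as a single helper that tries a primary service, then any backups, then cached data, and finally returns nothing so the caller can omit that pane. The function and parameter names are hypothetical, and a plain dictionary stands in for whatever cache the application uses.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def call_with_fallback(key: str,
                       services: Sequence[Callable[[str], Any]],
                       cache: Dict[str, Any]) -> Optional[Any]:
    """Try each service in order, then the cache; return None to hide the pane."""
    for service in services:
        try:
            value = service(key)
        except Exception:
            continue            # this service failed: try the next one
        cache[key] = value      # remember the last good answer
        return value
    return cache.get(key)       # may be None: caller simply omits that data
```

Which of these fallbacks is acceptable is a per-service decision, so in practice the order and the set of options would differ from one data source to the next.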
Finally, you will need to address the
consumerization issues and the politics. Unlike increasing the reliability and
redundancy of services, this requires a governance process. If your
organization has a mature process for governing the use of services, then you
should leverage that process for mashup creation and consumption as well.
Conclusion
A January 2007 survey by McKinsey asked
corporate customers about their adoption of Web 2.0 technologies. As you would
expect, a great many of them had either invested or were planning on investing
in one or more Web 2.0 technologies. The surprising part of the survey to us
was that mashups were only in use or under consideration by 21 percent of the
respondents and a majority of respondents, 54 percent, were not considering
their adoption at all.
We think that the low response to mashups in the
enterprise is due to the relative newness of the technology compared to other
ones that were included in the survey (like Web services, podcasts and RSS
feeds). The fact that tools to build mashups are just starting to emerge is a
factor as well (many of the ones mentioned in this article are in beta and
alpha stage as we write this article). We are certain that the techniques will
become more common and the tools will mature. Concepts such as the Internet
Service Bus should make building these enterprise mashups both easier and
more useful. We feel that mashups have a place in the enterprise and we
encourage you to investigate their adoption.
Resources
· “Consumerization Gains Momentum: The IT Civil War,” Gartner Special Report, 2007 (summary) http://www.gartner.com/it/products/research/consumerization_it/consumerization.jsp
· “How Businesses are Using Web 2.0: A McKinsey Global Survey,” The McKinsey Quarterly, August 2007 http://www.mckinseyquarterly.com/article_abstract_visitor.aspx?ar=1913
· Mashup (Web application hybrid), Wikipedia http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29
· Redfin http://redfin.com
· “Web 2.0 in the Enterprise,” Michael Platt, The Architecture Journal, Journal 12
· Zillow http://zillow.com
About the Authors
Larry Clarkin is an architect evangelist with
Microsoft. Larry has over 15 years of experience in the design and construction
of applications in a variety of technologies and industries. He specializes in
integrating Web technologies with Legacy and ERP systems, which he has been doing
for over 10 years. When not working with customers, you will find Larry talking
technology at local User Groups. You can contact Larry through his blog at http://larryclarkin.com.
Josh Holmes is an architect evangelist with
Microsoft. Prior to joining Microsoft last October, Josh was a consultant
working with a variety of clients ranging from large Fortune 500 firms to
smaller sized companies. Josh is a frequent speaker and lead panelist at
national and international software development conferences focusing on
emerging technologies, software design and development with an emphasis on
mobility and RIA (Rich Internet Applications). Community focused, Josh has
founded and/or run many technology organizations from the Great Lakes Area .NET
Users Group to the Ann Arbor Computer Society and was on the forming committee
for CodeMash. You can contact Josh through his blog at http://www.joshholmes.com.
This article was published in the Architecture Journal, a print
and online publication produced by Microsoft. For more articles from this
publication, please visit the Architecture Journal Web site.