Planes, Trains and Automobiles: Choosing Alternate Transports for Web Services
Simon Guest
Microsoft Corporation
July 2005
Applies to:
Enterprise architecture
Web Services
Message Queues
HTTP
SMTP
Summary: Is HTTP a great fit for every Web Services issue that exists today? This article reviews scenarios in which alternate transports for Web Services may offer a better solution than HTTP. (15 printed pages)
Many implementations of Web Services exist today across multiple platforms and environments. The majority of these share one thing in common—they all use HTTP as the underlying transport. The ubiquitous nature of HTTP has helped Web Services achieve their current level of adoption.
Despite this, is HTTP a great fit for every problem? Are there application architectures that would benefit from using other transports? What are the advantages and disadvantages of doing so?
This article sets out to answer these questions and more, looking at scenarios in which alternate transports for Web Services may offer a better solution than HTTP.
As many people know, HTTP had a long history before Web Services. It has been the default transport for browsing Web pages since early versions of NCSA Mosaic were released to the world.
HTTP as it stands is a pretty good fit for Web Services. Being pervasive, it typically works well across Firewalls and Proxy Servers, elements (such as WSDL) are easy to test using HTTP, and many HTTP stacks and servers are available to build implementations on. If we also go back and look at some of the initial "design goals" for Web Services, we see a lot of momentum around publicly facing Web Services. As a result, HTTP makes for a perfect choice.
Despite the pervasiveness of HTTP, however, there are scenarios in which it doesn't always fit. In my experience, I've seen these fall into three categories:
- Asynchronous connections
- Offline and local applications
- Peer-to-Peer computing
Let's take each of these areas in turn, and examine scenarios in which HTTP may not be the perfect match:
Contoso Consulting, a fictitious organization, employs nearly 5,000 consultants worldwide. In recent years they have undergone a phenomenal expansion, hiring many staff members at the end of the dotcom era. Due to this increase in headcount, one of the problems they now face is with the submission of timesheets. Every week, each consultant must submit a timesheet so that clients' bills are accurate and timely.
Their current model involves e-mailing a timesheet (created using a template in Microsoft Excel) to the accounts department. After the timesheet arrives, a member of the accounts group enters it into their accounting system to record hours to be billed. The accounting system is currently mainframe based.
As you can imagine, this model for submitting timesheets is not scaling well with their current expansion. Despite hiring more people for the accounts department, the method of dealing with incoming timesheets using e-mail is becoming laborious due to the manual process of taking the data from Excel into the accounting system.
To help with this, internal IT has designed a new timesheet submission service. Using Web Services, this service will sit in front of the mainframe, accept a timesheet from a consultant, and automatically enter the details into the accounting system. The Web Service design has been kept simple, and a Smart Client application has been developed for submitting timesheets, as shown in Figure 1.
One of the design goals early on, however, has been to ensure that the submission of timesheets is performed asynchronously. The accounting system is becoming dated, and the thought of 5,000 consultants all submitting timesheets on the last day of the week is somewhat overwhelming. To overcome this, the architect for the system has decided to implement a message queue between the Web Service and the Accounting system, as shown in Figure 2.
The responsibility of the queue will be to batch requests from the clients before submitting them to the accounting system. This asynchronous design will avoid overloading the accounting system with too many concurrent requests, and at the same time "release" the Smart Client application to do more tasks (that is, the Smart Client doesn't have to wait until the accounting system has processed the timesheet before moving on to other tasks).
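The batching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory `queue.Queue` stands in for a real message queue product, and the `drain_batch` helper, the batch size of three, and the dictionary shape of a timesheet are hypothetical and not part of the Contoso design.

```python
import queue

def drain_batch(q, max_batch):
    """Remove up to max_batch items from the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch

# The Web Service enqueues each submission and returns to the client at once.
pending = queue.Queue()
for consultant_id in range(1, 8):
    pending.put({"consultant": consultant_id, "hours": 40})

# The accounting side drains the queue in small batches at its own pace,
# so it never sees more than max_batch timesheets at a time.
batches = []
while not pending.empty():
    batches.append(drain_batch(pending, max_batch=3))

batch_sizes = [len(b) for b in batches]
print(batch_sizes)
```

Seven submissions are handed to the accounting side as batches of three, three, and one, while each client was released as soon as its message was queued.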
This all looks good in principle, but the IT group notices one potential problem. What happens if there is a problem with the timesheet submission?
Imagine the following sequence: A consultant submits a timesheet using the Web Service in Figure 2. Everything works and the timesheet request is placed on the message queue. After a couple of hours (it's Friday evening and the system is busy), the accounting system reads the timesheet from the message queue. In this process it's discovered that the consultant has incorrectly assigned some hours to a project that was previously marked as complete. The accounting system needs this information before it can continue to process the request.
Using the current design, what are the options?
The accounting system could raise an alert to a member of the accounting staff (maybe a system message) to indicate that more information is required. The member of the accounting staff could then go chase the consultant for the correct data. This would close the loop, but is still a manual task—and will only scale so far with the number of people in the organization.
Alternatively, the accounting system could raise an alert to the consultant directly—maybe sending an e-mail to the consultant for the additional information. Again, this should close the loop, but the request for information is still disconnected from the original process. How does the consultant correlate the e-mail to the timesheet submitted? How does the e-mail accurately describe what information is missing? How does the accounting system correlate a new or modified timesheet submission with the old timesheet and the e-mail that was sent? How does the accounting system follow up if the consultant just deletes the e-mail? What happens to the existing submission? There is a possibility this loop will never close.
Let's take a step back and ask why an e-mail was required in the first place. Why couldn't the accounting system just communicate the information directly to the Smart Client application? The answer: HTTP is a request/response protocol.
Once the timesheet has been submitted and the HTTP request/response has been completed, as shown in Figure 3, it's nearly impossible to communicate back to the client for more information. Even if the client is running a local Web Server (to accept incoming Web Services requests), what happens if they go offline, or are behind a firewall at a customer's site? What if their IP address and/or hostname has changed since the last communication? Even more frightening is the question of who manages the Web server implementations on each of the 5,000 consultants' laptops.
Because HTTP is a request/response protocol, it's very difficult for the service to follow up with the client, so alternative asynchronous measures (that is, the e-mail) have to be taken. Unfortunately, because this e-mail is effectively "disconnected" from the original request, it can often take a lot more work to correlate what needs to happen.
To see how we can design a solution using transports other than HTTP, let's look at a couple of alternate approaches.
Coming to the conclusion that HTTP was the culprit was easy. Our first approach for a solution requires thinking about an alternative transport to replace or retrofit the HTTP connections in our design.
Taking our existing design, let's replace the HTTP communication with a message queue. The implementation is unimportant at this stage (it could be MSMQ, IBM MQ Series, Tuxedo, and so on), as long as it's able to reliably handle asynchronous requests.
It's important to note that we are still going to be using Web Services—we are very much sending and receiving SOAP messages, except they are being sent using an asynchronous message queue as opposed to HTTP. As illustrated, we can use some kind of Web Service enabled transport for the message queue. So, how does this architecture work with our new scenario?
The Smart Client submits a timesheet (and a corresponding Web Service request is created). This request is placed on the message queue directly instead of being sent over HTTP. Once the message is placed on the queue, the client can disconnect safely.
The server-based Web Service will in turn pick up this message, and then communicate with the accounting system to process the request. In the case where the accounting system has to ask for additional information, a new message (a request to the consultant) is placed on the queue—the addressee is the Smart Client application.
Figure 1. A Web Services Façade is used to expose the Web Service
Figure 2. A Message Queue is used to batch requests from clients to the mainframe
Figure 3. Asynchronous response back to the client is difficult
Figure 4. A timesheet is submitted directly to the message queue
Figure 5. The accounting system requests more information using the message queue
Figure 6. Using SMTP for the Web Services Request
Figure 7. A local message queue is used to hold the request offline
This message will remain on the queue until the Smart Client reconnects. To ensure that the message is picked up in a timely fashion, we may consider a background service on the client that connects to the message queue, and launches the timesheet application "missing information" dialog box for the consultant.
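To make the exchange in Figures 4 and 5 more concrete, here is a minimal Python sketch. It rests on several assumptions that do not come from the design above: the `QueueTransport` class is an in-memory stand-in for a product such as MSMQ, the queue names and body element names are invented, and the `MessageID` and `RelatesTo` headers are borrowed from WS-Addressing as one plausible way to tie the follow-up question to the original submission.

```python
import uuid
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
WSA = "http://www.w3.org/2005/08/addressing"

def make_envelope(to, body_tag, body_text, relates_to=None):
    """Build a SOAP envelope and return (message_id, xml_text)."""
    env = ET.Element(f"{{{SOAP}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP}}}Header")
    ET.SubElement(header, f"{{{WSA}}}To").text = to
    message_id = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(header, f"{{{WSA}}}MessageID").text = message_id
    if relates_to:
        ET.SubElement(header, f"{{{WSA}}}RelatesTo").text = relates_to
    body = ET.SubElement(env, f"{{{SOAP}}}Body")
    ET.SubElement(body, body_tag).text = body_text
    return message_id, ET.tostring(env, encoding="unicode")

def header_value(envelope, name):
    """Read one addressing header from an envelope, or None if absent."""
    node = ET.fromstring(envelope).find(f"{{{SOAP}}}Header/{{{WSA}}}{name}")
    return node.text if node is not None else None

class QueueTransport:
    """Hypothetical in-memory message queue: one FIFO per addressee."""
    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, to, envelope):
        self._queues[to].append(envelope)

    def receive(self, addressee):
        q = self._queues[addressee]
        return q.popleft() if q else None

mq = QueueTransport()

# 1. The Smart Client queues the submission and can then disconnect.
submit_id, submission = make_envelope(
    "queue:timesheet-service", "SubmitTimesheet", "week=27;project=A17;hours=40")
mq.send("queue:timesheet-service", submission)

# 2. The service picks it up later and queues a question for the client.
incoming = mq.receive("queue:timesheet-service")
_, question = make_envelope(
    "queue:client-jdoe", "MoreInformationRequired",
    "Project A17 is marked complete",
    relates_to=header_value(incoming, "MessageID"))
mq.send("queue:client-jdoe", question)

# 3. When the Smart Client reconnects, the question is still waiting and
#    carries the ID of the submission it refers to.
waiting = mq.receive("queue:client-jdoe")
correlated = header_value(waiting, "RelatesTo") == submit_id
print(correlated)
```

Because the follow-up travels as a SOAP message with an explicit reference to the original request, the client application can match it to the right timesheet without any manual correlation by the consultant.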
This provides a solution to our closed loop issue, and could well offer a more automated approach, but it has one flaw. The Smart Client must be able to connect to the message queue in order to process incoming requests from the accounting system. The majority of message queue vendors do this by accessing some proprietary message queue APIs. What if the consultant is on the road? What if the consultant only has Web and e-mail access in an airport? These messages are not going to be picked up until he or she connects to the corporate network, which could be unacceptable.
Using another transport, let's look at a second approach:
One of the main problems with the original design is that once the accounting system sent the e-mail to the consultant for more information, it effectively created an open loop. This design relies on the consultant having to manually associate the incoming text mail message with the process in the application.
The transport itself, however, is reasonably effective. With the reliability of e-mail these days, it's more than likely that the consultant will have received the e-mail.
With this in mind, we could consider a new design that builds on this:
Here, a Web Service request over HTTP is still used to make the timesheet submission. As with the original design, this is committed to a message queue to release the Smart Client connection. In this design, if there is a problem with the timesheet, an e-mail is sent—but not an e-mail to the consultant. Instead, an e-mail is generated that contains a Web Service request for the originating Smart Client application. What we are doing is initiating a Web Service request for the additional information, but using SMTP as the transport.
The Smart Client needs a couple of modifications to make this work. First, there needs to be some way of retrieving the SOAP request using e-mail; this could either be a filter on the consultant's inbox, or a separate e-mail account for the Smart Client application. Second, once the e-mail is received, the Smart Client application must process it and cause the correct action to happen on the client (a dialog asking the consultant for the missing information, for example). This approach has several advantages: it uses existing transports, it allows the accounting application to initiate a request to the Smart Client application, and (providing we can access e-mail from a remote location) it does not restrict the consultant to having to connect to the corporate network to submit timesheets. Remember also that because the request is a Web Service, other standards (such as WS-Security) can be equally applied, providing integrity and confidentiality for the message even though it's being sent over public SMTP servers.
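As a rough illustration of the two client-side modifications, the following Python sketch wraps a SOAP request in an e-mail message and then filters and unpacks it on the receiving side, using only the standard `email` package. The addresses, the `urn:contoso:RequestMoreInformation` action, and the use of a `SOAPAction` mail header plus a `text/xml` content type as the inbox filter are all assumptions made for this example; an actual SOAP-over-SMTP binding may mark messages differently. Nothing is sent over the network here; in practice the message object could be handed to `smtplib.SMTP.send_message`.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def wrap_soap_in_email(soap_xml, sender, recipient, soap_action):
    """Accounting side: carry a SOAP request as the body of an e-mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SOAP request"
    msg["SOAPAction"] = soap_action           # assumed marker header
    msg.set_content(soap_xml, subtype="xml")  # Content-Type: text/xml
    return msg

def extract_soap(raw_bytes):
    """Client side: return (action, soap_xml) for SOAP mail, else None."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    if msg.get_content_type() != "text/xml" or msg["SOAPAction"] is None:
        return None  # ordinary mail, leave it for the consultant
    return str(msg["SOAPAction"]), msg.get_content().strip()

soap_xml = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><RequestMoreInformation timesheet="2005-W27">'
    'Project A17 is marked complete'
    '</RequestMoreInformation></soap:Body></soap:Envelope>'
)

mail = wrap_soap_in_email(
    soap_xml,
    sender="accounting@contoso.example",
    recipient="jdoe-smartclient@contoso.example",
    soap_action="urn:contoso:RequestMoreInformation",
)

# Simulate the message arriving in the Smart Client's mailbox as raw bytes.
result = extract_soap(mail.as_bytes())

# A normal human-readable e-mail is ignored by the filter.
plain = EmailMessage()
plain["Subject"] = "Lunch?"
plain.set_content("Are you free at noon?")
ignored = extract_soap(plain.as_bytes())
```

Once `extract_soap` returns a request, the Smart Client can dispatch on the action and open the appropriate dialog for the consultant.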
This concept of asynchronous connections can also equally apply on the client. Taking our previous example, let's imagine that the consultant is about to submit a timesheet. They generate the timesheet within the Smart Client application and then submit (using HTTP) to the Web Service.
This works perfectly well, providing there is a connection to the Web Service. What happens when the consultant submits the timesheet at a location where there is no connectivity (when he or she is on an airplane at 30,000 feet, between customer sites, for example)? In this instance we would have to think of some kind of offline approach. Upon clicking the "Submit" button, the design of the Smart Client would have to detect that there was no network connection and the operation would be suspended or saved to a local database or queue.
Another alternative is to consider a second transport. Instead of using HTTP directly from the Smart Client, we could consider a local queuing transport to provide this offline functionality automatically.
Here, a local queue (using MSMQ, for example) is installed on the consultant's laptop. Instead of using HTTP, the SOAP request is placed on the queue by default. A second process, potentially running in the background on the consultant's machine, would monitor the local MSMQ instance for new messages and, on a frequent basis, would check to see whether a connection to the HTTP-based Web Service could be established. Once these two can be connected, the message is forwarded between the transports.
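The forwarding process described here might look something like the following Python sketch. It is a simplified stand-in, not an MSMQ implementation: the `OfflineForwarder` class, the `is_online` probe, and the `send_over_http` callable are hypothetical names, and the in-memory `deque` would not survive a restart the way a durable local queue would.

```python
from collections import deque

class OfflineForwarder:
    """Holds SOAP requests locally and forwards them when a link exists."""

    def __init__(self, is_online, send_over_http):
        self._outbox = deque()
        self._is_online = is_online   # callable returning True or False
        self._send = send_over_http   # callable taking one SOAP request

    def submit(self, soap_request):
        """Called by the Smart Client; always succeeds, even offline."""
        self._outbox.append(soap_request)

    def pump(self):
        """Called periodically by the background process."""
        forwarded = 0
        while self._outbox and self._is_online():
            # Remove the message only after the send has returned, so a
            # failure (an exception) leaves it on the queue for next time.
            self._send(self._outbox[0])
            self._outbox.popleft()
            forwarded += 1
        return forwarded

    def pending(self):
        return len(self._outbox)

# Simulated connectivity and a list standing in for the HTTP Web Service.
network = {"up": False}
delivered = []
forwarder = OfflineForwarder(lambda: network["up"], delivered.append)

forwarder.submit("<soap>timesheet week 27</soap>")
forwarder.submit("<soap>timesheet week 28</soap>")

sent_while_offline = forwarder.pump()   # on the airplane: nothing leaves
network["up"] = True
sent_after_landing = forwarder.pump()   # back online: queue is drained
```

The Smart Client code path is the same whether or not a connection exists; only the background process needs to know about connectivity.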
Figure 8. Two applications on the same machine communicating using HTTP
Figure 9. TCP or In Process provides a more direct way for local applications to communicate
For Smart Client applications, the use of alternative Web Services transports also opens up other options:
Imagine that you have two Smart Client applications running on the same machine that need to communicate. Initiating calls using Web Services over HTTP could be overkill, as it would require a local instance of a Web server, and each request would likely traverse the network stack on the machine.
A more efficient way of doing this could be to use either a TCP (socket) based transport or an in-process (or shared memory) transport. Here, as shown in Figure 9, the two applications on the same machine can communicate using standard Web Service requests and responses, but using a lightweight and manageable transport.
In addition, using our timesheet example, what if we wanted to implement a way of logging all Web Service requests (for auditing purposes)? We would probably approach this by creating a log of the message before it leaves for the service, which would involve a filter or class to take the message to the database.
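A minimal sketch of the in-process idea, in Python, is shown here. The `InProcessTransport` class, the `local://timesheet-cache` address, and the simplified namespace-free envelope are all assumptions for illustration; the point is only that the caller still exchanges XML request and response messages, while delivery is a direct function call with no Web server or socket involved.

```python
import xml.etree.ElementTree as ET

class InProcessTransport:
    """Delivers a request by calling the target's handler directly."""

    def __init__(self):
        self._endpoints = {}

    def register(self, address, handler):
        self._endpoints[address] = handler

    def request(self, address, soap_request):
        # No HTTP listener and no network stack: just a method call.
        return self._endpoints[address](soap_request)

def hours_service(soap_request):
    """Hypothetical local service: reports hours booked for a week."""
    week = ET.fromstring(soap_request).findtext(".//Week")
    return f"<Envelope><Body><Hours week='{week}'>40</Hours></Body></Envelope>"

transport = InProcessTransport()
transport.register("local://timesheet-cache", hours_service)

reply = transport.request(
    "local://timesheet-cache",
    "<Envelope><Body><GetHours><Week>27</Week></GetHours></Body></Envelope>",
)
hours = ET.fromstring(reply).find(".//Hours")
```

If the second application later moves to another machine, only the transport object needs to change; the request and response messages stay the same.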
This works, but an easier approach (as shown in Figure 10) may be to implement a Web Services transport to do this. A transport could use a SQL database to log outgoing requests, yet to the Smart Client application it looks like just another transport.
Here, the Web Service request is sent using two transports. The first goes to the intended recipient (using HTTP). The second is sent to the database for logging using a SQL transport.
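One way to picture this is a transport that simply fans each outgoing message out to several other transports. The Python sketch below is an assumption-laden illustration: `sqlite3` with an in-memory database stands in for the SQL database, `RecordingTransport` stands in for the real HTTP transport so that the example runs without a network, and the class names, table name, and service URL are invented.

```python
import sqlite3

class SqlLogTransport:
    """Treats a SQL table as just another destination for messages."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS soap_log ("
            "id INTEGER PRIMARY KEY, destination TEXT, envelope TEXT)")

    def send(self, destination, envelope):
        with self._db:  # commits on success
            self._db.execute(
                "INSERT INTO soap_log (destination, envelope) VALUES (?, ?)",
                (destination, envelope))

class RecordingTransport:
    """Stand-in for the HTTP transport; it only remembers what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, destination, envelope):
        self.sent.append((destination, envelope))

class FanOutTransport:
    """Sends every message over each of the wrapped transports in turn."""

    def __init__(self, *transports):
        self._transports = transports

    def send(self, destination, envelope):
        for transport in self._transports:
            transport.send(destination, envelope)

db = sqlite3.connect(":memory:")
http = RecordingTransport()
transport = FanOutTransport(http, SqlLogTransport(db))

# The Smart Client makes one call; both transports receive the message.
transport.send("http://timesheets.contoso.example/submit",
               "<soap>timesheet week 27</soap>")

logged = db.execute("SELECT destination, envelope FROM soap_log").fetchall()
```

The application code contains no logging logic at all, which is the appeal of pushing the audit trail down into the transport layer.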
Figure 10. Web Services transport is used to log message to SQL
Figure 11. Bob's Web Service processes requests and responses over SMTP
Figure 12. Joe's Laptop sends 50 Web Services Requests using SMTP
Figure 13. The Smart Client Application processes the responses returned
Finally, another area that has great potential for alternate Web Service transports is peer-to-peer computing. Let's take a look at an example:
Bob is a consultant at Contoso. On his laptop, he has a directory of PowerPoint slides that he uses for presentations with customers. This directory travels with him wherever he goes. It's constantly being worked on, and must work in both online and offline scenarios.
Being a good citizen, Bob wishes to share these PowerPoint slides with his fellow co-workers, both inside the company and with members of other organizations. Many people e-mail him today asking whether he has a particular PowerPoint slide on a topic. While this works, searching for the slides and replying to these requests is consuming a lot of Bob's time.
Bob is considering building a centralized Web Service to host his PowerPoint slides. It should be available to everyone, yet he must be able to access these in offline scenarios. He considers the steps required to implement such a service.
- Setup of a Central Server. Bob is going to have to take his directory of slides and host it somewhere centrally. This will include not only finding enough disk space, but also a consideration for managing backups and updating with the latest versions.
- Exposing a Web Service. With the server setup, Bob is going to have to install a Web server on the machine, create a Web Service, and work with the local IT group to make sure that it is correctly hosted behind Contoso's Firewall (probably in the DMZ).
- Create a Smart Client Application to Access. Bob is thinking of creating a Smart Client application that will let him keep an offline version of the slides he needs at a moment's notice.
Bob looks at this—it certainly looks like a lot of work, plus he's uncertain how well it will scale. How about if the other 5,000 consultants in the organization want to do something similar? Will they have to go through the same approach? How about if they are not as technically savvy as Bob?
Bob takes a step back and thinks about why he wants to do this. The current system works pretty well; it's just that he gets flooded with too many e-mail requests about presentations that he has recently delivered, or might have on hand.
He could potentially create a Web Service on his laptop to handle these incoming requests—the Web Service could search his directory of PowerPoint slides and retrieve certain ones for clients. The problem with this approach using HTTP is that Bob's laptop has to be on and accessible for this to work. Generally, Bob is out of the office a lot, and how does he allow access to his laptop through a Firewall for external customers? It's looking fairly unmanageable.
After reading about using alternate transports for Web Services, Bob comes up with a new design:
He is going to create a Web Service for his laptop, but instead of accepting incoming connections over HTTP he will use SMTP (e-mail), as shown in Figure 11. Clients can send him Web Service requests to search and retrieve his local store of PowerPoint files. To do this, Bob will create a small Smart Client application that generates these requests.
The beauty of this design is that Bob and others can now take advantage of the distributed functionality that e-mail provides. Bob shares his new Web Service application with 50 of the other consultants at Contoso. What we now have is a very dynamic way of using Web Services to look up PowerPoint files that are held locally on a number of machines.
For example, Joe is looking for a PowerPoint presentation on the topic of C#. He enters the query "C#" into a Smart Client application. This creates a Web Service request that is sent using SMTP to an e-mail distribution list that contains the 50 consultants running Bob's Web Service.
Once the message is received, the Web Service running on each of these laptops performs a search based on Joe's criteria. The list of results is then sent back to Joe's calling application, which can display them as responses are received (again, using SMTP).
Joe can now start searching through the results as they come back (remember, just like e-mail he doesn't need everyone to reply—just enough people that have the PowerPoint slide he is looking for). When he finds the correct one, a similar request, using Web Services over SMTP, can be made to actually acquire the presentation.
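The search flow in Figures 12 and 13 can be simulated in a short Python sketch. Everything here is a stand-in chosen for illustration: plain dictionaries take the place of SOAP envelopes, in-memory mailboxes take the place of SMTP, and the addresses, file names, and the `SlidePeer` class are invented. It shows the distribution list fanning a single query out to several laptops, each replying independently, and the caller working with whatever replies have arrived.

```python
import uuid
from collections import defaultdict, deque

mailboxes = defaultdict(deque)
distribution_lists = {
    "slides@contoso.example": [
        "bob@contoso.example", "ann@contoso.example", "raj@contoso.example"],
}

def send(to, message):
    """Deliver to one mailbox, or to every member of a distribution list."""
    for address in distribution_lists.get(to, [to]):
        mailboxes[address].append(message)

class SlidePeer:
    """The service running on one consultant's laptop."""

    def __init__(self, address, slide_files):
        self.address = address
        self.slide_files = slide_files

    def process_inbox(self):
        """Runs whenever the laptop is online and mail is collected."""
        while mailboxes[self.address]:
            query = mailboxes[self.address].popleft()
            hits = [name for name in self.slide_files
                    if query["terms"].lower() in name.lower()]
            if hits:  # peers with nothing to offer stay silent
                send(query["reply_to"], {"relates_to": query["id"],
                                         "from": self.address,
                                         "hits": hits})

bob = SlidePeer("bob@contoso.example", ["C# Generics.ppt", "Indigo Overview.ppt"])
ann = SlidePeer("ann@contoso.example", ["Intro to C#.ppt"])
raj = SlidePeer("raj@contoso.example", ["C# Interop.ppt"])

# Joe sends one search request to the whole list.
query_id = str(uuid.uuid4())
send("slides@contoso.example",
     {"id": query_id, "terms": "C#", "reply_to": "joe@contoso.example"})

# Bob and Ann are online; Raj's laptop is switched off for now.
bob.process_inbox()
ann.process_inbox()

# Joe's application shows the replies that match his query so far.
replies = [m for m in mailboxes["joe@contoso.example"]
           if m["relates_to"] == query_id]
```

Raj's copy of the query simply waits in his mailbox until his laptop next connects, and Joe does not need to wait for it.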
The approaches you have seen in this article may raise more questions than they provide answers for. Hopefully, though, you can see that using Web Services with alternate transports can open up a new range of applications that have until now been restricted by the use of HTTP.
One of the questions that you may have is "When should I implement a transport other than HTTP?" To help answer, and to summarize the scenarios listed in this article, you can refer to Figure 14.
Although a relatively new area, significant progress is being made around implementing alternative transports for Web Services. These include:
"Indigo"
Indigo is the codename for the next generation distributed computing environment from Microsoft. Indigo offers the promise of multiple transports for Web Services, together with a unified programming model.
Indigo natively supports HTTP, TCP, and MSMQ in the March 2005 CTP. The programming model allows an easy-to-extend interface for other transports.
WSE (Web Services Enhancements)
For those wanting to implement this today, alternate transports can also be realized using WSE. WSE provides an API called a custom transport, which allows transports other than HTTP to be used. Custom transports today include samples for MSMQ (https://www.codeproject.com/cs/webservices/SoapMSMQ.asp), IBM MQ Series (https://workspaces.gotdotnet.com/wsemqs), SMTP (https://hyperthink.net), UDP (https://dynamic-cast.com), TCP, and In Process (both samples ship with WSE).*
*Please note: Interoperability using TCP is not supported using WSE.
JMS (Java Message Service) API
A number of Java application server vendors are now providing Web Services support through JMS. This allows SOAP requests and responses to be processed on a JMS queue.
JAXMail (Java API for XML Mail)
JAXMail (part of Sun JWSDP) is a JAX-RPC (Java Web Services) extension to provide support for the SMTP protocol.
Figure 14. Alternative Transports
Simon Guest is a Program Manager in the Architecture Strategy team at Microsoft Corporation, and specializes in interoperability and integration. Simon holds a Masters Degree in IT Security from the University of Westminster, London, and is the author of the Microsoft .NET and J2EE Interoperability Toolkit (Microsoft Press, Sept. 2003).
Simon can be reached via his blog at https://www.simonguest.com.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.