Building Search for Your Web Site with the Windows Live Search Service

Implement Bing on your own Web site with code and instructions provided in this article.

Adding search capabilities to a Web site can be daunting: indexing, relevance logic, and hosting concerns such as CPU and storage are all difficult problems to solve. As a potential solution, Microsoft offers Web site developers the ability to use the Windows Live Search Service on their sites. This article explains how site search was implemented for dev.live.com and, at the same time, offers a starting point for adding site search to your own Web site.

Note: Microsoft allows you to use the Web service for up to 25,000 queries per day. If your traffic exceeds 25,000 queries per day, contact Search API TOU feedback to enable search beyond that limit.

The search feature on dev.live.com is itself a sample application that uses the Windows Live Search SOAP APIs from an ASP.NET Web server. You can use this sample application to add search to your Web site in the following steps (estimated at 15 minutes):

  1. Download and install the Anti-Cross Site Scripting Library from MSDN, and copy the AntiXSSLibrary.dll that matches your CLR version from \program files\microsoft\Anti-Cross Site Scripting Library V1.0\Library to the \bin directory at the root of your ASP.NET Web server.

  2. Copy webservices.discomap and webservices.wsdl into \App_WebReferences\com\msn\search\soap at the root of your ASP.NET Web server.

  3. Place searchProxy.aspx in the same directory as your search page.

  4. Place searchClient.js in your scripts folder.

  5. Create two div tags in your search page: one for showing instructions and the other for showing search results.

  6. Create a script tag in your search page that references searchClient.js.

  7. Create a script tag in your search page with the following content:

    <script language='javascript' type="text/javascript">
    //
    // instantiate a searchclient by
    // - passing in the id of the div to show instructions as the first parameter
    // - passing in the id of the div to show search results as the second parameter
    // - passing in the scope to search, by following the (site:xxx OR ) syntax
    //
    var sc = new searchClient('keyAndCount', 'returnArea', "(site:msdn.microsoft.com OR site:dev.live.com OR site:microsoftgadgets.com OR site:forums.microsoft.com)");
    </script>
    
  8. Add a text box to enter the search keyword and a button to execute search.

    <table>
     <tr>
      <td>
       <input
        type="text"
        id="keyword"
        title="Enter search keywords"
        size="30"
        value="" />
      </td>
      <td>
       <input type="button"   onclick="sc.search(document.getElementById('keyword').value);"   value="Search" />
      </td>
     </tr>
    </table>
    
  9. At this point, search is functional.

  10. Go to http://search.msn.com/developer to get an application ID for your search application, and replace the hard-coded application ID string in searchProxy.aspx with it.

Building this feature involves several challenges. The major ones are described in the following sections.

Application IDs

Application IDs are identification tokens that many Web service providers require to be passed with every API call. For Web service providers, their primary value is that they make it possible to monitor the traffic each application generates, and they provide a switch for turning off a specific application ID if it exhibits malicious behavior. For the purposes of this article, an application ID is acquired at http://search.msn.com/developer to get access to the SOAP APIs offered by Windows Live Search.

Cross-Zone Cross-Domain

Most Web browsers employ security models that prevent a Web page from accessing data in a different domain. These security models are primarily based on the Netscape Same Origin Policy: http://www.mozilla.org/projects/security/components/same-origin.html. Internet Explorer also enforces separation between security zones: http://msdn.microsoft.com/workshop/security/szone/overview/overview.asp. For the purposes of this article, the direct effect of these security models is that a Web page (such as your search page) cannot directly access the SOAP APIs at http://soap.search.msn.com. Instead, the page must post its search requests to your own Web server, which then sends them on to the SOAP APIs at http://soap.search.msn.com.
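To make the rule concrete, the following minimal JavaScript sketch compares the scheme, host, and port of two URLs, which is how the same-origin policy decides whether a page may read a response. The isSameOrigin name is hypothetical and not part of the sample; the code relies on the standard URL class found in current browsers and Node.js, and it does not model Internet Explorer's security zones.

```javascript
// Hypothetical helper: true when two absolute URLs share an origin, that is,
// the same scheme, host name, and port. The URL parser normalizes default
// ports away, so "http://a.com:80/" and "http://a.com/" compare as equal.
function isSameOrigin(pageUrl, targetUrl) {
  const a = new URL(pageUrl);
  const b = new URL(targetUrl);
  return a.protocol === b.protocol &&
         a.hostname === b.hostname &&
         a.port === b.port;
}

// A search page on dev.live.com may call a proxy on its own server...
console.log(isSameOrigin("http://dev.live.com/search/default.htm",
                         "http://dev.live.com/search/searchproxy.aspx")); // true
// ...but not the SOAP endpoint on another host.
console.log(isSameOrigin("http://dev.live.com/search/default.htm",
                         "http://soap.search.msn.com/"));                 // false
```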

Paging and Chunking

Web services can often return large amounts of data, too much to download all at once. A paging mechanism, in which users can retrieve more data as they need it, is required to manage this chunking.
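As a minimal sketch of the paging arithmetic (the searchClient class later in this article multiplies a page index by a fixed result count), the hypothetical pageOffsets function below lists the offsets a client would request. The defaults of 10 results per request and a maximum page index of 25 mirror the sample's m_iCount and m_iPageMax values.

```javascript
// Hypothetical helper: offsets requested for page indexes 0..maxPage when
// each request returns `count` results. Page p starts at offset p * count.
function pageOffsets(count = 10, maxPage = 25) {
  const offsets = [];
  for (let page = 0; page <= maxPage; page++) {
    offsets.push(page * count);
  }
  return offsets;
}

const offsets = pageOffsets();
console.log(offsets.length);              // 26 requests at most
console.log(offsets[offsets.length - 1]); // 250, the last offset requested
```

With these defaults the last offset is 250, which is also the largest offset that searchProxy.aspx accepts, so the client never asks for a page the proxy would reset to offset 0.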

Asynchronous Control Flow

The latency of a Web service call is far larger than that of a typical API call made locally on your computer. To build a usable application that makes Web service calls, structure your code with three goals in mind: first, your code makes the Web service call; second, control is returned immediately so that the user can keep interacting with the application while it waits for a response; and third, your code handles the response when the Web service returns it. With this way of managing control flow (typically referred to as callbacks), you must explicitly maintain the application state and piece together the intended processing of data across multiple non-sequential function calls.

What makes this even more complex is that there is no guarantee that your code will get a response from the Web service, and the order in which responses arrive is unpredictable. (For example, you may issue Web service calls in the order A -> B, but the responses may arrive as B -> A, as B and no A, as A and no B, or not at all.)
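The sample copes with this by refusing to start a new search while a request is outstanding. A common alternative, shown here as a hypothetical sketch that is not part of the sample, is to tag every request with a sequence number kept in the object's state and to ignore any response that does not belong to the most recent request.

```javascript
// Hypothetical tracker: each issued request gets an increasing token, and
// only a response carrying the latest token is accepted. Responses for
// superseded requests (late or out of order) are discarded.
function createRequestTracker() {
  let latest = 0; // token of the most recently issued request
  return {
    issue() {
      latest += 1;
      return latest;
    },
    isCurrent(token) {
      return token === latest;
    }
  };
}

const tracker = createRequestTracker();
const a = tracker.issue(); // request A
const b = tracker.issue(); // request B supersedes A
// Responses arrive as B -> A:
console.log(tracker.isCurrent(b)); // true: render B's results
console.log(tracker.isCurrent(a)); // false: ignore A's stale results
```

A response that never arrives still needs a timeout, which this sketch does not cover.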

Cross Site Scripting

Cross-site scripting, often referred to as CSS or XSS, is a vulnerability in a Web site that permits an attacker to exploit the trust relationship that users have with that site. It is caused by the site's failure to validate or encode user input before returning it to the client's Web browser. The crux is that malicious input causes a legitimate Web server to send a page that contains script and HTML of the attacker's choosing, and it is difficult for users to tell that the page they are seeing is the result of such input.

How This Sample Can Help You

This sample works through each of these challenges and provides the following solutions:

  • The sample uses an application ID and includes tips for managing it.

  • The sample contains a server-side proxy, which is one approach to coping with cross-zone, cross-domain restrictions.

  • The sample demonstrates a scrolling behavior that copes with paging and chunking.

  • The sample uses a timer to fetch more results if the user has scrolled close to the end of the current results.

  • The sample issues Web service calls asynchronously, while maintaining the application state as member variables of a JavaScript class.

  • The sample uses the Anti-XSS Library, downloadable from MSDN, which is recommended over Server.HtmlEncode in ASP.NET.

Let's take a closer look.

Factoring and Files

There are three files in this sample:

default.htm

This file includes all layout definitions. It also includes event handlers that update the layout when the browser window is resized.

<html>
  <head>
    <title>Search Sample</title>
    <script src="searchClient.js" type="text/javascript"></script>
    <script src="layoutLibrary.js" type="text/javascript"></script>

These definitions reference searchClient.js, which contains a JavaScript class that encapsulates paging/chunking and asynchronous control flow, and layoutLibrary.js, which provides the window-size helpers used by winResize.

<script language='javascript' type="text/javascript">
  //
  // instantiate a searchclient by
  // - passing in the id of the div to show instructions as the first parameter
  // - passing in the id of the div to show search results as the second parameter
  // - passing in the scope to search, by following the (site:xxx OR ) syntax
  //
  var sc = new searchClient('keyAndCount', 'returnArea',
    "(site:msdn.microsoft.com OR site:dev.live.com OR " +
    "site:microsoftgadgets.com OR site:forums.microsoft.com)");

This statement creates a searchClient instance, passing in the IDs of two DIV tags and a search scope. To implement site search, modify the search scope so that it includes only your own site.
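The scope string is simply a parenthesized list of site: operators joined by OR. As a minimal sketch, a page could assemble it from a list of domains; the buildScope helper is hypothetical (not part of the sample), and www.example.com stands in for your own domain.

```javascript
// Hypothetical helper: builds a "(site:a OR site:b ...)" scope string
// from a list of domain names, in the syntax searchClient expects.
function buildScope(domains) {
  return "(" + domains.map(function (d) { return "site:" + d; }).join(" OR ") + ")";
}

// Site search for a single domain:
console.log(buildScope(["www.example.com"])); // (site:www.example.com)
```

The result can be passed straight to the constructor, for example new searchClient('keyAndCount', 'returnArea', buildScope(["www.example.com"])).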

//
// register events
//
window.onload=winLoad;
window.onresize=winResize;

function winLoad() {
     document.getElementById("keyword").focus();
     winResize();
}

function winResize() {
    var oReturn = document.getElementById('returnArea');
    var oKeyAndCount = document.getElementById('keyAndCount');
    var util = new layoutLibrary();
    var iHeight =
        util.getWindowHeight() -
        (util.getOffsetTop(oKeyAndCount) + oKeyAndCount.offsetHeight) -
        50;
    // a negative CSS height is invalid, so clamp it at zero
    oReturn.style.height = Math.max(0, iHeight) + "px";
}

function searchOnKeyPress(e) {
    if (e.keyCode == 13) {
        sc.search(document.getElementById("keyword").value);
        return false;
    } else {
        return true;
    }
}

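These event handlers improve the look and feel of the search page: the keyword box receives focus when the page loads, the results area is resized to fill the browser window, and pressing ENTER in the keyword box starts a search.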

        </script>
    </head>
    <body>
      <table>
        <tr>
          <td>
            <input
               type="text"
               id="keyword"
               title="Enter search keywords"
               size="30"
               onkeypress="return searchOnKeyPress(event)" 
               value="" />
          </td>
          <td>
            <input
                type="button"
                onclick="sc.search(document.getElementById('keyword').value);"
                value="Search" />
          </td>
        </tr>
      </table>

      <div class="CommonContentArea">
        <div class="CommonContent">
          <div id="keyAndCount">
            Enter search keywords
          </div>
          <hr />
          <div id="returnArea"
               style="overflow:auto;height:auto">
          </div>
        </div>
      </div>
    </body>
</html>

As you can see, the layout of the page in the sample is quite simple. It has two input elements for user interaction and two div tags that show the responses from the Web service.
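The sizing rule in winResize can be isolated as a pure function, which makes the arithmetic easier to check. In this hypothetical sketch (resultAreaHeight is not part of the sample), the results area receives the window height minus the bottom edge of the keyAndCount DIV and the sample's 50-pixel margin, clamped at zero because a negative CSS height is invalid.

```javascript
// Hypothetical helper mirroring winResize(): pixels available to the results
// DIV below the keyAndCount DIV, leaving a fixed margin at the bottom.
function resultAreaHeight(windowHeight, keyTop, keyHeight, margin = 50) {
  const h = windowHeight - (keyTop + keyHeight) - margin;
  return Math.max(0, h) + "px"; // never produce a negative CSS height
}

console.log(resultAreaHeight(800, 60, 20)); // 670px
console.log(resultAreaHeight(100, 60, 20)); // 0px
```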

searchClient.js

This file encapsulates most of the challenges described earlier.

// JScript File - searchClient class
//
// The constructor accepts the ID of a DIV tag where instructions are shown,
// and the ID of a DIV tag where search results are shown.
// There is only one public method, namely search(keyword).
// The search results are shown in a way similar to what www.live.com had at
// one point where results can be scrolled through.
//
// This client side AJAX class is designed specifically to be used together
// with searchProxy.aspx.
//
function searchClient(eleInstruction, eleResult, strScope) {
     //
     // member variables
     //
     var m_strScope = strScope;
     var m_iInterval = null;
     var m_iPage = 0; // offset page count for search results
     var m_iPageMax = 25; // bounds to ensure no endless requests
     var m_iCount = 10; // count of search results
     var m_strKey; // keyword being searched
     var m_xmlhttp = null;
     var m_oInstructions;
     var m_oResult;
     //
     // member methods
     //
     this.search = search;

The class exposes only one public method, search, and no public properties.

//
// m_xmlhttp creation
//
if (typeof window.XMLHttpRequest != "undefined") {
    m_xmlhttp = new window.XMLHttpRequest();
} else if (window.ActiveXObject) {
      var ver = [
          "MSXML2.XMLHttp.5.0",
          "MSXML2.XMLHttp.4.0",
          "MSXML2.XMLHttp.3.0",
          "MSXML2.XMLHttp",
          "Microsoft.XMLHttp"];
      for (var i = 0; i < ver.length; i++) {
           try { m_xmlhttp = new window.ActiveXObject(ver[i]); break; }
           catch (e) {}
      }
}
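if (m_xmlhttp == null) {
    // m_oResult is not set until search() runs, and the DIV may not exist yet
    var oError = document.getElementById(eleResult);
    if (oError != null) oError.innerHTML =
        "<p><h4>Failure to create m_xmlhttp</h4></p><p>Please refresh the page.</p>";
}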

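As shown in the previous example, each instance of this class creates one XMLHttpRequest object, which it uses to retrieve data from the server without navigating users away from the current page. Where no native XMLHttpRequest is available, as in older versions of Internet Explorer, the code falls back to the MSXML ActiveX objects, trying the newest version first.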
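The creation logic can also be written as a factory that receives the window object as a parameter, so that it can be exercised outside a browser. This hypothetical createXhr function (not part of the sample) follows the same order as the code above: the native XMLHttpRequest first, then the MSXML ProgIDs from newest to oldest, returning null if every attempt fails.

```javascript
// ProgIDs tried, newest first, when no native XMLHttpRequest exists.
const MSXML_PROGIDS = [
  "MSXML2.XMLHttp.5.0",
  "MSXML2.XMLHttp.4.0",
  "MSXML2.XMLHttp.3.0",
  "MSXML2.XMLHttp",
  "Microsoft.XMLHttp"
];

// Hypothetical factory: returns a request object, or null if none can be made.
function createXhr(win) {
  if (typeof win.XMLHttpRequest !== "undefined") {
    return new win.XMLHttpRequest();
  }
  if (win.ActiveXObject) {
    for (const progId of MSXML_PROGIDS) {
      try {
        return new win.ActiveXObject(progId);
      } catch (e) {
        // this ProgID is not installed; try the next one
      }
    }
  }
  return null;
}
```

In a page, the call would be m_xmlhttp = createXhr(window).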

//
// user interaction
//
function search(strKey) {
     // initialize references to elements
     m_oInstructions = document.getElementById(eleInstruction);
     m_oResult = document.getElementById(eleResult);

     if (strKey == "") {
         m_oInstructions.innerHTML = "Enter search keywords";
         return false;
     }
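     if (m_xmlhttp == null) {
         m_oResult.innerHTML = "<p><h4>Failure to create m_xmlhttp</h4></p><p>Please refresh the page.</p>";
         return false;
     }
     if (m_xmlhttp.readyState >= 1 && m_xmlhttp.readyState <= 3) {
         // prevents starting a new search while an xmlhttp request is in progress
         return false;
     }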
     m_strKey = strKey;
     m_iPage = 0;

     m_oResult.innerHTML = "";
     m_oInstructions.innerHTML = hesc(m_strKey);

     update();
     return false;
}

The public search method first checks the state of the XMLHttpRequest instance to ensure that it is not in the middle of an operation. If a non-empty keyword is provided and the XMLHttpRequest instance is ready to issue another Web service call, search resets the page index and calls the update function, which requests the first page of results.
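The numeric readyState checks in search and update refer to the standard XMLHttpRequest states. This hypothetical sketch names them, which makes the "request in progress" test (states 1 through 3) easier to read; ReadyState and isRequestInProgress are illustrative names, not part of the sample.

```javascript
// XMLHttpRequest readyState values.
const ReadyState = Object.freeze({
  UNSENT: 0,           // open() not yet called
  OPENED: 1,           // open() called
  HEADERS_RECEIVED: 2, // send() called and response headers received
  LOADING: 3,          // response body being received
  DONE: 4              // request finished, successfully or not
});

// Hypothetical helper: true while a request is outstanding.
function isRequestInProgress(readyState) {
  return readyState >= ReadyState.OPENED && readyState <= ReadyState.LOADING;
}
```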

//
// obtain search results
//
function update() {
    if (m_strKey != "" && m_xmlhttp != null &&
     (m_xmlhttp.readyState==0 || m_xmlhttp.readyState==4)) {
         try {
              m_oInstructions.innerHTML += " - Loading results...";
              // use the page's own scheme so the request stays same-origin
              var s = window.location.protocol + "//" + window.location.host +
                             window.location.pathname;
              s = s.split('/');
              s[s.length - 1] = "searchproxy.aspx";

              m_xmlhttp.open("GET",
                  s.join('/') +
                  "?searchKey=" + encodeURIComponent
                       (m_strScope + " " + m_strKey) +
                  "&searchCount=" + m_iCount +
                  "&searchOffset=" + (m_iPage * m_iCount),true);
              m_xmlhttp.onreadystatechange=xmlhttpCallback;
              clearTimer();
              m_xmlhttp.send(null);
          } catch (e) {
              document.getElementById(eleResult).innerHTML =
                  "<p><h4>Xmlhttp error</h4></p>" +
                  "<p>Please repeat the query.</p>" +
                  e;
              clearTimer();
          }
      }
}

The implementation of the Web service call itself is quite simple: an HTTP GET request is sent to searchproxy.aspx on the server from which the current page came, with the search parameters passed as URL parameters.

Note that the user input (m_strKey) and the search scope are encoded by using the JavaScript encodeURIComponent function.

The timer that evaluates whether to fetch more results is stopped while a Web service call is in progress.
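The URL that update builds can be reproduced as a pure function. In this hypothetical sketch, buildProxyUrl takes a location-like object (with the protocol, host, and pathname properties of window.location) plus the search parameters; it replaces the last path segment with searchproxy.aspx, so the request stays on the page's own origin, and it encodes the combined scope and keyword with encodeURIComponent.

```javascript
// Hypothetical helper mirroring update(): the proxy sits next to the page.
function buildProxyUrl(loc, scope, keyword, count, page) {
  const parts = (loc.protocol + "//" + loc.host + loc.pathname).split("/");
  parts[parts.length - 1] = "searchproxy.aspx"; // swap the page's file name
  return parts.join("/") +
    "?searchKey=" + encodeURIComponent(scope + " " + keyword) +
    "&searchCount=" + count +
    "&searchOffset=" + (page * count);
}

const loc = { protocol: "http:", host: "dev.live.com", pathname: "/search/default.htm" };
console.log(buildProxyUrl(loc, "(site:dev.live.com)", "gadgets & ajax", 10, 2));
// http://dev.live.com/search/searchproxy.aspx?searchKey=(site%3Adev.live.com)%20gadgets%20%26%20ajax&searchCount=10&searchOffset=20
```

Encoding matters here: without it, the & in "gadgets & ajax" would end the searchKey parameter early.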

function xmlhttpCallback() {
    try {
        if (m_xmlhttp.readyState==4) {
          if (m_xmlhttp.status ==200) {
              var s = m_xmlhttp.responseText.split('#');
              m_oInstructions.innerHTML = hesc(m_strKey) +
              "\t(" + s[0] + ")";
              if (s[0] == 0 || s[1] == "") {
                 clearTimer();
              } else {
                  s.shift();
                  document.getElementById(eleResult).innerHTML +=
                  s.join('#');
                  setTimer();
              }
          } else {
              document.getElementById(eleResult).innerHTML =
              "<p><h4>Xmlhttp error</h4></p>" +
              "<p>Please repeat the query.</p>" +
              "<p>" + m_xmlhttp.status +
              "</p>" + "<p>" + m_xmlhttp.statusText +
              "</p>" + "<p>" + m_xmlhttp.responseText + "</p>";
          }
        }
      } catch(e) {
          document.getElementById(eleResult).innerHTML =
                "<p><h4>Xmlhttp error on callback</h4></p>" +
                "<p>Please repeat the query.</p>";
      }
}

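xmlhttpCallback is called when a response is received from the server-side proxy. When the response is valid (HTTP status 200), it is in a custom format, <number of hit results>#<HTML fragment that shows search results>. The hit count is shown next to the keyword in the instructions DIV, and the HTML fragment is appended to the results DIV. If there are no hits or the fragment is empty, the timer is stopped; otherwise it is restarted so that more results can be fetched as the user scrolls.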
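Because the HTML fragment can itself contain # characters (in URL fragments, for example), the callback splits on #, removes the first element, and rejoins the rest. The same parsing, written as a hypothetical standalone function, looks like this:

```javascript
// Hypothetical parser for the proxy's "<count>#<html>" response format.
// Only the first "#" separates the fields; later ones belong to the HTML.
function parseProxyResponse(text) {
  const parts = text.split("#");
  const total = parseInt(parts.shift(), 10);
  return { total: total, html: parts.join("#") };
}

const r = parseProxyResponse('42#<h5><a href="http://example.com/page#top">Top</a></h5>');
console.log(r.total); // 42
console.log(r.html);  // <h5><a href="http://example.com/page#top">Top</a></h5>
```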

//
// timer utilities
//
function setTimer() {
    clearTimer();
    m_iInterval = window.setInterval(getMoreResults, 1000);
}
function clearTimer() {
    if (m_iInterval != null) {
       window.clearInterval(m_iInterval);
       m_iInterval = null;
    }
}

These timer utilities ensure that only one timer is ever running at a time.
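The same pattern can be packaged as a small wrapper. In this hypothetical sketch (createSingleTimer is not part of the sample), the interval functions are passed in so the wrapper can be tested without a real clock; start always clears any existing interval first, exactly as setTimer does.

```javascript
// Hypothetical single-interval wrapper: at most one interval is active.
function createSingleTimer(setIntervalFn, clearIntervalFn) {
  let id = null;
  function stop() {
    if (id !== null) {
      clearIntervalFn(id);
      id = null;
    }
  }
  function start(callback, ms) {
    stop(); // never leave an older interval running
    id = setIntervalFn(callback, ms);
  }
  return { start: start, stop: stop, isRunning: function () { return id !== null; } };
}
```

In a browser, the wrapper would be created with createSingleTimer((f, ms) => window.setInterval(f, ms), (i) => window.clearInterval(i)).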

function getMoreResults() {
    // use the result DIV passed to the constructor, not a hard-coded ID
    var oReturn = document.getElementById(eleResult);
    if (oReturn.scrollTop > oReturn.scrollHeight -
        oReturn.offsetHeight * 2) {
        m_iPage++;
        if (m_iPage > m_iPageMax) {
            clearTimer();
        } else {
            update();
        }
    }
}

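getMoreResults is called each time the timer fires, to decide whether to fetch more results. More results are requested when the scroll position is within two visible-area heights of the end of the loaded content, that is, when less than one screenful of unread results remains below the visible area. Once m_iPageMax pages have been requested, the timer is stopped.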
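The condition can be checked in isolation. In this hypothetical helper, scrollTop, scrollHeight, and offsetHeight stand for the DIV properties that getMoreResults reads; the function returns true once the content hidden below the visible area is shorter than the visible area itself.

```javascript
// Hypothetical helper mirroring getMoreResults(): true when the unread content
// below the view, scrollHeight - (scrollTop + offsetHeight), is less than
// one visible-area height (offsetHeight).
function shouldFetchMore(scrollTop, scrollHeight, offsetHeight) {
  return scrollTop > scrollHeight - offsetHeight * 2;
}

console.log(shouldFetchMore(0, 2000, 400));    // false: 1600px still unread
console.log(shouldFetchMore(1300, 2000, 400)); // true: only 300px unread
```

If the first page of results does not fill the DIV, the condition is true immediately, so the timer keeps fetching until the area can scroll or the page limit is reached.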

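    // escape functions
    function hesc(p_str) {
        // the expression must start on the same line as return; otherwise
        // automatic semicolon insertion makes the function return undefined
        return p_str.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
    }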
}

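hesc is a simple function that HTML-encodes strings, such as the user's keywords, before they are displayed in the user interface, to avoid cross-site scripting. It escapes &, <, and >, which is sufficient for text placed in element content.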
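If a string must ever be placed inside an attribute value, quotes need escaping as well. The following hypothetical variant, escapeHtml, is a sketch that extends hesc with the double and single quote characters; it is not part of the sample.

```javascript
// Hypothetical extension of hesc(): also safe inside quoted attribute values.
// "&" is replaced first so the entities added later are not escaped again.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```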

searchProxy.aspx

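This file relies on ASP.NET's integration with SOAP Web services. A proxy class generated from the service's WSDL (the same kind of class that wsdl.exe produces) is used to access the SOAP Web service. Because the WSDL was placed in \App_WebReferences\com\msn\search\soap in step 2, the generated classes are in the com.msn.search.soap namespace.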

<%@ Page Language="C#" AutoEventWireup="true" Debug="true" %>
<%@ Import Namespace="com.msn.search.soap" %>
<%@ Import Namespace="System.Web.Configuration" %>
<%@ Import Namespace="System.Globalization" %>
<%@ Import Namespace="Microsoft.Security.Application" %>

The Microsoft.Security.Application namespace is where the AntiXSSLibrary class resides.

<script language="C#" runat="server">
    void Page_Load(object sender, EventArgs e) {
      try {
          string key = Request.QueryString["searchKey"];

When a GET request to this page is received, the searchKey query string parameter is examined. If it is present and non-empty, a SOAP call is made and the result is returned by using Response.Write.

if (!String.IsNullOrEmpty(key)) {
    string offset = Request.QueryString["searchOffset"];
    string count = Request.QueryString["searchCount"];
    CultureInfo ci = new CultureInfo("en-us");
    MSNSearchService s = new MSNSearchService();
    SearchRequest sr = new SearchRequest();
    SourceRequest[] srcr = new SourceRequest[1];

The generated proxy class exposes an MSNSearchService, whose Search method takes a SearchRequest. A SearchRequest can contain multiple SourceRequests, each for a different source, such as PhoneBook, News, or Web.

                srcr[0] = new SourceRequest();
                srcr[0].Source = SourceType.Web;
                if (!String.IsNullOrEmpty(count)) {
                    int iCount = Int32.Parse(count, ci);
                    if (iCount > 0 && iCount <= 50)
                        srcr[0].Count = iCount;
                    else
                        srcr[0].Count = 10;
                } 
                else
                    srcr[0].Count = 10;
                if (!String.IsNullOrEmpty(offset)) {
                    int iOffset = Int32.Parse(offset, ci);
                    if (iOffset > 0 && iOffset <= 250)
                        srcr[0].Offset = iOffset;
                    else
                        srcr[0].Offset = 0;
                } 
                else
                    srcr[0].Offset = 0;
                sr.Requests = srcr;
                sr.CultureInfo = "en-us";
                sr.Flags = SearchFlags.MarkQueryWords;

The Windows Live Search Service can highlight the queried keywords within the search results. This is enabled by setting the Flags property of the SearchRequest to SearchFlags.MarkQueryWords.

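                sr.Query = key;

Although searchClient.js encodes the keywords with encodeURIComponent before sending them to the server-side proxy, ASP.NET has already decoded the values in Request.QueryString, so the keywords are used as-is. Decoding them a second time with Server.UrlDecode would corrupt keywords that contain characters such as + or %.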

                sr.SafeSearch = SafeSearchOptions.Strict;
                
                // The recommended way to store application IDs is in your
                // web.config file. The code would look as follows:
                //   sr.AppID = WebConfigurationManager.AppSettings.Get("appIDSearch");
                // This would look up the configuration defined as:
                //   <configuration>
                //     <appSettings>
                //       <add key="appIDSearch"
                //            value="4C7A03F8DDBDD91F4EE428A59BFAB606E1DF372A" />
                //     </appSettings>
                //   </configuration>
                // Here, so that the sample works wherever you place it,
                // the application ID is hard-coded.
                sr.AppID = "4C7A03F8DDBDD91F4EE428A59BFAB606E1DF372A";

This code shows how the application ID is supplied. Ideally, the application ID should be stored in web.config; for portability, this sample hard-codes it in the source.

                SearchResponse srsp = s.Search(sr);
                PrintResults(srsp);

After the Web service call returns a response, PrintResults generates an HTML fragment that represents the search results and writes it to the response.

            }
        }
        catch (Exception fault) {
            Response.Write(fault.ToString());
        }
    }

    private void PrintResults(SearchResponse searchResponse)
    {
        StringBuilder sb = new StringBuilder(4096);
        CultureInfo ci = new CultureInfo("en-us");

        // the response starts with the total hit count, followed by "#"
        int iResponse = 0;
        foreach (SourceResponse sourceResponse in searchResponse.Responses)
            iResponse += sourceResponse.Total;
        sb.Append(Convert.ToString(iResponse, ci) + "#");

        foreach (SourceResponse sourceResponse in searchResponse.Responses) {
            Result[] sourceResults = sourceResponse.Results;
            if (sourceResponse.Total > 0) {
                foreach (Result sourceResult in sourceResults) {
                    // URLs come from the service, so they are encoded and
                    // quoted before being placed in href attributes
                    if (!String.IsNullOrEmpty(sourceResult.Title))
                        sb.Append(
                            "<h5><a href=\"" +
                            AntiXSSLibrary.HtmlEncode(sourceResult.Url) +
                            "\">" +
                            AntiXSSLibrary.HtmlEncode(sourceResult.Title) +
                            "</a></h5>");
                    if (!String.IsNullOrEmpty(sourceResult.Description))
                        sb.Append(
                            "<p>" +
                            AntiXSSLibrary.HtmlEncode(sourceResult.Description) +
                            "</p>");
                    if (!String.IsNullOrEmpty(sourceResult.DisplayUrl))
                        sb.Append(
                            "<p>" +
                            AntiXSSLibrary.HtmlEncode(sourceResult.DisplayUrl) +
                            "</p>");
                    if (!String.IsNullOrEmpty(sourceResult.Url))
                        sb.Append(
                            "<p>" +
                            AntiXSSLibrary.HtmlEncode(sourceResult.Url) +
                            "</p>");
                    if (!String.IsNullOrEmpty(sourceResult.SearchTags))
                        sb.Append(
                            "<p>SearchTags: " +
                            AntiXSSLibrary.HtmlEncode(sourceResult.SearchTags) +
                            "</p>");
                    if (!String.IsNullOrEmpty(sourceResult.CacheUrl))
                        sb.Append(
                            "<p>CacheUrl: <a href=\"" +
                            AntiXSSLibrary.HtmlEncode(sourceResult.CacheUrl) +
                            "\">" +
                            AntiXSSLibrary.HtmlEncode(sourceResult.CacheUrl) +
                            "</a></p>");
                }
            }
        }

        // the MarkQueryWords markers survive encoding as numeric entities;
        // turn them into <strong> tags to highlight the query words
        sb.Replace("&#57344;", "<strong>");
        sb.Replace("&#57345;", "</strong>");
        this.Response.Write(sb.ToString());
    }
</script>
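As the replacements at the end of PrintResults imply, the query words marked by SearchFlags.MarkQueryWords reach the HTML fragment as the entities &#57344; and &#57345; (the encoded forms of U+E000 and U+E001) once AntiXSSLibrary.HtmlEncode has run. A hypothetical JavaScript equivalent of that replacement step, for a client that receives already-encoded text, might look like this:

```javascript
// Entities produced by HTML-encoding the highlighting markers
// U+E000 (start of a query word) and U+E001 (end of a query word).
const MARK_START = "&#57344;";
const MARK_END = "&#57345;";

// Hypothetical helper: turns the encoded markers into <strong> tags.
// Run it only on text that is already HTML-encoded, as PrintResults does.
function highlightQueryWords(encoded) {
  return encoded.split(MARK_START).join("<strong>")
                .split(MARK_END).join("</strong>");
}

console.log(highlightQueryWords("Build &#57344;gadgets&#57345; fast"));
// Build <strong>gadgets</strong> fast
```

Replacing only after encoding is the important design choice: the <strong> tags are added by trusted code, while everything that came from the service has already been escaped.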

Conclusion

This sample illustrates the kind of application that can be built on the Windows Live Search SOAP APIs. Furthermore, the sample is factored in a way that makes it easy to reuse the code and implement site search for your own Web site, which can be accomplished in a surprisingly short time (an estimated 15 minutes).

Microsoft may deny Web service calls from Web sites that issue more than 25,000 queries per day, so be aware of your traffic volume. If your traffic exceeds 25,000 queries per day, contact Search API TOU feedback to continue receiving search results beyond that limit.