Note
Windows Desktop Search 2.x is an obsolete technology that was originally available as an add-in for Windows XP and Windows Server 2003. On later releases, use Windows Search instead.
Creating a protocol handler involves implementing four interfaces: ISearchProtocol to manage UrlAccessor objects; IUrlAccessor to generate metadata about items in the data store and to identify appropriate filters for them; IProtocolHandlerSite to instantiate a SearchProtocol object and identify appropriate filters; and IFilter to filter proprietary files or to enumerate and filter hierarchically stored files. The protocol handler must be multithreaded.
Microsoft Windows Desktop Search (WDS) uses URLs to uniquely identify items in a file system, inside a database-like store, or on the Web. A URL that defines an entry node is called a start page; WDS begins at that start page and recursively crawls the data store. The typical URL structure is:
protocol://host/path/name.extension
Note
When you add a new data store, select a name to identify it that does not conflict with existing names. We recommend this naming convention: companyName.scheme.
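As an illustration of the URL structure above, the following is a minimal sketch that splits a URL of the form protocol://host/path/name.extension into its parts. The CrawlUrlParts type, the ParseCrawlUrl function, and the contoso.mail scheme name are hypothetical and are not part of any WDS API; the sketch uses only the C++ standard library.

```cpp
#include <string>

// Hypothetical helper type (not a WDS type): the pieces of a URL shaped like
// protocol://host/path/name.extension.
struct CrawlUrlParts {
    std::wstring protocol;   // e.g. L"contoso.mail" (companyName.scheme)
    std::wstring host;       // e.g. L"server"
    std::wstring path;       // e.g. L"inbox/2006" (no leading or trailing slash)
    std::wstring name;       // e.g. L"message"
    std::wstring extension;  // e.g. L"eml" (empty if the name has no dot)
};

// Hypothetical parser. Returns false if the URL has no "://" separator
// or if the protocol part is empty.
bool ParseCrawlUrl(const std::wstring& url, CrawlUrlParts& parts)
{
    typedef std::wstring::size_type pos_t;
    const std::wstring separator = L"://";

    const pos_t schemeEnd = url.find(separator);
    if (schemeEnd == std::wstring::npos || schemeEnd == 0)
        return false;

    parts = CrawlUrlParts();
    parts.protocol = url.substr(0, schemeEnd);

    const pos_t hostStart = schemeEnd + separator.size();
    const pos_t hostEnd = url.find(L'/', hostStart);
    if (hostEnd == std::wstring::npos) {
        // The URL names only a host, with no path or item.
        parts.host = url.substr(hostStart);
        return true;
    }
    parts.host = url.substr(hostStart, hostEnd - hostStart);

    // Everything after the host: the last segment is the item, the rest is the path.
    const std::wstring rest = url.substr(hostEnd + 1);
    const pos_t lastSlash = rest.rfind(L'/');
    std::wstring leaf = rest;
    if (lastSlash != std::wstring::npos) {
        parts.path = rest.substr(0, lastSlash);
        leaf = rest.substr(lastSlash + 1);
    }

    // Split the item into name and extension at the last dot, if any.
    const pos_t dot = leaf.rfind(L'.');
    if (dot == std::wstring::npos) {
        parts.name = leaf;
    } else {
        parts.name = leaf.substr(0, dot);
        parts.extension = leaf.substr(dot + 1);
    }
    return true;
}
```

For example, passing contoso.mail://server/inbox/2006/message.eml fills in the protocol contoso.mail, the host server, the path inbox/2006, the name message, and the extension eml.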
ISearchProtocol
The ISearchProtocol interface invokes, initializes, and manages UrlAccessor objects. For more information on implementing the ISearchProtocol interface, see ISearchProtocol Interface reference.
IUrlAccessor
For a specified URL, the IUrlAccessor interface generates metadata about the location structure as well as the items it contains, and it binds those items to a filter. The IUrlAccessor object is instantiated and initialized by a SearchProtocol object; however, you can also implement an internal initialization method so your IUrlAccessor object can perform initialization tasks specific to your protocol handler, such as validating the URL for an item being accessed or checking the last modified time to determine whether a file must be processed in the current crawl.
Note
Modified times for directories are ignored. The IUrlAccessor object must enumerate the child objects to determine whether there have been any modifications or deletions.
Much of the design of the UrlAccessor object depends on whether the structure is hierarchical or link-based. For hierarchical data stores, the UrlAccessor object must find a filter that can enumerate their contents. Another distinction between hierarchical and link-based protocol handlers is the use of the IsDirectory method. In link-based protocol handlers, this method should return S_FALSE. Hierarchical protocol handlers must return S_OK for containers.
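The IsDirectory rule can be sketched as follows. This is a portable illustration, not a COM implementation: HRESULT, S_OK, and S_FALSE are declared locally as stand-ins for the Windows SDK definitions, and SketchUrlAccessor and UrlNamesContainer are hypothetical names. Treating a trailing slash as the mark of a container, and returning S_FALSE for non-container items in a hierarchical store, are assumptions made for this sketch.

```cpp
#include <string>

// Stand-ins so the sketch compiles without the Windows SDK.
// In real code these come from the Windows headers.
typedef long HRESULT;
const HRESULT S_OK = 0;
const HRESULT S_FALSE = 1;

// Assumption for this sketch only: a URL ending in '/' names a container.
// A real protocol handler would ask its data store instead.
bool UrlNamesContainer(const std::wstring& url)
{
    return !url.empty() && url[url.size() - 1] == L'/';
}

// Hypothetical class that mirrors only the IsDirectory contract of IUrlAccessor.
class SketchUrlAccessor {
public:
    SketchUrlAccessor(const std::wstring& url, bool hierarchicalStore)
        : url_(url), hierarchicalStore_(hierarchicalStore) {}

    HRESULT IsDirectory() const
    {
        // Link-based protocol handlers should return S_FALSE.
        if (!hierarchicalStore_)
            return S_FALSE;
        // Hierarchical protocol handlers must return S_OK for containers.
        // Returning S_FALSE for other items is an assumption of this sketch.
        return UrlNamesContainer(url_) ? S_OK : S_FALSE;
    }

private:
    std::wstring url_;
    bool hierarchicalStore_;
};
```

With these assumptions, an accessor for contoso.mail://server/inbox/ in a hierarchical store reports S_OK, while the same URL in a link-based store reports S_FALSE.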
For further instructions on implementing an IUrlAccessor interface, see the IUrlAccessor Interface reference.
IProtocolHandlerSite
This interface is used to instantiate a SearchProtocol object and also provides the UrlAccessor object with an appropriate filter for a specified class ID (CLSID).
If you are implementing a hierarchical protocol handler, you must implement a container IFilter component that enumerates URLs representing containers or folders. The enumeration process is a loop through the GetChunk and GetValue methods of the IFilter interface, which together return a list of URLs that represent each item in the container.
First, GetChunk returns a FULLPROPSPEC with the property set GATHER_PROPSET and either PID_GTHR_DIRLINK (the URL alone) or PID_GTHR_DIRLINK_WITH_TIME (the URL together with its last modified time).
The property set GUID for GATHER_PROPSET is 0B63E343-9CCC-11D0-BCDB-00805FCCCE04. The PROPSPEC property ID is either PID_GTHR_DIRLINK = 2 or PID_GTHR_DIRLINK_WITH_TIME = 12 (decimal).
Returning PID_GTHR_DIRLINK_WITH_TIME is more efficient because the indexer can immediately determine whether the item needs to be indexed without calling the ISearchProtocol->CreateUrlAccessor() and IUrlAccessor->GetLastModified() methods.
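Then GetValue returns a PROPVARIANT for the URL (and the last modified time, if used). For PID_GTHR_DIRLINK, the PROPVARIANT holds the URL. For PID_GTHR_DIRLINK_WITH_TIME, the PROPVARIANT is a vector of two PROPVARIANTs: the URL followed by the last modified time as a FILETIME.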
The following sample code demonstrates how to build the PROPVARIANT for PID_GTHR_DIRLINK_WITH_TIME.
Note
THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright (C) Microsoft. All rights reserved.
// Parameters are assumed to be valid.
HRESULT GetPropVariantForUrlAndTime(PCWSTR pszUrl, const FILETIME &ftLastModified, PROPVARIANT **ppPropValue)
{
    *ppPropValue = NULL;

    // Allocate the container PROPVARIANT (the structure itself, not a pointer to it).
    *ppPropValue = (PROPVARIANT *)CoTaskMemAlloc(sizeof(**ppPropValue));
    HRESULT hr = *ppPropValue ? S_OK : E_OUTOFMEMORY;
    if (SUCCEEDED(hr))
    {
        PropVariantInit(*ppPropValue);  // zero-initialize the value

        // Allocate enough memory for the 2 nested PROPVARIANTs.
        // PID_GTHR_DIRLINK_WITH_TIME is an array of 2 PROPVARIANTs.
        PROPVARIANT *pVector = (PROPVARIANT *)CoTaskMemAlloc(sizeof(*pVector) * 2);
        hr = pVector ? S_OK : E_OUTOFMEMORY;
        if (SUCCEEDED(hr))
        {
            PWSTR pszUrlAlloc;
            hr = SHStrDup(pszUrl, &pszUrlAlloc);
            if (SUCCEEDED(hr))
            {
                // Mark the container PROPVARIANT as a vector of 2 PROPVARIANTs.
                (*ppPropValue)->vt = VT_VARIANT | VT_VECTOR;
                (*ppPropValue)->capropvar.cElems = 2;
                (*ppPropValue)->capropvar.pElems = pVector;

                // Element 0: the pointer to the copied URL.
                pVector[0].vt = VT_LPWSTR;
                pVector[0].pwszVal = pszUrlAlloc;

                // Element 1: the last modified time.
                pVector[1].vt = VT_FILETIME;
                pVector[1].filetime = ftLastModified;
            }
            else
            {
                // The container does not reference pVector yet, so free it directly.
                CoTaskMemFree(pVector);
            }
        }

        if (FAILED(hr))
        {
            CoTaskMemFree(*ppPropValue);
            *ppPropValue = NULL;
        }
    }
    // Propagate any allocation failure to the caller.
    return hr;
}
Note
A container IFilter component should always enumerate all child URLs even if the child URLs have not changed, because the indexer detects deletions through the enumeration process. If the date output in a PID_GTHR_DIRLINK_WITH_TIME value indicates that the data has not changed, the indexer does not update the data for that URL.
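To show why a complete enumeration matters, here is a minimal sketch of the kind of comparison an indexer could make between its previous state and the children a container filter enumerates. It is not the actual WDS indexer logic: ChildLink, CrawlPlan, and PlanCrawl are hypothetical names, times are plain 64-bit values rather than FILETIME structures, and treating any difference in time as a change is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one enumerated child: its URL and last modified time.
struct ChildLink {
    std::wstring url;
    std::uint64_t lastModified;
};

// Hypothetical result of comparing an enumeration with the previous crawl.
struct CrawlPlan {
    std::vector<std::wstring> toIndex;    // new or changed URLs
    std::vector<std::wstring> unchanged;  // same time as before, no update needed
    std::vector<std::wstring> deleted;    // previously known, no longer enumerated
};

CrawlPlan PlanCrawl(const std::map<std::wstring, std::uint64_t>& previous,
                    const std::vector<ChildLink>& enumerated)
{
    CrawlPlan plan;
    std::set<std::wstring> seen;

    for (std::vector<ChildLink>::const_iterator child = enumerated.begin();
         child != enumerated.end(); ++child) {
        seen.insert(child->url);
        std::map<std::wstring, std::uint64_t>::const_iterator known =
            previous.find(child->url);
        if (known != previous.end() && known->second == child->lastModified)
            plan.unchanged.push_back(child->url);  // time matches: skip the update
        else
            plan.toIndex.push_back(child->url);    // new URL or different time
    }

    // Any previously known URL that was not enumerated is treated as deleted.
    // If the filter skipped unchanged children, they would land here by mistake.
    for (std::map<std::wstring, std::uint64_t>::const_iterator entry = previous.begin();
         entry != previous.end(); ++entry) {
        if (seen.count(entry->first) == 0)
            plan.deleted.push_back(entry->first);
    }
    return plan;
}
```

In this sketch, a child that the filter omits because it has not changed is indistinguishable from a child that was removed from the store, which is the reason for enumerating every child on every crawl.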
The physical URL is the URL that the UrlAccessor object processes. If the filter does not emit a user-friendly DisplayUrl, WDS displays the physical URL to the user as part of the search results. The WDS schema contains two properties to control what is displayed to the end user, as shown in the table below.
GUID | PROPSPEC | Description |
---|---|---|
D5CDD505-2E9C-101B-9397-08002B2CF9AE | DisplayFolder | Folder Path displayed to the user in search results |
D5CDD505-2E9C-101B-9397-08002B2CF9AE | FolderName | Display name of the parent folder |
If your code does not emit a DisplayFolder or FolderName, these values are computed from the DisplayUrl. Forward slashes in the URL denote containers within the store or file system.
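The exact computation WDS performs is not described here, so the following is only one plausible reading of how folder values could be derived from a DisplayUrl by splitting on forward slashes. FolderDisplayInfo and DeriveFolderInfo are hypothetical names, and the choice to take everything before the last slash as the folder path is an assumption.

```cpp
#include <string>

// Hypothetical holder for the two display values discussed above.
struct FolderDisplayInfo {
    std::wstring displayFolder;  // folder path shown in search results
    std::wstring folderName;     // display name of the parent folder
};

// Assumption for this sketch: the folder path is everything before the last
// forward slash, and the folder name is the last segment of that path.
FolderDisplayInfo DeriveFolderInfo(const std::wstring& displayUrl)
{
    FolderDisplayInfo info;

    const std::wstring::size_type lastSlash = displayUrl.rfind(L'/');
    if (lastSlash == std::wstring::npos)
        return info;  // no container separator: leave both values empty

    info.displayFolder = displayUrl.substr(0, lastSlash);

    const std::wstring::size_type parentSlash = info.displayFolder.rfind(L'/');
    info.folderName = (parentSlash == std::wstring::npos)
        ? info.displayFolder
        : info.displayFolder.substr(parentSlash + 1);
    return info;
}
```

Under these assumptions, the hypothetical URL contoso.mail://server/inbox/2006/message.eml yields the folder path contoso.mail://server/inbox/2006 and the folder name 2006.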
For your protocol handler to have a default start page (and entry node URL), you must implement the ISearchProtocolOptions interface. In future versions of WDS, this interface will provide hooks to the Options dialog for an enhanced user experience. This interface provides the following functionality:
The following table describes the methods you need to implement for the ISearchProtocolOptions interface.
Method | Description |
---|---|
CheckRequirements | Determines whether a custom protocol handler's minimum requirements are met |
GetDefaultCrawlScope | Returns a list of default URLs within a specified store for a custom protocol handler |
GetRequirements | Identifies a user-friendly, localized description of minimum requirements for a custom protocol handler |