Installing a protocol handler involves copying the DLL(s) to an appropriate location in the Program Files directory, and then registering the protocol handler through the registry. The installation application can also add a search root and scope rules to define a default crawl scope for the Shell data source.
Windows Search uses URLs to uniquely identify items in the hierarchy of your Shell data source. The URL that is the first node in the hierarchy is called the search root; Windows Search will begin indexing at the search root, requesting that the protocol handler enumerate child links for each URL.
The typical URL structure is:
<protocol>:// [{user SID}/] <localhost>/<path>/[<ItemID>]
The URL syntax is described in the following table.
Syntax | Description |
---|---|
<protocol> | Identifies which protocol handler to invoke for the URL. |
{user SID} | Identifies the user security context under which the protocol handler is called. If no user security identifier (SID) is identified, the protocol handler is called in the security context of the system service. |
<path> | Defines the hierarchy of the store, where each forward slash ('/') is a separator between folder names. |
<ItemID> | Represents a unique string that identifies the child item (for example, the file name). |
The Windows Search Indexer trims the final slash from URLs. As a result, you cannot rely on a final slash to distinguish a directory from an item, and your protocol handler must be able to handle URLs in either form. Ensure that the protocol name you select to identify your Shell data source does not conflict with existing protocol names. We recommend this naming convention: companyName.scheme.
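The URL structure above can be illustrated with a small parser. This is only a sketch in portable C++, not part of the Windows Search API: the `ParsedUrl` type and `ParseHandlerUrl` function are hypothetical names, and the sketch assumes the optional user SID appears as a brace-wrapped first component. Empty path pieces are skipped, so a URL parses the same way whether or not the indexer has trimmed its final slash.

```cpp
#include <string>
#include <vector>

// Hypothetical holder for the pieces of a protocol handler URL:
//   <protocol>://[{user SID}/]<localhost>/<path>/[<ItemID>]
struct ParsedUrl {
    std::wstring protocol;               // selects the protocol handler
    std::wstring userSid;                // empty when no {user SID} is present
    std::wstring host;                   // the <localhost> component
    std::vector<std::wstring> segments;  // <path> folders, then <ItemID> if any
};

// Returns false when the URL has no "<protocol>://" prefix or no host.
bool ParseHandlerUrl(const std::wstring& url, ParsedUrl& out) {
    const std::wstring sep = L"://";
    const std::wstring::size_type schemeEnd = url.find(sep);
    if (schemeEnd == std::wstring::npos || schemeEnd == 0) return false;

    out = ParsedUrl();
    out.protocol = url.substr(0, schemeEnd);

    // Split the remainder on '/', skipping empty pieces so that a trailing
    // slash (present or trimmed) produces the same result.
    std::vector<std::wstring> parts;
    std::wstring::size_type pos = schemeEnd + sep.size();
    while (pos <= url.size()) {
        std::wstring::size_type next = url.find(L'/', pos);
        if (next == std::wstring::npos) next = url.size();
        if (next > pos) parts.push_back(url.substr(pos, next - pos));
        pos = next + 1;
    }
    if (parts.empty()) return false;

    std::vector<std::wstring>::size_type i = 0;
    // Assumption: a first component wrapped in braces is the user SID.
    const std::wstring& first = parts[0];
    if (first.size() >= 2 && first.front() == L'{' && first.back() == L'}') {
        out.userSid = first.substr(1, first.size() - 2);
        ++i;
    }
    if (i >= parts.size()) return false;  // the host component is mandatory
    out.host = parts[i++];
    out.segments.assign(parts.begin() + i, parts.end());
    return true;
}
```

Because the parser cannot tell a folder from an item by looking at the URL alone, a real handler would answer that question through IUrlAccessor::IsDirectory rather than through URL syntax.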
For more information on creating a Shell data source, see Implementing the Basic Folder Object Interfaces.
Creating a protocol handler requires the implementation of the following three interfaces: ISearchProtocol, IUrlAccessor, and IFilter.
All interfaces other than these three are optional; implement whichever optional interfaces best suit your data source.
The ISearchProtocol interfaces initialize and manage your protocol handler's UrlAccessor objects. The ISearchProtocol2 interface is an optional extension of ISearchProtocol, and includes an extra method to specify more information about the user and the item.
The IUrlAccessor interfaces are described in the following table.
Interface | Description |
---|---|
IUrlAccessor | For a specified URL, the IUrlAccessor interface provides access to the properties of the item that is exposed in the URL. It can also bind those properties to a protocol handler-specific filter (that is, a filter other than the one associated with the file name). |
IUrlAccessor2 (optional) | The IUrlAccessor2 interface extends IUrlAccessor with methods that get a code page for the item's properties and its display URL, and that get the type of item in the URL (document or directory). |
IUrlAccessor3 (optional) | The IUrlAccessor3 interface extends IUrlAccessor2 with a method that gets an array of user SIDs, enabling the search protocol host to impersonate these users to index the item. |
IUrlAccessor4 (optional) | The IUrlAccessor4 interface extends the functionality of the IUrlAccessor3 interface with a method that identifies whether the content of the item should be indexed. |
The UrlAccessor object is instantiated and initialized by a SearchProtocol object. The IUrlAccessor interfaces provide access to important pieces of information through the methods described in the following table.
Method | Description |
---|---|
IUrlAccessor::GetLastModified | Returns the time that the URL was last modified. If this time is more recent than the last time the indexer processed this URL, filter handlers (implementations of the IFilter interface) are called to extract the (possibly) changed data for that item. Modified times for directories are ignored. |
IUrlAccessor::IsDirectory | Identifies whether the URL represents a folder containing child URLs. |
IUrlAccessor::BindToStream | Binds to an IStream interface that represents the data of a file in a custom data store. |
IUrlAccessor::BindToFilter | Binds to a protocol handler-specific IFilter, which can expose properties for the item. |
IUrlAccessor4::ShouldIndexItemContent | Identifies whether the content of the item should be indexed. |
The IProtocolHandlerSite interface is used to instantiate a filter handler, which is hosted in an isolated process. The appropriate filter handler is obtained for a specified persistent class identifier (CLSID), document storage class, or file name extension. The benefit of asking the host process to bind to IFilter is that the host process can manage the process of locating an appropriate filter handler, and control the security involved in calling the handler.
If you are implementing a hierarchical protocol handler, then you must implement a filter handler for a container that enumerates child URLs. A filter handler is an implementation of the IFilter interface. The enumeration process is a loop through the IFilter::GetChunk and IFilter::GetValue methods of the IFilter interface; each child URL is exposed as the value of the property.
IFilter::GetChunk returns the properties of the container. To enumerate child URLs, IFilter::GetChunk returns either of the following:
PKEY_Search_UrlToIndex:
The URL to the item without the last modified time. IFilter::GetValue returns a PROPVARIANT containing the child URL.
PKEY_Search_UrlToIndexWithModificationTime:
The URL and the last modified time. IFilter::GetValue returns a PROPVARIANT containing a vector of the child URL and the last modified time.
Returning PKEY_Search_UrlToIndexWithModificationTime is more efficient because the indexer can immediately determine whether the item needs to be indexed without calling the ISearchProtocol::CreateAccessor and IUrlAccessor::GetLastModified methods.
The following example code demonstrates how to return the PKEY_Search_UrlToIndexWithModificationTime property.
Important
Copyright (c) Microsoft Corporation. All rights reserved.
#include <windows.h>
#include <propidl.h>   // PROPVARIANT, PropVariantInit
#include <shlwapi.h>   // SHStrDup (link with Shlwapi.lib)

// Builds the value for PKEY_Search_UrlToIndexWithModificationTime.
// On success the caller owns *ppPropValue and must release it.
// Parameters are assumed to be valid
HRESULT GetPropVariantForUrlAndTime
    (PCWSTR pszUrl, const FILETIME &ftLastModified, PROPVARIANT **ppPropValue)
{
    *ppPropValue = NULL;

    // Allocate the propvariant pointer.
    size_t const cbAlloc = sizeof(**ppPropValue);
    *ppPropValue = (PROPVARIANT *)CoTaskMemAlloc(cbAlloc);
    HRESULT hr = *ppPropValue ? S_OK : E_OUTOFMEMORY;
    if (SUCCEEDED(hr))
    {
        PropVariantInit(*ppPropValue); // Zero init the value

        // Now allocate enough memory for 2 nested PropVariants.
        // PKEY_Search_UrlToIndexWithModificationTime is an array of two PROPVARIANTs.
        PROPVARIANT *pVector = (PROPVARIANT *)CoTaskMemAlloc(sizeof(*pVector) * 2);
        hr = pVector ? S_OK : E_OUTOFMEMORY;
        if (SUCCEEDED(hr))
        {
            // Set the container PROPVARIANT to be a vector of two PROPVARIANTS.
            (*ppPropValue)->vt = VT_VARIANT | VT_VECTOR;
            (*ppPropValue)->capropvar.cElems = 2;
            (*ppPropValue)->capropvar.pElems = pVector;

            PWSTR pszUrlAlloc;
            hr = SHStrDup(pszUrl, &pszUrlAlloc);
            if (SUCCEEDED(hr))
            {
                // Now fill the array of PROPVARIANTS.
                // Put the pointer to the URL into the vector.
                (*ppPropValue)->capropvar.pElems[0].vt = VT_LPWSTR;
                (*ppPropValue)->capropvar.pElems[0].pwszVal = pszUrlAlloc;

                // Put the FILETIME into vector.
                (*ppPropValue)->capropvar.pElems[1].vt = VT_FILETIME;
                (*ppPropValue)->capropvar.pElems[1].filetime = ftLastModified;
            }
            else
            {
                CoTaskMemFree(pVector);
            }
        }

        if (FAILED(hr))
        {
            // Release the container so the caller never sees a partial value.
            CoTaskMemFree(*ppPropValue);
            *ppPropValue = NULL;
        }
    }
    // Propagate allocation failures instead of always reporting success.
    return hr;
}
Note
A container IFilter component should always enumerate all child URLs even if the child URLs have not changed, because the indexer detects deletions through the enumeration process. If the date output in a PKEY_Search_UrlToIndexWithModificationTime indicates that the data has not changed, the indexer does not update the data for that URL.
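The reason for always enumerating every child can be seen in a simplified model of the comparison the note describes. The following is a sketch in portable C++, not the indexer's actual implementation: `ChildEntry`, `CrawlPlan`, and `PlanCrawl` are hypothetical names, and a plain 64-bit tick count stands in for FILETIME. A child that is missing from the enumeration is treated as deleted, which is why omitting unchanged children would be misread as removing them.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// One child reported by a container filter: the URL plus its last modified
// time, as carried by PKEY_Search_UrlToIndexWithModificationTime.
struct ChildEntry {
    std::wstring url;
    std::uint64_t lastModified;
};

// Hypothetical summary of what one crawl of a container would need to do.
struct CrawlPlan {
    std::vector<std::wstring> toIndex;    // new, or modified since the last crawl
    std::vector<std::wstring> unchanged;  // reported, but no newer than the index
    std::vector<std::wstring> deleted;    // indexed earlier, not reported this time
};

CrawlPlan PlanCrawl(const std::map<std::wstring, std::uint64_t>& indexed,
                    const std::vector<ChildEntry>& enumerated) {
    CrawlPlan plan;
    std::set<std::wstring> seen;
    for (const ChildEntry& child : enumerated) {
        seen.insert(child.url);
        const auto it = indexed.find(child.url);
        if (it == indexed.end() || child.lastModified > it->second) {
            plan.toIndex.push_back(child.url);
        } else {
            plan.unchanged.push_back(child.url);
        }
    }
    // Anything previously indexed that the container did not report is
    // assumed to have been deleted from the store.
    for (const auto& entry : indexed) {
        if (seen.count(entry.first) == 0) plan.deleted.push_back(entry.first);
    }
    return plan;
}
```

In this model, supplying the modification time with each child lets the comparison happen without opening the item, which mirrors the efficiency argument for PKEY_Search_UrlToIndexWithModificationTime.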
Installing protocol handlers involves copying the DLL(s) to an appropriate location in the Program Files directory, and then registering the DLL(s). Protocol handlers should implement self-registration for installation. The installation application can also add a search root and scope rules to define a default crawl scope for the Shell data source, as discussed in Ensuring that Your Items are Indexed at the end of this topic.
You should follow these guidelines when registering a protocol handler:
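You need to make fourteen entries in the registry to register the protocol handler component. In the entries that follow, {CLSID_1} is the CLSID of the protocol handler implementation, <Ver_Ind_ProgID> is its version-independent ProgID, and <Ver_Dep_ProgID> is its version-dependent ProgID.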
To register a protocol handler:
Register the version independent ProgID with the following keys and values:
HKEY_CLASSES_ROOT
   <Ver_Ind_ProgID>
      (Default) = <Protocol Handler Class Description>
HKEY_CLASSES_ROOT
   <Ver_Ind_ProgID>
      CLSID
         (Default) = {CLSID_1}
HKEY_CLASSES_ROOT
   <Ver_Ind_ProgID>
      CurVer
         (Default) = <Ver_Dep_ProgID>
Register the version dependent ProgID with the following keys and values:
HKEY_CLASSES_ROOT
   <Ver_Dep_ProgID>
      (Default) = <Protocol Handler Class Description>
HKEY_CLASSES_ROOT
   <Ver_Dep_ProgID>
      CLSID
         (Default) = {CLSID_1}
Register the protocol handler's CLSID with the following keys and values:
HKEY_CLASSES_ROOT
   {CLSID_1}
      (Default) = <Protocol Handler Class Description>
HKEY_CLASSES_ROOT
   {CLSID_1}
      InprocServer32
         (Default) = <DLL Install Path>
         ThreadingModel = Both
HKEY_CLASSES_ROOT
   {CLSID_1}
      ProgID
         (Default) = <Ver_Dep_ProgID>
HKEY_CLASSES_ROOT
   {CLSID_1}
      ShellFolder
         Attributes = dword:a0180000
HKEY_CLASSES_ROOT
   {CLSID_1}
      TypeLib
         (Default) = {LIBID of PH Component}
HKEY_CLASSES_ROOT
   {CLSID_1}
      VersionIndependentProgID
         (Default) = <Ver_Ind_ProgID>
Register the protocol handler with Windows Search. In the following example, <Protocol Name> is the name of the protocol itself, such as file, mapi, and so forth:
HKEY_LOCAL_MACHINE
   SOFTWARE
      Microsoft
         Windows Search
            ProtocolHandlers
               <Protocol Name> = <Ver_Dep_ProgID>
HKEY_CURRENT_USER
   SOFTWARE
      Microsoft
         Windows Search
            ProtocolHandlers
               <Protocol Name> = <Ver_Dep_ProgID>
Prior to Windows Vista:
HKEY_CURRENT_USER
   SOFTWARE
      Microsoft
         Windows Desktop Search
            DS
               Index
                  ProtocolHandlers
                     <Protocol Name>
                        HasRequirements = dword:00000000
                        HasStartPage = dword:00000000
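The fourteen entries for Windows Vista and later can be gathered into a single table before an installer or self-registration routine writes them. The following is a sketch in portable C++ that only builds the list; it does not touch the registry. `RegEntry`, `HandlerInfo`, and `BuildRegistrationEntries` are hypothetical names, and the key layout simply mirrors the listing above. On Windows, each row could then be written with a registry API such as RegSetKeyValueW.

```cpp
#include <string>
#include <vector>

// One registry value to write. An empty name means the key's (Default) value.
struct RegEntry {
    std::wstring root;    // for example HKEY_CLASSES_ROOT
    std::wstring subkey;  // path below the root, backslash separated
    std::wstring name;
    std::wstring data;
};

// Hypothetical bundle of the placeholders used in the registry listing.
struct HandlerInfo {
    std::wstring description;   // <Protocol Handler Class Description>
    std::wstring clsid;         // {CLSID_1}, including braces
    std::wstring verIndProgId;  // <Ver_Ind_ProgID>
    std::wstring verDepProgId;  // <Ver_Dep_ProgID>
    std::wstring dllPath;       // <DLL Install Path>
    std::wstring typeLibId;     // {LIBID of PH Component}
    std::wstring protocolName;  // <Protocol Name>
};

std::vector<RegEntry> BuildRegistrationEntries(const HandlerInfo& h) {
    const std::wstring hkcr = L"HKEY_CLASSES_ROOT";
    // Mirrors the listing, which places {CLSID_1} directly under the root.
    // COM classes are conventionally registered under the CLSID subkey, so
    // adjust this prefix if your installer follows that layout.
    const std::wstring cls = h.clsid;
    const std::wstring search =
        L"SOFTWARE\\Microsoft\\Windows Search\\ProtocolHandlers";

    std::vector<RegEntry> e;
    // Version-independent ProgID (3 entries).
    e.push_back({hkcr, h.verIndProgId, L"", h.description});
    e.push_back({hkcr, h.verIndProgId + L"\\CLSID", L"", h.clsid});
    e.push_back({hkcr, h.verIndProgId + L"\\CurVer", L"", h.verDepProgId});
    // Version-dependent ProgID (2 entries).
    e.push_back({hkcr, h.verDepProgId, L"", h.description});
    e.push_back({hkcr, h.verDepProgId + L"\\CLSID", L"", h.clsid});
    // CLSID of the protocol handler (7 entries).
    e.push_back({hkcr, cls, L"", h.description});
    e.push_back({hkcr, cls + L"\\InprocServer32", L"", h.dllPath});
    e.push_back({hkcr, cls + L"\\InprocServer32", L"ThreadingModel", L"Both"});
    e.push_back({hkcr, cls + L"\\ProgID", L"", h.verDepProgId});
    // Kept as text here; a real writer would store this as a REG_DWORD.
    e.push_back({hkcr, cls + L"\\ShellFolder", L"Attributes", L"dword:a0180000"});
    e.push_back({hkcr, cls + L"\\TypeLib", L"", h.typeLibId});
    e.push_back({hkcr, cls + L"\\VersionIndependentProgID", L"", h.verIndProgId});
    // Registration with Windows Search (2 entries).
    e.push_back({L"HKEY_LOCAL_MACHINE", search, h.protocolName, h.verDepProgId});
    e.push_back({L"HKEY_CURRENT_USER", search, h.protocolName, h.verDepProgId});
    return e;
}
```

Keeping the entries in one table makes it straightforward to reuse the same list for uninstallation, by deleting the rows in reverse order.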
You need to make two entries in the registry to register the protocol handler's file type handler (which is also known as a Shell extension).
HKEY_LOCAL_MACHINE
   SOFTWARE
      Microsoft
         Windows
            CurrentVersion
               Explorer
                  Desktop
                     NameSpace
                        {CLSID of PH Implementation}
                           (Default) = <Shell Implementation Description>
HKEY_LOCAL_MACHINE
   SOFTWARE
      Microsoft
         Windows
            CurrentVersion
               Explorer
                  Shell Extensions
                     Approved
                        {CLSID of PH Implementation} = <Shell Implementation Description>
After you have implemented your protocol handler, you must specify which Shell items your protocol handler is to index. You have several options. You can use the Catalog Manager to initiate re-indexing (for more information, see Using the Catalog Manager). You can use the Crawl Scope Manager (CSM) to set up default rules indicating the URLs that you want the indexer to crawl (for more information, see Using the Crawl Scope Manager and Managing Scope Rules). You can add a search root (for more information, see Managing Search Roots). Finally, you can follow the procedure in the ReIndex sample in Windows Search Code Samples.
The ISearchCrawlScopeManager interface provides methods that notify the search engine of containers to crawl and/or watch, and items under those containers to include or exclude when crawling or watching. In Windows 7 and later, ISearchCrawlScopeManager2 extends ISearchCrawlScopeManager with the ISearchCrawlScopeManager2::GetVersion method that gets the version, which informs clients whether the state of the CSM has changed.