Developing Filters for Windows Search

Microsoft Windows Search uses filters to extract the content of items for inclusion in a full-text index. You can extend Windows Search to index new or proprietary file types by writing filters to extract the content, and property handlers to extract the properties, of files. Filters are associated with file types, as denoted by file extensions, MIME types, or class identifiers (CLSIDs). While one filter can handle multiple file types, each file type works with only one filter.

This topic is meant to supplement the IFilter and Using Custom Filters with Indexing Service topics with information specific to Windows Search. Please read that documentation first and return here.


 

Indexing Service-Specific Implementation Information

The following information found in the Indexing Service IFilter documentation is not applicable to writing filters for Windows Search or Windows Vista:

  • Windows Search does not supply the same helper functions as Indexing Service.
  • Filters for Indexing Service run in the protocol host process (cidaemon.exe) under the Local System security context, while filters for Windows Search run in a filter isolation process (SearchFilterHost.exe) under the Local System security context with restricted rights.

In this filter isolation process, a number of rights are removed:

  • Restricted Code
  • Everyone
  • Local
  • Interactive
  • Authenticated Users
  • Built-in Users
  • Users' security identifier (SID)

The removal of these rights means the filter does not have access to the disk system or network, or to any user interface or clipboard functions. Furthermore, the isolation process runs under a job object that prevents child processes from being created and imposes a 100 MB limit on the working set. Because third-party IFilters may be incorrectly implemented, hosting them in a separate filter isolation process increases the stability of the indexing platform.

 

Windows Search-Specific Implementation Information

There are two major differences between legacy applications like Indexing Service and newer applications like Windows Search that you should be aware of when implementing filters: the use of IPersistStream and the use of property handlers. First, Windows Vista and Windows Search 3.0 and later require that you use IPersistStream, for the following reasons:

  • To ensure performance and future compatibility.
  • To help increase security. Filters implemented with IPersistStream are more secure because the context in which the filter runs does not need the rights to open files on the disk or over the network.

While Windows Search uses only IPersistStream, you can also include IPersistFile and/or IPersistStorage interface implementations in your filters for backward compatibility.

The second major difference is that Windows Vista and Windows Search 3.0 and later have a new Property System that uses property handlers to enumerate properties of items. There are, however, times when you need to implement a filter that handles both content and properties in order to:

  • Support legacy MSSearch implementations
  • Traverse links
  • Preserve language information
  • Recursively filter embedded items

In these situations, you need a full filter implementation, including the IFilter::GetValue method to access property values.

Other Implementation Notes

Native versus Managed Code: Filters must be written in native code. Multiple add-ins run in the same process, which can lead to common language runtime (CLR) versioning issues if filters are written in managed code.

Property Size Limitations: There are two potential limitations on property size: the maximum size of data that Windows Search accepts per file, and the maximum size per property as defined in the property description file. Currently, Windows Search does not use the defined property size when calculating the amount of data it accepts from a file. Instead, the limit Windows Search uses is the product of the file size N and the MaxGrowFactor (N * MaxGrowFactor), which is read from the registry at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows Search\Gathering Manager\MaxGrowFactor. The default MaxGrowFactor is four (4).

Consequently, if your file type tends to be small in total size but have larger properties, Windows Search may not accept all the property data you want to emit. However, you can increase the MaxGrowFactor to suit your needs.
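
The size rule above can be sketched as a small calculation. This is only an illustration, not Windows Search code: the function names MaxAcceptedBytes and FitsWithinLimit are hypothetical, the unit is assumed to be bytes, and a real component would read MaxGrowFactor from the registry instead of relying on the default of 4.

```cpp
#include <cstdint>

// Default growth factor stated in the text; the live value comes from the registry.
constexpr std::uint64_t kDefaultMaxGrowFactor = 4;

// Upper bound on the data accepted from one file: file size N * MaxGrowFactor.
// (Hypothetical helper; the unit is assumed to be bytes.)
constexpr std::uint64_t MaxAcceptedBytes(
    std::uint64_t fileSizeBytes,
    std::uint64_t maxGrowFactor = kDefaultMaxGrowFactor) {
    return fileSizeBytes * maxGrowFactor;
}

// True when the content and property data a filter emits stays within that bound.
constexpr bool FitsWithinLimit(
    std::uint64_t fileSizeBytes,
    std::uint64_t emittedBytes,
    std::uint64_t maxGrowFactor = kDefaultMaxGrowFactor) {
    return emittedBytes <= MaxAcceptedBytes(fileSizeBytes, maxGrowFactor);
}
```

Under these assumptions, a 2,048-byte file with the default factor has a bound of 8,192 bytes, so 10,000 bytes of emitted data would exceed it unless MaxGrowFactor were raised to 5 or more.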

Methods Not Implemented: Filters need not implement the following methods:

Interface::Method            Description
IPersistStream::IsDirty      Filters should return E_NOTIMPL
IPersistStream::Save         Filters should return E_NOTIMPL
IPersistStream::GetSizeMax   Filters should return E_NOTIMPL
IFilter::BindRegion          Filters should return E_NOTIMPL

IFilter::GetChunk and Locale Identifiers (LCIDs): The locale identifier (LCID) of text can change within a single file. For example, the text of an instruction manual might alternate between English (en-us) and Spanish (es), or the text may include a single word in a language other than the primary language. In either case, your filter must begin a new chunk each time the LCID changes.

Furthermore, because the LCID is used to choose an appropriate word breaker, it is very important that you correctly identify it. If the filter cannot determine the locale of the text, it should assume the default system locale, available by calling GetSystemDefaultLCID. If you control the file format and it currently does not contain locale information, you should add a user feature to enable proper locale identification. Using a mismatched word breaker can result in a poor query experience for the user.
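
The chunking rule can be sketched in portable C++ as follows. This is a simplified illustration under stated assumptions, not a real IFilter::GetChunk implementation: TextRun, Chunk, SplitIntoChunks, and the kUnknownLcid marker are hypothetical names, Lcid stands in for the Windows LCID type, and the fallback locale is passed in as a parameter where a real filter on Windows would obtain it from GetSystemDefaultLCID.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Lcid = std::uint32_t;        // stand-in for the Windows LCID type
constexpr Lcid kUnknownLcid = 0;   // hypothetical marker: no locale recorded in the file

struct TextRun {                   // a span of text as parsed from the file (hypothetical)
    std::wstring text;
    Lcid lcid;
};

struct Chunk {                     // simplified view of what one chunk would carry
    std::wstring text;
    Lcid lcid;
};

// Groups consecutive runs into chunks, starting a new chunk whenever the
// locale changes. Runs with an unknown locale fall back to systemDefaultLcid.
std::vector<Chunk> SplitIntoChunks(const std::vector<TextRun>& runs,
                                   Lcid systemDefaultLcid) {
    assert(systemDefaultLcid != kUnknownLcid);  // the fallback itself must be a real locale
    std::vector<Chunk> chunks;
    for (const TextRun& run : runs) {
        const Lcid lcid = (run.lcid == kUnknownLcid) ? systemDefaultLcid : run.lcid;
        if (chunks.empty() || chunks.back().lcid != lcid) {
            chunks.push_back(Chunk{std::wstring(), lcid});  // locale changed: new chunk
        }
        chunks.back().text += run.text;
    }
    return chunks;
}
```

Merging adjacent runs that share a locale keeps the chunk count low while still giving each chunk a single LCID, so one word breaker can be chosen per chunk.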

 

Ensuring Your Items Get Indexed

Now that you've implemented your filter, you want to make sure the items your filter is registered for get indexed. You can use the Catalog Manager to initiate reindexing, and you can also use the Crawl Scope Manager to set up default rules indicating the URLs you want the Indexer to crawl. Another option is to follow the ReIndex sample in the Windows Search SDK Samples.

For further information, refer to Using the Catalog Manager and Using the Crawl Scope Manager.

 

Registering Filters for Windows Search

You need to make a total of four entries in the registry to register your filter add-in, where:

  • PH_CLSID is the PersistentHandler CLSID for this extension's file type
  • IF_CLSID is the CLSID for the IFilter implementation
  • 89BCB740-6119-101A-BCB7-00DD010655AF is the IFilter interface GUID, which is a constant for all IFilter implementations

First, register the persistent handler for the file extension with the following key and value:

  • HKEY_CLASSES_ROOT\<.ext>\PersistentHandler
    (Default) = <PH_CLSID>

Second, register the IFilter implementation with the following keys and values:

  • HKEY_CLASSES_ROOT\CLSID\<IF_CLSID>
    (Default) = <Filter Description>

  • HKEY_CLASSES_ROOT\CLSID\<IF_CLSID>\InprocServer32
    (Default) = <DLL Install Path>
    ThreadingModel = Both

  • HKEY_CLASSES_ROOT\CLSID\<PH_CLSID>\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF}
    (Default) = <IF_CLSID>

Note: We recommend registering your filter's threading model as "Both". Associating your persistent handler with a specific file extension is the preferred method for registration.
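
The registration layout above can be sketched as data that an installer might then write to the registry. This is a portable illustration only: RegistryValue and BuildFilterRegistration are hypothetical names, the sketch does not touch the registry itself, and it assumes the CLSIDs are passed in with their surrounding braces.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct RegistryValue {
    std::string key;   // path relative to HKEY_CLASSES_ROOT
    std::string name;  // empty string stands for the (Default) value
    std::string data;
};

// IFilter interface GUID, constant for all IFilter implementations.
const char* const kIFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

// Builds the values described above for one file extension.
// phClsid and ifClsid are expected in braced form, for example "{...}".
std::vector<RegistryValue> BuildFilterRegistration(const std::string& ext,
                                                   const std::string& phClsid,
                                                   const std::string& ifClsid,
                                                   const std::string& description,
                                                   const std::string& dllPath) {
    assert(!ext.empty() && ext[0] == '.');  // extensions are registered with the dot
    const std::string filterKey = "CLSID\\" + ifClsid;
    return {
        {ext + "\\PersistentHandler", "", phClsid},
        {filterKey, "", description},
        {filterKey + "\\InprocServer32", "", dllPath},
        {filterKey + "\\InprocServer32", "ThreadingModel", "Both"},
        {"CLSID\\" + phClsid + "\\PersistentAddinsRegistered\\" + kIFilterIid, "", ifClsid},
    };
}
```

The result holds five values spread over the four keys listed above; keeping it as plain data makes it easy to review or to emit as a .reg file before anything is written.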

 

Legacy Issues

As noted earlier, Windows Vista and Windows Search include a new property system that separates an item's properties from its content. This property system does not exist in Microsoft Windows Desktop Search (WDS) 2.x and earlier versions. If your filter must support other applications as described above, it may need to handle both content and properties. We recommend you refer to Developing IFilter Add-ins or IFilter for more information on developing such a filter.


Community Content

Filter host security context? (alegr1)

"while filters for Windows Search run in a filter isolation process (SearchFilterHost.exe) under the Local System security context with restricted rights"

Do you mean Local Service security context instead?
