Crawling Hierarchical Content Sources

If the content source has a hierarchical structure, such as a file share, the IFilter interface must be implemented to enumerate the contents of each container in the content source. The enumeration is passed back to the Gatherer, which uses it to build a queue of items that need to be crawled. The enumeration process is a loop over the methods of the IFilter interface: GetChunk advances to the next chunk, and GetText or GetValue then reads the contents of that chunk.
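The enumeration loop can be sketched in portable C++ as follows. This is a minimal illustration, not the Platform SDK declarations: MockDirectoryFilter, Status, and EnumerateChildren are hypothetical names, the status codes stand in for the HRESULT values the real IFilter methods return, and the sketch assumes that each chunk carries a single value holding the URL of one child item.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Stand-in status codes (assumption). The real IFilter methods return
// HRESULT values such as S_OK and FILTER_E_END_OF_CHUNKS.
enum class Status { Ok, EndOfChunks, NoMoreValues };

// Hypothetical mock of a filter over one container. It only mimics the
// call pattern of IFilter; it is not the real interface.
class MockDirectoryFilter {
public:
    explicit MockDirectoryFilter(std::vector<std::string> children)
        : children_(std::move(children)) {}

    // Advance to the next chunk, in the manner of IFilter::GetChunk.
    Status GetChunk() {
        if (next_ >= children_.size()) return Status::EndOfChunks;
        current_ = next_++;
        valueRead_ = false;
        return Status::Ok;
    }

    // Hand out the current chunk's value once, in the manner of
    // IFilter::GetValue.
    Status GetValue(std::string& out) {
        if (valueRead_) return Status::NoMoreValues;
        assert(current_ < children_.size());
        out = children_[current_];
        valueRead_ = true;
        return Status::Ok;
    }

private:
    std::vector<std::string> children_;
    std::size_t next_ = 0;
    std::size_t current_ = 0;
    bool valueRead_ = true;  // no chunk is selected before the first GetChunk
};

// Drain the filter and build the queue of items that still need crawling.
std::deque<std::string> EnumerateChildren(MockDirectoryFilter& filter) {
    std::deque<std::string> crawlQueue;
    while (filter.GetChunk() == Status::Ok) {
        std::string url;
        while (filter.GetValue(url) == Status::Ok) {
            crawlQueue.push_back(url);
        }
    }
    return crawlQueue;
}
```

Calling EnumerateChildren on a mock filter built from two child URLs yields a queue holding those two URLs in order. A real implementation would also handle text chunks through GetText and would propagate failure codes instead of treating every non-success status as the end of the enumeration.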

Another distinction between hierarchical and link-based protocol handlers is the use of the IsDirectory method of the IUrlAccessor interface. In link-based protocol handlers, this method should return S_FALSE. Hierarchical protocol handlers must return S_OK for containers.
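The difference in IsDirectory behavior can be illustrated with a small portable sketch. The class names LinkAccessor and HierarchicalAccessor are hypothetical, HResult, kS_OK, and kS_FALSE are local stand-ins for the Windows HRESULT type and its S_OK and S_FALSE codes, and the rule that a URL ending in a slash denotes a container is an assumption made only for this example. Leaf items in the hierarchical case are assumed to return S_FALSE.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Local stand-ins so the sketch compiles without the Windows SDK.
// S_OK is 0 and S_FALSE is 1 in the real headers.
using HResult = long;
constexpr HResult kS_OK = 0;
constexpr HResult kS_FALSE = 1;

// Hypothetical accessor for a link-based source: no item is a container.
class LinkAccessor {
public:
    HResult IsDirectory() const { return kS_FALSE; }
};

// Hypothetical accessor for a hierarchical source.
// Assumption for illustration: a URL that ends in '/' is a container.
class HierarchicalAccessor {
public:
    explicit HierarchicalAccessor(std::string url) : url_(std::move(url)) {
        assert(!url_.empty() && "crawl URLs are expected to be non-empty");
    }

    HResult IsDirectory() const {
        const bool isContainer = !url_.empty() && url_.back() == '/';
        return isContainer ? kS_OK : kS_FALSE;
    }

private:
    std::string url_;
};
```

A real protocol handler would decide whether an item is a container from the content source itself, for example from file system attributes, instead of from the shape of the URL.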

For more information about the IFilter interface, see the IFilter section of the Microsoft Platform SDK.