C++ and STL: Take Advantage of STL Algorithms by Implementing a Custom Iterator

Samir Bajaj
This article assumes you�re familiar with C++ and STL
Level of Difficulty     1   2   3 
Download the code for this article: STL.exe (34KB)
Browse the code for this article at Code Center: CacheIter
SUMMARYThere are many benefits to using the Standard Template Library (STL) for C++ development, including the ability to use generic data structures and algorithms. To use the STL algorithms, an STL-conforming container is required. Iterating through the Internet Explorer cache is an informative exercise, but the cache is not an STL-conforming container. So, to use the STL algorithms to search and enumerate the Internet Explorer cache, an adapter is needed. Building such an adapterâ€"an STL-conforming iteratorâ€"is the topic of this article. Also provided is an overview of the components of the STL and the Win32 Internet APIs used.
T he C++ Standard Template Library (STL) is a powerful and versatile collection of classes and functions that provides an efficient, lightweight, and extensible framework for application development. STL also offers a sophisticated level of abstraction that promotes the use of generic data structures and algorithms without the overhead of a generic solution. With the exception of requirements that are very specific to an application domain and perhaps some other rare situations, the STL is sufficiently flexible to address the development needs of all kinds of applications. Contrary to popular belief, applications do not have to sacrifice performance in order to use STL constructs, since the STL makes specific performance guarantees for the algorithms it supplies and you can select the one that best meets your needs.
      To use the STL algorithms, an STL-conforming container is required. The Microsoft® Internet Explorer cache is not an STL-conforming container. Therefore, if you want to use the STL algorithms with the Internet Explorer cache, you'll need an adapter.
      In this article I'll demonstrate the design of an STL-conforming iterator for the cache of temporary files stored by Internet Explorer. I'll provide an overview of the components of the STL and then look closely at STL iterators. I'll explain how to get information about the Internet Explorer cache entries using the Win32® Internet (WinInet) APIs. Finally, I'll show you how I put this information together to develop the cache iterator and explain how it works.

STL Components

      The STL container classes do not form a hierarchy; in other words, they do not inherit from any base class. Therefore, the STL template classes and functions must follow certain conventions. For example, all standard containers are required to provide certain type definitions and implementations for a specific set of methods. This means that if C is an STL-conforming container, then C::iterator is the type of the iterator for that container. This has the significant advantage of moving a substantial portion of error checking to the compilation phase, thereby reducing the probability of expensive, and sometimes fatal, runtime errors.
      STL components can be broadly classified into four categories: containers, iterators, algorithms, and function objects. Containers manage collections of objects. There are a variety of containers, and each offers unique advantages. Standard containers include vector, list, deque, set, map, and so on. What will be the most efficient container type for your particular task will depend on the way an application manipulates its data and the nature of operations it wants to perform on the objects.
      Iterators are generalized pointer-like objects that are used to step through the elements of a container. Iterators can work on an entire collection of objects or on a subset of objects. Typically, iterators operate on a range of elements, which could include anywhere from no elements to all the elements in a container. In the next section of this article, I'll discuss iterators in more detail and show you how I applied the concept of iterators to the Internet Explorer cache of temporary Internet files.
      Algorithms process elements in containers, or more specifically, in ranges. The STL offers a wide variety of algorithms to search, sort, and operate on elements in containers. In fact, the STL actually has more than 70 useful, generic algorithms in the form of function templates that can be applied to containers or sequences of elements. A simple example of an STL algorithm is the count function. As its name suggests, it counts the number of elements in a container.
      However, if you want to use any of the STL algorithms, the container (or sequence of elements) has to be STL-compliant. Although iterators act as the glue that allows algorithms to be applied to containers, they also serve to decouple the components so that new algorithms can be written independently of the containers, and vice-versa, without having to derive from any base classes or override any virtual functions. This orthogonal design of components facilitates the use of a multitude of algorithms on practically all containers that conform to STL's conventions.
      Function objects (or functors) encapsulate policy. These objects behave like functions and can be invoked as functions. In essence, function objects are smarter than functions because they can encapsulate state and attributes that reflect a particular state and also have a specific type associated with them. Later in this article, I will show the function objects that customize the operation of a few algorithms to work on the Internet Explorer cache.
      There are other utility classes and functions in the STL that do not strictly fall into any of these categories, but are nonetheless associated with one or more components.

Iterators

      An iterator is modeled after a pointer, and for the most part can be used as a pointer. Iterators are divided into categories based on the operations they support. Although the categories are collectively referred to as a hierarchy, they don't correspond to classes or templates. In other words, the iterator categories do not define an inheritance hierarchy. The iterator hierarchy has five categories, each corresponding to a set of operations supported by instances of that category.
      In increasing order of capability, iterator categories are as follows:
Input These iterators support single-pass, read-only operations. Once an element in a range has been read, the iterator cannot backtrack to that element. If you need to backtrack, you have to start over again.
Output These are similar to input iterators, except that output iterators support write-only operations. Modeled on ostreams (output streams), they closely emulate the behavior of a typical file pointer in write mode. It is important to note that output iterators are also single-pass.
Forward These iterators offer a combination of operations supported by input and output iteratorsâ€"specifically, they allow read and write operations. Additionally, they can make multiple passes over the range of elements that they traverse.
Bidirectional These are essentially forward iterators with the ability to move backward through a range of elements. At minimum, all standard STL containers offer bidirectional iterators.
Random access As true pointer-like objects they offer similar capabilities of bidirectional iterators as well as pointer arithmetic. Needless to say, these built-in pointers are a perfect example of random access iterators.
      Before I explain the design for the Internet Explorer cache iterator, let's examine the structure of the cache and the support provided by the WinInet APIs.

The Internet Explorer Cache

      Internet Explorer maintains a cache of resources that it downloads from the network. Any data received from the network is cached on the local disk and retrieved in response to subsequent requests. Clients that need persistent caching services can use the WinInet caching APIs to allow applications to save data in the local file system for subsequent use. This is extremely useful for situations where a low-bandwidth link limits access to data, or access to the Internet is not available at all. In the iterator sample, I treat the cache as a read-only data structure in order to use several algorithms in the STL and perform some operations on the cache.
      The contents of the persistent cache can be enumerated using a familiar search procedure in Win32. You start the search by creating a search handle. Subsequent invocations of one or more functions return a structure with information about each file until no more files need to be processed. File enumeration in a directory and resource enumeration on an FTP site are common examples of where you would use this procedure. The corresponding WinInet APIs for searching through and enumerating the Internet cache are described here:
  • FindFirstUrlCacheEntry starts the enumeration by taking a search pattern, a buffer, and a buffer size to create a handle and return the first cache entry.
  • FindNextUrlCacheEntry takes the handle created by FindFirstUrlCacheEntry, a buffer, and a buffer size to return the next cache entry.
  • FindCloseUrlCache closes the specified cache enumeration handle.
      The signatures of these functions are shown in Figure 1.
      Information about the cache entry is returned in an INTERNET_CACHE_ENTRY_INFO structure in the buffer. The size of this structure varies for each entry. If the buffer size passed to either FindFirstUrlCacheEntry or FindNextUrlCacheEntry is insufficient, the GetLastError API will return ERROR_INSUFFICIENT_BUFFER. The buffer size variable contains the actual size that was needed to retrieve that cache entry. The application should then reallocate a buffer of the appropriate size and call the function again. The INTERNET_CACHE_ENTRY_INFO structure contains information such as the URL of the cached resource, the local file name, size, last modified time, last access time, expiry, and so on. Figure 2 shows the complete structure definition.
      Several other WinInet APIs that provide additional services and information related to the contents of the cache are documented in the MSDN® Library and the Platform SDK.
      Now that I've discussed iterators, the Internet Explorer cache, and the APIs for enumerating the cache, let's take a look at the cache iterator program I have developed.

The Cache Iterator

      Given the functionality offered by the cache enumeration APIs that I discussed, an input iterator seems to be the best fit. There appears to be a direct correspondence between the properties of an input iterator (single-pass, read-only), and the functionality of the FindFirstUrlCacheEntry and FindNextUrlCacheEntry APIs. While it is true that the input iterator is the least powerful member of the iterator hierarchy, a significant number of algorithms in the STL can work with objects of the input iterator category, as you will see shortly. So let's take a look at the design of the cache iterator.
      In order to establish the iterator as an input iterator type and to define some critical types associated with the iterator, I first need some typedefs. I could do that by including the typedefs in the iterator class, or the typedefs could be inherited from the std::iterator class that is provided in the STL for exactly this purpose. The std::iterator class is shown in the following code.
  template<class C, class T, class Dist = ptrdiff_t>
    struct iterator {
    typedef C iterator_category;
    typedef T value_type;
    typedef Dist distance_type;
    };

      Note that std::iterator defines types for the iterator category as well as the element that the iterator will point to (value_type). In addition, it defines the type for the "distance" between two iteratorsâ€"this is naturally ptrdiff_t for pointer types, and is important for random access iterators, but doesn't really apply to input iterators.
      Moreover, the cache_iterator program will provide an interface to the Internet Explorer cache, and therefore needs to maintain state variables that can be manipulated by the WinInet cache APIs. Adding a cache enumeration handle and variable of type INTERNET_CACHE_ENTRY_INFO*, the class definition looks like this:
  class cache_iterator: public std::iterator<std::input_iterator_tag,
                                         INTERNET_CACHE_ENTRY_INFO>
{
   HANDLE                      m_hCacheEntry;
   LPINTERNET_CACHE_ENTRY_INFO m_lpCacheEntryInfo;
   // ...
};

      I have indicated that I am designing an input iterator and that the iterator will enumerate elements of type INTERNET_CACHE_ENTRY_INFO.
      Next, I need to determine the minimum set of operations that an input iterator is expected to support. You should recall that the STL follows certain conventions, and components of the STL make certain assumptions about other components in order to interoperate in a generic, orthogonal manner. If a component doesn't satisfy the expected requirements, the compiler will then issue an appropriate error.
      It is usually helpful to have a picture of a built-in pointer in mind when discussing iterators. Consider the following piece of canonical code and see how it can lead to insights on iterator design:
  int* find(int* first, int* last, const int value) 
{
   while (first != last && *first != value) ++first;
   return first;
}

This function is invoked as follows:
  int numbers[10];
// fill numbers[] with some ints
int* zero = find(numbers, numbers+10, 0);

      If an iterator were to emulate a built-in pointer, what operations would it need to support? Evidently, it should offer at least the following operations: pass-by-value, comparison for inequality, dereferencing, and preincrement. Pass-by-value semantics can be implemented for a class via a copy constructor. For the remaining three operations, C++ operator overloading is both a convenient and natural mechanism.
      Next, I'll describe the cache iterator's implementation of these four operations.
Pass-by-value The cache_iterator copy constructor should initialize member data and place the object in a valid state. It is important to distinguish between a valid state and an invalid state because the latter will be used to indicate the completion of cache enumeration. In addition, the cache_iterator object will be placed into an invalid state as soon as an irrecoverable error is encountered, thereby terminating the enumeration. Using the WinInet API FindFirstUrlCacheEntry, the cache_iterator object can be initialized with valid values for an enumeration handle and a pointer to the first entry in the cache. Null values for the two instance variables indicate that the object is in an invalid state.
  cache_iterator(LPINTERNET_CACHE_ENTRY_INFO) : m_hCacheEntry(0), 
    m_lpCacheEntryInfo(0)
    {
    }

For the complete code listing see the file CacheIter.h, which can be found at the link at the top of this article.
Comparison for inequality It is usually a good idea to overload both the == operator and the != operator when implementing either one. The implementation of the == operator is simple: two instances of cache_iterator are equal if their state is equal, and since I am only interested in distinguishing between a valid state and the invalid state, the following will suffice:
  bool operator ==(const cache_iterator& cit)
{
    return m_hCacheEntry == cit.m_hCacheEntry;
}

Dereferencing At any point during the enumeration, the iterator will be pointing to a cache entry. The implementation of operator* can simply return the INTERNET_CACHE_ENTRY_INFO structure to which the member variable m_lpCacheEntryInfo points, as you can see in this code.
  INTERNET_CACHE_ENTRY_INFO operator*()
{
    return 
       *m_lpCacheEntryInfo;
}

Preincrement This is implemented by overloading the ++ operator. The method invokes the API FindNextUrlCacheEntry, updating the object state appropriately so that it points to the INTERNET_ CACHE_ENTRY_INFO structure corresponding to the next element in the enumeration order. This method also must ensure that the object transitions to an invalid state when all entries have been enumerated, as you can see in Figure 3.

Manipulating the Cache

      By designing an STL-conforming iterator for the Internet Explorer cache, components in the STL that can work with input iterators will be able to traverse the cache and enumerate its contents. However, that alone is not sufficient since I also need to process those entries and extract information of interest. This is where function objects enter the scene. As described earlier, function objects are functions or classes that define behavior or policy.
      As an example, let's say all I want to do is display the name of the local file corresponding to each cache entry. Here's a function object to do just that:
  struct display_cache_entry
{
    void operator()(INTERNET_CACHE_ENTRY_INFO cei) const
    {
        if (cei.lpszLocalFileName)
            cout << cei.lpszLocalFileName << endl; 
    }
};

The desired effect can be achieved by using the previous function object in conjunction with the STL algorithm for_each, as shown in the following example:
  for_each(cache_iterator(),cache_iterator(0),display_cache_entry());

Notice how the default constructor creates a valid cache_iterator object that points to the first entry in the cache, and a different constructor is used to create an instance of the iterator object that indicates the invalid state.
      The code in Figure 4 shows how simple function objects can be written to take advantage of a sophisticated library of algorithms in the STLâ€"all I had to do was implement an iterator that honors the STL conventions.

Conclusion

      This article introduced the C++ STL and demonstrated the design and implementation of an STL-conforming iterator for the Internet Explorer cache. The technique is sufficiently general that it can be applied to a variety of problems in all application domains. The time and effort invested in creating such a design is handsomely rewarded by providing you with seamless interoperability with a rich array of STL components.
Samir Bajaj is a senior software developer in the Design Platforms Group at Autodesk Inc. As a member of the AutoCAD team, Samir contributes to the design and implementation of Internet-based features across the product line and on multiple platforms. Samir coauthored Inside AutoCAD 2000 (New Riders, 1999). He can be reached at samir.bajaj@autodesk.com.

From the April 2001 issue of MSDN Magazine.