STL.NET Primer

 

Stanley B. Lippman
Architect, Microsoft Visual C++ team

August 2004

Applies to:
   Microsoft Visual C++ 2005
   Standard Template Library (STL) and STL.NET

Note   This paper is based on an early implementation, and details of the technology may change prior to the final release. STL.NET did not ship in the Visual Studio 2005 Technology Preview release.

Summary: With Visual C++ 2005, the Standard Template Library (STL) has been re-engineered to work under the .NET Framework. This article, the first in a series, provides a general tour of STL.NET. (15 printed pages)

Contents

A Veritible Plethora of Choice
Why STL.NET?
Defining a Common Ground
A Simple Demonstration
Algorithms for the Choosing
Acknowledgements

This marks the first in a series of articles on STL.NET, a re-engineering of the Standard Template Library (STL) implemented using both the CLI generic and the C++ template mechanisms. STL.NET is new to Visual C++ and will ship in Visual Studio 2005. In my judgment, when combined with the revised C++/CLI dynamic programming support, this coming version of Visual C++ is simply the most satisfying language in which to program. I think STL.NET is a very exciting library addition, and I hope that after you finish this series of articles you will agree with me.

A Veritible Plethora of Choice

There are three container libraries that the Visual C++ programmer can choose from to manipulate collections of CLI types, built on three models of type parameterization. These are summarized in the following bulleted list, together with a common code sample implemented within each model.

  • The original Systems::Collection library based on storage of an element type through the Object base class common to all CLI types. For example, here is an ArrayList, which implements the IList interface. It represents an array of type Object and is used in this example to hold elements of type String.

    void objectCollection()
    {
    using namespace System::Collections;

    ArrayList ^as = gcnew ArrayList; as->Add( "Pooh" ); as->Add( "Piglet" ); as->Add( "Eeyore" ); as->Add( "Rabbit" ); as->Sort();

    Console::WriteLine( "ArrayList holds {0} elements: ", as->Count );

    for ( int i = 0; i < as->Count; i++ ) Console::WriteLine( as[ i ] );

    int index = as->IndexOf( "Pooh" ); if ( index != -1 ) { // requires an explicit downcast String^ item = safe_cast<String^>( as[ index ]); as->RemoveAt( index ); }

    as->Remove( "Rabbit" );

    Console::WriteLine( "\nArrayList holds {0} elements: ", as->Count );

    IEnumerator^ is = as->GetEnumerator(); while ( is->MoveNext() ) Console::WriteLine( is->Current ); }

  • A new container library based on the CLI generic mechanism. It is found within the System::Collections::Generic namespace. This represents the Visual Studio 2005 Beta1 implementation, and may change before final release. Collection<T> is a concrete generic base class from which users are urged to derive their own specialized container classes. Here is the same general code sample as earlier in this article,

    void genericCollection()
    {
    using namespace System::Collections::Generic;

    Collection<String^> ^cols = gcnew Collection<String^>;

    cols->Add( "Pooh" ); cols->Add( "Piglet" ); cols->Add( "Eeyore" ); cols->Add( "Rabbit" );

    // no Sort method associated with Collection

    Console::WriteLine( "Collection holds {0} elements: ", cols->Count );

    for ( int i = 0; i < cols->Count; i++ ) Console::WriteLine( cols[ i ] );

    int index = cols->IndexOf( "Pooh" ); if ( index != -1 ) { // no downcast required ... String ^item = cols[ index ]; cols->RemoveAt( index ); }

    cols->Remove( "Rabbit" );

    Console::WriteLine( "\nCollection holds {0} elements:", cols->Count );

    IEnumerator<String^> ^is = cols->GetEnumerator(); while ( is->MoveNext() ) Console::WriteLine( is->Current ); }

  • STL.NET provides a very different model of type parameterization, which is the topic of the next section. Here is its implementation of the String container—we'll drill down into the details later in the series.

    #include <cli/vector>
    #include <algorithm>

void stlCollection() { vector<String^> ^svec = gcnew vector<String^>;

svec-&gt;push_back("Pooh");   svec-&gt;push_back("Piglet");
svec-&gt;push_back("Eeyore"); svec-&gt;push_back("Rabbit");

// generic algorithm: sort 
sort( svec-&gt;begin(), svec-&gt;end() );

Console::WriteLine( "Collection holds {0} elements: ",
                     svec-&gt;size() );

for ( int i = 0; i &lt; svec-&gt;size(); i++ )
      Console::WriteLine( svec[ i ] );

// generic algorithm: find
vector&lt;String^&gt;::iterator iter = 
       find( svec-&gt;begin(), svec-&gt;end(), "Pooh" );

if ( iter != svec-&gt;end() )
{
     // no downcast required ...
     String ^item = *iter;
     svec-&gt;erase( iter );
}

// generic algorithm: remove ...
remove( svec-&gt;begin(), svec-&gt;end(), "Rabbit" );

Console::WriteLine( "\nCollection holds {0} elements:",
                      svec-&gt;size() );

IEnumerator&lt;String^&gt; ^is = svec-&gt;GetEnumerator();
while ( is-&gt;MoveNext() )
        Console::WriteLine( is-&gt;Current );

}

Why STL.NET?

Before we begin drilling down into STL.NET, let me briefly address a somewhat inevitable question: Why should the Visual C++ programmer choose the STL.NET container library rather than the language neutral System: Collections or System::Collections::Generic libraries?

One generally discards the System::Collections library pretty much out of hand for the same reason that Visual Studio 2005 chose to provide the Generic library: the Object model of parameterization is complicated and unsafe because of the loss of type information. It is fine for simple uses, just as a bubble sort is fine for containers holding 16 or fewer elements. But when your applications address real-world problems, you need to provide more sophisticated solutions.

So, the alternatives for a systems programming language such as Visual C++ are STL.NET and the System::Collections::Generic library. Why should the Visual C++ programmer prefer STL.NET? And isn't that going to separate our programs from the rest of the .NET languages? These are valid questions, and deserve a response.

One answer is extensibility. The original design pattern of STL, invented by Alex Stepanov, separates algorithms and containers into separate domain spaces. Thus, one can add to the set of algorithms that apply to all containers, or, one can add to the set of containers to which the algorithms can be applied. The Generic library is a more conventional container model. And that leads us to a second response.

A second answer is unification. The working C++ programmer has built up an expertise with this library as well as an existing body of code. We would like to not only provide a migration path for existing code, but also for existing expertise as well. If you have previously depended on the STL in your C++ programming, and it is difficult to imagine a working C++ programmer who does not depend on the STL, its absence under .NET feels like a significant loss—at least that has been my experience. It is a concern raised by many expert C++ programmers whom I have spoken to, and who have expressed reservations about migrating to .NET.

A third answer is performance. But C++ programmers are stereotyped as being . . .  well, obsessive about performance issues, and so I'll just mention it here in passing—we'll get to it in one of the later articles.

Finally, the question is, this is all very well and good, Stan, but doesn't this isolate the C++ programmer and the C++/CLI programs from the rest of the .NET community? The answer to this, I believe, is a simple no. The STL.NET architects, including Anson Tsao, Martyn Lovell, and P.J. Plauger, have given this considerable thought, and we feel confident in our ability to interoperate with other .NET languages through support of IEnumerator, IList, and ICollection. We'll look at this in some depth in one of my subsequent articles.

Defining a Common Ground

There are two ways to approach STL.NET: one approach addresses what is different between STL and STL.NET. The other approach addresses what they share in common. While an itemization of their differences is likely to appeal to those with experience of the STL, it is also likely to leave those unfamiliar with the library coughing in the exhaust fumes, so to speak, as we speed off into the esoteric corners of this and that container and how they interoperate with the System collection libraries—all of which is neat stuff, but wouldn't it be nice if the potential newbie could be first brought up to snuff. At least that is my intention in the following short introductory sections. In this way, those new to the library can get a feel for the extensible model of parameterized collections provided by both STL and STL.NET.

So what do STL and STL.NET share in common? Both consist of two primary components: sequential and associative container types, and a set of generic algorithms. (Yes, if you are an experienced STL developer, you know what's coming. Still, this provides for a common vocabulary and background, and so I would beg your patience.) The generic algorithms do not operate directly on the container types. Instead, we pass them an iterator pair marking the range of elements over which to operate—typically referred to as first and last. The element range notation (formally named a left-inclusive interval) is written as

   // to be read as: includes first and 
   // each element up to but not including last
   [ first, last )

indicating that the range begins with first and ends with but does not include last. When first equals last, the range is said to be empty.

A sequence container holds an ordered collection of elements of a single type. The two primary sequence containers are the vector and list types. (A third sequence container, deque—pronounced deck—provides the same behavior as a vector but is specialized to support efficient insertion and deletion of its first element. A deque is preferred over a vector, for example, in the implementation of a queue.)

Before we can refer to one of the sequential container types, we must include the appropriate header file, one of the following:

#include <cli/vector>
#include <cli/list>
#include <cli/deque>

These header files also include the declaration of the shared base interface, such as interface_vector, and the generic sibling of the container, such as generic_vector.

The STL.NET containers are reference types; a container declaration provides us with a tracking handle—one that is automatically initialized to nullptr. We allocate an actual container using the gcnew operator. We saw that briefly in the previous section, but let me be explicit here:

void f()
{
    // allocate an empty vector . . . 
    vector<String^>^ svec = gcnew vector<String^>;

    // allocate a list of 10 elements
    // each by default is set to nullptr
    list<Object^>^ olist = gcnew list<Object^>( 10 );

    // a tracking handle automatically set to nullptr
    deque<int>^ ideck;

    // do something interesting . . .
};

We'll cover the declaration and use of the sequential containers in the next article in this series.

An associative container supports efficient query as to the presence and retrieval of an element. The two primary associative container types are the map and set. A map is a key/value pair: the key is used for lookup, and the value contains the data we want to store and retrieve. A telephone directory, for example, is easily represented by a map: the key is the individual's name, and the value is the associated phone number.

A map orders the entries in ascending value using an underlying tree abstraction. The hash_map container provides a more efficient retrieval ordering. However, the iteration of a hash_map accesses the elements in a somewhat random order. If retrieval is the primary activity of the map, a hash_map container is to be preferred.

A set contains a single key value and supports the efficient query of whether it is present. For example, a text query system might build a set of common words to exclude, such as the, and, but, and so on, when building a database of the words present within a text. The program would read each word of the text in turn, check whether it is in the set of excluded words, and either discard or enter the word within the database depending on the result of the query. In addition to the set, there is also the hash_set container, with the same general characteristics as between the map and hash_map.

Both the map and the set can contain only a single occurrence of each key. A multimap and a multiset support multiple occurrences of the same key. Our telephone directory, for example, will probably need to support multiple listings for a single individual. In this case, we would use a multimap. Both a hash_multimap and a hash_multiset are also available.

The associated header files for these container types are the following:

// for both map and multimap
#include <cli/map>

// for both set and multiset
#include <cli/set>

// for both hash_map and hash_multimap
#include <cli/hash_map>

// for both hash_set and hash_multiset
#include <cli/hash_set>

A Simple Demonstration

Let's make this a bit more concrete with an example. Here is a simple program that implements a word count of a text file. It illustrates the use of a map, a hash_set, and a vector. The function is declared as follows:

map<String^,int>^ 
build_word_count( vector<array<String^>^>^ words,
                  array<String^> ^common_words )

Template syntax can look staggeringly complex until our eyes adjust. Let's see what sense we can make sense of it. The return type of build_word_count() is a map whose key is a string—each unique word of the file—and whose value is an integer count of the number of times the associated word occurs. The first function parameter is a vector of CLI arrays holding the words of the file as string elements. The second parameter holds the set of words we want to exclude from our word count.

This looks complex because of the multiple hats interspaced with the closing right brackets. We can lexically simplify this with a pair of typedefs:

typedef array<String^>^ words;
typedef vector<words>^ text;

The redeclaration of the function, using the typedefs, looks like this:

map<String^,int>^ 
build_word_count( text theText, words common )

We could also eliminate the explicit map declaration, but I have left that as an exercise for the reader. The next work is the implementation. We can break that up into two portions: (1) the initialization of our map and hash_set, shown below, and (2) the actual processing of the text itself.

// item #1: initialization of our containers

// allocate an empty map ... we have no idea of the size ...
map<String^,int> ^wordOccur = gcnew map<String^,int>;

// fill the hash_set with the elements of the array
hash_set<String^> ^comWords = 
         gcnew hash_set<String^>(
               &common[0],
               &common[common->Length]);

If you are not familiar with the STL, the initialization of the hash_set probably stubbed your sense of decency. But apart from the syntax, the declaration is actually straightforward and quite powerful.

What we want to do is initialize the hash_set to the elements in the array. We can explicitly do that, of course, by iterating across the elements using a for each statement. For example,

// simplify hash_set declaration 

// part #1: allocate an empty hash_set
hash_set<String^> ^comWords = gcnew hash_set<String^>;

// part #2: fill hash_set with array elements
for each ( String^ wd in common )
           comWords->insert( wd );    

The passing in to the hash_set constructor of the address of the first array element and the address of one past the last valid array element provides an equivalent filling in of the hash_set. The two addresses provide the range of elements over which the hash_set constructor iterates, inserting them in turn into the container. This is the iterator range idiom I mentioned at the beginning of this section.

The second half of our implementation is the actual processing of the text. Since we haven't officially seen iterators yet, we'll stomp through the vector using the conventional for loop for now. We'll use a for each statement to walk the individual array elements of the vector. Here's what that looks like:

    // access each array in turn
    for ( int ix = 0; ix < theText->size(); ++ix )
    {
        for each ( String^ wd in theText[ ix ] )
        if ( common->find( wd ) == common->end() )
             word_map[ wd ]++;
    }
     
    // let's not forget to return it . . . 
    return word_map;

find() is the hash_set member function that we use to determine whether the item is present in the set—all the associative containers support a find() member. (Yes, the sequential containers, as well as the built-in arrays, and any collections you may integrate into the STL/STL.NET model, use the find() generic algorithm. The separation of algorithm and container is theoretically more pronounced than it is in practice. We'll see that in particular with the list sequential container.) It returns just that thing I haven't talked about yet—an iterator into the container. For the moment, let's just think of an iterator as a fancy sort of pointer type. find() returns an iterator to the item, if it is present in the container; otherwise, it returns an iterator one past the last contained item. This is the same iterator value returned by end(). The way we determine whether an element search succeeds under the STL is to compare the iterator that is returned by the search method with the iterator returned by end(). If the two are equal, the element is not in the container.

Each container provides a begin() and end() member function. begin() returns an iterator to the first element of the container. As I said earlier, end() returns an iterator one past the last element of the container. For example, here is how we might declare two iterators and initialize them using these two members,

vector<words>::iterator first = theText->begin();
vector<words>::iterator last = theText->end();

A traversal of the container typically increments first within a for or while loop, terminating when first equals last. For example,

for ( ; first != last; ++first )

A requirement of the iterator pair is that it must be possible to reach last beginning with first through repeated application of an increment operator. However, the compiler cannot itself enforce this; failure to meet this requirement causes undefined run-time behavior.

We access the element referred to by an iterator through the dereference operator (*). For example, to retrieve each CLI array element of our vector, we write the following within the body of the above for loop,

array<String^> ^as = *first;

We'll cover the declaration and use of the associative containers in more detail in a future article in this series.

Algorithms for the Choosing

The generic algorithms provide operations over the elements of both these container types and the two built-in array types. The operations include Search (find, count), Sort (merge, partition, permutate, reverse, rotate, shuffle, sort), Deletion/Subtitution (remove, replace, swap, unique), Copy, Relational (equal, min, max, includes), Generate (fill, for-each, generate, transform), Set (union, intersection, difference), Heap (make, sort, pop, push), and a number of esoteric numeric operations such as accumulate, partial sum, inner product, and adjacent difference.

Before we can use the generic algorithms, of course, we must include the appropriate header file. For all the algorithms other than the numeric, we write:

#include <algorithm>

The four numeric algorithms, accumulate, inner_product, partial_sum, and adjacent_difference, are factored out into a separate header. To use these algorithms we write:

#include <numeric>

(I tried to discover why these four are factored out into a separate header—it certainly complicates the discussion of using the algorithms—but could not discover any explicit explanation either in the documentation or walking the halls. In the original Stepanov implementation, these four are contained in the same header with the other algorithms. During the standard process, the numeric header was introduced and is actually discussed in a separate subsection of the standard devoted to numeric libraries.)

You might be thinking, Gosh, Stan, shouldn't these read <cli/algorithm>? After all, that is how the container headers are qualified. Didn't you make a goof here? And amazingly, the answer is no. I say amazingly because the same implementation of the algorithms is used across both the native STL and STL.NET. Gosh, talk about code reuse!

OK. Let's look at how we might use the algorithms. For example, let's sort the array of String elements before inserting them into the map. We'll do this using the sort() generic algorithm. As with nearly all the generic algorithms, the arguments are the range pair of iterators:

sort( &as[0], &as[ as->Length ] );

In the opening code salvo, two other generic algorithms—find() and remove()—are used:

// generic algorithm: find
vector<String^>::iterator iter = 
           find( svec->begin(), svec->end(), "Pooh" );

if ( iter != svec->end() ){ ... }

// generic algorithm: remove ...
remove( svec->begin(), svec->end(), "Rabbit" );

We should by now be sensitive to the pattern of usage here: the range marked off by the iterator pair serves as the binding of the algorithm to the container. A search algorithm returns an iterator either to the found item within the container, or, failing that, the iterator marking one past the last item of the range.

We'll do some interesting things with the generic algorithms—but that must wait for a future article. I'm hopeful that this has given us a reasonable overview of what's in store, and given all of us a common vocabulary and background by which we can all move forward as a group. See ya next time.

Acknowledgements

There are always a thousand unseen hands that help keep even the simplest article afloat. I owe a great deal of thanks to Scott Currie for a thoughtful review of what I had mistakenly thought of as a—ha! —final draft. Anson Tsao patiently tutored me in the ways of STL.NET, for which I am extremely grateful. Martyn Lovell is steering the STL.NET towards deployment, and has allowed me lookie-loos along the way. Finally, thanks to Brian Johnson and Ronald Laeremans for getting me unbeached to begin with.

 

About the author

Stan Lippman is an Architect with the Visual C++ team at Microsoft. He began working on C++ with its inventor Bjarne Stroustrup back in 1984 within Bell Laboratories. In between, he worked in Feature Animation at Disney and DreamWorks, and was a Software Technical Director on Fantasia 2000.

© Microsoft Corporation. All rights reserved.