
MSDN Magazine

Type Fundamentals
Jeffrey Richter
      In the October issue, I introduced many of the fundamental concepts related to types in the Microsoft® .NET common language runtime. In particular, I discussed how all types are derived from the System.Object type, and showed the various mechanisms (for example, C# operators) that a programmer can use to cast from one type to another. Finally, I mentioned how namespaces are used by compilers and how they are ignored by the common language runtime.
      In this month's column, I'll continue my discussion of type fundamentals. I'll start off by introducing primitive types and then quickly move on to reference types and value types. It is crucial that all developers be familiar with the different behaviors that the two latter types exhibit. In fact, I believe that a developer who misunderstands the difference between reference types and value types will introduce subtle bugs and performance issues into their code.

Primitive Types

      Certain data types are used so commonly that many compilers allow your code to manipulate them using simplified syntax. For example, you could allocate an integer using the following syntax in C#:
  int a = new int();

      But I'm sure you'll agree that declaring and initializing an integer using this syntax is rather cumbersome. Fortunately, many compilers (including C#) allow you to use syntax similar to the following instead:
  int a = 0;

This certainly makes the code more readable. And, of course, the intermediate language (IL) that is generated when using either syntax is identical.
      Any data types directly supported by the compiler are called primitive types. Primitive types map directly to types that exist in the base class library. For example, in C# an int maps directly to the System.Int32 type. Because of this, the following two lines of code are identical to the two lines of code shown previously:
  System.Int32 a = new System.Int32();
  System.Int32 a = 0;

Figure 1 shows the base class library types that have corresponding primitives in C# (other languages will offer similar primitive types).
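
      The mapping between a C# primitive and its base class library type can be checked directly. The following is a minimal sketch rather than code from the original column; the class name PrimitiveAliasDemo and the helper IntIsInt32 are hypothetical names chosen for illustration.

```csharp
using System;

public static class PrimitiveAliasDemo {
   // Returns true when the C# keyword and the library name identify the same type.
   public static bool IntIsInt32() {
      return typeof(int) == typeof(System.Int32);
   }

   public static void Main() {
      int a = 5;            // C# primitive syntax
      System.Int32 b = 5;   // The same type, spelled with its library name

      Console.WriteLine(IntIsInt32());          // Displays "True"
      Console.WriteLine(a.GetType().FullName);  // Displays "System.Int32"
      Console.WriteLine(a == b);                // Displays "True"
   }
}
```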

Reference and Value Types

      When an object is allocated from the managed heap, the new operator returns the memory address of the object. You usually store this address in a variable. This is called a reference type variable because the variable does not actually contain the object's bits; instead, the variable refers to the object's bits.
      There are some performance issues to consider when working with reference types. First, the memory must be allocated from the managed heap, which could force a garbage collection to occur. Second, reference types are always accessed via their pointers. So every time your code references any member of an object on the heap, code must be generated and executed to dereference the pointer in order to perform the desired action. This adversely affects both size and speed.
      In addition to reference types, the virtual object system supports lightweight types called value types. Value type objects are usually allocated on a thread's stack rather than on the garbage-collected heap (although they can also be embedded as a field in a reference type object), and the variable representing the object does not contain a pointer to an object; the variable contains the object itself. Since the variable contains the object, a pointer does not have to be dereferenced in order to manipulate the object. This, of course, improves performance.
      The code in Figure 2 demonstrates how reference types and value types differ. In Figure 2, the Rectangle type is declared using struct instead of the more common class. In C#, a type declared using struct is a value type, while types declared using class are reference types. Other languages may have different syntax for describing value types versus reference types. For example, C++ uses the __value modifier.
      Recall the System.Int32 allocation discussed in the section on primitive types:
  System.Int32 a = new System.Int32();

When this statement is compiled, the compiler detects that the System.Int32 type is a value type and optimizes the resulting IL code so that this "object" is not allocated from the heap; instead, this object is placed on the thread's stack in the local variable a.
      When possible, you should use value types instead of reference types because your application's performance will be better. In particular, you should declare a type as a value type if all of the following are true:
  • The type acts like a primitive type.
  • The type doesn't need to inherit from any other type.
  • The type will not have any other types derived from it.
  • Objects of the type are not frequently passed as method arguments since this would cause frequent memory copy operations, hurting performance. The next section on boxing and unboxing will explain this in more detail.
      The main advantage of value types is that they are not allocated in the managed heap. Of course, value types have several limitations compared with reference types. Here are some of the ways in which value types and reference types differ.
      Value type objects have two representations: an unboxed form and a boxed form. Reference types are always in a boxed form.
      Value types are implicitly derived from System.ValueType. This type offers the same methods as defined by System.Object. However, System.ValueType overrides the Equals method so that it returns true if the values of the two objects' instance fields match. In addition, System.ValueType overrides the GetHashCode method so that it produces a hash code value using an algorithm that takes into account the values in the objects' instance fields. When defining your own value types, it is highly recommended that you override and provide explicit implementations for the Equals and GetHashCode methods.
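
      To make the recommendation about Equals and GetHashCode concrete, here is one possible sketch. The EqPoint type and its fields are hypothetical, and the hash formula is just one simple way to combine two fields; it is not the algorithm System.ValueType uses.

```csharp
using System;

// Hypothetical value type used only for illustration.
public struct EqPoint {
   public int X, Y;

   public EqPoint(int x, int y) { X = x; Y = y; }

   // Two EqPoint values are equal when both fields match.
   public override bool Equals(object obj) {
      if (!(obj is EqPoint)) return false;   // Also rejects null
      EqPoint other = (EqPoint) obj;         // Unbox and copy
      return X == other.X && Y == other.Y;
   }

   // Equal values must produce equal hash codes, so hash the same fields.
   public override int GetHashCode() {
      return unchecked((X * 397) ^ Y);
   }
}

public static class EqualityDemo {
   public static void Main() {
      EqPoint a = new EqPoint(1, 2);
      EqPoint b = new EqPoint(1, 2);
      EqPoint c = new EqPoint(3, 4);

      Console.WriteLine(a.Equals(b));                        // Displays "True"
      Console.WriteLine(a.Equals(c));                        // Displays "False"
      Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // Displays "True"
   }
}
```

Note that the argument to Equals is an Object, so each call above boxes its argument; the boxing section later in this column explains the cost.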
      Since you cannot declare a new value type or a new reference type using a value type as a base class, value types should not have virtual functions, cannot be abstract, and are implicitly sealed (a sealed type cannot be used as the base of a new type).
      Reference type variables contain the memory address of objects in the heap. By default, when a reference type variable is created, it is initialized to null, indicating that the reference type variable doesn't currently point to a valid object. Attempting to use a null reference type variable causes a NullReferenceException. By contrast, value type variables always contain a value of the underlying type. By default, all members of the value type are initialized to zero. It is not possible to generate a NullReferenceException when accessing a value type.
      When you assign a value type variable to another value type variable, a copy of the value is made. When you assign a reference type variable to another reference type variable, only the memory address is copied.
      Because of the previous point, two or more reference type variables may refer to a single object in the heap. This allows operations on one variable to affect the object referenced by the other variable. On the other hand, value type variables each have their own copy of the object's data, and it is not possible for operations on one value type variable to affect another.
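
      The difference in assignment behavior is easy to observe. The sketch below is illustrative only; the types PointValue and PointRef and the class CopySemanticsDemo are hypothetical names, not the types used in the column's figures.

```csharp
using System;

// Hypothetical types used only for illustration.
public struct PointValue { public int X; }   // Value type
public class PointRef { public int X; }      // Reference type

public static class CopySemanticsDemo {
   public static int ValueCopyResult() {
      PointValue v1 = new PointValue();
      v1.X = 5;
      PointValue v2 = v1;   // Copies the fields into a second variable
      v2.X = 8;             // Changes only v2
      return v1.X;          // v1 is unaffected
   }

   public static int ReferenceCopyResult() {
      PointRef r1 = new PointRef();
      r1.X = 5;
      PointRef r2 = r1;     // Copies only the memory address
      r2.X = 8;             // Changes the single object both variables refer to
      return r1.X;          // r1 sees the change
   }

   public static void Main() {
      Console.WriteLine(ValueCopyResult());      // Displays "5"
      Console.WriteLine(ReferenceCopyResult());  // Displays "8"
   }
}
```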
      There are rare situations when the runtime must initialize a value type and is unable to call its default constructor. For example, this can happen when a thread-local value type must be allocated and initialized when an unmanaged thread first executes managed code. In this situation, the runtime can't call the type's constructor but still ensures that all members are initialized to zero or null. For this reason, it is recommended that you don't define a parameterless constructor on a value type. In fact, the C# compiler (and others) considers this an error and won't compile the code. This problem is rare, and it never occurs on reference types. There are no restrictions on parameterized constructors for either value types or reference types.
      Since unboxed value types are not allocated on the heap, the storage allocated for them is freed as soon as the method that defines an instance of the type is no longer active. This also means that unboxed value type objects cannot receive a notification when their memory is reclaimed. However, a boxed value type will have its Finalize method called when it is garbage-collected. You are strongly discouraged from implementing a value type with a Finalize method. As with a parameterless constructor, C# considers this an error and will not compile the source code.

Boxing and Unboxing

      There are many situations in which it is convenient to treat a value type as a reference type. Let's say that you wanted to create an ArrayList object (a type defined in the System.Collections namespace) to hold a set of Points. The code might look like Figure 3.
      With each iteration of the loop, a Point value type is initialized. Then, the Point is stored in the ArrayList. But let's think about this for a moment. What is actually being stored in the ArrayList? Is it the Point structure, the address of the Point structure, or something else entirely? To get the answer, you must look up the ArrayList's Add method and see what type its parameter is defined as. In this case, you see that the Add method is prototyped in the following manner:
  public virtual void Add(Object value) 

      The previous code plainly shows that Add takes an Object as a parameter. Object always identifies a reference type. But here I'm passing p, which is a Point value type. For this code to work, the Point value type must be converted into a true heap-managed object, and a reference to this object must be obtained.
      Converting a value type to a reference type is called boxing. Internally, here's what happens when a value type is boxed:
  1. Memory is allocated from the heap. The amount of memory allocated is the size required by the value type plus any additional overhead to consider this value a true object. The additional overhead includes a pointer to a virtual method table and a pointer to a sync block.
  2. The value type's bits are copied to the newly allocated heap memory.
  3. The address of the object is returned. This address is now a reference type.
      Some language compilers, like C#, automatically produce the IL code necessary to box the value type, but it is important that you understand what's going on under the covers so that you are aware of code size and performance issues.
      When the Add method is called, memory is allocated in the heap for a Point object. The members currently residing in the Point value type (p) are copied into the newly allocated Point object. The address of the Point object (a reference type) is returned and is then passed to the Add method. The Point object will remain in the heap until it is garbage-collected. The Point value type variable (p) can be reused or freed since the ArrayList never knows anything about it. Boxing enables a unified view of the type system, where a value of any type can ultimately be treated as an object.
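
      A short experiment confirms that the ArrayList holds a boxed copy rather than the original variable. This is a sketch under stated assumptions: ListPoint and BoxingCopyDemo are hypothetical names, and the code is not the listing from Figure 3.

```csharp
using System;
using System.Collections;

// Hypothetical value type used only for illustration.
public struct ListPoint { public int X, Y; }

public static class BoxingCopyDemo {
   public static int StoredXAfterChange() {
      ArrayList list = new ArrayList();

      ListPoint p = new ListPoint();
      p.X = 1;
      list.Add(p);   // Boxes p: its bits are copied into a new heap object

      p.X = 99;      // Changes only the local variable, not the boxed copy

      ListPoint stored = (ListPoint) list[0];   // Unboxes and copies the data out
      return stored.X;
   }

   public static void Main() {
      Console.WriteLine(StoredXAfterChange());  // Displays "1"
   }
}
```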
      The opposite of boxing is, of course, unboxing. Unboxing retrieves a reference to the value type (data fields) contained within an object. Internally, the following is what happens when a reference type is unboxed:
  1. The common language runtime first ensures that the reference type variable is not null and that it refers to an object that is a boxed value of the desired value type. If the variable is null, a NullReferenceException is generated; if the object is not a boxed value of the desired type, an InvalidCastException is generated.
  2. If the types do match, then a pointer to the value type contained inside the object is returned. The value type that this pointer refers to does not include the usual overhead associated with a true object: a pointer to a virtual method table and a sync block.
      Note that boxing always creates a new object and copies the unboxed value's bits to the object. On the other hand, unboxing simply returns a pointer to the data within a boxed object: no memory copy occurs. However, it is commonly the case that your code will cause the data pointed to by the unboxed reference to be copied anyway.
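
      The type check performed during unboxing is strict: a boxed Int32 can be unboxed only as an Int32, even though an unboxed Int32 converts to an Int64 implicitly. The sketch below shows what the runtime reports in each case; UnboxingDemo and TryUnboxAsInt64 are hypothetical names chosen for illustration.

```csharp
using System;

public static class UnboxingDemo {
   // Attempts to unbox o as an Int64 and reports what happened.
   public static string TryUnboxAsInt64(object o) {
      try {
         long value = (long) o;   // Unbox: o must refer to a boxed Int64
         return "ok: " + value;
      }
      catch (InvalidCastException) {
         return "InvalidCastException";
      }
      catch (NullReferenceException) {
         return "NullReferenceException";
      }
   }

   public static void Main() {
      object boxedInt32 = 5;    // Boxed Int32
      object boxedInt64 = 5L;   // Boxed Int64

      Console.WriteLine(TryUnboxAsInt64(boxedInt64));  // Displays "ok: 5"
      Console.WriteLine(TryUnboxAsInt64(boxedInt32));  // Displays "InvalidCastException"
      Console.WriteLine(TryUnboxAsInt64(null));        // Displays "NullReferenceException"

      // Unbox to the exact type first, then let the compiler widen the copy.
      long widened = (int) boxedInt32;
      Console.WriteLine(widened);                      // Displays "5"
   }
}
```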
      The following code demonstrates boxing and unboxing:
  public static void Main() {
   Int32 v = 5;    // Create an unboxed value type variable
   Object o = v;   // o refers to a boxed version of v
   v = 123;        // Changes the unboxed value to 123

   Console.WriteLine(v + ", " + (Int32) o); // Displays "123, 5"
  }

From this code, can you guess how many boxing operations occur? You might be surprised to discover that the answer is three! Let's analyze the code carefully to really understand what's going on.
      First, an Int32 unboxed value type (v) is created and initialized to 5. Then an Object reference type (o) is created and it wants to point to v. But reference types must always point to objects in the heap, so C# generated the proper IL code to box v and stored the address of the boxed version of v in o. Now 123 is stored in the unboxed value type v; this has no effect on the boxed version of v, so the boxed version keeps its value of 5. Note that no unboxing occurs here: the assignment simply overwrites the bits in v, while o continues to refer to the separate copy that was made when v was boxed.
      Now, you have the call to WriteLine. WriteLine wants a String object passed to it but you don't have a String object. Instead, you have these three items: an Int32 unboxed value type (v), a string, and an Int32 reference (or boxed) type (o). These must somehow be combined to create a String.
      To accomplish this, the C# compiler generates code that calls the String type's static Concat method. There are several overloaded versions of Concat. All of them perform identically; the difference is in the number of parameters. Since you want to format a string from three items, the compiler chooses the following version of the Concat method:

  public static String Concat(Object arg0, Object arg1, Object arg2);

      For the first parameter, arg0, v is passed. But v is an unboxed value type and arg0 is an Object, so v must be boxed and the address of the boxed v is passed for arg0. For the arg1 parameter, the address of the ", " string is passed, identifying the address of a String object. Finally, for the arg2 parameter, o (a reference to an Object) is cast to an Int32. This creates a temporary Int32 value type that receives the unboxed version of the value currently referred to by o. This temporary Int32 value type must be boxed once again, with the memory address being passed for Concat's arg2 parameter.
      Once Concat is called, it calls each of the specified object's ToString methods and concatenates each object's string representation. The String object returned from Concat is then passed to WriteLine to show the final result.
      I should point out that the generated IL code would be more efficient if the call to WriteLine were written as follows:

  Console.WriteLine(v + ", " + o);    // Displays "123, 5"

This line is identical to the previous version except that I've removed the (Int32) cast that preceded the variable o. This code is more efficient because o is already a reference type to an Object and its address may simply be passed to the Concat method. So, removing the cast saved both an unbox and a box operation.
      Here is another example that demonstrates boxing and unboxing:

  public static void Main() {
   Int32 v = 5;           // Create an unboxed value type variable
   Object o = v;          // o refers to the boxed version of v

   v = 123;               // Changes the unboxed value type to 123
   Console.WriteLine(v);  // Displays "123"

   v = (Int32) o;         // Unboxes o into v
   Console.WriteLine(v);  // Displays "5"
}

      How many boxing operations do you count in this code? The answer is one. There is only one boxing operation because there is a WriteLine method that accepts an Int32 as a parameter:

  public static void WriteLine(Int32 value);

      In the two calls to WriteLine, the variable v (an Int32 unboxed value type) is passed by value. Now, it may be that WriteLine will box this Int32 internally, but you have no control over that. The important thing is that you've done the best you could and have eliminated the boxing from your code.
      If you know that the code you're writing is going to cause the compiler to generate a lot of boxing code, you will get smaller and faster code if you manually box value types, as shown in Figure 4.
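
      As a rough sketch of the idea (not the listing from Figure 4), the code below passes the same Int32 three times to a method that takes Object parameters. The names ManualBoxingDemo, FormatWithAutoBoxing, FormatWithManualBoxing, and SeparateBoxesAreSameObject are hypothetical. Both formatting methods produce the same string, but the second boxes the value only once.

```csharp
using System;

public static class ManualBoxingDemo {
   // Each Int32-to-Object conversion below makes the compiler emit a box operation.
   public static string FormatWithAutoBoxing(int v) {
      return String.Format("{0}, {1}, {2}", v, v, v);   // v is boxed three times
   }

   // Boxing once by hand lets the same heap object be passed three times.
   public static string FormatWithManualBoxing(int v) {
      object o = v;                                     // v is boxed once
      return String.Format("{0}, {1}, {2}", o, o, o);   // No further boxing
   }

   // Shows that every box operation creates a distinct heap object.
   public static bool SeparateBoxesAreSameObject(int v) {
      object first = v;
      object second = v;
      return Object.ReferenceEquals(first, second);
   }

   public static void Main() {
      Console.WriteLine(FormatWithAutoBoxing(5));        // Displays "5, 5, 5"
      Console.WriteLine(FormatWithManualBoxing(5));      // Displays "5, 5, 5"
      Console.WriteLine(SeparateBoxesAreSameObject(5));  // Displays "False"
   }
}
```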
      The C# compiler automatically generates boxing and unboxing code. This makes programming easier, but it hides the overhead from the programmer who is concerned with performance. Like C#, other languages may also hide boxing or unboxing details. However, some languages may force the programmer to explicitly write boxing or unboxing code. For example, C++ with Managed Extensions requires that the programmer explicitly box value types using the __box operator; unboxing a value type is done by casting the boxed type to its unboxed equivalent using dynamic_cast.
      One last note: if a value type doesn't override a virtual method defined by System.ValueType, then this method can only be called on the boxed form of the value type. This is because only the boxed form of the object has a pointer to a virtual method table. Methods defined directly with the value type can be called on boxed and unboxed versions of the value.

Conclusion

      The concepts discussed in this column are extremely important to all .NET developers. You should really understand the difference between reference types and value types. You must also understand which operations require boxing, and if you're using a compiler that boxes value types automatically (like C# and Visual Basic®) you should also learn when compilers are going to do this and what effect it has on your code. I can't emphasize enough that a misinterpretation of these concepts can easily cause you to create subtle bugs and performance slowdowns in your program.
Jeffrey Richter is the author of Programming Applications for Microsoft Windows (Microsoft Press, 1999), and cofounder of Wintellect (https://www.Wintellect.com), a software education, debugging, and consulting firm. He specializes in programming/design for .NET and Win32. Jeff is currently writing a Microsoft .NET Frameworks book, and offers .NET seminars.

From the December 2000 issue of MSDN Magazine