Object Serialization in the .NET Framework
Piet Obermeyer and Jonathan Hawkins
Microsoft Corporation
August 2001
Updated March 2002
Summary: Why would you want to use serialization? The two most important reasons are to persist the state of an object to a storage medium so an exact copy can be recreated at a later stage, and to send the object by value from one application domain to another. For example, serialization is used to save session state in ASP.NET and to copy objects to the clipboard in Windows Forms. It is also used by remoting to pass objects by value from one application domain to another. This article provides an overview of the serialization used in the Microsoft .NET Framework. (9 printed pages)
Introduction
Persistent Storage
Marshal By Value
Basic Serialization
Selective Serialization
Custom Serialization
Steps in the Serialization Process
Versioning
Serialization Guidelines
Serialization can be defined as the process of storing the state of an object instance to a storage medium. During this process, the public and private fields of the object and the name of the class, including the assembly containing the class, is converted to a stream of bytes, which is then written to a data stream. When the object is subsequently deserialized, an exact clone of the original object is created.
When implementing a serialization mechanism in an object-oriented environment, you have to make a number of tradeoffs between ease of use and flexibility. The process can be automated to a large extent, provided you are given sufficient control over the process. For example, situations may arise where simple binary serialization is not sufficient, or there might be a specific reason to decide which fields in a class need to be serialized. The following sections examine the robust serialization mechanism provided with the .NET Framework and highlight a number of important features that allow you to customize the process to meet your needs.
It is often necessary to store the value of fields of an object to disk and then retrieve this data at a later stage. Although this is easy to achieve without relying on serialization, this approach is often cumbersome and error prone, and becomes progressively more complex when you need to track a hierarchy of objects. Imagine writing a large business application containing many thousands of objects and having to write code to save and restore the fields and properties to and from disk for each object. Serialization provides a convenient mechanism for achieving this objective with minimal effort.
The Common Language Runtime (CLR) manages how objects are laid out in memory and the .NET Framework provides an automated serialization mechanism by using reflection. When an object is serialized, the name of the class, the assembly, and all the data members of the class instance are written to storage. Objects often store references to other instances in member variables. When the class is serialized, the serialization engine keeps track of all referenced objects already serialized to ensure that the same object is not serialized more than once. The serialization architecture provided with the .NET Framework correctly handles object graphs and circular references automatically. The only requirement placed on object graphs is that all objects referenced by the object that is being serialized must also be marked as Serializable (see Basic Serialization). If this is not done, an exception will be thrown when the serializer attempts to serialize the unmarked object.
When the serialized class is deserialized, the class is recreated and the values of all the data members are automatically restored.
Objects are only valid in the application domain where they are created. Any attempt to pass the object as a parameter or return it as a result will fail unless the object derives from MarshalByRefObject or is marked as Serializable. If the object is marked as Serializable, the object will automatically be serialized, transported from the one application domain to the other, and then deserialized to produce an exact copy of the object in the second application domain. This process is typically referred to as marshal by value.
When an object derives from MarshalByRefObject, an object reference will be passed from one application domain to another, rather than the object itself. You can also mark an object that derives from MarshalByRefObject as Serializable. When this object is used with remoting, the formatter responsible for serialization, which has been preconfigured with a SurrogateSelector takes control of the serialization process and replaces all objects derived from MarshalByRefObject with a proxy. Without the SurrogateSelector in place, the serialization architecture follows the standard serialization rules (see Steps in the Serialization Process) below.
The easiest way to make a class serializable is to mark it with the Serializable attribute as follows:
[Serializable]
public class MyObject {
public int n1 = 0;
public int n2 = 0;
public String str = null;
}
The code snippet below shows how an instance of this class can be serialized to a file:
MyObject obj = new MyObject();
obj.n1 = 1;
obj.n2 = 24;
obj.str = "Some String";
IFormatter formatter = new BinaryFormatter();
Stream stream = new FileStream("MyFile.bin",
FileMode.Create,
FileAccess.Write, FileShare.None);
formatter.Serialize(stream, obj);
stream.Close();
This example uses a binary formatter to do the serialization. All you need to do is create an instance of the stream and the formatter you intend to use, and then call the Serialize method on the formatter. The stream and the object instance to serialize are provided as parameters to this call. Although this is not explicitly demonstrated in this example, all member variables of a class will be serialized, even variables marked as private. In this aspect, binary serialization differs from the XML Serializer, which only serializes public fields.
Restoring the object back to its former state is just as easy. First, create a formatter and a stream for reading, and then instruct the formatter to deserialize the object. The code snippet below shows how this is done.
IFormatter formatter = new BinaryFormatter();
Stream stream = new FileStream("MyFile.bin",
FileMode.Open,
FileAccess.Read,
FileShare.Read);
MyObject obj = (MyObject) formatter.Deserialize(fromStream);
stream.Close();
// Here's the proof
Console.WriteLine("n1: {0}", obj.n1);
Console.WriteLine("n2: {0}", obj.n2);
Console.WriteLine("str: {0}", obj.str);
The BinaryFormatter used above is very efficient and produces a very compact byte stream. All objects serialized with this formatter can also be deserialized with it, which makes it an ideal tool for serializing objects that will be deserialized on the .NET platform. It is important to note that constructors are not called when an object is deserialized. However, this violates some of the usual contracts the run time makes with the object writer, and developers should ensure they understand the ramifications when marking an object as serializable.
If portability is a requirement, use the SoapFormatter instead. Simply replace the formatter in the code above with SoapFormatter, and call Serialize and Deserialize as before. This formatter produces the following output for the example used above.
<SOAP-ENV:Envelope
xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:SOAP- ENC=https://schemas.xmlsoap.org/soap/encoding/
xmlns:SOAP- ENV=https://schemas.xmlsoap.org/soap/envelope/
SOAP-ENV:encodingStyle=
"https://schemas.microsoft.com/soap/encoding/clr/1.0
https://schemas.xmlsoap.org/soap/encoding/"
xmlns:a1="https://schemas.microsoft.com/clr/assem/ToFile">
<SOAP-ENV:Body>
<a1:MyObject id="ref-1">
<n1>1</n1>
<n2>24</n2>
<str id="ref-3">Some String</str>
</a1:MyObject>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
It is important to note that the Serializable attribute cannot be inherited. If we derive a new class from MyObject, the new class must be marked with the attribute as well, or it cannot be serialized. For example, when you attempt to serialize an instance of the class below, you will get a SerializationException informing you that the MyStuff type is not marked as serializable.
public class MyStuff : MyObject
{
public int n3;
}
Using the serialization attribute is convenient, but it has limitations as demonstrated above. Refer to the guidelines (see Serialization Guidelines below) regarding when to mark a class for serialization, since serialization cannot be added to a class after it has been compiled.
A class often contains fields that should not be serialized. For example, assume a class stores a thread ID in a member variable. When the class is deserialized, the thread stored for the ID when the class was serialized might not be running anymore, so serializing this value does not make sense. You can prevent member variables from being serialized by marking them with the NonSerialized attribute as follows:
[Serializable]
public class MyObject
{
public int n1;
[NonSerialized] public int n2;
public String str;
}
You can customize the serialization process by implementing the ISerializable interface on an object. This is particularly useful in cases where the value of a member variable is invalid after deserialization, but you need to provide the variable with a value in order to reconstruct the full state of the object. Implementing ISerializable involves implementing the GetObjectData method and a special constructor that will be used when the object is deserialized. The sample code below shows how to implement ISerializable on the MyObject class from a previous section.
[Serializable]
public class MyObject : ISerializable
{
public int n1;
public int n2;
public String str;
public MyObject()
{
}
protected MyObject(SerializationInfo info, StreamingContext context)
{
n1 = info.GetInt32("i");
n2 = info.GetInt32("j");
str = info.GetString("k");
}
public virtual void GetObjectData(SerializationInfo info,
StreamingContext context)
{
info.AddValue("i", n1);
info.AddValue("j", n2);
info.AddValue("k", str);
}
}
When GetObjectData is called during serialization, you are responsible for populating the SerializationInfo object provided with the method call. Simply add the variables to be serialized as name/value pairs. Any text can be used as the name. You have the freedom to decide which member variables are added to the SerializationInfo, provided that sufficient data is serialized to restore the object during deserialization. Derived classes should call the GetObjectData method on the base object if the latter implements ISerializable.
It is important to stress that you need to implement both GetObjectData as well as the special constructor when ISerializable is added to a class. The compiler will warn you if GetObjectData is missing, but since it is impossible to enforce the implementation of a constructor, no warnings will be given if the constructor is absent and an exception will be thrown when an attempt is made to deserialize a class without the constructor. The current design was favored above a SetObjectData method to get around potential security and versioning problems. For example, a SetObjectData method must be public if it is defined as part of an interface, thus users have to write code to defend against having the SetObjectData method called multiple times. One can imagine the headaches that can potentially be caused by a malicious application that calls the SetObjectData method on an object that was in the process of executing some operation.
During deserialization, the SerializationInfo is passed to the class using the constructor provided for this purpose. Any visibility constraints placed on the constructor are ignored when the object is deserialized, so you can mark the class as public, protected, internal, or private. It is a good idea to make the constructor protected unless the class is sealed, in which case the constructor should be marked private. To restore the state of the object, simply retrieve the values of the variables from the SerializationInfo using the names used during serialization. If the base class implements ISerializable, the base constructor should be called to allow the base object to restore its variables.
When you derive a new class from one that implements ISerializable, the derived class must implement both the constructor as well as the GetObjectData method if it has any variables that need to be serialized. The code snippet below shows how this is done using the MyObject class shown previously.
[Serializable]
public class ObjectTwo : MyObject
{
public int num;
public ObjectTwo() : base()
{
}
protected ObjectTwo(SerializationInfo si, StreamingContext context) :
base(si,context)
{
num = si.GetInt32("num");
}
public override void GetObjectData(SerializationInfo si,
StreamingContext context)
{
base.GetObjectData(si,context);
si.AddValue("num", num);
}
}
Do not forget to call the base class in the deserialization constructor; if this is not done, the constructor on the base class will never be called and the object will not be fully constructed after deserialization.
Objects are reconstructed from the inside out, and calling methods during deserialization can have undesirable side effects, since the methods called might refer to object references that have not been deserialized by the time the call is made. If the class being deserialized implements the IDeserializationCallback, the OnSerialization method will automatically be called when the entire object graph has been deserialized. At this point, all the child objects referenced have been fully restored. A hash table is a typical example of a class that is difficult to deserialize without using the event listener described above. It is easy to retrieve the key/value pairs during deserialization, but adding these objects back to the hash table can cause problems since there is no guarantee that classes that derived from the hash table have been deserialized. Calling methods on a hash table at this stage is therefore not advisable.
When the Serialize method is called on a formatter, object serialization proceeds according to the following rules:
- A check is made to determine if the formatter has a surrogate selector. If it does, check if the surrogate selector handles objects of the given type. If the selector handles the object type, ISerializable.GetObjectData is called on the surrogate selector.
- If there is no surrogate selector or if it does not handle the type, a check is made to determine if the object is marked with the Serializable attribute. If it is not, a SerializationException is thrown.
- If it is marked appropriately, check if the object implements ISerializable. If it does, GetObjectData is called on the object.
- If it does not implement ISerializable, the default serialization policy is used, serializing all fields not marked as NonSerialized.
The .NET Framework provides support for versioning and side-by-side execution, and all classes will work across versions if the interfaces of the classes remain the same. Since serializations deals with member variables and not interfaces, be cautious when adding or removing member variables to classes that will be serialized across versions. This is especially true for classes that do not implement ISerializable. Any change of state of the current version, such as the addition of member variables, changing the types of variables, or changing their names, will mean that existing objects of the same type cannot be successfully deserialized if they were serialized with a previous version.
If the state of an object needs to change between versions, class authors have two choices:
- Implement ISerializable. This allows you to take precise control of the serialization and deserialization process, allowing future state to be added and interpreted correctly during deserialization.
- Mark nonessential member variables with the NonSerialized attribute. This option should only be used when you expect minor changes between different versions of a class. For example, when a new variable has been added to a later version of a class, the variable can be marked as NonSerialized to ensure the class remains compatible with previous versions.
You should consider serialization when designing new classes since a class cannot be made serializable after it has been compiled. Some questions to ask are: Do I have to send this class across application domains? Will this class ever be used with remoting? What will my users do with this class? Maybe they derive a new class from mine that needs to be serialized. When in doubt, mark the class as serializable. It is probably better to mark all classes as serializable unless:
- They will never cross an application domain. If serialization is not required and the class needs to cross an application domain, derive the class from MarshalByRefObject.
- The class stores special pointers that are only applicable to the current instance of the class. If a class contains unmanaged memory or file handles, for example, ensure these fields are marked as NonSerialized or don't serialize the class at all.
- Some of the data members contain sensitive information. In this case, it will probably be advisable to implement ISerializable and serialize only the required fields.