New information has been added to this article since publication.
Refer to the Editor's Update below.

Advanced Serialization

Format Your Way to Success with the .NET Framework Versions 1.1 and 2.0

Juval Lowy

This article discusses:

  • Serialization events
  • Secure serialization
  • Serialization and generics
  • The binary formatter and versioning
This article uses the following technologies:
.NET and C#

Code download available at: AdvancedSerialization.exe (245 KB)

Contents

Delegates and Serialization
Deserialization Events in the .NET Framework 1.1
Binary Formatter Serialization Events
Applying the Event Attributes
Serialization Events and Class Hierarchies
Using the Deserializing Event
Using the Deserialized Event
Generic Formatters
Serialization and Versioning
Binary Formatter Version Tolerance
Custom Serialization and Versioning
Serialization and Cloning
Constraining Serialization
Manual Base Class Serialization
Securing Serialization
Conclusion

Serialization is the act of persisting the state of an object, typically to a file or a memory stream. Serialization is crucial when it comes to composing an application out of independent components. If an application contains several components from various vendors, how will it persist its objects into a single file without having the individual components overwrite and destroy the information persisted by other components? By default, objects in .NET are not serializable. But the Microsoft® .NET Framework provides a standard, straightforward way to serialize and deserialize objects. In most cases, all you have to do is add the Serializable attribute and thus grant your consent for serializing the object, as shown here:

    [Serializable]
    public class MyClass
    {
        public string SomeString;
    }

The .NET Framework uses reflection during runtime serialization to read the state of the object and write it to the stream. Reflection is also used during deserialization to set the state of the object using values read from the stream. If you want to preclude serialization of individual members, you can use the NonSerialized attribute, like this:

    public class MyOtherClass {...}

    [Serializable]
    public class MyClass
    {
        [NonSerialized]
        MyOtherClass m_Obj;
    }

In fact, you must use that attribute when the field's type is not serializable, otherwise you'll get errors when attempting to serialize or deserialize the class or structure that contains it. You can even provide custom serialization by also implementing the ISerializable interface.
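To make the effect of the two attributes concrete, here is a minimal round-trip sketch. It assumes the .NET Framework's BinaryFormatter; the Settings and Logger types and the RoundTripDemo helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical type that is deliberately not marked [Serializable]
public class Logger { }

// Hypothetical serializable type with one persisted and one skipped field
[Serializable]
public class Settings
{
    public string Name;

    [NonSerialized]
    public Logger Log = new Logger();   // skipped by the formatter
}

public static class RoundTripDemo
{
    // Serializes the object into memory and reads it straight back
    public static Settings RoundTrip(Settings source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Settings)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Settings original = new Settings();
        original.Name = "demo";

        Settings copy = RoundTrip(original);
        Console.WriteLine(copy.Name);          // prints "demo"
        Console.WriteLine(copy.Log == null);   // prints "True"
    }
}
```

The Log field comes back as null even though it has a field initializer, because deserialization does not run constructors or field initializers.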

While .NET offers adequate support for the simple cases, issues arise when you're attempting to support delegates and subscribers, versioning, and class hierarchies. The .NET Framework 2.0 adds support for serialization events and some degree of versioning tolerance. In addition, the introduction of generics opens new dimensions, both in the complexity of the serialization task and the power now available for serialization tools. Because security must be at the forefront of your development effort, secure serialization is essential. I'll discuss these issues and related pitfalls, suggest techniques and workarounds that address them, and recommend when and how to best use the new serialization abilities.

Delegates and Serialization

All delegates are compiled into serializable classes. This means that when serializing an object that has a delegate member variable, the delegate's internal invocation list is serialized too. This makes serializing delegates very difficult because there are no guarantees that the target objects in the internal list are serializable. Consequently, sometimes the serialization will work and sometimes it will throw a serialization exception. In addition, the object containing the delegate typically does not know or care about the actual state of the delegate. This is even more the case when the delegate is used to manage event subscriptions. The exact number and identity of the subscribers are often transient values that should not persist between application sessions.

As a result, you should mark delegate member variables as nonserializable using the NonSerialized attribute:

    [Serializable]
    public class MyClass
    {
        [NonSerialized]
        EventHandler m_MyEvent;
    }

In the case of events, you must also add the field attribute qualifier when applying the NonSerialized attribute so that the attribute is applied to the underlying delegate rather than to the event itself:

    [Serializable]
    public class MyPublisher
    {
        [field:NonSerialized]
        public event EventHandler MyEvent;
    }
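The following sketch shows the attribute at work, assuming the .NET Framework's BinaryFormatter. Publisher, Subscriber, and EventDemo are hypothetical names used only here. The subscriber is intentionally not serializable, so without the field-qualified attribute the Serialize call would throw a SerializationException.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical subscriber, deliberately not serializable
public class Subscriber
{
    public void OnEvent(object sender, EventArgs e) { }
}

[Serializable]
public class Publisher
{
    public int Id;

    // The attribute targets the compiler-generated delegate field
    [field: NonSerialized]
    public event EventHandler MyEvent;

    public bool HasSubscribers
    {
        get { return MyEvent != null; }
    }
}

public static class EventDemo
{
    public static Publisher RoundTrip(Publisher source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Publisher)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Publisher publisher = new Publisher();
        publisher.Id = 7;
        publisher.MyEvent += new Subscriber().OnEvent;

        Publisher copy = RoundTrip(publisher);
        Console.WriteLine(copy.Id);              // prints "7"
        Console.WriteLine(copy.HasSubscribers);  // prints "False"
    }
}
```

The deserialized publisher starts with an empty subscriber list, which matches the idea that subscriptions are transient and must be re-established by the application.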

Deserialization Events in the .NET Framework 1.1

When .NET deserializes an object, it initializes the nonserializable member variables to the default value for that type. It is then up to you to provide code to initialize the variables to their correct value. To that end, the object needs to know when it's being deserialized. To let the object know, you implement the interface IDeserializationCallback:

    public interface IDeserializationCallback
    {
        void OnDeserialization(object sender);
    }

OnDeserialization is called after an object is deserialized, allowing it to perform the required custom initialization steps. The sender parameter is ignored and is always set to null. Figure 1 shows the implementation of IDeserializationCallback. Here, MyClass has a database connection as a member variable, declared as IDbConnection and instantiated as SqlConnection. The connection isn't serializable and is therefore marked using the NonSerialized attribute. In its implementation of OnDeserialization, MyClass creates a new connection object, initializes the connection object by providing it with a connection string, and then proceeds to open it.

Figure 1 Using IDeserializationCallback

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Diagnostics;
    using System.Runtime.Serialization;

    [Serializable]
    public class MyClass : IDeserializationCallback
    {
        [NonSerialized]
        IDbConnection m_Connection;
        string m_ConnectionString;

        public void OnDeserialization(object sender)
        {
            //The nonserialized member is still at its default value
            Debug.Assert(m_Connection == null);
            m_Connection = new SqlConnection();
            m_Connection.ConnectionString = m_ConnectionString;
            m_Connection.Open();
        }
    }

Binary Formatter Serialization Events

The .NET Framework 2.0 introduces support for serialization events. Designated methods on your class will be called when serialization and deserialization take place. The .NET Framework 2.0 defines four serialization and deserialization events. The Serializing event is raised just before serialization takes place, and the Serialized event is raised just after serialization. Similarly, the Deserializing event is raised just before deserialization, and the Deserialized event is raised after deserialization. Both classes and structures can take advantage of serialization events. You designate methods as serialization event handlers using method attributes, as shown in Figure 2.

Figure 2 Applying Serialization Event Attributes

    [Serializable]
    public class MyClass
    {
        [OnSerializing]
        void OnSerializing(StreamingContext context) {...}

        [OnSerialized]
        void OnSerialized(StreamingContext context) {...}

        [OnDeserializing]
        void OnDeserializing(StreamingContext context) {...}

        [OnDeserialized]
        void OnDeserialized(StreamingContext context) {...}
    }

Each serialization event-handling method must have the signature shown in the following line of code:

void <Method Name>(StreamingContext context);

This is required because, internally, delegates with a matching signature are used to invoke the event-handling methods.

Figure 3 Events During Serialization


As the attribute names imply, the OnSerializing attribute designates a method to handle the Serializing event, and the OnSerialized attribute designates a method that handles the Serialized event. Similarly, the OnDeserializing attribute designates a method to be used for handling the Deserializing event, and the OnDeserialized attribute designates a method to be used to handle the Deserialized event. Figure 3 is an activity diagram depicting the order in which events are raised during serialization when using the binary formatter.

The serialization mechanism verifies which type of formatter is used. If it is the SOAP formatter (or any custom formatter), then .NET simply performs serialization. However, if the binary formatter is used, .NET first raises the Serializing event, thus invoking the corresponding event handlers (there can be more than one, as you will see shortly). Next, .NET serializes the object, and finally the Serialized event is raised and its event handlers are invoked. The process of deserialization is executed in a similar fashion. Figure 4 is a UML activity diagram depicting the order in which binary deserialization events are raised.

Figure 4 Events During Deserialization


Unlike the serialization events, with deserialization .NET has to accommodate the use of IDeserializationCallback. If the binary formatter is not used, the formatter performs deserialization and then calls the OnDeserialization method of IDeserializationCallback if the class implements it. When the binary formatter is used, the formatter first raises the Deserializing event, followed by the deserialization itself. If the class implements IDeserializationCallback, then the formatter calls the OnDeserialization method and finally raises the Deserialized event. Note that in order to call the Deserializing event-handling methods, the formatter first has to construct an object, yet it does so without ever calling any of the class's constructors.
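A small sketch can make the sequence of the four attribute-based events observable. It assumes the .NET Framework's BinaryFormatter; EventLog, Tracked, and EventOrderDemo are hypothetical names, and the sketch deliberately leaves IDeserializationCallback out so that it only records the four events.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical static log; static fields are never part of the serialized state
public static class EventLog
{
    public static readonly List<string> Entries = new List<string>();
}

[Serializable]
public class Tracked
{
    public int Value;

    [OnSerializing]
    void OnSerializing(StreamingContext context) { EventLog.Entries.Add("Serializing"); }

    [OnSerialized]
    void OnSerialized(StreamingContext context) { EventLog.Entries.Add("Serialized"); }

    [OnDeserializing]
    void OnDeserializing(StreamingContext context) { EventLog.Entries.Add("Deserializing"); }

    [OnDeserialized]
    void OnDeserialized(StreamingContext context) { EventLog.Entries.Add("Deserialized"); }
}

public static class EventOrderDemo
{
    // Round-trips one object and returns the recorded event names in order
    public static string Run()
    {
        EventLog.Entries.Clear();
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, new Tracked());
            stream.Position = 0;
            formatter.Deserialize(stream);
        }
        return string.Join(",", EventLog.Entries.ToArray());
    }

    public static void Main()
    {
        // prints "Serializing,Serialized,Deserializing,Deserialized"
        Console.WriteLine(Run());
    }
}
```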

Applying the Event Attributes

The .NET Framework 2.0 allows you to apply the same serialization event attributes on multiple methods of the class:

    [OnSerializing]
    void OnSerializing1(StreamingContext context) {...}

    [OnSerializing]
    void OnSerializing2(StreamingContext context) {...}

This can come in handy when you want to add additional processing to a serialization event without affecting existing event-handling methods. While you can also apply multiple attributes on the same event-handling method, as shown in the following, the usefulness of doing so is questionable:

    [OnSerializing]
    [OnSerialized]
    void OnSerialization(StreamingContext context) {...}

The method will be called once per attribute, and there is no easy way to detect which event is raised inside the method.

Serialization Events and Class Hierarchies

A significant advantage to using attributes for events as opposed to using interfaces is that the event mechanism is decoupled from the class hierarchy. When using attributes, the event-handling methods are called for each level in a class hierarchy. There is no need to call the base class's event-handling methods and there is no problem if those base methods are private. The events are raised according to the order of the class hierarchy, and the event attributes are not inherited. For example, when serializing an object of the type MySubClass, as shown in the following code, OnSerializing1 is called first, followed by a call to OnSerializing2:

    [Serializable]
    public class MyBaseClass
    {
        [OnSerializing]
        void OnSerializing1(StreamingContext context) {...}
    }

    [Serializable]
    public class MySubClass : MyBaseClass
    {
        [OnSerializing]
        void OnSerializing2(StreamingContext context) {...}
    }

The situation could get messy when virtual methods are involved and the subclass overrides its base class's handling of a serialization event or even calls it. As a result, the serialization infrastructure will throw a SerializationException if any of the event attributes are applied on a virtual method or on an overriding method. Similarly, the serialization infrastructure throws an exception when an event attribute is applied on a class method that implements an interface method. Use of the new inheritance qualifier is still allowed in conjunction with serialization events. Since you should not encounter any external party besides the .NET Framework calling a serialization event-handling method, I recommend always designing such methods as private.
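As a sketch of the per-level invocation described in this section, the following listing records which handlers run when a subclass instance is serialized. It assumes the .NET Framework's BinaryFormatter; BaseItem, DerivedItem, and HierarchyDemo are hypothetical names. Both handlers are private and non-virtual, and neither calls the other.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class BaseItem
{
    // Scratch list for the demo; excluded from the serialized state
    [NonSerialized]
    public List<string> Calls = new List<string>();

    [OnSerializing]
    void BaseHandler(StreamingContext context) { Calls.Add("Base"); }
}

[Serializable]
public class DerivedItem : BaseItem
{
    [OnSerializing]
    void DerivedHandler(StreamingContext context) { Calls.Add("Derived"); }
}

public static class HierarchyDemo
{
    // Serializes a DerivedItem and returns the handlers that ran, in order
    public static string Run()
    {
        DerivedItem item = new DerivedItem();
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, item);
        }
        return string.Join(",", item.Calls.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints "Base,Derived"
    }
}
```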

Using the Deserializing Event

Since no constructor calls are ever made during deserialization, the Deserializing event-handling method is logically your deserialization constructor. It is designed for performing custom pre-deserialization steps, which typically involve the initialization of nonserializable members. Any value settings performed on the serializable members will be a waste of time because the binary formatter will set those members again during deserialization using values from the serialization stream.

The main difference between the Deserializing event and IDeserializationCallback is that OnDeserialization is called after deserialization is complete, while the Deserializing event is raised before deserialization starts. In the Deserializing event-handling method you should place the initialization steps that are independent of the values saved in the serialization stream. In contrast, in OnDeserialization you can take advantage of already deserialized members (such as the database connection string in Figure 1). Other tasks you can perform in the Deserializing event-handling method include setting specific environment variables such as thread local storage and signaling global synchronization events.

Using the Deserialized Event

Taking advantage of the Deserialized event makes the use of IDeserializationCallback redundant, as the two are logically equivalent. Both let your class respond to post-deserialization events and initialize nonserializable members while using already deserialized values, as the following code demonstrates:

    [Serializable]
    public class MyClass
    {
        [NonSerialized]
        IDbConnection m_Connection;
        string m_ConnectionString;

        [OnDeserialized]
        void OnDeserialized(StreamingContext context)
        {
            m_Connection = new SqlConnection();
            m_Connection.ConnectionString = m_ConnectionString;
            m_Connection.Open();
        }
    }

It performs exactly the same task as in Figure 1, but it relies on the Deserialized event rather than on IDeserializationCallback.

Note that IDeserializationCallback is still useful when you cannot guarantee the use of a BinaryFormatter or the presence of the .NET Framework 2.0 runtime. Let me emphasize that relying on the serialization events mandates that the client always use the BinaryFormatter on the class; the SoapFormatter has no equivalent support for the serialization events.

Generic Formatters

The IFormatter interface is implemented by both the binary and SOAP formatters, as shown here:

    public interface IFormatter
    {
        object Deserialize(Stream serializationStream);
        void Serialize(Stream serializationStream, object graph);
        /* Other methods */
    }

IFormatter's significant methods are Serialize and Deserialize, which perform the actual serialization and deserialization. The problem with IFormatter is that it was defined before generics were available in .NET. With the introduction of generics in the .NET Framework 2.0, you can improve on the available implementations of IFormatter by providing generic and type-safe wrappers around them. You can define the IGenericFormatter interface, which provides similar methods to IFormatter but does so using generic methods, as shown in the following code:

    public interface IGenericFormatter
    {
        T Deserialize<T>(Stream serializationStream);
        void Serialize<T>(Stream serializationStream,T graph);
    }

The use of generic methods is preferable to making the whole interface generic. You could use the same formatter instance, but change the type parameter being serialized or deserialized in every call. Next, implement IGenericFormatter by encapsulating a nongeneric formatter, and delegate the calls to it.

Figure 5 shows the generic class GenericFormatter<F> that implements IGenericFormatter. GenericFormatter<F> is defined using the generic type parameter F, which is constrained to implement IFormatter and provide a default constructor. This enables GenericFormatter<F> to declare a member of the type IFormatter and assign to it a new F object:

IFormatter m_Formatter = new F();

Figure 5 also defines two subclasses of GenericFormatter<F>: GenericBinaryFormatter and GenericSoapFormatter. All they do is provide the binary or the SOAP formatter, respectively, as type parameters to GenericFormatter<F>. You could have defined GenericBinaryFormatter and GenericSoapFormatter with using alias directives, except that such aliases have file scope only:

    using GenericBinaryFormatter = GenericFormatter<BinaryFormatter>;
    using GenericSoapFormatter = GenericFormatter<SoapFormatter>;

In this case, inheritance is good for strong typing and shorthand across files and assemblies. Figure 6 shows the use of the generic and type-safe formatters.

Figure 6 Using the Generic and Type-Safe Formatters

    [Serializable]
    public class MyClass {...}

    MyClass obj1 = new MyClass();
    IGenericFormatter formatter = new GenericBinaryFormatter();
    Stream stream = new FileStream(
        @"C:\obj.bin",FileMode.Create,FileAccess.ReadWrite);
    formatter.Serialize(stream,obj1);
    stream.Seek(0,SeekOrigin.Begin);

    MyClass obj2 = formatter.Deserialize<MyClass>(stream);
    stream.Close();

Figure 5 GenericFormatter<F>

    public class GenericFormatter<F> : IGenericFormatter
        where F : IFormatter,new()
    {
        IFormatter m_Formatter = new F();

        public T Deserialize<T>(Stream serializationStream)
        {
            return (T)m_Formatter.Deserialize(serializationStream);
        }

        public void Serialize<T>(Stream serializationStream,T graph)
        {
            m_Formatter.Serialize(serializationStream,graph);
        }
    }

    public class GenericBinaryFormatter : GenericFormatter<BinaryFormatter>
    {}

    public class GenericSoapFormatter : GenericFormatter<SoapFormatter>
    {}

Serialization and Versioning

If an application wants to serialize the state of multiple objects of multiple types to the same stream, a simple dump of object state will not do. The formatter must also capture the object's type information. During deserialization, the formatter needs to read the type's metadata and initialize a new object according to the information serialized, populating the corresponding fields. The easiest way to capture the type information is to record the type's name and assembly. For each object that is serialized, the formatter persists the state of the object (the values held in its various fields) and the version information and full name of its assembly, including a token of the assembly's public key (if a strong name is used). When the formatter deserializes the object, it loads its assembly and reflects the type's metadata.

By default, the formatters comply with the version-binding and assembly-resolving policies of the common language runtime (CLR). If the serialized type's assembly does not have a strong name, the formatters try to load a private assembly and completely ignore any version incompatibility between the version captured during serialization and the version of the assembly found. If the serialized type's assembly has a strong name, the CLR loader insists on using a compatible assembly. If such an assembly is not found, an exception is thrown.

Both the binary and SOAP formatters provide a way to record only the friendly name of the assembly without any version or public key token, even if the assembly has a strong name. The formatters provide a public property called AssemblyFormat of the enum type FormatterAssemblyStyle:

    public enum FormatterAssemblyStyle
    {
        Full,
        Simple
    }

The default value of AssemblyFormat is FormatterAssemblyStyle.Full. If you set it to FormatterAssemblyStyle.Simple, no version compatibility checks will take place during deserialization:

    SoapFormatter formatter = new SoapFormatter();
    formatter.AssemblyFormat = FormatterAssemblyStyle.Simple;

However, I strongly discourage you from circumventing serialization version and type verification. At best, a potential incompatibility will result in an exception of type SerializationException. At worst, your application may later crash unexpectedly because the incompatible type required some custom initialization steps.

Binary Formatter Version Tolerance

In the .NET Framework 1.x, there had to be absolute compatibility between the metadata used to serialize a type and the metadata used to deserialize it. This meant that if your application had clients with serialized state of your types, your type members' metadata needed to be immutable or you would break those clients.

In the .NET Framework 2.0, the binary formatter has acquired some version-tolerance capabilities with respect to changes in the type metadata, rather than changes to the assembly version itself. Imagine a class library vendor that provides a serializable component. The various client applications are responsible for managing the serialization media (typically a file). Suppose the vendor changes the component definition by adding a private member variable. Such a change does not necessitate a version change because new client applications can serialize the new component properly. However, the serialization information captured by the old applications is now incompatible and will result in a SerializationException if used in the .NET Framework 1.x. The vendor can, of course, increment the assembly version number, but doing so will prevent the old clients from taking advantage of new functionality. The binary formatter in the .NET Framework 2.0 was redesigned to handle such predicaments.

In the case of removing an unused member variable, the binary formatter will simply ignore the additional information found in the stream. For example, suppose you used the following class definition and serialized it using a binary formatter:

    // Version 1.0
    [Serializable]
    public class MyClass
    {
        public int Number1;
        public int Number2;
    }

Without changing the assembly version, you can now remove one of the member variables, rebuild, redeploy, and deserialize instances of version 2.0 of MyClass with the serialization information captured using version 1.0 of MyClass:

    // Version 2.0
    [Serializable]
    public class MyClass
    {
        public int Number1;
    }

The real challenge in type-versioning tolerance is dealing with new members because the old serialization information does not contain any information about them. By default, the binary formatter is not tolerant of the new members, and will throw an exception. The .NET Framework 2.0 addresses this problem by providing the OptionalField field attribute—a simple attribute with a single public member of type int called VersionAdded:

    [AttributeUsage(AttributeTargets.Field)]
    public sealed class OptionalFieldAttribute : Attribute
    {
        //Exposed as a public property in the shipped framework
        public int VersionAdded { get {...} set {...} }
    }

Applying the OptionalField attribute has no effect during serialization, and fields marked with it will be serialized into the stream. OptionalField is meant to be applied on new fields of your type. During deserialization it causes the binary formatter to tolerate the absence of that member in the stream instead of throwing an exception, leaving the field at its default value:

    //Version 1.0
    [Serializable]
    public class MyClass
    {
        public int Number1;
    }

    //Version 2.0
    [Serializable]
    public class MyClass
    {
        public int Number1;
        [OptionalField]
        public int Number2;
    }

That said, if the new member variable has a good enough default value, such as the application's default directory or user preferences, you can use values provided by the new clients and synthesize a value for the old clients. You will need to provide these values in your handling of the Deserializing event, because that handler runs before the formatter populates the fields. When the stream does contain a serialized value, the deserialization process overrides the value you set. When it does not, your synthesized value remains.

Consider this class version:

    //Version 1.0
    [Serializable]
    public class MyClass
    {
        public int Number1;
    }

Suppose you want to add a new class member called Number2, while using the old serialization information. You can provide a handler for the Deserializing event and in it initialize Number2:

    [Serializable]
    public class MyClass
    {
        public int Number1;
        [OptionalField]
        public int Number2;

        [OnDeserializing]
        void OnDeserializing(StreamingContext context)
        {
            Number2 = 123;
        }
    }
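Trying this out normally requires two builds of the same assembly. As a single-program approximation, the sketch below writes a stream with one type and reads it back as another by redirecting the type through a SerializationBinder. That redirection is a stand-in I am using for illustration, not the deployment scenario described above. RecordV1, RecordV2, V1ToV2Binder, and OptionalFieldDemo are hypothetical names, and the code assumes the .NET Framework's BinaryFormatter.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Stands in for "version 1.0" of the class
[Serializable]
public class RecordV1
{
    public int Number1;
}

// Stands in for "version 2.0": same field name plus a new optional field
[Serializable]
public class RecordV2
{
    public int Number1;

    [OptionalField]
    public int Number2;

    [OnDeserializing]
    void OnDeserializing(StreamingContext context)
    {
        Number2 = 123;   // synthesized value for streams that lack Number2
    }
}

// Redirects streams written for RecordV1 to RecordV2 (demo-only assumption)
public sealed class V1ToV2Binder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        if (typeName == typeof(RecordV1).FullName)
        {
            return typeof(RecordV2);
        }
        return Type.GetType(typeName + ", " + assemblyName);
    }
}

public static class OptionalFieldDemo
{
    // Writes an "old" record and reads it back as the "new" type
    public static RecordV2 ReadOldStream(int number1)
    {
        RecordV1 old = new RecordV1();
        old.Number1 = number1;

        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, old);
            stream.Position = 0;
            formatter.Binder = new V1ToV2Binder();
            return (RecordV2)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        RecordV2 upgraded = ReadOldStream(42);
        Console.WriteLine(upgraded.Number1);
        Console.WriteLine(upgraded.Number2);
    }
}
```

Under these assumptions, Number1 should carry the stored value while Number2 keeps the value assigned in the Deserializing handler, since the stream has nothing to overwrite it with.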

But what if the values you synthesize are somehow dependent on the version of the class in which they are added? You can store version information in the OptionalField attribute using its VersionAdded member:

[OptionalField(VersionAdded = 1)] public int Number2;

In the Deserializing event handler you will need to use reflection to read the value of VersionAdded from the attribute and act accordingly, as shown in Figure 7. The example uses the helper method OptionalFieldVersion of the SerializationUtil static helper class. The OptionalFieldVersion method relies on a generic type parameter for the type used, and it accepts the name of the member variable to reflect, returning the value of VersionAdded.

Figure 7 Relying on VersionAdded

    using System;
    using System.Diagnostics;
    using System.Reflection;
    using System.Runtime.Serialization;

    [Serializable]
    public class MyClass
    {
        public int Number1;
        [OptionalField(VersionAdded = 1)]
        public int Number2;

        [OnDeserializing]
        void OnDeserializing(StreamingContext context)
        {
            int versionAdded =
                SerializationUtil.OptionalFieldVersion<MyClass>("Number2");
            if(versionAdded == 1)
                Number2 = 123;
            if(versionAdded == 2)
                Number2 = 456;
        }
    }

    public static class SerializationUtil
    {
        public static int OptionalFieldVersion<T>(string member)
        {
            Type type = typeof(T);
            MemberInfo[] members = type.GetMember(member,
                BindingFlags.Instance | BindingFlags.NonPublic |
                BindingFlags.Public | BindingFlags.DeclaredOnly);
            Debug.Assert(members.Length == 1);

            object[] attributes = members[0].GetCustomAttributes(
                typeof(OptionalFieldAttribute),false);
            Debug.Assert(attributes.Length == 1); //Exactly one is expected

            OptionalFieldAttribute attribute;
            attribute = attributes[0] as OptionalFieldAttribute;
            return attribute.VersionAdded;
        }
    }

Custom Serialization and Versioning

Type changes often do involve changes to the version number of the containing assembly. In such cases, you cannot take advantage of the binary formatter type-version tolerance because it is first subject to assembly versioning. In such cases, you need to use custom serialization to deal with version issues between the serialized information and the current class definition. As with binary formatter version tolerance, the component needs to be able to determine the version of its assembly with which the serialized information was saved and then act accordingly.

The SerializationInfo class used in the GetObjectData method of ISerializable provides the AssemblyName property, and you can extract from it the version of the assembly used to serialize the information. For example, suppose version 1.0.0.0 of the serializable class MyClass is distributed to clients:

    //Version 1.0.0.0
    [Serializable]
    public class MyClass
    {
        public int Number;
    }

The clients persisted this version in the serialized state information. With version 2.0.0.0 of the assembly, the vendor adds the new field NewField to the class definition. Using version checks, the component can decide during deserialization whether to use the serialized value or assign a default value. Figure 8 illustrates this technique. It uses the helper class, SerializationUtil, to extract the version information from the assembly's full name. Note that the version extracted is the version of the assembly in which the original serialized type resided.

[Editor's Update - 3/14/2005: The SerializationInfo.AssemblyName and SerializationInfo.FullTypeName properties have no useful meaning during deserialization. The BinaryFormatter sets these values to match the assembly name and type name of the type being deserialized into. As such, it is a mistake to depend on them for versioning during deserialization.]

Figure 8 Versioning Using Custom Serialization

    //Version 2.0.0.0
    [Serializable]
    public class MyClass : ISerializable
    {
        public int Number;
        public int NewField;

        public void GetObjectData(
            SerializationInfo info,StreamingContext context)
        {
            info.AddValue("Number",Number);
            info.AddValue("NewField",NewField);
        }

        protected MyClass(SerializationInfo info,StreamingContext context)
        {
            Number = info.GetInt32("Number");
            Version storedVersion = SerializationUtil.GetVersion(info);
            if(storedVersion.ToString() == "2.0.0.0")
            {
                NewField = info.GetInt32("NewField");
            }
            else
            {
                NewField = 123;//Some default value
            }
        }

        public MyClass()
        {}
    }

    public static class SerializationUtil
    {
        static public Version GetVersion(SerializationInfo info)
        {
            string assemblyName = info.AssemblyName;
            /* AssemblyName is in the form of "MyAssembly, Version=1.2.3.4,
               Culture=neutral,PublicKeyToken=null" */
            char[] separators = {',','='};
            string[] nameParts = assemblyName.Split(separators);
            return new Version(nameParts[2]);
        }
        //Rest of SerializationUtil
    }
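Given the Editor's Update above, one alternative is to have the class write its own version marker into the stream and read it back during deserialization. The sketch below shows that substitute technique, not the approach of Figure 8. VersionedRecord, the "Version" entry name, and VersionDemo are hypothetical choices made for this illustration, and the code assumes the .NET Framework's BinaryFormatter.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class VersionedRecord : ISerializable
{
    const int CurrentVersion = 2;   // hypothetical data-format version

    public int Number;
    public int NewField;

    public VersionedRecord() { }

    protected VersionedRecord(SerializationInfo info, StreamingContext context)
    {
        Number = info.GetInt32("Number");

        // Streams written before the marker existed count as version 1
        int storedVersion = 1;
        foreach (SerializationEntry entry in info)
        {
            if (entry.Name == "Version")
            {
                storedVersion = info.GetInt32("Version");
            }
        }

        if (storedVersion >= 2)
        {
            NewField = info.GetInt32("NewField");
        }
        else
        {
            NewField = 123;   // some default value
        }
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Version", CurrentVersion);
        info.AddValue("Number", Number);
        info.AddValue("NewField", NewField);
    }
}

public static class VersionDemo
{
    public static VersionedRecord RoundTrip(VersionedRecord source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (VersionedRecord)formatter.Deserialize(stream);
        }
    }
}
```

Enumerating the entries avoids relying on an exception to detect a missing value. Because the marker travels with the data itself, it does not depend on what the formatter reports in SerializationInfo.AssemblyName.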

Serialization and Cloning

In addition to a FileStream, you can use any other type of Stream, such as a NetworkStream or MemoryStream. You can actually use a MemoryStream to create a deep clone of a serializable object. Figure 9 shows the static Clone method of the SerializationUtil static helper class.

Figure 9 Cloning a Serializable Object

    public static class SerializationUtil
    {
        static public T Clone<T>(T source)
        {
            Debug.Assert(typeof(T).IsSerializable);

            IGenericFormatter formatter = new GenericBinaryFormatter();
            Stream stream = new MemoryStream();
            formatter.Serialize(stream,source);
            stream.Seek(0,SeekOrigin.Begin);

            T clone = formatter.Deserialize<T>(stream);
            stream.Close();
            return clone;
        }
        //Rest of SerializationUtil
    }

The Clone method first verifies that the type of the object passed in is serializable. This is easily accomplished by inspecting the type parameter T. The Type class provides a Boolean property, IsSerializable, which returns true if the type has been marked with the Serializable attribute. The Clone method then uses the GenericBinaryFormatter helper class to serialize the object into a memory stream and deserialize it back, and returns the deserialized object. Using Clone is quite straightforward, as shown in the following code:

    [Serializable]
    public class MyClass {...}

    MyClass obj1 = new MyClass();
    MyClass obj2 = SerializationUtil.Clone(obj1);

You can use SerializationUtil.Clone as an easy way to implement the interface ICloneable:

    [Serializable]
    public class MyClass : ICloneable
    {
        public object Clone()
        {
            return SerializationUtil.Clone(this);
        }
    }
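To show what "deep" means here, the following self-contained sketch clones an object that holds a reference to a list and then changes the copy. It uses BinaryFormatter directly rather than the article's helper classes so that it compiles on its own. Order and CloneDemo are hypothetical names, and the .NET Framework is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Order
{
    public List<string> Items = new List<string>();
}

public static class CloneDemo
{
    // Same idea as Figure 9, written against BinaryFormatter directly
    public static T Clone<T>(T source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (T)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Order original = new Order();
        original.Items.Add("a");

        Order copy = Clone(original);
        copy.Items.Add("b");   // affects only the clone's list

        Console.WriteLine(original.Items.Count);                          // prints "1"
        Console.WriteLine(ReferenceEquals(original.Items, copy.Items));   // prints "False"
    }
}
```

A shallow copy made with MemberwiseClone would share the Items list between the two objects. The serialization round trip rebuilds the whole object graph instead.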

Constraining Serialization

A generic class that has generic type parameters as members can still be marked for serialization, like so:

    [Serializable]
    public class MyClass<T>
    {
        T m_T;
    }

In such cases, however, the generic class is only serializable if the generic type parameter specified is serializable. Consider this code:

    public class SomeClass
    {}

    MyClass<SomeClass> obj;

Here, obj is not serializable because the type parameter, SomeClass, is not serializable. Consequently, MyClass<T> may or may not be serializable, depending on the generic type parameter used. This may result in a loss of data or system corruption at run time because the client application may not be able to persist the state of the object.

Presently, .NET 2.0 does not provide a mechanism for constraining a generic type parameter to be serializable. There are three workarounds to guarantee deterministic serialization behavior. The first is to mark all member variables of generic type parameter as nonserializable, like so:

    [Serializable]
    public class MyClass<T>
    {
        [NonSerialized]
        T m_T;
    }

This, of course, may seriously damage the ability of the generic class MyClass<T> to function properly in all cases where you need to serialize the state of members of a generic type.

The second workaround is to place a constraint on the generic type parameter to implement ISerializable:

    [Serializable]
    public class MyClass<T> where T : ISerializable
    {
        T m_T;
    }

This ensures that all instances of MyClass<T>, regardless of the type parameter, are serializable. But it does place a burden on the class developer for implementing custom serialization of all generic type parameters used.

The third and best solution is to perform a single runtime check before any use of the type MyClass<T> and abort the use immediately before any damage can take place. The trick is to place the runtime verification in a C# static constructor. Figure 10 demonstrates this technique.

Figure 10 Runtime Serialization Constraints

    [Serializable]
    class MyClass<T>
    {
        static MyClass()
        {
            SerializationUtil.ConstrainType<T>();
        }
        T m_T;
    }

    public static class SerializationUtil
    {
        public static void ConstrainType<T>()
        {
            Type type = typeof(T);
            bool serializable = type.IsSerializable;
            if(serializable == false)
            {
                string message = "The type " + type.ToString() + " is not serializable";
                Exception exception = new SerializationException(message);
                throw exception;
            }
        }
        //Rest of SerializationUtil
    }

The C# static constructor is invoked exactly once per type, per app domain, before the first instance of that type is created or any of its static members is accessed. For a generic class, each closed constructed type (such as MyClass<int>) counts as a separate type, so the check runs once for every type argument used. In Figure 10, the static constructor calls the static helper method, ConstrainType, of SerializationUtil. Although ConstrainType does not accept any arguments, it does define a single generic type parameter that the caller needs to supply. ConstrainType then simply verifies that the generic type parameter is serializable by checking the IsSerializable property of its type. If the generic type parameter is not serializable, ConstrainType throws a serialization exception, thus aborting any attempt to use the type.
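One practical detail is worth keeping in mind when you handle the failure: an exception that escapes a static constructor is wrapped by the runtime in a TypeInitializationException, so the client sees the SerializationException as the inner exception. The sketch below repeats the check from Figure 10 in condensed form to show this; the ConstraintDemo helper and the NotSerializableType class are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.Serialization;

public class NotSerializableType {}

public static class SerializationUtil
{
    // Same check as in Figure 10, condensed.
    public static void ConstrainType<T>()
    {
        Type type = typeof(T);
        if (!type.IsSerializable)
        {
            throw new SerializationException("The type " + type + " is not serializable");
        }
    }
}

[Serializable]
public class MyClass<T>
{
    static MyClass()
    {
        SerializationUtil.ConstrainType<T>();
    }
    T m_T;
    public T Value { get { return m_T; } set { m_T = value; } }
}

// Hypothetical helper, for illustration only.
public static class ConstraintDemo
{
    // Tries to instantiate MyClass<T>. Returns the exception thrown by the
    // static constructor, or null if the type argument passed the check.
    public static Exception TryCreate<T>()
    {
        try
        {
            new MyClass<T>();
            return null;
        }
        catch (TypeInitializationException exception)
        {
            // The runtime wraps whatever the static constructor threw.
            return exception.InnerException;
        }
    }
}
```

Calling ConstraintDemo.TryCreate<NotSerializableType>() yields the SerializationException, while ConstraintDemo.TryCreate<int>() yields null. Note that once type initialization has failed, that constructed type remains unusable for the lifetime of the app domain, which is exactly the fail-fast behavior this technique is after.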

Performing the constraint verification in the static constructor is a technique applicable to any constraint that you cannot enforce at compile time, yet where you have some programmatic way of determining and enforcing it at run time.
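As a sketch of the same pattern applied to a different constraint, consider requiring that a type parameter be an enum, something the C# 2.0 compiler cannot express in a where clause. The EnumNames<T> class is a hypothetical example and is not part of the article's code download.

```csharp
using System;

// Hypothetical example type, for illustration only.
public class EnumNames<T> where T : struct
{
    static EnumNames()
    {
        // The compiler can only verify that T is a value type;
        // the enum requirement is verified here, once per type argument.
        if (!typeof(T).IsEnum)
        {
            throw new InvalidOperationException("The type " + typeof(T) + " is not an enum");
        }
    }

    public string[] GetNames()
    {
        return Enum.GetNames(typeof(T));
    }
}
```

Here new EnumNames<DayOfWeek>() works as expected, whereas new EnumNames<int>() fails on first use with a TypeInitializationException that wraps the InvalidOperationException.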

Manual Base Class Serialization

Combining class hierarchies and serialization, whether fully automatic or custom, is straightforward: all classes use only the Serializable attribute or they all do that plus implement ISerializable. The picture isn't so clear when it comes to deriving a serializable class from a class that's not marked with the Serializable attribute, as in this case:

    public class MyBaseClass {}

    [Serializable]
    public class MySubClass : MyBaseClass {}

In fact, here a formatter can't serialize objects of type MySubClass at all because it can't serialize their base classes. Trying to serialize an object of type MySubClass results in an exception of type SerializationException. Such a situation may happen when deriving from a class in a third-party assembly where the vendor neglected to mark its class as serializable.
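Because the Serializable attribute is not inherited, every level of the hierarchy has to be checked on its own. If you want to detect the problem up front rather than wait for the formatter to throw, a reflection-only check along the following lines may help. This is a minimal sketch; HierarchyCheck and FindNonSerializableLevel are hypothetical names, not part of SerializationUtil.

```csharp
using System;

public class MyBaseClass {}

[Serializable]
public class MySubClass : MyBaseClass {}

// Hypothetical helper, for illustration only.
public static class HierarchyCheck
{
    // Walks from 'type' up toward System.Object and returns the first level
    // that is not marked as serializable, or null if every level is.
    public static Type FindNonSerializableLevel(Type type)
    {
        while (type != null && type != typeof(object))
        {
            if (!type.IsSerializable)
            {
                return type;
            }
            type = type.BaseType;
        }
        return null;
    }
}
```

For the hierarchy above, HierarchyCheck.FindNonSerializableLevel(typeof(MySubClass)) returns typeof(MyBaseClass), even though typeof(MySubClass).IsSerializable is itself true.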

The good news is that you can provide a solution even for such a case. The solution presented next isn't a sure cure. It assumes that none of the base classes require custom serialization steps. The solution merely compensates for an oversight of not marking the base class as serializable.

The workaround is simple: the subclass can implement ISerializable, use reflection to read and serialize the base classes' fields, and use reflection again to set these fields during deserialization. The static SerializationUtil helper class provides the two static methods, SerializeBaseType and DeserializeBaseType, defined as you can see here:

    public static class SerializationUtil
    {
        public static void SerializeBaseType(object obj,
                                             SerializationInfo info,
                                             StreamingContext context);
        public static void DeserializeBaseType(object obj,
                                               SerializationInfo info,
                                               StreamingContext context);
        //Rest of SerializationUtil
    }

All the subclass needs to do is implement ISerializable and use SerializationUtil to serialize and deserialize its base classes, as shown in the following:

    public class MyBaseClass {}

    [Serializable]
    public class MySubClass : MyBaseClass,ISerializable
    {
        public MySubClass() {}

        public void GetObjectData(SerializationInfo info,
                                  StreamingContext context)
        {
            SerializationUtil.SerializeBaseType(this,info,context);
        }
        protected MySubClass(SerializationInfo info,StreamingContext context)
        {
            SerializationUtil.DeserializeBaseType(this,info,context);
        }
    }

If the subclass has no need for custom serialization, and the only reason it implements ISerializable is to serialize its base class, you can use SerializationUtil to serialize the subclass as well. SerializationUtil provides these overloaded versions of SerializeBaseType and DeserializeBaseType:

    public static void SerializeBaseType(object obj,bool serializeSelf,
                                         SerializationInfo info,
                                         StreamingContext context);
    public static void DeserializeBaseType(object obj,bool deserializeSelf,
                                           SerializationInfo info,
                                           StreamingContext context);

These versions accept a flag instructing them whether to start serialization with the type itself instead of with its base class:

    public void GetObjectData(SerializationInfo info,StreamingContext context)
    {
        //Serializing this type and its base classes
        SerializationUtil.SerializeBaseType(this,true,info,context);
    }
    protected MySubClass(SerializationInfo info,StreamingContext context)
    {
        //Deserializing this type and its base classes
        SerializationUtil.DeserializeBaseType(this,true,info,context);
    }

Figure 11 presents the implementation of SerializeBaseType and DeserializeBaseType. When SerializationUtil serializes an object's base class, it needs to serialize all of the base classes leading to that base class as well. You can access the base class type using the BaseType property of Type.

Figure 11 Manual Base Class Serialization

    using System;
    using System.Reflection;
    using System.Runtime.Serialization;

    public static class SerializationUtil
    {
        public static void SerializeBaseType(object obj,
                                             SerializationInfo info,
                                             StreamingContext context)
        {
            Type baseType = obj.GetType().BaseType;
            SerializeBaseType(obj,baseType,info,context);
        }
        //Serializes one level of the hierarchy, then recurses to the next level up
        static void SerializeBaseType(object obj,Type type,
                                      SerializationInfo info,
                                      StreamingContext context)
        {
            if(type == typeof(object))
            {
                return;
            }
            BindingFlags flags = BindingFlags.Instance|BindingFlags.DeclaredOnly|
                                 BindingFlags.NonPublic|BindingFlags.Public;
            FieldInfo[] fields = type.GetFields(flags);
            foreach(FieldInfo field in fields)
            {
                //Skip fields marked with the NonSerialized attribute
                if(field.IsNotSerialized)
                {
                    continue;
                }
                string fieldName = type.Name + "+" + field.Name;
                info.AddValue(fieldName,field.GetValue(obj));
            }
            SerializeBaseType(obj,type.BaseType,info,context);
        }
        public static void DeserializeBaseType(object obj,
                                               SerializationInfo info,
                                               StreamingContext context)
        {
            Type baseType = obj.GetType().BaseType;
            DeserializeBaseType(obj,baseType,info,context);
        }
        //Deserializes one level of the hierarchy, then recurses to the next level up
        static void DeserializeBaseType(object obj,Type type,
                                        SerializationInfo info,
                                        StreamingContext context)
        {
            if(type == typeof(object))
            {
                return;
            }
            BindingFlags flags = BindingFlags.Instance|BindingFlags.DeclaredOnly|
                                 BindingFlags.NonPublic|BindingFlags.Public;
            FieldInfo[] fields = type.GetFields(flags);
            foreach(FieldInfo field in fields)
            {
                //Skip fields marked with the NonSerialized attribute
                if(field.IsNotSerialized)
                {
                    continue;
                }
                string fieldName = type.Name + "+" + field.Name;
                object fieldValue = info.GetValue(fieldName,field.FieldType);
                field.SetValue(obj,fieldValue);
            }
            DeserializeBaseType(obj,type.BaseType,info,context);
        }
        //Rest of SerializationUtil
    }

With the Type.GetFields method, you can get all of the fields (private and public) declared by a type as well as any public or protected fields available via its own base classes. This isn't good enough because you need to capture all of the private fields declared at every level of the class hierarchy, including fields whose names repeat from one level to another. The solution is to serialize each level of the class hierarchy separately. SerializeBaseType calls a private helper method also called SerializeBaseType, providing it with the level of the class hierarchy to serialize:
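To make the limitation concrete, here is a small sketch with a two-level hierarchy in which both levels declare a private field named m_Id. The Level1, Level2, and FieldListing names are hypothetical and serve only as an illustration.

```csharp
using System;
using System.Reflection;

public class Level1
{
    private int m_Id = 1;
    protected int m_Shared = 10;
    public int Id1 { get { return m_Id; } }
}

public class Level2 : Level1
{
    private int m_Id = 2;
    public int Id2 { get { return m_Id; } }
}

// Hypothetical helper, for illustration only.
public static class FieldListing
{
    // Asks for every instance field visible through 'type',
    // without restricting the search to the declaring level.
    public static FieldInfo[] GetVisibleFields(Type type)
    {
        BindingFlags flags = BindingFlags.Instance |
                             BindingFlags.Public | BindingFlags.NonPublic;
        return type.GetFields(flags);
    }
}
```

FieldListing.GetVisibleFields(typeof(Level2)) returns the m_Id field declared by Level2 and the protected m_Shared field inherited from Level1, but not the private m_Id field of Level1. That missing field is precisely the state a single GetFields call would silently fail to serialize.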

SerializeBaseType(obj,baseType,info,context);

The private SerializeBaseType serializes that level and then calls itself recursively, serializing the next level up the hierarchy:

SerializeBaseType(obj,type.BaseType,info,context);

The recursion stops once it reaches the System.Object level. To serialize a particular level, the private SerializeBaseType builds a binding flags mask that instructs GetFields to return all fields defined by this type only and not its base types (BindingFlags.DeclaredOnly), so that as it visits the next level up, it doesn't serialize fields more than once. The mask also binds only to instance fields and not to static fields because static fields are never serialized anyway. The private SerializeBaseType then calls GetFields with that mask and stores the result in an array of FieldInfo objects:

FieldInfo[] fields = type.GetFields(flags);

The solution needs to deal with a class hierarchy in which some levels actually use the Serializable attribute, such as class A in the following example:

    [Serializable]
    class A {}

    class B : A {}

    [Serializable]
    class C : B,ISerializable {}

Because class A may have some fields marked with the NonSerialized attribute, the solution needs to check that a field is serializable. This is easy to do using the IsNotSerialized Boolean property of FieldInfo:

    foreach(FieldInfo field in fields)
    {
        if(field.IsNotSerialized)
        {
            continue;
        }
        //Rest of the iteration loop
    }

Since different levels in a hierarchy can declare private fields with the same name, the private SerializeBaseType prefixes a field name with its declaring type, separated by a plus sign:

string fieldName = type.Name + "+" + field.Name;
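The level-by-level walk and the naming scheme can be seen together in the following sketch, which only collects the keys that would be written rather than the values. The Level1, Level2, and FieldKeys names are hypothetical; unlike the base-class-only overload in Figure 11, this walk starts at the type itself.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Level1
{
    private int m_Id = 1;
    public int Id1 { get { return m_Id; } }
}

public class Level2 : Level1
{
    private int m_Id = 2;
    public int Id2 { get { return m_Id; } }
}

// Hypothetical helper, for illustration only.
public static class FieldKeys
{
    // Visits each level of the hierarchy separately and returns the
    // "TypeName+FieldName" keys for its serializable instance fields.
    public static List<string> Collect(Type type)
    {
        List<string> keys = new List<string>();
        BindingFlags flags = BindingFlags.Instance | BindingFlags.DeclaredOnly |
                             BindingFlags.NonPublic | BindingFlags.Public;
        for (Type level = type;
             level != null && level != typeof(object);
             level = level.BaseType)
        {
            foreach (FieldInfo field in level.GetFields(flags))
            {
                if (field.IsNotSerialized)
                {
                    continue;
                }
                keys.Add(level.Name + "+" + field.Name);
            }
        }
        return keys;
    }
}
```

FieldKeys.Collect(typeof(Level2)) produces two distinct keys, Level2+m_Id and Level1+m_Id, so the two private fields no longer collide. One caveat of this scheme: Type.Name omits the namespace, so two levels with the same simple name in different namespaces would still produce identical keys. Using Type.FullName instead would be one way around that, at the cost of longer keys.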

The value of a field is obtained using the GetValue method of FieldInfo and is then added to the info parameter:

info.AddValue(fieldName,field.GetValue(obj));

Deserialization of the base class (or classes) is similar to serialization and is done recursively as well until it reaches the System.Object level. At each level in the class hierarchy, the private DeserializeBaseType retrieves the collection of the fields for that type. For each field, it creates a name by prefixing the name of the field with the name of the current level's type, gets the value from info, and sets the value of the corresponding field using the SetValue method of the FieldInfo class.
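You can exercise both directions without a formatter by filling a SerializationInfo object by hand and reading it back into a fresh instance. The sketch below does that with a condensed, iterative variant of the logic in Figure 11. The BaseTypeRoundTrip class and its Save, Load, and Copy methods are hypothetical names, and the code assumes the .NET Framework serialization types used throughout this article.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Deliberately not marked as serializable.
public class MyBaseClass
{
    private int m_Number;
    private string m_Text;
    public int Number { get { return m_Number; } set { m_Number = value; } }
    public string Text { get { return m_Text; } set { m_Text = value; } }
}

public class MySubClass : MyBaseClass {}

// Hypothetical helper: a condensed, iterative variant of Figure 11.
public static class BaseTypeRoundTrip
{
    const BindingFlags Flags = BindingFlags.Instance | BindingFlags.DeclaredOnly |
                               BindingFlags.NonPublic | BindingFlags.Public;

    // Writes the fields of every base class of obj into info.
    public static void Save(object obj, SerializationInfo info)
    {
        for (Type type = obj.GetType().BaseType;
             type != null && type != typeof(object);
             type = type.BaseType)
        {
            foreach (FieldInfo field in type.GetFields(Flags))
            {
                if (field.IsNotSerialized) continue;
                info.AddValue(type.Name + "+" + field.Name, field.GetValue(obj));
            }
        }
    }

    // Reads the same keys back from info and assigns them to obj.
    public static void Load(object obj, SerializationInfo info)
    {
        for (Type type = obj.GetType().BaseType;
             type != null && type != typeof(object);
             type = type.BaseType)
        {
            foreach (FieldInfo field in type.GetFields(Flags))
            {
                if (field.IsNotSerialized) continue;
                string fieldName = type.Name + "+" + field.Name;
                field.SetValue(obj, info.GetValue(fieldName, field.FieldType));
            }
        }
    }

    // Copies the base class state of source into a new MySubClass.
    public static MySubClass Copy(MySubClass source)
    {
        SerializationInfo info =
            new SerializationInfo(typeof(MySubClass), new FormatterConverter());
        Save(source, info);
        MySubClass copy = new MySubClass();
        Load(copy, info);
        return copy;
    }
}
```

After setting Number and Text on a MySubClass instance, the object returned by BaseTypeRoundTrip.Copy carries the same values, even though MyBaseClass is not marked as serializable and both fields are private.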

Securing Serialization

Imagine a class containing sensitive information that needs to interact with partially trusted clients. If a malicious client could provide its own serialization formatters, it could gain access to the sensitive information or it could deserialize an object with bogus state. To prevent abuse by such serialization clients, a class can demand that its clients have the security permission to provide a serialization formatter during link time, using the SecurityPermission attribute with the dedicated SecurityPermissionFlag.SerializationFormatter flag:

    [SecurityPermission(SecurityAction.LinkDemand,
                        Flags = SecurityPermissionFlag.SerializationFormatter)]
    [Serializable]
    public class MyClass
    {...}

The link-time demand does not trigger a security stack walk, but rather verifies that the formatter has the demanded permission.

The problem with demanding serialization permission at the class level is that it precludes clients that don't have that permission and don't even need to serialize the class from using the class at all. In such cases, it's better to provide custom serialization and demand the permission only on the deserialization constructor and GetObjectData, as shown in the code listing in Figure 12. You can use the SerializationUtil helper class to automate the implementation of the custom serialization.

Figure 12 Get Permission

    [Serializable]
    public class MyClass : ISerializable
    {
        public MyClass()
        {...}

        [SecurityPermission(SecurityAction.LinkDemand,
                            Flags = SecurityPermissionFlag.SerializationFormatter)]
        public void GetObjectData(SerializationInfo info,
                                  StreamingContext context)
        {...}

        [SecurityPermission(SecurityAction.LinkDemand,
                            Flags = SecurityPermissionFlag.SerializationFormatter)]
        protected MyClass(SerializationInfo info,StreamingContext context)
        {...}
    }

If all you need are the standard .NET Framework formatters, there is a different solution altogether to the problem of the malicious serialization client. Use the StrongNameIdentityPermission attribute to demand at link time that only Microsoft-provided assemblies serialize and deserialize your class, as shown in Figure 13.

Figure 13 Restricting Serialization

    public static class PublicKeys
    {
        public const string Microsoft = "0024000004 ... 44D5AD293";
    }

    [Serializable]
    public class MyClass : ISerializable
    {
        public MyClass() {}

        [StrongNameIdentityPermission(SecurityAction.LinkDemand,
                                      PublicKey = PublicKeys.Microsoft)]
        public void GetObjectData(SerializationInfo info,
                                  StreamingContext context)
        {...}

        [StrongNameIdentityPermission(SecurityAction.LinkDemand,
                                      PublicKey = PublicKeys.Microsoft)]
        protected MyClass(SerializationInfo info,StreamingContext context)
        {...}
    }

A drawback to this approach is that the class can no longer be serialized using custom formatters, even by benign clients with full trust, because those clients will not be signed with the Microsoft strong name. Adding another link-time demand for the client's strong name would actually guarantee that no one could use the class, since no assembly can be signed with both strong names. To address this, the .NET Framework 2.0 introduces the demand-choice option; when combined with a link-time demand, it instructs the just-in-time (JIT) compiler that the linking client need satisfy only one of the demanded permissions. If you want to allow either Microsoft-signed assemblies or clients with the serialization formatter permission to serialize your class, use a link-time demand-choice on both permissions, as shown in Figure 14.

Figure 14 Link-Time Permission Demand-Choice

    [StrongNameIdentityPermission(SecurityAction.LinkDemandChoice,
                                  PublicKey = PublicKeys.Microsoft)]
    [SecurityPermission(SecurityAction.LinkDemandChoice,
                        Flags = SecurityPermissionFlag.SerializationFormatter)]
    public void GetObjectData(SerializationInfo info,
                              StreamingContext context)
    {...}

    [StrongNameIdentityPermission(SecurityAction.LinkDemandChoice,
                                  PublicKey = PublicKeys.Microsoft)]
    [SecurityPermission(SecurityAction.LinkDemandChoice,
                        Flags = SecurityPermissionFlag.SerializationFormatter)]
    protected MyClass(SerializationInfo info,StreamingContext context)
    {...}

Conclusion

On the surface, serialization looks like a simple problem to solve, but in fact it poses a series of interdisciplinary challenges, from versioning to generics and security. Although .NET does a great job for most simple serialization cases, you may have to improve on the basic infrastructure for enterprise applications. Luckily, the extensibility of the .NET serialization system makes this possible. The techniques discussed here should put you on the road to solving more advanced serialization problems. Furthermore, the next generation of serialization (as part of "Indigo," the code name for the communications infrastructure of the next version of Windows®, known as "Longhorn") will offer a compatible programming model that will support many of the programming models and techniques presented in this article.

Juval Lowy is a software architect providing .NET architecture consultation and advanced training. He is also the Microsoft Regional Director for the Silicon Valley. This article contains excerpts from his upcoming book, Programming .NET Components, 2nd Edition (O'Reilly). Contact Juval at https://www.idesign.net.