Troubleshooting Common Problems with the XmlSerializer

 

Christoph Schittko

May 2004

Applies to:
   Microsoft® Visual Studio® .NET

Summary: Christoph Schittko discusses various techniques for diagnosing common problems that occur when converting XML to objects and vice versa with XML Serialization technology in the .NET Framework. (13 printed pages)

Contents

Introduction
The Inner Workings of the XmlSerializer
Serialization Errors
Declaring Serialization Types
Problems Deserializing XML
Exceptions from the Constructor
Conclusion
Acknowledgements

Introduction

The XmlSerializer in the .NET Framework is a great tool for mapping strongly structured XML data to .NET objects. The XmlSerializer performs the transformations between XML documents and objects in your program with a single API call. The mapping rules for the transformation are expressed in the .NET classes via metadata attributes. This programming model comes with its own class of errors that developers need to learn how to diagnose. For example, the metadata attributes have to describe all variations of an XML format that a serializer can process. This article examines the various errors that can occur when building XML-based solutions with the XmlSerializer, and discusses techniques and tools to diagnose them.

The Inner Workings of the XmlSerializer

To troubleshoot problems arising from XML serialization effectively, it is important to understand what is going on under the covers of the XmlSerializer's very simple interface. In contrast to traditional parsing paradigms, the XmlSerializer from the System.Xml.Serialization namespace in the .NET Framework binds XML documents to instances of .NET classes. Instead of writing DOM or SAX parsing code, programmers declaratively set up binding rules by attaching .NET metadata attributes directly to the classes. Since all the parsing rules are expressed through the attributes, the interface of the XmlSerializer is very simple. It consists primarily of two methods: Serialize(), which produces XML from an object instance, and Deserialize(), which parses an XML document into an object graph.
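To make the two-method interface concrete, here is a minimal round-trip sketch. The Message class and the Demo helper are hypothetical names chosen for illustration; the XmlSerializer(Type) constructor, Serialize(TextWriter, object), and Deserialize(TextReader) are the standard System.Xml.Serialization API.

```csharp
using System.IO;
using System.Xml.Serialization;

namespace XmlSerDemo.RoundTrip
{
    // Hypothetical example type. By default, public fields and properties
    // map to child elements named after the member.
    public class Message
    {
        public string Subject;
        public int Priority;
    }

    public static class Demo
    {
        // Serialize(): object instance -> XML text.
        public static string ToXml(Message msg)
        {
            var serializer = new XmlSerializer(typeof(Message));
            using (var writer = new StringWriter())
            {
                serializer.Serialize(writer, msg);
                return writer.ToString();
            }
        }

        // Deserialize(): XML text -> object graph.
        public static Message FromXml(string xml)
        {
            var serializer = new XmlSerializer(typeof(Message));
            using (var reader = new StringReader(xml))
            {
                return (Message)serializer.Deserialize(reader);
            }
        }
    }
}
```

Calling Demo.ToXml with a Message whose Subject is "Hello" produces a document with a <Message> root element containing <Subject> and <Priority> children; passing that string to Demo.FromXml restores an equivalent object. No parsing code appears anywhere: the class definition alone drives the mapping.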

This approach works very well with strongly typed, rigidly structured XML formats that map well to programming objects. If a format is defined by a W3C XML Schema that consists of complexTypes without mixed content or excessive use of wildcards (xs:any and xs:anyAttribute), then XML serialization is a good approach to processing that data.

Message-oriented applications, in which the format of the exchange between applications is defined up front, are a very good example. Because many message-driven enterprise applications have very high throughput requirements, the Serialize() and Deserialize() methods are designed to be very fast. In fact, the XmlSerializer is what powers the highly scalable libraries in the System.Messaging namespace, ASP.NET Web services, and BizTalk Server 2004.

The trade-off for the high performance of the XmlSerializer is two-fold. The first is reduced flexibility with regard to the XML formats a given XmlSerializer can process, and the second is a rather processing-intensive instance construction.

When you instantiate an XmlSerializer, you have to pass the Type of the objects that you will attempt to serialize and deserialize with that serializer instance. The serializer examines all public fields and properties of the Type to learn which types an instance references at runtime. It then proceeds to create C# code for a set of classes to handle serialization and deserialization, using the classes in the System.CodeDom namespace. During this process, the XmlSerializer checks the reflected type for XML serialization attributes to customize the created classes to the XML format definition. These classes are then compiled into a temporary assembly and called by the Serialize() and Deserialize() methods to perform the XML to object conversions.

This elaborate process to set up the XmlSerializer and the declarative programming model result in three classes of errors, some of which can be complicated to troubleshoot:

  • The generated serialization classes expect the objects serialized to fully conform to the type structure defined by the metadata attributes. An object will fail to serialize if the XmlSerializer encounters any types that were not declared, either explicitly or via an XML serialization attribute.
  • An XML document fails to deserialize if its root element does not map to an object type; if the document is not well formed, for example because it contains characters that are illegal according to the XML specification; and in some cases if the document violates restrictions of the underlying schema.
  • Finally, the creation of the serialization classes and their subsequent compilation may fail for a number of different reasons. The creation of the classes can fail when the type passed to the constructor, or a type referenced by that type, implements an unsupported interface or does not satisfy the limitations imposed by the XmlSerializer.
    The compilation step can fail when the attached attributes produce C# code that cannot be compiled, or for security-related reasons.

The following sections will examine these cases in more depth and offer guidance and suggestions on how to solve them.

Serialization Errors

The first class of errors we examine occurs in the Serialize() method. It occurs when the types in the object graph passed to the method at runtime do not match the types that were declared in the class at design time. You can declare types either implicitly, via the type definition of the field or property, or explicitly, by attaching a serialization attribute.

Figure 1. Type declarations in the object graph

It is important to note here that relying on inheritance is not sufficient. Developers must declare derived types to the XmlSerializer, either by attaching XmlInclude attributes to the base class or by attaching XmlElement attributes to the fields that can hold objects of types derived from the declared type.

Take a look at this class hierarchy for an example:

public class Base
{
   public string Field;
}

public class Derived : Base
{
  public string AnotherField;
}

public class Container
{
  public Base MyField;
}

If you relied on inheritance and wrote serialization code like this:

Container obj = new Container();
obj.MyField = new Derived(); // legal assignment in the 
                             //.NET type system

// ...
XmlSerializer serializer = new XmlSerializer( typeof( Container ) );
serializer.Serialize( writer, obj ); // Kaboom!

you would get an exception from the Serialize() method because there was no explicit type declaration for the XmlSerializer.
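The following self-contained sketch reproduces the failure so you can watch it happen. It assumes Derived actually inherits from Base (otherwise the assignment would not compile) and writes to a StringWriter in place of the unspecified writer; the Demo class is a hypothetical helper that captures the exception messages instead of letting them escape. The exact message wording may differ between .NET versions.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace XmlSerDemo.Kaboom
{
    public class Base { public string Field; }

    // Derived inherits from Base, so assigning it to a Base field is legal C#.
    public class Derived : Base { public string AnotherField; }

    public class Container { public Base MyField; }

    public static class Demo
    {
        // Returns { outer message, inner message }, or null if Serialize()
        // unexpectedly succeeds.
        public static string[] Run()
        {
            var obj = new Container { MyField = new Derived() };
            var serializer = new XmlSerializer(typeof(Container));
            try
            {
                using (var writer = new StringWriter())
                {
                    // Throws: Derived was never declared to the serializer.
                    serializer.Serialize(writer, obj);
                }
                return null;
            }
            catch (InvalidOperationException ex)
            {
                // The outer exception is generic; the inner one names the problem.
                return new[] { ex.Message, ex.InnerException?.Message };
            }
        }
    }
}
```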

Exceptions from the XmlSerializer

Diagnosing the source of these problems can be tricky at first, because the exceptions from the XmlSerializer do not seem to provide much information about the cause of their occurrence; at least, they do not provide the information in a spot where developers typically would look.

In most cases, Serialize, Deserialize and even the XmlSerializer constructor throw a rather generic System.InvalidOperationException when an error occurs. This exception type can occur in many places in the .NET Framework; it is not specific to the XmlSerializer at all. To make matters worse, the exception's Message property only yields very generic information, as well. In the example above, the Serialize() method would throw an exception with the following message:

There was an error generating the XML document.

This message is annoying at best, because you already figured that much when you saw that the XmlSerializer threw an exception. Now you find that the exception's Message doesn't help you troubleshoot the problem.

The odd exception message and the non-descriptive exception type reflect the inner workings of the XmlSerializer I introduced earlier in this article. The Serialize() method catches all exceptions thrown in the serialization classes, wraps them in an InvalidOperationException, and throws that up the stack.

Reading the Exception Message

The trick to get to the "real" exception information is to examine the exception's InnerException property. The InnerException references the actual exception thrown from within the serialization classes. It contains very detailed information about the problem and where it occurred. The Exception you would catch running the example above would contain an InnerException with this Message:

The type Derived was not expected. Use the XmlInclude or SoapInclude 
attribute to specify types that are not known statically.

You can get to this message either by examining the InnerException directly, or by calling the exception's ToString() method. The following code snippet demonstrates an exception handler writing out the information in all exceptions that occurred while serializing an object:

// Requires: using System; using System.Xml;
//           using System.Xml.Serialization;
public void SerializeContainer( XmlWriter writer, Container obj )
{
  try
  {
    // Make sure even the constructor runs inside a
    // try-catch block
    XmlSerializer ser = new XmlSerializer( typeof(Container));
    ser.Serialize( writer, obj );
  }
  catch( Exception ex )
  {
    DumpException( ex );
  }
}

// Writes out the outer exception, then walks the whole
// InnerException chain, however deeply it is nested
public static void DumpException( Exception ex )
{
  Console.WriteLine( "--------- Outer Exception Data ---------" );
  WriteExceptionInfo( ex );
  ex = ex.InnerException;
  while( null != ex )
  {
    Console.WriteLine( "--------- Inner Exception Data ---------" );
    WriteExceptionInfo( ex );
    ex = ex.InnerException;
  }
}

public static void WriteExceptionInfo( Exception ex )
{
  Console.WriteLine( "Message: {0}", ex.Message );
  Console.WriteLine( "Exception Type: {0}", ex.GetType().FullName );
  Console.WriteLine( "Source: {0}", ex.Source );
  Console.WriteLine( "StackTrace: {0}", ex.StackTrace );
  Console.WriteLine( "TargetSite: {0}", ex.TargetSite );
}

Declaring Serialization Types

To fix the problem in the example above you just need to read the InnerException's message and implement the suggested solution. A field in the object graph you passed to the Serialize method referenced an object of type Derived, but the field was not declared to serialize objects of the Derived type. Even though the object graph was perfectly legal within the .NET type system, the constructor of the XmlSerializer did not know to create serialization code for objects of type Derived when it traversed the fields of the container type because it did not find any reference to the Derived type.

To declare additional types for fields and properties to the XmlSerializer, you have several options. You can declare derived types on their base class through the XmlInclude attribute (as suggested by the exception message) like this:

[System.Xml.Serialization.XmlInclude( typeof( Derived ) )]
public class Base
{
    // ...
}

Attaching the XmlInclude attribute allows the XmlSerializer to serialize fields referencing objects of Derived type when a field or property is defined as type Base.
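Here is a sketch of that fix applied to the earlier hierarchy, again assuming Derived inherits from Base. The Demo helper is hypothetical. With the attribute in place, the serializer marks the element with an xsi:type attribute, which is how it tells the two types apart when it reads the document back.

```csharp
using System.IO;
using System.Xml.Serialization;

namespace XmlSerDemo.Include
{
    // XmlInclude declares Derived wherever a Base-typed field or property appears.
    [XmlInclude(typeof(Derived))]
    public class Base { public string Field; }

    public class Derived : Base { public string AnotherField; }

    public class Container { public Base MyField; }

    public static class Demo
    {
        static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Container));

        public static string ToXml(Container obj)
        {
            using (var writer = new StringWriter())
            {
                Serializer.Serialize(writer, obj);
                return writer.ToString();
            }
        }

        public static Container FromXml(string xml)
        {
            using (var reader = new StringReader(xml))
            {
                // xsi:type="Derived" on <MyField> selects the Derived class.
                return (Container)Serializer.Deserialize(reader);
            }
        }
    }
}
```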

Alternatively, you can declare valid types only on a single field or property, instead of declaring derived types at the base class. You can attach XmlElement, XmlAttribute, or XmlArrayItem attributes to a field and declare the types that the field or property can reference. Then the constructor of the XmlSerializer will add the code required to serialize and deserialize those types to the serialization classes.
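The following sketch shows the per-field alternative, using two XmlElement attributes on Container.MyField. The element names BaseItem and DerivedItem and the Demo helper are hypothetical; the XmlElementAttribute(string, Type) constructor is the standard one. Only this field accepts Derived; any other Base-typed member would still need its own declaration.

```csharp
using System.IO;
using System.Xml.Serialization;

namespace XmlSerDemo.PerField
{
    // No XmlInclude on the base class this time.
    public class Base { public string Field; }

    public class Derived : Base { public string AnotherField; }

    public class Container
    {
        // The valid types are declared on this one field. Each type is bound
        // to its own element name, and the element name tells the reader
        // which class to create.
        [XmlElement("BaseItem", typeof(Base))]
        [XmlElement("DerivedItem", typeof(Derived))]
        public Base MyField;
    }

    public static class Demo
    {
        static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Container));

        public static string ToXml(Container obj)
        {
            using (var writer = new StringWriter())
            {
                Serializer.Serialize(writer, obj);
                return writer.ToString();
            }
        }

        public static Container FromXml(string xml)
        {
            using (var reader = new StringReader(xml))
            {
                return (Container)Serializer.Deserialize(reader);
            }
        }
    }
}
```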

Reading the StackTrace

The Message property of the InnerException is not the only property that carries valuable information. The StackTrace property conveys more details about the source of the error. At the very top of the stack trace you find the name of the method where the exception originated. The method names in the temporary assemblies follow the pattern Write<n>_<ClassName> for serialization classes, and Read<n>_<ElementName> for deserialization classes. In the bad-namespace example in the next section, you would see the exception originating in a method named Read1_MyClass. Later on, I will show how you can even use the Visual Studio debugger to set a breakpoint and single-step through this method. First, however, let's look at common issues around deserializing an XML document.
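If you want to log that top frame rather than read it in a debugger, a small helper along these lines can pull it out. The ExceptionInspector class and its methods are hypothetical names; the code relies only on Exception.InnerException and Exception.StackTrace. Expect the generated method names to vary between .NET versions, since newer runtimes may generate the serialization code differently from the C# compilation step described in this article.

```csharp
using System;

namespace XmlSerDemo.Diagnostics
{
    public static class ExceptionInspector
    {
        // Follows the InnerException chain down to the exception that was
        // actually thrown inside the serialization classes.
        public static Exception Innermost(Exception ex)
        {
            while (ex.InnerException != null)
            {
                ex = ex.InnerException;
            }
            return ex;
        }

        // Returns the first line of the innermost stack trace, i.e. the
        // method where the exception originated, or "" if none is available.
        public static string TopFrame(Exception ex)
        {
            string trace = Innermost(ex).StackTrace ?? string.Empty;
            string[] lines = trace.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            return lines.Length > 0 ? lines[0].Trim() : string.Empty;
        }
    }
}
```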

Problems Deserializing XML

Deserializing an XML document into an object graph is less error prone than serializing an object graph to XML. The XmlSerializer is very sensitive when the objects don't closely match the type definition, but it is very forgiving if a deserialized XML document doesn't closely match up with the object. Instead of throwing exceptions for XML elements that do not correspond to a field or property in the deserialized object, the XmlSerializer simply raises events. You can register handlers for those events if you need to keep track of how closely the XML documents you deserialize match the XML format. Registering a handler is optional, however; without one, the XmlSerializer simply skips unmapped XML nodes.
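As a sketch of that event-based tracking, the following hypothetical helper collects every element and attribute that did not map to a member of MyClass. UnknownElement and UnknownAttribute are real XmlSerializer events (XmlSerializer also exposes UnknownNode and UnreferencedObject); the FindUnmappedNodes method and the text format of its results are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

namespace XmlSerDemo.Events
{
    public class MyClass { public string MyField; }

    public static class Demo
    {
        // Deserializes xml and returns a description of every node that did
        // not correspond to a field or property of MyClass.
        public static List<string> FindUnmappedNodes(string xml)
        {
            var unmapped = new List<string>();
            var serializer = new XmlSerializer(typeof(MyClass));

            // Raised for elements with no matching field or property.
            serializer.UnknownElement += (sender, e) =>
                unmapped.Add("element " + e.Element.Name + " at line " + e.LineNumber);

            // Raised for attributes with no matching field or property.
            serializer.UnknownAttribute += (sender, e) =>
                unmapped.Add("attribute " + e.Attr.Name + " at line " + e.LineNumber);

            using (var reader = new StringReader(xml))
            {
                serializer.Deserialize(reader); // no exception for unmapped nodes
            }
            return unmapped;
        }
    }
}
```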

Only a few error conditions lead to exceptions during the deserialization process. The most common ones are:

  • The name of the root element or its namespace did not match the expected name.
  • An enumerated data type presented an undefined value.
  • The document contained illegal XML.

Just as in the case of serialization, the Deserialize() method throws an InvalidOperationException with the Message

There is an error in XML document (<line>, <column>).

whenever a problem occurs. This exception typically contains the real exception in the InnerException property. The type of the InnerException varies according to the actual error that occurred while reading the XML document. If the serializer cannot match up the root element of the document with the type passed to the constructor, a type specified via an XmlInclude attribute, or a type that was specified in the Type[] passed to one of the more sophisticated overloads of the XmlSerializer constructor, then the InnerException is another InvalidOperationException. Keep in mind that the XmlSerializer looks at the QName, that is, the name of the element and its namespace, to determine the class into which to deserialize the document. Both have to match the declaration in the .NET class for the XmlSerializer to properly identify the type that corresponds to the root element of the document.
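The Type[] overload mentioned above is XmlSerializer(Type, Type[]), whose second argument lists extra types the serializer should know about. A minimal sketch, assuming the same Base/Derived/Container shape as earlier (with Derived inheriting from Base and no XmlInclude attribute): declaring Derived at construction time lets a document that marks an element with xsi:type="Derived" deserialize into the derived class. The Demo helper is hypothetical.

```csharp
using System.IO;
using System.Xml.Serialization;

namespace XmlSerDemo.ExtraTypes
{
    public class Base { public string Field; }

    public class Derived : Base { public string AnotherField; }

    public class Container { public Base MyField; }

    public static class Demo
    {
        // Derived is declared through the extraTypes array instead of
        // attributes, so the class definitions stay untouched.
        static readonly XmlSerializer Serializer =
            new XmlSerializer(typeof(Container), new[] { typeof(Derived) });

        public static string ToXml(Container obj)
        {
            using (var writer = new StringWriter())
            {
                Serializer.Serialize(writer, obj);
                return writer.ToString();
            }
        }

        public static Container FromXml(string xml)
        {
            using (var reader = new StringReader(xml))
            {
                return (Container)Serializer.Deserialize(reader);
            }
        }
    }
}
```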

Let's look at an example:

[XmlRoot( Namespace="urn:my-namespace" )]
public class MyClass
{
  public string MyField;
}

Deserializing the following XML document will cause an exception, then, because the XML namespace of the MyClass element is not urn:my-namespace, as declared through the XmlRoot attribute on the .NET class:

<MyClass>
  <MyField>Hello, World</MyField>
</MyClass>
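The following sketch reproduces the failure with the class and document above, plus a corrected document for contrast. The Demo.Run helper is hypothetical; it returns the outer and inner messages, or null when deserialization succeeds. Exact message wording may vary by .NET version.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace XmlSerDemo.RootMismatch
{
    [XmlRoot(Namespace = "urn:my-namespace")]
    public class MyClass
    {
        public string MyField;
    }

    public static class Demo
    {
        // Returns { outer message, inner message } on failure, null on success.
        public static string[] Run(string xml)
        {
            var serializer = new XmlSerializer(typeof(MyClass));
            try
            {
                using (var reader = new StringReader(xml))
                {
                    serializer.Deserialize(reader);
                }
                return null;
            }
            catch (InvalidOperationException ex)
            {
                return new[] { ex.Message, ex.InnerException?.Message };
            }
        }
    }
}
```

Running it against the document above (no namespace) fails, while the same document with xmlns="urn:my-namespace" on the root element deserializes cleanly.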

Let's take a closer look at the exception. The exception Message is more descriptive than the message you catch from the Serialize() method; at least it references the position in the document that caused Deserialize() to fail. When you are processing large XML documents, though, it may not be all that easy to look at the document and determine the error. Again, the InnerException provides better information. This time it says:

<MyClass xmlns=''> was not expected.

The message is still somewhat ambiguous, but it does point you to the element that is causing the problem. You can go back and closely examine the MyClass class and compare the element name and the XML namespace to the XML serialization attributes in the .NET class.
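Before reaching for the exception at all, you can ask the serializer whether it recognizes a document's root element: XmlSerializer.CanDeserialize(XmlReader) compares the root element's name and namespace with what the serializer expects. A minimal sketch using the MyClass declaration above; the RootMatches helper is hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

namespace XmlSerDemo.RootCheck
{
    [XmlRoot(Namespace = "urn:my-namespace")]
    public class MyClass
    {
        public string MyField;
    }

    public static class Demo
    {
        // True if the root element's QName matches what the serializer
        // expects for MyClass (local name MyClass, namespace urn:my-namespace).
        public static bool RootMatches(string xml)
        {
            var serializer = new XmlSerializer(typeof(MyClass));
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                return serializer.CanDeserialize(reader);
            }
        }
    }
}
```

Checking the root first turns a vague "was not expected" into a simple yes/no answer, which is handy when one endpoint receives several message types.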

Deserializing Invalid XML

Another frequently reported problem is the failure to deserialize invalid XML documents. The XML specification forbids the use of certain control characters in an XML document. Nevertheless, sometimes you receive XML documents containing these characters anyway. The problem manifests itself in, you guessed it, an InvalidOperationException. In this particular case, though, the InnerException is of type XmlException. The InnerException's message is to the point:

hexadecimal value <value>, is an invalid character

You can avoid this problem if you deserialize with an XmlTextReader that has its Normalization property set to false. Unfortunately, the XmlTextReader used under the covers by ASP.NET Web services has its Normalization property set to true; i.e., it will not deserialize SOAP messages containing these invalid characters.
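A sketch of that workaround, under one stated assumption: the offending characters arrive as numeric character references such as &#x1;, which is the case the XmlTextReader.Normalization documentation describes (setting it to false disables range checking for numeric entities). Raw control characters embedded directly in the text may still be rejected. For contrast, StrictFails calls Deserialize(TextReader), which creates its own reader internally; in current .NET implementations that reader has Normalization enabled, so the same document is rejected. The Demo helper and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

namespace XmlSerDemo.InvalidChars
{
    public class MyClass { public string MyField; }

    public static class Demo
    {
        // Deserializes through an XmlTextReader with Normalization = false,
        // so character references like &#x1; are accepted.
        public static string ReadLenient(string xml)
        {
            var serializer = new XmlSerializer(typeof(MyClass));
            using (var reader = new XmlTextReader(new StringReader(xml)))
            {
                reader.Normalization = false; // false is also XmlTextReader's default
                return ((MyClass)serializer.Deserialize(reader)).MyField;
            }
        }

        // Returns true if the default TextReader overload rejects the document
        // with an XmlException wrapped in an InvalidOperationException.
        public static bool StrictFails(string xml)
        {
            var serializer = new XmlSerializer(typeof(MyClass));
            try
            {
                serializer.Deserialize(new StringReader(xml));
                return false;
            }
            catch (InvalidOperationException ex)
            {
                return ex.InnerException is XmlException;
            }
        }
    }
}
```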

Exceptions from the Constructor

The last class of problems this article discusses occurs when the constructor of the XmlSerializer reflects over the passed-in type. Remember, the constructor recursively examines each public field and property in the type hierarchy to create classes that handle serialization and deserialization. It then compiles the classes on the fly and loads the resulting assembly.

There are quite a number of different problems that can occur during this complicated process:

  • Declared types for the root, or types referenced by a property or a field, don't provide a default constructor.
  • A type in the hierarchy implements the collection interface IDictionary.
  • Executing a constructor or a property accessor of a type in the object graph requires elevated security privileges.
  • The code for the generated serialization classes does not compile.

Trying to pass a non-serializable type to the XmlSerializer constructor also results in an InvalidOperationException, but this time the exception does not wrap another exception. The Message property explains why the constructor rejected the passed-in Type. For example, creating a serializer for a class that does not implement a constructor without parameters (a default constructor) results in an exception with the Message:

Test.NonSerializable cannot be serialized because it does not have a default public constructor.
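The sketch below triggers this error from the constructor alone, before any object is serialized. The NonSerializable class is placed in a hypothetical XmlSerDemo.Ctor namespace rather than Test, so the type name in the message changes accordingly, and newer .NET versions word the reason as a missing "parameterless constructor". The helper searches the whole InnerException chain in case a runtime does wrap the message.

```csharp
using System;
using System.Xml.Serialization;

namespace XmlSerDemo.Ctor
{
    public class NonSerializable
    {
        public string Name;

        // The only constructor takes a parameter, so there is no default constructor.
        public NonSerializable(string name) { Name = name; }
    }

    public static class Demo
    {
        // Returns the message explaining why the constructor rejected the type,
        // or null if the constructor unexpectedly succeeds.
        public static string ConstructorError()
        {
            try
            {
                new XmlSerializer(typeof(NonSerializable));
                return null;
            }
            catch (InvalidOperationException ex)
            {
                for (Exception e = ex; e != null; e = e.InnerException)
                {
                    if (e.Message.Contains("cannot be serialized"))
                        return e.Message;
                }
                return ex.Message;
            }
        }
    }
}
```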

Troubleshooting compilation errors, on the other hand, is very complicated. These problems manifest themselves in a FileNotFoundException with the message:

File or assembly name abcdef.dll, or one of its dependencies, was not found. File name: "abcdef.dll"
   at System.Reflection.Assembly.nLoad( ... )
   at System.Reflection.Assembly.InternalLoad( ... )
   at System.Reflection.Assembly.Load(...)
   at System.CodeDom.Compiler.CompilerResults.get_CompiledAssembly() 
    ....

You may wonder what a file not found exception has to do with instantiating a serializer object, but remember: the constructor writes C# files and tries to compile them. The call stack of this exception provides some good information to support that suspicion. The exception occurred while the XmlSerializer attempted to load an assembly generated by CodeDOM calling the System.Reflection.Assembly.Load method. The exception does not provide an explanation as to why the assembly that the XmlSerializer was supposed to create was not present. In general, the assembly is not present because the compilation failed, which may happen because, under rare circumstances, the serialization attributes produce code that the C# compiler fails to compile.

Note   This error also occurs when the XmlSerializer runs under an account or a security environment that is not able to access the temp directory.

The actual compilation errors are not part of any exception message thrown by the XmlSerializer, not even in an InnerException. This made it very difficult to troubleshoot these exceptions until Chris Sells published his XmlSerializerPreCompiler tool.

The XmlSerializerPreCompiler

The XmlSerializerPreCompiler is a command-line program that performs the same steps as the constructor of the XmlSerializer. It reflects over a type, generates serialization classes, and compiles them. Because it was designed purely as a troubleshooting tool, it can safely write any compilation errors to the console.

The tool is very easy to use. You simply point the tool at the assembly that contains the type that causes the exception, and specify which type to pre-compile. Let's look at an example. One problem that's reported regularly occurs when you attach an XmlElement or an XmlArrayItem attribute to a field that's defined as a jagged array, as in the example below:

namespace Test
{
  public class StringArray
  {
    [XmlElement( "arrayElement", typeof( string ) )]
    public string [][] strings;
  }
}

The XmlSerializer constructor throws the FileNotFoundException when you instantiate an XmlSerializer object for the type Test.StringArray, but the exception gives no clues about the real nature of the problem. The XmlSerializerPreCompiler can give you the missing information. In my example, the StringArray class is compiled into an assembly named XmlSer.exe, so I run the tool with the following command line:

XmlSerializerPreCompiler.exe XmlSer.exe Test.StringArray

The first command-line parameter specifies the assembly, and the second parameter defines what class in the assembly to pre-compile. The tool writes quite a bit of information to the command window.

Figure 2. XmlSerializerPreCompiler command window output

The important lines to look at are the lines with the compile errors and two lines that read something like:

XmlSerializer-produced source:
C:\DOCUME~1\<user>\LOCALS~1\Temp\<random name>.cs

Now the XmlSerializerPreCompiler gave us the compilation errors and the location of the source file with the code that does not compile.

Debugging Serialization Code

Under normal circumstances, the XmlSerializer deletes the C# source files for the serialization classes when they are no longer needed. There is an undocumented diagnostics switch, however, which instructs the XmlSerializer to leave these files on your disk. You can set the switch in your application's .config file:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <system.diagnostics>
        <switches>
            <add name="XmlSerialization.Compilation" value="4" />
        </switches>
    </system.diagnostics>
</configuration>

With this switch present in the .config file, the C# source files stay in your temp directory. If you are working on a computer running Windows 2000 or later, the default location for the temp directory is <System Drive>\Documents and Settings\<Your User Name>\Local Settings\Temp, or <Windows Directory>\Temp for Web applications running under the ASPNET account. The C# files are easy to miss because they have very odd-looking, randomly generated filenames, something like bdz6lq-t.0.cs. The XmlSerializerPreCompiler sets this diagnostics switch, so you can open the files in Notepad or Visual Studio and inspect the lines on which it reported compilation errors.

You can even step through those temporary serialization classes, because the diagnostics switch also leaves .pdb files with the debugging symbols on your disk. If you need to set a breakpoint in a serialization class, run your application under the Visual Studio debugger. Once the output window shows that your application has loaded assemblies with these odd-looking names from the temp directory, open the C# files with the corresponding names and set breakpoints just as you would in your own code.

Figure 3. Compilation error output from the diagnostics switch

Once you set your breakpoint in a serialization class, you need to execute code that calls the Serialize() or the Deserialize() method on an XmlSerializer object.

Note   You can only debug serialization and deserialization, but not the code generation process that runs in the constructor.

Stepping through the serialization class, you are able to pinpoint every serialization problem. You can also use this trick to single-step the deserialization of a SOAP message, since ASP.NET Web services and Web service proxies are built on top of the XmlSerializer. Simply add the diagnostics switch to your config file and set a breakpoint in the class that deserializes the message. I use this technique once in a while to figure out the correct set of serialization attributes when the WSDL didn't accurately reflect the message format at the time the proxy class was generated.

Conclusion

These tips should help you diagnose serialization problems with the XmlSerializer. Most problems you encounter stem either from bad combinations of the XML serialization attributes or from XML that doesn't match the type being deserialized. The serialization attributes control the generation of the code for the serialization classes, and can lead to compilation errors or runtime exceptions. Inspecting the exceptions thrown by the XmlSerializer closely will help you identify the source of runtime exceptions. If you need to dig deeper to diagnose a problem, the XmlSerializerPreCompiler tool helps you find compilation errors. If neither approach leads you to the root cause of the problem, you can inspect the code for the automatically created serialization classes and step through them in the debugger.

Acknowledgements

I would like to thank Dare Obasanjo and Daniel Cazzulino for their feedback and editorial suggestions with this article.