Special Character Conversion when Writing XML Content

The XmlWriter includes a method, WriteRaw, which allows you to write out raw markup manually. This method prevents special characters from being escaped. This is in contrast to the WriteString method, which escapes some strings to their equivalent entity reference. The characters that are escaped are given in the XML 1.0 recommendation in section 2.4 Character Data and Markup, and section 3.3.3 Attribute-Value Normalization of the Extensible Markup Language (XML) 1.0 (fourth edition) recommendation. If the WriteString method is called when writing an attribute value, it escapes ' and ". Character values 0x-0x1F are encoded as numeric character entities � through , except for the white space characters 0x9, 0x10, and 0x13.

Therefore, the guiding principle of when to use WriteString or WritingRaw is that WriteString is used when you need to walk through every character looking for entity characters, and WriteRaw writes exactly what it is given.

The WriteNode method copies everything from the current node, and the reader is positioned at the writer. The reader is then advanced to the next sibling node for further processing. The WriteNode method is a quick way to extract information from one document into another.

The following table shows the supported NodeTypes for the WriteNode method.

Node type

Description

Element

Writes out the element node and all attribute nodes.

Attribute

No operation. Use WriteStartAttribute or WriteAttributeString to write the attribute.

Text

Writes out the text node.

CDATA

Writes out the CDATA section node.

EntityReference

Writes the Entity Ref node.

ProcessingInstruction

Writes the PI node.

Comment

Writes the Comment node.

DocumentType

Writes the DocType Node.

Whitespace

Writes the Whitespace node.

SignificantWhitespace

Writes out the Whitespace node.

EndElement

No operation.

EndEntity

No operation.

The following example shows the difference between the WriteString and WriteRaw method when given the "<" character. This code sample uses WriteString.

w.WriteStartElement("myRoot")
w.WriteString("<")
w.WriteEndElement()
Dim tw As New XmlTextWriter(Console.Out)
tw.WriteDocType(name, pubid, sysid, subset)
w.WriteStartElement("myRoot");
w.WriteString("<");
w.WriteEndElement();
XmlTextWriter tw = new XmlTextWriter(Console.Out);
tw.WriteDocType(name, pubid, sysid, subset);

Output

<myRoot>&lt;</myRoot>

This code sample uses WriteRaw, and the output has an illegal character as the element content.

w.WriteStartElement("myRoot")
w.WriteRaw("<")
w.WriteEndElement()
w.WriteStartElement("myRoot");
w.WriteRaw("<");
w.WriteEndElement();

Output

<myRoot><</myRoot>

The following example shows how to convert an XML document from an element-centric document to an attribute-centric document. You can also convert an XML attribute-centric document back to an element-centric document. An element-centric mode means the XML document was designed to have many elements but few attributes. An attribute-centric design has fewer elements, and what would be elements in an element-centric design have been made attributes of elements. So there are fewer elements, but more attributes per element.

This sample is useful if you have designed the XML data in either mode, as this will allow it to be converted to the other mode.

The following XML is using an element-centric document. The elements contain no attributes.

Input - centric.xml

<?xml version='1.0' encoding='UTF-8'?>
<root>
    <Customer>
        <firstname>Jerry</firstname>
        <lastname>Larson</lastname>
        <Order>
        <OrderID>Ord-12345</OrderID>
          <OrderDetail>
            <Quantity>1301</Quantity>
            <UnitPrice>$3000</UnitPrice>
            <ProductName>Computer</ProductName>
          </OrderDetail>
        </Order>
    </Customer>
</root>

The following sample application does the conversion.

' The program will convert an element-centric document to an 
' attribute-centric document or element-centric to attribute-centric.
Imports System
Imports System.Xml
Imports System.IO
Imports System.Text
Imports System.Collections

Class ModeConverter
   Private bufferSize As Integer = 2048
  
   Friend Class ElementNode
      Private _name As [String]
      Private _prefix As [String]
      Private _namespace As [String]
      Private _startElement As Boolean
      
      Friend Sub New()
         Me._name = Nothing
         Me._prefix = Nothing
         Me._namespace = Nothing
         Me._startElement = False
      End Sub 'New
      
      Friend Sub New(prefix As [String], name As [String], [nameSpace] As [String])
         Me._name = name
         Me._prefix = prefix
         Me._namespace = [nameSpace]
      End Sub 'New
      
      Public ReadOnly Property name() As [String]
         Get
            Return _name
         End Get
      End Property
      
      Public ReadOnly Property prefix() As [String]
         Get
            Return _prefix
         End Get
      End Property
      
      Public ReadOnly Property [nameSpace]() As [String]
         Get
            Return _namespace
         End Get
      End Property
      
      Public Property startElement() As Boolean
         Get
            Return _startElement
         End Get
         Set
            _startElement = value
         End Set
      End Property
   End Class 'ElementNode
   
   ' Entry point which delegates to C-style main Private Function.
   Public Overloads Shared Sub Main()
      Main(System.Environment.GetCommandLineArgs())
   End Sub
   
   Overloads Public Shared Sub Main(args() As [String])
      Dim modeConverter As New ModeConverter()
      If args(0) Is Nothing Or args(0) = "?" Or args.Length < 2 Then
         modeConverter.Usage()
         Return
      End If
      Dim sourceFile As New FileStream(args(1), FileMode.Open, FileAccess.Read, FileShare.Read)
      Dim targetFile As New FileStream(args(2), FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite)
      If args(0) = "-a" Then
         modeConverter.ConertToAttributeCentric(sourceFile, targetFile)
      Else
         modeConverter.ConertToElementCentric(sourceFile, targetFile)
      End If
      Return
   End Sub 'Main
   
   Public Sub Usage()
      Console.WriteLine("? This help message " + ControlChars.Lf)
      Console.WriteLine("Convert -mode sourceFile, targetFile " + ControlChars.Lf)
      Console.WriteLine(ControlChars.Tab + " mode: e element centric" + ControlChars.Lf)
      Console.WriteLine(ControlChars.Tab + " mode: a attribute centric" + ControlChars.Lf)
   End Sub 'Usage
   
   Public Sub ConertToAttributeCentric(sourceFile As FileStream, targetFile As FileStream)
      ' Stack is used to track how many.
      Dim stack As New Stack()
      Dim reader As New XmlTextReader(sourceFile)
      reader.Read()
      Dim writer As New XmlTextWriter(targetFile, reader.Encoding)
      writer.Formatting = Formatting.Indented
      
      Do
         Select Case reader.NodeType
            Case XmlNodeType.XmlDeclaration
               writer.WriteStartDocument((Nothing = reader.GetAttribute("standalone") Or "yes" = reader.GetAttribute("standalone")))
            
            Case XmlNodeType.Element
               Dim element As New ElementNode(reader.Prefix, reader.LocalName, reader.NamespaceURI)
               
               If 0 = stack.Count Then
                  writer.WriteStartElement(element.prefix, element.name, element.nameSpace)
                  element.startElement = True
               End If
               
               stack.Push(element)
            
            Case XmlNodeType.Attribute
                  Throw New Exception("We should never been here!")
            
            Case XmlNodeType.Text
               Dim attribute As New ElementNode()
               attribute = CType(stack.Pop(), ElementNode)
               element = CType(stack.Peek(), ElementNode)
               If Not element.startElement Then
                  writer.WriteStartElement(element.prefix, element.name, element.nameSpace)
                  element.startElement = True
               End If
               writer.WriteStartAttribute(attribute.prefix, attribute.name, attribute.nameSpace)
               writer.WriteRaw(reader.Value)
               reader.Read() 'jump over the EndElement
            
            Case XmlNodeType.EndElement
               writer.WriteEndElement()
               stack.Pop()
            
            
            Case XmlNodeType.CDATA
               writer.WriteCData(reader.Value)
            
            Case XmlNodeType.Comment
               writer.WriteComment(reader.Value)
            
            Case XmlNodeType.ProcessingInstruction
               writer.WriteProcessingInstruction(reader.Name, reader.Value)
            
            Case XmlNodeType.EntityReference
               writer.WriteEntityRef(reader.Name)
            
            Case XmlNodeType.Whitespace
                writer.WriteWhitespace(reader.Value);
            
            Case XmlNodeType.None
               writer.WriteRaw(reader.Value)
            
            Case XmlNodeType.SignificantWhitespace
               writer.WriteWhitespace(reader.Value)
            
            Case XmlNodeType.DocumentType
               writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value)
            
            Case XmlNodeType.EndEntity
            
            
            Case Else
               Console.WriteLine(("UNKNOWN Node Type = " + CInt(reader.NodeType)))
         End Select
      Loop While reader.Read()
      
      writer.WriteEndDocument()
      
      reader.Close()
      writer.Flush()
      writer.Close()
   End Sub 'ConertToAttributeCentric
   
   
   ' Use the WriteNode to simplify the process.
   Public Sub ConertToElementCentric(sourceFile As FileStream, targetFile As FileStream)
      Dim reader As New XmlTextReader(sourceFile)
      reader.Read()
      Dim writer As New XmlTextWriter(targetFile, reader.Encoding)
      writer.Formatting = Formatting.Indented
      Do
         Select Case reader.NodeType
            
            Case XmlNodeType.Element
               writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
               If reader.MoveToFirstAttribute() Then
                  Do
                     writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
                     writer.WriteRaw(reader.Value)
                     writer.WriteEndElement()
                  Loop While reader.MoveToNextAttribute() 
                  
                  writer.WriteEndElement()
               End If
            
            Case XmlNodeType.Attribute
                  Throw New Exception("We should never been here!")
            
            Case XmlNodeType.Whitespace
               writer.WriteWhitespace(reader.Value)
            
            Case XmlNodeType.EndElement
               writer.WriteEndElement()
            
            Case XmlNodeType.Text
                  Throw New Exception("The input document is not a attribute centric document" + ControlChars.Lf)
            
            Case Else
               Console.WriteLine(reader.NodeType)
               writer.WriteNode(reader, False)
         End Select
      Loop While reader.Read()
      
      reader.Close()
      writer.Flush()
      writer.Close()
   End Sub 'ConertToElementCentric
End Class 'ModeConverter
// The program will convert an element-centric document to an 
// attribute-centric document or element-centric to attribute-centric.
using System;
using System.Xml;
using System.IO;
using System.Text;
using System.Collections;

class ModeConverter {
    private const int bufferSize=2048;

    internal class ElementNode {
        String _name;
        String _prefix;
        String _namespace;
        bool   _startElement;
        internal ElementNode() {
            this._name = null;
            this._prefix = null;
            this._namespace = null;
            this._startElement = false;
        }
        internal ElementNode(String prefix, String name, String nameSpace) {
            this._name = name;
            this._prefix = prefix;
            this._namespace = nameSpace;
        }
        public String name{
            get { return _name;   }
        }
        public String prefix{
            get { return _prefix;   }
        }
        public String nameSpace{
            get { return _namespace;   }
        }
        public bool startElement{
            get { return _startElement;   }
            set { _startElement = value;}
        }
    }
    public static void Main(String[] args) {
        ModeConverter modeConverter = new ModeConverter();
        if (args[0]== null || args[0]== "?" || args.Length < 2 ) {
            modeConverter.Usage();
            return;
        }
        FileStream sourceFile = new FileStream(args[1], FileMode.Open, FileAccess.Read, FileShare.Read);
        FileStream targetFile = new FileStream(args[2], FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
        if (args[0] == "-a") {
            modeConverter.ConertToAttributeCentric(sourceFile, targetFile);
        } else {
            modeConverter.ConertToElementCentric(sourceFile, targetFile);
        }
        return;
    }
    public void Usage() {
        Console.WriteLine("? This help message \n");
        Console.WriteLine("Convert -mode sourceFile, targetFile \n");
        Console.WriteLine("\t mode: e element centric\n");
        Console.WriteLine("\t mode: a attribute centric\n");
    }
    public void ConertToAttributeCentric(FileStream sourceFile, FileStream targetFile) {
        // Stack is used to track how many.
        Stack stack = new Stack();
        XmlTextReader reader = new XmlTextReader(sourceFile);
        reader.Read();
        XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
        writer.Formatting = Formatting.Indented;

        do {
            switch (reader.NodeType) {
                case XmlNodeType.XmlDeclaration:
                    writer.WriteStartDocument(null == reader.GetAttribute("standalone") || "yes" == reader.GetAttribute("standalone"));
                    break;

                case XmlNodeType.Element:
                    ElementNode element = new ElementNode(reader.Prefix, reader.LocalName, reader.NamespaceURI);

                    if (0 == stack.Count) {
                        writer.WriteStartElement(element.prefix, element.name, element.nameSpace);
                        element.startElement=true;
                    }

                    stack.Push(element);
                    break;

                case XmlNodeType.Attribute:
                    throw new Exception("We should never been here!");

                case XmlNodeType.Text:
                    ElementNode attribute = new ElementNode();
                    attribute = (ElementNode)stack.Pop();
                    element = (ElementNode)stack.Peek();
                    if (!element.startElement) {
                        writer.WriteStartElement(element.prefix, element.name, element.nameSpace);
                        element.startElement=true;
                    }
                    writer.WriteStartAttribute(attribute.prefix, attribute.name, attribute.nameSpace);
                    writer.WriteRaw(reader.Value);
                    reader.Read(); //jump over the EndElement
                    break;

                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    stack.Pop();

                    break;

                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;

                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;

                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;

                case XmlNodeType.EntityReference:
                    writer.WriteEntityRef( reader.Name);
                    break;

                case XmlNodeType.Whitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;

                case XmlNodeType.None:
                    writer.WriteRaw(reader.Value);
                    break;

                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;

                case XmlNodeType.DocumentType:
                    writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                    break;

                case XmlNodeType.EndEntity:
                    break;


                default:
                    Console.WriteLine("UNKNOWN Node Type = " + ((int)reader.NodeType));
                    break;
             }
        } while (reader.Read());

        writer.WriteEndDocument();

        reader.Close();
        writer.Flush();
        writer.Close();
    }

    // Use the WriteNode to simplify the process.
    public void ConertToElementCentric(FileStream sourceFile, FileStream targetFile) {
        XmlTextReader reader = new XmlTextReader(sourceFile);
        reader.Read();
        XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
        writer.Formatting = Formatting.Indented;
        do {
            switch (reader.NodeType) {

                case XmlNodeType.Element:
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    if (reader.MoveToFirstAttribute()) {
                         do {
                            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteRaw(reader.Value);
                            writer.WriteEndElement();

                        } while(reader.MoveToNextAttribute());

                        writer.WriteEndElement();
                    }
                    break;

                case XmlNodeType.Attribute:
                     throw new Exception("We should never been here!");

                case XmlNodeType.Whitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;

                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    break;

                case XmlNodeType.Text:
                    throw new Exception("The input document is not a attribute centric document\n");

                default:
                    Console.WriteLine(reader.NodeType);
                    writer.WriteNode(reader, false);
                    break;
             }
        } while (reader.Read());

        reader.Close();
        writer.Flush();
        writer.Close();
    }
}

After the code is compiled, run it from the command line by typing in <compiled name> -a centric.xml <output file name>. The output file must exist and can be an empty text file.

For the following output, assuming the C# program was compiled to centric_cs, the command line is C:\centric_cs -a centric.xml centric_out.xml.

The mode of -a tells the application to convert the input XML to attribute-centric, while a mode of -e will change it to element-centric. The output below is the new attribute-centric output generated using the -a mode. The elements now contain attributes instead of nested elements.

Output: centric_out.xml

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>

<root>
<Customer firstname="Jerry" lastname="Larson">
 <Order OrderID="Ord-12345">
  <OrderDetail Quantity="1301" UnitPrice="$3000" ProductName="Computer" />
 </Order>
</Customer>
</root>

See Also

Reference

XmlTextWriter

XmlTextWriter

XmlWriter

XmlWriter

Concepts

Well-Formed XML Creation with the XmlTextWriter

XML Output Formatting with XmlTextWriter

Namespace Features within the XmlTextWriter