The XML Diff and Patch GUI Tool

 

Amol Kher
Microsoft Corporation

July 2004

Applies to:
   the XML Diff and Patch GUI tool

Summary: This article shows how to use the XmlDiff class to compare two XML files and show these differences as an HTML document in a .NET Framework 1.1 application. The article also shows how to build a WinForms application for comparing XML files.


Contents

Introduction
An Overview of the XML Diff and Patch API
XML Diff and Patch Meets Winforms
Working with XML DiffGrams
Other Features of the XML Diff and Patch Tool

Introduction

There is no good command-line tool for comparing two XML files and viewing the differences. There is an online tool called XML Diff and Patch, available on the GotDotNet website under the XML Tools section. If you have not seen it yet, you can find it at Microsoft XML Diff and Patch 1.0. It is a convenient tool for comparing two XML files. Comparing XML files differs from comparing regular text files because you want to compare logical differences between XML nodes, not just differences in text. For example, you may want to compare XML documents while ignoring white space between elements, comments, or processing instructions. The XML Diff and Patch tool supports such comparisons, but it is primarily available as an online Web application. We cannot take this tool and use it from the command line.

This article focuses on developing a command-line tool by reusing code from the XML Diff and Patch installation and samples. The tool works much like the WinDiff utility: it presents the differences in a separate window and highlights them.

The XML Diff and Patch tool includes a library with an XmlDiff class, which can be used to compare two XML documents. The Compare method on this class takes two files and returns true if the files are equal; it can also write an output file, called an XML diffgram, containing a list of differences between the files. The XmlDiff class accepts an XmlDiffOptions value that sets the various options for comparing files.

An Overview of the XML Diff and Patch API

The XmlDiff class implements a Compare method.

XmlDiff.Compare( XmlReader original, XmlReader compareWith,
           bool isFragment, XmlTextWriter diffOutput )

This is the method we use, though other overloads take the file paths directly. The XmlDiffOptions enumeration defines all of the Ignore* options. You apply a combination of these values to an XmlDiff instance through its Options property.

XmlDiff.Options = XmlDiffOptions.IgnorePI | XmlDiffOptions.IgnoreChildOrder;
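
To make the primer concrete, here is a minimal sketch of a comparison under a chosen set of options. It assumes a reference to the XmlDiffPatch library (namespace Microsoft.XmlDiffPatch) and uses the file-path overload of Compare that also appears in the DoCompare method later in this article. The CompareSketch class and its AreEqual method are hypothetical names introduced only for illustration, and File.WriteAllText requires a framework version later than 1.1.

```csharp
using System;
using System.IO;
using System.Xml;
using Microsoft.XmlDiffPatch; // assumed: reference to the XmlDiffPatch library

// Hypothetical helper for illustration; only XmlDiff and XmlDiffOptions
// come from the XML Diff and Patch library.
public static class CompareSketch
{
    // Writes both XML strings to temporary files, compares them under the
    // given options, and hands back the diffgram text that XmlDiff produced.
    public static bool AreEqual(string xml1, string xml2,
                                XmlDiffOptions options, out string diffgram)
    {
        string file1 = Path.GetTempFileName();
        string file2 = Path.GetTempFileName();
        try
        {
            File.WriteAllText(file1, xml1);
            File.WriteAllText(file2, xml2);

            XmlDiff diff = new XmlDiff();
            diff.Options = options;

            StringWriter output = new StringWriter();
            XmlTextWriter writer = new XmlTextWriter(output);
            writer.Formatting = Formatting.Indented;

            bool equal;
            try
            {
                // false: the inputs are whole documents, not fragments.
                equal = diff.Compare(file1, file2, false, writer);
            }
            finally
            {
                writer.Close();
            }

            diffgram = output.ToString();
            return equal;
        }
        finally
        {
            // Clean up the temporary inputs whether or not Compare succeeded.
            File.Delete(file1);
            File.Delete(file2);
        }
    }
}
```

Calling CompareSketch.AreEqual with two documents and XmlDiffOptions.IgnorePI | XmlDiffOptions.IgnoreChildOrder mirrors the Options assignment shown above.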

So much for a quick primer! We are now ready to look at our simple application. To follow the rest of this article, you should have a basic idea of the XmlDiff class and the XmlDiffOptions enumeration.

XML Diff and Patch Meets Winforms

We built a small Windows application, which comprises two forms. One form prompts the user to specify two files, and the other form hosts an Internet Explorer control, which displays the highlighted differences side-by-side between the two files, similar to any other file compare tool we know.

The UI design is deliberately kept very simple, since the user interface is not what we are focusing on in this article. You can always download the code and make it more usable for yourself. The idea of this article is to demonstrate the XmlDiff code and show the differences in an IE control. Figure 1 shows what our main screen looks like.

Figure 1. The main screen

The File menu has an Exit command.

The Diff Options menu allows the user to select the options that are passed on to the Compare method. The menu items map directly to values of the XmlDiffOptions enumeration, which keeps the utility simple and easy to understand. The following screen shot shows the options available.

Figure 2. Available options

Here's a quick primer on what some of the options look like and what they mean. For more detailed information visit the Diff and Patch Overview.

  • Ignore Processing Instructions: Processing instructions are not compared. Thus <a></a> and <a><?somepi?></a> are considered equal.
  • Ignore white spaces (normalize text values): Insignificant white space is not compared. White space marked by xml:space="preserve" is still compared, but white space after element tags, or any other possibly insignificant white space, is ignored. Thus <root><a/></root> and <root>\n<a/>\n</root> are equal.
  • Ignore prefixes: The prefixes of element and attribute names are not compared. When this option is selected, two names that have the same local name and namespace URI but a different prefix are treated as the same name. Thus <a xmlns:ns1="ns"><ns1:child/></a> and <a xmlns:ns2="ns"><ns2:child/></a> are equal under this option.
  • Ignore Namespaces: The namespace URIs of the element and attribute names are not compared. This option also implies that the name prefixes are ignored. When this option is selected, two names with the same local name but a different namespace URI and prefix are treated as the same name. Thus <a xmlns:ns1="ns1"><ns1:child/></a> and <a xmlns:ns2="ns2"><ns2:child/></a> are equal under this option.
  • Ignore Child Order: The order of child nodes of each element is ignored. When this option is selected, two nodes with the same value that differ only by their position among sibling child nodes are treated as the same node. Thus <a><b/><c/></a> and <a><c/><b/></a> are equal.

The following is the basic control flow of the application.

When the user clicks the Compare button the following actions take place.

  1. Both input files are verified to exist, since the paths could have been typed by hand and may be wrong.
  2. The XmlDiffOptions value is built from the checked items on the Diff Options menu. This is done in a SetDiffOptions method.
  3. DoCompare is called to compare the two files.
  4. The comparison writes the diffgram out to a temporary file (vxd.out). This file is used to figure out the differences.
  5. The sample code we mentioned earlier is called to figure out the differences. This code takes the original file and the diffgram file as inputs and generates the output, which consists of HTML-encoded rows that show the side-by-side differences between the two files.
  6. The HTML is written out to a temporary file and displayed in the IE control in a separate window. This HTML shows the differences in the desired manner.
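
Steps 1 and 2 above can be sketched as follows. This is only an illustration under stated assumptions: the CompareInputs class and both method names are hypothetical, and the application's own SetDiffOptions method reads the menu items directly instead of taking Boolean parameters. Only File.Exists and the XmlDiffOptions enumeration are existing APIs.

```csharp
using System;
using System.IO;
using Microsoft.XmlDiffPatch; // assumed: reference to the XmlDiffPatch library

// Hypothetical helpers that sketch steps 1 and 2 of the control flow.
public static class CompareInputs
{
    // Step 1: returns a message naming the first missing file,
    // or null when both files exist.
    public static string FindMissingFile(string file1, string file2)
    {
        if (!File.Exists(file1)) return "File not found: " + file1;
        if (!File.Exists(file2)) return "File not found: " + file2;
        return null;
    }

    // Step 2: folds the checked state of each Diff Options menu item
    // into a single XmlDiffOptions flags value.
    public static XmlDiffOptions BuildOptions(bool ignorePI,
                                              bool ignoreWhitespace,
                                              bool ignorePrefixes,
                                              bool ignoreNamespaces,
                                              bool ignoreChildOrder)
    {
        XmlDiffOptions options = XmlDiffOptions.None;
        if (ignorePI)         options |= XmlDiffOptions.IgnorePI;
        if (ignoreWhitespace) options |= XmlDiffOptions.IgnoreWhitespace;
        if (ignorePrefixes)   options |= XmlDiffOptions.IgnorePrefixes;
        if (ignoreNamespaces) options |= XmlDiffOptions.IgnoreNamespaces;
        if (ignoreChildOrder) options |= XmlDiffOptions.IgnoreChildOrder;
        return options;
    }
}
```

In a form, the Boolean arguments would come from the Checked property of each menu item, and the result would be assigned to the Options property of the XmlDiff instance before Compare is called.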

Working with XML DiffGrams

Before we move on to the sample code that gives us our HTML, we should discuss what the diffgram looks like. A diffgram does not describe the visual differences; it is not a side-by-side listing of what changed. What it does tell us is that, given a file A and a diffgram file, you can get to file B by applying the patches specified in the diffgram. In other words, the diffgram shows how to incrementally build the target file, which is the file we originally compared against. The diffgram itself is written in XML, so it can be parsed and applied to the original file to produce the target file.

The diffgram consists of tags such as add, remove, and change. For more information on the diffgram tags, look at this Diff Language page. The concept will be familiar to XPath users. Most tags carry a match attribute, which works like a select operation: it moves you to a specific location in the original file. The other tags then work relative to that position. For instance, match="2" means go to the second child node from the current location. An add tag adds specific text or markup, while a remove tag removes specific text or markup. There are other helper tags such as change, which is used to update contents. See the following sample, taken from the XML Diff Patch site.

<?xml version="1.0" encoding="utf-16"?>
<xd:xmldiff version="1.0" srcDocHash="5346998544451918424" options="None"
    xmlns:xd="https://www.microsoft.com/xmldiff">
  <xd:node match="2">
    <xd:change match="1" name="yy" />
    <xd:node match="3" />
    <xd:add>
      <e>Some text 4</e>
      <f>Some text 5</f>
    </xd:add>
    <xd:node match="4">
      <xd:change match="1">Changed text</xd:change>
      <xd:remove match="2" />
    </xd:node>
    <xd:node match="5">
      <xd:remove match="@secondAttr" />
      <xd:add type="2" name="newAttr">new value</xd:add>
      <xd:change match="@firstAttr">changed attribute value</xd:change>
    </xd:node>
    <xd:remove match="6" opid="1" />
    <xd:add type="1" name="p">
      <xd:add type="1" name="q">
        <xd:add match="/2/6" opid="1" />
      </xd:add>
    </xd:add>
  </xd:node>
  <xd:descriptor opid="1" type="move" />
</xd:xmldiff>
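
To make the nesting of operations easier to see, the following sketch walks a diffgram and lists each operation with its match attribute, indented by depth. It uses only System.Xml and does not apply any patches. The DiffgramLister class is a hypothetical helper, not part of the XML Diff and Patch library, and it takes the diffgram namespace from the root element instead of hard-coding it. It uses generics, so it assumes a framework version later than 1.1.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; it only reads the diffgram.
public static class DiffgramLister
{
    // Returns one line per diffgram operation, indented two spaces per level.
    public static List<string> List(string diffgramXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(diffgramXml);

        List<string> lines = new List<string>();
        // Treat the root element's namespace as the diffgram namespace.
        Walk(doc.DocumentElement, doc.DocumentElement.NamespaceURI, 0, lines);
        return lines;
    }

    private static void Walk(XmlElement parent, string diffNs,
                             int depth, List<string> lines)
    {
        foreach (XmlNode child in parent.ChildNodes)
        {
            XmlElement op = child as XmlElement;
            // Skip comments, text, and literal markup carried inside an add tag.
            if (op == null || op.NamespaceURI != diffNs)
                continue;

            string line = new string(' ', depth * 2) + op.LocalName;
            string match = op.GetAttribute("match");
            if (match.Length > 0)
                line += " match=" + match;
            lines.Add(line);

            // Nested operations are relative to the node selected by this one.
            Walk(op, diffNs, depth + 1, lines);
        }
    }
}
```

Passing the sample diffgram above to DiffgramLister.List yields an outline of node, change, add, and remove entries that follows the nesting of the xd: elements.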

As you can see, parsing this code and applying the changes specified in the diffgram is not trivial. Thankfully, we don't have to do all that ourselves. The XML Diff and Patch installation ships with sample code that does all this work for us. It can be found in the Samples\XmlDiffView directory. We compiled that source code and then copied the generated library (XmlDiffPatch.View.dll) to our directory so that we could reference it. It contains one class called XmlDiffView. XmlDiffView has a method called Load, which takes the original XML file and the diffgram file. Load internally loads the original file and applies the diffgram patches to it to reach the target file. While doing so, it also stores the HTML required to show the differences in two columns for each line that was read. The output HTML is obtained by invoking the GetHtml method, which takes a TextWriter to write the HTML to.

For the interested reader, the bulk of the parsing work is done in a private method called ApplyDiffgram, found in the XmlDiffView.cs file. It is quoted here to show what is going on.

private void ApplyDiffgram( XmlNode diffgramParent, XmlDiffViewParentNode sourceParent )
{
   sourceParent.CreateSourceNodesIndex();
   XmlDiffViewNode currentPosition = null;

   IEnumerator diffgramChildren = diffgramParent.ChildNodes.GetEnumerator();
   while ( diffgramChildren.MoveNext() )
   {
      XmlNode diffgramNode = (XmlNode)diffgramChildren.Current;
      if ( diffgramNode.NodeType == XmlNodeType.Comment )
         continue;
      XmlElement diffgramElement = diffgramChildren.Current as XmlElement;
      if ( diffgramElement == null )
         throw new Exception( "Invalid node in diffgram." );
      if ( diffgramElement.NamespaceURI != XmlDiff.NamespaceUri )
         throw new Exception( "Invalid element in diffgram." );
      string matchAttr = diffgramElement.GetAttribute( "match" );
      XmlDiffPathNodeList matchNodes = null;
      if ( matchAttr != string.Empty )
         matchNodes = XmlDiffPath.SelectNodes( _doc, sourceParent, matchAttr );

      switch ( diffgramElement.LocalName ) {
         case "node":
            if ( matchNodes.Count != 1 )
               throw new Exception( "The 'match' attribute of 'node' element must select a single node." );
            matchNodes.MoveNext();
            if ( diffgramElement.ChildNodes.Count > 0 )
               ApplyDiffgram( diffgramElement, (XmlDiffViewParentNode)matchNodes.Current );
            currentPosition = matchNodes.Current;
            break;
         case "add":
            if ( matchAttr != string.Empty ) {
               OnAddMatch( diffgramElement, matchNodes, sourceParent, ref currentPosition );
            }
            else {
               string typeAttr = diffgramElement.GetAttribute( "type" );
               if ( typeAttr != string.Empty ) {
                  OnAddNode( diffgramElement, typeAttr, sourceParent, ref currentPosition );
               }
               else {
                  OnAddFragment( diffgramElement, sourceParent, ref currentPosition );
               }
            }
            break;
         case "remove":
            OnRemove( diffgramElement, matchNodes, sourceParent, ref currentPosition );
            break;
         case "change":
            OnChange( diffgramElement, matchNodes, sourceParent, ref currentPosition );
            break;
      }
   }
}

The main objective here is to get the current node from the diffgram and, based on the action it specifies, perform an add, remove, or change operation. Each of these is handled by a separate case statement.

And that's it. Believe it or not, all we did was piece these things together, much as the online tool does, to generate the output file. There remained the small matter of displaying the HTML in an IE control.

Given below is the code, outlined earlier, that generates the diffgram and then the output file. The source code download attached to this article contains the full implementation.

public void DoCompare(string file1, string file2)
{
   Random r = new Random();
   //to randomize the output files and hence allow 
   //us to generate multiple files for the same pair 
   //of comparisons.

   string startupPath = Application.StartupPath;
   //output diff file.
   diffFile = startupPath + Path.DirectorySeparatorChar + "vxd.out";
   XmlTextWriter tw = new XmlTextWriter( new StreamWriter( diffFile ) );
   tw.Formatting = Formatting.Indented;

   //This method sets the diff.Options property.
   SetDiffOptions();

   bool isEqual = false;

   //Now compare the two files.
   try
   {
      isEqual = diff.Compare( file1, file2, compareFragments, tw );
   }
   catch ( XmlException xe )
   {
      MessageBox.Show( "An exception occurred while comparing\n" + xe.StackTrace );
   }
   finally
   {
      tw.Close();
   }

   if ( isEqual )
   {
      //This means the files were identical for given options.
      MessageBox.Show( "Files Identical for the given options" );
      return; //don't need to show the differences.
   }

   //Files were not equal, so construct XmlDiffView.
   XmlDiffView dv = new XmlDiffView();

   //Load the original file again and the diff file.
   XmlTextReader orig = new XmlTextReader( file1 );
   XmlTextReader diffGram = new XmlTextReader( diffFile );
   dv.Load( orig, 
      diffGram );

   //Wrap the HTML file with necessary html and 
   //body tags and prepare it before passing it to 
   //the GetHtml method.

   string tempFile = startupPath + Path.DirectorySeparatorChar + "diff" + r.Next() + ".htm";

   StreamWriter sw1 = new StreamWriter( tempFile );
   //Wrapping
   sw1.Write("<html><body><table>");
   sw1.Write("<tr><td><b>");
   sw1.Write(textBox1.Text);
   sw1.Write("</b></td><td><b>");
   sw1.Write(textBox2.Text);
   sw1.Write("</b></td></tr>");

   //This gets the differences but just has the 
   //rows and columns of an HTML table
   dv.GetHtml( sw1 );

   //Finish wrapping up the generated HTML and 
   //complete the file by putting legend in the end just like the 
   //online tool.

   sw1.Write("<tr><td><b>Legend:</b> <font style='background-color: yellow'" +
      " color='black'>added</font>&nbsp;&nbsp;<font style='background-color: red'" +
      " color='black'>removed</font>&nbsp;&nbsp;<font style='background-color: " +
      "lightgreen' color='black'>changed</font>&nbsp;&nbsp;" +
      "<font style='background-color: red' color='blue'>moved from</font>" +
      "&nbsp;&nbsp;<font style='background-color: yellow' color='blue'>moved to" +
      "</font>&nbsp;&nbsp;<font style='background-color: white' color='#AAAAAA'>" +
      "ignored</font></td></tr>");

   sw1.Write("</table></body></html>");

   //Housekeeping...close everything we don't want to lock.
   sw1.Close();
   dv = null;
   orig.Close();
   diffGram.Close();
   File.Delete( diffFile );

   //Open the IE Control window and pass it 
   //the HTML file we created.
   Browser b = new Browser( tempFile );
   b.Show(); //Display it!
   //Done!
}

As you can see, we use the XmlDiff object to compare the two files inside the try/catch block. XmlDiff writes the diffgram through an XmlTextWriter that wraps a StreamWriter. If the files are not equal (the isEqual flag is false), the diff file and the original file are then loaded into the XmlDiffView object by the Load method. We write the required leading HTML tags to the output first. The HTML returned by GetHtml contains only the table rows and columns for the two files, so we wrap it in complete html, body, and table tags so that it can be loaded in any Web browser.
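
The wrapping step can also be factored into a small helper. This is a sketch under stated assumptions, not the code from the download: the DiffHtmlPage class is a hypothetical name, and the sketch additionally HTML-encodes the two column titles with WebUtility.HtmlEncode, which the listing above does not do and which requires a framework version much later than 1.1. The legend row is left out for brevity.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper that wraps table rows in a complete HTML page.
public static class DiffHtmlPage
{
    // writeRows is expected to emit only <tr>...</tr> rows, in the same way
    // that XmlDiffView.GetHtml does in the listing above.
    public static void Write(TextWriter output, string leftTitle,
                             string rightTitle, Action<TextWriter> writeRows)
    {
        output.Write("<html><body><table>");

        // Header row with the two file names, encoded so that characters
        // such as & or < in a path cannot break the markup.
        output.Write("<tr><td><b>");
        output.Write(WebUtility.HtmlEncode(leftTitle));
        output.Write("</b></td><td><b>");
        output.Write(WebUtility.HtmlEncode(rightTitle));
        output.Write("</b></td></tr>");

        // The caller supplies the rows that show the differences.
        writeRows(output);

        output.Write("</table></body></html>");
    }
}
```

With the variables from DoCompare, the call would look like DiffHtmlPage.Write(sw1, textBox1.Text, textBox2.Text, dv.GetHtml), since GetHtml takes a TextWriter.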

Other Features of the XML Diff and Patch Tool

Because the tool is built out of modules, individual modules can easily be replaced and recompiled. If you think of a more efficient way of parsing the diffgram, you can plug it in and use it to generate the output. Also, the output is currently passed to the IE control through a temporary file. If required, it can be saved to a permanent file instead.

I hope you find this utility useful for comparing XML files, and that it makes working with XmlDiff easier in the future.