Using the InfoPath HTML to XHTML Conversion Tool

The Microsoft Office InfoPath 2007 HTML to XHTML conversion tool allows you to convert regular HTML into well-formed XHTML that can be edited in an InfoPath form. This is useful in cases where a form designer needs to take HTML documents that are created outside of InfoPath and insert them into a form. Because InfoPath will only accept well-formed XHTML, the HTML must first be converted. The conversion process attempts to fix malformed HTML by inserting closing tags, such as the following:

  </p>

The conversion process also creates self-closing tags where needed, such as the following:

  <br/>

It also fixes attributes that are not properly formed.
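As a rough illustration of the self-closing rewrite described above, the following JScript-compatible sketch turns empty elements such as <br> into the <br/> form. It is not the algorithm used by the conversion tool; the function name selfCloseVoidTags and the list of element names are assumptions made for this example, and the pattern does not handle attribute values that contain a > character.

```javascript
// Illustrative only: this is NOT how html2xhtml.dll is implemented.
// The element list below is an assumption chosen for the example.
var VOID_TAGS = ["br", "hr", "img", "input", "meta", "link"];

function selfCloseVoidTags(html) {
  // Match an opening tag for one of the listed elements, with optional attributes.
  var pattern = new RegExp("<(" + VOID_TAGS.join("|") + ")\\b([^>]*)>", "gi");
  return html.replace(pattern, function (match, name, attrs) {
    // Drop any trailing slash and whitespace so that <hr /> and <hr> end up identical.
    var cleaned = attrs.replace(/\s*\/?\s*$/, "");
    // Element names are lowercased, as XHTML expects.
    return "<" + name.toLowerCase() + cleaned + "/>";
  });
}
```

For instance, selfCloseVoidTags("line one<br>line two") returns "line one<br/>line two", while tags that are not in the list are left untouched.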

** Warning **  The HTML to XHTML conversion tool may fail when it encounters badly formed HTML. It is not designed to correct all the possible instances of malformed HTML, such as HTML that is completely lacking any closing tags. The tool will not correct HTML, for example, like the following:

  <a><b>text
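Because input like this can cause the conversion to fail, it may help to check a document for unclosed tags before passing it to the tool. The following is a minimal sketch of such a pre-check, not a feature of the conversion tool itself. The function name findUnclosedTags and the list of empty elements are assumptions made for this example, and the simple pattern matching ignores comments, scripts, and attribute values that contain a > character.

```javascript
// Hypothetical pre-check; it is not part of html2xhtml.dll.
// Elements that never take a closing tag (list chosen for this example).
var VOID_ELEMENTS = { br: true, hr: true, img: true, input: true, meta: true, link: true };

function findUnclosedTags(html) {
  // Captures: optional leading slash, element name, optional trailing slash.
  var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/g;
  var open = [];
  var m;
  while ((m = tagPattern.exec(html)) !== null) {
    var isClosing = m[1] === "/";
    var name = m[2].toLowerCase();
    var isSelfClosed = m[3] === "/";
    if (isSelfClosed || VOID_ELEMENTS[name] === true) {
      continue; // nothing to balance for <br/>, <br>, and similar
    }
    if (!isClosing) {
      open.push(name);
      continue;
    }
    // Closing tag: remove the nearest matching opening tag, if there is one.
    for (var i = open.length - 1; i >= 0; i--) {
      if (open[i] === name) {
        open.splice(i, 1);
        break;
      }
    }
  }
  return open; // names of tags that were opened but never closed
}
```

For the fragment shown in the warning, the function returns the names a and b; for a balanced fragment such as <p>one<br>two</p> it returns an empty array.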

The HTML to XHTML conversion tool is implemented as a Component Object Model (COM) component that exposes one object with two methods. The object is named XHTMLUtilities, and its methods are convertToXHTML and convertToXHTMLEx. This simple object model can be used from any COM-compliant programming language.

The methods of the HTML to XHTML conversion tool are contained in the file html2xhtml.dll, which is installed with Microsoft Visual Studio 2005 Tools for the 2007 Microsoft Office System, available for download from MSDN. This DLL must be registered on your computer before you can reference it in script. A second file, html2xhtml_sample.htm, demonstrates how to use the XHTMLUtilities object.
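COM DLLs are typically registered with the regsvr32 command, for example regsvr32 html2xhtml.dll run from the folder that contains the file. If the DLL is not registered, creating the object from script fails. The following sketch shows one way a script might detect that case and report it clearly. The helper name tryCreateConverter and its createObject parameter are assumptions made for this example; only the ProgID comes from the examples in this article.

```javascript
// ProgID used by the examples in this article.
var PROG_ID = "HTML2XHTML.XHTMLUtilities";

// createObject is a caller-supplied function (a hypothetical parameter for this
// sketch) that takes a ProgID and returns the COM object. Under Windows Script
// Host it would wrap "new ActiveXObject(progId)".
function tryCreateConverter(createObject) {
  try {
    return { ok: true, converter: createObject(PROG_ID), message: "" };
  } catch (e) {
    return {
      ok: false,
      converter: null,
      message: "Could not create " + PROG_ID +
               ". Is html2xhtml.dll registered? (" + e.message + ")"
    };
  }
}
```

In a Windows Script Host script, the helper could be called as tryCreateConverter(function (progId) { return new ActiveXObject(progId); }), echoing the message with WScript.Echo when ok is false.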

The following sections discuss the two methods of the XHTMLUtilities object.

The convertToXHTML method

Creates an XHTML string from a supplied HTML or XHTML string.

Syntax

expression.convertToXHTML(ByVal bstrHTML As String) As String

Remarks

The convertToXHTML method may fail to produce the appropriate XHTML string if XML is passed instead of HTML.

Example

In the following example, Windows Script Host JScript code takes two command-line arguments: the path of an input HTML file and the path of an output XHTML file. The script uses the FileSystemObject object to read the HTML text stored in the input file, calls the convertToXHTML method to convert that text to XHTML, and then uses the FileSystemObject object again to create a new file that contains the XHTML text:

  var args = WScript.Arguments;

  if (args.length != 2)
  {
    WScript.Echo("Usage: " + WScript.ScriptName + " <INPUTHTML> <OUTPUTXHTML>");
  }
  else
  {
    var strInputFile = args.item(0);
    var strOutputFile = args.item(1);

    // Read the whole input file as text.
    var objFSO = WScript.CreateObject("Scripting.FileSystemObject");
    var objInputFile = objFSO.OpenTextFile(strInputFile, 1 /* ForReading */, false);
    var strHTML = objInputFile.ReadAll();
    objInputFile.Close();

    // Convert the HTML text to XHTML.
    var oXHTMLUtils = new ActiveXObject("HTML2XHTML.XHTMLUtilities");
    var strXHTML = oXHTMLUtils.convertToXHTML(strHTML);

    // Write the XHTML text to the output file, overwriting any existing file.
    var objOutputFile = objFSO.CreateTextFile(strOutputFile, true);
    objOutputFile.Write(strXHTML);
    objOutputFile.Close();
  }

The convertToXHTMLEx method

Creates an XHTML string from a supplied HTML or XHTML string, and returns information about any changes that were made.

Syntax

expression.convertToXHTMLEx(ByVal bstrHTML As String, ByVal iOptions As Long, ByRef pfStatus As Long) As String

Remarks

The convertToXHTMLEx method may fail to produce the appropriate XHTML string if XML is passed instead of HTML.

Example

In the following example, the convertToXHTMLEx method is used to convert the HTML text contained in a file to XHTML:

  var args = WScript.Arguments;

  if (args.length != 2)
  {
    WScript.Echo("Usage: " + WScript.ScriptName + " <INPUTHTML> <OUTPUTXHTML>");
  }
  else
  {
    var strInputFile = args.item(0);
    var strOutputFile = args.item(1);

    // Read the whole input file as text.
    var objFSO = WScript.CreateObject("Scripting.FileSystemObject");
    var objInputFile = objFSO.OpenTextFile(strInputFile, 1 /* ForReading */, false);
    var strHTML = objInputFile.ReadAll();
    objInputFile.Close();

    // Convert the HTML text to XHTML. The value 1 is passed for iOptions
    // and bReturn is passed for pfStatus.
    var bReturn;
    var oXHTMLUtils = new ActiveXObject("HTML2XHTML.XHTMLUtilities");
    var strXHTML = oXHTMLUtils.convertToXHTMLEx(strHTML, 1, bReturn);

    // Write the XHTML text to the output file, overwriting any existing file.
    var objOutputFile = objFSO.CreateTextFile(strOutputFile, true);
    objOutputFile.Write(strXHTML);
    objOutputFile.Close();
  }