Converting programmatically created Word documents to HTML fails with a NullReferenceException

Jul 30, 2013 at 12:30 PM
I'm creating WordprocessingDocuments using Open XML (which works fine, and the resulting Word document is exactly what I want). Now I'm trying to convert these newly created documents to HTML using the Open XML PowerTools. I'm new to this, so I'm hoping it's something simple that I'm missing, and that someone can point me in the right direction with the null reference errors I'm getting.

This is the exact error...

System.NullReferenceException: Object reference not set to an instance of an object.
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1d(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1c(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object[] content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)


I'm using the exact same code you can find on Eric White's blog:
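using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static void PrintHTML(string file)
{
    // Copy the file into an expandable memory stream so the original
    // document on disk is never modified.
    byte[] byteArray = File.ReadAllBytes(file);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                //PageTitle = "some title"
            };

            // This is the call that throws the NullReferenceException.
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);

            File.WriteAllText(@"C:\Temp\Test.html", html.ToStringNewLineOnAttributes());
        }
    }
}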

I know the code works, because if I pass it a normal Word document that I haven't created myself, it converts to HTML fine. If I create a Word document using Open XML, manually copy the contents into a new Word file, save it, and then pass that through the conversion code, that works as well. So I'm thinking it must be something to do with the way I'm creating the Word document in Open XML in the first place. Maybe I'm not adding a part to the file that is required.

Using the Open XML SDK, I have compared a working and a non-working file, and they appear to have the same components/parts.

From the errors I've posted, does anyone have any idea where the problem could be, i.e. what is null? I can post the creation code for the Word document, but it's quite extensive and it might just confuse people more.

Thanks for any help.

James
Jul 30, 2013 at 2:41 PM
Hi James,

The code is failing in the image handler callback, which points to something being incorrect with the markup for the image, or with how the image is set up in the package. Have you tried using the validation functionality of the Open XML SDK 2.5 Productivity Tool to see whether it reports any errors?

I also suggest examining the markup for each image in your document to see where the problem could be. You could also temporarily delete the markup for images in the main document part and then see whether it converts. By trial and error, you can narrow down exactly which image has the issue, and then examine that image's markup in more detail.

Cheers, Eric
Jul 30, 2013 at 4:34 PM
Thanks for your reply, Eric.

I finally got to the bottom of this. I had to dig out the source code for the HtmlConverter in the Open XML PowerTools, and after some debugging I found that this line in the code was erroring:

Line 371:
styleId = (string)wordDoc.MainDocumentPart.StyleDefinitionsPart
      .GetXDocument().Root.Elements(W.style)
      .Where(e => (string)e.Attribute(W.type) == "paragraph" &&
      (string)e.Attribute(W._default) == "1")
      .FirstOrDefault().Attributes(W.styleId).FirstOrDefault();
In my debugging,
(string)e.Attribute(W._default)
was coming back as "true" or "false" rather than "1" or "0", so the == "1" comparison never matched, no style was found, and the chained call on the result of FirstOrDefault() threw the null reference.

So I changed this condition:
.Where(e => (string)e.Attribute(W.type) == "paragraph" &&
      (string)e.Attribute(W._default) == "1")
to:
.Where(e => (string)e.Attribute(W.type) == "paragraph" && (
      (string)e.Attribute(W._default) == "1" || (string)e.Attribute(W._default) == "true"))
and it now works as expected. Thanks for your help.

James
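
For anyone hitting the same thing, here is a minimal sketch of a more defensive version of that lookup, using only System.Xml.Linq. It is not the PowerTools source: the class name, the IsOn helper and the sample XML are made up for illustration, and also accepting "on" is an assumption about what other producers might write. The idea is to accept both spellings of the flag and to return null when no default paragraph style exists, instead of dereferencing the result of FirstOrDefault().

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class DefaultStyleSketch
{
    // WordprocessingML main namespace (the "w" prefix in styles.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: treats "1", "true" and (as an assumption) "on"
    // as an enabled on/off attribute; a missing attribute counts as off.
    static bool IsOn(XAttribute attr)
    {
        if (attr == null) return false;
        string v = attr.Value.Trim();
        return v == "1"
            || v.Equals("true", StringComparison.OrdinalIgnoreCase)
            || v.Equals("on", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the styleId of the default paragraph style, or null if
    // the styles document does not declare one.
    public static string FindDefaultParagraphStyleId(XDocument styles)
    {
        return styles.Root.Elements(W + "style")
            .Where(e => (string)e.Attribute(W + "type") == "paragraph"
                        && IsOn(e.Attribute(W + "default")))
            .Select(e => (string)e.Attribute(W + "styleId"))
            .FirstOrDefault();
    }

    static void Main()
    {
        // Sample styles part that writes the flag as "true" rather than "1".
        XDocument styles = XDocument.Parse(
            @"<w:styles xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
                <w:style w:type=""paragraph"" w:default=""true"" w:styleId=""Normal"" />
              </w:styles>");

        Console.WriteLine(FindDefaultParagraphStyleId(styles)); // prints "Normal"
    }
}
```

Projecting to the styleId with Select before calling FirstOrDefault() means an unmatched query yields null rather than an exception, so the caller can decide how to handle a document with no default paragraph style.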