Converting a Library’s Word Documents to PDF using Word Automation Services for SharePoint 2013


  1. Setting up Word Automation Services for SharePoint 2013
  2. Setting up PowerPoint Automation Services for SharePoint 2013
  3. Converting a Library’s Word Documents to PDF
  4. Converting a Library’s PowerPoint documents to PDF
  5. Using ITextSharp to Merge PDF’s into one
  6. Using ITextSharp to create basic PDF page from item property

In my previous posts, I showed you how to set up your environment so that you can convert Word or PowerPoint documents to PDF. In this post I’m going to walk through the code you can use to convert your Word documents to PDF.

Back in SharePoint 2010, converting a Word document to PDF meant creating a conversion job, a design intended for bulk file operations. The job was placed in a queue within Word Automation Services and run by a SharePoint timer job. Because it ran on a timer job, there was no simple way of determining when a conversion had completed, so developers who wanted to convert a file synchronously and find out whether the conversion succeeded simply couldn’t. In the SharePoint 2013 version of Word Automation Services you can now call the service on demand, which means you can perform synchronous file conversions. If you wish to perform an asynchronous Word to PDF conversion, there is an MSDN walkthrough article showing how you might achieve this: https://msdn.microsoft.com/en-us/library/office/ff181518(v=office.14).aspx
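
For comparison, the queued approach described above might look roughly like the sketch below. It assumes the service application is named "WAS", as elsewhere in this series. The class and method names are made up for illustration, and the source and destination URLs are whatever absolute file URLs you pass in. It needs to run on a SharePoint server with the Word Automation Services assemblies referenced.

```csharp
using System;
using Microsoft.Office.Word.Server.Conversions;
using Microsoft.SharePoint;

// Hypothetical helper class, not part of the sample solution.
public static class AsyncConversionSketch
{
    // Assumption: the Word Automation Services application is named "WAS".
    private const string WordAutomationService = "WAS";

    // Queues one document for conversion and returns the job id.
    // The conversion itself happens later, when the timer job processes the queue.
    public static Guid QueueConversion(SPSite site, string sourceUrl, string destinationUrl)
    {
        var job = new ConversionJob(WordAutomationService);
        job.UserToken = site.UserToken;
        job.Settings.UpdateFields = true;
        job.Settings.OutputFormat = SaveFormat.PDF;

        // Both arguments are absolute URLs of files in a document library.
        job.AddFile(sourceUrl, destinationUrl);
        job.Start();
        return job.JobId;
    }

    // Polls the job: true once every queued item has either succeeded or failed.
    public static bool IsFinished(Guid jobId)
    {
        var status = new ConversionJobStatus(WordAutomationService, jobId, null);
        return status.Count == status.Succeeded + status.Failed;
    }
}
```

The caller has to keep the job id and poll for completion, which is exactly the inconvenience the synchronous converter removes.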

I have put together a simple farm solution that performs a bulk conversion of Word documents to PDF, but does it synchronously. There is currently no way of doing this within Office 365, at least not without using an on-premises farm to perform the actual conversion for you.

I have built a simple visual web part with a textbox, where the user enters the name of the document library whose Word documents they wish to convert to PDF, and a button. (In a real-world scenario this might instead be a button on the library’s ribbon, or an event receiver on the list that converts any Word document to PDF.) The button’s click handler looks up the document library, grabs all the Word documents, then loops through them, converting each one to PDF and saving it back to the same library.

I have added two Word documents to the Shared Documents library on my site.

One is a simple test document that says “hello you”.

The second is an “Event Flyer” template created in Microsoft Word and edited slightly, so that you can see how pictures and layouts are converted too.

Below is the code, with comments explaining what is happening at each stage.

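//Required using statements
using System;
using System.IO;
using Microsoft.Office.Word.Server.Conversions;
using Microsoft.SharePoint;

//This is the name of your Word Automation Service within Central Admin. If you have followed my setup yours will be called WAS too.
public static string WORD_AUTOMATION_SERVICE = "WAS";

//Click handler
protected void btnWordConvertToPDF_Click(object sender, EventArgs e)
{
    //Read the library name the user typed in.
    //The textbox ID (txtLibraryName) is assumed here; use whatever ID your web part gives it.
    var libraryName = txtLibraryName.Text.Trim();

    //Get current site context
    var web = SPContext.Current.Web;
    //TryGetList returns null rather than throwing if the library does not exist.
    var list = web.Lists.TryGetList(libraryName);

    if (list != null)
    {
        WordDocsToConvertToPdf(list);
    }
}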

private void WordDocsToConvertToPdf(SPList library)
{
    //Perform a SPQuery that returns only Word Documents.
    SPQuery query = new SPQuery();
    query.Folder = library.RootFolder;
    //Use the Recursive scope so that documents in subfolders are included.
    //Note that Contains 'doc' also matches docx; the second clause is kept for clarity.
    query.ViewXml = @"<View Scope='Recursive'>
                        <Query>
                          <Where>
                            <Or>
                              <Contains>
                                <FieldRef Name='File_x0020_Type'/>
                                <Value Type='Text'>doc</Value>
                              </Contains>
                              <Contains>
                                <FieldRef Name='File_x0020_Type'/>
                                <Value Type='Text'>docx</Value>
                              </Contains>
                            </Or>
                          </Where>
                        </Query>
                      </View>";

    //Get Documents
    SPListItemCollection listItems = library.GetItems(query);

    //Check that there are any documents to convert.
    if (listItems.Count > 0)
    {
        foreach (SPListItem li in listItems)
        {
            //Perform the conversion in memory first, therefore we require a MemoryStream.
            //Both streams are disposed when the using block ends.
            using (Stream sourceStream = li.File.OpenBinaryStream())
            using (MemoryStream destinationStream = new MemoryStream())
            {
                //Create the SyncConverter, passing in the name of the Word Automation Service for your Farm.
                SyncConverter sc = new SyncConverter(WORD_AUTOMATION_SERVICE);
                //Pass in the user token under which this conversion is executed.
                sc.UserToken = SPContext.Current.Site.UserToken;
                sc.Settings.UpdateFields = true;

                //Save format
                sc.Settings.OutputFormat = SaveFormat.PDF;

                //Convert to PDF by reading the source file stream and writing to the destination memory stream.
                ConversionItemInfo info = sc.Convert(sourceStream, destinationStream);

                var filename = Path.GetFileNameWithoutExtension(li.File.Name) + ".pdf";
                if (info.Succeeded)
                {
                    //File conversion successful, so add the memory stream to the library.
                    //Note: the PDF is saved to the root folder, overwriting any existing file of the same name.
                    library.RootFolder.Files.Add(filename, destinationStream, true);
                }
                else if (info.Failed)
                {
                    //This stops the whole run at the first document that fails to convert.
                    throw new Exception(info.ErrorMessage);
                }
            }
        }
    }
}
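
The loop above throws on the first document that fails, which abandons the rest of the batch. As a rough sketch of one alternative, the method below keeps going and hands the failures back to the caller. The class and method names are made up for illustration. Unlike the code above, it saves each PDF into the same folder as its source document (via SPFile.ParentFolder) rather than the library root. Treat both as assumptions rather than part of the sample solution.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.Office.Word.Server.Conversions;
using Microsoft.SharePoint;

// Hypothetical helper class, not part of the sample solution.
public static class BulkPdfConverter
{
    // Assumption: the Word Automation Services application is named "WAS".
    private const string WordAutomationService = "WAS";

    // Converts every item and returns a map of file URL to error message
    // for the documents that could not be converted.
    public static IDictionary<string, string> ConvertAll(
        SPListItemCollection items, SPUserToken userToken)
    {
        var failures = new Dictionary<string, string>();

        foreach (SPListItem item in items)
        {
            using (Stream source = item.File.OpenBinaryStream())
            using (var destination = new MemoryStream())
            {
                var converter = new SyncConverter(WordAutomationService);
                converter.UserToken = userToken;
                converter.Settings.UpdateFields = true;
                converter.Settings.OutputFormat = SaveFormat.PDF;

                ConversionItemInfo info = converter.Convert(source, destination);
                if (info.Succeeded)
                {
                    // Save next to the source document so that files with the
                    // same name in different subfolders do not overwrite each other.
                    string pdfName = Path.ChangeExtension(item.File.Name, ".pdf");
                    item.File.ParentFolder.Files.Add(pdfName, destination, true);
                }
                else
                {
                    // Record the failure and carry on with the next document.
                    failures[item.File.Url] = info.ErrorMessage;
                }
            }
        }

        return failures;
    }
}
```

From the click handler this could be called as BulkPdfConverter.ConvertAll(listItems, SPContext.Current.Site.UserToken), with the returned failures shown to the user once the run has finished.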

After typing in the document library name and clicking the button, the Word documents have been converted to PDF.

While building this demo, I discovered that if Active Directory is on the same box as SharePoint, Word Automation Services will throw an error when you try to convert documents:

“The file could not be converted; it may be corrupt or otherwise invalid (the conversion process failed). Please try opening the file in Microsoft Word, resaving it, and then resubmitting the file for conversion. If this does not resolve the issue, contact your system administrator.”

So firstly, I’m sorry to any developers out there who have used my earlier blog posts to build themselves a single-server SharePoint 2013 farm. And secondly, thank you to Karsten Pohnke, whose blog post confirmed why I was getting the error message: http://www.ilikesharepoint.de/2014/07/sharepoint-word-automation-service-does-not-work-file-may-be-corrupted/

Link to Sample code : OneDrive