Using iTextSharp to Merge PDFs into One


  1. Setting up Word Automation Services for SharePoint 2013
  2. Setting up PowerPoint Automation Services for SharePoint 2013
  3. Converting a Library’s Word Documents to PDF
  4. Converting a Library’s PowerPoint documents to PDF
  5. Using iTextSharp to Merge PDFs into One
  6. Using iTextSharp to create a basic PDF page from an item property

In the last couple of posts I’ve shown you how to convert Word and PowerPoint documents into PDF. In this post I’m going to show you how to use iTextSharp to merge your PDFs together. iTextSharp is a port of iText, the open source Java library for PDF generation, written entirely in C# for the .NET platform. http://itextpdf.com

Following on from my last Farm Solution, I have extended the solution by adding another button. When clicked, it looks up the document library and grabs all the PDF files. It loops through each file, copying each PDF into a memory stream, and once it has looped through every file it saves the combined file back to the SharePoint library. The advantage of using the iTextSharp dll is that you could create a provider-hosted SharePoint App, and therefore provide this merge functionality in Office 365. Unfortunately, I’m not showing you how to do this in this demo.

In the Shared Document library for my site, I already have PDF files from my previous demos.

The first thing we need to do is add iTextSharp to the Visual Studio project. With NuGet this is really simple. Right click your project and select Manage NuGet Packages…

This will open the Manage NuGet Packages dialog. Type iTextSharp into the Search Online box; the top result should be iTextSharp. Just click the Install button, and accept the License Acceptance prompt.

This will add the itextsharp reference to your project. All that remains is to ensure that this dll is deployed to the GAC. Double click on the Package.package file in your project.

Select the Advanced tab, then click Add and choose to add an existing assembly. You will find the dll in the packages folder, which sits in the same location as the .sln file for your solution. Navigate down through the folder to Packages\iTextSharp.5.5.4\lib\itextsharp.dll. Set the Deployment Target to GlobalAssemblyCache; the Location is itextsharp.dll.

Click OK.

Below is the code, with comments explaining what is happening at each stage.

//Will require these using statements
using System;
using System.IO;
using Microsoft.SharePoint;
using iTextSharp.text;
using iTextSharp.text.pdf;

protected void btnPDFMerge_Click(object sender, EventArgs e)
{
    //Get current web context
    var web = SPContext.Current.Web;
    var list = web.Lists.TryGetList(libraryName.Text);
    if (list != null)
    {
        MergePdfs(list);
    }
}

private void MergePdfs(SPList library)
{
    byte[] result = null;
    SPFolder merged = null;
    //Using helper method (explained further down), look for a folder called Merged
    if (!DoesFolderExist(library.RootFolder, "Merged"))
    {
        //Folder not found so create it.
        merged = library.RootFolder.SubFolders.Add("Merged");
        library.Update();
    }
    else
    {
        //Using helper method (explained further down), get the folder called Merged
        merged = GetFolder(library.RootFolder, "Merged");
    }

    //CAML Query to return all pdf file types.
    SPQuery query = new SPQuery();

    //Include all subfolders so include Recursive Scope
    //TODO: Ignore Merged folder.
    query.ViewXml = @"<View Scope='Recursive'>
         <Query>
            <Where>
                <Contains>
                    <FieldRef Name='File_x0020_Type'/>
                    <Value Type='Text'>pdf</Value>
                </Contains>
            </Where>
        </Query>
      </View>";

    //Get Documents
    SPListItemCollection listItems = library.GetItems(query);
    //Check that there are any documents to merge
    if (listItems.Count > 0)
    {
        //Perform the merge in memory first
        using (MemoryStream ms = new MemoryStream())
        {
            //Using the iTextSharp Document
            using (Document document = new Document())
            {
                //Using the iTextSharp PdfCopy to create a PDF document in the memory stream
                using (PdfCopy copy = new PdfCopy(document, ms))
                {
                    //Open the document before any changes can be made.
                    document.Open();

                    //Loop through each file
                    foreach (SPListItem li in listItems)
                    {
                        //Convert the binary stream of the file to an iTextSharp PdfReader
                        PdfReader reader = new PdfReader(li.File.OpenBinaryStream());
                        int n = reader.NumberOfPages;
                        //Loop through each page in the current PDF file (pages are numbered from 1)
                        for (int page = 1; page <= n; page++)
                        {
                            //Import the page to the PDF document in the memory stream.
                            copy.AddPage(copy.GetImportedPage(reader, page));
                        }
                        //Release the reader once its pages have been copied
                        reader.Close();
                    }
                }
            }

            //The document is closed at this point, so convert the memory stream into a byte array.
            result = ms.ToArray();
        }

        //Create a new file in the Merged folder called combined.pdf from the byte array.
        SPFile newFile = merged.Files.Add("combined.pdf", result, true);
    }
}
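
The TODO in the query above notes that the Merged folder should be ignored; otherwise a second run would fold the previous combined.pdf back into the new one. One way to handle this, offered as a rough sketch rather than part of the original solution, is to filter the items inside the merge loop. The class name MergeFilters and the helper IsInFolder are hypothetical, and the sketch assumes that comparing server-relative URLs is good enough for your library.

```csharp
using System;
using Microsoft.SharePoint;

public static class MergeFilters
{
    //Hypothetical helper: true when the item's file sits in the given folder
    //or in any of its subfolders, judged by server-relative URL prefix.
    public static bool IsInFolder(SPListItem item, SPFolder folder)
    {
        if (item == null || item.File == null || folder == null)
        {
            return false;
        }

        //Append a trailing slash so "Merged" does not also match "MergedOld"
        string prefix = folder.ServerRelativeUrl.TrimEnd('/') + "/";
        return item.File.ServerRelativeUrl.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside the foreach loop in MergePdfs, a line such as `if (MergeFilters.IsInFolder(li, merged)) continue;` placed before the PdfReader is created would then skip anything already in the Merged folder.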

I used a helper method to check if the folder exists and another method to return the found folder. I have pinched these methods from the OfficeDevPnP code and converted them to work in on-premises code. (I can’t take credit for this.)

//Pass in folder and folder we are looking for.
private bool DoesFolderExist(SPFolder parentFolder, string folderName)
{
  //Check that a folder name was supplied
  if(string.IsNullOrEmpty(folderName))
 {
    throw new ArgumentException("folderName");
 }
   
 //Get collection of the sub folders
 var folderCollection = parentFolder.SubFolders;
 //Call the worker code
 var exists = FolderExistsImplementation(folderCollection, folderName);

  return exists;
}

private bool FolderExistsImplementation(SPFolderCollection folderCollection, string folderName)
{
    //Validate inputs
    if(folderCollection == null)
    {
       throw new ArgumentNullException("folderCollection");
    }

    if(string.IsNullOrEmpty(folderName))
    {
       throw new ArgumentException("folderName");
    }

    //Loop each folder for the folderName
    foreach(SPFolder f in folderCollection)
    {
       if(f.Name.Equals(folderName, StringComparison.InvariantCultureIgnoreCase))
       {
         //If found return true
         return true;
       }
    }

   //not found return false
   return false;
 }
 
//This code is almost identical to DoesFolderExist, except that instead of returning true/false it returns the folder or null.
//In fact I could just use this method for both and check for null instead of true/false.
private SPFolder GetFolder(SPFolder parentFolder, string folderName)
{
   if (string.IsNullOrEmpty(folderName))
   {
       throw new ArgumentException("folderName");
   }

   var folderCollection = parentFolder.SubFolders;
   var folder = GetFolderImplementation(folderCollection, folderName);
   return folder;
}

private SPFolder GetFolderImplementation(SPFolderCollection folderCollection, string folderName)
{
   if (folderCollection == null)
   {
      throw new ArgumentNullException("folderCollection");
   }
   if (string.IsNullOrEmpty(folderName))
   {
     throw new ArgumentException("folderName");
   }

   foreach (SPFolder f in folderCollection)
   {
      if (f.Name.Equals(folderName, StringComparison.InvariantCultureIgnoreCase))
      {
          return f;
      }
   }

  return null;
}
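
As the comment on GetFolder suggests, the two lookups could be collapsed into one. The following is a minimal sketch of that idea, not the OfficeDevPnP code: the class FolderHelpers and the method GetOrCreateFolder are hypothetical names, and it assumes you are happy to create the folder whenever the lookup returns null.

```csharp
using System;
using System.Linq;
using Microsoft.SharePoint;

public static class FolderHelpers
{
    //Hypothetical helper: returns the named subfolder, creating it if it is missing.
    public static SPFolder GetOrCreateFolder(SPFolder parentFolder, string folderName)
    {
        if (parentFolder == null)
        {
            throw new ArgumentNullException("parentFolder");
        }
        if (string.IsNullOrEmpty(folderName))
        {
            throw new ArgumentException("A folder name is required.", "folderName");
        }

        //SPFolderCollection is a non-generic collection, so Cast<SPFolder>() is needed before LINQ
        SPFolder existing = parentFolder.SubFolders
            .Cast<SPFolder>()
            .FirstOrDefault(f => f.Name.Equals(folderName, StringComparison.InvariantCultureIgnoreCase));

        //A null result means the folder was not found, so create it
        return existing ?? parentFolder.SubFolders.Add(folderName);
    }
}
```

With this in place, the if/else at the top of MergePdfs could shrink to a single call: `SPFolder merged = FolderHelpers.GetOrCreateFolder(library.RootFolder, "Merged");`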

As you can see from my screen shot, I now have a document called combined.pdf sitting in my Merged folder inside my Shared Document library.

Link to Sample code : OneDrive
