
ColdFusion 9 PDF Enhancements

July 20, 2009

Another of the less visible but still cool features in ColdFusion 9 is the set of enhancements we've made to <cfpdf>. We've added the ability to:

  • Add headers/footers to existing PDFs
  • Create PDF packages
  • Selectively optimize PDF size
  • Extract text from PDFs
  • Extract images from PDFs
  • Create high quality thumbnails

Of these features, my personal favorites are optimization and extraction.

Optimization

PDFs can do a lot. Consequently, a PDF's size can swell from extra information, metadata, and embedded files. The optimize feature lets you remove specific types of extras to selectively reduce the size of your PDF while retaining the features you need. When you use action="optimize", the following options are available:

  • noattachments
  • nobookmarks
  • nocomments
  • nofonts
  • nojavascripts
  • nolinks
  • nometadata
  • nothumbnails

Code looks like this:

<cfpdf action="optimize"
      source="#ExpandPath('./UserGroupTour2009.pdf')#"
      destination="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
      noattachments="true"
      nobookmarks="true"
      nocomments="true"
      nofonts="true"
      nojavascripts="true"
      nolinks="true"
      nometadata="true"
      nothumbnails="true"
      />

As you can see, the code is pretty straightforward. I've seen reductions of 65-75% on PDF size when using all options.
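
To illustrate the selective part, here is a minimal sketch that strips only attachments, comments, JavaScript, and metadata, and leaves out the other options so that bookmarks, links, fonts, and thumbnails should stay in place. It assumes that an omitted option means that content is kept, and the output file name UserGroupTour2009Slim.pdf is made up for illustration:

<!--- Hypothetical output file name. Options not listed are assumed to default to "keep". --->
<cfpdf action="optimize"
      source="#ExpandPath('./UserGroupTour2009.pdf')#"
      destination="#ExpandPath('./UserGroupTour2009Slim.pdf')#"
      noattachments="true"
      nocomments="true"
      nojavascripts="true"
      nometadata="true"
      />

How much this saves depends on what the source PDF actually contains, so it's worth comparing file sizes before and after.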

Extraction

Yes, you can get at the text or embedded images of a PDF with ColdFusion 9.

Here's the code to get at the text of a PDF:

<cfpdf action="extracttext"
      source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
      name="cfref"
       />


<cfdump var="#XMLparse(cfref)#" >

That code will extract the text of a PDF to XML. The structure divides the content into pages, so you can quickly get at content on particular pages, etc.

You have a few options that I'm not showing, though. You can get the content as a plain string instead of XML. You can extract only selected pages. You can even get XY coordinates for all of the words in the document.
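
As a rough sketch of those options, the snippet below pulls just the first two pages as a plain string, then asks for word coordinates in the XML output. The attribute names type, pages, and addquads are assumptions here and should be verified against the <cfpdf> reference for your version; the variable names pageText and quadXml are made up for this example:

<!--- Pages 1-2 only, returned as a plain string instead of XML (attribute names assumed) --->
<cfpdf action="extracttext"
      source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
      pages="1-2"
      type="string"
      name="pageText"
       />

<cfoutput>#HTMLEditFormat(pageText)#</cfoutput>

<!--- XML output with coordinates for each word (addquads is assumed; verify before relying on it) --->
<cfpdf action="extracttext"
      source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
      type="xml"
      addquads="true"
      name="quadXml"
       />

<cfdump var="#XMLparse(quadXml)#">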

Getting images is similar; you plug in a PDF, and send the images to a directory:

<cfpdf action = "extractimage"
   source = "#ExpandPath('./UserGroupTour2009Opt.pdf')#"
   destination = "#ExpandPath('./images')#" />

You also have options to set a filename prefix for the images and to pick the image format.
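
For example, something along these lines should write PNG files whose names start with "tour". The imageprefix and format attribute names are assumptions that should be confirmed in the <cfpdf> reference, and the prefix value is just a placeholder:

<!--- The ./images directory is assumed to exist already; imageprefix and format are assumed attribute names --->
<cfpdf action = "extractimage"
   source = "#ExpandPath('./UserGroupTour2009Opt.pdf')#"
   destination = "#ExpandPath('./images')#"
   imageprefix = "tour"
   format = "png" />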

As you can see, the engineers added some cool functionality here.

Tags: ColdFusion · Centaur

4 responses so far

  • 1 dev // Jul 20, 2009 at 10:19 PM

    Why not enhance cfdocument to print web pages to PDF? I think it would be better to embed the WebKit rendering engine in CF for generating web thumbnails on the fly.
  • 2 Calvin // Jul 27, 2009 at 1:50 AM

    Excellent. Creating a thumbnail of a PDF would be VERY useful for my content management app.
  • 3 Aaron Neff // Sep 19, 2009 at 2:01 AM

    Regarding WebKit: yes, it looks like the upcoming JWebPane (based on WebKit) may be the logical replacement for the end-of-life ICEbrowser SDK.

    -Aaron Neff
  • 4 Anna LKee // Sep 19, 2009 at 9:56 AM

    Thanks for the tip. It helped me solve a problem.

