Among the less visible but still cool features in ColdFusion 9 are the enhancements we've made to <cfpdf>. We've added the ability to:
- Add headers/footers to existing PDFs
- Create PDF packages
- Selectively optimize PDF size
- Extract text from PDFs
- Extract images from PDFs
- Create high quality thumbnails
Of these features, my personal favorites are optimization and extraction.
Optimization
PDFs can do a lot. Consequently, a PDF's size can swell due to extra information, metadata, and embedded files. The optimize feature lets you remove specific types of extras to selectively reduce the size of your PDF while retaining the features you need. With action="optimize", the following options are open to you:
- noattachments
- nobookmarks
- nocomments
- nofonts
- nojavascripts
- nolinks
- nometadata
- nothumbnails
Code looks like this:
<cfpdf action="optimize"
    source="#ExpandPath('./UserGroupTour2009.pdf')#"
    destination="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
    noattachments="true"
    nobookmarks="true"
    nocomments="true"
    nofonts="true"
    nojavascripts="true"
    nolinks="true"
    nometadata="true"
/>
As you can see, the code is pretty straightforward. I've seen reductions of 65-75% on PDF size when using all options.
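The "selective" part is worth a quick illustration. Here is a minimal sketch that strips only scripts, metadata, and thumbnails, leaving bookmarks, links, comments, fonts, and attachments alone. The output file name UserGroupTour2009Light.pdf is made up for this example, and the overwrite attribute is there on the assumption that you may run it more than once.

```cfml
<!--- Sketch: remove only a few kinds of extras; anything not listed is kept. --->
<!--- UserGroupTour2009Light.pdf is a hypothetical output name. --->
<cfpdf action="optimize"
    source="#ExpandPath('./UserGroupTour2009.pdf')#"
    destination="#ExpandPath('./UserGroupTour2009Light.pdf')#"
    nojavascripts="true"
    nometadata="true"
    nothumbnails="true"
    overwrite="true"
/>
```

Expect a smaller size reduction than with every option switched on, since fonts and attachments tend to account for much of a PDF's bulk.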
Extraction
Yes, you can get at the text or embedded images of a PDF with ColdFusion 9.
Here's the code to get at the text of a PDF:
<cfpdf action="extracttext"
source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
name="cfref"
/>
<cfdump var="#XMLparse(cfref)#" >
That code will extract the text of a PDF to XML. The structure divides the content into pages, so you can quickly get at content on particular pages, etc.
You have a few options that I'm not showing, though. You can get the content as a plain string instead of XML. You can limit extraction to specific pages. You can even get XY coordinates for all of the words in the document.
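As a rough sketch of those options, the snippets below use the type, pages, and addquads attributes as I understand them from the <cfpdf> documentation. The page range and the variable names plainText and textWithQuads are just placeholders for this example, so check the attribute details against your ColdFusion version.

```cfml
<!--- Sketch: pull only pages 1-2, as a plain string rather than XML. --->
<cfpdf action="extracttext"
    source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
    pages="1-2"
    type="string"
    name="plainText"
/>
<cfoutput>#HTMLEditFormat(plainText)#</cfoutput>

<!--- Sketch: XML output that also carries position data for the text. --->
<cfpdf action="extracttext"
    source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
    type="xml"
    addquads="true"
    name="textWithQuads"
/>
<cfdump var="#XMLParse(textWithQuads)#">
```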
Getting images is similar; you plug in a PDF, and send the images to a directory:
<cfpdf action="extractimage"
    source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
    destination="#ExpandPath('./images')#"
/>
You also have options to set a file name prefix for the extracted images and to pick the image format.
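Here is a minimal sketch of those two options, assuming the imageprefix and format attributes behave as documented for <cfpdf>. The prefix "tour" and the PNG format are arbitrary choices for illustration, and the directory check is my own addition so the example does not depend on ./images already existing.

```cfml
<!--- Sketch: make sure the target directory exists before extracting. --->
<cfset imageDir = ExpandPath('./images')>
<cfif NOT DirectoryExists(imageDir)>
    <cfdirectory action="create" directory="#imageDir#">
</cfif>

<!--- "tour" is a hypothetical prefix; format picks the output image type. --->
<cfpdf action="extractimage"
    source="#ExpandPath('./UserGroupTour2009Opt.pdf')#"
    destination="#imageDir#"
    imageprefix="tour"
    format="png"
    overwrite="true"
/>
```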
As you can see, the engineers added some cool functionality here.