Pulling Content out of Word with ColdFusion 9

August 4, 2009 · 7 Comments

I had a 1 + 1 = 2 moment the other day. I was fooling around with ColdFusion's ability to turn Word docs into PDFs. At first glance it's pretty simple and straightforward:

<cfset src = expandPath("./cf9.docx") />
<cfset result = expandPath("./cf9.pdf") />
<cfdocument filename="#result#" srcfile="#src#" format="pdf" />

Word to PDF is nice to have, but as features go, it's a pretty small bullet point. Don't get me wrong, you get fidelity to the original, including fonts, layouts, and images. But it's still just converting a Word document to a PDF.

That is, until you remember that ColdFusion 9 can now pull content out of PDFs too. So now you can do this:

<cfset src = expandPath("./cf9.docx") />
<cfset result = expandPath("./cf9.pdf") />
<cfdocument filename="#result#" srcfile="#src#" format="pdf" overwrite="true" />

<cfpdf action="extracttext"
       source="#result#"
       name="cfref" />

<cfdump var="#XMLparse(cfref)#" />

This yields the content of the original Word document as XML. Now that's cool.
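If you want to reuse the trick, the two steps can be wrapped in a small function. What follows is only a sketch: wordToPdfText is a hypothetical name, not a built-in, and it assumes getTempDirectory() returns a path ending in a separator and that the server is set up to convert Word files in the first place. It writes the intermediate PDF to a temp file and removes it afterwards, so nothing is left next to the source document.

```cfm
<!--- Hypothetical helper (not part of ColdFusion): converts a Word file to a
      temporary PDF, extracts its text, and returns the parsed XML. --->
<cffunction name="wordToPdfText" returntype="xml" output="false">
    <cfargument name="src" type="string" required="true" />

    <!--- Assumption: getTempDirectory() ends with a path separator. --->
    <cfset var pdfPath = getTempDirectory() & createUUID() & ".pdf" />
    <cfset var extracted = "" />

    <cfif NOT fileExists(arguments.src)>
        <cfthrow message="Source document not found: #arguments.src#" />
    </cfif>

    <cftry>
        <!--- Same two steps as above: Word to PDF, then PDF to text. --->
        <cfdocument filename="#pdfPath#" srcfile="#arguments.src#" format="pdf" overwrite="true" />
        <cfpdf action="extracttext" source="#pdfPath#" name="extracted" />

        <cfcatch type="any">
            <!--- Let the caller see the original error. --->
            <cfrethrow />
        </cfcatch>
        <cffinally>
            <!--- Clean up the intermediate PDF whether or not extraction worked. --->
            <cfif fileExists(pdfPath)>
                <cfset fileDelete(pdfPath) />
            </cfif>
        </cffinally>
    </cftry>

    <cfreturn XMLparse(extracted) />
</cffunction>

<cfdump var="#wordToPdfText(expandPath('./cf9.docx'))#" />
```

The dump should show the same XML as the inline version; the only difference is that the PDF is treated as a throwaway.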

Tags: ColdFusion

7 responses so far ↓

  • 1 Mingo Hagen // Aug 4, 2009 at 9:18 AM

    while cool, I'd like to have <cfdocument action="extracttext">.
  • 2 Ben Nadel // Aug 4, 2009 at 9:19 AM

    Awesome! I didn't even know ColdFusion could convert word documents into PDFs! Bitchy!
  • 3 John Farrar // Aug 4, 2009 at 11:05 PM

    Now that has some serious potential. Will have to start dumping and see what pragmatic use this can have! :)
  • 4 Ben Spencer // Aug 11, 2009 at 1:57 AM

    I haven't downloaded CF9 yet, but this example uses .docx as the document format. Can the same be done for the old .doc format, which wasn't XML based?
  • 5 Terrence Ryan // Aug 11, 2009 at 4:39 PM

    Ben: I haven't tested that myself, but I'm pretty sure we've said it works. ;)
  • 6 Ben Spencer // Aug 11, 2009 at 5:27 PM

    Thanks Terrence, I read up on it in the end. Yep, it should do .doc quite nicely (with whatever quirks are associated with OpenOffice).

    Use #1 for me: Produce thumbnails of word docs for document management application.
  • 7 Juan Escalada // Sep 22, 2009 at 7:10 AM

    Dear Terrence,

    A client of mine is asking if a document could be uploaded so that the document's footnotes would be stripped and, together with the associated paragraph, be emailed to different people (as in Footnote 1 and its paragraph goes to Adam for check-up and Footnote 2 goes to Joe)...
    I could convince the client to use a PDF if that made things any easier... But I'd appreciate your insight to know whether this would be at all possible...

    Thanks in advance, Juan Escalada.
