I had a 1 + 1 = 2 moment the other day. I was fooling around with ColdFusion 9's ability to turn Word documents into PDFs. At first glance it's pretty simple and straightforward:
<cfset src = expandPath("./cf9.docx") />
<cfset result = expandPath("./cf9.pdf") />
<cfdocument filename="#result#" srcfile="#src#" format="pdf" />
Word to PDF is nice to have, but as features go, it's a pretty small bullet point. Don't get me wrong, you get fidelity to the original, including fonts, layouts, and images. But it's still just converting a Word document to a PDF.
That is, until you remember that ColdFusion 9 can now pull content out of PDFs. So now you can do this:
<cfset src = expandPath("./cf9.docx") />
<cfset result = expandPath("./cf9.pdf") />
<cfdocument filename="#result#" srcfile="#src#" format="pdf" overwrite="true" />
<cfpdf action="extracttext"
source="#result#"
name="cfref"
/>
<cfdump var="#XMLParse(cfref)#" />
This gives you the text content of the original Word document as XML. Now that's cool.
12 responses so far
1 Mingo Hagen // Aug 4, 2009 at 9:18 AM
2 Ben Nadel // Aug 4, 2009 at 9:19 AM
3 John Farrar // Aug 4, 2009 at 11:05 PM
start dumping and see what pragmatic use this can have! :)
4 Ben Spencer // Aug 11, 2009 at 1:57 AM
5 Terrence Ryan // Aug 11, 2009 at 4:39 PM
6 Ben Spencer // Aug 11, 2009 at 5:27 PM
Use #1 for me: Produce thumbnails of word docs for document management application.
7 Juan Escalada // Sep 22, 2009 at 7:10 AM
A client of mine is asking whether a document could be uploaded so that its footnotes are stripped out and each one, together with its associated paragraph, is emailed to a different person (for example, footnote 1 and its paragraph go to Adam for review, and footnote 2 goes to Joe).
I could convince the client to use a PDF if that made things any easier. But I'd appreciate your insight into whether this would be possible at all.
Thanks in advance, Juan Escalada.
8 Don Blaire // Apr 23, 2010 at 7:22 PM
9 Tad // May 17, 2010 at 4:02 PM
10 Virginia Neal // May 28, 2010 at 7:30 PM
However, I have a real need to keep the general format of the document as well. This would simply be things like centering, indenting/tabs, line breaks, etc.
I had hoped that using useStructure="true" along with honourspaces="true" would return the basic format of the document, but that does not seem to be the case.
Do you know if it is possible to maintain the basic formatting of a PDF document?
Thanks
BTW - the PDF documents that I am working with began as Word and were converted to PDF using your suggested approach (thank you).
11 Cheyenne Throckmorton // Jun 14, 2010 at 4:14 PM
I ran into a similar problem last week. I used an alternative solution that worked for my use case and may or may not help you. I still converted the Word documents to PDFs, then used the thumbnail capability to create JPG images at 100% resolution and displayed those images to the end user, which worked well for our use case.
While I am looking forward to diving deeper into the capabilities here and with DDX, I think the overall problem we both encountered is that DDX is about the actual content of the PDF, devoid of styling, similar to how HTML "is supposed to be". With HTML you add a CSS file for styling, and I believe there is a similar type of document that combines with the DDX to create your fully styled PDF.
That's obviously a lot more work to dig into, and I only had time for the thumbnail solution for now, but I hope that helps.
@Terrence thanks for the blog and great tips that even got me started on solving my use case.
12 Nick // Aug 7, 2010 at 4:49 PM
Please correct me if I'm wrong, but I just ran Terrence's code and received an error saying that an OpenOffice install was required.