Recently, I needed to convert a set (> 30) of Word documents (docx files) to pdf. Cobbling together pieces, I came up with a PowerShell script that takes advantage of Word's built-in ability to save a document as pdf.
As I have been wont to do of late, I turned this simple task into a more complete script, with comment-based help, that I can distribute and share. This serves two purposes for me.
- It helps to continue to improve my scripting. I have to think through how to best generalize common functionality, how to perform (light) error checking, and how to design for use by others.
- It also helps to ensure that a user can utilize the script right away. Note that because I anticipate using this script somewhat infrequently, the ability to recall its use is important to future me!
Unfortunately, this script only works on Windows as it requires the use of a COM Object that allows PowerShell to open and control a Word application.
Of particular interest to me within the script is how to perform a map-like operation where I am only interested in processing a collection and producing side effects, rather than processing a collection and returning a new collection. To accomplish this, the script is broken into two pieces: creating the initial collection and producing the side effect on each element within the collection.
Creating the initial collection
With Get-ChildItem, creating the collection is quite easy. A simple filter results in a collection of the files in a directory that have a docx extension.
Get-ChildItem -Path "${docxDir}" -Filter "*.docx"
Saving as pdf
To process the above collection, it is piped into ForEach-Object which iterates over the collection, executing a function for each file (the collection element).
ForEach-Object {
    convertToPdf -wordApp $wordApp -wordFile $_
}
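To see the two pieces working together without needing Word, here is a minimal sketch of the same pipeline shape. The function `Invoke-SideEffect`, the `$processed` list, and the temporary directory are hypothetical stand-ins of mine and are not part of the actual script; the stand-in only records each file name instead of converting anything.

```powershell
# Hypothetical stand-in for the real conversion: it only records a side effect.
$processed = [System.Collections.Generic.List[string]]::new()

function Invoke-SideEffect {
    Param(
        [Parameter(Mandatory = $true)]
        [System.IO.FileInfo]$file
    )
    # Mutates the list defined above; nothing is written to the pipeline.
    $processed.Add($file.Name)
}

# Build a throwaway directory with two docx files and one unrelated file.
$docxDir = Join-Path ([System.IO.Path]::GetTempPath()) "docx-demo-$([guid]::NewGuid())"
New-Item -ItemType Directory -Path $docxDir | Out-Null
"a.docx", "b.docx", "notes.txt" | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $docxDir $_) | Out-Null
}

# Piece one creates the collection, piece two produces the side effect per element.
$result = Get-ChildItem -Path "${docxDir}" -Filter "*.docx" |
    ForEach-Object {
        Invoke-SideEffect -file $_
    }

Write-Host "Processed $($processed.Count) file(s); pipeline returned nothing: $($null -eq $result)"

# Clean up the throwaway directory.
Remove-Item -Path $docxDir -Recurse -Force
```

Because the stand-in emits nothing, the pipeline itself yields no new collection; all of the work happens inside the script block, which is the behavior I am after.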
The actual function is quite simple, merely opening the Word file and saving as a pdf.
function convertToPdf {
    Param(
        [Parameter(Mandatory = $true)]
        [System.__ComObject]$wordApp,
        [Parameter(Mandatory = $true)]
        [System.IO.FileInfo]$wordFile
    )
    # The pdf is written next to the source file, with the same base name.
    $pdfFile = Join-Path $wordFile.DirectoryName "$($wordFile.BaseName).pdf"
    $wordDoc = $wordApp.Documents.Open($wordFile.FullName)
    Write-Host "Converting $wordFile to $pdfFile"
    try {
        # 17 is the value of Word's wdFormatPDF save format.
        $wordDoc.SaveAs($pdfFile, 17)
    }
    finally {
        # Always close the document, even if the save fails.
        $wordDoc.Close()
    }
}
Putting it all together, I merely have to pass in a directory and all docx files within it are converted to pdf.
Convert-DocxToPdf -docxDir "C:\Users\myuser\Documents\Word"
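For a sense of how the pieces might fit inside that command, here is a rough sketch of a wrapper. It is an approximation under stated assumptions, not the script as distributed: the body, the help text, and the choice to hide the Word window are my guesses, and it assumes the convertToPdf function shown earlier is already loaded in the same session. It only runs on Windows with Word installed.

```powershell
function Convert-DocxToPdf {
    <#
    .SYNOPSIS
    Converts every docx file in a directory to pdf using Word.
    .PARAMETER docxDir
    Directory that contains the docx files to convert.
    .EXAMPLE
    Convert-DocxToPdf -docxDir "C:\Users\myuser\Documents\Word"
    #>
    Param(
        [Parameter(Mandatory = $true)]
        [string]$docxDir
    )

    # Start a single Word instance and reuse it for every file.
    $wordApp = New-Object -ComObject Word.Application
    $wordApp.Visible = $false   # assumption: run Word without showing its window
    try {
        # Assumes convertToPdf (defined earlier) is available in this session.
        Get-ChildItem -Path "${docxDir}" -Filter "*.docx" |
            ForEach-Object {
                convertToPdf -wordApp $wordApp -wordFile $_
            }
    }
    finally {
        # Shut Word down even if a conversion throws.
        $wordApp.Quit()
    }
}
```

Placing the comment-based help at the top of the function body lets Get-Help Convert-DocxToPdf show the synopsis and example, which is what helps future me recall how to use it.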
Finally, I needed to combine all the resulting pdf files into a single pdf, which I accomplished using pdftk.
pdftk *.pdf cat output singlefile.pdf