The faxes received on this line need to be verified and forwarded to another company for fulfillment. There was a project tangentially related to these faxes, so I decided to add the fax-handling process to this system, creating models, views, controllers, etc. The fax machine emailed the faxes to us as pdfs, which our server downloaded and saved in a directory specifically for faxes. It turned out that the next company in the chain couldn't open the pdf files (cause still unknown), so we changed the output (along with the controllers and cron jobs) to handle the faxes as .tiff files. This also worked fine for a while.
Yesterday, we were expecting two different faxes from one sender. Our system eventually notified me that a fax had arrived, and I held off on forwarding the information until the second one came in. When it still hadn't shown up an hour later, I checked the mailbox manually and saw that the .tiff the fax machine had sent me contained only a cover sheet. I know a .tiff can hold a multi-page document, so I was confused. I logged into the fax machine and found that the pdf I downloaded from it was a proper multi-page document. I split the pages with pdftk, converted them to .tiff with ImageMagick, and sent them on their merry way. Later in the day we received another multi-page fax, and I had to do it all again. Now, I know that it is customary to provide a cover sheet on your TPS reports and faxes, so this is going to keep happening. Let's fix that.
I'm running an Ubuntu 16.04 box with Apache 2.4.18 and ImageMagick 6.8.9.
The first step is to install pdftk on the server (until now I had only been using it locally):
sudo apt install pdftk
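Since these steps end up running unattended from cron, it may be worth failing loudly when a tool is missing rather than finding out from a half-processed fax. A minimal sketch; require_tools is a hypothetical helper name, and it only checks that each command can be found, not which version is installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every named command can be found,
# printing each missing one to stderr.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# pdftk bursts the pdfs; convert (from ImageMagick) writes the tifs.
# require_tools pdftk convert || exit 1
```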
Let's look at the faxes folder:
sharkserver:faxes$ ls
10.tif 11.pdf 6.tif 7.tif 8.tif 9.tif
The .tif files are earlier faxes I've already dealt with; 11.pdf is the new one. Let's start by bursting (splitting) the .pdf files:
for file in $(ls -1 | grep -E '^[0-9]+\.pdf$'); do filename=${file%%.*}; if [ -e "${filename}-1.tif" ]; then continue; fi; pdftk "$file" burst output "${filename}-%d.pdf"; done
It's a bash for loop, iterating over the output of ls -1 | grep -E '^[0-9]+\.pdf$'. That's a one-line-per-file listing of the directory, piped through grep, which keeps only the pdfs whose names are one or more digits followed by .pdf (escaping the dot makes it match a literal full stop, and the $ anchors the match to the end of the name). Because ls -1 prints bare file names, there's no directory part to strip, so we go straight to parameter expansion: ${file%%.*} removes everything from the first full stop onward, turning 11.pdf into 11. With the name minus its extension, we check whether a tif has already been created for the pdf with if [ -e "${filename}-1.tif" ]. If it doesn't exist, we use pdftk to burst the pdf into pages, keeping the original name and appending a hyphen and the page number (the %d in the output pattern) to each page. pdftk also writes a doc_data.txt file with the document's metadata, which is why that file shows up in the listing. This gives us a pdf for each page:
sharkserver:faxes$ ls
10.tif 11-1.pdf 11-2.pdf 11-3.pdf 11.pdf 6.tif 7.tif 8.tif 9.tif doc_data.txt
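The doc_data.txt in that listing is written by pdftk burst and holds the document's metadata, including a NumberOfPages line. Given that the whole problem was pages going missing, it might be worth comparing that count with the number of tifs once the conversion step has run. A minimal sketch: check_pages is a hypothetical name, it asks pdftk for the count with dump_data (which prints the same kind of report to stdout, so it doesn't depend on doc_data.txt, which may be overwritten when several pdfs are burst in one run), and PDFTK is a hypothetical override that defaults to the real pdftk:

```shell
#!/usr/bin/env bash
# Hypothetical check: compare the page count pdftk reports for a fax pdf
# (e.g. 11.pdf) with the number of <name>-N.tif files produced from it.
# PDFTK is a hypothetical override; by default the real pdftk is used.
check_pages() {
  local pdf=$1 name expected actual
  name=${pdf%.pdf}
  # dump_data prints metadata lines, one of which is "NumberOfPages: N".
  expected=$("${PDFTK:-pdftk}" "$pdf" dump_data | awk '/^NumberOfPages:/ {print $2}')
  # compgen -G lists the files matching the glob, one per line.
  actual=$(compgen -G "${name}-*.tif" | wc -l)
  if [ -z "$expected" ]; then
    echo "$pdf: could not read the page count" >&2
    return 2
  fi
  if [ "$actual" -ne "$expected" ]; then
    echo "$pdf: expected $expected page(s), found $actual tif(s)" >&2
    return 1
  fi
  echo "$pdf: all $expected page(s) converted"
}
```

Run from the faxes directory after the conversion, check_pages 11.pdf would compare the reported count against 11-1.tif, 11-2.tif and 11-3.tif.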
Now that the pages are separate .pdfs, let's convert them to .tifs:
for file in *-*.pdf; do convert "$file" "${file%.pdf}.tif" && rm "$file"; done
This finds any pdf with a hyphen in the name, and converts it to a tif, preserving the name before the file extension. Once that's done, remove the pdf (we don't need the separate pdf pages anymore).
sharkserver:faxes$ ls
10.tif 11-1.tif 11-2.tif 11-3.tif 11.pdf 6.tif 7.tif 8.tif 9.tif doc_data.txt
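Since the fax handling already lives in cron jobs, the two loops could be folded into one function for the cron job to call. This is a minimal sketch rather than a drop-in replacement: process_faxes is a hypothetical name, and PDFTK and CONVERT are hypothetical overrides that default to the real pdftk and convert. It iterates over a glob and tests names with bash's [[ =~ ]] instead of parsing ls, and it deletes a page pdf only after convert has succeeded on it:

```shell
#!/usr/bin/env bash
# Hypothetical cron-friendly wrapper: burst each new fax pdf in a directory
# into pages, convert every page to a tif, and delete each page pdf once it
# has converted. PDFTK and CONVERT are hypothetical overrides.
process_faxes() {
  local dir=$1 file name page
  # The subshell keeps the cd and the shopt change from leaking to the caller.
  (
    cd "$dir" || exit 1
    shopt -s nullglob                 # a glob with no matches expands to nothing
    for file in *.pdf; do
      # Only whole faxes such as 11.pdf; page files like 11-1.pdf are skipped.
      [[ $file =~ ^[0-9]+\.pdf$ ]] || continue
      name=${file%.pdf}
      # A first-page tif means this fax was already processed.
      [ -e "${name}-1.tif" ] && continue
      "${PDFTK:-pdftk}" "$file" burst output "${name}-%d.pdf" || continue
      for page in "${name}"-*.pdf; do
        "${CONVERT:-convert}" "$page" "${page%.pdf}.tif" && rm -- "$page"
      done
    done
  )
}
```

Called with the faxes directory as its argument, this should take the folder from the first listing to the one above in a single pass, and running it again does nothing because 11-1.tif already exists.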