Thursday, 12 April 2018 20:55

Bash and fax machines

Written by Sharkis

We have a fax number. Antiquated, yes. I'd rather not deal with it.

Originally, it was configured to also email faxes to a number of recipients in the company. My first change was to create an email address specifically to receive the faxes, and use a cron job to check and process any unread messages in the inbox. This worked fine for a while.

The faxes received on this line need to be verified and forwarded to another company for fulfillment. There was a project tangentially related to these faxes, so I decided to add the fax-handling process into this system, creating models, views, controllers, etc. The fax machine emailed us the faxes in pdf form, and our server downloaded them and saved them in a directory specifically for the faxes. It turned out the next company in the chain could not open the pdf files (cause still unknown), so we changed the output (and controllers and cron jobs) to handle the faxes as .tiff files. This also worked fine for a while.

Yesterday, we were expecting two different faxes from one sender. Our system eventually notified me that we had received a fax, and I waited to forward the information until the next fax came in. When this didn't happen for an hour, I checked the mailbox manually, and saw I'd only received a cover sheet in the .tiff that the fax machine had sent me. I know .tiffs can hold multi-page documents, so I was confused. I logged into the fax machine and found that when I downloaded the pdf, it was a proper multi-page document. I split the pages with pdftk, converted them to .tiff with imagemagick, and sent them on their merry way. Later in the day, we received another fax which turned out to be multiple pages, requiring me to do this again. Now I know that it is customary to provide a cover sheet on your TPS reports and faxes, so this is going to become an issue. Let's fix that.

I'm running an Ubuntu 16.04 box with Apache 2.4.18 and ImageMagick 6.8.9.

First step is to install pdftk on my server (I was using it locally before).

sudo apt install pdftk

Let's look at the faxes folder:

sharkserver:faxes$ ls
10.tif 11.pdf 6.tif 7.tif 8.tif 9.tif

This includes previous .tifs I had to deal with. Let's start with bursting (or splitting) the .pdf files:

for file in $(ls -1|egrep '^[0-9]+\.pdf$'); do basename=${file##*/}; filename=${basename%%.*}; if [ -e "$filename-1.tif" ]; then continue; fi; pdftk "$file" burst output "${filename}-%d.pdf"; done

It's a bash for loop, with the iterated files being the result of ls -1 | egrep '^[0-9]+\.pdf$'. That's a one-line-per-file listing of the directory, piped through egrep, which keeps only names made up of one or more digits followed by .pdf. The full stop is escaped so it matches literally, and the pattern is anchored at both ends so only names like 11.pdf match. Once we have this list of files, we extract the base name through parameter expansion. The basename expression strips everything up to and including the last solidus, and the filename expression strips everything from the first full stop onward. Once we have the name of the file without the extension, we check whether a tif has already been created for the first page of the pdf with if [ -e "$filename-1.tif" ]. If it doesn't exist, we use pdftk to split the pdf into pages, preserving the initial filename and using a hyphen and the page number to serialize the pages. This gives us a pdf for each page:

sharkserver:faxes$ ls
10.tif 11-1.pdf 11-2.pdf 11-3.pdf 11.pdf 6.tif 7.tif 8.tif 9.tif doc_data.txt
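
The two parameter expansions in the burst loop are easy to mix up, so here is a small sketch that runs them on their own. The path faxes/11.pdf and the name 11.scan.pdf are made-up examples, not files from the server; the second one is only there to show how the greedy %% differs from a single %.

```shell
#!/usr/bin/env bash
# Hypothetical path, used only to show what each expansion keeps.
file="faxes/11.pdf"

basename=${file##*/}      # remove the longest prefix matching "*/" -> 11.pdf
filename=${basename%%.*}  # remove the longest suffix matching ".*" -> 11

echo "$basename"
echo "$filename"

# %% is greedy: with more than one full stop, everything from the FIRST
# full stop onward is removed. A single % removes only the last extension.
multi="11.scan.pdf"
echo "${multi%%.*}"   # 11
echo "${multi%.*}"    # 11.scan
```

For the fax files this makes no difference, since names like 11.pdf only contain one full stop.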

Now that the pages are separate .pdfs, let's convert them to .tifs:

for file in $(ls -1|egrep -e '-.*\.pdf$'); do convert "$file" "${file%%.*}.tif" && rm "$file"; done

This finds any pdf with a hyphen in the name, and converts it to a tif, preserving the name before the file extension. Once that's done, remove the pdf (we don't need the separate pdf pages anymore).

sharkserver:faxes$ ls
10.tif 11-1.tif 11-2.tif 11-3.tif 11.pdf 6.tif 7.tif 8.tif 9.tif doc_data.txt
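
Since the intake already runs from cron, the two steps could probably be folded into one script. What follows is only a sketch under a few assumptions: the helper names pending_faxes and process_fax are invented here, the directory is passed in as an argument, and shell globs replace the ls | egrep listing so that unusual filenames can't be word-split. It is not what is running on the server.

```shell
#!/usr/bin/env bash
# Sketch: print the numbered PDFs in a directory that have no page-1 TIFF yet.
# pending_faxes is a hypothetical helper name.
pending_faxes() {
    local dir=$1 pdf name
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue              # the glob matched nothing
        name=${pdf##*/}                        # drop the directory part
        name=${name%.pdf}                      # drop the extension
        case $name in
            ''|*[!0-9]*) continue ;;           # skip names that are not all digits (e.g. 11-1)
        esac
        [ -e "$dir/$name-1.tif" ] && continue  # already split and converted
        printf '%s\n' "$name"
    done
}

# Sketch: burst one fax into pages and convert each page to TIFF, removing a
# page PDF only after convert succeeded. Needs pdftk and ImageMagick's convert.
# process_fax is a hypothetical helper name.
process_fax() {
    local dir=$1 name=$2 page
    pdftk "$dir/$name.pdf" burst output "$dir/$name-%d.pdf" || return
    for page in "$dir/$name"-*.pdf; do
        [ -e "$page" ] || continue
        convert "$page" "${page%.pdf}.tif" && rm "$page"
    done
}
```

Run from the faxes folder, it could be driven with something like: pending_faxes . | while read -r name; do process_fax . "$name"; done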

Last modified on Thursday, 12 April 2018 21:16