Curious case of combining PDFs
The case:
There were quite a few chapters of some study material, in files named chapter1.pdf, chapter2.pdf ... chapterN.pdf. The URL pattern was easy to crack, so looping over the files with wget was simple; I just had to do something like:
url=myurl
for file in chapter{1..100}; do
    wget "$url/$file.pdf"
done
The bad cat and good cat:
As a GNU/Linux lover, I always find cat handy. The issue is that PDF, unlike PS, cannot be concatenated this way: cat 1.pdf 2.pdf > 12.pdf will just give us 2.pdf. The well-known and common tool for combining PDFs is pdftk.
Tour with pdftk:
pdftk, a handy tool for manipulating PDFs, was not so handy in this case until an easy workaround was figured out.
The issue:
After wgetting them all, the directory has files like:
chapter10.pdf  chapter15.pdf  chapter1.pdf   chapter24.pdf  chapter5.pdf
chapter11.pdf  chapter16.pdf  chapter20.pdf  chapter25.pdf  chapter6.pdf
chapter12.pdf  chapter17.pdf  chapter21.pdf  chapter2.pdf   chapter7.pdf
chapter13.pdf  chapter18.pdf  chapter22.pdf  chapter3.pdf   chapter8.pdf
chapter14.pdf  chapter19.pdf  chapter23.pdf  chapter4.pdf   chapter9.pdf
As is clear, the ordering is not what is required, so
pdftk *.pdf cat output combined.pdf
would indeed mess up the chapter order!
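To see the order that pdftk would receive, the same glob can be expanded in a scratch directory. This is only a minimal sketch: the files are empty stand-ins created with touch, and LC_ALL=C is an assumption made here so that the sort is byte-wise and predictable (under other locales the exact position of names such as chapter1.pdf can differ, but the result is still not numeric order).

```shell
# Scratch directory with empty stand-in files (no real PDFs needed).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch chapter1.pdf chapter2.pdf chapter10.pdf chapter11.pdf

# Assumption: force byte-wise collation so the result is predictable.
LC_ALL=C
export LC_ALL

# The glob result is sorted as strings, not as numbers.
set -- chapter*.pdf
glob_order="$*"
echo "$glob_order"
# chapter1.pdf chapter10.pdf chapter11.pdf chapter2.pdf
```

chapter2.pdf lands after chapter11.pdf, which is exactly the order pdftk would merge in.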
The workaround:
Trial 1: Numbers padded with zeros.
An easy and straightforward way to fix this mess seemed to be padding the numbers with zeros:
for name in c*.pdf; do
    # strip everything that is not a digit, leaving the chapter number
    num=${name//[![:digit:]]/}
    # zero-pad the number to three digits
    newname=$(printf "C%03i.pdf" "$num")
    echo mv "${name}" "${newname}"
    mv "${name}" "${newname}"
done
And then do a
pdftk *.pdf cat output combined.pdf
This was indeed a roundabout and unnecessary way of resolving the issue! That was only realized in trial 2.
Trial 2: Bash globbing saves the day
Using globbing, it was very easy to reduce the whole exercise to a single line:
pdftk chapter[0-9].pdf chapter[1-9][0-9].pdf cat output mixed.pdf
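The same idea stretches to three-digit chapter numbers by adding one more pattern. The following is a sketch under a few assumptions: the files follow the chapterN.pdf naming with no leading zeros, bash is the shell (arrays and the nullglob option are bash features), and the pdftk command is only echoed as a dry run rather than executed. With nullglob set, a pattern that matches nothing expands to nothing instead of being passed to pdftk literally.

```shell
# Scratch directory with empty stand-in files (no real PDFs needed).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch chapter1.pdf chapter2.pdf chapter10.pdf chapter100.pdf

# Unmatched patterns expand to nothing (bash-specific option).
shopt -s nullglob

# Each pattern is expanded separately, in the order written:
# one-digit names first, then two-digit, then three-digit.
files=(chapter[0-9].pdf chapter[1-9][0-9].pdf chapter[1-9][0-9][0-9].pdf)

# Dry run: build and print the command instead of running pdftk.
merge_cmd="pdftk ${files[*]} cat output combined.pdf"
echo "$merge_cmd"
# pdftk chapter1.pdf chapter2.pdf chapter10.pdf chapter100.pdf cat output combined.pdf
```

Building the list in an array first also makes it easy to inspect the order before handing it to pdftk.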
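A deeper look into trial 2
c[0-9].pdf expands to the existing files that match that pattern. Each pattern on the command line is expanded separately, in the order written, so the single-digit chapters come before the two-digit ones. Something like c{1..10}, on the other hand, just generates 10 words; it does not try to match them against filenames.
To make it clearer, here is an example: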
# touch c1.pdf c4.pdf c12.pdf
# echo c[0-9].pdf c[1-9][0-9].pdf
output: c1.pdf c4.pdf c12.pdf
# echo c*.pdf
output: c1.pdf c12.pdf c4.pdf
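The difference between brace expansion and globbing can be checked in the same way. This is a small sketch assuming bash and the same three stand-in files; the range 1..5 is picked only for illustration.

```shell
# Scratch directory with the same three stand-in files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch c1.pdf c4.pdf c12.pdf

# Brace expansion is pure text generation: it never looks at the filesystem.
brace_words=$(echo c{1..5}.pdf)

# A glob returns only the names that actually exist.
glob_words=$(echo c[0-9].pdf)

echo "brace: $brace_words"
echo "glob:  $glob_words"
# brace: c1.pdf c2.pdf c3.pdf c4.pdf c5.pdf
# glob:  c1.pdf c4.pdf
```

Passing the brace form to pdftk would therefore name files such as c2.pdf that do not exist, while the glob form lists only what is on disk.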
So this ends the case of combining PDFs; please do share your experiences below!