sudo apt-get install imagemagick
convert *.jpg pictures.pdf
convert img{0..19}.jpg slides.pdf
# joining multiple images into one pdf
convert page1.jpg page2.jpg file.pdf
# NOTE: +compress turns off compression, so the resulting PDF will be big!
# According to ubuntuforums.org, +compress helps keep convert from hanging.
convert page1.jpg page2.jpg +compress file.pdf
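One thing the two glob forms above do not make obvious: `*.jpg` expands in the shell's lexical order, so `img10.jpg` lands before `img2.jpg`, while `img{0..19}.jpg` expands in numeric order. A minimal sketch for getting numeric page order from a directory listing, assuming GNU `sort` (for `-V`) and filenames without newlines; `natural_jpgs` is a hypothetical helper name, not part of ImageMagick:

```shell
# natural_jpgs [DIR]: print the .jpg files in DIR (default: current directory)
# in natural, numeric-aware order, one path per line.
# Hypothetical helper; assumes GNU sort and filenames without newlines.
natural_jpgs() {
    dir="${1:-.}"
    # sort -V ("version sort") orders img2.jpg before img10.jpg.
    find "$dir" -maxdepth 1 -type f -name '*.jpg' | sort -V
}
```

In bash, the result could then be passed to convert with `mapfile -t pages < <(natural_jpgs .)` followed by `convert "${pages[@]}" pictures.pdf`.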
The `compress` option of `convert`
`+compress` actually disables all compression, leaving you with a PDF 10 times bigger than the original JPEG. Just don’t specify compression options, and convert will keep the input compression format (JPEG), which in this case is the best option for file size.
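To check the size difference on your own files rather than relying on the "10 times" figure, one could build the PDF both ways and compare byte counts. A small sketch; `size_ratio` is a hypothetical helper, and the file names in the comments are placeholders:

```shell
# size_ratio A B: print how many times larger file A is than file B
# (integer division, rounded down). Hypothetical helper for illustration.
size_ratio() {
    # wc -c reads from stdin, so it prints only the byte count.
    a=$(( $(wc -c < "$1") ))
    b=$(( $(wc -c < "$2") ))
    if [ "$b" -le 0 ]; then
        echo "size_ratio: $2 is empty" >&2
        return 1
    fi
    echo $(( a / b ))
}

# Usage sketch (requires ImageMagick; not run here):
#   convert page1.jpg page2.jpg plain.pdf
#   convert page1.jpg page2.jpg +compress big.pdf
#   size_ratio big.pdf plain.pdf
```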
PDFtk is a free graphical tool that can be used to split or merge PDF files. It is available in free and paid versions. You can use it in either CLI or GUI mode.
PDFsam can do a lot more than just merging: split, rotate, extract, split by bookmarks, and more. PDFsam is written in Java and (of course) available in most Linux distributions.
# Installation takes a while; the download is 100 MB+
sudo apt-get install pdfsam
After hitting the error below, I did not try any further. The environment was Mint 19.1 + OpenJDK 8.
Error: running pdfsam reports "Could not find or load main class java.se.ee"
$ /usr/lib/jvm/java-8-openjdk-amd64/bin/java -XX:+IgnoreUnrecognizedVMOptions --add-modules java.se.ee -Xmx512M -classpath /usr/share/pdfsam/lib/* -splash:/usr/share/pdfsam/resources/splash.gif -Dapp.name=pdfsam-basic -Dapp.pid=21474 -Dapp.home=/usr/share/pdfsam -Dbasedir=/usr/share/pdfsam -Dprism.lcdtext=false -Djdk.gtk.version=2 org.pdfsam.basic.App
Error: Could not find or load main class java.se.ee
gs - Ghostscript (PostScript and PDF language interpreter and previewer)
No separate installation needed; it ships with the system.
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf mine1.pdf mine2.pdf
# improved variant for low-resolution input PDFs (/prepress is Ghostscript's high-quality preset)
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged.pdf mine1.pdf mine2.pdf
# Both gs methods above give better resolution than convert:
convert -density 300x300 -quality 100 mine1.pdf mine2.pdf merged.pdf
# Trick to shrink PDFs: this reduced one 300 MB PDF to just 15 MB at an acceptable resolution, using plain Ghostscript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=output.pdf input.pdf
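The merge command can be wrapped so that a missing input is caught before Ghostscript runs. A sketch under assumptions: `merge_pdfs` and the `GS` override variable are names made up here, and the flags are exactly those of the first `gs` merge command above.

```shell
# merge_pdfs OUT.pdf IN.pdf [IN.pdf ...]: merge the inputs into OUT.pdf with Ghostscript.
# Hypothetical wrapper. Set GS to use a different binary (GS=echo gives a dry run).
merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUT.pdf IN.pdf [IN.pdf ...]" >&2
        return 2
    fi
    out="$1"
    shift
    # Fail early on a mistyped input instead of producing a partial merge.
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "merge_pdfs: no such file: $f" >&2
            return 1
        fi
    done
    "${GS:-gs}" -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite "-sOutputFile=$out" "$@"
}
```

Running `GS=echo merge_pdfs merged.pdf mine1.pdf mine2.pdf` prints the arguments that would be passed to `gs`, which is a cheap way to review the command before running it for real.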
Open a JPG or PNG file in LibreOffice Writer and export it as PDF.
OCR
# add an OCRed text layer that doesn't change the quality of the scan in the pdfs so they can be searchable:
pypdfocr combined.pdf
# or,
ocrmypdf combined.pdf combined_ocr.pdf