PDF Scan File Size: What To Do About It.

Viorel PETCU - Sep 14 - Dev Community


In today's digital age, we're constantly creating, sharing, and storing documents and media. While the cost of storage has dropped and internet speeds have skyrocketed, there's a hidden cost we often overlook—our environmental impact. A wonderful resource for insights on how our digital activities affect the environment is The Green Web Foundation, particularly their calculators.

The Environmental Cost of Large Files

It might not be immediately obvious, but the size of the files we share and store contributes to global energy consumption. Data centers, which store everything from photos to PDFs, use massive amounts of electricity. Smaller files mean less data transferred, processed, and stored—leading to a direct reduction in energy usage and, ultimately, CO₂ emissions.

The 'Ubuntools' Docker Image

This is where my project, the ubuntools Docker image, comes in handy. Initially created as a basic toolkit for probing APIs and aggregating data, I expanded the project to include tools for compressing PDFs and media files. Why? Because reducing file size doesn’t just save space—it helps reduce the environmental impact of our digital lives.

Why Document and Media Compression Matters

Even though cloud storage seems limitless and fiber internet offers instant downloads, the carbon footprint of transferring large files still matters. Every megabyte of data requires energy to transmit and store. By compressing documents and media, we can make a small but meaningful contribution to minimizing our environmental impact. Here's an example of how I incorporated this idea into ubuntools:

I planned to send a magazine article to a contact of mine as an email attachment, since there was no online version I could simply link to. So, I fired up my flatbed scanner and scanned the three pages:

can't believe my eyes

To my surprise, at 300 dpi the result was a 1.77 MB PDF file. That would not even have fit on a standard 1.44 MB floppy disk back in the day. Unacceptable!

I planned to do some editing of the files anyway, using GIMP (for image corrections, cropping, fine rotation, etc.), so I told myself, "Once the shading is gone and the colors are uniform, the PDF size will surely reduce."

BTW, if you're interested in a tutorial on editing PDFs with GIMP (all free and open source software), leave a comment. If there's enough interest, I'll write up a tutorial on the top 10 things you need and how to accomplish them using GIMP.

But then I got an even bigger surprise. After rotation, color correction, and exporting the layers as pages of the PDF, I felt like this:

how I felt

Well... 💩 The file was now 11.4 MB—kind of going in the wrong direction!

So, I taught ubuntools some new tricks. Under the pdf-processing tag, you'll find a base Ubuntu Docker image with the following tools:

  • ghostscript
  • pdftk-java
  • poppler-utils
# start ubuntools in the directory where your big PDF files are
# (the quotes around $(pwd) keep paths with spaces from breaking the mount)
docker run -it --rm -v "$(pwd)":/work --workdir /work viorelpe/ubuntools:pdf-processing /bin/bash
# execute the following command
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=compressed.pdf original.pdf
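If you want to try presets other than /ebook, the same Ghostscript call can be wrapped in a small shell function. This is only a sketch: `compress_pdf` is a hypothetical helper name (ubuntools does not ship it), and it assumes `gs` is on the PATH, as it should be inside the pdf-processing container. The preset names are Ghostscript's standard -dPDFSETTINGS values.

```shell
# Sketch only: compress_pdf is a hypothetical helper, not part of ubuntools.
# Usage: compress_pdf INPUT.pdf OUTPUT.pdf [PRESET]
# PRESET is one of Ghostscript's -dPDFSETTINGS names (default: ebook):
#   screen   - smallest output, images downsampled to about 72 dpi
#   ebook    - medium quality, images downsampled to about 150 dpi
#   printer  - higher quality, images at about 300 dpi
#   prepress - like printer, tuned for print production
#   default  - Ghostscript's general-purpose settings
compress_pdf() {
  local input="$1" output="$2" preset="${3:-ebook}"

  # Reject anything that is not a known preset before calling gs.
  case "$preset" in
    screen|ebook|printer|prepress|default) ;;
    *) echo "unknown preset: $preset" >&2; return 2 ;;
  esac

  # Same flags as the command above; only the preset and file names vary.
  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
     "-dPDFSETTINGS=/$preset" \
     -dNOPAUSE -dQUIET -dBATCH \
     "-sOutputFile=$output" "$input"
}
```

For example, `compress_pdf original.pdf compressed.pdf screen` trades more quality for a smaller file, while `printer` keeps more detail at the cost of size.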

This command reduces the scanned document's size while keeping the quality high enough for email attachments or archiving (the /ebook preset downsamples images to about 150 dpi). How much did it reduce the size? It came down to 0.71 MB, which is a considerable improvement.
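To put a number on "considerable", the saving can be computed as a percentage. The sketch below uses two hypothetical helpers, `size_bytes` and `percent_saved` (names made up for illustration), and assumes `awk` and `wc` are available, which is true of a stock Ubuntu image.

```shell
# Sketch only: size_bytes and percent_saved are hypothetical helper names.

# Print the size of a file in bytes.
size_bytes() {
  wc -c < "$1" | tr -d '[:space:]'
}

# Print how much smaller the second size is than the first, as a
# percentage with one decimal place. Both sizes must use the same unit.
percent_saved() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}
```

With the sizes quoted in this post, `percent_saved 1.77 0.71` prints 59.9 (relative to the raw scan) and `percent_saved 11.4 0.71` prints 93.8 (relative to the GIMP export). On real files, something like `percent_saved "$(size_bytes original.pdf)" "$(size_bytes compressed.pdf)"` does the same job.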

Here’s the finished product and the original side-by-side:

compare uncompressed and compressed

Check the difference for yourself on GitHub.

Expanding with Media Compression

From here, it's easy to integrate other media compression utilities, such as FFmpeg, to reduce the size of videos and images. Combined with ubuntools, these tools make things really easy, because all you have to do is run two commands:

  1. Start ubuntools with the appropriate tools (via tag).
  2. Run a command and feed it your files.
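For video, step 2 could look roughly like the sketch below. It is an illustration under assumptions, not a documented ubuntools feature: `shrink_video` is a hypothetical helper name, the FFmpeg build is assumed to include the libx264 video encoder and the built-in aac audio encoder, and the CRF default of 28 is just a starting point to tune.

```shell
# Sketch only: shrink_video is a hypothetical helper, not part of ubuntools.
# Usage: shrink_video INPUT OUTPUT [CRF]
# CRF controls the quality/size trade-off for libx264: the scale runs
# from 0 to 51, higher values give smaller files and lower quality
# (libx264's own default is 23).
shrink_video() {
  local input="$1" output="$2" crf="${3:-28}"

  # -nostdin keeps ffmpeg from reading the terminal when run from scripts.
  ffmpeg -nostdin -i "$input" \
    -c:v libx264 -crf "$crf" -preset medium \
    -c:a aac -b:a 128k \
    "$output"
}
```

Inside a container that has FFmpeg installed, `shrink_video holiday.mov holiday.mp4` would re-encode the file with the default CRF; pass a third argument such as 23 if the result looks too soft.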

Conclusion: Think Small, Act Big

File size might seem trivial in an era of "unlimited" storage and bandwidth, but it's the small, cumulative actions that matter. Compress your files, shrink your media, and contribute to a greener future.

