Paperless Extractor

When I was designing ReceiptWallet, I was concerned about the database being corrupted and losing all my data. I decided that as a backup to the database, I’d store all the files as PDFs (later versions could store other file types, but for the most part people only stored PDFs) and embed all the metadata as PDF keywords that could be recovered. The keywords in the PDFs also served as a transport mechanism: you could send a PDF from ReceiptWallet to someone else with ReceiptWallet, and when they imported it, all the metadata would be filled out. Beyond serving as a backup of the data, the keywords were meant for the day when I would no longer use ReceiptWallet, as I didn’t want to keep supporting it.

While this original idea seemed pretty solid, writing the keywords for the PDFs proved to be somewhat problematic. At the time, Apple’s PDFKit wasn’t completely reliable and could crash or corrupt files when writing them out. Reluctantly, I added a switch to ReceiptWallet to turn this feature off (it was on by default). As a result, some PDFs may not have keyword data in them, and for those files the metadata would basically be lost.

As I wrote earlier this year, that day has come. In my transition away from Paperless (formerly ReceiptWallet), I manually dragged out all files from one of my libraries into the file system and organized them into folders by categories. I placed the folder (named Document Library) in iCloud Drive allowing me to access all the files from anywhere including on my phone. At one point in the past, I started looking at an iPhone version of ReceiptWallet that would sync files; it didn’t go anywhere. Now I finally have that feature!

While I moved a document library over to the file system, I left all my libraries for receipts (one for each year) in Paperless to deal with another day. A few weeks ago I was talking to my father about his transition away from Paperless (he likes to follow my lead sometimes) and I asked him whether he’d use a tool to extract all his files if he had one.

I spent a few hours that day working on a small Mac application that did just that, which gave me a chance to do a little SwiftUI and Swift concurrency. You drop a Paperless library onto the app’s main window, and it walks the library (the library is a bundle, which is just a folder with a file extension) looking for every PDF, then opens each file to find the date and merchant. It then creates folders by year and month for the files.
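
For anyone curious what that walk looks like, here is a minimal sketch of the idea using only Foundation. It is not the code from the app; the names `ReceiptInfo`, `pdfURLs`, `destinationFolder`, and `exportLibrary` are made up for illustration, and the step that actually reads the date and merchant out of each PDF is left as a closure you would supply (for example, with PDFKit). The output file naming is also an assumption of mine.

```swift
import Foundation

// Hypothetical record for the metadata recovered from one PDF.
struct ReceiptInfo {
    let date: Date
    let merchant: String
}

// Recursively collect every PDF inside the library bundle (which is just a folder).
func pdfURLs(under root: URL) -> [URL] {
    guard let enumerator = FileManager.default.enumerator(at: root, includingPropertiesForKeys: nil) else {
        return []
    }
    var found: [URL] = []
    for case let url as URL in enumerator where url.pathExtension.lowercased() == "pdf" {
        found.append(url)
    }
    return found
}

// Build <outputRoot>/<year>/<month> for a given date, with a zero-padded month.
func destinationFolder(for date: Date, in outputRoot: URL,
                       calendar: Calendar = Calendar(identifier: .gregorian)) -> URL {
    let parts = calendar.dateComponents([.year, .month], from: date)
    let year = String(parts.year ?? 0)
    let monthNumber = parts.month ?? 0
    let month = monthNumber < 10 ? "0\(monthNumber)" : String(monthNumber)
    return outputRoot
        .appendingPathComponent(year, isDirectory: true)
        .appendingPathComponent(month, isDirectory: true)
}

// Copy each PDF that has metadata into its year/month folder.
// `readInfo` is the caller's metadata reader; PDFs for which it returns nil are skipped.
func exportLibrary(at library: URL, to outputRoot: URL,
                   readInfo: (URL) -> ReceiptInfo?) throws -> [URL] {
    let fileManager = FileManager.default
    var copied: [URL] = []
    for pdf in pdfURLs(under: library) {
        guard let info = readInfo(pdf) else { continue }
        let folder = destinationFolder(for: info.date, in: outputRoot)
        try fileManager.createDirectory(at: folder, withIntermediateDirectories: true)
        // Assumed naming scheme: merchant followed by the original file name.
        let safeMerchant = info.merchant.replacingOccurrences(of: "/", with: "-")
        let target = folder.appendingPathComponent("\(safeMerchant) \(pdf.lastPathComponent)")
        if fileManager.fileExists(atPath: target.path) { continue }  // leave existing files alone
        try fileManager.copyItem(at: pdf, to: target)
        copied.append(target)
    }
    return copied
}
```

Passing the metadata reader in as a closure keeps the folder logic separate from however the PDFs are parsed, and copying rather than moving leaves the original library untouched if something goes wrong.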

I’ve made the source to this app freely available. It comes with no warranty and I will not make a binary of it available; the tool has worked fairly well for my father and me, but may have some quirks (at one point, extra PDFs without metadata were added to the Paperless libraries).

This app was made without using any source from ReceiptWallet or Paperless; the directory structure of a library can be viewed in the Finder by Control-clicking the library and choosing Show Package Contents.

I have no information on the future of Paperless and I’m switching away from it as the file system handles my needs better than it did when I wrote ReceiptWallet over 16 years ago.
