Unless you’ve been living under a rock for the last 5 years, you should have noticed the imminent extinction of paper statements from most of the consumer services. Whether we like it or not, e-bills are the way of the future.
One problem with e-bills is that they still need to be stored – preferably, in a way that makes them easy to find. One also cannot rely on the service owners to archive old bills, as they rarely keep them for longer than 1 year.
A typical process managing such bills involves multiple steps:
- Download the new bill from the website of the service provider.
- Rename the PDF file from an exciting automatic (e.g
8B96766CBC0A49E6.pdf) to a boring one (e.g.
- Store it so that one could find it when needed (ideally, not in a single folder holding thousands of such files).
About 3 years ago, I’ve written a relatively simple tool that attempts to automate the latter two of those 3 steps (automatically downloading is notoriously difficult, given different website organization for each provider and the fact that they love to tweak the website layout, frequently breaking the scraping procedure).
The tool is called
classify_bills and is available on GitHub. When run on a bunch on PDF files, it tries to figure out what account each file belongs to and what time period it is for. It then renames it appropriately and moves it into an appropriate destination.
The tool is configuration-driven, it basically runs off a bunch of regular expressions (hat tip to jwz), and configuring a new account is strictly speaking, not pleasant. Still, once configured, it eliminates the chore of laboriously doing those actions manually, bill by bill, month by month, making the process faster and pleasanter.