Tool: Backup File History

I keep coming back to Robin Sloan's article, "An app can be a home-cooked meal" as a particularly good description of the value that comes from having a hand in your world. As I am not a very good cook, yet I still want to eat satisfying dishes, I like to think of my home cooked apps through the analogy of a toolbox, instead: Most of them are things someone else created, but I put them together into a box I can keep nearby for when something needs doing. Sometimes they don't make a tool that does what you want, so you have to bend and glue one together on your own. My "Backup File History" script is one of those, and I've gotten proud enough of it to share it with you.

I grew up in a time of extremely rapid technology shifts, and over the lifetime of a human that means that I've lost a lot of information that I would really rather not have lost. All of the clever ideas I had when I was a teenager didn't seem like they were worth much after I moved out on my own and they were lost in the shuffle of changing formats, dying hard drives, recycled computers, and misguided deletions. But now that I've settled down into adulthood, it would be nice to have access to all those emails and notebooks and text files and sketches that seemed precious enough to transcribe at the time. Some of this could be (and should be) solved by moving things over to newer formats as people kept coming up with new ones, and by being more dedicated about keeping those files around, but there was another aspect that took me a few decades on my own to understand: I noticed that one thing I was sorely missing from today's all-digital, all-the-time approach was the depth that came from watching how my ideas evolved. Every program makes it easy to continue where you left off and keep going forward, but it's very hard to look back in the future and see how the files got to this point. Software developers deal with it using a version control system, and office workers deal with it (more clumsily) though the "revision" systems in some programs, but that got me thinking that it would be nice to have some way of seeing the trail of a file. How did it get here? What ideas did I abandon along the way? How do I approach making something? So I taped together some bits of Python to make that happen.

So I made a tool.

This is the entire script as of 2023-02-24. If you don't know Python (or don't care about the specifics), meet me down below.

import configparser, datetime, fnmatch, glob, logging, os, re, shutil

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(os.path.basename(__file__) if __name__ == '__main__' else __name__)

def iterateFileHistoryConfigs(path: str):
	"""Recursively find and yield '.keepFileHistory' files found under the given path."""
	for entry in os.scandir(path):
			if entry.is_dir(follow_symlinks=False):
					yield from iterateFileHistoryConfigs(entry.path)
			elif entry.name == '.keepFileHistory':
					logger.debug(f"Found a .keepFileHistory at {entry.path}")
					yield entry

def iterateFilesToArchive(path: str, configuration=None):
	"""Yield files which need to be archived in the given path."""

	# If we weren't given a configuration, load it from the directory
	if configuration == None:
		configFilePath = os.path.join(path, '.keepFileHistory')
		configuration = configparser.ConfigParser(allow_no_value=True)
		with open(configFilePath) as configFile:
			configuration.read_string("[FileHistory]\n" + configFile.read(), source=configFilePath)

	for entry in os.scandir(path):
			# Never keep file history for our own config file
			if entry.name == '.keepFileHistory':
				continue

			# Skip files that look like backups
			if re.search(r'\.\d{8}\.[\w\d]+$', entry.name) != None:
				continue

			# Follow directories if recursive
			# Since we're allowing no value, we have to check if recursive is anything but False, as no value will give recursive=None
			recursive = configuration.get('FileHistory', 'recursive', fallback=False)
			if entry.is_dir() and recursive != False:
				yield from iterateFilesToArchive(entry.path, configuration)
			elif entry.is_dir():
				continue

			# Skip non-files
			if not entry.is_file():
				logger.debug(f"{entry.path} wasn't a file")
				continue

			# Skip files that don't match the 'include' pattern
			includePatterns = configuration.get('FileHistory', 'include', fallback='*.*').split(', ')
			if not any(map(lambda includePattern: fnmatch.fnmatch(entry.path, includePattern), includePatterns)):
				logger.debug(f"{entry.path} didn't match the include patterns, {includePatterns}")
				continue

			# Skip files that match the 'exclude' pattern
			excludePatterns = configuration.get('FileHistory', 'exclude', fallback='').split(', ')
			if any(map(lambda excludePattern: fnmatch.fnmatch(entry.path, excludePattern), excludePatterns)):
				logger.debug(f"{entry.path} matched the exclude patterns, {excludePatterns}")
				continue

			# If this entry has been updated since the latest historic file, or doesn't have a historic file, yield it
			entryRoot, entryExtension = os.path.splitext(entry)
			historicFilenames = glob.glob(f'{entryRoot}.*{entryExtension}')
			if len(historicFilenames) == 0:
				logger.debug(f"{entry.path} didn't have a historic file")
				yield entry
			elif (match := re.search(r'\.(\d{4})(\d{2})(\d{2})' + entryExtension, max(historicFilenames))):
				latestHistoricFileDate = datetime.date(*map(lambda d: int(d), match.groups()))
				latestEntryDate = datetime.date.fromtimestamp(entry.stat().st_mtime)
				if latestEntryDate > latestHistoricFileDate:
					logger.info(f"{entry.path} has been updated since its latest historic file")
					yield entry
				else:
					logger.debug(f"{entry.path} has the same mtime as its latest historic file")

def createHistoricFile(fileToArchive: os.DirEntry):
	"""Create a historic copy of a file as '<filename>.<mdate>.<ext>'."""
	fileToArchiveDirAndRoot, fileToArchiveExt = os.path.splitext(fileToArchive)
	fileToArchiveMdate = datetime.date.fromtimestamp(fileToArchive.stat().st_mtime).strftime('%Y%m%d')
	historicFilePath = f'{fileToArchiveDirAndRoot}.{fileToArchiveMdate}{fileToArchiveExt}'

	logger.info(f"Copying {fileToArchive.path} to {historicFilePath}")
	shutil.copy2(fileToArchive, historicFilePath)

if __name__ == '__main__':
	for fileHistoryConfig in iterateFileHistoryConfigs('.'):
		for fileToArchive in iterateFilesToArchive(os.path.dirname(fileHistoryConfig)):
			createHistoricFile(fileToArchive)

That's roughly one and a half pages of code that does a pretty specific thing: Look for files that I want to "keep", and make a copy with the date in the file name. When I run the script, "Split keyboard cyberdeck notes.txt" becomes "Split keyboard cyberdeck notes.20230224.txt". It absolutely has limitations: It requires Python. It requires that I run it regularly. It only knows how to store one version per day. It doesn't do a lot of "safety" checks that more rigorously designed software would do. But this script has one huge benefit: It does exactly what I want, and it works exactly how I think it does. It's replicating the manual process of saving a file as a new version, but without needing to remember to do it, so I can't accidentally find myself two years down the line wishing I had saved those versions. This script also has another benefit that is even more important in the age of constant interruptions: You just set it up once and then run it forever. I don't have to remember commands, navigate an interface, type things in, or do much of anything. I just push a "Backup file history" button every day and it does it. Eventually I could make that run automatically when I start my computer, but it's a nice feeling to push the button and know that my tool worked.

The last thing I'll mention is that it keeps track of versions with a date instead of a version number. I did that because it connects those files with a part of my life, rather than just being Versioned Objects. I can see the timeline of my life and watch how my files changed in response to it by looking at the frequency of updates, the gaps in the dates, and what other files changed in the middle. Did I lose interest in a project, jump to another one, and then come back to this? Did I have a burst of excitement for something and then forget it existed? Did I go through a big move or meet a new person around that time? These are all much more fascinating to me than "an update was made", so my file histories have dates instead of version numbers.

Anyway, that's a shim that I made, and have been using for a few months now. It's proven itself to be pretty sturdy, so I think I'm ready to call it a full-fledged tool now. It's been far more useful than trying to find a hammer for this particular nail, in any case.