Filesystem annotation and things that suck. Also, procrastination.
Today was a ‘designated project day’. I’m supposed to do no client work - just product development.
I didn’t get much work done.
I seem to have spent the entire day doing admin. This is a good thing - it’d been piling up quite badly over the last few months. I found a forgotten credit card statement. I shuffled a lot of money between accounts. I still haven’t filed all of my invoices or done all of my phone calls. I still haven’t attacked the pile of dishes in the sink - there’s been literally nothing clean for about three days now.
I need a personal assistant or something.
I did bolt on a new feature to the planner - you can now set reminders on tasks. It’s bolted on in the most literal sense possible - I added a new column to my task tree and display the reminder date there. It makes absolutely no sense from a UI point of view, but it solves my immediate problem.
I did have the chance to look over the code again with fresh eyes, seeing as I haven’t touched it in weeks. I’m embarrassed to release it publicly. It works, but the design is so awful that I’d be afraid a future employer or client would see it. It looks an awful lot like a VB application I wrote back in high school. Not good.
I’m also having immediate problems with file synchronization (surprise, surprise) and am eager to start on that. Unison is sucking a whole lot less now that I’ve completed a synchronization (previously, it would crap out because it was taking too long) but it still sucks in fundamental ways, such as not being able to sync multiple directories at once, as one might commonly want to do, given we have lots of stuff.
Tracker is also sucking, but a lot of people have complained about that. Sorry, Jamie, it’s broken. ‘Churns disk constantly’ is not an acceptable characteristic for any piece of desktop software.
Google Desktop for Linux seemed to suck a lot less in the constant-disk-churning department, but it also did something nasty that made every file access slow. And my Thunderbird would just lock hard for thirty seconds at a time every few minutes. Probably while it was reindexing. Sigh.
This is a weak segue into a more general problem area that I’ve been thinking about, which is filesystem annotation. Essentially, what Tracker is doing is reading each file and creating some sort of index. Presumably, it also checksums each file so that it knows when it’s changed. It also sets inotify watches so that it gets a message when files change.
Well, so does a file synchronizer. I just want to know if a file has changed, which means checksumming it. If we were sensible and using filesystems like ZFS that actually know when disk corruption has occurred, I could just query some sort of checksum metadata in the filesystem, in a completely non-portable and non-futureproofed fashion. It would be fast, though. Rambling aside, I want to perform some actions when a file is changed, and I want to keep a little bit of data alongside each file.
Incremental backups (and rsync) have the same problem. You just want to know which files have changed. You don’t want to touch every file on the system. Making things even worse, Linux systems enable atime by default, which means that you’re writing to every file’s metadata every time you traverse the filesystem.
(You can’t just look at file modification times because a) lots of applications don’t change them, and b) they differ between systems whose clocks aren’t synced with ntpdate. It’s a good way to get your files out of sync without realizing it.)
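To make the "checksum, don't trust mtime" idea concrete, here is a minimal Python sketch of what every one of these tools ends up doing today: walk the tree, hash every file, and compare against the previous run. The function names (`file_digest`, `scan`, `diff`) and the choice of SHA-256 are my own assumptions for illustration, not how Tracker, Unison or rsync actually do it.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root):
    """Walk root and return {relative_path: digest} for every regular file."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def diff(old, new):
    """Compare two manifests; return (added, removed, changed) as sorted lists."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and old[p] != new[p])
    return added, removed, changed
```

Note that `scan` reads every byte of every file, which is exactly the full traversal this post is complaining about. It catches edits that leave the modification time untouched, but it is the slow path.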
And every one of these applications has the attribute that you want to traverse the filesystem as often as possible, either for reliability or accuracy.
Synergies are brewing here, I tells ya.
Basically, traversing a filesystem is dumb. But we have no better mechanism right now. We want to add file metadata after the fact and the filesystem designer has no way to know what you might want to add and when you might want to add it.
Because of this fundamental problem, I am without a good backup system. I am without file indexing and I am without file synchronization. Multi-gigahertz quad-core processors don’t help with any of that.
How do we fix this, then? (I’m more or less out of premeditated content here and into the exploratory). I think that if each of these applications could get a list of files that have been modified and the datetime on which they were modified, that would help a lot. I assume something like this has to be baked into either the kernel or libc; maybe Windows has an API for this already. I do recall Google Desktop sucking a lot less on there. The list would need to be expired at some point; you can’t track every modification forever or you’ll run out of RAM. Filesystems can probably do this much more efficiently as well, given they already track modification time. And you still might have to do a full rescan in case something modifies the filesystem while your program isn’t running.
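As a rough sketch of that "list of modified files that eventually expires" idea, here is a toy in-memory change log in Python. Everything in it (the `ChangeLog` class, the sequence-number cursor, the `max_entries` limit) is hypothetical and only meant to show the shape of the API I'm imagining, including the case where a reader has fallen too far behind and has to do a full rescan.

```python
import time
from collections import deque


class ChangeLog:
    """Hypothetical bounded log of (sequence, path, timestamp) change records."""

    def __init__(self, max_entries=10000):
        # A deque with maxlen silently drops the oldest record when full,
        # which is the "expire the list so you don't run out of RAM" part.
        self._entries = deque(maxlen=max_entries)
        self._next_seq = 0

    def record(self, path, when=None):
        """Append a change record and return its sequence number."""
        seq = self._next_seq
        stamp = time.time() if when is None else when
        self._entries.append((seq, path, stamp))
        self._next_seq += 1
        return seq

    def since(self, cursor):
        """Return (records, new_cursor) for changes with sequence >= cursor.

        Returns (None, new_cursor) when records the caller has not seen have
        already expired, meaning only a full rescan can catch up.
        """
        oldest = self._entries[0][0] if self._entries else self._next_seq
        if cursor < oldest:
            return None, self._next_seq
        records = [e for e in self._entries if e[0] >= cursor]
        return records, self._next_seq
```

A backup tool or indexer would keep its cursor between runs, call `since(cursor)` when it wakes up, and fall back to a full traversal only when it gets `None` back.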
If I had my way, it’d be a filesystem option. Every block or inode is checksummed; the block checksum and modtime are stored side-by-side and there’s a fast way to query these checksums/modtimes. A full dump would probably be too slow (multi-gigabyte Maildirs seem to be a good way to show up performance problems in all sorts of software), so a list of recent changes would be good too. It has the RAM problem, however. Keeping a queue in RAM is more desirable than acting on them immediately, especially in these sorts of applications where timeliness is good, but overall system performance is more important. You do the processing while the machine is idle.
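The "queue it in RAM, process it when idle" part can be sketched in a few lines of Python. This is an assumption-heavy toy: `IdleQueue`, its `max_load` threshold and the injectable `load_fn` are names I made up, and using the one-minute load average from `os.getloadavg()` (Unix only) as the definition of "idle" is a guess at a reasonable heuristic rather than a tested one.

```python
import os
from collections import OrderedDict


class IdleQueue:
    """Hypothetical queue of changed paths, drained only while load is low."""

    def __init__(self, handler, max_load=1.0, load_fn=None):
        self._pending = OrderedDict()  # path -> None; repeat changes collapse
        self._handler = handler
        self._max_load = max_load
        # Default to the 1-minute load average; injectable for other heuristics.
        self._load_fn = load_fn or (lambda: os.getloadavg()[0])

    def add(self, path):
        """Queue a changed path, moving it to the back if already queued."""
        self._pending[path] = None
        self._pending.move_to_end(path)

    def drain(self, batch=50):
        """Process up to `batch` paths, stopping as soon as the machine is busy."""
        done = 0
        while self._pending and done < batch:
            if self._load_fn() > self._max_load:
                break
            path, _ = self._pending.popitem(last=False)  # oldest first
            self._handler(path)
            done += 1
        return done
```

The idea is that whatever delivers change notifications calls `add`, and a timer calls `drain` every so often. A file that changes ten times before the machine goes quiet is only processed once.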
Unfortunately, I’m at the limits of what idle speculation can gain me, and more to the point I have to solve these problems using the systems we have now, not some magic pixie-dust filesystem with freaking checksums (come on! seriously! how could every common filesystem in existence not checksum its data! do you not care about reliability at all?) So it looks like inotify, ionice and load detection for now. Undoubtedly, once I start building this stuff I’ll start to understand the problems better.
Back to the dishwashing.