Doing things right vs. doing things fast
Sunday, December 2nd, 2007
I’m encountering a very strong conflict between doing things right and doing things fast.
I want to get something working ASAP. The sync algorithm is difficult and too large to hold in my head at one time, so I want running code that shows me my theory is correct. This is the push for fast.
Many of the decisions I’m making now will hurt me later on. I’m already hitting a lot of points where the existing sync algorithms just won’t work when you’re feeding in live and/or disconnected updates. And that’s before I even consider multi-way syncing (e.g. laptop/desktop/server/work computer). And so, I have a big urge to do things right.
Psychologically, doing things right is not going to work for me. I don’t work like that. I need results now; I need to be able to look at something and think ‘yeah, that’s cool’. I don’t do well slogging away on something for long periods, especially given the technical risks this project currently carries.
I’m very much a cowboy coder: I’ll slap something together fast, and it’ll do the job, but it won’t be pretty. I get the biggest rush when something does its job for the first time. I get a smaller one from incrementally adding functionality or fixing bugs (I also have a strong perfectionist streak). But I find some tasks extremely tedious: GUI programming (it’s been done a thousand times, and it’s so hard to get right) and web programming come to mind. They seem pointless. They’ve been done before, they’re uninteresting, and there’s no rush from seeing them come alive for the first time. Building them is like drawing pictures on paper, except that doing it on a computer is painful. I’d rather just draw them on paper.
I tend to build up a lot of technical debt. This is why I like working for startups: they need stuff fast more than they need it right. Assuming the startup survives long enough, it will need stuff done right, but by then it has usually hired other programmers who can do that instead of me. My startup clients usually don’t even know what they want to build at the beginning, so coming up with a perfect all-encompassing design and full test suites is pointless. It’ll probably just get thrown away.
It’s the same situation for my MicroISV. I will need stuff done right, eventually. But my top priority is to get enough income coming in, and that means getting a product out the door ASAP. And because I don’t yet know how this particular product is going to work (I don’t completely understand the problem), there’s an even greater incentive to ignore all of the scary problems I can see looming: performance, cross-platform compatibility, correctness under weird conditions. I’ll just push on and get something working well enough.
Concrete examples
I’m using an SQLite database to store file metadata. I’m a big fan of SQLite: it’s just so easy to use and to embed. But it’s far more complicated than what I need, which is essentially a tree of modtimes keyed by filename. Oh, and with another dimension covering the ‘other machines’ data. Ordered by a journal id. Handling a million records. I don’t know of any relational database that will give acceptable performance under these conditions. It’s not an easy data structure to design in C, either, but I’m sure I can beat SQLite’s performance.
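To make the shape of that data concrete, here is a minimal sketch of how it might be laid out in SQLite using Python’s built-in sqlite3 module. The table name, the column names (`path`, `machine`, `mtime`, `journal_id`) and the helper functions are assumptions for illustration, not the project’s actual schema. The "other machines" dimension becomes part of the primary key, and the comparison between two machines is a self-join ordered by journal id.

```python
import sqlite3

# Hypothetical schema: one row per (path, machine) pair. The machine column
# is the extra "other machines" dimension; journal_id orders the changes.
SCHEMA = """
CREATE TABLE files (
    path       TEXT    NOT NULL,
    machine    TEXT    NOT NULL,
    mtime      REAL    NOT NULL,
    journal_id INTEGER NOT NULL,
    PRIMARY KEY (path, machine)
);
CREATE INDEX files_by_journal ON files (machine, journal_id);
"""


def open_db(path=":memory:"):
    """Open (or create) a metadata database with the hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, path, machine, mtime, journal_id):
    """Store what one machine knows about one file, replacing any old row."""
    conn.execute(
        "INSERT OR REPLACE INTO files (path, machine, mtime, journal_id) "
        "VALUES (?, ?, ?, ?)",
        (path, machine, mtime, journal_id),
    )


def differing_paths(conn, local, remote):
    """Return (path, local_mtime, remote_mtime) for paths whose modtimes
    differ between the two machines, in local journal order."""
    rows = conn.execute(
        """
        SELECT l.path, l.mtime, r.mtime
        FROM files AS l
        JOIN files AS r ON r.path = l.path AND r.machine = ?
        WHERE l.machine = ? AND l.mtime <> r.mtime
        ORDER BY l.journal_id
        """,
        (remote, local),
    )
    return rows.fetchall()
```

For example, after recording `a.txt` with the same modtime on `laptop` and `desktop` and `b.txt` with modtimes 200.0 and 150.0, `differing_paths(conn, "laptop", "desktop")` returns only the `b.txt` row. Because this is an inner join, a path known to only one machine drops out entirely; catching files created or deleted on one side would need an outer join, which is where the queries start to get ugly.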
Thing is, there’s no need to right now. I have some horrible table joins and ORDER BY clauses, but they do the job. I haven’t run into performance problems yet, probably because:
- I’m not doing live updates. I’m working under Unison conditions, which is ‘scan for changes at the same time on both sides and sync immediately’. This simplifies things tremendously.
- I’m not using large datasets. I have a few test directories with half a dozen files in each. Eventually, I want to be able to handle up to a million files (I have a Linux kernel tree and a Buildroot in my syncable data, for example).
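Here is a minimal sketch of why the Unison-style model is so much simpler, assuming both sides are local directories and that the modtimes recorded after the last successful sync are available as a plain dict. `scan` and `plan` are hypothetical helpers, not the project’s code. With both scans taken at the same moment, every path falls into one of three buckets: changed on the left, changed on the right, or changed on both (a conflict).

```python
import os


def scan(root):
    """Map each file path (relative to root) to its modification time."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = os.stat(full).st_mtime
    return result


def plan(last_sync, left, right):
    """Compare two fresh scans against the state saved after the last sync.

    Returns (path, action) pairs in path order, where action is
    'left->right', 'right->left' or 'conflict'. A path missing from one
    side counts as a change there, so deletions propagate too.
    """
    actions = []
    for path in sorted(set(last_sync) | set(left) | set(right)):
        old = last_sync.get(path)
        left_changed = left.get(path) != old
        right_changed = right.get(path) != old
        if left_changed and right_changed:
            # Simplification: identical edits on both sides still count
            # as a conflict here.
            actions.append((path, "conflict"))
        elif left_changed:
            actions.append((path, "left->right"))
        elif right_changed:
            actions.append((path, "right->left"))
    return actions
```

This only works because nothing changes between the scan and the sync. Once updates arrive live, or from a machine that has been disconnected for a week, "what changed since last time" stops being a simple three-way comparison.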
I’m implementing this in Python right now. Python lets me crank out ideas fairly quickly. Eventually, I expect I’ll have to move to C for performance reasons. The amount of work that the Python implementation does every time it touches a file is enormous - it has to create an object which contains an object for each sync peer, as well as a database query or two. There’s just no need to optimize yet. Premature optimization is the root of all evil, and all that.
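A rough sketch of that per-file work might look like the following. It assumes a `files` table with `path`, `machine`, `mtime` and `journal_id` columns; `PeerState`, `FileRecord` and `load_record` are hypothetical names used for illustration, not the project’s actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    """What one sync peer last reported about a file (hypothetical fields)."""
    peer: str
    mtime: float
    journal_id: int


@dataclass
class FileRecord:
    """Built afresh every time the sync code touches a file."""
    path: str
    peers: dict = field(default_factory=dict)  # peer name -> PeerState

    def newest_peer(self):
        """Return the name of the peer with the latest modtime, or None."""
        if not self.peers:
            return None
        return max(self.peers.values(), key=lambda s: s.mtime).peer


def load_record(conn, path):
    """One query plus one Python object per peer, for a single file."""
    record = FileRecord(path)
    rows = conn.execute(
        "SELECT machine, mtime, journal_id FROM files WHERE path = ?",
        (path,),
    )
    for machine, mtime, journal_id in rows:
        record.peers[machine] = PeerState(machine, mtime, journal_id)
    return record
```

For half a dozen test files this cost is invisible. Multiplied by a million files it is exactly the kind of overhead that would eventually push the core into C, but not yet.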