Doing things right vs. doing things fast

December 2nd, 2007

I’m encountering a very strong conflict between doing things right and doing things fast.

I want to get something working ASAP. The sync algorithm is difficult and too large to hold in my head at one time. I want to make sure all of my theory is correct. This is the push for fast.

Many of the decisions I’m making now will hurt me later on. I’m already hitting a lot of points where the existing sync algorithms just won’t work when you’re feeding in live and/or disconnected updates. And that’s before I even consider multi-way syncing (e.g. laptop/desktop/server/work computer). And so, I have a big urge to do things right.

Psychologically, doing things right is not going to work for me. I don’t work like that. I need results now and I need to think yeah, that’s cool. I don’t do well slogging away on something for long periods, especially with the technical risks that exist at the moment in this project.

I’m very much a cowboy coder: I’ll slap something together fast, and it’ll do the job, but it won’t be pretty. I get the biggest rush when something does its job for the first time. There’s a little bit that comes from incrementally adding functionality or fixing bugs - I also have a strong perfectionist streak. But I find some tasks extremely tedious - GUI programming (it’s been done a thousand times, and it’s so hard to get right) and web programming come to mind. They seem pointless - they’ve been done before, they’re uninteresting, there’s no rush that comes from seeing them come alive for the first time. They’re analogous to just drawing things on paper, and doing so on a computer is painful. I’d rather just draw them on paper.

I will tend to build up a lot of technical debt. This is why I like working for startups: they need stuff fast more urgently than they need it right. Assuming the startup survives long enough it will need stuff done right, but by that time they’ve usually hired other programmers that can do that instead of me. Usually my startup clients haven’t even known what they want to build in the beginning, so coming up with a perfect all-encompassing design and full test suites is pointless. It’ll probably just get thrown away.

It’s the same situation for my MicroISV. I will need stuff done right - eventually. But my top priority is to get enough income coming in, and that means getting a product out the door ASAP. And because I don’t yet know how this particular product is going to work - I don’t completely understand the problem - there’s an even greater incentive to ignore all of these scary problems that I can see looming (performance, cross-platform compatibility, correctness under wierd conditions). I’ll just push on and get something working well enough.

Concrete examples

I’m using an SQLite database to store file metadata. I’m a big fan of SQLite - it’s just so easy to use and implement. But it’s very complicated for what I want, which is essentially a tree of modtimes using filenames as a primary key. Oh, and with another dimension covering the ‘other machines’ data. Ordered by a journal id. Handling a million records. I don’t know of any relational database that is going to give acceptable performance under these conditions. It’s not an easy data structure to design in C, either, but I’m sure I can beat SQLite’s performance.

Thing is, there’s no need to right now. I have some horrible table joins and ORDER BY clauses, but they do the job. I haven’t run into performance problems yet, probably because:

  • I’m not doing live updates. I’m working under Unison conditions, which is ’scan for changes at the same time on both sides and sync immediately’. This simplifies things tremendously.
  • I’m not using large datasets. I have a few test directories with half a dozen files in each. Eventually, I want to be able to handle up to a million files (I have a Linux kernel tree and a Buildroot in my syncable data, for example).

I’m implementing this in Python right now. Python lets me crank out ideas fairly quickly. Eventually, I expect I’ll have to move to C for performance reasons. The amount of work that the Python implementation does every time it touches a file is enormous - it has to create an object which contains an object for each sync peer, as well as a database query or two. There’s just no need to optimize yet. Premature optimization is the root of all evil, and all that.

File synchronization algorithms: part 3 of monkey

November 26th, 2007

Algorithm 2 (a.k.a. ‘what everyone is using’) has some shortcomings:

  • Detecting changes takes a long time
  • It won’t detect renames or directory moves
  • There are still some cases where you need to resolve conflicts and/or merge files

There are also some usability issues:

  • You need to manually initiate a sync. You can’t just pick up your laptop and go anytime.
  • Performance sucks. I may have mentioned that a dozen or so times.
  • There’s nothing to stop you modifying a file on both sides; you have to remember which is the most recent and remember to sync before working on the other machine.

Here’s how I’ll fix these problems.

Constantly monitor for changes

The existing tools require you to manually initiate a sync, at which point you’ll have a few minutes of disk grinding. I’d rather have the program running constantly and being notified of changes as they happen. The common case is that only a few files will change between syncs - reading all of the files is inefficient.

I covered this in filesystem annotation. What I want is an API that notifies me when files change (or are created or deleted). I think inotify will do this, perhaps FAM. I have no idea what to use on Windows or OSX yet. On a technical level, this is an unsolved problem.

There is a risk here that if files are modified while the application is not running (and hence not receiving notifications) the modifications could be lost.

The fallback option is to scan the file trees while the machine is idle. If you’re checksumming files to detect changes, this can happen during idle time as well.

I think idle time is a grossly underutilized resource right now - we could be doing virus scanning, file indexing, backups and the like constantly instead of at intervals (3am cronjob) or while the user is trying to use the system (like most on-demand virus scanners).

Constantly synchronize changes

If you’re going to scan all of the time, you might as well copy files straight away rather than waiting until the user requests a sync. This will cut down the odds of a merge conflict somewhat, since the files are less likely to be modified simultaneously on both sides. This introduces the idea of a pair of machines being connected; while they are connected, their files are always synchronized. Since you’re probably modifying small amounts of data at a time, this will work reasonably well over a slow network connection.

Lock in-use files

Another way to prevent merge conflicts is to lock a file on machine A if it’s being written to on machine B. This prevents an application on machine A from modifying it at the same time.

Identify machines by a UUID rather than IP address

A common situation is to have a laptop and a desktop that you want synchronized together. You have the laptop at home and sync the files. You take the laptop to work, but because you’re on a different IP the sync program thinks it’s a different machine. If you give each machine a UUID or name, you can be (reasonably) sure of its identity and hence use the right indexes or file trees.

Checksum files or their metadata in order to detect renames

If you’ve got a checksum of each file (or just the modification time and size) and you detect a file deletion, you can look through any new files and see if they’re actually the same file. You can then infer that a file was moved or renamed rather than deleted and a new file created, saving time and bandwidth during the synchronization. It may be possible to optimize this further by looking at inode numbers or their equivalent on whatever filesystem is in use.

File synchronization algorithms, part 2 of elephant

November 21st, 2007

Don’t think about elephants.

The next refinement is to store history data when you look at the file trees. Every time you perform a synchronization you record some metadata for each file. You want to store the filename and the modification time. That way, when you do the next synchronization, you look at what changed between time X and time Y and apply those changes to the remote file tree, somewhat like generating a diff and then patching a tree. You do this twice - once for each direction (A to B and B to A). You can get conflicts, of course.

Conceptually, this looks like:

Compare this with the algorithm from part 1, which looks like this:

Note that if you have no history data, Algorithm 2 works exactly like Algorithm 1. Badly.

This all operates much like a version control system and has similar problems and implications. A VCS usually can’t detect renames of files or directories - you have to explicitly tell the VCS what you’ve done. When you want to perform a synchronization you have to traverse the entire directory tree to find out what’s changed - and this can be very time-consuming. The metadata has to be stored somewhere. Merges almost always require manual intervention and will often be unresolvable (either the user won’t know what to do and will just overwrite one side, or the file format won’t support lines-of-text style merging).

Also note the similar distinction between traditional client-server VCS (e.g. CVS, Perforce) and modern distributed VCS (Mercurial, git). Client-server VCS and propagates the nodes (or the actual files being worked on). Distributed VCS propagates the edges (or diffs). Algorithm 1 is looking purely at the file data and attempting to match it on both sides; algorithm 2 is looking at the changes between the ’sync points’ (or nodes) and propagating the changes.

The actions table for each file looks something like:

File A change File B change Action
 
Created (checksum P) Created (checksum P) Nothing
Created (checksum P) Created (checksum Q) Merge
Deleted No change Delete
Deleted Deleted Nothing
No change No change Nothing
Modified No change Use file A
Modified Modified Merge

(The actions for File A and File B can be interchanged - I didn’t feel like writing out those cases twice.)

If you include the possibility of renames (and horror of horrors, renames with modifies) then you can get a whole lot more combinations and it gets really nasty. I must give kudos to SourceGear for Vault for this: it does handle all of those nasty cases, a headache which I can do without.

Detecting what’s happened between time X and time Y is similarly mechanical. For a given file:

Time X Time Y Change
Does not exist Exists Created
Exists Does not exist Deleted
Checksum P Checksum P Nothing
Checksum P Checksum Q Modified

Without having looked at the source code, I’d say this is the algorithm that Unison uses. I’d also guess that most ‘proper’ synchronization programs use this. It’s the simplest thing that works in most cases.

Note that you also need to be able to reliably detect a change in a file. The (almost) infallible way to do this is to hash the file. I say almost because hash collisions do happen - they’re just extremely rare. ‘Extremely rare’ becomes a lot more common when you’re talking about a million files (32 bits of hash is not enough).

The other option is to look at the modification time of the file. Software can and does manipulate the modtime, however, and you might miss changes. Users might change the system time and confuse your sync program (if a change was made a long time ago). You might not be syncing to a device that has a real-time clock (some mobile phones, notably). You also have to sync the times between the two systems, but that’s not too hard.

Aaaanyway, the gist of it is:

  • Checksums: reliable, slow (you have to read the entire contents of every file)
  • Modification time: less reliable, much faster

Some filesystems such as JFFS2 keep a revision number on each block (roughly). If the revision number goes up, you can be assured that a write has happened regardless of what the modification time says. This is not a common feature, however, and probably not accessible to userspace programs anyway. There’s no easy solution here.

Next up: It still sucks. How to make it usable.

File synchronization algorithms, part 1 of lots

November 21st, 2007

“wherein I work out my design in public.”

You have two filesystem trees, A and B. You want the files on both sides to be the same.

Cases that you need to handle:

  • File exists on A but not on B (and vice-versa)
  • File exists on both and is identical
  • File exists on both and is different

Right about this point in time, you’re in trouble. (That was fast!) Only one of those situations can be handled automatically, and that’s if the file is identical on both sides. You need a lot of user input to figure out what the directories should look like, and users tend to say “too hard!” Unison assumes that if a file is present on one side and not on the other, it has just been created. So it copies it across. Already we’re in dangerous territory because this is frequently not what you want to do.

If the file exists and is different, you have to ask the user how to merge them or which one to pick. Asking regular users how to merge files is a bad idea. (Asking developers how to merge files is usually a bad idea.)

Sigh.

This algorithm is not going to work very well. It doesn’t handle any common cases, makes a lot of mistakes in its assumptions, and asks users too much information (which will probably be wrong anyway). Anyone using this algorithm in their synchronization product (*cough* Microsoft *cough*) is going to have a lousy product.

(Don’t get me wrong. I like Office. I like many Microsoft games. I’m not anti-Microsoft at all. It’s just Sturgeon’s Law: 90% of everything is crap.)

Unfortunately, this case is unavoidable on the very first synchronization of a pair of trees. We have no history data - even disconnected history data - and so cannot make informed decisions about what’s new, deleted or changed. The files just are or they are not and we can’t say which of the two trees is correct.

Read the thrilling (!) algorithm in Part Two for a possible solution (!!!11!)

Status report, 21 Nov 2007

November 21st, 2007

The day job is giving me serious grief, so very little project work is getting done. I’m stressed out and spending a lot of time resting. I do badly need a holiday, but if I wasn’t working in this environment I wouldn’t need a holiday nearly so much.

I’m doing a lot of thinking and planning for the file synchronizer - it’s my biggest ongoing computing problem right now - but not a lot of code is getting written. I’m also a bit nervous that I won’t have such a need for it in a few months and hence will lose interest in the project. I’d like to get something working ASAP, but the day job sucks up all of my energy. Without the day job I won’t need the synchronizer so much - so it’s a bit of a Catch-22.

I have Rentacoder and Elance bids out for some filesystem traversal code. I’m disappointed so far. I’ve been bid between USD$80 and $350. I sat down in front of the TV today, watched an episode of Daria and an episode of Futurama and had it done. Half of that was remembering that Python’s sqlite module doesn’t automatically commit to the database and wondering why there was no data in the database. Plus the hour I spent writing the spec and the hour I spent setting up accounts and the $10 Elance setup fee. There are a few bids which are attractive because the programmers obviously know their stuff, but I just can’t bring myself to spend $200 on an hour-long job which I’ve already finished.

There’s an obvious flipside here - if Rentacoder and Elance bids are so high, I can probably earn a bit of pocket money through them. Earning money by bidding on online jobs is very attractive to me. Of course, people go on there expecting to pay the coders crap. It might just be a matter of sorting through the bid requests and picking ones that I can do profitably and ignoring the requests with very low prices.

On the upside, the USD is in the shitter so stuff is cheap for me to buy. On the downside, one of my income sources pays in USD, so that’s suddenly become a lot less valuable.

I’ll be moving to Melbourne in January, right around linux.conf.au. I’m pretty excited. Rents are so much cheaper than Sydney, so it’s likely that I’ll be able to afford something with a nice view. Rent is still my biggest ongoing expense. Once I cut out expenses for crap cafe food at work and car fuel+maintenance (urgh! the damn thing keeps breaking!) it’ll account for 60% of my expenses. Living lean has its advantages.

Just thinking about the file synchronizer is useful at the moment, though. There’s a lot of unknowns that I haven’t worked out yet - it’s the sort of problem that can be described succinctly, but working out all of the intricacies of the solution (and a UI to suit) is fairly complicated. I’m torn between working out every nuance or just jumping in and getting something 80% working. The first milestone will be to write something that can replace Unison, and I’ve got that pretty well worked out. It’s just unsuitable for later features that I want to add (backups, Time Machine and flash-drive metadata exchange).

S&M (Sales & Marketing) and other things I swore I’d never do

November 13th, 2007

I used to say “I’ll never get involved with fundraising for a startup. I’ll write code, support users, write marketing material, even sell stuff. But I won’t go begging for cash.”

I’ve started begging for cash. I’ve been looking at government grants. This is just the latest in the a saga of Things I Swore I’d Never Do But Am Now Doing.

It’s a slippery slope. You start out as an elitist coder, laughing at those sales and marketing jocks and how useless they are. I mean, they speak in terms of synergies and paradigms! It’s the NEW PARADIGM that creates SYNERGIES to increase our ROI and improve RETURN RATES so we can INCREASE INVESTOR CONFIDENCE.

Oh yes. We did laugh. And now, such a statement makes perfect sense.

Why do coders start out with such a negative attitude towards sales and marketing? I recall my two big complaints being “they don’t do any real work!” and “how come they get commmissions and I don’t get squat?”

Of course, I’ve learned now. The work is just as real - it’s just in a different form.

There’s a big cultural difference. Sales, in particular, is about building relationships. Developers generally find this sort of approach disconcerting - I think we’re a distrustful bunch in general. So when sales guys come up and are all friendly for no particular reason, we’re sitting there thinking “What is this guy up to? Why is he being my friend?” The sales guy is just being a sales guy.

As for commissions, I think there are two factors here. Salespeople have always received commissions. That’s just how it is. Additionally, them reaching their goal is very easy to detect and measure - they get a purchase order (and preferably some cash). If I fix a dozen bugs, that doesn’t obviously result in increased revenues for the company.

This doesn’t work perfectly in the real world, of course. If programmers were paid by commission - say, they earn $5 for each bug that they fix - then they’d probably start writing a whole lot more bugs. Likewise, salespeople will aim to get just the incentive instead of looking at the bigger picture. They’ll start making crazy claims in order to get a sale at any cost because they’re not the ones who’ll have to make it happen. They’ll get the sale, but it’ll cost the company in the long run due to misallocated development resources.

I think for a second that maybe you could pay programmers for each external support call that they resolve - that way, they can’t just generate cash on their own. But then, they have no incentive to avoid the bugs in the first place - they’ll just ship them out, wait for the support calls to roll in and then ship out a fix. So you want to be paying them for not creating bugs - maybe start with a $500 fixed bonus and dock them $5 for each bug that comes up. Then you’ll see bugs not getting reported and some really cranky programmers. You’re now penalizing them for being imperfect. Most programmers I know already beat themselves up when a bug arises - docking their pay isn’t going to help at all.

I have seen sales commissions work against salespeople, however. It seems to be widely accepted that if you are going to get commissions you’ll also get paid less base salary. This is fine if you’re good at sales and not so fine if you’re bad at it. It also means that the quality of the product you’re selling and plain old luck has a big impact on your livelihood. Happen to find someone who already loves your product and wants to buy a thousand of them? Great! You didn’t have to do any work, but you profit just the same. Some doctor at a medical centre gets regular cash from me due to a referral she wrote a year ago - because she happened to be the first doctor available on the day. In this sense, the commission isn’t really earned.

So now I’m a developer who is familiar with the languages of sales and marketing and business. Just being able to understand what !developers are saying comes in handy, and they definitely notice. Being the optimist I am (hah!) I keep thinking “what’s the worst that could happen” with my business, and being able to communicate well with everyone in an organisation is one of the big plusses.

Filesystem annotation and things that suck. Also, procrastination.

November 7th, 2007

Today was a ‘designated project day’. I’m supposed to do no client work - just product development.

I didn’t get much work done.

I seem to have spent the entire day doing admin. This is a good thing - it’d been piling up quite badly over the last few months. I found a forgotten credit card statement. I shuffled a lot of money between accounts. I still haven’t filed all of my invoices or done all of my phone calls. I still haven’t attacked the pile of dishes in the sink - there’s been literally nothing clean for about three days now.

I need a personal assistant or something.

I did bolt on a new feature to the planner - you can now set reminders on tasks. It’s bolted on in the most literal sense possible - I added a new column to my task tree and display the reminder date there. It makes absolutely no sense from a UI point of view, but it solves my immediate problem.

I did have the chance to look over the code again with fresh eyes, seeing as I haven’t touched it in weeks. I’m embarrassed to release it publicly. It works, but the design is so awful that I’d be afraid a future employer or client would see it. It looks an awful lot like a VB application I wrote back in high school. Not good.

I’m also having immediate problems with file synchronization (surprise, surprise) and am eager to start on that. Unison is sucking a whole lot less now that I’ve completed a synchronization (previously, it would crap out because it was taking too long) but it still sucks in fundamental ways, such as not being able to sync multiple directories at once. As one might commonly do, given we have lots of stuff.

Tracker is also sucking, but a lot of people have complained about that. Sorry, Jamie, it’s broken. ‘Churns disk constantly’ is not an acceptable characteristic for any piece of desktop software.

Google Desktop for Linux seemed to suck a lot less in the constant-disk-churning department, but it also did something nasty that made every file access slow. And my Thunderbird would just lock hard for thirty seconds at a time every few minutes. Probably while it was reindexing. Sigh.

This is a weak segue into a more general problem area that I’ve been thinking about which is filesystem annotation. Essentially, what Tracker is doing is reading each file and creating some sort of index. Presumably, it also checksums each file so that it knows when it’s changed. It also sets inotify(s) so that it gets a message when files change.

Well, so does a file synchronizer. I just want to know if a file has changed, which means checksum. If we were sensible and using filesystems that actually knew when disk corruption had occurred like ZFS, I could just query some sort of checksum metadata in the filesystem, in a completely non-portable and non-futureproofed fashion. It would be fast, though. Rambling aside, I want to perform some actions when a file is changed, and I want to keep a little bit of data alongside each file.

Incremental backups (and rsync) have the same problem. You just want to know which files have changed. You don’t want to touch every file on the system. Making things even worse, Linux systems enable atime by default, which means that you’re writing to every file’s metadata every time you traverse the filesystem.

(You can’t just look at file modification times because a) lots of applications don’t change them, and b) they differ between non ntpdated systems. It’s a good way to get your files out of sync without realizing.)

And every one of these applications has the attribute that you want to traverse the filesystem as often as possible, either for reliability or accuracy.

Synergies are brewing here, I tells ya.

Basically, traversing a filesystem is dumb. But we have no better mechanism right now. We want to add file metadata after the fact and the filesystem designer has no way to know what you might want to add and when you might want to add it.

Because of this fundamental problem, I am without a good backup system. I am without file indexing and I am without file synchronization. Multi-gigahertz quad-core processors don’t help with any of that.

How do we fix this, then? (I’m more or less out of premeditated content here and into the exploratory). I think that if each of these applications could get a list of files that have been modified and the datetime on which they were modified, that would help a lot. I assume something like this has to be baked into either the kernel or libc; maybe Windows has an API for this already. I do recall Google Desktop sucking a lot less on there. The list would need to be expired at some point; you can’t track every modification forever or you’ll run out of RAM. Filesystems can probably do this much more efficiently as well, given they already track modification time. And you still might have to do a full rescan in case something modifies the filesystem while your program isn’t running.

If I had my way, it’d be a filesystem option. Every block or inode is checksummed; the block checksum and modtime are stored side-by-side and there’s a fast way to query these checksums/modtimes. A full dump would probably be too slow (multi-gigabyte Maildirs seem to be a good way to show up performance problems in all sorts of software), so a list list of recent changes would be good too. It has the RAM problem, however. Keeping a queue in RAM is more desirable than acting on them immediately, especially in these sort of applications where timeliness is good, but overall system performance is more important. You do the processing while the machine is idle.

Unfortunately, I’m at the limits of what idle speculation can gain me, and more to the point I have to solve these problems using the systems we have now, not some magic pixie-dust filesystem with freaking checksums (come on! seriously! how could every common filesystem in existence not checksum its data! do you not care about reliability at all?) So it looks like inotify, ionice and load detection for now. Undoubtedly, once I start building this stuff I’ll start to understand the problems better.

Back to the dishwashing.

File synchronizers suck! Guess what, I’m procrastinating!

November 2nd, 2007

Yup, I’ve been ’slacking off’ in the sense that I haven’t worked on any business projects. Saying I’ve been snowed under with client work sounds like an excuse - I’ll always be snowed under with client work. There’s more of it than I can handle. The market economist in my head says that I should increase my hourly rate, but so far that’s been met with very poor responses.

I went to the Sydney OpenCoffee meet yesterday and had a blast. Finally, entrepreneurial types in Sydney! There was a lot of talking about mentoring and quipped that I just needed someone to kick me in the ass and tell me to do some work. Seems I’m not the only one with that problem.

I’ve decided what project 3 will be. A file sychronizer. Yup, another one. The problem that has become sorely obvious in the last couple of days is that all current file synchronizers suck. They either:

  • use timestamps to detect changes. This isn’t particularly accurate.
  • use checksums to detect changes. This is accurate, but a synchronization takes forever if you have any reasonable amount of data. My typical ‘working set’ is on the order of 4 gigabytes across a few hundred thousand files. What I want is to just sync my entire home directory, and that’s around 40 gigs.

I tried out Unison and jfilesync. Unison seems to work alright but is very slow - slow enough that I haven’t actually bothered completing a sync yet. jfilesync craps itself if you have non-files in your directory (such as named pipes) or if you have directory loops (such as many build environments) or if you breathe on it wrong, so that’s out too.

I’ve been doing the incremental rsync backup thing for a while and like it. I’d like my synchronizer to be able to handle backups too.

I don’t know how I’d achieve both performance and accuracy, yet. Obviously, checksumming every file on the system is not acceptable on a regular basis. That’s the sort of task a filesystem should be doing. Scarily enough, ZFS is the only modern filesystem which performs end-to-end checksumming; I consider that something of an outrage considering the abundance of CPU power we have nowadays. I’m hoping something like FAM can tell me when files change without requiring me to poll them or install magic kernel modules. I’ll do kernel modules, but they’re a massive support burden.

Stepping back for a moment, what I really want isn’t file synchronization. I want my working environment to be wherever I am. Carrying a laptop and doing all of your work on there solves a good chunk of the problem, but laptops are physically limited in what they’re capable of.

Location awareness is a hassle; I’m forever changing resolutions and font sizes and performance settings depending on where I am. Software such as MarcoPolo solves some of this by helping the computer figure out where it’s located automatically.

I like Philip Greenspun’s idea for using a mobile phone as a computer; you take your phone, plug it in at one location and all of your stuff pops up. Programs run on the larger CPU at your location. When you unplug, everything is somehow still on the phone, you take it to another location, plug it in, and everything pops up again. The technology is part of the way there; what’s still lacking is probably storage space on the phone. It’s certainly solvable if you paid enough money.

An idea I’ve been kicking around for a while is ‘process hibernation’. When you hibernate a process, its memory image and external references (files, sockets, etc) are saved to disk. You can then take that disk image, stick it on another computer and fire it up again exactly as if it never stopped running. That would be convenient, especially if you could do it for an entire X or Windows session.

You can sort of do it with X forwarding, but that’s too slow over a network. You can sort of do it by running out of a VMWare session saved to removable storage, but that’s too slow as well. I’d rather just carry a laptop.

I really have to get some work done on the planner, too. My ‘next task’ for a while has been to get the Windows version to work using py2exe, but I might push that back to do a Reminders module and a Scheduler module, given I actually need them now and don’t personally need a Windows port for the foreseeabe future.

Instead of task tracking, how about a software development workbench?

October 24th, 2007

I haven’t actually done any work on these projects for a week or two, and I’m going to fall back on the old excuse: I’m too busy at work. The deal is that I finish a feature and then I can go on holiday. I desperately need a holiday. So I’m working hard on the feature.

I’m planning to be in Brisbane for BarCamp in December, Melbourne for linux.conf.au in January, then hopefully New Zealand to do some mountain biking in March. There. I said it. I can’t back out now.

I’ve been thinking about the products I have in mind, their size and how I develop them into things I could actually charge money for. I doubt that I could charge for the software itself - that wasn’t really my intent to begin with. It’s more add-on services, things like online sychronization and backup, multi-user versions, reporting and so on. Hook people with the free software and get them on the extras.

So, the planner.

I’m going to preface all of my insecure rambling with “even if I sell nothing, I still have some useful software for myself, so it’s still worthwhile”. Expensive, yes, but useful.

I’m having trouble imagining how to get people to actually use the thing. Not even pay for it, just use it. I get the impression that most people are satisfied with, well, nothing. They don’t track tasks. How they get any work done is beyond me. These people are easy to spot: when you say “did you fix XYZ bug from last week?” they’ll say “I forgot”. Of course you forgot! You didn’t write it down! Aaaargh! Then I start writing down their tasks in my task tracking system or filing bugs for them in a bug tracking system in the hopes that it’ll be remembered.

Some people use a text file or Outlook/Evolution’s todo lists. Some people use bugtracking systems as their task lists. I haven’t met many people like me that use hierarchical task lists. None that I’m aware of. Moving people to any sort of task tracking is a big improvement, so text files or TaDaList might be a good step. It’s hard to justify the overhead of a hierarchical task thingy.

Phrasing the hierarchical feature in terms for the customer, I’m thinking something like “You can drill down to see detail or pull back to see the big picture - in seconds!” It’s still awkward and not terribly compelling.

What if I change the focus from task tracking to project management? After all, Microsoft Project is probably the world’s most popular hierarchical task tracker. It’s a bit clunky to use for everyday stuff, but the idea is there. And if I put in time estimation features (even Evidence Based Scheduling - sorry, Joel!) I’m halfway there.

How about a ’software workbench’ or ‘development dashboard’? A central place to monitor all of the things happening in your software project. Bugs, schedules, requirements, tasks, reminders, releases, support calls, documentation, wikis, builds, costings, blah-de-blah. I’d start by having all of those external data sources (like bugs) tie in to the existing systems, but down the track I could build new ones as well (like Trac) - although the thought of building another bug tracking system isn’t exactly appealing. I’m thinking of an Outlook-style interface with the modules listed on the left-hand side, probably implemented as plugins. I do like my rich-client interfaces, but this could make more sense as a webapp. The FogBugz sales model could work - a license gets you the code to run on your server, or we provide a hosted service for a per-month fee. I see it (perhaps naiively) as a ‘playing the numbers’ strategy - one of those modules has to be critical to someone, and that gets the sale.

But then, what happens to my humble little task planner? Relegated to a small module in an all-singing, all-dancing Broadway production. It might be something you pull up once per project so you can build a Gantt chart. It’ll be lonely, and I’ll be sad.

Or I could just let the modules be separable from the main dashboard, free to live their independent lives without the emotional baggage of the others.

Planning the ideal working environment for my Micro-ISV

October 20th, 2007

Here’s what I want. Future employers, take note!

  • Air conditioning.
  • A couch to lie on when I’m stuck on a problem. This happens a lot, and relaxing is the best way to work through it.
  • People to talk to. I’ve done the work-from-home thing. It’s OK, but it gets lonely fast.
  • Lots of desk space. Three square metres is the absolute minimum. 6-9 would be ideal.
  • An absolutely silent environment. No computer hum. No people talking. No cars revving or police sirens.
  • My lovely LSK F6 MKII speakers and permission to run them as loud as I want. Headphones are tolerable if I must.
  • Several computers, at least one of them a laptop. Probably one Linux and one Windows. An OSX machine would be ideal, too.
  • Multiple large monitors. 24″ is a good starting point.
  • Natural light. But I should be able to control the amount of natural and artificial light.
  • Plentiful food and snacks within the workplace. If I’m in the middle of something and hungry, I don’t want to have to go outside and find something.
  • Good food nearby, both cheap takeaway and nicer places for entertaining.
  • Outside the office should be quiet and pleasant. No traffic or pollution.
  • I should have a nice big window with a view.
  • To get there, a 10-15 minute walk is ideal. Less means that I don’t have proper separation between home and work. More is annoying.
  • I shouldn’t need to mention high-speed, reliable, unrestricted Internet access.
  • A source of good coffee without having to go outside.
  • I’d consider a Wii or XBox or something, but I’d be likely to abuse it. A good bookshelf would be nice, though.
  • Ooh, and a pony.

Most of these desires are merely financial. I can buy a couch, espresso machine, have a well-stocked fridge, big monitors and so on. One advantage of working from home is that a lot of these desires are taken care of already - I have the espresso machine, couch, food and so on.
Some of the desires conflict with each other and will be a compromise. For example, it’s unlikely I can have a nice quiet area with lots of good food. Good food tends to attract people, and people are noisy. Google famously solves this by providing meals.
I use a 24″ Dell monitor at home and it’s great. I really only need one of these, plus a 19″ on the side for the secondary computers. Of course, if someone was to offer me a 30″, I wouldn’t be complaining…
There are a few options I’ve considered:

  • Work from home. The big problem here is distraction and loneliness. It’s nice to separate work and relaxation instead of having them all in the one place. There’s usually no-one else around, so you have to make an extra effort to socialize. It does have the major benefits of being cheap (don’t have to pay bills twice or buy two lots of stuff), convenient (roll out of bed and flick on the computer) and relatively nice.
  • Executive suite. This seems like a good compromise - a proper office, people around, hopefully a door that closes. It is expensive, though, and unlikely to be as well equipped as a home office. You could always put the money from an executive suite into your residential rent and get a really nice place to live.
  • Share office. Great for socializing; not so great for getting work done. Less expensive than the executive suite.
  • Work in cafes and restaurants. Probably the cheapest out-of-home option, so long as you can find a venue that’s happy to have you there for eight hours per day. You can’t have things like big monitors, though - you’re stuck with a single laptop.

I suspect that a combination of the above might work well. Right now I’m attending an office four days/week for my day job and working at home for another day or two. I get some of my socializing done at the day job and get a lot of work done when I work from home.