One of the things I've gotten out of one of my business parters in Vsched (that online employee scheduling software I'm working on), is a bunch of best practices relating to managing lots of data. One of these is using start and end dates on critical data, rather than actually deleting this data.
What my partner suggests doing is to use start_date
and end_date
fields containing effective-as-of and effective-until timestamps. Then, if you need to delete this data, you UPDATE
the end_date
field with the current timestamp instead of using a DELETE query.
It does make queries over this data more complex: you need to add another WHERE
predicate (e.g. start_date >= NOW() AND (end_date IS NULL OR end_date < NOW())
). So that might be an issue, but hopefully you're looking at best practices before you implement, and you can always worry about perf later, right ;)
This will provide your system a better audit trail. You want to know who did what to your customer data and when they did it. If you rely on the method described in my 5Ws of database design post, you'll lose that trail when you do your DELETE
. But if you just UPDATE
that row appropriately, you're safe.
This will also allow an undo feature for those nasty delete features users claim to want. I say "claim", because my users claim to want to delete their data. But I know as soon as I add the feature, I'll get a bunch of them accidentally deleting data. And what will they do when they delete it? They'll curse their bad luck. Then they'll email my tech support list. Then they'll send me an email. Then they'll call me. Right when I'm about to sit down with friends and a drink. So much for that evening. But wouldn't it be better if they could undelete their own data? Yes. It would be better.
Well, as a matter of fact, I've got such a delete feature. But I didn't implement it with DELETE
. I used the above end_date
scheme. And it works great. Except I forgot the front end :). So when that accidental delete happened, I still got the call. But I could check the end_date
and last_updated_by
fields. Long story short, I found those 300 deleted rows. And with a single UPDATE
query we were back in business.