Monday, December 10, 2007

Best Practices: End Dating instead of Deleting

One of the things I've gotten out of one of my business parters in Vsched (that online employee scheduling software I'm working on), is a bunch of best practices relating to managing lots of data. One of these is using start and end dates on critical data, rather than actually deleting this data.

What my partner suggests doing is to use start_date and end_date fields containing effective-as-of and effective-until timestamps. Then, if you need to delete this data, you UPDATE the end_date field with the current timestamp instead of using a DELETE query.

It does make queries over this data more complex: you need to add another WHERE predicate (e.g. start_date >= NOW() AND (end_date IS NULL OR end_date < NOW())). So that might be an issue, but hopefully you're looking at best practices before you implement, and you can always worry about perf later, right ;)

So why would you want to do this? Well here are a few arguments:

This will provide your system a better audit trail. You want to know who did what to your customer data and when they did it. If you rely on the method described in my 5Ws of database design post, you'll lose that trail when you do your DELETE. But if you just UPDATE that row appropriately, you're safe.

This will also allow an undo feature for those nasty delete features users claim to want. I say "claim", because my users claim to want to delete their data. But I know as soon as I add the feature, I'll get a bunch of them accidentally deleting data. And what will they do when they delete it? They'll curse their bad luck. Then they'll email my tech support list. Then they'll send me an email. Then they'll call me. Right when I'm about to sit down with friends and a drink. So much for that evening. But wouldn't it be better if they could undelete their own data? Yes. It would be better.

Well, as a matter of fact, I've got such a delete feature. But I didn't implement it with DELETE. I used the above end_date scheme. And it works great. Except I forgot the front end :). So when that accidental delete happened, I still got the call. But I could check the end_date and last_updated_by fields. Long story short, I found those 300 deleted rows. And with a single UPDATE query we were back in business.

Thursday, December 6, 2007

Best Practices: The 5 W's of Database Design

My company's hosted employee scheduling software started with a prototype, which worked pretty well, but just wasn't ready for prime time. So my partners and I revisited our design documents, rewrote large portions of the code base, and rebuilt the database, almost from scratch. It hurt, but it was necessary to create a production system. In doing this, we followed some best practices. One of them is implementing the 5 W's of database design: who, what, when, where, why.

Ok, maybe we didn't quite get five Ws out of that. We've got a logging and audit system to cover where and why. But we do have who, what, and when. And it's come in handy a few times.

Every table in our database has a few special fields:
  • Creation Time
  • Created By
  • Last Update Time
  • Last Updated By

These four fields are on every tuple, and our application logic updates them appropriately. Once a user logs into our system we've got a user id. Any time he or she updates a piece of data, we store that user id and the current time. We've got a couple of features in the work to provide a front-end for this kind of information. But it's also helpful for debugging and providing support.

Earlier this week I rolled out a beta feature. Shortly after the rollout, I got an email from one of my users. There was some unexpected data floating around the system. After getting the user's permission, I took a closer look.

What had happened is that the new beta feature, let's say it was an easter egg painter, had mis-painted some eggs. Eleven out of twelve eggs had correctly been painted blue. But that twelfth egg had turned red somehow.

After groaning—this is the kind of bug that takes a lot of investigation—I dug a little deeper. I checked out the creation date of the eleven blue eggs. All of them were from just a couple of hours earlier, as expected. But that twelfth, red egg had been created a week earlier.

So I sent my user a friendly reply to her panicked email. I asked her if she had happened to create any red easter eggs last week, without the automated painter. And if so, had she accidentally mixed her blue, automatically painted eggs in with those red ones. About fifteen minutes later I got a reply saying, "oops, my mistake. Thanks Nick!"

Best practice to the rescue. Problem solved, without writing a single line of code. Well maybe just a little SQL.

Wednesday, December 5, 2007

How to Protect Against Cross Site Request Forgery

When dealing with security, I try to stick to tried and trusted practices since security is such a delicate topic. I'm not making any claims about the scheme I describe here. I'm only opening up a discussion. One of the security issues I'd like to address is cross site request forgery (CSRF).

A CSRF is an attack where one site directs a user to another site in such a way that the second site thinks the request originated on a page from itself. To illustrate, suppose I put a link here with its href like so: http://www.example.com/​?c=DeleteAccount. If example.com isn't doing the right thing, and you click that link, then your account at example.com might be accidentally deleted. And the fact that example.com password protects your account won't necessarily help here if you're logged in in another window when you click the link. example.com has failed to adequately protect you.

So let me propose a scheme to address this vulnerability, and you can tell me what you think. Suppose example.com were to sign the request strings for the urls for its sensitive actions. That is, suppose instead of allowing the above url, it were to use this one: http://www.example.com/​?c=DeleteAccount​&k=SomeSig. Here SomeSig should be a signature of c=DeleteAccount (or perhaps the whole url).

Now a clever attacker would just have to get an account with example.com, find the delete account url, and grab the url (including the unforgeable signature). The problem has not changed at all. The attacker can just craft a forum post and wait for users to delete their accounts (or transfer funds to him/herself).

So let's ditch the signature and add an expiry to the url: http://www.example.com/​?c=DeleteAccount​&e=Soon. Here Soon is a timestamp after which you'd like to invalidate the url. Many sites log users out after ten or fifteen minutes, so pick something good inside of that. If you get an expired url, you can always have a warning that the url has expired and ask the user to click a new (similar, but updated) url. The idea is to force the user to understand what is about to happen.

Now if the attacker copies the url into a forum post the link will only be valid for some short time. Of course, the attacker can just update Soon to EndOfTime and we're back at square one.

But if we combine these two approaches (and add a nonce to make cracking the signature more difficult) we're a little bit better off: http://www.example.com/​?c=DeleteAccount​&e=Soon&n=Nonce​&k=SomeSig. Now we're signing the command and the expiry so that neither can be forged.

Of course, attackers can just keep going back for updated urls (or have a bot do it for them). But we've at least we've reduced the problem (unlike the previous two attempts).

The issue here is that we're continuing to trust an untrusted source. We have a trusted url (which can't be forged), but it only says, "delete account within my expiry". But what account should be deleted? We're assuming that we should delete the account of the user currently logged in. That makes some sense (we might not want to allow users to delete arbitrary accounts). But our unforgeable url makes no claims about which account to delete.

So let's have the url assert that too: http://www.example.com/​?c=DeleteAccount​&e=Soon​&n=Nonce​&u=UserID​&k=SomeSig. Now when you get this url, check that it's signed correctly, that it hasn't expired, and that the current user matches the user the url was created for. Our url asserts all of these things. And we can trust that all of these things are true, since the url comes with our own signature.

Again, I'm not making any claims that this "solves" the problem. This addresses some aspects of the problem. Feel free to correct any mistakes I've made; that's the point of this blog post. For instance, if an attacker does obtain this "unforgeable url", he or she can still embed it in a blog post and persuade a user to click the link within the expiry. At that point someone still loses an account. Ultimately a CSRF is still possible under this scheme. And there are probably some other weaknesses to the scheme as well. And I'd love to hear about them.

Anyway, I like this scheme so far. Mostly I'd like to use this scheme for 301 redirects after a form post-back with a confirmation: I'm trying to protect against forging the confirmation dialog. But the initial form post-back is just as vulnerable and should be protected also. What do you think?

Tuesday, December 4, 2007

Vsched.com Screenshots

I hope my last post whetted your appetite for the work I'm doing in online employee scheduling software. Now I want to show you a little bit of what I'm doing. So I've got a few screen shots to show off.

Of course, the heart of the system is a shift schedule manager. Above you can see a screen shot of part of a user's weekly schedule. I've covered up some of the info to protect the innocent, but you can see that different entries are color coded corresponding to their type (availability, preferences, work shifts, etc.). You can also see a variety of options available for a work shift, including finding a substitute to take the shift.

Here I'm adding a new unavailable time to my schedule. Our schedules support click and drag, just like any modern scheduling application.

Above you can see a nice summary of what kind of schedules are in place at the different locations and jobs. As you can imagine, work schedules need to change over the year, for example you might need more people to cover registers during the holidays. Also different locations will have different schedules.

This one is a new feature I'm working on right now to import users from a spreadsheet in CSV format. Loading employees into the system is one of the first things our customers do, so we want to make sure it's as easy as possible. You can also see a small bit of instruction in the green box. You'll find these throughout the system with helpful tips and reminders. Clicking on the question mark in the upper right (also available throughout the system) brings up some more in-depth context sensitive help.

We've also got some reports which give you aggregate information about your schedules. Here you can see who can take more hours on their schedule and how much of their schedule they actually wanted.

I'd better get back to work on that user import feature. I estimate around 1000 user clicks for the average setup without it! We're trying to make the lives of our customers easier. We don't want to replace one chore with another.

Monday, December 3, 2007

Introducing Vsched.com

I've been a bad, bad blogger for the last couple of months. But it was all for a good cause. I've been working (very hard) to get my new internet start up off the ground.

Vsched.com offers on-line employee scheduling software. We went live with our first customer, Cornell University Fitness Centers, over a month ago and everything has been running very smoothly since. They love the service, and we love having them. In the past couple of weeks and from here out we're bringing on-line other customers. So if you're interested, or know someone who schedules many employees, with many shifts, in different kinds of jobs, have them send us a note.

I'm going to try to blog more frequently about Vsched now that it's out. So let me begin by explaining exactly what it is we do.

Imagine that you've got dozens of employees with hundreds of shifts across different jobs and locations. Managing that many employees is a nightmare. You might be spending anywhere from 200 to 2000 staff hours a year doing scheduling. I know, I used to manage 90 student employees at Cornell. Creating shift schedules, keeping them up to date, and handling shift swaps is a real hassle for managers and employees alike. So my partners and I have put together an on-line application which automates these processes.

Managers can create and assign shifts with a click of a button, as well as get schedule reports and overviews. We make the right information available in all the right places, so you can find a substitute for a shift, or see a location or employee's weekly schedule with a single click. Employees can log in at any time from any where to update availabilities or schedule preferences, and to swap shifts. And the most current shift schedule is always available on-line. We even integrate with other calendaring applications such as Google Calendar to publish shift schedules to an employee's personal calendar.

One big feature I'm excited about is the automated scheduler. This kind of scheduling problem is very difficult. Computer scientists call this kind of problem "computationally infeasible". While I was a graduate student at Cornell University I spent a lot of time studying this kind of scheduling problem and came up with an algorithm that does a pretty darn good job. Cornell Fitness Centers tells me they anticipate cutting out 125 staff hours per schedule using the algorithm. So I'm pretty excited about that.

Another big feature I'm excited about is that the system is a hosted service, completely on-line. I was speaking with one customer who bought some boxed software over six months ago and still can't get his IT department to set it up for him. And I don't blame them. Maintaining a server, or client software is a hassle. With a hosted service there's no need for IT infrastructure; we take care of all the technical details.

Anyway, this is what I've been up to for the past month or so. I'm very pleased with the work my partners and I have done. I'm really looking forward to the next few months as we grow our customer base. I'll try and keep you posted on our progress.