Saturday, June 6, 2009

S3 Performance Benchmarks

Over the last couple of weeks we've been working with S3 to read data to power real-time user query processing. So we've made a lot of optimizations and measurements of the kinds of performance you can expect from S3.

S3 Throughput: 20-40MB/sec (per client IP)
20 MB/sec is the neighborhood for many small objects, with as much as 40 MB/sec for larger objects. We're pushing a lot of parallel transfers and range queries on the same objects. Each request is only pushing about 200 KB/sec. I don't think I've ever seen a single connection push more than 5 or 6 MB/sec. I'm assuming this is partly S3 traffic shaping. So this should scale well if you have lots of clients.
S3 Response Time: 180 ms
We're pulling from EC2 (just across the hall from S3?). We've seen response times range from 8 ms (just like a disk!) to as long as 7 or 8 seconds. But under 200 ms is quite reasonable to expect on average. We're pushing a lot of parallel requests (thousands per second across our cluster, with hundreds on individual machines).
Parallel Connections to S3: 20-30 or 120
20-30 roughly maximizes throughput, 120 roughly minimizes response time. S3 seems to do some kind of traffic shaping, so you want to transfer data in parallel. If you're hosting web assets (e.g. images) on S3 this is less of an issue, since your clients are widely distributed and will hit different data centers. But if you're serving complex client data requests by pulling data from S3 on just a few servers, you might be able to structure your app to download data in parallel. Do it with 20-30 parallel requests. More than that and you start getting diminishing returns. We happen to run more than that (perhaps as many as 100 per process, with as many as 1000 per machine) because we're focusing on response time rather than throughput.
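Here's a rough sketch of what parallel range downloads might look like in Python. It isn't our production code. The names `byte_ranges`, `parallel_fetch` and `fetch_range` are made up for illustration, and `fetch_range` stands in for whatever your S3 client does to issue a GET with a `Range: bytes=start-end` header. The default of 25 workers just sits in the 20-30 band mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(total_size, chunk_size):
    """Split [0, total_size) into inclusive (start, end) pairs,
    the form used by an HTTP 'Range: bytes=start-end' header."""
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]


def parallel_fetch(fetch_range, total_size, chunk_size, max_workers=25):
    """Fetch every range concurrently and reassemble the object in order.

    fetch_range(start, end) is a hypothetical callable that returns the
    bytes for one inclusive range (for example, a ranged GET against S3).
    """
    ranges = byte_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the chunks
        # come back in sequence even if they finish out of order.
        chunks = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(chunks)


# Stand-in for S3 so the sketch runs without a network: serve ranges
# out of an in-memory buffer.
data = bytes(range(256)) * 40


def fake_fetch(start, end):
    return data[start:end + 1]


result = parallel_fetch(fake_fetch, len(data), 1000)
```

Threads are fine here because the work is I/O-bound. If you care more about response time than throughput, as we do, you'd raise `max_workers` well past 30.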
S3 Retries: 1
We do see plenty of 500 or 503 errors from S3. If you haven't, just wait. We build retry logic into all our applications and typically see success with even just one or two retries with very short waits. I'd recommend exponential back-off (that's what the Amazon techs say in the forums). So if you're making more than one or two retries, start waiting a second, then two, then four, and so on. I'd bail and send yourself an email if you don't get a 200 OK after four or five retries and a minute of waiting. But maybe retry the first one right away; it'll work 9.9 times out of ten :)
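The retry policy above could be sketched like this in Python. Treat it as one possible reading, not the code we run: `TransientError`, `retry_delays` and `call_with_retries` are hypothetical names, `TransientError` stands in for a 500 or 503 response, and the exact schedule (first retry immediately, then 1, 2, 4, 8 seconds) and the cap of five retries are assumptions picked to match the description.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a 500 or 503 response from S3."""


def retry_delays(max_retries):
    """Seconds to wait before each retry: 0 (right away), then 1, 2, 4, ..."""
    return [0 if i == 0 else 2 ** (i - 1) for i in range(max_retries)]


def call_with_retries(request, max_retries=5, sleep=time.sleep):
    """Call request(); on TransientError, retry with exponential back-off.

    Re-raises the last TransientError once the retries are used up,
    which is the point where you'd send yourself that email.
    """
    last_error = None
    # None marks the initial attempt, which never waits.
    for delay in [None] + retry_delays(max_retries):
        if delay:
            sleep(delay)
        try:
            return request()
        except TransientError as err:
            last_error = err
    raise last_error


# Demo with a fake request that fails twice, then succeeds. The sleep
# function is replaced so the example records waits instead of pausing.
attempts = []
waits = []


def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("503 Service Unavailable")
    return "200 OK"


outcome = call_with_retries(flaky_request, sleep=waits.append)
```

In the demo the first retry happens immediately and the second one waits a second, so only one wait is recorded before the request succeeds.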

If you're getting different results, do let me know :)