Using AWS for research computing

Your own private computing cluster?

This is based upon my reply to a question on Reddit concerning experiences with using Amazon Web Services (including Elastic Compute Cloud, Glacier, etc.) for research.

I was part of a pan-European research consortium that used it for our shared computing infrastructure: databases, a few web apps and websites, mailers, and some ad hoc computation for big projects. Here's the distilled wisdom from that experience.

Pros

  • You pay for what you use. If you're not using something, you can switch it off.
  • You can reconfigure and set up stuff when and as needed, without having to wait for institutional IT, put up with ordering delays, navigate department policies and so on.
  • Cloning and copying disks and machines through a web interface was a dream. Need a new machine? Click-click. Need to make it bigger? Click-click. I set up some "base" configurations, which I used for building individualised systems.
  • Doing things through the command-line / scripting interface was very powerful. You could write scripts to duplicate and back up systems, to push data to a machine and reboot it, to monitor the state of everything that was running (see the first sketch after this list).
  • There's a lot of smart, useful stuff in AWS. I especially loved Elastic Beanstalk: being able to deploy and scale a piece of software (not a machine) without all the fiddly details of the hardware and network surrounding it. The storage stuff (S3 and Glacier) was also funky. You could run an entire website out of static (and cheap) S3 (see the second sketch below).
  • AWS is robust. You would hear horror stories of companies losing infrastructure and data in AWS, but these seemed to be the long tail of experiences; for us, AWS "just worked" around the clock. In any case, you should follow normal resilience procedures, just as with any infrastructure.
  • AWS has solid and easy security (at least as far as I could tell). You could readily restrict different points of entry to different IP ranges and stack these policies (see the third sketch below).
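
To give a flavour of the scripting, here's a minimal sketch of the sort of thing I mean, written in Python against the boto3 library (which post-dates some of our work; we scripted with earlier tooling). The region, instance ID and image name are hypothetical placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Clone a configured "base" machine into a reusable image (AMI).
    image = ec2.create_image(
        InstanceId="i-0123456789abcdef0",  # hypothetical base machine
        Name="base-database-server-v2",
        NoReboot=True,                     # don't shut it down to take the image
    )
    print("New image:", image["ImageId"])

    # Reboot a machine, e.g. after pushing new data onto it.
    ec2.reboot_instances(InstanceIds=["i-0123456789abcdef0"])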
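
The static-website trick is similarly small. A hedged sketch, again with boto3 and a hypothetical bucket name (a public-read bucket policy, omitted here, is also needed before pages are actually visible):

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")
    bucket = "my-research-project-site"  # hypothetical name

    # Create the bucket and switch on static website hosting.
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    )
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )

    # Upload a page; repeat for the rest of the site.
    s3.upload_file("index.html", bucket, "index.html",
                   ExtraArgs={"ContentType": "text/html"})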
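
And the IP restrictions boil down to stackable security group rules. A minimal sketch, assuming a hypothetical institutional address range:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # A security group that only admits SSH from one partner institute.
    sg = ec2.create_security_group(
        GroupName="consortium-ssh",
        Description="SSH access from partner institutes only",
    )
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            # hypothetical institutional range; add one entry per partner
            "IpRanges": [{"CidrIp": "192.0.2.0/24",
                          "Description": "Institute A"}],
        }],
    )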

Cons

  • Universities and research centres are still struggling to cope with pay-as-you-go computation. They're much more comfortable paying lump sums for hardware and depreciating it.
  • Costs are a little unpredictable, especially data transfer costs, which are downright opaque. Don't forget to monitor what you're running, or you could be surprised by costs from forgotten machines idling along (see the first sketch after this list).
  • There's the occasional issue where institutional IT won't let your external (AWS) system talk to an internal resource because of policy or security. Which I can understand: research IT is a big enough mess without increasing the complexity of the task.
  • It's an evolving platform, so things change. (In practice, changes very rarely broke anything; mostly they made things easier.)
  • There's the occasional thing that is difficult or impossible under AWS. For example, I remember mailing being a problem because every legitimate sender and recipient address had to be specified in advance. (Amazon is rightly concerned about being used by spammers as a disposable mailhost.) I got around it, but it was extra work (see the second sketch after this list).
  • There's still a lot of technical detail and sysadmin work to do: setting up firewall rules, configuring machines, networking and so on. And since this is your infrastructure and nothing to do with the research institute, there's no help to be found. And since this is research, you're probably a postdoc or PhD student and not an IT professional, right? AWS makes it easier to set up and run a research computing infrastructure, but it is by no means a simple task.
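
On the cost point, the "forgotten machine" check is worth scripting. A small sketch of the idea in boto3 (the region and the "Name" tag are assumptions): list every running instance with its launch time, so nothing idles along unnoticed:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # List every running instance with its name tag and launch time.
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            name = next((t["Value"] for t in inst.get("Tags", [])
                         if t["Key"] == "Name"), "(unnamed)")
            print(inst["InstanceId"], name, "running since", inst["LaunchTime"])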
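
As for the mail problem, the extra work was essentially address verification. A sketch of what that looks like with Amazon SES via boto3 (the address is a hypothetical placeholder):

    import boto3

    ses = boto3.client("ses", region_name="eu-west-1")

    # Each sender (and, in sandbox mode, each recipient) must be verified
    # before mail will flow. SES emails a confirmation link to the address.
    ses.verify_email_identity(EmailAddress="project-admin@example.org")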

End analysis

In the end, I moved on, and I believe the infrastructure eventually migrated to one of the collaborators' institutes, largely due to the first (cost) and last (expertise) cons. Still, I consider it a success. There was a lot that was good about using AWS.