As the CTO for Geezeo — an online personal finance management application — I'm asked a lot of questions about security. Ironically, I've never personally had many concerns about security. Perhaps it's because I have such an extensive background in security. From installing authentication systems for major banks when I worked at RSA Data Security to writing the document purchasing code for Gartner's website to writing custom firewall solutions in Linux as a consultant for small to mid-sized companies, security has become less a concern for me and more a customary practice.
However, the second most common questions I get about Geezeo — after security — are about Amazon Web Services (AWS). AWS is Amazon’s technology platform for hosting web applications, storage and more.
People generally want to know how I came to choose AWS for Geezeo, what my opinion of AWS is, what technical challenges (if any) come with using it, how well it performs and how much it costs.
AWS vs. everybody else
So how did I come to choose AWS? We not only looked at, but tried, a few other hosting providers — including Media Temple’s popular Grid-Service product. But soon after getting set up we started having lots of problems and running into limitations.
For instance, server downtime was a big problem at Media Temple. And many hosting providers only offer one type of dedicated server (e.g. you are usually limited to using Red Hat Fedora Core X for all your services). Virtual dedicated servers are even worse. Not only can you not change the operating system, but development environments are usually missing or running out-of-date system libraries. This makes it difficult and time-consuming to get third-party applications compiled.
I decided it was time to get into something more reliable and flexible. As I was gathering quotes from some high-end hosting firms I started hearing a lot of really good things about Amazon’s Simple Storage Service (S3) and their resizable computing platform called the Elastic Compute Cloud (EC2). It was in closed beta at the time but one of our co-founders, Pete Glyman, emailed Amazon about gaining access to the beta and we were in the next day.
The basics of EC2 and S3
The idea of EC2 is simple — you first create a new server, then you modify the server to meet all your specifications, upload your website to it and get everything up and running. You can then create a disk image based on the server you just configured. What does that give you? Well, you now have a bootable image that you can use to boot as many new server instances as you desire.
This is the “elastic” part of the “compute cloud”. Whenever you need more juice, there it is, simply turn it up to 11. Then, when you no longer need it, no problem, you can turn it right back down again.
Amazon's S3 storage engine complements this service perfectly, because just like having unlimited computing power, S3 gives you unlimited storage capacity. Used together with a reverse-proxying load balancer — a service that sits in front of your website and routes incoming web requests to your Amazon servers — you never have to worry about scaling your web site. Amazon has literally solved that problem for you.
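For illustration, a reverse-proxying load balancer of this kind might be configured along these lines. This is an nginx-style sketch; the article doesn't name the proxy Geezeo actually uses, and the instance addresses are placeholders:

```nginx
# Hypothetical pool of EC2 web instances behind the proxy.
upstream ec2_web {
    server 10.0.0.10:80;   # EC2 instance 1 (placeholder address)
    server 10.0.0.11:80;   # EC2 instance 2 (placeholder address)
}

server {
    listen 80;
    location / {
        # Route each incoming request to one of the EC2 instances.
        proxy_pass http://ec2_web;
    }
}
```

Adding capacity is then just a matter of launching another instance and appending it to the upstream list.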
Security doesn't need a fancy acronym
Another problem that EC2 solves for you is security. By default, all ports on an Amazon instance are closed. It's up to you to open the ports you need for your web application. This makes it more difficult for users to accidentally leave ports open. Amazon also allows you to create security groups — sets of instances that share the same security configuration — for fast, easy deployment. So you could have a web server group that has port 80 open, and a secure server group that has port 443 open. Then when you create new instances you simply choose which security group they will belong to.
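The web/secure split above can be sketched as the EC2 API-tool commands it corresponds to. The group names, the port mapping and the exact `ec2-add-group`/`ec2-authorize` flags here are assumptions for illustration, not Geezeo's actual configuration:

```python
# Map each hypothetical security group to the ports it should open.
GROUPS = {
    "web": [80],      # plain web servers
    "secure": [443],  # SSL servers
}

def setup_commands(groups):
    """Build the shell commands that would create each security group
    and open its ports (TCP, reachable from anywhere)."""
    commands = []
    for name, ports in sorted(groups.items()):
        commands.append(f'ec2-add-group {name} -d "{name} servers"')
        for port in ports:
            commands.append(
                f"ec2-authorize {name} -P tcp -p {port} -s 0.0.0.0/0"
            )
    return commands

for cmd in setup_commands(GROUPS):
    print(cmd)
```

Instances launched into `web` then inherit the open port 80, and nothing else.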
A small challenge
The only real technical challenge in getting AWS set up was using it to host our database, because the data on a running EC2 instance isn't persistent. In other words, if your instance crashes, your data is gone.
So, in order to host a database on EC2 we replicate our data off to S3 every hour. This means the most we lose in the event of a crash would be one hour's worth of data. And we’d be back up and running within minutes of a bad crash thanks to EC2’s ability to immediately start and run instances that are pre-configured with all of our software already running.
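The hourly replication could be sketched like this: dump the database, name the dump by the hour, and push it to S3. The bucket name, the use of mysqldump and the old-style boto S3 calls are all assumptions for illustration — the article doesn't specify Geezeo's database or client library:

```python
import subprocess
from datetime import datetime

BUCKET = "geezeo-db-backups"  # hypothetical bucket name

def backup_key(now):
    """S3 key for the hourly dump — one slot per hour, so the
    worst-case loss after a crash is one hour of data."""
    return now.strftime("backups/db-%Y-%m-%d-%H.sql.gz")

def run_backup(now=None):
    """Dump the database and upload it to S3 (sketch only)."""
    now = now or datetime.utcnow()
    key = backup_key(now)
    # Dump and compress the database (credentials omitted).
    subprocess.run("mysqldump mydb | gzip > /tmp/db.sql.gz",
                   shell=True, check=True)
    # Upload with boto's old-style S3 API (an assumption about
    # tooling; any S3 client would do here).
    import boto
    bucket = boto.connect_s3().get_bucket(BUCKET)
    bucket.new_key(key).set_contents_from_filename("/tmp/db.sql.gz")
    return key

print(backup_key(datetime(2007, 6, 1, 13)))
```

Run from cron once an hour, each dump lands in its own hourly slot, and restoring after a crash is just a matter of booting a fresh pre-configured instance and loading the latest dump.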
That's what makes AWS so great. Anytime you want or need to replicate your server, you simply run a new instance. Say your website gets “dugg” — you could literally, within minutes, add however many servers you need to accommodate the traffic, then scale back down later if need be.
Tools of the trade
There have been some great tools built for using S3 and EC2. Most notable is the Firefox plugin EC2 UI, which puts a user-friendly face on EC2 administration and can really come in handy.
There's also a Firefox plugin called S3 Organizer which does the same for, you guessed it, S3 organization.
Many other companies have been using AWS, but one of the bigger names is 37signals. They're hosting over one terabyte of data in Amazon's S3 for Basecamp and Campfire and estimate that they’re saving thousands upon thousands of dollars per year.
As for Geezeo, we've probably saved two or three thousand dollars in just the few short months that we've been using AWS.
Using AWS isn't for the faint of heart. You need a very good understanding of Unix, you need to install a Java runtime environment on your local computer to use the Amazon command-line tools and you need to do a good amount of reading to get things initially set up.
But if you’re up for the challenge and want a reliable, flexible and secure infrastructure — it’s certainly the way to go.