Scalable Media Hosting with Amazon S3

Created on: Nov 27, 2007 4:25 PM


Why slow your web server down
by hosting media files? Craig Noeldner and AWS Evangelist Mike Culver
show how to configure your domain provider to use Amazon S3 for simple,
scalable media hosting.

By Craig Noeldner and Mike Culver, Amazon Web Services

Scenario: Imagine you have a small web site with big potential.
You’re currently using a reasonably priced web hosting provider that
offers good value for the amount of traffic you normally receive.
Perhaps you’ve gone one step further and are hosting your site on a
dedicated server. However, your site has caught the attention of the
blogosphere and you’re about to get much more traffic than you can
handle in your current web hosting setup.

What are you going to do?

Knowing how to scale your web site can mean the difference between
watching your idea take off or take a dive. A common technique for
scaling a web site is to use a different server to host media files
like images, videos, and audio files. This distributes the traffic and
bandwidth load between hosts and allows the primary web server to focus
on delivering web pages and server-side processing, rather than serving
up 5 MB audio files (or even 100 MB videos).

If you don’t want to set up, configure, and maintain a few extra
servers just for hosting your media files, then use Amazon S3. Amazon
S3 is storage for the Internet and gives any developer access to the
same highly scalable, reliable, fast, inexpensive data storage
infrastructure that Amazon uses to run its own global network of web
sites.

This tutorial walks through the steps necessary for hosting media
files for your web site using Amazon S3. We’ll use a domain we’ve
already registered, webscalecomputing.info, to set up a new sub-domain,
media.webscalecomputing.info, that will host the images, videos, and
audio files in Amazon S3.

While we won’t go into any programming details for using Amazon S3,
you’ll need to have a basic understanding of web networking and DNS to
read this article. (Or, you’ll need enough background to translate the
concepts to your own hosting provider.)

More on Amazon S3

Amazon S3 provides a simple web services interface that can be used
to store and retrieve any amount of data, at any time, from anywhere on
the web. Software developers generally use Amazon S3 in applications
that need the same highly scalable, reliable, fast, inexpensive data
storage infrastructure that Amazon uses to run its own global network of
web sites.

You can always improve your web site performance by moving your
media files off your main web server. This could be as simple as
creating a sub-domain that points to a host that serves your media
files. Of course, you still have to worry about the typical
heavy-lifting for any type of hosting, such as:

  • How much traffic will this setup accommodate? What happens if I get more traffic than it can handle?
  • What happens if the host goes down?
  • How do I back up the files so they’re not lost?
  • How much am I paying for idle capacity?

Amazon S3 provides answers to those questions, without the need for
worrying about the pesky details of, well, implementing them.

The web services interface is simple enough that you can retrieve
data using a URL, so it’s well-suited for basic web hosting tasks, like
serving up media files.

The pricing for Amazon S3 is on a pay-as-you-go basis, so there is
no minimum fee. This means you don’t have to invest in a large amount
of hosting infrastructure or services in order to ensure that your web
site handles the occasional traffic spike.

Use the AWS Simple Monthly Calculator to estimate your monthly bill:

http://calculator.s3.amazonaws.com/calc5.html
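Because billing is pay-as-you-go, an estimate is just usage multiplied by unit rates, with no fixed minimum. The sketch below assumes a simplified model with three charges (storage, outbound transfer, and GET requests billed per 10,000). The function name and every rate in it are hypothetical placeholders; real pricing is tiered and has other line items, so use the calculator for actual figures.

```python
# A rough pay-as-you-go estimate. All rates are HYPOTHETICAL placeholders;
# look up current prices (for example with the AWS calculator) before use.


def estimate_monthly_cost(storage_gb: float,
                          transfer_out_gb: float,
                          get_requests: int,
                          storage_rate_per_gb: float,
                          transfer_rate_per_gb: float,
                          get_rate_per_10k: float) -> float:
    """Return an estimated monthly bill in dollars.

    Each charge is usage times a unit rate, so zero usage means a zero
    bill: there is no minimum fee in this model.
    """
    storage = storage_gb * storage_rate_per_gb
    transfer = transfer_out_gb * transfer_rate_per_gb
    requests = (get_requests / 10_000) * get_rate_per_10k
    return storage + transfer + requests


if __name__ == "__main__":
    # Made-up rates: 10 GB stored, 100 GB served, one million GETs.
    bill = estimate_monthly_cost(10, 100, 1_000_000,
                                 storage_rate_per_gb=0.10,
                                 transfer_rate_per_gb=0.20,
                                 get_rate_per_10k=0.01)
    print(f"${bill:.2f}")
```

The point of the model is the shape of the bill: a quiet month costs little, and a traffic spike raises only that month's transfer and request charges.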

Amazon S3 in Action

Blue Origin is one small company with a big idea that successfully
scaled its web site using Amazon S3. On January 2, 2007, the company
posted information and videos on its web site about a test launch for a
new vertical-takeoff, vertical-landing vehicle. Within the next day,
the news was covered by both Slashdot and Boing Boing, sending a
tremendous amount of traffic to its web site. With its media files
stored in Amazon S3, it was able to scale instantly and handle 3.5
million requests and 758 GB of bandwidth in a single day.
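A quick back-of-the-envelope check puts those figures in per-second terms. It assumes 1 GB = 10^9 bytes and traffic spread evenly over 24 hours; real traffic peaks well above these averages.

```latex
\begin{aligned}
\text{requests per second} &\approx \frac{3.5\times10^{6}}{86\,400\ \text{s}} \approx 40.5\\
\text{bytes per request} &\approx \frac{758\times10^{9}\ \text{bytes}}{3.5\times10^{6}} \approx 217\ \text{KB}\\
\text{average bandwidth} &\approx \frac{758\times10^{9}\times 8\ \text{bits}}{86\,400\ \text{s}} \approx 70\ \text{Mbit/s}
\end{aligned}
```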

Had the company hosted the web site entirely on one of its
internal servers, the traffic on January 4 would have overwhelmed
its system capacity. Had it used a basic hosting package from a
popular provider, it would have overwhelmed that service or, even
worse, exceeded the maximum allowed bandwidth for the month and incurred
massive overage fees.

Blue Origin’s total charge for Amazon S3 in January? Just over $300.

SmugMug (www.smugmug.com) is another company that uses Amazon S3 to host its media files. After 12 months, it has saved almost $1M.

Now, let’s go through the steps of hosting your media files on Amazon S3, as Blue Origin did.

Signing up for Amazon S3

If you haven’t already, sign up for Amazon S3 at http://aws.amazon.com/s3. After signing up, you’ll have the two access identifiers you need to upload your media files:

  • Access Key ID
  • Secret Access Key

The Access Key ID is a public identifier, like a user name, that
specifies a particular Amazon S3 account. The Secret Access Key is the
private identifier, like a password, that ensures you’re the one making
a request.

Important: Your Secret Access Key is a secret, and
should be known only by you and AWS. You should never e-mail your
Secret Access Key to anyone. It is important to keep your Secret Access
Key confidential to protect your account.
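To show how the two keys work together, here is a minimal sketch of request signing under the legacy Signature Version 2 scheme that S3 used at the time: the Secret Access Key computes an HMAC-SHA1 over a description of the request, and only the public Access Key ID and the resulting signature are sent. The helper names are hypothetical, x-amz-* headers are left out of the string to sign, and current S3 endpoints expect Signature Version 4, so in practice let an AWS SDK or tool do the signing.

```python
import base64
import hashlib
import hmac
import os


def sign_request_v2(secret_key: str, verb: str, date: str, resource: str,
                    content_md5: str = "", content_type: str = "") -> str:
    """Return a Base64 HMAC-SHA1 signature in the legacy S3 V2 style.

    The string to sign is the HTTP verb, Content-MD5, Content-Type, Date,
    and canonical resource (/bucket/key), one per line. Any x-amz-* headers
    would also belong in it; this sketch leaves them out.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key_id: str, signature: str) -> str:
    """Only the public Access Key ID and the signature travel with the request."""
    return f"AWS {access_key_id}:{signature}"


if __name__ == "__main__":
    # Read the secret from the environment instead of hard-coding it.
    # The variable name follows the AWS SDK convention (an assumption here).
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if secret:
        sig = sign_request_v2(secret, "GET", "Tue, 27 Mar 2007 19:36:42 +0000",
                              "/media.webscalecomputing.info/jeff-at-web20.jpg")
        print(authorization_header("YOUR_ACCESS_KEY_ID", sig))
```

Because the secret only feeds the HMAC and never appears in the request, anyone who intercepts a request learns the Access Key ID but not the Secret Access Key.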

Uploading Your Media Files

Without going into too many details, Amazon S3 stores data as objects inside buckets. A bucket organizes a collection of objects, much as a folder contains a set of files.

There are many tools available for working with Amazon S3 without
having to write a software application. For this tutorial, we’ll use a
plug-in for the Firefox browser, called S3Fox (https://addons.mozilla.org/en-US/firefox/addon/3247). You can also use one of the many code samples and tools available through the Amazon S3 Resource Center (http://aws.amazon.com/resources) or use a product built on Amazon S3 in the Solutions Catalog (http://solutions.amazonwebservices.com).

First, create a bucket in your Amazon S3 account that corresponds to
the domain you’ll use to host your media files. For our web site, we’ll
create a bucket called “media.webscalecomputing.info”.

Important: Use lower-case letters only to name
buckets that will be used in DNS redirects. This requirement is a
function of the way DNS handles names: host names are case-insensitive
and are treated as lower case.
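A small check can catch a bucket name that will not work with DNS before you set up the CNAME. This sketch assumes the DNS-compatible naming rules in current S3 documentation (3 to 63 characters; lower-case letters, digits, dots, and hyphens; each dot-separated label starting and ending with a letter or digit; not formatted like an IP address), which are stricter than the 2007 rules. The function name is hypothetical.

```python
import re

# Assumed rules (current S3 docs, stricter than in 2007): each dot-separated
# label uses only a-z, 0-9 and '-', and starts and ends with a letter or digit.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")
_IP_LIKE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_dns_compatible_bucket_name(name: str) -> bool:
    """Return True if name can be used as a bucket behind a DNS CNAME."""
    if not 3 <= len(name) <= 63:
        return False
    if _IP_LIKE.fullmatch(name):        # e.g. 192.168.1.1 is not allowed
        return False
    # Upper-case letters, empty labels ("a..b"), and labels that begin or
    # end with '-' all fail the label pattern.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

For example, “media.webscalecomputing.info” passes, while “Media.WebScaleComputing.info” fails on its upper-case letters.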

Why use this specific bucket name? Amazon S3 has a virtual hosting
feature: when a request arrives for a given host name, Amazon S3 serves
content from the bucket with the same name. We’ll talk more about this
feature in the next section when we configure our domain.
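Conceptually, virtual hosting means the bucket is chosen from the Host header of the incoming request. The function below is a simplified model of that behavior for illustration only, not S3’s actual implementation; it ignores path-style and regional endpoints, and the function name is hypothetical. In this model, lower-casing the host is why a bucket name containing upper-case letters could never be matched.

```python
S3_SUFFIX = ".s3.amazonaws.com"


def bucket_for_host(host: str) -> str:
    """Return the bucket a virtual-hosted request targets (simplified model).

    '<bucket>.s3.amazonaws.com' names the bucket directly; any other host,
    one that reached S3 through a CNAME, is used as the bucket name whole.
    """
    # Drop any port, normalize case, and ignore a trailing root dot.
    name = host.split(":", 1)[0].lower().rstrip(".")
    if name.endswith(S3_SUFFIX):
        return name[: -len(S3_SUFFIX)]
    return name
```

Both media.webscalecomputing.info.s3.amazonaws.com and media.webscalecomputing.info map to the same bucket, which is what lets the CNAME work.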

Next, add your media files to the new bucket in Amazon S3. Using the
Firefox plug-in, it’s as simple as selecting the files on your local
system, then clicking the transfer button.

Amazon S3 has a rich set of access privileges for both buckets and
objects, so make sure that permissions on both the bucket and your
objects grant everyone read access. The Firefox plug-in we’re using
sets this for us using a dialog box.
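Without a GUI tool, permissions and content type travel as request headers on the upload. The hypothetical helper below builds headers for a single PUT: `x-amz-acl: public-read` is S3’s canned ACL that grants everyone read access, and a correct Content-Type lets browsers display images and play audio instead of treating them as opaque downloads. It assumes the object-ACL approach this article uses; newer S3 buckets block public ACLs by default, in which case a bucket policy is needed instead. If you sign requests with Signature Version 2, the x-amz-acl header must also be part of the string to sign.

```python
import mimetypes


def upload_headers(filename: str, public: bool = True) -> dict:
    """Headers for a single PUT of a media file (hypothetical helper)."""
    content_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    headers = {"Content-Type": content_type or "application/octet-stream"}
    if public:
        # S3's canned ACL that grants everyone read access to the object.
        headers["x-amz-acl"] = "public-read"
    return headers
```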

All the media files are now accessible through a URL that points to Amazon S3. The basic URL syntax for S3 is http://<bucket_name>.s3.amazonaws.com/<object_name>, so each file we uploaded is available at http://media.webscalecomputing.info.s3.amazonaws.com/ followed by its object name.

The simplest way to use Amazon S3 for media hosting is to update our web pages to point to these files. For example:

 <img src="http://media.webscalecomputing.info.s3.amazonaws.com/jeff-at-web20.jpg"/> 
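Since the URL format is fixed, links can also be generated programmatically. This sketch (function names are hypothetical) builds the URL from a bucket and object name, percent-encoding the object name, and emits the tag with straight ASCII quotes; the curly quotes that word processors substitute are not valid HTML attribute delimiters.

```python
import html
from urllib.parse import quote


def s3_url(bucket: str, key: str) -> str:
    """http://<bucket_name>.s3.amazonaws.com/<object_name>, virtual-hosted style."""
    # Percent-encode the object name but keep '/' so key "folders" survive.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def img_tag(src: str, alt: str = "") -> str:
    """Emit an <img> tag with straight ASCII quotes around attribute values."""
    return f'<img src="{html.escape(src)}" alt="{html.escape(alt)}"/>'


if __name__ == "__main__":
    print(img_tag(s3_url("media.webscalecomputing.info", "jeff-at-web20.jpg")))
```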

However, when people download our files, we want them to look like
they’re coming from our domain, and not s3.amazonaws.com. If someone
downloads our audio file, we want it to appear to come from our site.
We’ll now set up our domain hosting so that the files are available
through a URL under http://media.webscalecomputing.info/.

Setting up Your Domain

Since we already host our web site on www.webscalecomputing.info,
we now want to create a sub-domain that we’ll point to the files
located in Amazon S3. We do this by adding a CNAME record through our
hosting provider.

Most popular web hosting companies will let you create a new CNAME
record for your domain. For our hosting company, creating a new CNAME
record consisted of logging into our account, then navigating through a
few DNS configuration pages until we ended up at one that allows us to
create a CNAME record.

To create the CNAME record, we specify an alias, “media”, and the
domain it points to, “media.webscalecomputing.info.s3.amazonaws.com”.
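In zone-file terms, the record is a single CNAME line, which the sketch below produces. The TTL of 3600 seconds is an arbitrary assumption, your provider’s web form may hide the zone file entirely, and both function names are hypothetical. The second helper checks the requirement from the virtual hosting discussion: the alias plus your domain must spell out exactly the bucket name.

```python
def cname_record(alias: str, bucket: str, ttl: int = 3600) -> str:
    """Return a BIND-style zone-file line pointing alias at the bucket's S3 host."""
    # The trailing dot marks the target as a fully qualified name.
    target = f"{bucket}.s3.amazonaws.com."
    return f"{alias}\t{ttl}\tIN\tCNAME\t{target}"


def cname_matches_bucket(alias: str, zone: str, bucket: str) -> bool:
    """Virtual hosting only works if alias.zone is exactly the bucket name."""
    return f"{alias}.{zone}".lower() == bucket


if __name__ == "__main__":
    print(cname_record("media", "media.webscalecomputing.info"))
```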

With the CNAME record in place, the media files are now available at URLs of the form http://media.webscalecomputing.info/<object_name>.

Our web page can now reference the media files.

 <img src="http://media.webscalecomputing.info/jeff-at-web20.jpg"/> 

That’s it!
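If existing pages already link to the s3.amazonaws.com URLs, a find-and-replace can switch them to the new sub-domain. This is a minimal sketch, assuming (as in this tutorial) that the CNAME is named exactly like the bucket; it only rewrites http:// links, uses a regular expression rather than a full HTML parser, and the function name is hypothetical.

```python
import re


def rewrite_media_urls(html: str, bucket: str) -> str:
    """Point links at http://<bucket>/ instead of http://<bucket>.s3.amazonaws.com/.

    Assumes a CNAME named exactly like the bucket already exists.
    """
    pattern = re.compile(r"http://" + re.escape(bucket) + r"\.s3\.amazonaws\.com/")
    replacement = f"http://{bucket}/"
    # A function replacement avoids any backslash interpretation in the new URL.
    return pattern.sub(lambda _match: replacement, html)
```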

Automatically Copying Files to Amazon S3

There are other ways to use Amazon S3 as well, including
automatically copying files to Amazon S3. The Resource Center in the
AWS Developer Connection web site has technical documentation, code
samples, and other resources you can use to learn more about Amazon S3
and build your own applications to use the service. As always, the
exact tutorial to read depends on the language you’re using, but here
are a few possibilities.

  • Java
  • C#
  • PHP
  • Ruby
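As one illustration of what automatic copying involves, the sketch below decides which local files need uploading by comparing each file’s MD5 hash with the ETag S3 reports for the object with the same key. That comparison assumes objects were uploaded with a single PUT and without KMS encryption; multipart uploads produce ETags that are not plain MD5 hashes. Fetching the ETags and performing the uploads are left to the SDK for your language, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def local_md5(path: Path) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so large videos fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(root: str, remote_etags: dict[str, str]) -> list[str]:
    """Return object keys whose local file is new or differs from S3.

    remote_etags maps object key -> ETag (surrounding quotes removed), as
    listed by S3. Comparing it with a local MD5 assumes single-PUT uploads.
    """
    base = Path(root)
    changed = []
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        key = path.relative_to(base).as_posix()   # 'sub/b.mp3' style keys
        if remote_etags.get(key) != local_md5(path):
            changed.append(key)
    return changed
```

Running a check like this on a schedule, then uploading only the returned keys, keeps the bucket in step with a local media folder without re-sending unchanged files.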

Learning More About Amazon S3

Why not host the entire web site in Amazon S3 and just use a domain
provider to set up the appropriate CNAME records? Although it’s
certainly possible, you may want a web server running to execute
server-side scripts or access a database.
Amazon S3 is a storage solution, so it does not perform any server-side
processing (but check out Amazon EC2 for information on scalable,
virtual computing).

Of course, you don’t have to have a web site to use Amazon S3. Like Jeremy Zawodny, we use Amazon S3 to back up our home computers. (Craig pays just over $1 a month to back up his important files.)

For more information, see the Amazon S3 Resource Center (http://aws.amazon.com/resources).