It’s All About Choice: Data Storage Products on AWS

Kenneth Johnson / February 12, 2015

Moving to Amazon Web Services offers a dizzying array of new capabilities, and the pace of innovation on the AWS platform is hard for the average user to keep up with. Take storage, for example. AWS offers at least six different datastore products:
1) DynamoDB
2) Instance Store (ephemeral EC2 storage)
3) Amazon RDS
4) Amazon EBS
5) Amazon S3
6) Amazon Glacier
How does one choose the best storage option for a given need? There is no shortage of information and documentation on AWS storage products, but I have seldom seen a 30,000-foot view of how to make that initial selection, all in one place. Many users don’t give it much thought and simply attach EBS volumes to EC2 instances (or worse, use the ephemeral storage on the instance) to store every type of data. So I’d like to offer the approach used in most Amazon training courses, slightly modified and combined with information gleaned from other sources.

1) Analyze the data requirements for your application. There is no “one size fits all” solution.

a. What is your data format (SQL, NoSQL, or Object Store)?

1. Predefined schema as in a relational database

2. Loosely defined schema as in JSON/XML or NoSQL database

3. Undefined schema as in text documents and pictures

b. What is the size of your datastore?

c. What is your query frequency?

d. How fast do you need to access your data (what is your acceptable latency)?

e. How long do you need to keep the data (retention period)?

f. Are you storing temporary and/or processed data?
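
If you want to record these answers in a form a script can check, a minimal Python sketch follows. The names DataFormat and DataRequirements, and the units chosen for each field, are hypothetical choices for illustration; they are not part of any AWS API.

```python
from dataclasses import dataclass
from enum import Enum


class DataFormat(Enum):
    """The three schema styles from question a (hypothetical labels)."""
    RELATIONAL = "predefined schema"      # SQL
    SEMI_STRUCTURED = "loose schema"      # JSON/XML or NoSQL
    UNSTRUCTURED = "no schema"            # text documents, pictures (object store)


@dataclass
class DataRequirements:
    """Hypothetical record of the answers to questions a through f."""
    data_format: DataFormat   # a. data format
    size_gb: float            # b. size of the datastore
    queries_per_hour: float   # c. query frequency
    max_latency_ms: float     # d. acceptable latency
    retention_days: int       # e. retention period
    is_temporary: bool        # f. temporary and/or processed data

    def __post_init__(self) -> None:
        # Reject obviously invalid answers as soon as the record is created.
        if self.size_gb < 0 or self.queries_per_hour < 0:
            raise ValueError("size and query frequency must be non-negative")
        if self.max_latency_ms <= 0:
            raise ValueError("acceptable latency must be positive")
        if self.retention_days < 0:
            raise ValueError("retention period must be non-negative")
```

For example, DataRequirements(DataFormat.RELATIONAL, size_gb=50, queries_per_hour=36_000, max_latency_ms=5, retention_days=1, is_temporary=True) would describe a small, busy, short-lived SQL working set.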

2) What is the temperature of your data? AWS publishes the following chart to explain the concept of hot and cold data:

[Chart: AWS illustration of hot and cold data]
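
As a rough illustration, the hot/cold idea can be turned into a rule based on how often data is read and how long you can wait for it. This is a sketch under assumptions: the threshold values are made up for the example (they are not AWS figures), and the middle “warm” band is an assumption added to cover data that is neither clearly hot nor clearly cold.

```python
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    WARM = "warm"   # assumption: a middle band between hot and cold
    COLD = "cold"


# Hypothetical cut-offs chosen only for illustration; they are not AWS figures.
HOT_MIN_READS_PER_DAY = 1_000
COLD_MAX_READS_PER_DAY = 1
HOT_MAX_LATENCY_MS = 10
COLD_MIN_LATENCY_MS = 60 * 60 * 1000  # willing to wait an hour or more


def classify_temperature(reads_per_day: float, acceptable_latency_ms: float) -> Temperature:
    """Classify data by access rate and latency tolerance.

    Hot: read very often OR needs very low latency.
    Cold: read rarely AND tolerates long retrieval times.
    Everything else is treated as warm.
    """
    if reads_per_day >= HOT_MIN_READS_PER_DAY or acceptable_latency_ms <= HOT_MAX_LATENCY_MS:
        return Temperature.HOT
    if reads_per_day <= COLD_MAX_READS_PER_DAY and acceptable_latency_ms >= COLD_MIN_LATENCY_MS:
        return Temperature.COLD
    return Temperature.WARM
```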

3) At this point you know your data format (SQL, NoSQL, or Object Store), the size of your datastore, how frequently you query it, your acceptable latency, your required retention period, and whether the data are hot or cold. Now choose your product. Again, the following Datastore Comparison from Amazon is very helpful:

[Chart: Amazon Datastore Comparison]

Need block storage for temporary SQL data that will be queried frequently with low latency? You should probably consider simply using the ephemeral instance storage on your EC2 instances. Need high durability for cold data objects that change infrequently? Consider S3, which is designed for eleven 9s (99.999999999%) of durability. Have loosely structured data arriving rapidly from multiple sources, queried frequently, that also needs very high durability? AWS offers DynamoDB, its proprietary NoSQL engine, which provides region-level redundancy similar to S3’s.
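
The three examples above can be encoded as a simple rule of thumb. This minimal Python sketch assumes a hypothetical Need record with simplified string fields; it is not an official AWS decision procedure and deliberately ignores size, cost, and retention.

```python
from dataclasses import dataclass


@dataclass
class Need:
    """Hypothetical, simplified answers to the questions in steps 1 and 2."""
    data_format: str             # "sql", "nosql", or "object"
    temperature: str             # "hot" or "cold"
    temporary: bool              # acceptable to lose the data with the instance?
    needs_high_durability: bool


def suggest_store(need: Need) -> str:
    """Map a need to a candidate AWS product, following the examples in the text."""
    if need.temporary and need.temperature == "hot":
        # Temporary, frequently queried, low-latency data: ephemeral instance storage.
        return "EC2 Instance Store"
    if need.data_format == "object":
        # Durable object storage for objects that change infrequently.
        return "Amazon S3"
    if need.data_format == "nosql" and need.needs_high_durability:
        # Loose-schema data that must survive failures: managed NoSQL.
        return "Amazon DynamoDB"
    if need.data_format == "sql":
        # Persistent relational data: managed relational database.
        return "Amazon RDS"
    # Fallback: persistent block storage attached to an EC2 instance.
    return "Amazon EBS"
```

For instance, suggest_store(Need("nosql", "hot", temporary=False, needs_high_durability=True)) returns "Amazon DynamoDB", matching the third example.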

The last question you may want to consider is whether to use managed or non-managed data services. For structured SQL databases, AWS offers Amazon RDS, a managed database service that takes care of the “undifferentiated heavy lifting” of patching, updates, etc., and offers built-in Multi-AZ (multiple Availability Zone) redundancy. The service is available for MySQL, Oracle, Microsoft SQL Server, PostgreSQL, and Amazon Aurora. Encryption at rest is supported for MySQL and PostgreSQL. In my experience this service is really beyond reproach, but beware the 3 TB storage limit and the loss of certain control features, such as direct links between SQL servers and the ability to do your own application-related performance tuning. DynamoDB is likewise a managed service (for NoSQL) that will probably outperform, and cost less than, anything you can build yourself with third-party products, and like RDS it offers built-in redundancy across Availability Zones within a region.
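
The managed-versus-self-managed trade-off for SQL workloads can be sketched as a checklist of the RDS caveats mentioned above. The SqlWorkload type and the function name are hypothetical; the 3 TB figure is the limit cited in this article, so check current RDS limits before relying on it; and “self-managed database on EC2” is an assumed stand-in for running the database engine yourself.

```python
from dataclasses import dataclass

# Storage ceiling cited in the article; verify against current RDS limits.
RDS_MAX_STORAGE_TB = 3


@dataclass
class SqlWorkload:
    """Hypothetical description of a relational workload."""
    size_tb: float
    needs_linked_servers: bool   # direct links between SQL servers
    needs_custom_tuning: bool    # your own application-related performance tuning


def managed_or_self_managed(workload: SqlWorkload) -> tuple[str, list[str]]:
    """Return ("Amazon RDS", []) unless one of the caveats applies.

    If any caveat applies, suggest a self-managed database and list why.
    """
    reasons: list[str] = []
    if workload.size_tb > RDS_MAX_STORAGE_TB:
        reasons.append(f"{workload.size_tb} TB exceeds the {RDS_MAX_STORAGE_TB} TB RDS limit")
    if workload.needs_linked_servers:
        reasons.append("requires direct links between SQL servers")
    if workload.needs_custom_tuning:
        reasons.append("requires your own performance tuning")
    if reasons:
        return "self-managed database on EC2", reasons
    return "Amazon RDS", reasons
```

Returning the reasons alongside the choice keeps the decision auditable, which is useful when you revisit it as service limits change.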

So although the products are many and can be baffling at first, choosing the right one is really quite simple when you approach it systematically with Amazon’s recommended thought process. Also worth considering is WHERE you want your data to be, meaning in which AWS region. Factors here include latency (driven by distance from your users), cost, and regulatory requirements. Luckily, AWS lets you specify the geographic region in which to store your data, and all of the products discussed here are scoped to a region or to an Availability Zone within a region, so data does not leave the region you choose unless you move it, which greatly reduces the risk of violating data-location requirements. Finally, consider how to encrypt data both at rest and in transit, and what level of control you need over your encryption method and keys. Again, AWS supports just about any flavor of encryption you require.
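
One possible way to weigh the region factors is to treat regulatory requirements as a hard filter and then rank the remaining regions by latency, using cost as a tie-breaker. The sketch below assumes you have already measured latency and estimated cost for each candidate region; the RegionOption type, the country codes, and any figures you feed it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegionOption:
    """Hypothetical per-region figures you would gather yourself."""
    name: str                 # an AWS region name, e.g. "eu-west-1"
    country: str              # where the region is located (your own labeling)
    latency_ms: float         # measured round trip from your users
    monthly_cost_usd: float   # estimated cost for your workload


def choose_region(options: list[RegionOption], allowed_countries: set[str]) -> RegionOption:
    """Pick the lowest-latency region that satisfies data-location rules.

    Regions outside allowed_countries are discarded first; ties on
    latency are broken by the lower estimated cost.
    """
    compliant = [o for o in options if o.country in allowed_countries]
    if not compliant:
        raise ValueError("no region satisfies the data-location requirement")
    return min(compliant, key=lambda o: (o.latency_ms, o.monthly_cost_usd))
```

Treating regulation as a filter rather than a weighted score reflects the fact that a data-location rule is usually not something you can trade off against a few milliseconds of latency.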

Blue Sentry is an advanced-tier Amazon Web Services (AWS) consulting partner specializing in application and data migrations, expert managed services and virtual desktops. Blue Sentry serves clients globally, with operations in North Carolina and South Carolina.