AWS: Content Storage using Amazon (S3)

Jan 17, 2019

Amazon Simple Storage Service (S3) is one of the core object storage services available on AWS. It provides secure, durable and highly scalable object storage with a simple web service interface that can be used to store and retrieve any amount of data from anywhere on the web. Amazon S3 follows a pay-as-you-go model, so you pay only for the storage you actually use.

Amazon S3 is cloud object storage. Rather than being attached to a particular server, Amazon S3 storage is independent of any server and is accessed over the Internet. Rather than managing data as blocks or files using SCSI, CIFS or NFS protocols, S3 manages data as objects through an Application Programming Interface (API) built on standard HTTP verbs.

Below are a few use cases for Amazon S3 storage:

Content, media storage and distribution
Static website hosting
Disaster recovery
Backup and archive of on-premises or cloud data

Storage Entities: Endpoints, Buckets & Objects

A bucket is simply a container (a web folder) for objects (files) stored in Amazon S3. Bucket names are global, so each name must be unique across all of Amazon S3. An account can have multiple buckets, but buckets cannot be nested: there is no sub-bucket within a bucket. An unlimited number of objects can be stored in a bucket.

Each Amazon S3 bucket is created in a specific Region that you choose. This lets you control where your data is stored. A bucket can be located close to a particular set of end users or customers to minimize latency, located in a particular Region to satisfy data locality requirements, or located far away from the primary facilities to satisfy disaster recovery and compliance needs.

Objects are the files stored in Amazon S3 buckets. Each object is identified by a key (filename) that is chosen by the user and is unique within its bucket. Objects can hold data in any format. The size of each object can range from 0 bytes to 5 TB as of today (ref), and an unlimited number of objects can be stored in each bucket. Each object consists of data (the file itself) and metadata (attributes or properties of the data). The metadata is a set of key/value pairs that describe the object.

Each Amazon S3 object has a unique URL which is formed by the combination of the endpoint, the bucket name, and the object key. For example, with the URL:

http://mybucket.s3.amazonaws.com/jack.doc

Here “mybucket” is the S3 bucket name and “jack.doc” is the key, or filename.

“s3.amazonaws.com” is the endpoint. An endpoint is the URL that serves as the entry point for accessing objects in an S3 bucket.
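
As a minimal sketch of how the three parts fit together, the Python snippet below builds and splits a URL of the form shown above. The helper names build_object_url and split_object_url are hypothetical and written only for illustration; they are not part of any AWS SDK, and the sketch handles only this one URL style.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit

ENDPOINT = "s3.amazonaws.com"  # endpoint from the example URL


def build_object_url(bucket: str, key: str, scheme: str = "http") -> str:
    """Combine bucket name, endpoint and object key into one URL."""
    # quote() percent-encodes characters that are unsafe in a path but keeps "/".
    return f"{scheme}://{bucket}.{ENDPOINT}/{quote(key)}"


def split_object_url(url: str) -> Tuple[str, str]:
    """Return (bucket, key) for a URL of the form <bucket>.<endpoint>/<key>."""
    parts = urlsplit(url)
    suffix = "." + ENDPOINT
    if not parts.hostname or not parts.hostname.endswith(suffix):
        raise ValueError(f"not a {ENDPOINT} URL: {url}")
    bucket = parts.hostname[: -len(suffix)]
    key = unquote(parts.path[1:])  # drop the leading "/"
    return bucket, key


print(build_object_url("mybucket", "jack.doc"))
print(split_object_url("http://mybucket.s3.amazonaws.com/jack.doc"))
```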

S3 Operations

S3 allows users to perform various operations using REST (Representational State Transfer) APIs. With the REST interface, standard HTTP or HTTPS requests can be used to perform a wide range of operations. Some common operations are:

Create/delete a bucket
Write an object
Read an object
Delete an object
List keys in a bucket

All S3 operations are described in detail in the AWS documentation.
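
To make the link between these operations and HTTP verbs concrete, here is a rough sketch that maps each common operation to a verb and path. It assumes the bucket is addressed through the host name, as in the example URL earlier. The RestCall type and describe function are hypothetical names for this illustration, and real requests also need authentication headers, which are omitted here.

```python
from typing import NamedTuple


class RestCall(NamedTuple):
    method: str  # HTTP verb
    path: str    # request path, relative to the bucket's host name


# "/" refers to the bucket itself because the bucket name is in the host name.
OPERATIONS = {
    "create bucket": RestCall("PUT", "/"),
    "delete bucket": RestCall("DELETE", "/"),
    "write object": RestCall("PUT", "/{key}"),
    "read object": RestCall("GET", "/{key}"),
    "delete object": RestCall("DELETE", "/{key}"),
    "list keys": RestCall("GET", "/"),
}


def describe(operation: str, bucket: str, key: str = "") -> str:
    """Render the request line and Host header for an operation (unsigned)."""
    call = OPERATIONS[operation]
    path = call.path.format(key=key)
    return f"{call.method} {path} HTTP/1.1\nHost: {bucket}.s3.amazonaws.com"


print(describe("read object", "mybucket", "jack.doc"))
```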

Durability and Availability

Durability addresses the question, “Will the data still be there in the future?” Availability addresses the question, “Can the data be accessed right now?”

Amazon S3 is designed to provide both very high durability and very high availability for all data. Amazon S3 Standard storage is designed for 99.999999999% durability and 99.99% availability of objects over a given year (Ref).
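
To get a feel for what these percentages mean, the short calculation below reads the durability figure as the yearly survival probability of a single object, and the availability figure as the fraction of the year during which objects are reachable. Both readings are simplifying assumptions made for this sketch, and the function names are illustrative. Under them, storing 10,000,000 objects corresponds to losing one object roughly every 10,000 years on average, and 99.99% availability corresponds to about 52.56 minutes of unavailability per year.

```python
from decimal import Decimal

DURABILITY = Decimal("0.99999999999")  # 99.999999999% per object per year
AVAILABILITY = Decimal("0.9999")       # 99.99% over a year
MINUTES_PER_YEAR = 365 * 24 * 60


def expected_annual_loss(object_count: int) -> Decimal:
    """Expected number of objects lost per year under the assumed model."""
    return object_count * (1 - DURABILITY)


def max_unavailable_minutes() -> Decimal:
    """Minutes per year that fall outside the availability target."""
    return (1 - AVAILABILITY) * MINUTES_PER_YEAR


loss = expected_annual_loss(10_000_000)
print("expected objects lost per year:", loss)
print("average years per lost object:", 1 / loss)
print("minutes unavailable per year:", max_unavailable_minutes())
```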

Data Security: Access Control

Amazon S3 is secure by default: when a bucket or an object is created in Amazon S3, only the creator can access it. To give controlled access to others, Amazon S3 provides both coarse-grained access controls (Amazon S3 Access Control Lists [ACLs]) and fine-grained access controls (Amazon S3 bucket policies, AWS IAM security policies, and query-string authentication).
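
As an illustration of a fine-grained control, the sketch below builds a bucket policy document that grants everyone read access to the objects in one bucket. The function name and the statement ID are hypothetical, and a public-read policy is only one possible example; whether it is appropriate depends on the data in the bucket.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Build a bucket policy that lets anyone read objects in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPublicRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",          # any requester
                "Action": "s3:GetObject",  # read objects only
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


print(json.dumps(public_read_policy("mybucket"), indent=2))
```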

Static Website Hosting

Static website hosting is a very common use case for Amazon S3 storage. Because every Amazon S3 object has a unique and globally accessible URL, it is relatively straightforward to turn a bucket into a website. Many websites don’t need the services of a full web server; they only need some static web pages hosted somewhere that can be accessed globally. A static website is one where all of the pages contain only static content such as HTML, CSS and JavaScript, and do not require server-side processing such as PHP, ASP.NET or JSP. Generally, the URL of a static website hosted in S3 follows the pattern below:

<bucket-name>.s3-website-<AWS-region>.amazonaws.com

Storage Classes

Amazon S3 offers several storage classes suited to different use cases. They differ in durability, availability and access characteristics, which in turn determine the cost.

S3 Standard: It offers high availability, high durability, low latency and high-performance object storage for general-purpose use with frequently accessed data.

S3 Standard — Infrequent Access (Standard-IA): It offers the same durability, low latency, and high throughput as Amazon S3 Standard, but is designed for long-lived and less frequently accessed data.

S3 Reduced Redundancy Storage (RRS): It offers lower durability (99.99%, or four nines) than Amazon S3 Standard or Standard-IA at a reduced cost. RRS suits content that is durably stored elsewhere, or processed data such as transcoded media or thumbnails that can be easily reproduced.

Glacier: This class of storage offers secure, durable, and extremely low-cost cloud storage for archives and long-term backups.

Object Lifecycle Management

Amazon S3 Object Lifecycle Management is very similar to automated storage tiering in traditional IT storage infrastructures. For example, some business documents are frequently accessed initially at the time of creation, but then become much less frequently accessed over time.

Using Amazon S3 lifecycle configuration rules, storage costs can be reduced significantly by automatically transitioning data from one storage class to another or even automatically deleting data after a period of time. For example, the lifecycle rules for data backup could be:

Store data initially in S3 Standard.
After 30 days, transition to S3 Standard-IA.
After 90 days, archive to Glacier.
After 3 years, delete.
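
The backup example above can be sketched as a rule shaped like an S3 lifecycle configuration, together with a small helper that shows which storage class an object of a given age would be in. The rule ID and the storage_class_at helper are hypothetical, and the helper is a simplified model of the rule rather than how S3 itself evaluates lifecycle rules; it also takes three years to be 1,095 days.

```python
from typing import Optional

# Rule shaped like an S3 lifecycle configuration; the ID is hypothetical.
BACKUP_RULE = {
    "ID": "backup-tiering",
    "Filter": {"Prefix": ""},  # empty prefix: apply to every object
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 3 * 365},
}


def storage_class_at(age_days: int, rule: dict = BACKUP_RULE) -> Optional[str]:
    """Return the storage class at this age, or None once the object expires."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"  # objects start out in S3 Standard
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (0, 30, 90, 1095):
    print(age, storage_class_at(age))
```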

Object Versioning

Amazon S3 versioning helps protect data against accidental or malicious deletion. It does this by keeping multiple versions of each object in the bucket, each identified by a unique version ID. Versioning allows you to preserve, retrieve, and restore every version of the objects stored in your Amazon S3 bucket. After an accidental change or deletion, an object can be restored to an earlier state simply by referencing the version ID in addition to the bucket and object key. Versioning can only be enabled at the bucket level; it cannot be turned on or off for individual objects within the bucket. Once enabled, versioning can be suspended. While versioning is suspended, S3 stops creating new versions but preserves all existing object versions.
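
The behaviour described above can be illustrated with a toy in-memory model. This is not how S3 is implemented, and the VersionedBucket class and its short version IDs are invented for the sketch; it only shows why an overwrite or delete does not destroy earlier versions.

```python
import itertools
from typing import Dict, List, Optional, Tuple

DELETE_MARKER = object()  # stands in for an S3 delete marker


class VersionedBucket:
    """Toy in-memory model of a bucket with versioning enabled."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, object]]] = {}
        self._ids = itertools.count(1)

    def _append(self, key: str, value: object) -> str:
        version_id = f"v{next(self._ids)}"  # real version IDs are opaque strings
        self._versions.setdefault(key, []).append((version_id, value))
        return version_id

    def put(self, key: str, data: bytes) -> str:
        """Store a new version; older versions are kept."""
        return self._append(key, data)

    def delete(self, key: str) -> str:
        """A delete only adds a marker on top of the existing versions."""
        return self._append(key, DELETE_MARKER)

    def get(self, key: str, version_id: Optional[str] = None):
        """Return the latest version, or a specific one if version_id is given."""
        history = self._versions.get(key, [])
        if version_id is None:
            candidates = history[-1:]
        else:
            candidates = [v for v in history if v[0] == version_id]
        if not candidates or candidates[0][1] is DELETE_MARKER:
            return None
        return candidates[0][1]


bucket = VersionedBucket()
first = bucket.put("jack.doc", b"draft")
bucket.put("jack.doc", b"final")
bucket.delete("jack.doc")
print(bucket.get("jack.doc"))                    # latest entry is a delete marker
print(bucket.get("jack.doc", version_id=first))  # the first version is still stored
```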

Cross-Region Replication

Cross-region replication is generally used to reduce the latency to access objects stored in Amazon S3 by placing objects closer to a set of users. This feature allows you to asynchronously replicate all new objects from the source bucket in one AWS region to a target bucket in another region. Any metadata and ACLs associated with the object in the source bucket are also part of the replication. After setting up cross-region replication on your source bucket, any changes to the data, metadata, or ACLs on an object trigger a new replication to the destination bucket.

To enable cross-region replication, versioning must be turned on for both source and target buckets. An IAM policy must be used to give Amazon S3 permission to replicate objects.
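
These prerequisites can be expressed as a simple pre-flight check. The sketch below is purely illustrative: BucketInfo and replication_blockers are hypothetical names, the bucket details are passed in by hand rather than fetched from AWS, and the region names are just sample values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketInfo:
    """Hypothetical summary of the bucket settings relevant to replication."""
    name: str
    region: str
    versioning: str  # "Enabled", "Suspended" or "" (never enabled)


def replication_blockers(source: BucketInfo, target: BucketInfo,
                         has_iam_policy: bool) -> List[str]:
    """List the reasons cross-region replication cannot be enabled yet."""
    problems = []
    for bucket in (source, target):
        if bucket.versioning != "Enabled":
            problems.append(f"versioning is not enabled on {bucket.name}")
    if source.region == target.region:
        problems.append("source and target are in the same region")
    if not has_iam_policy:
        problems.append("no IAM policy lets Amazon S3 replicate objects")
    return problems


src = BucketInfo("mybucket", "us-east-1", "Enabled")
dst = BucketInfo("mybucket-replica", "eu-west-1", "Suspended")
print(replication_blockers(src, dst, has_iam_policy=True))
```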

Cross-region replication can be used for scenarios such as efficient disaster recovery, faster reads, easier traffic management, easy regional migration and live data migration.

Logging

Amazon S3 server access logs can be enabled to track all the requests made to your Amazon S3 bucket. By default, logging is disabled. Once logging is enabled, access logs can be stored in the same bucket or in a different bucket.

Logs include information such as:

Requester account and IP address
Bucket name
Request time
Action (GET, PUT, LIST etc.)
Response status or error code
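
As a rough sketch of pulling these fields out of a log record, the snippet below parses the leading fields of a single line. It assumes the space-delimited layout used by S3 server access logs (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, status, error code); treat the field order as an assumption to check against the AWS documentation. The sample line is made up, and the IDs in it are placeholders.

```python
import re
from typing import Dict, Optional

# Leading fields of a log record, in order. Later fields (bytes sent,
# timings, referrer, user agent and so on) are ignored in this sketch.
LOG_PATTERN = re.compile(
    r'(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request_uri>[^"]*)" '
    r'(?P<status>\S+) (?P<error_code>\S+)'
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the leading fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up record for illustration only.
SAMPLE = (
    'EXAMPLEOWNERID mybucket [17/Jan/2019:10:15:30 +0000] 192.0.2.10 '
    'EXAMPLEREQUESTERID 3E57427F3EXAMPLE REST.GET.OBJECT jack.doc '
    '"GET /jack.doc HTTP/1.1" 200 - 2048 2048 15 12 "-" "curl/7.61.0" -'
)

record = parse_log_line(SAMPLE)
print(record["requester"], record["remote_ip"], record["operation"], record["status"])
```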

Event Notifications

Amazon S3 event notifications are set up at the bucket level and can be sent in response to actions taken on objects uploaded or stored in Amazon S3. Notifications can be configured through the Amazon S3 console, the REST API, or an SDK. Event notifications can be sent for the following actions:

New objects are created in S3 (by a PUT, POST, or multipart upload completion)
Objects are removed from S3 (by a DELETE)
An RRS object is lost

Notification messages can be sent through Amazon Simple Notification Service (Amazon SNS) or Amazon Simple Queue Service (Amazon SQS), or delivered directly to AWS Lambda to invoke Lambda functions.
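
For the Lambda case, a function receives the notification as an event containing a list of records. The handler below is a minimal sketch that summarises each record; the sample event is trimmed down and made up for illustration (real events carry more fields), and the bucket and key names are placeholders.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def handler(event: Dict[str, Any], context: Any = None) -> List[str]:
    """Summarise each S3 record in a notification event."""
    summaries = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue  # ignore records from other services
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        summaries.append(f'{record["eventName"]}: {bucket}/{key}')
    return summaries


# Trimmed-down, made-up sample event.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "mybucket"},
                "object": {"key": "reports/jack+smith.doc", "size": 2048},
            },
        }
    ]
}

print(handler(SAMPLE_EVENT))
```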

References

The best reference for further reading on Amazon S3 is the AWS documentation: Amazon Simple Storage Service Documentation

Written by Sumant Mishra