Amazon Simple Storage Service (commonly called S3) is an object-level storage service that offers virtually unlimited storage with high data durability.
Composition and characteristics of S3 Objects
- S3 has a flat structure. Despite the appearance of folders (created by key-name prefixes and delimiters), objects are stored simply as key-value pairs
- Key is the Object's full name within the Bucket, including any prefix ("folder") path; the Bucket name itself is not part of the Key (see the sketch after this list)
- Example – for s3://unique-bucket-name-0101010101/images/aws-advisor-logo.jpg, the Key is images/aws-advisor-logo.jpg ("images" is just a prefix, not a nested bucket – buckets cannot be nested)
- Value is the Object itself
- Example – the actual image (of aws-advisor-logo.jpg)
- Version ID – version id of each Object (and its changed states)
- Each object can be up to 5 TB
- Metadata – additional key-value information regarding the object (user-defined, and system-defined)
- Access Control Information – specifics of the permissions for the Object(s)
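To make the key / value / metadata model concrete, here is a minimal boto3 (Python) sketch. The bucket name reuses the hypothetical example above, and the key and metadata values are likewise made up for illustration:

```python
import boto3

s3 = boto3.client("s3")

# The Key is the full path within the bucket, including any "folder" prefixes.
s3.put_object(
    Bucket="unique-bucket-name-0101010101",   # hypothetical, globally unique bucket
    Key="images/aws-advisor-logo.jpg",        # Key: prefix + object name
    Body=b"<image bytes>",                    # Value: in practice, the file's contents
    Metadata={"project": "aws-advisor"},      # user-defined metadata (hypothetical)
)

# Fetching by bucket + key returns the Value plus system and user metadata.
resp = s3.get_object(Bucket="unique-bucket-name-0101010101",
                     Key="images/aws-advisor-logo.jpg")
print(resp["Metadata"], resp["ContentLength"], resp.get("VersionId"))
```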
Key points for Simple Storage Service (S3)
- You can store virtually any kind of data in any format.
- Virtually unlimited storage.
- Objects stay within a Region, and are stored redundantly across multiple AZs within that Region (a minimum of three for most storage classes) for high availability and durability.
- S3 provides 99.999999999% (11 9’s) of durability.
- S3 has a global namespace, and thus each S3 bucket name has to be globally unique.
- Data stays within the Region, but can be replicated to other Regions
- Versioning – lets you treat an Object like a version-controlled configuration item (see the sketch after this list)
- Versioning keeps older versions of an Object any time you update or delete it
- By default, S3 fetches the latest version of the Object
- Once enabled, versioning can later be suspended (but never disabled) – older versions are still retained
- When an item is deleted, the Object is kept behind a “delete” marker – you can un-delete the Object, or permanently delete it (MFA Delete can be enabled to avoid unintended permanent deletes)
- You can leverage an S3 Inventory report to get CSV, ORC, or Parquet output of your Objects and associated metadata on a daily or weekly basis for an S3 Bucket or prefix.
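A minimal boto3 (Python) sketch of the versioning behavior described above; the bucket name and key are hypothetical:

```python
import boto3

s3 = boto3.client("s3")
bucket = "unique-bucket-name-0101010101"  # hypothetical bucket

# Enable versioning (it can later be suspended, but never fully disabled).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Two PUTs to the same key produce two versions, each with its own VersionId.
s3.put_object(Bucket=bucket, Key="config.json", Body=b'{"v": 1}')
s3.put_object(Bucket=bucket, Key="config.json", Body=b'{"v": 2}')

# A plain DELETE only adds a delete marker; older versions stay retrievable.
s3.delete_object(Bucket=bucket, Key="config.json")

versions = s3.list_object_versions(Bucket=bucket, Prefix="config.json")
for v in versions.get("Versions", []):
    print("version:", v["VersionId"], "latest:", v["IsLatest"])

# "Un-delete" by removing the delete marker itself; a permanent delete would
# instead target a real VersionId (optionally protected by MFA Delete).
marker = versions["DeleteMarkers"][0]
s3.delete_object(Bucket=bucket, Key="config.json", VersionId=marker["VersionId"])
```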
S3 Storage Classes
Standard
- Designed for frequently accessed data.
- 99.999999999% durability (eleven 9s)
- 3+ AZ replication
- 99.99% availability
- Low Latency and High Throughput
- Most expensive class, but no minimum object size or storage duration fee
- It is the default class
- Supports SSL for data in transit and encryption for data at rest
- Use Cases – storage for cloud applications, dynamic websites, mobile and gaming applications
Standard – Infrequent Access (S3-IA)
- Designed for objects that are not accessed frequently, but require quick access when needed
- 99.999999999% durability (eleven 9s)
- 3+ AZ replication
- 99.9% availability
- Same low latency and high throughput as the Standard class
- 30-day minimum storage duration charge; 128 KB minimum billable object size; object retrieval fee
- Supports SSL for data in transit and encryption for data at rest
- Use Cases – backups, DR storage
One Zone Infrequent Access (S3 One Zone-IA)
- Designed for non-critical, reproducible objects
- 99.999999999% durability (eleven 9s) within the scope of a single AZ. However, the data is lost if that AZ is destroyed in a catastrophe
- Stored in only 1 AZ
- 99.5% availability
- Same low latency and high throughput as the Standard class
- Supports SSL for data in transit and encryption for data at rest
- 30-day minimum storage duration charge; 128 KB minimum billable object size; object retrieval fee
Glacier
- Designed for long-term archival (warm or cold backups)
- May take several hours for objects to be retrieved
- You can upload data directly to Glacier, or move it there via a Lifecycle management policy (see the sketch after this list)
- 99.999999999% durability (eleven 9s)
- 3+ AZ replication
- Supports SSL for data in transit and encryption for data at rest
- 90-day minimum storage duration and 40 KB minimum billable object size; object retrieval fee
- Retrieval: three configurable options (Expedited / Standard / Bulk), ranging from minutes to hours
- Provisioned Capacity Unit (PCU) – you can purchase PCU to guarantee retrieval capacity for Expedited retrievals. Each PCU ensures at least 3 Expedited retrievals every 5 minutes, with up to 150 MB/s of throughput.
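A boto3 (Python) sketch of both paths into and out of Glacier; the bucket name, prefix, and day counts are hypothetical choices, not recommendations:

```python
import boto3

s3 = boto3.client("s3")
bucket = "unique-bucket-name-0101010101"  # hypothetical bucket

# Lifecycle rule: transition objects under "logs/" to Glacier after 90 days,
# and expire them after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)

# Retrieval is asynchronous: request a restore with one of the three tiers
# (Expedited / Standard / Bulk), then wait for the temporary copy.
s3.restore_object(
    Bucket=bucket,
    Key="logs/app-2020-01.log",  # hypothetical key
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Expedited"}},
)
```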
Glacier Deep Archive
- Long-term archival (cold backups)
- Longer retrievals, but cheaper than Glacier – replacement for tape-style storage
- 3+ AZ replication
- 180-day minimum storage duration and 40 KB minimum billable object size; object retrieval fee
- Use Cases – storage for data that must be retained long term for regulatory and compliance reasons, and is accessed infrequently (once or twice a year)
- Retrieval: within 12 hours
- You can move data to Glacier Deep Archive via the API, or through a Lifecycle policy
Intelligent-Tiering
- Designed for Object storage when access patterns are unknown or unpredictable
- S3 automatically moves data to the most cost-effective access tier, without performance impact or operational overhead (see the sketch after this list)
- There is a per-object monthly fee for monitoring and automation (of tier movement); however, there is no retrieval fee when an object is moved between access tiers
- Same low latency and high throughput performance as S3 Standard
- 99.999999999% durability (eleven 9s)
- Multiple AZ replication
- 99.99% availability
- Supports SSL for data in transit and encryption for data at rest
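Opting in is simply a storage-class choice at upload time, as in this boto3 (Python) sketch (hypothetical bucket and key):

```python
import boto3

s3 = boto3.client("s3")

# Upload straight into Intelligent-Tiering; S3 then shifts the object between
# access tiers based on its observed access pattern.
s3.put_object(
    Bucket="unique-bucket-name-0101010101",  # hypothetical bucket
    Key="data/usage-unknown.bin",            # hypothetical key
    Body=b"payload with an unpredictable access pattern",
    StorageClass="INTELLIGENT_TIERING",
)
```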
Performance comparison chart for the classes:
Security
Amazon S3 provides several ways to secure the data you store in S3
- Access Management
- Data Security
Access Management
S3 evaluates all applicable policies – access policies, user policies, and resource-based policies (Bucket Policy, Bucket ACL, Object ACL) – to decide whether to authorize a request. For each request, S3 compiles and assesses the relevant policies against the following three contexts:
- User Context
- S3 evaluates the user policies attached to the user (by its parent account)
- S3 also evaluates resource policies (Bucket Policy, Bucket ACL, and Object ACL)
- This (User Context) step is skipped if the request comes from the Root account
- Bucket Context
- S3 evaluates the policies owned by the Bucket’s owner account
- Access evaluation is done at both the Bucket level and the Object level – e.g., if an explicit Deny exists for a set of Objects, the request is denied
- Object Context
- S3 evaluates the policies defined by the Object owner
Note: the Object Context step is not performed for bucket-level requests.
The following diagram shows the first two context steps being performed for a bucket-level request:
Image courtesy of AWS
The following diagram shows all three context steps being performed for an object-level request:
Image courtesy of AWS
Access Control Lists (ACLs), and when to use them:
ACLs exist at the Bucket level as well as at the Object level. To truly understand when to use which, you need to know these ground rules:
- A Bucket (owned by Owner-A) may hold Objects owned by various other owners (different from Owner-A)
- Example: your account creates a Bucket and allows other accounts (publicly or selectively) to upload Objects to it. Each uploaded Object is owned by the uploading account, not your account
- A Bucket owner cannot grant permissions on Objects that it does not own
You should use Object ACL in cases like:
- You don’t own the Bucket which contains your Object(s) – in this case you only have the option to set permissions at Object level
- You own both Bucket and Object(s), and need to vary access at Object level
Amazon recommends that you use a Bucket ACL only in the following case:
- To grant write permission to S3 Log Delivery group to write access log Objects to your bucket
Bucket Policy
A Bucket Policy specifies which actions are allowed or denied for which principals on the Bucket the policy is attached to (see the sketch after this list).
- The policy applies to all Objects in the Bucket
- Bucket policies are limited to 20 KB in size
- Final authorization is the combination of all applicable policies: an explicit Deny overrides any explicit Allow, and without an explicit Allow the request falls through to an implicit Deny
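A boto3 (Python) sketch of a Bucket Policy combining an Allow and an overriding Deny; the bucket name and the "public/" prefix are hypothetical:

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "unique-bucket-name-0101010101"  # hypothetical bucket

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Allow anonymous reads, but only under the "public/" prefix.
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/public/*",
        },
        {   # Explicit Deny wins over the Allow for any non-HTTPS request.
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
    ],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```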
IAM Policy
An IAM Policy grants or denies permissions to an IAM User / Group / Role for AWS resources – in this context, S3 resources (see the sketch below).
Note: AWS recommends not to use ACLs, and rather make use of Bucket Policies and / or IAM Policies for S3 access management.
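For contrast with the resource-based Bucket Policy above, here is a boto3 (Python) sketch of an identity-based policy attached to a user; the user name, policy name, bucket, and prefix are all hypothetical:

```python
import json
import boto3

iam = boto3.client("iam")

# Identity-based policy: this user may list one bucket and read / write
# objects under a single prefix, and nothing else.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::unique-bucket-name-0101010101",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::unique-bucket-name-0101010101/reports/*",
        },
    ],
}
iam.put_user_policy(
    UserName="report-writer",        # hypothetical IAM user
    PolicyName="s3-reports-access",  # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)
```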
Encryption
- At-rest data encryption can be configured at the Object level
- In-transit data can be encrypted between a client and S3 using SSL/TLS (HTTPS); a bucket policy can enforce this via the aws:SecureTransport condition
- Encryption options – you can opt for one of the following ways to manage encryption for S3 Objects (see the sketch after this list):
- Client-Side Encryption
- You manage encryption / decryption and associated keys.
- Typically used by clients for extremely sensitive or regulated data.
- This option has major admin overhead.
- Server-Side Encryption with Customer-Provided Keys (SSE-C)
- S3 manages encryption / decryption, and you manage the associated keys
- Keys must be provided with each PUT or GET request
- Server-Side Encryption with S3-Managed Keys (SSE-S3)
- S3 manages encryption / decryption as well as the associated keys
- Each Object is encrypted with a unique key, which S3 stores in encrypted form under a regularly rotated master key. With the right S3 permissions, Objects are decrypted transparently on access
- Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)
- Encryption uses data keys generated individually by AWS KMS
- Encrypted data keys are stored alongside the encrypted Objects. You need both S3 and KMS permissions to decrypt and access the Objects
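The three server-side options differ only in a few request parameters, as this boto3 (Python) sketch shows; the bucket, keys, and KMS key alias are hypothetical:

```python
import os
import boto3

s3 = boto3.client("s3")
bucket = "unique-bucket-name-0101010101"  # hypothetical bucket

# SSE-S3: S3 creates, stores, and rotates the keys itself.
s3.put_object(Bucket=bucket, Key="a.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS: encryption uses a KMS key; callers need S3 and KMS permissions.
s3.put_object(Bucket=bucket, Key="b.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-app-key")  # hypothetical KMS key alias

# SSE-C: you supply the key on every PUT and GET; S3 never stores it.
key = os.urandom(32)
s3.put_object(Bucket=bucket, Key="c.txt", Body=b"data",
              SSECustomerAlgorithm="AES256", SSECustomerKey=key)
s3.get_object(Bucket=bucket, Key="c.txt",
              SSECustomerAlgorithm="AES256", SSECustomerKey=key)
```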
Additional Security notes for S3:
- “Block Public Access” – if applied – trumps any other security policy that would grant public access (see the sketch after this list).
- Data Access Auditing – you can enable server access logging to create log records of all requests made against a specific bucket.
- You can leverage S3 Object Lock to block Object version deletion during a specified retention period.
- You can also use Access Analyzer for S3 to monitor your access policies to ensure such policies provide only the intended access to your S3 resources.
- Additionally, you can use the Amazon Macie service to recognize sensitive data stored in your S3 bucket(s), such as personally identifiable information (PII) or intellectual property, and to learn where that data is stored and how it is being used in your organization.
- S3 Access Points let you expose S3 Objects aligned to a specific need – such as a specific application or use case – giving you an additional way to scope access to a subset of Objects within a Bucket.
- Access Points are created on top of an existing Bucket.
- You can create up to 1,000 Access Points per Region, and can request a limit increase.
- There is no additional cost for using Access Points.
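A boto3 (Python) sketch of applying Block Public Access at the bucket level; the bucket name is hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Once set, these flags override any public grants made via ACLs or policies.
s3.put_public_access_block(
    Bucket="unique-bucket-name-0101010101",  # hypothetical bucket
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore existing public ACLs
        "BlockPublicPolicy": True,      # reject public bucket policies
        "RestrictPublicBuckets": True,  # restrict access granted by public policies
    },
)
```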
Pricing
S3 is priced across the following components:
- Storage – per GB
- Pricing varies by storage class – Standard, Standard-IA, One Zone-IA, Glacier, Glacier Deep Archive, and Intelligent-Tiering
- Pricing is tiered – for example, the first 50 TB at one price, the next 450 TB at a lower price, and so on (see the worked example after this list)
- Requests and Data Retrievals – per 1,000 requests
- Pricing varies by Request type – example – PUT/COPY/POST/LIST have different price than GET/SELECT, etc.
- Pricing also varies by Storage class
- For the Glacier class, pricing depends on the urgency of the data request – Expedited / Standard / Bulk
- For Glacier Deep Archive, pricing depends on the Standard / Bulk request type
- Data Transfer – per GB per Month
- Data Transfer “in” to S3 from Internet is Free
- Data Transfer “out” from S3 to Internet is charged – tiered pricing
- Data Transfer “out” from S3 to CloudFront is Free
- Management and Replication – some S3 related management features are charged based on their usage
- S3 Inventory – per million Objects listed
- S3 Analytics Storage Class Analysis – per million Objects monitored per month
- S3 Object Tagging – per 10,000 tags per month
- For Replication, you pay for the storage and requests
- S3 Batch Operations – per Job, and per million Object operations performed
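A small worked example (Python) of how tiered storage pricing adds up; the per-GB rates below are hypothetical placeholders, not current AWS prices:

```python
# (tier size in GB, hypothetical price per GB-month in USD)
TIERS = [
    (50_000, 0.023),        # first 50 TB
    (450_000, 0.022),       # next 450 TB
    (float("inf"), 0.021),  # everything beyond 500 TB
]

def monthly_storage_cost(total_gb: float) -> float:
    """Walk the tiers, billing each slice of storage at its own rate."""
    cost, remaining = 0.0, total_gb
    for size, rate in TIERS:
        billed = min(remaining, size)
        cost += billed * rate
        remaining -= billed
        if remaining <= 0:
            break
    return cost

# 100 TB: 50 TB at the first rate plus 50 TB at the second rate.
print(f"${monthly_storage_cost(100_000):,.2f}")  # $2,250.00
```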
External Resources