Every backend engineer knows S3 as “the place you put files.” Upload an image, store a backup, host a static site. But that barely scratches the surface. S3 is one of the most versatile building blocks in AWS — a durable, scalable object store that underpins everything from data lakes to real-time processing pipelines.
The difference between using S3 as a dumb file bucket and using it as an architecture component is understanding its features. Storage classes can cut your bill by 80%. Presigned URLs let users upload directly without touching your servers. Event notifications trigger entire processing pipelines. Let’s dig in.
Storage Classes — Stop Overpaying
Not all data is accessed equally. A user’s profile photo is hit thousands of times a day. Last month’s logs get accessed once for debugging. Last year’s compliance archives might never be touched again. S3 charges differently based on how you classify your data.
| Storage Class | Access Pattern | Storage Cost (GB/mo) | Retrieval Cost | Min Duration |
|---|---|---|---|---|
| Standard | Frequent access | $0.023 | None | None |
| Intelligent-Tiering | Unknown/changing | $0.023 + monitoring fee | None | None |
| Standard-IA | Infrequent (1x/month) | $0.0125 | $0.01/GB | 30 days |
| One Zone-IA | Infrequent, non-critical | $0.01 | $0.01/GB | 30 days |
| Glacier Instant | Rare, milliseconds access | $0.004 | $0.03/GB | 90 days |
| Glacier Flexible | Archives, minutes-hours | $0.0036 | $0.01-$0.03/GB | 90 days |
| Glacier Deep Archive | Compliance, 12-hour retrieval | $0.00099 | $0.02/GB | 180 days |
Intelligent-Tiering deserves special attention. It automatically moves objects between access tiers based on usage patterns. There’s a small monitoring fee (~$0.0025 per 1,000 objects), but for data with unpredictable access patterns, it’s the safest choice. You never pay retrieval fees, and objects automatically move to cheaper tiers when not accessed.
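When access patterns are predictable, you can pick a class yourself and check the math. A quick break-even sketch using the table's per-GB prices (illustrative rates; check current pricing for your region):

```javascript
// Monthly cost = storage + retrieval, using the table's example rates.
function monthlyCost(gb, storageRate, retrievedGb = 0, retrievalFee = 0) {
  return gb * storageRate + retrievedGb * retrievalFee;
}

const GB = 1000;
const standard = monthlyCost(GB, 0.023);            // $23.00
const ia = monthlyCost(GB, 0.0125, 100, 0.01);      // $13.50 if ~10% is re-read
const iaHot = monthlyCost(GB, 0.0125, 1100, 0.01);  // $23.50: re-read everything and IA loses
```

The crossover is simple: Standard-IA saves $0.0105/GB-month on storage but charges $0.01/GB to read, so it wins whenever an object is read less than roughly once a month.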
Lifecycle Policies — Automate Cost Savings
Lifecycle policies are rules that automatically transition or delete objects based on age. This is where the real cost savings happen.
{
"Rules": [
{
"ID": "logs-lifecycle",
"Filter": { "Prefix": "logs/" },
"Status": "Enabled",
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER_IR"
},
{
"Days": 365,
"StorageClass": "DEEP_ARCHIVE"
}
],
"Expiration": {
"Days": 2555
}
},
{
"ID": "cleanup-incomplete-uploads",
"Filter": { "Prefix": "" },
"Status": "Enabled",
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
}
]
}
That second rule is critical and often missed. Incomplete multipart uploads sit in your bucket invisibly, accumulating storage charges. Always add a cleanup rule.
A real-world example: a media company storing user uploads had 2 TB of log data in Standard storage at $46/month. After adding lifecycle rules — IA after 30 days, Glacier after 90 days, Deep Archive after a year — the same data cost under $8/month. The lifecycle policy paid for itself in a day.
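The arithmetic behind that example, using the table's rates (the age distribution below is a made-up but typical shape for append-only logs):

```javascript
const GB = 2000; // 2 TB of logs
const before = GB * 0.023; // $46/month, everything in Standard

// After the lifecycle rules, only recent data stays in Standard.
// Assumed split: 3% under 30 days, 6% at 30-90 days,
// 25% at 90-365 days, 66% older than a year.
const after =
  GB * 0.03 * 0.023 +    // Standard
  GB * 0.06 * 0.0125 +   // Standard-IA
  GB * 0.25 * 0.004 +    // Glacier Instant Retrieval
  GB * 0.66 * 0.00099;   // Deep Archive
// before = $46.00, after ≈ $6.19
```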
Versioning
Enable versioning and S3 keeps every version of every object. Delete a file? It gets a delete marker, but all previous versions remain. Overwrite a file? The old version is preserved.
This is essential for:
- Accidental deletion recovery — restore any previous version
- Audit trails — see who changed what and when
- Concurrent writes — no data lost from race conditions
import { S3Client, GetObjectCommand, ListObjectVersionsCommand } from '@aws-sdk/client-s3';
const s3 = new S3Client({ region: 'us-east-1' });
// List all versions of an object
async function listVersions(bucket, key) {
const response = await s3.send(new ListObjectVersionsCommand({
Bucket: bucket,
Prefix: key
}));
return (response.Versions ?? []).map(v => ({
versionId: v.VersionId,
lastModified: v.LastModified,
size: v.Size,
isLatest: v.IsLatest
}));
}
// Get a specific version
async function getVersion(bucket, key, versionId) {
const response = await s3.send(new GetObjectCommand({
Bucket: bucket,
Key: key,
VersionId: versionId
}));
return response.Body;
}
Combine versioning with lifecycle policies to expire old versions after N days. Otherwise, you’re paying to store every version forever.
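A sketch of that rule (the field names are the real lifecycle API; the 30/90-day windows are arbitrary, tune them to your retention needs):

```json
{
  "Rules": [{
    "ID": "expire-old-versions",
    "Filter": { "Prefix": "" },
    "Status": "Enabled",
    "NoncurrentVersionTransitions": [
      { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" }
    ],
    "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
  }]
}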
Presigned URLs — Direct Client Uploads
This is the pattern that changes how you think about file uploads. Instead of your server receiving the file and forwarding it to S3 (doubling your bandwidth and latency), generate a presigned URL that lets the client upload directly to S3.
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
const s3 = new S3Client({ region: 'us-east-1' });
// Generate a presigned URL for upload
async function getUploadUrl(userId, fileName, contentType) {
const key = `uploads/${userId}/${Date.now()}-${fileName}`;
const command = new PutObjectCommand({
Bucket: process.env.UPLOAD_BUCKET,
Key: key,
ContentType: contentType,
Metadata: {
'uploaded-by': userId
}
});
const url = await getSignedUrl(s3, command, {
expiresIn: 300 // URL valid for 5 minutes
});
return { url, key };
}
// Generate a presigned URL for download
async function getDownloadUrl(key) {
const command = new GetObjectCommand({
Bucket: process.env.UPLOAD_BUCKET,
Key: key
});
return getSignedUrl(s3, command, {
expiresIn: 3600 // 1 hour
});
}
The client-side upload is then simple:
// Client-side: upload directly to S3 using the presigned URL
async function uploadFile(file) {
// 1. Get presigned URL from your API
const { url, key } = await fetch('/api/upload-url', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
fileName: file.name,
contentType: file.type
})
}).then(r => r.json());
// 2. Upload directly to S3
await fetch(url, {
method: 'PUT',
body: file,
headers: { 'Content-Type': file.type }
});
// 3. Tell your API the upload is complete
await fetch('/api/upload-complete', {
method: 'POST',
body: JSON.stringify({ key })
});
}
Your server never touches the file data. It generates a short-lived URL, the client uploads directly, and your server is notified when it’s done. This works for files of any size and dramatically reduces server load.
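One detail worth adding: the sketch above drops the raw file name straight into the key. A small (hypothetical) normalizer keeps user input from producing odd or path-like keys:

```javascript
// Hypothetical helper: normalize a user-supplied file name before it
// becomes part of an S3 key.
function safeKeyName(fileName) {
  return fileName
    .toLowerCase()
    .replace(/[^a-z0-9.\-_]+/g, '-') // collapse anything unusual to a dash
    .replace(/^[-.]+|[-.]+$/g, '')   // trim leading/trailing dots and dashes
    .slice(0, 100);                  // keep keys to a sane length
}

safeKeyName('Report Q1.pdf');    // 'report-q1.pdf'
safeKeyName('../../etc/passwd'); // 'etc-passwd', no path tricks in the key
```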
Multipart Uploads
For files larger than 100 MB, multipart uploads are a must. They break the file into parts, upload them in parallel, and can resume from where they left off if a part fails.
import {
CreateMultipartUploadCommand,
UploadPartCommand,
CompleteMultipartUploadCommand,
AbortMultipartUploadCommand
} from '@aws-sdk/client-s3';
async function multipartUpload(bucket, key, fileBuffer) {
const PART_SIZE = 10 * 1024 * 1024; // 10 MB parts
// Step 1: Initiate
const { UploadId } = await s3.send(new CreateMultipartUploadCommand({
Bucket: bucket,
Key: key
}));
try {
// Step 2: Upload parts in parallel
const totalParts = Math.ceil(fileBuffer.length / PART_SIZE);
const uploadPromises = [];
for (let i = 0; i < totalParts; i++) {
const start = i * PART_SIZE;
const end = Math.min(start + PART_SIZE, fileBuffer.length);
const partNumber = i + 1;
uploadPromises.push(
s3.send(new UploadPartCommand({
Bucket: bucket,
Key: key,
UploadId,
PartNumber: partNumber,
Body: fileBuffer.slice(start, end)
})).then(response => ({
PartNumber: partNumber,
ETag: response.ETag
}))
);
}
const completedParts = await Promise.all(uploadPromises);
// Step 3: Complete
await s3.send(new CompleteMultipartUploadCommand({
Bucket: bucket,
Key: key,
UploadId,
MultipartUpload: {
Parts: completedParts.sort((a, b) => a.PartNumber - b.PartNumber)
}
}));
} catch (err) {
// Abort on failure — don't leave incomplete uploads
await s3.send(new AbortMultipartUploadCommand({
Bucket: bucket,
Key: key,
UploadId
}));
throw err;
}
}
In practice, use the @aws-sdk/lib-storage Upload utility which handles all of this for you. But understanding the underlying mechanics matters when debugging upload failures.
Event Notifications — S3 as a Trigger
S3 can fire events when objects are created, deleted, or restored. These events can trigger Lambda functions, send messages to SQS, or publish to SNS.
// Lambda triggered by S3 event
export async function handler(event) {
for (const record of event.Records) {
const bucket = record.s3.bucket.name;
const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
const size = record.s3.object.size;
const eventType = record.eventName;
console.log(`${eventType}: ${bucket}/${key} (${size} bytes)`);
if (key.startsWith('uploads/') && key.endsWith('.jpg')) {
await generateThumbnail(bucket, key);
}
if (key.startsWith('data/') && key.endsWith('.csv')) {
await processDataFile(bucket, key);
}
}
}
Common patterns:
- Image processing pipeline: upload → S3 event → Lambda → generate thumbnails → save back to S3
- Data ingestion: CSV dropped in S3 → Lambda → parse and load into DynamoDB
- Video transcoding: video uploaded → S3 event → SQS → ECS/Fargate task picks up and transcodes
- Audit trail: any object change → S3 event → EventBridge → audit log in DynamoDB
For EventBridge integration (recommended over direct S3 notifications for new architectures):
# Enable EventBridge notifications on a bucket
aws s3api put-bucket-notification-configuration \
--bucket my-bucket \
--notification-configuration '{"EventBridgeConfiguration":{}}'
Now all S3 events for that bucket flow to EventBridge, where you can write sophisticated matching rules.
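A matching rule pattern might look like this (bucket name and prefix are placeholders):

```json
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": { "name": ["my-bucket"] },
    "object": { "key": [{ "prefix": "uploads/" }] }
  }
}
```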
S3 + CloudFront — Global Content Delivery
Serving S3 content directly works, but users far from your S3 region see high latency. CloudFront caches content at 400+ edge locations worldwide.
The setup is straightforward:
- Create a CloudFront distribution with your S3 bucket as the origin
- Use an Origin Access Control (OAC) so only CloudFront can read from S3
- Your bucket stays private — CloudFront is the only public entry point
{
"S3BucketPolicy": {
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowCloudFrontServicePrincipal",
"Effect": "Allow",
"Principal": {
"Service": "cloudfront.amazonaws.com"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-bucket/*",
"Condition": {
"StringEquals": {
"AWS:SourceArn": "arn:aws:cloudfront::123456789:distribution/EDFDVBD6EXAMPLE"
}
}
}]
}
}
Benefits beyond caching:
- HTTPS with custom domains — CloudFront handles SSL termination
- Edge functions — CloudFront Functions or Lambda@Edge for URL rewrites, auth, A/B testing
- Compression — automatic gzip/brotli for text-based content
- DDoS protection — AWS Shield Standard is included free
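As an example of those edge functions, here is a minimal CloudFront Function (viewer-request) that rewrites directory-style URLs to index.html, a common need when a static site lives in S3 behind CloudFront:

```javascript
// CloudFront Function: map "pretty" URLs onto the index.html objects in S3.
function handler(event) {
  var request = event.request;
  var uri = request.uri;
  if (uri.endsWith('/')) {
    request.uri = uri + 'index.html';   // /docs/ -> /docs/index.html
  } else if (!uri.includes('.')) {
    request.uri = uri + '/index.html';  // /docs  -> /docs/index.html
  }
  return request;
}
```

Requests with a file extension (like /app.js) pass through untouched.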
Access Control
S3 access control has evolved through multiple layers. Here’s what to use in 2026:
- Block Public Access — enable this at the account level. It prevents accidental public exposure. Turn it on and forget about it.
- Bucket Policies — the primary access control mechanism. JSON policies attached to the bucket that define who can do what.
- IAM Policies — control what your services and users can do. Prefer this for application access.
- ACLs — legacy. AWS recommends disabling them entirely with the “BucketOwnerEnforced” setting.
// Good: IAM role-based access. Your Lambda/ECS task gets an execution
// role with exactly the permissions it needs.
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-app-uploads/uploads/*"
}]
}
// This role can only read/write objects under the uploads/ prefix.
// Nothing else. Least privilege.
S3 Select — Query in Place
Instead of downloading an entire CSV or JSON file to process one field, S3 Select lets you run SQL queries directly on the object. S3 does the filtering and only returns the matching data. (Note that AWS stopped onboarding new customers to S3 Select in mid-2024; for new workloads, Athena covers the same use case.)
import { SelectObjectContentCommand } from '@aws-sdk/client-s3';
async function queryCSV(bucket, key) {
const response = await s3.send(new SelectObjectContentCommand({
Bucket: bucket,
Key: key,
ExpressionType: 'SQL',
Expression: "SELECT s.name, s.amount FROM s3object s WHERE CAST(s.amount AS INT) > 1000",
InputSerialization: {
CSV: { FileHeaderInfo: 'USE', Comments: '#' }
},
OutputSerialization: {
JSON: { RecordDelimiter: '\n' }
}
}));
const records = [];
for await (const event of response.Payload) {
if (event.Records) {
records.push(event.Records.Payload);
}
}
return Buffer.concat(records).toString('utf-8')
.trim()
.split('\n')
.map(line => JSON.parse(line));
}
This is perfect for log analysis, data exploration, and any scenario where you need a small slice of a large file. You pay only for the data scanned and returned, not the entire object.
Performance Optimization
Prefix Partitioning
S3 automatically partitions your data by key prefix for performance. Modern S3 handles 5,500 GET and 3,500 PUT requests per second per prefix. If you need more, spread your keys across multiple prefixes:
// Bad — all keys under one prefix
logs/2026-03-31-001.gz
logs/2026-03-31-002.gz
// Better — date-partitioned
logs/2026/03/31/001.gz
logs/2026/03/31/002.gz
// Best for high throughput — add a hash prefix
logs/a1/2026/03/31/001.gz
logs/b2/2026/03/31/002.gz
Transfer Acceleration
For uploads from distant regions, S3 Transfer Acceleration routes data through CloudFront edge locations. Enable it on your bucket and use the accelerated endpoint. Speed improvements of 50-500% are common for cross-continent uploads.
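Enabling it is a single configuration call, after which clients upload via the bucket's accelerate endpoint (bucket name is a placeholder):

```shell
# Turn on Transfer Acceleration for the bucket
aws s3api put-bucket-accelerate-configuration \
  --bucket my-bucket \
  --accelerate-configuration Status=Enabled

# Then upload via the accelerated endpoint:
#   my-bucket.s3-accelerate.amazonaws.com
```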
Static Website Hosting
S3 can serve a static website directly with:
- Index document (index.html)
- Error document (404.html)
- Redirect rules
But for production, always put CloudFront in front. S3 website endpoints don’t support HTTPS, and CloudFront gives you caching, custom domains, and edge functions.
Cost Optimization Checklist
- Lifecycle policies — move old data to cheaper tiers automatically
- Intelligent-Tiering — for data with unpredictable access patterns
- Abort incomplete multipart uploads — they silently accumulate charges
- S3 Storage Lens — dashboard showing usage patterns and savings opportunities
- Requester Pays — make the downloader pay for transfer (useful for shared datasets)
- Same-region access — cross-region data transfer costs $0.02/GB
- VPC endpoints — free data transfer from EC2/Lambda to S3 within the same region
- Compression — gzip your files before upload when possible
S3 is deceptively simple on the surface. PutObject, GetObject, done. But the services built on top of that simple interface — lifecycle management, event-driven processing, global distribution, in-place querying — make it one of the most powerful components in your AWS toolkit. Use it as a building block, not just a bucket.
