Attribute Compression

Reduce storage costs and throughput consumption by compressing large attribute values or offloading them to S3.

What is attribute compression?

DynamoDB supports storing compressed values using algorithms like GZIP or LZO, which produce binary output that can be stored in a Binary attribute type.1 By compressing long text strings or binary data, you can reduce the size of your items, leading to lower storage costs and fewer RCUs/WCUs consumed per operation.

Key benefits:

  • Reduced storage costs - Smaller items mean lower storage charges
  • Lower throughput consumption - Fewer capacity units consumed per read/write
  • Overcome size limits - Compress large attributes to fit within the 400 KB item size limit
  • Cost-effective at scale - Savings compound with high-volume tables

Example scenario:

Product reviews table with uncompressed text:

  • Average review size: 8 KB
  • 1 million reviews
  • Storage cost: 8 GB × $0.25/GB = $2.00/month

With GZIP compression (70% reduction):

  • Compressed review size: 2.4 KB
  • Storage cost: 2.4 GB × $0.25/GB = $0.60/month
  • Monthly savings: $1.40/month (70% reduction)
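The scenario's arithmetic can be checked in a few lines. The $0.25/GB-month rate and 70% compression ratio are the assumptions stated above, not guaranteed figures:

```python
DDB_RATE = 0.25          # USD per GB-month (assumed rate from the scenario)
reviews = 1_000_000
uncompressed_kb = 8
ratio = 0.70             # assumed GZIP size reduction

uncompressed_gb = reviews * uncompressed_kb / 1_000_000  # 8 GB
compressed_gb = uncompressed_gb * (1 - ratio)            # 2.4 GB

cost_before = uncompressed_gb * DDB_RATE                 # $2.00/month
cost_after = compressed_gb * DDB_RATE                    # $0.60/month
savings = cost_before - cost_after                       # $1.40/month
```

Real-world compression ratios vary widely with the data; measure on representative payloads before projecting savings.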

Implementation

Client-side compression

Compression must be performed client-side before writing to DynamoDB.1

Python example:

import boto3
import gzip

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductReviews')

# Compress review text
review_text = "This is a very long product review..." * 100
compressed_review = gzip.compress(review_text.encode('utf-8'))

# Store compressed binary in DynamoDB
table.put_item(
    Item={
        'product_id': '12345',
        'review_id': 'review-001',
        'compressed_review': compressed_review,  # Binary attribute
        'review_length': len(review_text),
        'compressed_length': len(compressed_review)
    }
)

# Retrieve and decompress
response = table.get_item(Key={'product_id': '12345', 'review_id': 'review-001'})
decompressed_review = gzip.decompress(response['Item']['compressed_review']).decode('utf-8')
print(f"Original size: {response['Item']['review_length']} bytes")
print(f"Compressed size: {response['Item']['compressed_length']} bytes")

Important limitations

Filtering and querying

Compressed attribute values cannot be used for filtering operations directly in DynamoDB queries or scans.1 To filter on compressed data:

  1. Retrieve the candidate items using Query or Scan
  2. Decompress values in your application code
  3. Apply filters programmatically in your application
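The three steps above can be sketched as a small helper. It operates on the `Items` list of a Query/Scan response; the `compressed_review` attribute name follows the earlier example, and the predicate is illustrative:

```python
import gzip

def filter_compressed_reviews(items, predicate):
    """Decompress each item's review and keep those matching predicate.

    `items` is the 'Items' list from a DynamoDB Query/Scan response.
    boto3 returns Binary attributes as a wrapper whose raw bytes are
    on `.value`, so we unwrap before decompressing.
    """
    matches = []
    for item in items:
        raw = item['compressed_review']
        data = raw.value if hasattr(raw, 'value') else raw
        text = gzip.decompress(data).decode('utf-8')
        if predicate(text):          # step 3: filter in application code
            matches.append(item)
    return matches

# In-memory items standing in for a Query response:
items = [
    {'review_id': 'r1', 'compressed_review': gzip.compress(b'Excellent product')},
    {'review_id': 'r2', 'compressed_review': gzip.compress(b'Not great')},
]
hits = filter_compressed_reviews(items, lambda t: 'excellent' in t.lower())
```

Note that this pattern still pays the read-capacity cost of retrieving every candidate item; only the filtering moves client-side.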

Performance trade-offs

Compression overhead:

  • CPU overhead - Compression/decompression requires client-side CPU resources
  • Latency impact - Additional processing time for compress/decompress operations
  • Memory usage - Must hold uncompressed data in memory during processing

Throughput considerations:

  • Compressed items consume fewer capacity units, reducing costs
  • But compression adds client-side processing time
  • For latency-sensitive applications, test compression impact before implementing at scale
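One way to gauge that impact is to time compress/decompress on a representative payload before rolling compression out. A rough sketch; actual numbers depend on your data and hardware:

```python
import gzip
import time

# Repetitive sample text stands in for a real review payload.
payload = ('This is a very long product review... ' * 200).encode('utf-8')

t0 = time.perf_counter()
compressed = gzip.compress(payload)
t1 = time.perf_counter()
restored = gzip.decompress(compressed)
t2 = time.perf_counter()

print(f'original:   {len(payload)} bytes')
print(f'compressed: {len(compressed)} bytes '
      f'({100 * (1 - len(compressed) / len(payload)):.0f}% smaller)')
print(f'compress:   {(t1 - t0) * 1e3:.2f} ms')
print(f'decompress: {(t2 - t1) * 1e3:.2f} ms')
```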

Alternative: Store large attributes in S3

For extremely large attributes, consider storing the data in Amazon S3 and keeping only the S3 object key or URI in the DynamoDB item:2

Benefits of S3 offloading

  • Lower storage costs - S3 storage ($0.023/GB) is 91% cheaper than DynamoDB ($0.25/GB)
  • No item size limits - S3 can store objects up to 5 TB
  • Lifecycle policies - Automatically transition old data to cheaper storage classes (Glacier, Deep Archive)
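A minimal sketch of the offload pattern, assuming hypothetical bucket, table, and attribute names. The S3 client and DynamoDB table are passed in, so in real use you would supply `boto3.client('s3')` and `boto3.resource('dynamodb').Table(...)`:

```python
def offload_to_s3(s3, table, product_id, data, bucket, key):
    """Write `data` to S3 and store only its s3:// URI in DynamoDB.

    Names here (manual_s3_uri, bucket/key layout) are illustrative,
    not a fixed convention.
    """
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    uri = f's3://{bucket}/{key}'
    table.put_item(Item={'product_id': product_id, 'manual_s3_uri': uri})
    return uri
```

Readers do the reverse: `get_item` for the pointer, then `s3.get_object` for the payload. Keep in mind DynamoDB and S3 offer no cross-service transactions, so the item and object can briefly disagree if one write fails.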

Hybrid approach

For tables with many large text attributes, combine both approaches:

Example architecture:

Product catalog table:

  • Short description (uncompressed): Used for search/filtering, <2 KB
  • Long description (compressed): Displayed on product pages, 10-50 KB
  • User manual PDF (S3): Rarely accessed, >500 KB, store S3 path in DynamoDB

This approach optimizes for both cost and performance across different access patterns.
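The hybrid layout above might produce an item shaped like this (attribute names and values are illustrative):

```python
import gzip

long_description = 'Full marketing copy for the product page... ' * 300

item = {
    'product_id': '12345',
    # <2 KB, left uncompressed so it remains usable in filter expressions
    'short_description': 'Compact wireless keyboard',
    # 10-50 KB, compressed client-side before writing
    'long_description_gz': gzip.compress(long_description.encode('utf-8')),
    # >500 KB, offloaded to S3; only the pointer lives in DynamoDB
    'manual_s3_uri': 's3://my-product-assets/manuals/12345.pdf',
}
# table.put_item(Item=item)  # written with boto3 as in the earlier example
```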

Resources

Footnotes

  1. DynamoDB Data Types

  2. Best practices for storing large values