What is attribute compression?
DynamoDB supports storing compressed values using algorithms like GZIP or LZO, which produce binary output that can be stored in a Binary attribute type.[1] By compressing long text strings or binary data, you can reduce the size of your items, leading to lower storage costs and fewer RCUs/WCUs consumed per operation.
Key benefits:
- Reduced storage costs - Smaller items mean lower storage charges
- Lower throughput consumption - Fewer capacity units consumed per read/write
- Overcome size limits - Compress large attributes to fit within the 400 KB item size limit
- Cost-effective at scale - Savings compound with high-volume tables
Example scenario:
Product reviews table with uncompressed text:
- Average review size: 8 KB
- 1 million reviews
- Storage cost: 8 GB × $0.25/GB = $2.00/month
With GZIP compression (70% reduction):
- Compressed review size: 2.4 KB
- Storage cost: 2.4 GB × $0.25/GB = $0.60/month
- Monthly savings: $1.40/month (70% reduction)
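The arithmetic above can be reproduced in a few lines (the rates are the example's illustrative figures, not current AWS pricing):

```python
# Reproduce the cost example: 1M reviews at 8 KB each, $0.25/GB-month
# storage, 70% compression.
reviews = 1_000_000
avg_size_kb = 8
price_per_gb = 0.25          # illustrative storage rate from the example
compression = 0.70           # 70% size reduction

uncompressed_gb = reviews * avg_size_kb / 1_000_000   # KB -> GB (decimal)
compressed_gb = uncompressed_gb * (1 - compression)

print(f"Uncompressed: ${uncompressed_gb * price_per_gb:.2f}/month")  # $2.00
print(f"Compressed:   ${compressed_gb * price_per_gb:.2f}/month")    # $0.60
```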
Implementation
Client-side compression
Compression must be performed client-side before writing to DynamoDB.[1]
Python example:
import boto3
import gzip

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductReviews')

# Compress the review text before writing
review_text = "This is a very long product review..." * 100
compressed_review = gzip.compress(review_text.encode('utf-8'))

# Store the compressed bytes in a Binary attribute
table.put_item(
    Item={
        'product_id': '12345',
        'review_id': 'review-001',
        'compressed_review': compressed_review,  # Binary attribute
        'review_length': len(review_text),
        'compressed_length': len(compressed_review)
    }
)

# Retrieve and decompress; boto3 wraps Binary attributes in a Binary
# object, so use .value to get the raw bytes back
response = table.get_item(Key={'product_id': '12345', 'review_id': 'review-001'})
compressed_bytes = response['Item']['compressed_review'].value
decompressed_review = gzip.decompress(compressed_bytes).decode('utf-8')
print(f"Original size: {response['Item']['review_length']} bytes")
print(f"Compressed size: {response['Item']['compressed_length']} bytes")
Important limitations
Filtering and querying
Compressed attribute values cannot be used in query or scan filter expressions.[1] To filter on compressed data:
- Retrieve the candidate items using Query or Scan
- Decompress the values in your application code
- Apply the filter logic programmatically in your application
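A minimal sketch of this pattern, assuming items shaped like the review example above (the attribute name is illustrative):

```python
import gzip

def filter_reviews(items, keyword):
    """Decompress each item's review client-side and keep keyword matches."""
    matches = []
    for item in items:
        raw = item['compressed_review']
        # boto3 wraps Binary attributes; .value exposes the raw bytes
        data = raw.value if hasattr(raw, 'value') else raw
        text = gzip.decompress(data).decode('utf-8')
        if keyword in text:       # the filter runs in application code
            matches.append(item)
    return matches
```

Note that `items` would come from a paginated Query or Scan, so the client pays read capacity for every candidate item, matching or not.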
Performance trade-offs
Compression overhead:
- CPU overhead - Compression/decompression requires client-side CPU resources
- Latency impact - Additional processing time for compress/decompress operations
- Memory usage - Must hold uncompressed data in memory during processing
Throughput considerations:
- Compressed items consume fewer capacity units, reducing costs
- But compression adds client-side processing time
- For latency-sensitive applications, test compression impact before implementing at scale
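One way to test that impact is to time a compress/decompress round trip on a representative payload before rolling compression out; a rough sketch:

```python
import gzip
import time

# Illustrative payload roughly the size of a long review
payload = ("This is a representative product review. " * 250).encode('utf-8')

start = time.perf_counter()
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{len(payload)} B -> {len(compressed)} B "
      f"(round trip: {elapsed_ms:.2f} ms)")
```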
Alternative: Store large attributes in S3
For extremely large attributes, consider storing the data in Amazon S3 and storing only the S3 object path in DynamoDB.[2]
Benefits of S3 offloading
- Lower storage costs - S3 storage ($0.023/GB) is 91% cheaper than DynamoDB ($0.25/GB)
- No item size limits - S3 can store objects up to 5 TB
- Lifecycle policies - Automatically transition old data to cheaper storage classes (Glacier, Deep Archive)
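A sketch of the offloading write path, assuming boto3-style S3 client and Table resource objects (the bucket name, key layout, and helper are illustrative):

```python
BUCKET = 'my-product-assets'   # illustrative bucket name

def offload_to_s3(s3, table, product_id, payload):
    """Upload a large attribute to S3 and store only its path in DynamoDB.

    `s3` and `table` are boto3-style S3 client and Table resource objects.
    """
    key = f"manuals/{product_id}.pdf"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    s3_path = f"s3://{BUCKET}/{key}"
    table.put_item(Item={
        'product_id': product_id,
        'manual_s3_path': s3_path,   # pointer only; the payload lives in S3
    })
    return s3_path
```

Reads reverse the steps (fetch the item, then `get_object` from the stored path); deletes must clean up both stores, since DynamoDB and S3 are updated independently.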
Hybrid approach
For tables with many large text attributes, combine both approaches:
Example architecture:
Product catalog table:
- Short description (uncompressed): Used for search/filtering, <2 KB
- Long description (compressed): Displayed on product pages, 10-50 KB
- User manual PDF (S3): Rarely accessed, >500 KB, store S3 path in DynamoDB
This approach optimizes for both cost and performance across different access patterns.
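An item under this hybrid layout might be sketched as follows (attribute names, sizes, and the S3 path are illustrative):

```python
import gzip

long_description = "Full technical specifications and care instructions. " * 300

item = {
    'product_id': 'prod-42',
    'short_description': 'Compact wireless keyboard',   # plain: usable in filters
    'long_description_gz': gzip.compress(long_description.encode('utf-8')),
    'manual_s3_path': 's3://my-product-assets/manuals/prod-42.pdf',  # S3 pointer
}
# table.put_item(Item=item)  # with a boto3 Table resource, as earlier
```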
Resources
- DynamoDB Data Types - Binary
- DynamoDB Item Size and Format
- Best Practices for Storing Large Values
- AWS SDK for Python (Boto3)
- Amazon S3 Pricing