What is attribute compression?
DynamoDB supports storing compressed values using algorithms like GZIP or LZO, which produce binary output that can be stored in a Binary attribute type.[1] By compressing long text strings or binary data, you can reduce the size of your items, leading to lower storage costs and fewer RCUs/WCUs consumed per operation.
Key benefits:
- Reduced storage costs - Smaller items mean lower storage charges
- Lower throughput consumption - Fewer capacity units consumed per read/write
- Overcome size limits - Compress large attributes to fit within the 400 KB item size limit
- Cost-effective at scale - Savings compound with high-volume tables
Example scenario:
Product reviews table with uncompressed text:
- Average review size: 8 KB
- 1 million reviews
- Storage cost: 8 GB × $0.25/GB = $2.00/month
With GZIP compression (70% reduction):
- Compressed review size: 2.4 KB
- Storage cost: 2.4 GB × $0.25/GB = $0.60/month
- Monthly savings: $1.40/month (70% reduction)
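The arithmetic above can be reproduced in a few lines (the rates are the example's illustrative figures, not current AWS pricing):

```python
# Reproduce the cost example: 1M reviews at 8 KB each, $0.25/GB-month
# storage, 70% compression.
reviews = 1_000_000
avg_size_kb = 8
price_per_gb = 0.25          # illustrative storage rate from the example
compression = 0.70           # 70% size reduction

uncompressed_gb = reviews * avg_size_kb / 1_000_000   # KB -> GB (decimal)
compressed_gb = uncompressed_gb * (1 - compression)

print(f"Uncompressed: ${uncompressed_gb * price_per_gb:.2f}/month")  # $2.00
print(f"Compressed:   ${compressed_gb * price_per_gb:.2f}/month")    # $0.60
```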
Implementation
Client-side compression
Compression must be performed client-side before writing to DynamoDB.[1]
Python example:
import boto3
import gzip

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductReviews')

# Compress the review text before writing
review_text = "This is a very long product review..." * 100
compressed_review = gzip.compress(review_text.encode('utf-8'))

# Store the compressed bytes in a Binary attribute
table.put_item(
    Item={
        'product_id': '12345',
        'review_id': 'review-001',
        'compressed_review': compressed_review,  # Binary attribute
        'review_length': len(review_text),
        'compressed_length': len(compressed_review)
    }
)

# Retrieve and decompress; boto3 wraps Binary attributes in a Binary
# object, so use .value to get the raw bytes back
response = table.get_item(Key={'product_id': '12345', 'review_id': 'review-001'})
compressed_bytes = response['Item']['compressed_review'].value
decompressed_review = gzip.decompress(compressed_bytes).decode('utf-8')
print(f"Original size: {response['Item']['review_length']} bytes")
print(f"Compressed size: {response['Item']['compressed_length']} bytes")
Important limitations
Filtering and querying
Compressed attribute values cannot be used in query or scan filter expressions.[1] To filter on compressed data:
- Retrieve the candidate items using Query or Scan
- Decompress the values in your application code
- Apply the filter logic programmatically in your application
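A minimal sketch of this pattern, assuming items shaped like the review example above (the attribute name is illustrative):

```python
import gzip

def filter_reviews(items, keyword):
    """Decompress each item's review client-side and keep keyword matches."""
    matches = []
    for item in items:
        raw = item['compressed_review']
        # boto3 wraps Binary attributes; .value exposes the raw bytes
        data = raw.value if hasattr(raw, 'value') else raw
        text = gzip.decompress(data).decode('utf-8')
        if keyword in text:       # the filter runs in application code
            matches.append(item)
    return matches
```

Note that `items` would come from a paginated Query or Scan, so the client pays read capacity for every candidate item, matching or not.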
Performance trade-offs
Compression overhead:
- CPU overhead - Compression/decompression requires client-side CPU resources
- Latency impact - Additional processing time for compress/decompress operations
- Memory usage - Must hold uncompressed data in memory during processing
Throughput considerations:
- Compressed items consume fewer capacity units, reducing costs
- But compression adds client-side processing time
- For latency-sensitive applications, test compression impact before implementing at scale
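One way to test that impact is to time a compress/decompress round trip on a representative payload before rolling compression out; a rough sketch:

```python
import gzip
import time

# Illustrative payload roughly the size of a long review
payload = ("This is a representative product review. " * 250).encode('utf-8')

start = time.perf_counter()
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{len(payload)} B -> {len(compressed)} B "
      f"(round trip: {elapsed_ms:.2f} ms)")
```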
Alternative: Store large attributes in S3
For extremely large attributes, consider storing the data in Amazon S3 and storing only the S3 object path in DynamoDB.[2]
Benefits of S3 offloading
- Lower storage costs - S3 storage ($0.023/GB) is 91% cheaper than DynamoDB ($0.25/GB)
- No item size limits - S3 can store objects up to 5 TB
- Lifecycle policies - Automatically transition old data to cheaper storage classes (Glacier, Deep Archive)
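A sketch of the offloading write path, assuming boto3-style S3 client and Table resource objects (the bucket name, key layout, and helper are illustrative):

```python
BUCKET = 'my-product-assets'   # illustrative bucket name

def offload_to_s3(s3, table, product_id, payload):
    """Upload a large attribute to S3 and store only its path in DynamoDB.

    `s3` and `table` are boto3-style S3 client and Table resource objects.
    """
    key = f"manuals/{product_id}.pdf"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    s3_path = f"s3://{BUCKET}/{key}"
    table.put_item(Item={
        'product_id': product_id,
        'manual_s3_path': s3_path,   # pointer only; the payload lives in S3
    })
    return s3_path
```

Reads reverse the steps (fetch the item, then `get_object` from the stored path); deletes must clean up both stores, since DynamoDB and S3 are updated independently.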
Hybrid approach
For tables with many large text attributes, combine both approaches:
Example architecture:
Product catalog table:
- Short description (uncompressed): Used for search/filtering, <2 KB
- Long description (compressed): Displayed on product pages, 10-50 KB
- User manual PDF (S3): Rarely accessed, >500 KB, store S3 path in DynamoDB
This approach optimizes for both cost and performance across different access patterns.
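An item under this hybrid layout might be sketched as follows (attribute names, sizes, and the S3 path are illustrative):

```python
import gzip

long_description = "Full technical specifications and care instructions. " * 300

item = {
    'product_id': 'prod-42',
    'short_description': 'Compact wireless keyboard',   # plain: usable in filters
    'long_description_gz': gzip.compress(long_description.encode('utf-8')),
    'manual_s3_path': 's3://my-product-assets/manuals/prod-42.pdf',  # S3 pointer
}
# table.put_item(Item=item)  # with a boto3 Table resource, as earlier
```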
Resources
- DynamoDB Data Types - Binary
- DynamoDB Item Size and Format
- Best Practices for Storing Large Values
- AWS SDK for Python (Boto3)
- Amazon S3 Pricing