Why it matters
- Provisioned Concurrency keeps execution environments warm so you can eliminate cold starts and avoid burst throttling for critical paths.
- When those environments are well utilized, the combined provisioned + duration pricing can be cheaper than pure on-demand; when they’re not, you usually pay more overall.
Cost model
- You pay for:
- Provisioned concurrency: GB-seconds for the pre-warmed environments, metered separately from standard Lambda and not covered by the Lambda free tier.
- Invocations: requests and duration, similar to standard Lambda, on top of the provisioned charge.
- It becomes cost-effective only when a provisioned environment is busy for a large portion of the hour; underused capacity is pure overhead and will usually cost more than staying fully on-demand.
Comparing the effective cost of standard on-demand Lambda with provisioned concurrency across utilization levels, on-demand stays cheaper until a provisioned environment is roughly 60% utilized; past that point, provisioned concurrency wins.
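That 60% figure can be sanity-checked from per-GB-second rates. The numbers below are example us-east-1 x86 prices; verify them against the current AWS pricing page before relying on the result:

```shell
# Example us-east-1 x86 prices in USD per GB-second (check current AWS pricing):
ON_DEMAND=0.0000166667    # standard on-demand duration rate
PC_DURATION=0.0000097222  # duration rate while running on provisioned capacity
PC_CHARGE=0.0000041667    # provisioned-concurrency charge per GB-second provisioned

# Break-even utilization u solves: PC_CHARGE + PC_DURATION*u = ON_DEMAND*u
awk -v od="$ON_DEMAND" -v pd="$PC_DURATION" -v pc="$PC_CHARGE" \
  'BEGIN { printf "break-even utilization: %.0f%%\n", 100 * pc / (od - pd) }'
# → break-even utilization: 60%
```

Below that utilization, the flat provisioned charge outweighs the cheaper duration rate; above it, the discount pays for the pre-warmed capacity.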
When to use
✓ Consistent or slowly varying traffic patterns
✓ Low-latency requirements where cold starts are unacceptable
✓ Functions with a clear baseline of concurrent executions
✓ Workloads where you can keep provisioned environments well utilized
Implementation
Scheduled scaling
For predictable daily/weekly patterns, start by pinning a baseline. Note that provisioned concurrency must target a published version or alias, so the `--qualifier` flag is required:
# Set a provisioned-concurrency baseline on the "prod" alias
aws lambda put-provisioned-concurrency-config \
--function-name my-function \
--qualifier prod \
--provisioned-concurrent-executions 10
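To actually follow a daily schedule rather than hold one static value, you can layer Application Auto Scaling scheduled actions on top. A sketch, assuming the `prod` alias has already been registered as a scalable target (see the Application Auto Scaling commands below); the action name, capacities, and cron window are illustrative:

```shell
# Hypothetical scheduled action: scale up before weekday business hours (UTC)
aws application-autoscaling put-scheduled-action \
--service-namespace lambda \
--resource-id function:my-function:prod \
--scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
--scheduled-action-name scale-up-business-hours \
--schedule "cron(0 8 ? * MON-FRI *)" \
--scalable-target-action MinCapacity=10,MaxCapacity=10
```

A matching scale-down action in the evening keeps you from paying for warm capacity overnight.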
Application Auto Scaling
For spiky or hard-to-predict patterns:
# Register target
aws application-autoscaling register-scalable-target \
--service-namespace lambda \
--resource-id function:my-function:prod \
--scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
--min-capacity 5 \
--max-capacity 100
# Create scaling policy
aws application-autoscaling put-scaling-policy \
--service-namespace lambda \
--resource-id function:my-function:prod \
--scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
--policy-name target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration file://config.json
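The `config.json` the policy references is not shown above; a minimal example, tracking 70% utilization of the provisioned capacity (the 0.70 target is a common starting point, not a requirement):

```shell
# Write an example target-tracking configuration for the policy above.
# LambdaProvisionedConcurrencyUtilization ranges from 0 to 1.
cat > config.json <<'EOF'
{
  "TargetValue": 0.70,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
  }
}
EOF
```

Target tracking then adds or removes provisioned concurrency (within the registered min/max capacity) to hold utilization near the target.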
Practical tips
- Use CloudWatch metrics (ConcurrentExecutions, ProvisionedConcurrentExecutions, ProvisionedConcurrencyUtilization) to find your baseline before turning this on.
- Start with a small provisioned value that covers your steady-state concurrency, not the peak.
- Use scheduled scaling when you know peak hours; add target tracking for unpredictable surges.
- Re-evaluate provisioned levels after major code changes or traffic pattern shifts.
- Pair provisioned concurrency with Optimize Function Performance and Right-Size Memory first, so you’re paying to pre-warm functions that are already efficient.
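To find the baseline mentioned in the first tip, one way is to pull a week of hourly concurrency statistics before enabling anything. A sketch; the function name is a placeholder, and the `date -d` syntax is GNU-specific:

```shell
# Hourly average and peak concurrency for the last 7 days (GNU date syntax)
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name ConcurrentExecutions \
--dimensions Name=FunctionName,Value=my-function \
--start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--period 3600 \
--statistics Average Maximum
```

Size the initial provisioned value near the steady-state Average, not the Maximum, and let auto scaling cover the rest.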