Provisioned Concurrency

Use Lambda provisioned concurrency to remove cold starts and, at high utilization, reduce end-to-end latency and effective compute cost.

Why it matters

  • Provisioned Concurrency keeps execution environments warm so you can eliminate cold starts and avoid burst throttling for critical paths.
  • When those environments are well utilized, the combined provisioned + duration pricing can be cheaper than pure on-demand; when they’re not, you usually pay more overall.

Cost model

  • You pay for:
    • Provisioned concurrency: GB-seconds for the pre-warmed environments, metered separately from standard Lambda and not covered by the Lambda free tier.
    • Invocations: requests and duration, similar to standard Lambda, on top of the provisioned charge.
  • It becomes cost-effective only when a provisioned environment is busy for a large portion of the hour; underused capacity is pure overhead and will usually cost more than staying fully on-demand.

Comparing the effective cost of standard on-demand Lambda with provisioned concurrency across utilization levels, the break-even point sits at roughly 60%: you need to keep the provisioned environments busy about 60% of the time before they become cheaper than staying fully on-demand.
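That break-even figure can be checked with a back-of-envelope calculation. The per-GB-second rates below are illustrative us-east-1 x86 prices at the time of writing, not authoritative; verify against the current Lambda pricing page before relying on them.

```python
# Illustrative per-GB-second rates (assumed; check current AWS Lambda pricing)
ON_DEMAND = 0.0000166667        # standard on-demand duration price
PC_PROVISIONED = 0.0000041667   # charge for keeping an environment warm
PC_DURATION = 0.0000097222      # duration price when running on provisioned capacity

def cost_per_provisioned_gb_second(utilization: float) -> dict:
    """Cost of one provisioned GB-second vs. serving the same busy
    fraction purely on demand (utilization is a 0..1 fraction)."""
    provisioned = PC_PROVISIONED + utilization * PC_DURATION
    on_demand = utilization * ON_DEMAND
    return {"provisioned": provisioned, "on_demand": on_demand}

# Break-even: the flat provisioned charge must be offset by the
# cheaper duration rate, so u * ON_DEMAND = PC_PROVISIONED + u * PC_DURATION
breakeven = PC_PROVISIONED / (ON_DEMAND - PC_DURATION)
print(f"break-even utilization ~ {breakeven:.0%}")  # ~ 60%
```

At 90% utilization the provisioned path wins; at 30% it loses, which is why underused capacity is pure overhead.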

When to use

✓ Consistent or slowly varying traffic patterns
✓ Low-latency requirements where cold starts are unacceptable
✓ Functions with a clear baseline of concurrent executions
✓ Workloads where you can keep provisioned environments well utilized

Implementation

Scheduled scaling

For predictable daily/weekly patterns, start by setting a fixed baseline. Note that a qualifier (a published version or alias) is required; provisioned concurrency cannot be attached to $LATEST:

# Set provisioned concurrency on the "prod" alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 10
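To move that level up and down on a timetable, Application Auto Scaling supports scheduled actions against the same scalable dimension (this assumes the scalable target is already registered, as shown in the next section; the function name, alias, capacities, and cron times here are placeholders):

```shell
# Scale up before the weekday morning peak (times in UTC)
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --scheduled-action-name scale-up-morning \
  --schedule "cron(0 8 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=50,MaxCapacity=100

# Scale back down after the evening lull begins
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --scheduled-action-name scale-down-evening \
  --schedule "cron(0 20 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=5,MaxCapacity=100
```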

Application Auto Scaling

For spiky or hard-to-predict patterns:

# Register target
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --min-capacity 5 \
  --max-capacity 100

# Create scaling policy
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --policy-name target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://config.json
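A minimal config.json for that target-tracking policy might look like the following. TargetValue is the fraction of provisioned concurrency in use that the policy tries to maintain (the ProvisionedConcurrencyUtilization metric ranges from 0 to 1); 0.7 is an illustrative target, not a recommendation:

```json
{
  "TargetValue": 0.7,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
  }
}
```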

Practical tips

  • Use CloudWatch metrics (ConcurrentExecutions, ProvisionedConcurrentExecutions, ProvisionedConcurrencyUtilization) to find your baseline before turning this on.
  • Start with a small provisioned value that covers your steady-state concurrency, not the peak.
  • Use scheduled scaling when you know peak hours; add target tracking for unpredictable surges.
  • Re-evaluate provisioned levels after major code changes or traffic pattern shifts.
  • Pair provisioned concurrency with Optimize Function Performance and Right-Size Memory first, so you’re paying to pre-warm functions that are already efficient.
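One way to turn the "steady state, not the peak" tip into a number: take a window of ConcurrentExecutions samples from CloudWatch and provision for a mid-range percentile, letting rare spikes spill over to on-demand. A sketch with made-up sample data (the 90th percentile is an assumed starting point, not a rule):

```python
import math

def baseline_from_samples(samples: list[float], percentile: float = 0.9) -> int:
    """Pick a provisioned level that covers `percentile` of observed
    concurrency. Spikes above it still run on demand (with cold starts),
    which is usually a fair trade for not paying for idle warm capacity."""
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, math.ceil(percentile * len(ranked)) - 1)
    return math.ceil(ranked[idx])

# Hypothetical one-day ConcurrentExecutions samples with a single spike
day = [4, 5, 5, 6, 7, 8, 8, 9, 10, 30]
print(baseline_from_samples(day))  # covers the steady state, not the 30-spike
```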