Provisioned Concurrency

Use Lambda provisioned concurrency to remove cold starts and, at high utilization, reduce end-to-end latency and effective compute cost.

Why it matters

  • Provisioned Concurrency keeps execution environments warm so you can eliminate cold starts and avoid burst throttling for critical paths.
  • When those environments are well utilized, the combined provisioned + duration pricing can be cheaper than pure on-demand; when they’re not, you usually pay more overall.

Cost model

  • You pay for:
    • Provisioned concurrency: GB-seconds for the pre-warmed environments, metered separately from standard Lambda and not covered by the Lambda free tier.
    • Invocations: requests and duration, similar to standard Lambda, on top of the provisioned charge.
  • It becomes cost-effective only when a provisioned environment is busy for a large portion of the hour; underused capacity is pure overhead and will usually cost more than staying fully on-demand.

Comparing the effective cost of standard on-demand Lambda with provisioned concurrency across utilization levels, the break-even point sits at roughly 60%: you need to keep the provisioned environments busy about 60% of the time before they become cheaper than staying fully on-demand.
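That break-even figure can be checked with a back-of-envelope calculation. The per-GB-second rates below are illustrative us-east-1 x86 prices at the time of writing, not authoritative; verify against the current Lambda pricing page before relying on them.

```python
# Illustrative per-GB-second rates (assumed; check current AWS Lambda pricing)
ON_DEMAND = 0.0000166667        # standard on-demand duration price
PC_PROVISIONED = 0.0000041667   # charge for keeping an environment warm
PC_DURATION = 0.0000097222      # duration price when running on provisioned capacity

def cost_per_provisioned_gb_second(utilization: float) -> dict:
    """Cost of one provisioned GB-second vs. serving the same busy
    fraction purely on demand (utilization is a 0..1 fraction)."""
    provisioned = PC_PROVISIONED + utilization * PC_DURATION
    on_demand = utilization * ON_DEMAND
    return {"provisioned": provisioned, "on_demand": on_demand}

# Break-even: the flat provisioned charge must be offset by the
# cheaper duration rate, so u * ON_DEMAND = PC_PROVISIONED + u * PC_DURATION
breakeven = PC_PROVISIONED / (ON_DEMAND - PC_DURATION)
print(f"break-even utilization ~ {breakeven:.0%}")  # ~ 60%
```

At 90% utilization the provisioned path wins; at 30% it loses, which is why underused capacity is pure overhead.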

When to use

✓ Consistent or slowly varying traffic patterns
✓ Low-latency requirements where cold starts are unacceptable
✓ Functions with a clear baseline of concurrent executions
✓ Workloads where you can keep provisioned environments well utilized

Implementation

Scheduled scaling

For predictable daily/weekly patterns, start by setting a fixed baseline. Note that a qualifier (a published version or alias) is required; provisioned concurrency cannot be attached to $LATEST:

# Set provisioned concurrency on the "prod" alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 10
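To move that level up and down on a timetable, Application Auto Scaling supports scheduled actions against the same scalable dimension (this assumes the scalable target is already registered, as shown in the next section; the function name, alias, capacities, and cron times here are placeholders):

```shell
# Scale up before the weekday morning peak (times in UTC)
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --scheduled-action-name scale-up-morning \
  --schedule "cron(0 8 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=50,MaxCapacity=100

# Scale back down after the evening lull begins
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --scheduled-action-name scale-down-evening \
  --schedule "cron(0 20 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=5,MaxCapacity=100
```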

Application Auto Scaling

For spiky or hard-to-predict patterns:

# Register target
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --min-capacity 5 \
  --max-capacity 100

# Create scaling policy
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrentExecutions \
  --policy-name target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://config.json
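A minimal config.json for that target-tracking policy might look like the following. TargetValue is the fraction of provisioned concurrency in use that the policy tries to maintain (the ProvisionedConcurrencyUtilization metric ranges from 0 to 1); 0.7 is an illustrative target, not a recommendation:

```json
{
  "TargetValue": 0.7,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
  }
}
```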

Practical tips

  • Use CloudWatch metrics (ConcurrentExecutions, ProvisionedConcurrentExecutions, ProvisionedConcurrencyUtilization) to find your baseline before turning this on.
  • Start with a small provisioned value that covers your steady-state concurrency, not the peak.
  • Use scheduled scaling when you know peak hours; add target tracking for unpredictable surges.
  • Re-evaluate provisioned levels after major code changes or traffic pattern shifts.
  • Pair provisioned concurrency with Optimize Function Performance and Right-Size Memory first, so you’re paying to pre-warm functions that are already efficient.
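One way to turn the "steady state, not the peak" tip into a number: take a window of ConcurrentExecutions samples from CloudWatch and provision for a mid-range percentile, letting rare spikes spill over to on-demand. A sketch with made-up sample data (the 90th percentile is an assumed starting point, not a rule):

```python
import math

def baseline_from_samples(samples: list[float], percentile: float = 0.9) -> int:
    """Pick a provisioned level that covers `percentile` of observed
    concurrency. Spikes above it still run on demand (with cold starts),
    which is usually a fair trade for not paying for idle warm capacity."""
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, math.ceil(percentile * len(ranked)) - 1)
    return math.ceil(ranked[idx])

# Hypothetical one-day ConcurrentExecutions samples with a single spike
day = [4, 5, 5, 6, 7, 8, 8, 9, 10, 30]
print(baseline_from_samples(day))  # covers the steady state, not the 30-spike
```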