How to Downsize ECS Fargate Tasks
A practical guide to reducing ECS Fargate CPU and memory allocations safely: what to measure, what thresholds to use, and how to avoid breaking things.

Introduction
ECS Fargate bills per vCPU-hour and per GB-hour. Unlike EC2, there's no instance to share across workloads. Every unit of CPU and memory you allocate to a task definition is billed whether your containers use it or not.
Most teams pick a CPU and memory configuration when they first create a service and never revisit it. The service works, so nobody looks again. Over time, that adds up to real money spent on capacity that sits idle.
This post covers downsizing: reducing the CPU and memory allocated to each task. It doesn't cover autoscaling policies or adjusting task count. Those are separate levers. Downsizing targets the per-task allocation, which determines what you pay for every running copy of the task, regardless of how many copies there are.
The change itself is simple. You're updating two values in a task definition: the CPU and memory allocation. But picking the right target requires looking at the right metrics over a long enough window.
How Fargate pricing works
Fargate charges are based on the CPU and memory you allocate at the task definition level, not what your containers actually consume. A task configured for 1 vCPU and 4 GB of memory costs the same whether your application uses 5% of that or 95%.
This means ECS Fargate cost optimization comes down to one question: can you allocate less CPU and memory without affecting performance? If your service never pushes past 20% CPU and 30% memory, you're paying for headroom you don't need.
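To see what's at stake, here's a back-of-the-envelope cost sketch. The per-hour rates below are illustrative only (roughly Linux/x86 on-demand pricing in us-east-1); check current AWS pricing for your region before relying on the numbers.

```python
# Back-of-the-envelope Fargate task cost.
# Rates are ASSUMED for illustration (approx. us-east-1 Linux/x86
# on-demand); look up current pricing for your region.
VCPU_HOUR = 0.04048   # USD per vCPU-hour (assumed rate)
GB_HOUR = 0.004445    # USD per GB-hour (assumed rate)

def monthly_task_cost(vcpu: float, memory_gb: float, hours: float = 730) -> float:
    """Cost of one always-on task for a month (~730 hours)."""
    return hours * (vcpu * VCPU_HOUR + memory_gb * GB_HOUR)

full = monthly_task_cost(1, 4)    # 1 vCPU / 4 GB
half = monthly_task_cost(0.5, 2)  # one tier down
print(f"${full:.2f}/mo -> ${half:.2f}/mo per task")
```

Because billing is linear in the allocation, halving both dimensions halves the per-task bill, and the savings multiply by task count.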
Valid Fargate configurations
Fargate doesn't let you pick arbitrary CPU and memory values. You choose from a fixed set of configurations:
| CPU (units) | vCPU | Memory options (MB) |
|---|---|---|
| 256 | 0.25 | 512, 1024, 2048 |
| 512 | 0.5 | 1024, 2048, 3072, 4096 |
| 1024 | 1 | 2048, 3072, 4096, 5120, 6144, 7168, 8192 |
| 2048 | 2 | 4096 through 16384 (1024 increments) |
| 4096 | 4 | 8192 through 30720 (1024 increments) |
| 8192 | 8 | 16384 through 61440 (4096 increments) |
| 16384 | 16 | 32768 through 122880 (8192 increments) |
This table matters because your target configuration has to be one of these combinations. You can't just halve the memory and call it done. You need to land on a valid pair.
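The table translates directly into a lookup you can validate a proposed configuration against before applying it. A minimal sketch:

```python
# Valid Fargate CPU/memory pairs (CPU units -> allowed memory in MB),
# mirroring the configuration table above.
VALID_FARGATE_MEMORY = {
    256:   [512, 1024, 2048],
    512:   [1024, 2048, 3072, 4096],
    1024:  list(range(2048, 8193, 1024)),
    2048:  list(range(4096, 16385, 1024)),
    4096:  list(range(8192, 30721, 1024)),
    8192:  list(range(16384, 61441, 4096)),
    16384: list(range(32768, 122881, 8192)),
}

def is_valid_config(cpu_units: int, memory_mb: int) -> bool:
    """True if the (CPU, memory) pair is a valid Fargate configuration."""
    return memory_mb in VALID_FARGATE_MEMORY.get(cpu_units, [])
```

For example, `is_valid_config(1024, 4096)` holds, but naively halving 1 vCPU / 3 GB to 0.5 vCPU / 1.5 GB does not land on a valid pair.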
Pre-flight checks
Before looking at metrics, rule out services that aren't good candidates.
Already at the smallest configuration. If a service is running at 0.25 vCPU and 512 MB, there's nowhere to go.
Shared task definitions. If multiple ECS services share the same task definition family, evaluate utilization across all of them. If any service needs the current allocation, you can't safely reduce it. Separate task definitions make independent downsizing simpler.
How much data you need
Use at least 30 days of utilization data. The service should have been running with its current CPU and memory configuration for the entire observation window. If you recently changed the configuration, the old metrics don't apply to the current allocation.
Within that window, check that the service was actually running for at least 95% of the time. A service that was stopped for two of the last four weeks doesn't have enough data to make a confident decision. ECS Fargate reports metrics at 5-minute intervals by default, so you should expect roughly 288 data points per day.
More data is better. If you have 60 days or more, you can run seasonality checks (more on that below). Up to 365 days is useful for catching annual patterns, but anything older than a year is stale.
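The 95% coverage check above reduces to a ratio of observed to expected data points. A sketch:

```python
def has_sufficient_coverage(datapoint_count: int, window_days: int,
                            period_minutes: int = 5,
                            min_coverage: float = 0.95) -> bool:
    """True if the service reported at least min_coverage of the
    expected datapoints (288/day at 5-minute periods)."""
    expected = window_days * 24 * 60 // period_minutes
    return datapoint_count / expected >= min_coverage

# A 30-day window at 5-minute resolution expects 8,640 points;
# 95% coverage means at least 8,208 of them must be present.
```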
CPU
Use P99.5 CPU utilization as your primary signal. If P99.5 is below 40% across the observation window, the service is a candidate for downsizing.
Why 40%? Because going down one CPU tier in Fargate typically halves your available CPU. A service sitting at 40% P99.5 on 1 vCPU will peak around 80% on 0.5 vCPU. The average will be much lower, so there's still headroom for normal variation.
Why P99.5 and not P95 or the average? Averages hide peaks entirely. P95 is reasonable but can miss the tail that matters most after a downsize. Raw max overreacts to one-off blips from deploys or restarts. P99.5 sits in between: it captures nearly all real usage while filtering out those isolated spikes.
Fargate reports CPU utilization as a percentage of the allocated CPU. You can find this in CloudWatch under the ECS namespace, broken down by cluster and service name. Most monitoring tools that pull from CloudWatch will surface it. However you query the data, compute the percentile across all 5-minute data points in your observation window.
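However your tooling exposes the raw datapoints, the percentile math is small. A sketch using the nearest-rank convention (CloudWatch and most monitoring tools interpolate slightly differently, so treat this as an approximation):

```python
import math

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value such that at least
    p% of datapoints fall at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

def is_cpu_downsize_candidate(cpu_datapoints: list[float],
                              threshold: float = 40.0) -> bool:
    """Apply the P99.5 < 40% rule to a window of CPU utilization
    percentages (one value per 5-minute period)."""
    return percentile(cpu_datapoints, 99.5) < threshold
```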
Memory
The same threshold applies: P99.5 memory utilization below 40%.
Unlike standalone EC2 instances, Fargate publishes memory utilization to CloudWatch automatically. You don't need to install an agent. This makes the memory check straightforward, and we recommend you always include it. The metric represents memory usage as a percentage of the allocated task-level memory.
If your service has multiple containers in the same task, the utilization metric reflects the aggregate across all containers. That's what you want, since the task-level memory allocation is what Fargate bills for and what you're downsizing.
Seasonal spike detection
CPU and memory thresholds tell you whether the service is oversized right now. Some workloads have predictable peaks that only show up at certain times: end of quarter processing, holiday traffic, monthly batch jobs.
If you have at least 60 days of data, check for seasonality by breaking the observation window into weekly buckets and computing P95 for each bucket. If any week's P95 exceeds 40% for CPU or memory, hold off on the downsize.
Why P95 instead of P99.5 for seasonality? The weekly buckets have fewer data points, so P99.5 would be too sensitive to individual outliers within a single week. P95 gives a more stable signal at this granularity.
If you have less than 60 days of data, you can still proceed with the P99.5 check from the previous sections, but be aware of the blind spot. Check back after you've accumulated more history.
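The weekly-bucket check can be sketched as follows, again using a nearest-rank percentile (at 5-minute resolution, one week is 2,016 datapoints):

```python
import math

def weekly_p95_spikes(datapoints: list[float],
                      points_per_week: int = 2016,
                      threshold: float = 40.0) -> list[int]:
    """Split the window into weekly buckets, compute nearest-rank P95
    per bucket, and return indices of weeks exceeding the threshold."""
    spikes = []
    for week, start in enumerate(range(0, len(datapoints), points_per_week)):
        bucket = sorted(datapoints[start:start + points_per_week])
        if not bucket:
            continue
        p95 = bucket[max(math.ceil(0.95 * len(bucket)) - 1, 0)]
        if p95 > threshold:
            spikes.append(week)
    return spikes
```

An empty result means no week's P95 crossed the threshold; any returned index is a week worth investigating before downsizing.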
One step at a time
Don't jump two tiers at once. If a service is running at 2 vCPU and 8 GB, don't go straight to 0.5 vCPU and 2 GB even if the metrics suggest it would fit. Drop one configuration step, observe for a few weeks, then re-evaluate.
When both CPU and memory can be reduced, try reducing memory first (staying at the same CPU tier) before dropping the CPU tier. Memory reductions within the same CPU tier are lower risk because they don't affect CPU scheduling. If the current memory is already at the minimum for the CPU tier, then reduce CPU.
Refer back to the valid configurations table. If your service is at 1 vCPU / 8 GB, you could first try 1 vCPU / 4 GB. If that holds up, evaluate whether a CPU reduction to 0.5 vCPU makes sense on the next cycle.
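The memory-first stepping logic might look like this sketch. Note it steps down one memory option at a time, which is even more conservative than the 8 GB to 4 GB example above; how far to step within a tier is a judgment call.

```python
# CPU tiers with their memory options in MB, mirroring the
# valid-configurations table earlier in this guide.
FARGATE_TIERS = {
    256:   [512, 1024, 2048],
    512:   [1024, 2048, 3072, 4096],
    1024:  list(range(2048, 8193, 1024)),
    2048:  list(range(4096, 16385, 1024)),
    4096:  list(range(8192, 30721, 1024)),
    8192:  list(range(16384, 61441, 4096)),
    16384: list(range(32768, 122881, 8192)),
}

def next_step_down(cpu: int, memory: int):
    """One conservative step: reduce memory within the current CPU tier
    if possible; otherwise drop one CPU tier, keeping the largest memory
    option that doesn't exceed the current allocation. Returns a
    (cpu, memory) pair or None if already at the smallest config."""
    smaller_mem = [m for m in FARGATE_TIERS[cpu] if m < memory]
    if smaller_mem:
        return cpu, max(smaller_mem)   # memory first, same CPU tier
    tiers = sorted(FARGATE_TIERS)
    idx = tiers.index(cpu)
    if idx == 0:
        return None                    # 0.25 vCPU / 512 MB: nowhere to go
    lower_cpu = tiers[idx - 1]
    candidates = [m for m in FARGATE_TIERS[lower_cpu] if m <= memory]
    return (lower_cpu, max(candidates)) if candidates else None
```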
Container-level limits
Fargate task definitions have two levels of resource allocation: the task level and the container level.
When you reduce the task-level values, make sure no individual container's limits exceed the new allocation. If a container has a hard memory limit of 4 GB and you're dropping the task to 2 GB, the deployment will fail.
Check each container in the task definition for:
- CPU reservation: must not exceed the new task-level CPU
- Hard memory limit: must not exceed the new task-level memory
- Soft memory limit (reservation): must not exceed the new task-level memory
However you manage your task definitions (console, CLI, infrastructure as code), make sure to update any container-level limits that would exceed the new task-level values.
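A sketch of that validation, using the field names from the ECS container definition (`cpu`, `memory`, `memoryReservation`):

```python
def container_limits_ok(task_cpu: int, task_memory: int,
                        containers: list[dict]) -> list[str]:
    """Return names of containers whose limits exceed the new task-level
    allocation. Keys mirror ECS container-definition fields:
    'cpu' (units), 'memory' (hard limit, MB),
    'memoryReservation' (soft limit, MB); absent keys mean no limit set."""
    offenders = []
    for c in containers:
        if (c.get("cpu", 0) > task_cpu
                or c.get("memory", 0) > task_memory
                or c.get("memoryReservation", 0) > task_memory):
            offenders.append(c["name"])
    return offenders
```

An empty result means the task-level reduction won't be rejected by container-level limits; any names returned need their limits lowered first.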
Deployment configuration
Before applying the change, make sure your ECS service has a deployment configuration that allows a safe rollout. The key settings are the minimum healthy percent and maximum percent on the service.
A reasonable starting point:
- 1 task: minimum healthy percent at 0, maximum at 200. With a single task, ECS needs the freedom to stop the old task while the replacement starts, which can mean brief downtime.
- 2 to 5 tasks: minimum healthy percent at 50, maximum at 200. Keeps at least half the tasks running during the update.
- 6+ tasks: minimum healthy percent at 100, maximum at 200. Full capacity maintained throughout.
Watch out for the deadlock case: if your service runs a single task with minimum healthy percent at 100 and maximum percent at 100, ECS can't start a new task (already at max) and can't stop the old one (would drop below min). The deployment will hang. Check this before applying.
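The deadlock condition is easy to check programmatically before applying the change. A sketch:

```python
import math

def deployment_can_roll(desired_count: int, min_healthy_pct: int,
                        max_pct: int) -> bool:
    """True if ECS can replace tasks: it must be able either to start
    an extra task (room above the desired count) or to stop one
    (room above the minimum healthy floor)."""
    max_tasks = desired_count * max_pct // 100
    min_tasks = math.ceil(desired_count * min_healthy_pct / 100)
    can_start_extra = max_tasks > desired_count
    can_stop_one = desired_count - 1 >= min_tasks
    return can_start_extra or can_stop_one
```

The single-task deadlock from the text, `deployment_can_roll(1, 100, 100)`, comes back false: no room to start a replacement and no room to stop the old task.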
Rollback
Have a plan to revert before you apply the change. However you manage your infrastructure, make sure you can quickly switch back to the previous CPU and memory values. The same deployment configuration that got you to the smaller size will roll you back to the larger one.
Watch CPU and memory closely for the first few days after the change. If either metric starts consistently hitting the new ceiling, roll back and revisit. Don't stack the downsize with other changes (deploys, config updates, scaling policy changes) in the same window. You want to isolate the variable.
Conclusion
ECS Fargate cost optimization is simpler than EC2 rightsizing in some ways. Fargate publishes both CPU and memory metrics automatically, the configuration space is a fixed set of valid pairs, and there are no disk or network tiers to worry about. Get at least 30 days of data. Confirm P99.5 CPU and memory are both under 40%. Check for seasonal spikes if you have enough history. Step down one configuration at a time. The same thresholds and methodology from EC2 rightsizing apply here, just adapted to Fargate's fixed configuration tiers.
Automate Your ECS Fargate Downsizing
Infralyst continuously runs every check covered in this guide across all your Fargate services. When it finds savings, you're one click from a ready-to-merge Terraform PR.
Start free with 3 PRs
- No credit card required
- Read-only IAM role
- Your team reviews and merges every change