Best Practices for Rightsizing EC2 Instances in 2025

Safe, practical methods for downsizing EC2 within the same family using CPU and memory percentiles, lookback windows, and simple guardrails.

Introduction

Rightsizing is the quickest way to reduce EC2 spend while keeping performance steady. This post focuses on downsizing within the same instance family, one size at a time, using conservative thresholds. The goal is simple: gather enough history, read the right metrics, and apply a few safety checks so you can shrink instances without surprises.

How long to observe

Use at least 30 days of data, preferably 90. If you can, also scan 365 days to catch seasonal peaks. Make sure the instance actually ran for more than half of the window so the data is representative. It also helps to find the busiest 7-day block inside your window and confirm that it still meets the CPU thresholds below.
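
As a rough sketch of the two window checks, assume you already have one average CPU value per day, with None for days the instance was stopped. The function names and the sample data are illustrative, not taken from any AWS API.

```python
def coverage(daily_cpu):
    """Fraction of days in the window on which the instance reported data."""
    observed = [v for v in daily_cpu if v is not None]
    return len(observed) / len(daily_cpu)


def busiest_block(daily_cpu, block_days=7):
    """Return (start_index, mean_cpu) of the consecutive block with the highest mean."""
    best_start, best_mean = None, float("-inf")
    for start in range(len(daily_cpu) - block_days + 1):
        values = [v for v in daily_cpu[start:start + block_days] if v is not None]
        if not values:
            continue  # instance was stopped for the whole block
        mean = sum(values) / len(values)
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Hypothetical 30-day window: one busy week, three stopped days at the end.
window = [10.0] * 10 + [40.0] * 7 + [10.0] * 10 + [None] * 3
representative = coverage(window) > 0.5   # ran for more than half the window
start, mean = busiest_block(window)       # the block to test against the CPU thresholds
```

The same percentile checks you run on the full lookback can then be rerun on just the samples from that block.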

CPU first

CloudWatch reports CPU utilization for every instance out of the box, while memory needs an agent, so start here. For a safe downsize, keep average CPU under 20 percent across the lookback. Check the high end too: p95 should be at or below 50 percent, p99 at or below 70 percent, and the maximum should not exceed about 85 percent once you exclude obvious reboot or deploy blips. If the worst 7-day block violates these numbers, wait or collect more data.
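
The thresholds above can be expressed as a small check. This is a minimal sketch that assumes the samples are CPU percentages with reboot and deploy blips already filtered out. The nearest-rank percentile is one simple choice and may differ slightly from the interpolated percentiles your tooling reports.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cpu_checks(samples):
    """Evaluate each CPU threshold and return a name -> passed mapping."""
    return {
        "avg_under_20": sum(samples) / len(samples) < 20,
        "p95_at_most_50": percentile(samples, 95) <= 50,
        "p99_at_most_70": percentile(samples, 99) <= 70,
        "max_at_most_85": max(samples) <= 85,
    }


# Hypothetical samples: mostly idle with a few short peaks.
quiet = [10] * 95 + [45] * 4 + [80]
# Hypothetical samples: low average, but one sample in ten runs hot.
spiky = [10] * 90 + [90] * 10

quiet_ok = all(cpu_checks(quiet).values())
spiky_failures = [name for name, ok in cpu_checks(spiky).items() if not ok]
```

The spiky series shows why the average alone is not enough: it averages 18 percent but fails every high-end check.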

Memory matters

Downsizing without understanding memory usage is risky because a smaller size has less RAM. If you have memory metrics, treat p95 under 70 percent and p99 under 85 percent as healthy, and keep swap p95 under 10 percent. If you do not have memory metrics, proceed only when CPU looks very low and stable, and include a clear note that memory was not observed. EC2 does not publish memory to CloudWatch by default, so many teams add the CloudWatch agent later to upgrade this check.
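
One way to keep the "memory was not observed" case explicit is to return a three-way verdict instead of a boolean. The sketch below assumes the percentiles were computed elsewhere. The function name and the verdict labels are made up for illustration.

```python
def memory_verdict(mem_p95=None, mem_p99=None, swap_p95=None):
    """Classify memory headroom as 'healthy', 'tight', or 'unobserved'."""
    if mem_p95 is None or mem_p99 is None:
        # No memory metrics: the caller should require very low, stable CPU
        # and record that memory was not observed.
        return "unobserved"
    healthy = mem_p95 < 70 and mem_p99 < 85
    if swap_p95 is not None:
        healthy = healthy and swap_p95 < 10
    return "healthy" if healthy else "tight"


print(memory_verdict(mem_p95=55, mem_p99=72, swap_p95=2))
print(memory_verdict(mem_p95=78, mem_p99=90, swap_p95=2))
print(memory_verdict())
```

Keeping "unobserved" separate from "healthy" makes it harder for a missing metric to pass silently as a green check.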

Disk and network guardrails

A smaller size can also change bandwidth and IOPS ceilings. Use simple guards to avoid landing below what the workload needs. Keep EBS VolumeQueueLength p95 below 1.0 and, for gp2 volumes, make sure BurstBalance stays above 80 percent for nearly all of the window. Because a low balance is the danger, read the low end of BurstBalance (for example p5) rather than p95. If you look up the smaller size limits, aim to keep EBS and network p95 below roughly half of those caps. If you do not check caps yet, at least confirm there is no obvious saturation trend in IOPS or throughput.
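
The queue length and "half of the cap" guards can be sketched as follows. The cap values here are placeholders rather than real limits for any instance type, so look up the actual figures for the target size before relying on them. The metric keys are arbitrary names chosen for this example.

```python
def io_guards(queue_len_p95, observed_p95, target_caps, headroom=0.5):
    """Return a name -> passed mapping for the disk and network guards.

    observed_p95 and target_caps must share the same keys and units.
    """
    results = {"ebs_queue_length": queue_len_p95 < 1.0}
    for name, observed in observed_p95.items():
        # Pass only if observed p95 fits within the chosen fraction of the smaller size's cap.
        results[name] = observed <= headroom * target_caps[name]
    return results


# Hypothetical observations on the current size.
observed = {"ebs_iops": 2500, "network_mbps": 900}
# Placeholder caps for the smaller size (not real AWS limits).
caps = {"ebs_iops": 6000, "network_mbps": 1250}

guards = io_guards(queue_len_p95=0.4, observed_p95=observed, target_caps=caps)
```

In this example the IOPS guard passes (2500 against a budget of 3000) while the network guard fails (900 against a budget of 625), so the downsize would be held.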

Burstable instances

T family instances accumulate CPU credits during low usage and spend them during bursts. Before downsizing a T instance, confirm CPUCreditBalance is not trending toward zero and that CPUSurplusCreditsCharged is zero across the lookback. A workload that lives on constant bursting is a poor downsizing candidate.
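
"Trending toward zero" needs a definition before it can be automated. The sketch below picks one: fit a least-squares line to daily CPUCreditBalance values and flag the instance if the line would reach zero within a chosen horizon. The 30-day horizon and the function name are assumptions for illustration.

```python
def credits_ok(daily_balance, surplus_charged, horizon_days=30):
    """True if the credit balance is not heading to zero and no surplus credits were charged."""
    n = len(daily_balance)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_balance) / n
    # Ordinary least-squares slope, in credits per day.
    denom = sum((x - mean_x) ** 2 for x in range(n))
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_balance)) / denom
    projected = daily_balance[-1] + slope * horizon_days
    trending_to_zero = slope < 0 and projected <= 0
    no_surplus = all(v == 0 for v in surplus_charged)
    return not trending_to_zero and no_surplus


# Hypothetical series: a stable balance versus one losing 3 credits per day.
stable = [144.0] * 30
draining = [100.0 - 3 * day for day in range(30)]

print(credits_ok(stable, surplus_charged=[0] * 30))
print(credits_ok(draining, surplus_charged=[0] * 30))
```

A linear fit is crude for workloads with weekly cycles, so treat a failed check as a prompt to look at the graph rather than as a final answer.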

Seasonality and recent changes

Yearly peaks, monthly jobs, and end-of-quarter spikes can hide in shorter windows. That is why a 90-day lookback plus a 365-day scan is ideal when available. Also avoid acting within about 14 days of a major deploy or a recent upsizing, since metrics can be noisy or unrepresentative.
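
The 14-day rule is easy to encode if you track when each instance last changed. This sketch assumes you record that date yourself, for example from deploy logs or change tickets. The function name and the dates are hypothetical.

```python
from datetime import date, timedelta


def in_cooldown(today, last_change, cooldown_days=14):
    """True if a major deploy or resize happened too recently to trust the metrics."""
    return today - last_change < timedelta(days=cooldown_days)


# Hypothetical dates: one change 10 days ago, one 19 days ago.
print(in_cooldown(date(2025, 3, 20), last_change=date(2025, 3, 10)))
print(in_cooldown(date(2025, 3, 20), last_change=date(2025, 3, 1)))
```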

When not to downsize

Do not shrink instances that are memory bound even if CPU is idle. Do not move to a size that removes required instance store NVMe, reduces ENI count below what you use, or lowers network or EBS ceilings below observed p95 needs. If any single safeguard fails, keep the current size and recheck next cycle.
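
The "any single safeguard fails" rule amounts to an all-or-nothing gate in front of a one-step move. Here is a minimal sketch, assuming the caller supplies the ordered size list for the family, since available sizes differ between families. The guard names are placeholders for whatever checks you run.

```python
def recommend(instance_type, family_sizes, guards):
    """Return (recommended_type, failed_guards).

    family_sizes: sizes available in this family, smallest first (caller supplied).
    guards: mapping of guard name -> True if the guard passed.
    """
    failed = [name for name, passed in guards.items() if not passed]
    family, size = instance_type.split(".", 1)
    index = family_sizes.index(size)
    if failed or index == 0:
        # Any failed guard, or already at the smallest size: keep the current size.
        return instance_type, failed
    return f"{family}.{family_sizes[index - 1]}", failed


# Example input only: a subset of sizes, passed in by the caller.
sizes = ["large", "xlarge", "2xlarge", "4xlarge"]

print(recommend("m5.2xlarge", sizes, {"cpu": True, "memory": True, "ebs": True}))
print(recommend("m5.2xlarge", sizes, {"cpu": True, "eni_count": False}))
```

Returning the failed guard names alongside the decision gives you something to recheck next cycle.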

Conclusion

Keep rightsizing simple and conservative. Use 30 to 90 days of history, validate CPU with averages and percentiles, confirm memory headroom when available, and run a few disk and network guards. One size down within the same family is usually enough to capture savings without drama. Revisit the data monthly and tighten the process as your observability improves.

© Luna Forge Ltd. All rights reserved.
Built in London with ❤️ by the Infralyst team.