TL;DR
Automated periodic deletion of ECR container images is a straightforward and effective way to optimize AWS costs. By leveraging Lambda functions and Step Functions, you can implement custom policies that meet your specific needs, ensuring that only necessary images are retained.
Introduction
Managing AWS costs can be challenging, especially with the increasing use of Elastic Container Registry (ECR) for storing container images. I've found that one effective way to cut costs is by periodically deleting unnecessary ECR container images. In this guide, I'll walk you through the steps to set up an automated cleanup process using Go.
Why Optimize ECR Storage?
ECR is a great tool for storing Docker container images, but as your CI/CD pipelines push more images, storage costs can quickly add up. Without regular cleanup, these costs can become significant. By implementing a strategy to automatically delete old or unused images, you can save money and keep your storage lean.
Using ECR Lifecycle Policies
ECR lifecycle policies are a built-in way to manage image cleanup. They allow you to set rules for automatically deleting images based on criteria such as age or tag. However, lifecycle policies have limitations, especially when you need to combine multiple conditions.
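For reference, here is a rough sketch of what a single-condition rule looks like. The Go types and the `expireOlderThan` and `policyJSON` helpers are hypothetical names introduced for illustration and are not part of any SDK. The JSON field names follow the lifecycle policy format, and the 30-day threshold is only an example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Selection mirrors the "selection" object of a lifecycle policy rule.
// A rule carries exactly one countType, so one rule expresses one condition.
type Selection struct {
	TagStatus   string `json:"tagStatus"`
	CountType   string `json:"countType"`
	CountUnit   string `json:"countUnit,omitempty"`
	CountNumber int    `json:"countNumber"`
}

type Action struct {
	Type string `json:"type"`
}

type Rule struct {
	RulePriority int       `json:"rulePriority"`
	Description  string    `json:"description"`
	Selection    Selection `json:"selection"`
	Action       Action    `json:"action"`
}

type LifecyclePolicy struct {
	Rules []Rule `json:"rules"`
}

// expireOlderThan builds a rule that expires images pushed more than `days` days ago.
func expireOlderThan(priority, days int) Rule {
	return Rule{
		RulePriority: priority,
		Description:  fmt.Sprintf("Expire images older than %d days", days),
		Selection: Selection{
			TagStatus:   "any",
			CountType:   "sinceImagePushed",
			CountUnit:   "days",
			CountNumber: days,
		},
		Action: Action{Type: "expire"},
	}
}

// policyJSON renders the policy as indented JSON text.
func policyJSON(p LifecyclePolicy) (string, error) {
	b, err := json.MarshalIndent(p, "", "  ")
	return string(b), err
}

func main() {
	text, err := policyJSON(LifecyclePolicy{Rules: []Rule{expireOlderThan(1, 30)}})
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

The printed JSON can be attached to a repository as its lifecycle policy text. A "keep only the latest N images" rule would use `imageCountMoreThan` as the `countType` instead, with no `countUnit`.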
Challenges with ECR Lifecycle Policies
While ECR lifecycle policies provide a good starting point, they have limitations:
- Single Condition Policies: ECR lifecycle policies are designed to handle single-condition rules easily. For example, you can delete images older than a specific number of days or keep only the most recent N images. However, they struggle when you need to combine multiple conditions, such as "delete images older than X days and not among the latest N images."
- AND Conditions: The inability to use AND conditions in lifecycle policies means you can't create complex rules directly. For example, if you want to delete images that are older than 30 days and not part of the latest 10 images, you can't do this with a single lifecycle policy. You need a more sophisticated solution to handle such cases.
- Granular Control: Lifecycle policies provide limited control over the exact criteria used for image deletion. If your requirements are specific, such as retaining images based on custom tags or metadata, lifecycle policies may not suffice.
- Global vs. Repository-Specific Rules: Defining rules that apply globally to all repositories can be challenging. Lifecycle policies need to be set up for each repository individually, which can become cumbersome in environments with many repositories.
Custom Cleanup Solution
To overcome the limitations of lifecycle policies, we can use AWS Lambda functions and Step Functions to create a custom cleanup process. This approach offers more flexibility and control over which images get deleted.
Workflow Overview
Our custom solution involves the following steps:
- GetContainerRepositories Lambda Function: Retrieves a list of all ECR repositories in your AWS account.
- DeleteExpiredContainerImages-Map State: Processes each repository's image list.
- DeleteExpiredContainerImages Lambda Function: Evaluates and deletes images based on specified criteria.
[Figure: workflow diagram of the state machine: GetContainerRepositories → DeleteExpiredContainerImages-Map → DeleteExpiredContainerImages]
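As a rough sketch, the three steps could be wired together with a state machine definition like the one generated below. The ARNs are placeholders, and the `$.repositories` items path assumes the first function returns its list under a `repositories` key. `MaxConcurrency` is an arbitrary example value. `buildDefinition` is a hypothetical helper that only assembles the Amazon States Language JSON. It does not create anything in AWS.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// state is a loosely typed Amazon States Language object.
type state map[string]interface{}

// buildDefinition assembles a definition with a Task, a Map state, and a
// Task that runs once per item inside the Map state.
func buildDefinition(getReposArn, deleteImagesArn string) state {
	return state{
		"StartAt": "GetContainerRepositories",
		"States": state{
			"GetContainerRepositories": state{
				"Type":     "Task",
				"Resource": getReposArn,
				"Next":     "DeleteExpiredContainerImages-Map",
			},
			"DeleteExpiredContainerImages-Map": state{
				"Type":           "Map",
				"ItemsPath":      "$.repositories", // assumed output field of the first function
				"MaxConcurrency": 5,                // assumed limit, tune as needed
				"Iterator": state{
					"StartAt": "DeleteExpiredContainerImages",
					"States": state{
						"DeleteExpiredContainerImages": state{
							"Type":     "Task",
							"Resource": deleteImagesArn,
							"End":      true,
						},
					},
				},
				"End": true,
			},
		},
	}
}

func main() {
	// Placeholder ARNs; substitute the ARNs of your deployed functions.
	def := buildDefinition(
		"arn:aws:lambda:us-east-1:123456789012:function:GetContainerRepositories",
		"arn:aws:lambda:us-east-1:123456789012:function:DeleteExpiredContainerImages",
	)
	out, err := json.MarshalIndent(def, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Each element of the items array becomes the input of one DeleteExpiredContainerImages invocation, so a failure in one repository does not have to block the others if you add retry or catch handling to the inner task.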
Implementation Details
Let's dive into the implementation of each step using Go.
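- GetContainerRepositories: This Lambda function lists every ECR repository and returns each repository's name together with its images as JSON.

```go
package main

import (
	"context"
	"time"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ecr"
)

type ImageDetail struct {
	ImageDigest   string `json:"imageDigest"`
	ImagePushedAt string `json:"imagePushedAt"` // RFC 3339, so it decodes into time.Time downstream
}

// Repository is one item for the Map state: a repository and its images.
type Repository struct {
	RepositoryName string        `json:"repositoryName"`
	Images         []ImageDetail `json:"images"`
}

type Response struct {
	Repositories []Repository `json:"repositories"`
}

// getRepositoryNames lists every ECR repository in the current account and region.
func getRepositoryNames(svc *ecr.ECR) ([]string, error) {
	var names []string
	err := svc.DescribeRepositoriesPages(&ecr.DescribeRepositoriesInput{},
		func(page *ecr.DescribeRepositoriesOutput, lastPage bool) bool {
			for _, repo := range page.Repositories {
				names = append(names, aws.StringValue(repo.RepositoryName))
			}
			return !lastPage
		})
	return names, err
}

// getImages returns the digest and push time of every image in a repository.
func getImages(svc *ecr.ECR, repositoryName string) ([]ImageDetail, error) {
	images := []ImageDetail{}
	input := &ecr.DescribeImagesInput{
		RepositoryName: aws.String(repositoryName),
	}
	err := svc.DescribeImagesPages(input, func(page *ecr.DescribeImagesOutput, lastPage bool) bool {
		for _, image := range page.ImageDetails {
			images = append(images, ImageDetail{
				ImageDigest:   aws.StringValue(image.ImageDigest),
				ImagePushedAt: aws.TimeValue(image.ImagePushedAt).Format(time.RFC3339),
			})
		}
		return !lastPage
	})
	return images, err
}

func handleRequest(ctx context.Context) (Response, error) {
	svc := ecr.New(session.Must(session.NewSession()))
	names, err := getRepositoryNames(svc)
	if err != nil {
		return Response{}, err
	}
	repositories := make([]Repository, 0, len(names))
	for _, name := range names {
		images, err := getImages(svc, name)
		if err != nil {
			return Response{}, err
		}
		repositories = append(repositories, Repository{RepositoryName: name, Images: images})
	}
	return Response{Repositories: repositories}, nil
}

func main() {
	lambda.Start(handleRequest)
}
```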
- DeleteExpiredContainerImages-Map: This Map state iterates through each repository and invokes the DeleteExpiredContainerImages Lambda function.
- DeleteExpiredContainerImages: This Lambda function evaluates which images should be deleted based on criteria such as retaining the latest N images and those pushed within the last X days.

```go
package main

import (
	"context"
	"sort"
	"time"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ecr"
)

type ImageDetail struct {
	ImageDigest   string    `json:"imageDigest"`
	ImagePushedAt time.Time `json:"imagePushedAt"`
}

type Request struct {
	RepositoryName string        `json:"repositoryName"`
	Images         []ImageDetail `json:"images"`
}

const (
	retainImageCount           = 10  // always keep the latest N images
	retainSinceImagePushedDays = 30  // always keep images pushed within the last X days
	maxImageIDsPerBatch        = 100 // BatchDeleteImage accepts at most 100 image IDs per call
)

// filterExpiredImages returns the images that are both outside the latest
// retainImageCount images and older than retainSinceImagePushedDays.
func filterExpiredImages(images []ImageDetail) []ImageDetail {
	retainLimit := time.Now().AddDate(0, 0, -retainSinceImagePushedDays)

	// Sort a copy newest first so the first retainImageCount entries are the latest images.
	sorted := append([]ImageDetail(nil), images...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].ImagePushedAt.After(sorted[j].ImagePushedAt)
	})
	if len(sorted) <= retainImageCount {
		return nil
	}

	var toDelete []ImageDetail
	for _, image := range sorted[retainImageCount:] {
		if image.ImagePushedAt.Before(retainLimit) {
			toDelete = append(toDelete, image)
		}
	}
	return toDelete
}

// deleteImages deletes the given digests in batches of maxImageIDsPerBatch.
// It makes no API call when imageDigests is empty.
func deleteImages(svc *ecr.ECR, repositoryName string, imageDigests []string) error {
	for start := 0; start < len(imageDigests); start += maxImageIDsPerBatch {
		end := start + maxImageIDsPerBatch
		if end > len(imageDigests) {
			end = len(imageDigests)
		}
		input := &ecr.BatchDeleteImageInput{
			RepositoryName: aws.String(repositoryName),
			ImageIds:       make([]*ecr.ImageIdentifier, 0, end-start),
		}
		for _, digest := range imageDigests[start:end] {
			input.ImageIds = append(input.ImageIds, &ecr.ImageIdentifier{ImageDigest: aws.String(digest)})
		}
		if _, err := svc.BatchDeleteImage(input); err != nil {
			return err
		}
	}
	return nil
}

func handleRequest(ctx context.Context, request Request) (string, error) {
	svc := ecr.New(session.Must(session.NewSession()))
	toDelete := filterExpiredImages(request.Images)
	imageDigests := make([]string, 0, len(toDelete))
	for _, image := range toDelete {
		imageDigests = append(imageDigests, image.ImageDigest)
	}
	if err := deleteImages(svc, request.RepositoryName, imageDigests); err != nil {
		return "Failed to delete images", err
	}
	return "Successfully deleted images", nil
}

func main() {
	lambda.Start(handleRequest)
}
```
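The handler above receives one Map state item as its input. Because `ImagePushedAt` is a `time.Time`, the timestamp in that item has to be an RFC 3339 string for the JSON decoding to succeed. The sketch below checks this with only the standard library. The sample payload, its digest, and the `decodeRequest` helper are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type ImageDetail struct {
	ImageDigest   string    `json:"imageDigest"`
	ImagePushedAt time.Time `json:"imagePushedAt"`
}

type Request struct {
	RepositoryName string        `json:"repositoryName"`
	Images         []ImageDetail `json:"images"`
}

// decodeRequest parses one Map state item into a Request.
func decodeRequest(payload []byte) (Request, error) {
	var req Request
	err := json.Unmarshal(payload, &req)
	return req, err
}

func main() {
	// Hypothetical item with an RFC 3339 timestamp.
	good := []byte(`{
		"repositoryName": "my-repository",
		"images": [
			{"imageDigest": "sha256:aaaa", "imagePushedAt": "2024-05-01T12:00:00Z"}
		]
	}`)
	req, err := decodeRequest(good)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.RepositoryName, len(req.Images), req.Images[0].ImagePushedAt.Year())

	// The layout produced by time.Time.String() is not RFC 3339 and is rejected.
	bad := []byte(`{"images": [{"imagePushedAt": "2024-05-01 12:00:00 +0000 UTC"}]}`)
	_, err = decodeRequest(bad)
	fmt.Println("decode error for non-RFC 3339 timestamp:", err != nil)
}
```

If the function that produces the items formats timestamps any other way, the deletion function fails before it evaluates a single image, so it is worth pinning the format on the producing side.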
Periodic Triggers
To automate this process, schedule the Step Functions state machine using EventBridge rules. For instance, you can set it to run weekly on Friday nights.
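For the weekly Friday-night example, the rule needs a schedule expression. The helper below is a small sketch that builds one. `weeklyCron` is a hypothetical name, and 22:00 is an assumed time for "Friday night". EventBridge cron expressions have six fields (minutes, hours, day of month, month, day of week, year), and rule schedules are evaluated in UTC, so adjust the hour for your time zone.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// weeklyCron builds an EventBridge cron expression that fires once a week
// on the given weekday at hour:minute (UTC).
func weeklyCron(day time.Weekday, hour, minute int) string {
	dow := strings.ToUpper(day.String()[:3]) // time.Friday -> "FRI"
	// Day of month is "?" because the day of week is specified instead.
	return fmt.Sprintf("cron(%d %d ? * %s *)", minute, hour, dow)
}

func main() {
	fmt.Println(weeklyCron(time.Friday, 22, 0)) // cron(0 22 ? * FRI *)
}
```

The resulting string goes into the rule's schedule expression, with the state machine as the rule's target.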
Example Policies
The two tables below show a policy that lifecycle policies can implement on their own (delete when either condition holds) and one that they cannot (delete only when both conditions hold):
Implementation Possible
Older than X days since push | Outside the latest N images? | Action |
---|---|---|
✅ | ✅ | Delete |
✅ | ❌ | Delete |
❌ | ✅ | Delete |
❌ | ❌ | Keep |
Implementation Not Possible
Older than X days since push | Outside the latest N images? | Action |
---|---|---|
✅ | ✅ | Delete |
✅ | ❌ | Keep |
❌ | ✅ | Keep |
❌ | ❌ | Keep |
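The two tables can be read as two boolean predicates. The sketch below takes the second column to mean "the image falls outside the latest N images", which is an interpretation on my part. The function names are hypothetical. Separate lifecycle rules behave like an OR, since an image expires as soon as any rule matches it, whereas the custom cleanup needs an AND.

```go
package main

import "fmt"

// lifecyclePolicyDeletes models two independent lifecycle rules:
// an image is deleted when either condition holds (OR).
func lifecyclePolicyDeletes(olderThanX, outsideLatestN bool) bool {
	return olderThanX || outsideLatestN
}

// customPolicyDeletes models the custom cleanup:
// an image is deleted only when both conditions hold (AND).
func customPolicyDeletes(olderThanX, outsideLatestN bool) bool {
	return olderThanX && outsideLatestN
}

// action turns a delete decision into the label used in the tables.
func action(del bool) string {
	if del {
		return "Delete"
	}
	return "Keep"
}

func main() {
	fmt.Println("olderThanX outsideLatestN lifecycle custom")
	for _, older := range []bool{true, false} {
		for _, outside := range []bool{true, false} {
			fmt.Println(older, outside,
				action(lifecyclePolicyDeletes(older, outside)),
				action(customPolicyDeletes(older, outside)))
		}
	}
}
```

The only rows where the two predicates disagree are the mixed ones: an old image that is still among the latest N, and a recent image that is outside the latest N. Those are the images the custom policy protects.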
Results
By implementing this periodic deletion strategy, you can significantly reduce your ECR storage costs. In my experience, this approach led to substantial savings, cutting unnecessary expenses and optimizing our AWS usage.
Thank you for reading, and happy optimizing!
For more tips and insights on security and log analysis, follow me on Twitter @Siddhant_K_code and stay updated with the latest & detailed tech content like this.