Ephemeral storage in AWS Lambda?
Ephemeral storage in AWS Lambda is temporary storage provided in the form of a directory (/tmp) on the lambda file system. This storage is unique to each lambda execution environment.
You can read, write, and perform all sorts of file operations on this directory. Multiple lambda invocations can share the same execution environment, so even though the storage is temporary, it can be shared across multiple lambda invocations.
By default, all lambdas come with 512MB of ephemeral storage; however, the storage can be extended up to 10,240MB in 1MB increments. The default 512MB comes at no extra cost to your lambda.
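The size is configured per function. As a minimal sketch, here's how you might raise it with the AWS CDK (the function name and asset path are placeholders; ephemeralStorageSize is the relevant property):
import { Size } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Inside a CDK stack: request 2,048MB of /tmp instead of the default 512MB.
// Storage beyond the free 512MB is billed for the duration of each invocation.
const zipLambda = new lambda.Function(this, 'ZipFilesLambda', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('dist'), // placeholder asset path
  ephemeralStorageSize: Size.mebibytes(2048), // 512MB to 10,240MB, in 1MB increments
});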
Why use Ephemeral Storage?
Well, it's available out of the box in your lambda instance, so why not use it? 😄
There are several use cases for ephemeral storage in AWS Lambda. In general, any lambda operation that can benefit from a file system, or from sharing temporary state across multiple invocations (caching 👀), is a good candidate for ephemeral storage.
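As a quick illustration of the caching angle, here's a minimal sketch (the cache path and the fetchRemoteConfig helper are made up for this example) that reuses a file written to /tmp by an earlier invocation on the same warm execution environment:
import { existsSync } from 'fs';
import { readFile, writeFile } from 'fs/promises';

const CACHE_PATH = '/tmp/config-cache.json'; // hypothetical cache file

// Stand-in for an expensive call, e.g. fetching configuration from S3 or an API
const fetchRemoteConfig = async () => ({ fetchedAt: new Date().toISOString() });

export const handler = async () => {
  // A warm invocation that lands on the same execution environment
  // finds the file already in /tmp and skips the expensive fetch.
  if (existsSync(CACHE_PATH)) {
    return JSON.parse(await readFile(CACHE_PATH, 'utf8'));
  }
  const config = await fetchRemoteConfig();
  await writeFile(CACHE_PATH, JSON.stringify(config));
  return config;
};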
Use Case: Zip Up S3 Files
Zipping is a common use case in many software applications that deliver bulk files to clients/customers efficiently over the internet. In this article, we will explore a practical example of leveraging ephemeral storage in AWS Lambda to zip S3 files. The example lambda will receive a list of S3 keys as input, zip up the files (leveraging the ephemeral storage), and upload the zipped output to S3. Below is the source code (written in TypeScript) of the lambda.
import { GetObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { createReadStream, createWriteStream } from 'fs';
import { mkdir, rm } from 'fs/promises';
import path from 'path';
import { Readable } from 'stream';
import archiver from 'archiver';
import { randomUUID } from 'crypto';

const s3Bucket = 'zip-files-test';
const s3Client = new S3Client({ region: process.env.AWS_REGION });

// Streams the S3 object at s3Key into the local file at filePath
const streamS3ObjectToFile = async (s3Key: string, filePath: string) => {
  const { Body } = await s3Client.send(new GetObjectCommand({
    Bucket: s3Bucket,
    Key: s3Key,
  }));
  if (!Body) throw new Error(`S3 object not found at: ${s3Key}`);
  const writeStream = createWriteStream(filePath);
  return new Promise((res, rej) => {
    (Body as Readable)
      .pipe(writeStream)
      .on('error', (error) => rej(error))
      .on('close', () => res('ok'));
  });
};

// Zips the files at filePaths into a single archive at outputFilePath
const archiveFiles = (filePaths: string[], outputFilePath: string) => {
  return new Promise((res, rej) => {
    const output = createWriteStream(outputFilePath);
    output.on('close', () => {
      console.log(archive.pointer() + ' total bytes');
      res('ok');
    });
    const archive = archiver('zip', { zlib: { level: 9 } });
    archive.on('error', (err) => rej(err));
    archive.pipe(output);
    filePaths.forEach(filePath => archive.file(filePath, { name: path.basename(filePath) }));
    archive.finalize();
  });
};
export const handler = async (event: { inputS3Keys: string[]; outputS3Key: string; }) => {
  const { inputS3Keys, outputS3Key } = event;
  // Basic validation of event data
  if (!Array.isArray(inputS3Keys) || typeof outputS3Key !== 'string') {
    throw new Error('Provide a list of input S3 keys and an output S3 key');
  }
  // Create a sub-directory in ephemeral storage (/tmp)
  const tmpFolder = `/tmp/${randomUUID()}`;
  await mkdir(tmpFolder);
  // Stream S3 files to tmp storage
  const tmpFiles: string[] = [];
  const streamFilesAsynchronously = inputS3Keys.map(async (s3Key) => {
    const fileName = path.basename(s3Key);
    const filePath = `${tmpFolder}/${fileName}`;
    await streamS3ObjectToFile(s3Key, filePath);
    tmpFiles.push(filePath);
  });
  await Promise.all(streamFilesAsynchronously);
  // Zip files
  const zipFilePath = `${tmpFolder}/${path.basename(outputS3Key)}`;
  await archiveFiles(tmpFiles, zipFilePath);
  // Upload zip output
  await s3Client.send(new PutObjectCommand({
    Body: createReadStream(zipFilePath),
    Bucket: s3Bucket,
    Key: outputS3Key,
  }));
  // Remove all files written to /tmp
  await rm(tmpFolder, { recursive: true, force: true });
  console.log('Done!');
};
In the source code above, there are 3 primary functions:
- streamS3ObjectToFile: streams the S3 object identified by the s3Key parameter to the local file path given by the filePath parameter.
- archiveFiles: archives the list of files given by the filePaths parameter and writes the resulting zipped output to the file given by the outputFilePath parameter.
- handler: the core function executed on invocation of the lambda. It extracts the inputs from the event object, calls streamS3ObjectToFile to stream the input files to the lambda's ephemeral storage, archives the files into ephemeral storage, uploads the zipped file to S3, and finally deletes everything written to the /tmp folder.
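For reference, you could exercise the handler locally with an event like the following (the module path, keys, and bucket contents are placeholders):
import { handler } from './index'; // placeholder module path

// Zips two existing S3 objects into a single archive at archives/reports.zip
await handler({
  inputS3Keys: ['reports/january.pdf', 'reports/february.pdf'],
  outputS3Key: 'archives/reports.zip',
});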
Testing:
Stream to File (ephemeral storage) vs In-Memory: To minimize memory usage in the Lambda function, I opted to stream all S3 objects to ephemeral storage in the /tmp directory instead of loading them into memory; even the archiving step streams its output to ephemeral storage. The alternative, loading S3 objects and performing the archiving operation in memory using buffers, would have significantly increased the Lambda's memory requirements. For context, streaming to files allowed me to compress a collection of files totaling around 300MB using a Lambda with just 128MB of RAM (the minimum configuration). In contrast, handling the same files in memory would have required at least 300MB of memory just to load them, not to mention the additional memory needed for processing.
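For contrast, the buffered approach would look roughly like this sketch (loadS3ObjectIntoMemory is a hypothetical helper; transformToByteArray is the AWS SDK v3 stream helper, and s3Client/s3Bucket are reused from the listing above):
import { GetObjectCommand } from '@aws-sdk/client-s3';

// In-memory alternative (NOT what the lambda above does): the whole object
// is held in a Buffer, so a 300MB input needs at least 300MB of RAM.
const loadS3ObjectIntoMemory = async (s3Key: string): Promise<Buffer> => {
  const { Body } = await s3Client.send(new GetObjectCommand({
    Bucket: s3Bucket,
    Key: s3Key,
  }));
  if (!Body) throw new Error(`S3 object not found at: ${s3Key}`);
  return Buffer.from(await Body.transformToByteArray());
};
With buffers, the archiving step would also switch from archive.file(...) to archive.append(buffer, { name }), keeping everything resident in memory until the zip is finalized.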
Pro Tip - Cleanup /tmp files: While the /tmp folder is temporary, there's no guarantee of when its content will be destroyed. AWS won't auto-delete the content of the ephemeral storage when a lambda invocation finishes; in fact, the /tmp folder will be shared across multiple lambda invocations that reuse the same execution environment. For this reason, it's encouraged to clean up whatever you write to the /tmp folder unless you deliberately want to share the data across multiple lambda invocations, e.g., for caching.
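If you want that cleanup to survive failures as well, one pattern worth considering (a sketch, not something the lambda above does) is wrapping the work in try/finally so the /tmp directory is removed even when an invocation throws:
import { mkdir, rm } from 'fs/promises';
import { randomUUID } from 'crypto';

export const handler = async () => {
  const tmpFolder = `/tmp/${randomUUID()}`;
  await mkdir(tmpFolder);
  try {
    // ... download, zip, and upload, as in the lambda above ...
  } finally {
    // Runs on success and failure alike, so a crashed invocation
    // can't leak files into the shared execution environment.
    await rm(tmpFolder, { recursive: true, force: true });
  }
};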
Conclusion:
Ephemeral storage is a powerful feature that shouldn't be overlooked in AWS Lambda. I've found it particularly useful for heavy data processing and complex media/graphics processing tasks.
Are you leveraging ephemeral storage for something interesting? Please share in the comments section.