Reliable S3 Data Replication: Automatically Mirror New Files Without Worrying About Deletions

Rahul Kumar Sharma - Aug 17 - - Dev Community

Ensuring the security and redundancy of your data is crucial in today's data-driven environment. Duplicating your data across several storage locations is a good way to protect it. In this blog post, we'll discuss how to automatically copy data from a primary S3 bucket to a backup bucket. This ensures your important files are always safe and backed up, even if objects in the primary bucket are deleted or corrupted.

The architecture has four main elements: a primary S3 bucket, event notifications, an AWS Lambda function, and a backup S3 bucket. They work together as follows to provide seamless data replication:

  • Primary Bucket: This is where you upload your files. Whenever a new file is added to this bucket, an event notification is emitted. This bucket serves as the source of truth for your data, although it is only one part of the solution.

  • Event Notification: AWS S3 can be configured to send notifications for particular events, such as the upload of a new file. Here, the event notification is triggered each time a file is added to the primary bucket. The notification performs no action on its own; it simply initiates the next stage in the process.

  • AWS Lambda Function: When the event notification fires, it automatically invokes an AWS Lambda function, a small piece of serverless code that runs in response to events. In our setup, this function copies the freshly uploaded file from the primary bucket to the backup bucket. Because this happens almost instantly, the backup bucket always holds the most recent files.

  • Backup Bucket: The backup S3 bucket stores the replicated files. Unlike the primary bucket, it is set up to keep files even when they are removed from the primary bucket. This means that even if a file is deleted from the primary bucket, accidentally or deliberately, the backup copy remains secure and intact.
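To make the flow concrete, here is the shape of the event payload S3 hands to Lambda when an object is created, and how the bucket name and key are pulled out of it (the record below is trimmed to the fields that matter, and the names are illustrative):

```python
import urllib.parse

# A trimmed example of the S3 event payload delivered to Lambda
# (bucket and key names here are illustrative)
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "primary-bucket-7"},
                "object": {"key": "reports/q3+summary.csv"},
            },
        }
    ]
}

record = sample_event["Records"][0]
source_bucket = record["s3"]["bucket"]["name"]
# S3 URL-encodes object keys in event payloads, so decode before use
source_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
print(source_bucket, source_key)
```

Note that `unquote_plus` turns the `+` in the raw key back into a space, which is why the Lambda function later in this post decodes the key before copying.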

How Does S3 Replication Work?

Source: AWS

Why do we need this?

  • Data Redundancy: With this architecture, every file uploaded to the primary bucket is immediately replicated to a backup location. This redundancy is essential for disaster recovery and data security: it gives you a backup copy you can rely on if the primary bucket's data is lost.

  • Protection Against Deletion: One of this setup's most notable characteristics is that the backup bucket does not synchronize deletions. A file stays in the backup bucket even after it is removed from the primary bucket. This is especially helpful against inadvertent data loss, because you can always restore the file from the backup bucket.
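As a sketch of what such a restore looks like, a single object can be copied back from the backup bucket with `copy_object` (the helper names below are our own; the bucket names follow this post's examples):

```python
def restore_params(key, backup="backup-bucket-7", primary="primary-bucket-7"):
    """Arguments to copy one object from the backup bucket back to primary."""
    return {
        "CopySource": {"Bucket": backup, "Key": key},
        "Bucket": primary,
        "Key": key,
    }

def restore(key):
    import boto3  # only needed when actually run against AWS
    boto3.client("s3").copy_object(**restore_params(key))

print(restore_params("docs/readme.txt"))
```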

Steps:
Step 1: Create Two S3 Buckets: primary-bucket-7 and backup-bucket-7.
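If you prefer to script this step, the buckets can also be created with boto3 (a sketch; the `create_bucket_params` helper is our own, and the region is assumed to be ap-south-1 to match the ARNs used later in this post):

```python
def create_bucket_params(bucket, region="ap-south-1"):
    """Build the arguments for create_bucket; outside us-east-1 the
    region must be passed as a LocationConstraint."""
    params = {"Bucket": bucket}
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params

def create_buckets():
    import boto3  # only needed when actually run against AWS
    s3 = boto3.client("s3", region_name="ap-south-1")
    for bucket in ("primary-bucket-7", "backup-bucket-7"):
        s3.create_bucket(**create_bucket_params(bucket))

print(create_bucket_params("primary-bucket-7"))
```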
Step 2: Create an IAM Role for the Lambda Function

  • Go to the IAM service in the AWS Management Console.
  • Click on "Roles" in the left-hand menu, then "Create role".
  • Select "AWS service" and choose "Lambda".
  • Click "Next: Permissions".
  • Click "Create policy", go to the JSON tab, and paste the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::primary-bucket-7",
        "arn:aws:s3:::primary-bucket-7/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::backup-bucket-7/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "sns:Publish"
      ],
      "Resource": "arn:aws:sns:ap-south-1:965519929135:s3DataBackUpSNS"
    }
  ]
}


Attach the policy to the Lambda role.
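The same policy can be created and attached programmatically (a sketch; the role name and policy name below are hypothetical placeholders, while the ARNs are the examples used throughout this post):

```python
import json

# The inline policy from above, expressed as a Python dict
replication_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::primary-bucket-7",
                "arn:aws:s3:::primary-bucket-7/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": ["arn:aws:s3:::backup-bucket-7/*"],
        },
        {
            "Effect": "Allow",
            "Action": ["sns:Publish"],
            "Resource": "arn:aws:sns:ap-south-1:965519929135:s3DataBackUpSNS",
        },
    ],
}

def attach_replication_policy(role_name):
    import boto3  # only needed when actually run against AWS
    iam = boto3.client("iam")
    policy = iam.create_policy(
        PolicyName="S3ReplicationPolicy",  # hypothetical policy name
        PolicyDocument=json.dumps(replication_policy),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy["Policy"]["Arn"])
```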

Step 3: Create an SNS Topic and Subscribe Your Email.
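This step can also be scripted (a sketch; `sns_setup_params` is our own helper, the topic name matches the ARN used in this post, and the email subscription still has to be confirmed via the link AWS sends):

```python
def sns_setup_params(email):
    """Arguments for create_topic and subscribe."""
    return {
        "topic": {"Name": "s3DataBackUpSNS"},
        "subscription": {"Protocol": "email", "Endpoint": email},
    }

def create_topic_and_subscribe(email):
    import boto3  # only needed when actually run against AWS
    sns = boto3.client("sns", region_name="ap-south-1")
    params = sns_setup_params(email)
    topic = sns.create_topic(**params["topic"])
    # AWS then emails a confirmation link that must be clicked
    sns.subscribe(TopicArn=topic["TopicArn"], **params["subscription"])
```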
Step 4: Create a Lambda Function.

import boto3
import urllib.parse

s3 = boto3.client('s3')
sns = boto3.client('sns')

def lambda_handler(event, context):
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in the event payload
        source_key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        destination_bucket = 'backup-bucket-7'

        # Guard against events from the backup bucket itself, so the
        # function never tries to copy an object onto itself
        if source_bucket == destination_bucket:
            continue

        copy_source = {'Bucket': source_bucket, 'Key': source_key}

        try:
            # Copy the object to the backup bucket
            s3.copy_object(CopySource=copy_source, Bucket=destination_bucket, Key=source_key)
            print(f'Successfully copied {source_key} from {source_bucket} to {destination_bucket}')

            # Send SNS notification
            sns.publish(
                TopicArn='arn:aws:sns:ap-south-1:965519929135:s3DataBackUpSNS',
                Subject='File Uploaded to Backup Bucket',
                Message=f'The file {source_key} has been successfully uploaded to {destination_bucket}.'
            )
            print(f'Successfully sent SNS notification for {source_key}')
        except Exception as e:
            print(f'Error copying {source_key} from {source_bucket} to {destination_bucket}: {str(e)}')
            raise


The source bucket in the Lambda function is dynamically determined based on the event that triggers the function. This means that we don't need to hard-code the source bucket name in the Lambda function code. Instead, the source bucket is extracted from the event record whenever a new object is uploaded to the primary-bucket.
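Because the bucket and key come from the event, the routing logic can be smoke-tested locally without touching AWS. The sketch below mirrors the handler's logic with the clients injected; `FakeS3` and `FakeSNS` are hypothetical stand-ins that just record the calls they receive:

```python
import urllib.parse

class FakeS3:
    def __init__(self):
        self.copies = []
    def copy_object(self, CopySource, Bucket, Key):
        self.copies.append((CopySource["Bucket"], Bucket, Key))

class FakeSNS:
    def __init__(self):
        self.messages = []
    def publish(self, TopicArn, Subject, Message):
        self.messages.append(Subject)

def handle(event, s3, sns, destination_bucket="backup-bucket-7"):
    # Same logic as lambda_handler, with the clients passed in for testing
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        source_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        s3.copy_object(
            CopySource={"Bucket": source_bucket, "Key": source_key},
            Bucket=destination_bucket,
            Key=source_key,
        )
        sns.publish(
            TopicArn="arn:aws:sns:ap-south-1:965519929135:s3DataBackUpSNS",
            Subject="File Uploaded to Backup Bucket",
            Message=f"{source_key} copied to {destination_bucket}",
        )

event = {"Records": [{"s3": {"bucket": {"name": "primary-bucket-7"},
                             "object": {"key": "docs/readme.txt"}}}]}
s3, sns = FakeS3(), FakeSNS()
handle(event, s3, sns)
print(s3.copies)
```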


Step 5: Configure S3 Event Notifications for the Primary Bucket

  • Go to the S3 service in the AWS Management Console.
  • Select the primary-bucket.
  • Go to the "Properties" tab.
  • Scroll down to "Event notifications" and click "Create event notification".
  • Configure the event to trigger on s3:ObjectCreated:* and select the Lambda function S3ReplicationFunction.
  • Save the event notification.
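The same configuration can be applied with boto3 (a sketch; the Lambda ARN below is a placeholder, and note that when notifications are configured via the API rather than the console, the Lambda function also needs a resource-based permission granting S3 permission to invoke it):

```python
def notification_config(lambda_arn):
    """Payload for put_bucket_notification_configuration."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }

def configure_bucket_notification(bucket, lambda_arn):
    import boto3  # only needed when actually run against AWS
    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=notification_config(lambda_arn),
    )
```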

Step 6: Configure S3 Event Notifications for the Backup Bucket

  • Go to the S3 service in the AWS Management Console.
  • Select the backup-bucket.
  • Go to the "Properties" tab.
  • Scroll down to "Event notifications" and click "Create event notification".
  • Configure the event to trigger on s3:ObjectCreated:* and select the same Lambda function S3ReplicationFunction.
  • Save the event notification.

Happy Blogging!

Let's Connect
