Database backup using AWS Lambda

Mario Dagrada
FAUN — Developer Community 🐾
Nov 13, 2021

An example of using Lambda to automate repetitive tasks

Within the AWS ecosystem, Lambda is the flagship service for building serverless applications. Typical use cases are building the backend of a web application where each API request is processed by a distinct Lambda execution, processing data at scale without worrying about peaks in the data flow, or building scalable event-driven applications.

However, Lambda can also be extremely useful in other contexts, such as automating boring administration tasks within a complex AWS infrastructure. As an example, this post shows how to build a Lambda function that performs automatic backups of a managed OpenSearch (formerly Elasticsearch) domain. Let’s get started!

The Lambda function

Lambda functions support a wide range of programming languages; I chose Python for this post. The code in a different language would look very similar, since it only performs HTTP requests to the OpenSearch domain.

Elasticsearch (on which OpenSearch is based) offers a backup mechanism based on snapshots of one or more indices. The snapshots can then be stored in a durable and safe location such as Amazon S3. Creating a new snapshot of the OpenSearch domain makes up the bulk of the Lambda function handler. The whole code is shown below:
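Because the snapshot API is plain HTTP, the two requests involved can be sketched without any AWS-specific code. What follows is a minimal sketch, not the original handler of this post: the repository name, region and account ID are hypothetical placeholders, and the payload follows the S3 repository registration format described in the AWS documentation.

```python
import json
from datetime import datetime, timezone


def register_repository_request(host, repository, bucket, region, role_arn):
    """Describe the PUT request that registers an S3 snapshot repository."""
    payload = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    url = f"{host.rstrip('/')}/_snapshot/{repository}"
    return "PUT", url, json.dumps(payload)


def create_snapshot_request(host, repository, now=None):
    """Describe the PUT request that takes a snapshot named after the UTC date."""
    now = now or datetime.now(timezone.utc)
    # Snapshot names must be lowercase, so a date-based name is a safe choice.
    name = now.strftime("snapshot-%Y-%m-%d")
    url = f"{host.rstrip('/')}/_snapshot/{repository}/{name}"
    return "PUT", url, None


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    print(register_repository_request(
        "https://my-aws-es-domain-url",
        "my-snapshot-repo",
        "my-es-backup-bucket",
        "eu-west-1",
        "arn:aws:iam::123456789012:role/SnapshotRole",
    ))
    print(create_snapshot_request("https://my-aws-es-domain-url", "my-snapshot-repo"))
```

The repository only needs to be registered once; after that, each run of the function sends just the second request.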

This code is based on the official AWS documentation and is quite straightforward. However, there are a couple of important points to notice:

  • All required configuration is passed via environment variables. This is the preferred method since environment variables can be easily set when configuring the Lambda. For sensitive information, AWS Secrets Manager can be used instead.
  • Even though the Lambda is executed within a virtual private cloud (VPC), HTTP requests to the OpenSearch domain must always be signed using AWS credentials holding the right IAM permissions. This is achieved via the requests_aws4auth library, as suggested by the AWS documentation.
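The two points above can be sketched as follows. This is an illustrative sketch rather than the post's actual handler: the environment variable names (`ES_HOST`, `ES_REGION`, `ES_REPOSITORY`) and the default repository name are assumptions, while the `AWS4Auth` call follows the signing pattern from the AWS documentation.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BackupConfig:
    host: str
    region: str
    repository: str


def load_config(environ=os.environ):
    """Read the backup settings from environment variables (hypothetical names)."""
    return BackupConfig(
        host=environ["ES_HOST"].rstrip("/"),
        region=environ["ES_REGION"],
        repository=environ.get("ES_REPOSITORY", "my-snapshot-repo"),
    )


def lambda_handler(event, context):
    # Third-party imports live inside the handler so that load_config
    # can be exercised locally without these packages installed.
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    config = load_config()
    credentials = boto3.Session().get_credentials()
    # Sign the request with the credentials of the Lambda execution role.
    awsauth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        config.region,
        "es",
        session_token=credentials.token,
    )
    name = datetime.now(timezone.utc).strftime("snapshot-%Y-%m-%d-%H%M%S")
    url = f"{config.host}/_snapshot/{config.repository}/{name}"
    response = requests.put(url, auth=awsauth, timeout=30)
    response.raise_for_status()
    return {"statusCode": response.status_code, "body": response.text}
```

Failing fast on a missing `ES_HOST` or `ES_REGION` (a `KeyError` at start-up) makes a misconfigured function visible on its first invocation instead of producing an unsigned or misdirected request.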

Provisioning the Lambda function

To configure and run the Lambda function on AWS, let’s use Terraform, one of the best infrastructure-as-code (IaC) solutions out there.

Despite the common narrative around Lambda, creating a function requires more than just the source code. Above all, IAM permissions are essential if the function has to access other AWS services. In this case, the Lambda role must be able to access both the OpenSearch service for creating the backup snapshots and the S3 bucket where the snapshots are stored. To achieve this, let’s create an IAM role conveniently named SnapshotRole. This role needs a few permissions:

  • First of all, it needs access to the OpenSearch service.
  • It must have permission to create snapshots of the database by sending HTTP PUT requests to the desired OpenSearch domain, called here my-aws-es-domain-url for simplicity.
  • Finally, it also needs read/write access to a pre-existing S3 bucket, my-es-backup-bucket, where the snapshots are stored.

This is achieved with the snapshot_policy in the snippet below.
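To make the shape of such a policy concrete, here is a minimal sketch of the JSON policy document, built in Python to stay in the language of the handler. It is not the Terraform snippet of this post: the domain ARN, region and account ID are placeholders, and the chosen actions (`es:ESHttpPut` plus the S3 list/get/put/delete actions) are my reading of the permissions listed above, based on the AWS documentation for snapshots.

```python
import json


def snapshot_policy(domain_arn, bucket):
    """Build an IAM policy document for snapshot creation and S3 storage."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow HTTP PUT requests (snapshot creation) to the domain.
                "Effect": "Allow",
                "Action": ["es:ESHttpPut"],
                "Resource": f"{domain_arn}/*",
            },
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Read/write applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:es:eu-west-1:123456789012:domain/my-domain"
    print(json.dumps(snapshot_policy(arn, "my-es-backup-bucket"), indent=2))
```

Note the split between the bucket ARN (for listing) and the `/*` object ARN (for reading and writing); putting all S3 actions on a single resource is a common source of access-denied errors.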

The snapshot role now has all the necessary permissions. However, before attaching it to the Lambda, one also needs to configure the iam:PassRole permission. This is crucial to avoid privilege escalation within the Lambda function code, and it should be implemented every time a new Lambda is created. It can be quite confusing at times, though. If you want to better understand why iam:PassRole is needed in this context, here is a very clear explanation.

Terraform makes this configuration pretty straightforward.
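For reference, the policy statement behind this configuration is small. The sketch below builds it as a Python dict rather than as Terraform code; the role ARN and account ID are placeholders, and this is an illustration of the general iam:PassRole pattern rather than the exact policy used in this post.

```python
import json


def pass_role_policy(role_arn):
    """Allow passing exactly one role, and nothing else, to an AWS service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                # Restricting the resource to a single role ARN is what
                # prevents the caller from handing over a more powerful role.
                "Resource": role_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:iam::123456789012:role/SnapshotRole"
    print(json.dumps(pass_role_policy(arn), indent=2))
```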

The policy created above can be directly attached to the Lambda function.

Let’s now move to the most important component of our infrastructure: the Lambda function itself. This can be provisioned using the convenient aws-lambda official Terraform module. Make sure to give the right path to the Lambda handler code and to fill in the right values for the environment variables that configure the Lambda function.

In the code above, I assumed that a virtual private cloud (VPC) and a security group had already been created and are simply passed in as Terraform variables.

Finally, we need to define a trigger for the Lambda function. In this case, I want a daily backup of the database. The trigger can be created using a CloudWatch event rule whose schedule expression determines how often the backup runs:
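Schedule expressions come in two forms: a rate, such as `rate(1 day)`, or a six-field cron expression evaluated in UTC. As a small sketch (the helper name and the 02:00 default are my own choices, not taken from this post), a daily cron expression can be built like this:

```python
def daily_schedule(hour=2, minute=0):
    """Return a cron schedule expression that fires once a day, in UTC."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be in 0-23 and minute in 0-59")
    # Fields: minutes hours day-of-month month day-of-week year.
    # One of day-of-month and day-of-week has to be "?".
    return f"cron({minute} {hour} * * ? *)"


if __name__ == "__main__":
    print(daily_schedule())
    print(daily_schedule(hour=23, minute=30))
```

A cron expression pins the backup to a fixed time of day, which is handy for running it during low-traffic hours; `rate(1 day)` is simpler but counts from the moment the rule is created.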

If you did everything correctly, the only step left is to initialize and run the Terraform code:

terraform init
terraform apply -auto-approve

Now you should be able to see your Lambda function in the AWS console and test that it works directly from there.

Wrapping up

In this post, I showed an alternative use of Lambda functions: automating repetitive administrative tasks such as performing periodic backups of databases. All the infrastructure is deployed using Terraform, and the whole code (Lambda handler and Terraform scripts) can be found on GitHub. If you have any questions or doubts about this post, do not hesitate to contact me by commenting here or writing to me on LinkedIn.
