Database backup using AWS Lambda
An example of using Lambda to automate repetitive tasks
Within the AWS ecosystem, Lambda is the flagship service for building serverless applications. Typically, Lambda is used to build the backend of a web application where each API request is processed by a distinct Lambda execution, to process data at scale without worrying about peaks in the data flow, or to build scalable event-driven applications.
However, Lambda can also be extremely useful in other contexts, such as automating tedious administration tasks within a complex AWS infrastructure. As an example, this post shows how to build a Lambda function that performs automatic backups of a managed OpenSearch (formerly Elasticsearch) domain. Let’s get started!
The Lambda function
Lambda functions support a wide range of programming languages; I chose Python for this post. The code in a different language would look very similar, since it only performs HTTP requests to the OpenSearch domain.
Elasticsearch (on which OpenSearch is based) offers a backup mechanism based on snapshots of one or more indices. The snapshots can then be stored in a durable and safe location such as Amazon S3. The code for creating a new snapshot of the OpenSearch domain is the bulk of the Lambda function handler. The whole code is shown below:
This code is based on the official AWS documentation and is quite straightforward. However, there are a couple of important points to note:
- All required configuration is passed via environment variables. This is the preferred method since environment variables can be easily set when configuring the Lambda function. For sensitive information, AWS Secrets Manager can be used instead.
- Even though the Lambda is executed within a virtual private cloud (VPC), HTTP requests to the OpenSearch domain must always be signed using AWS credentials holding the right IAM permissions. This is achieved via the requests_aws4auth library, as suggested by the AWS documentation.
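To make the two points above concrete, here is a minimal sketch of such a handler, not the exact code from the repository. It assumes hypothetical environment variable names (ES_HOST, ES_REGION, SNAPSHOT_REPO, S3_BUCKET, SNAPSHOT_ROLE_ARN) and a date-based snapshot name, and it follows the repository registration and snapshot requests described in the AWS documentation.

```python
import datetime
import os


def load_config(env=os.environ):
    """Read the settings from environment variables (the names are hypothetical)."""
    return {
        "host": env["ES_HOST"].rstrip("/"),      # e.g. https://my-aws-es-domain-url
        "region": env["ES_REGION"],
        "repository": env["SNAPSHOT_REPO"],
        "bucket": env["S3_BUCKET"],              # e.g. my-es-backup-bucket
        "role_arn": env["SNAPSHOT_ROLE_ARN"],    # ARN of SnapshotRole
    }


def repository_request(cfg):
    """URL and JSON body that register the S3 bucket as a snapshot repository."""
    url = f"{cfg['host']}/_snapshot/{cfg['repository']}"
    body = {
        "type": "s3",
        "settings": {
            "bucket": cfg["bucket"],
            "region": cfg["region"],
            "role_arn": cfg["role_arn"],
        },
    }
    return url, body


def snapshot_url(cfg, day):
    """URL of a snapshot named after the given date."""
    return f"{cfg['host']}/_snapshot/{cfg['repository']}/snapshot-{day:%Y-%m-%d}"


def handler(event, context):
    # Imported here so the helpers above can be exercised without these
    # packages; requests and requests_aws4auth must be bundled with the function.
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    cfg = load_config()
    creds = boto3.Session().get_credentials()
    # Sign every request with the credentials of the Lambda execution role.
    auth = AWS4Auth(creds.access_key, creds.secret_key, cfg["region"], "es",
                    session_token=creds.token)

    # Register the repository, then take today's snapshot.
    repo_url, repo_body = repository_request(cfg)
    requests.put(repo_url, auth=auth, json=repo_body).raise_for_status()

    response = requests.put(snapshot_url(cfg, datetime.date.today()), auth=auth)
    response.raise_for_status()
    return {"statusCode": response.status_code, "body": response.text}
```

Keeping the URL and payload construction in small pure functions makes it easy to check them locally before deploying, while only the handler itself touches AWS credentials and the network.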
Provisioning the Lambda function
To configure and run the Lambda function on AWS, let’s use Terraform, one of the best infrastructure-as-code (IaC) solutions out there.
Despite the common narrative around Lambda, creating a function needs more than just the source code. Above all, IAM permissions are essential if one wants the function to access other AWS services. In this case, the Lambda role must be able to access both the OpenSearch service for creating the backup snapshots and the S3 bucket where the snapshots are stored. To achieve this, let’s create an IAM role conveniently named SnapshotRole. This role needs a few permissions:
- First of all, it needs access to the OpenSearch service.
- It must have permission to create snapshots of the database by sending HTTP PUT requests to the desired OpenSearch domain, called here my-aws-es-domain-url for simplicity.
- Finally, it also needs read/write access to a pre-existing S3 bucket, my-es-backup-bucket, where the snapshots are stored.
These permissions are granted through the snapshot_policy.
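As an illustration of what such a policy contains, the sketch below builds the policy document as a plain Python dictionary, which is the same JSON one would embed in Terraform. The action names follow the AWS documentation for manual snapshots; the domain ARN, account ID, and region are placeholders, and the exact statements in the original snapshot_policy may differ.

```python
import json


def snapshot_policy(domain_arn, bucket_name):
    """IAM policy document covering the permissions listed above."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # create snapshots with HTTP PUT requests to the domain
                "Effect": "Allow",
                "Action": ["es:ESHttpPut"],
                "Resource": [f"{domain_arn}/*"],
            },
            {   # list the contents of the backup bucket
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {   # read and write the snapshot objects
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID, region, and domain name.
    domain = "arn:aws:es:eu-west-1:123456789012:domain/my-domain"
    print(json.dumps(snapshot_policy(domain, "my-es-backup-bucket"), indent=2))
```

Note that the bucket itself and the objects inside it are separate resources, which is why s3:ListBucket and the object-level actions sit in different statements.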
The snapshot role now has all the necessary permissions. However, before attaching it to the Lambda, one also needs to configure the iam:PassRole permission. This is crucial to avoid privilege escalation within the Lambda function code, and it should be implemented every time a new Lambda is created. However, it can also be quite confusing at times. If you want to better understand why iam:PassRole is needed in this context, here is a very clear explanation.
Terraform makes this configuration pretty straightforward.
The policy created above can be directly attached to the Lambda function.
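For reference, an iam:PassRole statement of the kind discussed above might look like the following sketch. It is an assumption about the shape of the policy, not a copy of the Terraform code, and the role ARN is a placeholder.

```python
import json


def pass_role_policy(snapshot_role_arn):
    """Policy document that allows passing exactly one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                # Scoping the resource to a single role, rather than "*",
                # is what limits the room for privilege escalation.
                "Resource": snapshot_role_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID.
    arn = "arn:aws:iam::123456789012:role/SnapshotRole"
    print(json.dumps(pass_role_policy(arn), indent=2))
```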
Let’s now turn to the most important component of our infrastructure, the Lambda function itself. This can be provisioned using the convenient aws-lambda official Terraform module. Make sure to give the right path to the Lambda handler code and fill in the right values for the environment variables which configure the Lambda function.
In the code above, I assumed that a virtual private cloud (VPC) and a security group had already been created and are simply passed in as Terraform variables.
Finally, we need to define a trigger for the Lambda function. In this case, I want to have a daily backup of the database. The right trigger can be created using a CloudWatch event whose schedule expression determines how often the backup runs.
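Schedule expressions come in two flavours, a fixed rate such as rate(1 day) or a six-field cron expression evaluated in UTC. The sketch below builds a daily cron expression and the corresponding rule parameters; the rule name and the 02:00 UTC default are assumptions chosen for illustration, and the resulting dictionary matches the keyword arguments accepted by the boto3 events client's put_rule call.

```python
def daily_schedule(hour_utc=2, minute=0):
    """Build a cron schedule expression that fires once a day (UTC)."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be in 0-23 and minute in 0-59")
    # Fields: minutes hours day-of-month month day-of-week year.
    # Either day-of-month or day-of-week has to be "?".
    return f"cron({minute} {hour_utc} * * ? *)"


def rule_parameters(name, schedule):
    """Keyword arguments describing a scheduled rule."""
    return {"Name": name, "ScheduleExpression": schedule, "State": "ENABLED"}


if __name__ == "__main__":
    # "daily-es-backup" is a hypothetical rule name.
    print(rule_parameters("daily-es-backup", daily_schedule()))
```

With the defaults, daily_schedule() returns cron(0 2 * * ? *), which runs the backup every day at 02:00 UTC.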
If you did everything correctly, the only step left is to initialize and run the Terraform code:
terraform init
terraform apply -auto-approve
Now you should be able to see your Lambda function in the AWS console and test that it works directly from there.
Wrapping up
In this post, I showed an alternative use of Lambda functions: automating repetitive administrative tasks such as performing periodic backups of databases. All the infrastructure is deployed using Terraform, and the whole code (Lambda handler and Terraform scripts) can be found on GitHub. If you have any questions or doubts about this post, do not hesitate to contact me by commenting here or writing to me on LinkedIn.