Private PyPi Server on AWS

Deploy it in less than 5 minutes with Terraform

Published in

FAUN — Developer Community 🐾

5 min read · Feb 24, 2021

When working with complex, multi-module Python projects, it quickly becomes crucial to share libraries across components, let developers easily install those libraries into their local development environments, and use them in continuous integration tools. A private PyPi repository is a good solution to this problem, since it allows installing internal libraries anywhere with regular pip install commands while keeping full control over the Python packages.

If your application is running in the cloud, you likely want to deploy your PyPi server within your infrastructure. In this post, I focus on the AWS cloud and show how to deploy a password-protected PyPi server on a small EC2 instance within an existing VPC. As a server, I am going to use a minimal PyPi server implementation that is easy to set up, not demanding in terms of resources, and, most importantly, still actively maintained on GitHub. The cloud infrastructure is built using Terraform, a great tool that has become the de facto standard for infrastructure-as-code (IaC) provisioning. Thanks to Terraform, running your own PyPi repository on AWS can be done in less than 5 minutes. But now, let’s dive in.

Infrastructure code

The main building block of the infrastructure is an EC2 instance where the PyPi server is running. If the PyPi server is deployed within an existing infrastructure that already includes a load balancer to serve HTTP requests, we can leverage it to improve the security and usability of the PyPi server: with the load balancer and a few additional resources, one can send HTTPS requests to the PyPi server using an already owned Route53 domain.

Compute instance with PyPi server

Provisioning an EC2 instance with Terraform is straightforward since only one resource is needed. However, we need to make sure that the PyPi server is actually running at instance startup and enforces basic authentication with username and password. One of the simplest solutions is to run the PyPi server as an OS service that is started at instance boot and restarted on failure. Let’s see what we need to do to achieve that:

First, we need to create a username and password for PyPi server authentication using the htpasswd utility (use the terraform.tfvars file to set the credentials as Terraform variables to avoid hard-coding them in the infrastructure code).
We also need a shell script for starting the server.
To execute the script as an OS service, we need to write a systemd service file and place it in the /etc/systemd/system folder.
Finally, we can start the PyPi server using the systemctl utility.

All these steps can be automated with a cloud-init.yml template file:
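As a rough illustration of these steps, here is a minimal sketch of such a cloud-init configuration. To keep the example self-contained it is inlined as a Terraform heredoc; the same body could equally live in a separate cloud-init.yml file rendered with templatefile. It assumes that the minimal server is pypiserver and that the instance runs an Ubuntu-based AMI (where htpasswd comes from the apache2-utils package). The variable names, file paths, port, and service name are illustrative assumptions, and formatting and mounting the EBS volume at the mount point is left out because the device name depends on the instance type.

```hcl
# Hypothetical variable names; set the values in terraform.tfvars.
variable "pypi_username" { type = string }
variable "pypi_password" {
  type      = string
  sensitive = true # needs Terraform >= 0.14
}
variable "mount_point" {
  type    = string
  default = "/data"
}

locals {
  # Assumption: simple credentials without quotes or YAML special characters.
  pypi_user_data = <<-EOT
    #cloud-config
    package_update: true
    packages:
      - python3-venv
      - apache2-utils
    write_files:
      # Shell script that starts the server on port 8080 with password auth.
      # pypiserver 1.x releases omit the "run" subcommand.
      - path: /usr/local/bin/start-pypi.sh
        permissions: "0755"
        content: |
          #!/bin/bash
          exec /opt/pypi/bin/pypi-server run -p 8080 -P /etc/pypiserver/htpasswd ${var.mount_point}/packages
      # systemd unit: start at boot, restart on failure.
      - path: /etc/systemd/system/pypi.service
        content: |
          [Unit]
          Description=Private PyPi server
          After=network.target
          [Service]
          ExecStart=/usr/local/bin/start-pypi.sh
          Restart=on-failure
          [Install]
          WantedBy=multi-user.target
    runcmd:
      - python3 -m venv /opt/pypi
      - /opt/pypi/bin/pip install pypiserver passlib
      - mkdir -p /etc/pypiserver ${var.mount_point}/packages
      - htpasswd -b -c /etc/pypiserver/htpasswd '${var.pypi_username}' '${var.pypi_password}'
      - systemctl daemon-reload
      - systemctl enable --now pypi.service
  EOT
}
```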

As you can see, the above template requires three variables: the PyPi server’s username and password and a mount point. The mount point is associated with an EBS volume where Python packages are stored and which can be regularly backed up and encrypted if required. It is important to save the packages on a dedicated EBS volume and not on the root device volume of the EC2 instance to avoid losing data when the instance fails or has to be replaced.

This template can then be passed to the user_data field in the EC2 instance configuration, and it will be executed by the cloud-init agent at startup. The EC2 instance and the attached EBS volume can be provisioned using the following Terraform resource blocks:
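A minimal sketch of what these resource blocks could look like is given here. It is an illustration under stated assumptions, not the exact code of this post: the variable names, instance type, volume size, device name, and the template placeholder names (username, password, mount_point) are all hypothetical.

```hcl
# Hypothetical inputs; names and defaults are illustrative.
variable "ami_id" { type = string }
variable "subnet_id" { type = string }
variable "security_group_ids" { type = list(string) }
variable "pypi_username" { type = string }
variable "pypi_password" {
  type      = string
  sensitive = true # needs Terraform >= 0.14
}
variable "mount_point" {
  type    = string
  default = "/data"
}

resource "aws_instance" "pypi" {
  ami                    = var.ami_id
  instance_type          = "t3.micro"
  subnet_id              = var.subnet_id
  vpc_security_group_ids = var.security_group_ids

  # Render the cloud-init template with the three values it expects.
  user_data = templatefile("${path.module}/cloud-init.yml", {
    username    = var.pypi_username
    password    = var.pypi_password
    mount_point = var.mount_point
  })

  tags = { Name = "pypi-server" }
}

# Dedicated volume for the Python packages, kept apart from the root volume.
resource "aws_ebs_volume" "packages" {
  availability_zone = aws_instance.pypi.availability_zone
  size              = 10 # GiB
  encrypted         = true

  tags = { Name = "pypi-packages" }
}

# On Nitro-based instance types the device shows up as an NVMe device
# (for example /dev/nvme1n1) rather than under the name given here.
resource "aws_volume_attachment" "packages" {
  device_name = "/dev/sdf"
  volume_id   = aws_ebs_volume.packages.id
  instance_id = aws_instance.pypi.id
}
```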

HTTPS requests to the PyPi server

The infrastructure code in the previous section spins up a fully functional PyPi server accessible via the public IP address of the EC2 instance. However, this setup is not very secure, since the PyPi server is queried using clear-text HTTP requests and the EC2 instance has a public IP open to the Internet. To remove these security weaknesses, the PyPi server can be placed behind a load balancer that accepts only HTTPS requests and, at the same time, the EC2 instance can run within a private VPC subnet.

To complete the configuration of the PyPi server, a few additional resources are needed:

the application load balancer
a certificate for HTTPS traffic encryption/decryption issued by AWS Certificate Manager (ACM)
a valid domain name bought, for example, via AWS Route53

These resources are often used when deploying applications on AWS and, therefore, are likely already part of the cloud infrastructure where the PyPi server is deployed. If not, the Terraform code to deploy them is pretty standard and uses a couple of official AWS modules.
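For readers who need to create these resources from scratch, the sketch below shows one possible setup. Note that it uses plain AWS provider resources instead of the official modules mentioned above, so that every argument is visible. The variable names, resource names, and the listener port (8080, chosen to match the pip command later in this post) are assumptions.

```hcl
# Hypothetical inputs; names are illustrative.
variable "vpc_id" { type = string }
variable "public_subnet_ids" { type = list(string) } # subnets in at least two AZs
variable "zone_id" { type = string }                 # Route53 hosted zone
variable "domain_name" { type = string }             # e.g. pypi.example.com
variable "listener_port" {
  type    = number
  default = 8080
}

# Certificate issued by ACM and validated through DNS records in Route53.
resource "aws_acm_certificate" "pypi" {
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.pypi.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  allow_overwrite = true
  zone_id         = var.zone_id
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  ttl             = 60
}

resource "aws_acm_certificate_validation" "pypi" {
  certificate_arn         = aws_acm_certificate.pypi.arn
  validation_record_fqdns = [for r in aws_route53_record.cert_validation : r.fqdn]
}

# Only the listener port is reachable from outside; narrow the CIDR where possible.
resource "aws_security_group" "lb" {
  name   = "pypi-lb"
  vpc_id = var.vpc_id

  ingress {
    from_port   = var.listener_port
    to_port     = var.listener_port
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_lb" "pypi" {
  name               = "pypi-lb"
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
  security_groups    = [aws_security_group.lb.id]
}

# Alias record pointing the domain name at the load balancer.
resource "aws_route53_record" "pypi" {
  zone_id = var.zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_lb.pypi.dns_name
    zone_id                = aws_lb.pypi.zone_id
    evaluate_target_health = true
  }
}
```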

Load balancer for HTTPS requests to the PyPi server

Finally, the last missing piece is to make sure that the load balancer actually points to the EC2 instance where the PyPi server is running. This can be done by configuring a suitable HTTP(S) listener and target group associated with the load balancer just provisioned as shown below.
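A minimal sketch of such a target group and listener follows. It assumes the PyPi server listens on port 8080 on the instance and that the load balancer, certificate, and instance already exist and are passed in through hypothetical variables. The resource names, health check settings, and TLS policy are illustrative choices, not the exact code of this post.

```hcl
# Hypothetical inputs referring to resources created elsewhere.
variable "vpc_id" { type = string }
variable "load_balancer_arn" { type = string }
variable "certificate_arn" { type = string }
variable "pypi_instance_id" { type = string }
variable "listener_port" {
  type    = number
  default = 8080 # matches the port used in the pip command; 443 is also common
}

# Target group that forwards plain HTTP to the PyPi server inside the VPC.
resource "aws_lb_target_group" "pypi" {
  name     = "pypi-server"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path = "/"
    # 401 is accepted in case the server asks for credentials on every request.
    matcher = "200,401"
  }
}

# Register the EC2 instance running the PyPi server as the only target.
resource "aws_lb_target_group_attachment" "pypi" {
  target_group_arn = aws_lb_target_group.pypi.arn
  target_id        = var.pypi_instance_id
  port             = 8080
}

# HTTPS listener: TLS is terminated at the load balancer with the certificate.
resource "aws_lb_listener" "https" {
  load_balancer_arn = var.load_balancer_arn
  port              = var.listener_port
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.pypi.arn
  }
}
```

With this layout, the security group of the EC2 instance only needs to allow inbound traffic on port 8080 from the load balancer.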

Target group and listener to associate PyPi server to the load balancer.

Use the private repository

If you made it this far, there is one last thing left to discuss: how can we actually use the PyPi server? Nothing easier! To upload a new package, create a setup.py and use the standard Python distribution command (you need to properly configure the .pypirc file with the server username and password as detailed here):

python setup.py sdist upload -r cloud_aws

Packages in the repository can instead be installed with one line using pip (if you do not use the load balancer, replace the domain name with the public IP address of the EC2 instance):

pip install --index-url https://username:password@your-domain-name:8080/simple/ PACKAGE [PACKAGE2...]

And with this, we have all the ingredients for provisioning and using a private PyPi repository on AWS. The complete code for this post can be found on GitHub, and it is also packaged as a Terraform module to greatly simplify its deployment. If you have any comments or suggestions to improve this post, feel free to contact me on LinkedIn.
