This article will walk through how we correctly persist static & media files for a Django application hosted on Heroku. As a bonus, it will also explain how we can satisfy the additional constraint of specifying private versus public media files based on model definitions.
Before I begin, this post builds on a TestDriven.io article that was written a while back. I return to it often when setting up my projects and have built some extra functionality on top of it over the years. After helping an individual on StackOverflow, I decided to write a more focused post that covers Heroku & Bucketeer along with these extra features.
So without further ado, let's first dive into what static & media files are and how Heroku dynos manage their filesystem.
What are Media & Static Files
If you are working with a Django project, then you inevitably have all of your Python application code written around a bunch of .py files. These are the code paths of your application, and the end-user - hopefully - never actually sees these files or their contents.
Outside of these business-logic files, it is common to serve files to users directly from your server's file system. For these static files, Django doesn't need to run any code; the framework looks up the file and returns its contents for the requesting user to view.
Some examples of static files include:
Non-templated HTML
CSS & JavaScript files to make your page look nice
User profile pictures
Generated PDFs
Media files in Django are a particular variant of static files. Media files are read from the server's file system as well. Unlike static files, though, they are usually generated files uploaded by users or generated by your application and are associated with a model's FileField or ImageField. In the examples above, user profile pictures and generated PDFs are typical examples of media files.
Django with Media & Static Files
When a new media file is uploaded to a Django web application, the framework looks at the DEFAULT_FILE_STORAGE setting to determine how to store that file. By default, it uses the django.core.files.storage.FileSystemStorage class, which is what most projects start with. This implementation reads the MEDIA_ROOT setting defined in the settings.py file and copies the uploaded file contents to a deterministically created file path under that MEDIA_ROOT.
For example, if the MEDIA_ROOT is set as /var/www/media, all uploaded files will be copied and written to a location under /var/www/media/.
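As a rough illustration of that path resolution (a simplified sketch, not Django's actual implementation), the final location is the MEDIA_ROOT joined with the field's upload_to prefix and the file name. The avatars prefix and jane.png file name are hypothetical.

```python
from pathlib import PurePosixPath

MEDIA_ROOT = "/var/www/media"


def local_media_path(upload_to: str, filename: str) -> str:
    """Join MEDIA_ROOT, the field's upload_to prefix, and the file name."""
    return str(PurePosixPath(MEDIA_ROOT) / upload_to / filename)


# A profile picture uploaded through a field with upload_to="avatars"
print(local_media_path("avatars", "jane.png"))
```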
Heroku with Media & Static Files
Storing these static files on your server's disk file system is okay until you start to work with a containerization platform such as Heroku. To explain why this is the case, it helps to take a step back.
When downloading files on your personal computer, it's okay that these get written to the file system, usually under ~/Downloads or somewhere similar. This works because you expect your computer's file system to persist across restarts and shutdowns; if you download a file and restart your computer, that downloaded file should still be there once the laptop has finished restarting.
Heroku uses containerization to execute customer workloads. One fact of this environment is that the associated file systems do not persist across restarts and reschedules. Heroku dynos are ephemeral: they can be destroyed, restarted, and moved without any warning, and each time the associated filesystem is replaced. This means that any uploaded files referenced by FileFields and ImageFields are deleted without a trace every time the dyno is restarted, moved, or scaled.
Complete Example Codebase
I will be stepping through the process of configuring the Django application for Heroku & S3-compatible storage, but feel free to reference the repository below for the complete code to browse through.
This tutorial aims to help you retrofit an existing Django project with S3-compatible storage, but I'll quickly go through the steps I used to set up the example Django application. It may help those new to Django & Heroku or those who encounter bugs following the rest of the setup process.
You can view the tagged project before the storage change at commit 299bbe2.
All of the Django code is under the example package, and the manage.py file is in the root. I've always found this structure cleaner than the Django apps defined in the project root.
Configured the project for Heroku
The django-heroku package to automatically configure ALLOWED_HOSTS, DATABASE_URL, and more. This considerably reduces the headache of deploying Django on Heroku
An app.json is defined with some fundamental configuration values and resources defined for the project to work
A release process definition in the Procfile and an associated scripts/release.sh script that runs staticfile collection and database migrations
Introducing Heroku's Bucketeer Add-On
Before we can start managing static and media files, the Django application needs a persistent place to store the files. Again, we can look to Heroku's extensive list of add-ons for S3-compatible storage. Our choice will be one called Bucketeer.
Heroku's Bucketeer add-on provides an AWS S3 storage bucket to upload and download files for our application. The Django application will use this configured bucket to store files uploaded by the server and download them from S3 when a user requests the files.
If you'd like to learn more about AWS S3, the widely-popular data storage solution that Bucketeer is built upon, you can read the S3 user documentation.
It is worth mentioning that the base plan for Bucketeer - Hobbyist - is $5 per month. If you plan on spinning up the one-click example posted above, it should only cost a few cents if you proactively destroy the application when you are done using it.
Including the Bucketeer Add-On
To include the Bucketeer add-on in our application, we can configure it through the Heroku CLI, web dashboard, or via the project's app.json file. We will use the third method of including the add-on in an app.json file.
If the project does not have one already, we can create the basic structure listed below, with the critical part being the addition of the "addons" configuration. This array defines the "bucketeer:hobbyist" resource that our application will use, and Heroku will install the add-on into our application if it does not already exist. We also include the "as" keyword, which will prefix the associated configuration variables with the term BUCKETEER. This prefix keeps the generated configuration variable names deterministic because, by default, Heroku generates the prefix as a random color.
With the required resources being defined, we can start integrating with our storage add-on.
Implementing Our Storage Solution
The django-storages package is a collection of custom, reusable storage backends for Django. It aids immensely in saving static and media files to different cloud & storage providers. One of the supported storage providers is S3, which our Bucketeer add-on is built on. We will leverage the S3 django-storages backend to handle different file types.
Installing django-storages
Begin by installing the django-storages package and the related boto3 package used to interface with AWS's S3. We will also lock our dependencies to ensure poetry and our Heroku deployment continue to work as expected.
poetry add django-storages boto3 && poetry lock
Then, just like most Django-related packages, django-storages will need to be added to the project's INSTALLED_APPS (under the app name storages) in the project's settings.py file. This will allow Django to load the appropriate code as the application starts up.
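A minimal sketch of that change is shown here. The surrounding entries are placeholders for whatever the project already lists; the only addition is the storages entry.

```python
# settings.py (excerpt)
INSTALLED_APPS = [
    "django.contrib.staticfiles",
    # ...the rest of the project's apps...
    "storages",  # app name provided by the django-storages package
]
```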
Implementing Static, Public & Private Storage Backends
We will return to the settings.py file later to configure the usage of django-storages, but before that can be done, we will implement three custom storage backends:
A storage backend for static files (CSS, JavaScript, and publicly accessible images) that are stored in version control (i.e. git) and shipped with the application
A public storage backend for dynamic media files that are not stored in version control, such as uploaded files and attachments
A private storage backend for dynamic media files that are not stored in version control and that require extra access to be viewed, such as per-user reports and potentially profile images. Files managed by this backend require an access key, and access is blocked for anyone without a valid key.
We can extend django-storages's S3Boto3Storage storage backend to create these. The following code can be copied and pasted directly into your project. The settings attributes read in the module will be written shortly, so do not expect this code to work if you import it right now.
# FILE: example/utils/storage_backends.py
from django.conf import settings
from storages.backends.s3boto3 import S3Boto3Storage


class StaticStorage(S3Boto3Storage):
    """Used to manage static files for the web server"""

    location = settings.STATIC_LOCATION
    default_acl = settings.STATIC_DEFAULT_ACL


class PublicMediaStorage(S3Boto3Storage):
    """Used to store & serve dynamic media files with no access expiration"""

    location = settings.PUBLIC_MEDIA_LOCATION
    default_acl = settings.PUBLIC_MEDIA_DEFAULT_ACL
    file_overwrite = False


class PrivateMediaStorage(S3Boto3Storage):
    """
    Used to store & serve dynamic media files using access keys
    and short-lived expirations to ensure more privacy control
    """

    location = settings.PRIVATE_MEDIA_LOCATION
    default_acl = settings.PRIVATE_MEDIA_DEFAULT_ACL
    file_overwrite = False
    custom_domain = False
The attributes listed in each storage backend class perform the following:
location: This dictates the parent directory used in the S3 bucket for associated files. This is concatenated with the generated path provided by a FileField's or ImageField's upload_to option.
default_acl: This dictates the access policy required for reading the files, through values of None, public-read, and private. django-storages and the S3Boto3Storage parent class will translate these into object policies.
file_overwrite: In most cases, it's better not to overwrite existing files if we update a specific path. With this set to False, a unique suffix will be appended to the path to prevent naming collisions.
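To make location and file_overwrite more concrete, here is a simplified sketch of how an object key could be built and how a collision could be avoided. It is an illustration only, not the django-storages code; the object_key helper and the existing set (a stand-in for asking the bucket whether a key exists) are hypothetical.

```python
import posixpath
import secrets
import string


def object_key(location: str, name: str, existing: set, overwrite: bool = False) -> str:
    """Build a bucket key under `location`, avoiding collisions unless overwrite is True."""
    key = posixpath.join(location, name)
    if overwrite:
        return key  # the existing object would simply be replaced
    root, ext = posixpath.splitext(key)
    alphabet = string.ascii_letters + string.digits
    while key in existing:
        # Append a random suffix before the extension until the key is unused
        suffix = "".join(secrets.choice(alphabet) for _ in range(7))
        key = f"{root}_{suffix}{ext}"
    return key


taken = {"media/public/logos/acme.png"}
print(object_key("media/public", "logos/acme.png", set()))
print(object_key("media/public", "logos/acme.png", taken))
```

The first call returns the plain key because nothing collides; the second returns a key with a random suffix, so its exact value differs on every run.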
With our storage backends defined, we can configure them to be used in different situations via the settings.py file. However, it is challenging to use S3 and these cloud storage backends while in development, and I've always been a proponent of keeping all resources and files "local" to the development machine. We will therefore create a logic path that will:
Use the local filesystem to store static and media files for convenience. The Django server will be responsible for serving these files directly.
Use the custom S3 storage backends when an environment variable is enabled. We will use the S3_ENABLED variable to control this, enabling it in our Heroku configuration variables.
First, we will assume that you have a relatively vanilla settings.py file concerning the static- & media-related variables. For reference, a new project should have a block that looks similar to the following:
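A hedged sketch of such a block follows. A freshly generated project typically defines only STATIC_URL; the STATIC_ROOT value and the BASE_DIR stand-in are assumptions added so that the excerpt runs on its own.

```python
from pathlib import Path

# Stand-in so the excerpt is self-contained; a generated settings.py
# usually derives BASE_DIR from __file__ instead.
BASE_DIR = Path("/app")

# URL prefix that static files are served under
STATIC_URL = "/static/"

# Assumed directory that `collectstatic` gathers files into
STATIC_ROOT = BASE_DIR / "staticfiles"
```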
We will design a slightly advanced control flow that will seamlessly handle the two cases defined above. In addition, it will provide enough control to override each part of the configuration as needed.
Since there are already default values for the static file usage, we can add default values for media file usage. These will be used when serving files locally from the server while in development mode.
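The media defaults could look like the sketch below. The mediafiles directory name and the BASE_DIR stand-in are assumptions; any writable directory outside version control would work for local development.

```python
from pathlib import Path

# Stand-in so the excerpt is self-contained; a generated settings.py
# usually derives BASE_DIR from __file__ instead.
BASE_DIR = Path("/app")

# URL prefix that uploaded files are served under in development
MEDIA_URL = "/media/"

# Assumed local directory that FileSystemStorage writes uploads into
MEDIA_ROOT = BASE_DIR / "mediafiles"
```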
To begin the process of including S3, let's create the controls that decide whether we serve static & media files from the local server or through the S3 storage backend. We will create three variables:
S3_ENABLED: controls whether media & static files should use S3 storage by default
LOCAL_SERVE_MEDIA_FILES: controls whether media files are served locally instead of from S3 storage. Defaults to the negated S3_ENABLED value
LOCAL_SERVE_STATIC_FILES: controls whether static files are served locally instead of from S3 storage. Defaults to the negated S3_ENABLED value
from decouple import config  # import explained below

# ...STATIC and MEDIA settings here...

# The following configs determine if files get served from the server or an S3 storage
S3_ENABLED = config('S3_ENABLED', cast=bool, default=False)
LOCAL_SERVE_MEDIA_FILES = config('LOCAL_SERVE_MEDIA_FILES', cast=bool, default=not S3_ENABLED)
LOCAL_SERVE_STATIC_FILES = config('LOCAL_SERVE_STATIC_FILES', cast=bool, default=not S3_ENABLED)

if (not LOCAL_SERVE_MEDIA_FILES or not LOCAL_SERVE_STATIC_FILES) and not S3_ENABLED:
    raise ValueError('S3_ENABLED must be true if either media or static files are not served locally')
In the example above, we are using the python-decouple package to make it easier to read and cast environment variables to Python variables. I highly recommend this package when working with settings.py configurations. We also include a value check to ensure consistency across these three variables. If all three variables are defined in the environment but conflict with one another, the program will throw an error.
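To see how the three flags interact, the same logic can be expressed as a small standalone function. This is only a sketch for experimentation: resolve_flags and _as_bool are hypothetical helpers, and the boolean parsing is a simplified stand-in for python-decouple's cast=bool.

```python
from typing import Mapping, Tuple

_TRUE_VALUES = {"1", "true", "yes", "on"}


def _as_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a flag from env, falling back to default when it is unset."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE_VALUES


def resolve_flags(env: Mapping[str, str]) -> Tuple[bool, bool, bool]:
    """Return (S3_ENABLED, LOCAL_SERVE_MEDIA_FILES, LOCAL_SERVE_STATIC_FILES)."""
    s3_enabled = _as_bool(env, "S3_ENABLED", False)
    local_media = _as_bool(env, "LOCAL_SERVE_MEDIA_FILES", not s3_enabled)
    local_static = _as_bool(env, "LOCAL_SERVE_STATIC_FILES", not s3_enabled)
    if (not local_media or not local_static) and not s3_enabled:
        raise ValueError(
            "S3_ENABLED must be true if either media or static files are not served locally"
        )
    return s3_enabled, local_media, local_static


print(resolve_flags({}))                      # local development: everything served locally
print(resolve_flags({"S3_ENABLED": "True"}))  # deployed: everything served from S3
```

An environment that disables local serving without enabling S3, such as {"LOCAL_SERVE_MEDIA_FILES": "False"}, raises the ValueError.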
We can now start configuring the different configuration variables required by our file storage backends based on those control variables' value(s). We begin by including some S3 configurations required whether we are serving static, media, or both types of files.
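A sketch of that shared configuration is shown here as a function so that it can run outside a Django project. The setting names on the left are django-storages settings. The BUCKETEER_* variable names are my assumption of what the add-on provides, so confirm them against your app's config vars. The None ACL default and the s3v4 fallback are also assumptions.

```python
from typing import Mapping


def s3_settings(env: Mapping[str, str]) -> dict:
    """Map Bucketeer-provided variables onto django-storages setting names."""
    return {
        "AWS_ACCESS_KEY_ID": env["BUCKETEER_AWS_ACCESS_KEY_ID"],
        "AWS_SECRET_ACCESS_KEY": env["BUCKETEER_AWS_SECRET_ACCESS_KEY"],
        "AWS_STORAGE_BUCKET_NAME": env["BUCKETEER_BUCKET_NAME"],
        "AWS_S3_REGION_NAME": env["BUCKETEER_AWS_REGION"],
        # Leave the ACL decision to each storage backend's default_acl
        "AWS_DEFAULT_ACL": None,
        # Optional override; assumed fallback when the variable is unset
        "AWS_S3_SIGNATURE_VERSION": env.get("S3_SIGNATURE_VERSION", "s3v4"),
    }


# Placeholder values for illustration only
example_env = {
    "BUCKETEER_AWS_ACCESS_KEY_ID": "placeholder-key-id",
    "BUCKETEER_AWS_SECRET_ACCESS_KEY": "placeholder-secret",
    "BUCKETEER_BUCKET_NAME": "example-bucket",
    "BUCKETEER_AWS_REGION": "us-east-1",
}
print(s3_settings(example_env)["AWS_STORAGE_BUCKET_NAME"])
```

In settings.py these would be plain module-level assignments, guarded so they are only read when S3_ENABLED is true.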
The above defines some of the variables required by the django-storages S3 backend and sets the values to environment configurations that are provided by the Bucketeer add-on. As previously mentioned, all of the add-on environment variables are prefixed with BUCKETEER_. The S3_SIGNATURE_VERSION environment variable is not required and most likely does not need to be included.
With the S3 configuration together, we can reference the LOCAL_SERVE_MEDIA_FILES and LOCAL_SERVE_STATIC_FILES control variables to override the default static and media file settings if they are desired to be served via S3.
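For static files, the override could look like the following sketch. The first three assignments are stand-ins so the excerpt runs on its own; the bucket name, the URL shape, and the public-read ACL are assumptions rather than values taken from the example repository.

```python
# Stand-ins for values that settings.py would already define
LOCAL_SERVE_STATIC_FILES = False
AWS_STORAGE_BUCKET_NAME = "example-bucket"  # hypothetical bucket name
STATIC_URL = "/static/"  # local default, overridden below when S3 is used

if not LOCAL_SERVE_STATIC_FILES:
    STATIC_DEFAULT_ACL = "public-read"
    STATIC_LOCATION = "static"
    STATIC_URL = f"https://{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com/{STATIC_LOCATION}/"
    STATICFILES_STORAGE = "example.utils.storage_backends.StaticStorage"
```

On Django 4.2 and later, the STORAGES setting supersedes STATICFILES_STORAGE and DEFAULT_FILE_STORAGE, so newer projects would express the same choice there.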
Notice the last line where STATICFILES_STORAGE is set to the custom backend we created. That ensures it follows the location & ACL (Access Control List) policies that we configured initially. With this configuration, all static files will be placed under /static/ in the bucket, but feel free to update STATIC_LOCATION if desired.
We can configure a very similar situation for media files.
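A comparable sketch for media files is given here. As before, the stand-in values, the media/public and media/private locations, and the PRIVATE_MEDIA_STORAGE name are assumptions for illustration; the setting names read by the storage backends come from the module defined earlier.

```python
# Stand-ins for values that settings.py would already define
LOCAL_SERVE_MEDIA_FILES = False
AWS_STORAGE_BUCKET_NAME = "example-bucket"  # hypothetical bucket name
MEDIA_URL = "/media/"  # local default, overridden below when S3 is used

if not LOCAL_SERVE_MEDIA_FILES:
    # Publicly readable uploads; this backend becomes the project default
    PUBLIC_MEDIA_DEFAULT_ACL = "public-read"
    PUBLIC_MEDIA_LOCATION = "media/public"
    MEDIA_URL = f"https://{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com/{PUBLIC_MEDIA_LOCATION}/"
    DEFAULT_FILE_STORAGE = "example.utils.storage_backends.PublicMediaStorage"

    # Private uploads; models opt in by passing the backend to a field
    PRIVATE_MEDIA_DEFAULT_ACL = "private"
    PRIVATE_MEDIA_LOCATION = "media/private"
    PRIVATE_MEDIA_STORAGE = "example.utils.storage_backends.PrivateMediaStorage"  # hypothetical name
```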
The big difference here is that we have configured two different storage backends for media files: one for publicly accessible objects and one for objects that require an access token. When the file is requested, this token will be generated internally by django-storages, so you do not have to worry about anonymous public access.
Local Development Serving
Since we will have S3_ENABLED set to False in our local development environment, it will serve static and media files locally through the Django server instead of from S3. We will need to configure the URL routing to handle this scenario. We can configure our urls.py file to serve the appropriate files like so:
This will locally serve the static or media files based on the values of the LOCAL_SERVE_STATIC_FILES and LOCAL_SERVE_MEDIA_FILES settings variables we defined.
Enabling S3 Storage
We can enable these storages and our add-on in the app.json file to start using these storage backends. This will effectively disable LOCAL_SERVE_STATIC_FILES and LOCAL_SERVE_MEDIA_FILES to start serving both via S3 when deployed to Heroku.
{
  // ...rest of configs...
  "env": {
    // ...rest of envs...
    "S3_ENABLED": {
      "description": "Enable to upload & serve static and media files from S3",
      "value": "True"
    }
  }
}
Using the Private Storage
By default, Django will use the PublicMediaStorage class for uploading media files, meaning the contents will be publicly accessible to anyone with the link. However, a model can utilize the PrivateMediaStorage backend when desired, which will create short-lived access tokens that prevent the public from viewing the associated object.
Below is an example of using public and private media files on the same model.
from django.db import models

from example.utils.storage_backends import PrivateMediaStorage


class Organization(models.Model):
    """A sample Organization model with public and private file field usage"""

    # Uses the default (public) media storage
    logo = models.ImageField(help_text='A publicly accessible company logo')

    expense_report = models.FileField(
        help_text='The private expense report requires a short-lived access token',
        storage=PrivateMediaStorage(),  # will create private files
    )
You can see the code for this complete example at commit 265becc. This configuration will allow your Django project to scale efficiently on Heroku with Bucketeer.
In a future post, we will discuss how to upload and set these files using vanilla Django & Django REST Framework.
As always, if you find any bugs, issues, or unclear explanations, please reach out to me so I can improve the tutorial & experience for future readers.