Fun with Django storage backends

Published Thursday 18th July, 2013

Django abstracts file storage using storage backends, from simple filesystem storage to things like S3. This can be used for processing file uploads, storing static assets, and more. This is just a brief look at some things you can do which are kind of fun.
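
The storage API itself is small. As a rough sketch (the file name here is just for illustration), saving and reading a file through whatever backend your DEFAULT_FILE_STORAGE setting points at looks like this:

from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

# goes through whichever backend DEFAULT_FILE_STORAGE names; the backend
# may hand back a different name if 'greeting.txt' is already taken
name = default_storage.save('greeting.txt', ContentFile('hello'))
url = default_storage.url(name)        # a URL clients can fetch
exists = default_storage.exists(name)  # True once the save has happened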

Using Amazon S3

django-storages is “a collection of custom storage backends”, including support for Amazon S3. You want the boto-based backend, because it has lots of useful features. You can get going pretty quickly without any customisation just by adding a few variables to your settings.py; I tend to put AWS access keys in environment variables rather than having a different settings.py for each deployment, because it plays better with Heroku.

AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
AWS_QUERYSTRING_AUTH = False
AWS_HEADERS = {
  'Cache-Control': 'max-age=86400',
}
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
# these next two aren't used, but staticfiles will complain without them
STATIC_URL = "https://%s.s3.amazonaws.com/" % os.environ['AWS_STORAGE_BUCKET_NAME']
STATIC_ROOT = ''

DEFAULT_FILE_STORAGE is used when you want to store file-like things attached to your models, using field types like FileField and ImageField; STATICFILES_STORAGE is where the static files pulled together from apps and your project by the collectstatic command end up.
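
As a concrete (if made-up) example, with the settings above a plain FileField needs no storage argument at all to end up in S3:

from django.db import models

class Avatar(models.Model):
  # no storage argument, so this uses DEFAULT_FILE_STORAGE,
  # which the settings above point at S3BotoStorage
  image = models.FileField(upload_to='avatars')

and python manage.py collectstatic will push your static files to the same bucket via STATICFILES_STORAGE.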

Okay, great. But say we want to do more?

Put static files in a slightly different place

If you subclass the S3BotoStorage class, you can override some of its configuration. There are lots of options you can override, but location is an interesting one because it acts as a prefix for the keys stored in S3.

import storages.backends.s3boto

class PrefixedStorage(storages.backends.s3boto.S3BotoStorage):
  def __init__(self, *args, **kwargs):
    from django.conf import settings
    kwargs['location'] = settings.ASSETS_PREFIX
    super(PrefixedStorage, self).__init__(*args, **kwargs)

So if we plonk a suitable bit of configuration into our settings.py:

ASSETS_PREFIX = 'assets'
STATICFILES_STORAGE = 'prefixed_storage.PrefixedStorage'

then our assets will be separated from our uploaded media. (You could also put them in a different bucket, using the bucket argument, for which you might also want to set access_key and secret_key differently to the default configuration we put in settings.py earlier.)
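
A sketch of that bucket-per-purpose variant might look like the following (the STATIC_BUCKET_NAME, STATIC_ACCESS_KEY_ID and STATIC_SECRET_KEY settings are names I’ve invented for illustration):

import storages.backends.s3boto

class SeparateBucketStorage(storages.backends.s3boto.S3BotoStorage):
  def __init__(self, *args, **kwargs):
    from django.conf import settings
    kwargs['bucket'] = settings.STATIC_BUCKET_NAME
    kwargs['access_key'] = settings.STATIC_ACCESS_KEY_ID
    kwargs['secret_key'] = settings.STATIC_SECRET_KEY
    super(SeparateBucketStorage, self).__init__(*args, **kwargs)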

Protect some file storage

Most of your media uploads – user avatars, for instance – you want to be public. But if you have some media that requires authentication before you can access it – say PDF resumes which are only accessible to members – then you don’t want S3BotoStorage’s default S3 ACL of public-read. Here we don’t have to subclass, because we can pass in an instance rather than refer to a class.

from django.db import models
import storages.backends.s3boto

protected_storage = storages.backends.s3boto.S3BotoStorage(
  acl='private',
  querystring_auth=True,
  querystring_expire=600, # 10 minutes, try to ensure people won't/can't share
)

class Profile(models.Model):
  resume = models.FileField(
    null=True,
    blank=True,
    help_text='PDF resume accessible only to members',
    storage=protected_storage,
  )

There is no permanent publicly-accessible URL for the uploaded resumes, but it’s easy to write a view that redirects to a temporary one. Because we set up S3BotoStorage to use query string-based authentication, asking for the field’s URL generates a signed, temporary S3 URL. The configuration above gives us 600 seconds, or 10 minutes, before that URL expires and can no longer be used.

from django.views.generic import DetailView
from django.http import HttpResponseForbidden, HttpResponseNotFound, HttpResponseRedirect

class ResumeView(DetailView):
  model = Profile

  def get(self, request, *args, **kwargs):
    obj = self.get_object()
    if not request.user.is_authenticated():
      return HttpResponseForbidden()
    if not obj.resume:
      return HttpResponseNotFound()
    return HttpResponseRedirect(obj.resume.url)
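
Wiring that view up is the usual urls.py entry (the URL pattern and import path here are made up):

from django.conf.urls import patterns, url

from profiles.views import ResumeView

urlpatterns = patterns('',
  url(r'^profile/(?P<pk>\d+)/resume/$', ResumeView.as_view(), name='profile-resume'),
)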

Or you could just put it in a template, only for members:

{% if user.is_authenticated %}
  <a href='{{ profile.resume.url }}'>Grab my resume</a>
{% endif %}

Making a staging version of your live database

This is something I needed to do recently for NSFWCORP: come up with an easy way of taking a live database dump and making a staging instance out of it. This is all run on Heroku, so moving the database dumps around is easy, and writing something to throw away all non-staff users, old conversation threads and so on is also simple. But I also needed to duplicate the media files from the live bucket to the staging bucket. My solution is as follows:

import os
import os.path
import shutil
import sys

from django.conf import settings
from django.core.management.base import BaseCommand
from django.db.models import get_models, FileField
from storages.backends.s3boto import S3BotoStorage


class Command(BaseCommand):
  output_transaction = True

  def handle(self, *args, **options):
    # we want a django-storages s3boto backend for live, using
    # a dedicated read-only key pair
    storage = S3BotoStorage(
      bucket='nsfw-live',
      access_key=settings.LIVE_READ_ONLY_ACCESS_KEY_ID,
      secret_key=settings.LIVE_READ_ONLY_SECRET_KEY,
    )
    # now just go through all the models looking for stuff to do
    for model in get_models():
      fields = filter(lambda x: isinstance(x, FileField), model._meta.fields)
      if len(fields) > 0:
        sys.stdout.write(u"Copying media for %s..." % model._meta.object_name)
        sys.stdout.flush()
        for obj in model.objects.all():
          for field in fields:
            _if = None
            _of = None
            _file = getattr(obj, field.name)
            if not _file.name:
              continue
            try:
              _if = storage.open(_file.name, 'rb')
              if not settings.AWS_AVAILABLE:
                # destination is local filesystem storage, so make sure
                # the directory and an empty file exist before we open
                # the field's file for writing below
                full_path = _file.path
                directory = os.path.dirname(full_path)
                if not os.path.exists(directory):
                  os.makedirs(directory)
                if not os.path.exists(full_path):
                  with open(full_path, 'wb'):
                    pass
              _of = _file.storage.open(_file.name, 'wb')
              shutil.copyfileobj(_if, _of)
            except Exception as e:
              sys.stdout.write(u"\n  failed %s(pk=%i).%s = %s: " % (
                model._meta.object_name,
                obj.pk,
                field.name,
                _file.name
              ))
              sys.stdout.write(unicode(e))
            finally:
              if _if is not None:
                _if.close()
              if _of is not None:
                _of.close()
        sys.stdout.write("done.\n")

Note that there are three new settings.py variables: LIVE_READ_ONLY_ACCESS_KEY_ID and LIVE_READ_ONLY_SECRET_KEY should be fairly obvious, and AWS_AVAILABLE just tells me whether AWS support is configured in the environment, which I use to ensure the destination path and file exist in advance for local storage. I could avoid that by doing something like _file.save(_file.name, _if), although I’m not entirely sure that will preserve file paths and names. It’s cleaner though, and is probably a better solution.
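
If you wanted to try that, a minimal (untested) sketch of the inner loop would be something like this; FieldFile.save() runs the name back through upload_to, which is exactly why I’m not sure it preserves paths:

# replaces the manual open/copy above, for a single field on a single object
_file = getattr(obj, field.name)
if _file.name:
  _if = storage.open(_file.name, 'rb')
  try:
    # save=False avoids re-saving the model row; this assumes the
    # generated name comes out the same as the live one
    _file.save(_file.name, _if, save=False)
  finally:
    _if.close()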

Summing up

The Django storage API and its pluggable backends give you a lot of flexibility in how you manage both static assets and file-like things attached to your models. As well as django-storages there are plenty of other backends available for when the built-in filesystem storage isn’t suitable for you.