Django abstracts file storage using storage backends, from simple filesystem storage to things like S3. This can be used for processing file uploads, storing static assets, and more. This is just a brief look at some things you can do which are kind of fun.
Using Amazon S3
django-storages is “a collection of custom storage backends” for Django, including support for Amazon S3. You want to use the boto-based one, because it has lots of useful features. You can use it pretty quickly, without customisation, just by adding a few variables to your settings.py; I tend to put AWS access keys in environment variables rather than have different settings.py files for different uses, because it plays better with Heroku.
import os  # for the os.environ lookups below

AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
AWS_QUERYSTRING_AUTH = False
AWS_HEADERS = {
    'Cache-Control': 'max-age=86400',
}
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
# these next two aren't used, but staticfiles will complain without them
STATIC_URL = "https://%s.s3.amazonaws.com/" % os.environ['AWS_STORAGE_BUCKET_NAME']
STATIC_ROOT = ''
DEFAULT_FILE_STORAGE is used when you want to store file-like things attached to your models, using field types like FileField and ImageField; STATICFILES_STORAGE is where the static files pulled together from apps and your project by the collectstatic command end up.
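For instance, once DEFAULT_FILE_STORAGE is set as above, a file field picks up the S3 backend automatically; here's a minimal sketch (the Avatar model and its upload_to path are illustrative, not from the configuration above):

```python
from django.db import models


class Avatar(models.Model):
    # With DEFAULT_FILE_STORAGE pointing at S3BotoStorage, uploads to
    # this field go straight to the S3 bucket, under avatars/.
    image = models.ImageField(upload_to='avatars')

# avatar.image.url then resolves through the storage backend, giving an
# S3 URL rather than a local MEDIA_URL path.
```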
Okay, great. But say we want to do more?
Put static files in a slightly different place
If you subclass the S3BotoStorage class, you can override some of its configuration. There are lots of these options, but location is an interesting one because it acts as a prefix for the keys stored in S3.
import storages.backends.s3boto


class PrefixedStorage(storages.backends.s3boto.S3BotoStorage):
    def __init__(self, *args, **kwargs):
        from django.conf import settings
        kwargs['location'] = settings.ASSETS_PREFIX
        super(PrefixedStorage, self).__init__(*args, **kwargs)
So if we plonk a suitable bit of configuration into our settings.py:
ASSETS_PREFIX = 'assets'
STATICFILES_STORAGE = 'prefixed_storage.PrefixedStorage'
then our assets will be separated from our uploaded media. (You could also put them in a different bucket, using the bucket argument, for which you might also want to set access_key and secret_key differently to the default configuration we put in settings.py earlier.)
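A sketch of that separate-bucket variant might look like this; the ASSETS_BUCKET_NAME and ASSETS_* key settings are assumptions, not part of the configuration shown earlier:

```python
import storages.backends.s3boto


class AssetsBucketStorage(storages.backends.s3boto.S3BotoStorage):
    def __init__(self, *args, **kwargs):
        from django.conf import settings
        # Hypothetical settings naming a dedicated assets bucket and a
        # separate key pair with access to it.
        kwargs['bucket'] = settings.ASSETS_BUCKET_NAME
        kwargs['access_key'] = settings.ASSETS_ACCESS_KEY_ID
        kwargs['secret_key'] = settings.ASSETS_SECRET_ACCESS_KEY
        super(AssetsBucketStorage, self).__init__(*args, **kwargs)
```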
Protect some file storage
Most of your media uploads – user avatars, for instance – you want to be public. But if you have some media that requires authentication before you can access it – say PDF resumes which are only accessible to members – then you don’t want S3BotoStorage’s default S3 ACL of public-read. Here we don’t have to subclass, because we can pass in an instance rather than refer to a class.
from django.db import models
import storages.backends.s3boto

protected_storage = storages.backends.s3boto.S3BotoStorage(
    acl='private',
    querystring_auth=True,
    querystring_expire=600,  # 10 minutes, try to ensure people won't/can't share
)


class Profile(models.Model):
    resume = models.FileField(
        null=True,
        blank=True,
        help_text='PDF resume accessible only to members',
        storage=protected_storage,
    )
There is no permanent publicly-accessible URL for the uploaded resumes, but it’s easy to write a view that will redirect to a temporary URL. Because we set up S3BotoStorage to use query string-based authentication, when asked for the field’s URL it will contact S3 and ask for a temporary one to be created. The configuration above gives us 600 seconds, or 10 minutes, before that URL becomes invalid and can no longer be used.
from django.views.generic import DetailView
from django.http import HttpResponseForbidden, HttpResponseNotFound, HttpResponseRedirect


class ResumeView(DetailView):
    model = Profile

    def get(self, request, *args, **kwargs):
        obj = self.get_object()
        if not request.user.is_authenticated():
            return HttpResponseForbidden()
        if not obj.resume:
            return HttpResponseNotFound()
        return HttpResponseRedirect(obj.resume.url)
Or you could just put it in a template, only for members:
{% if user.is_authenticated %}
<a href='{{ profile.resume.url }}'>Grab my resume</a>
{% endif %}
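Wiring the view up is the usual URLconf business; something like the following, where the pattern, app path and URL name are all illustrative, and I'm assuming the Django 1.x-era url()/patterns() style in use elsewhere here:

```python
# urls.py -- hypothetical wiring for ResumeView
from django.conf.urls import patterns, url

from profiles.views import ResumeView  # hypothetical app path

urlpatterns = patterns('',
    # DetailView looks up the Profile by the pk captured here
    url(r'^profile/(?P<pk>\d+)/resume/$',
        ResumeView.as_view(),
        name='profile-resume'),
)
```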
Making a staging version of your live database
This is something I needed to do recently for NSFWCORP: come up with an easy way of taking a live database dump and making a staging instance out of it. This is all run on Heroku, so moving the database dumps around is easy, and writing something to throw away all non-staff users, old conversation threads and so on is also simple. But I also needed to duplicate the media files from the live bucket to the staging bucket. My solution is as follows:
import os
import os.path
import shutil
import sys

from django.conf import settings
from django.core.management.base import BaseCommand
from django.db.models import get_models, FileField

from storages.backends.s3boto import S3BotoStorage


class Command(BaseCommand):
    output_transaction = True

    def handle(self, *args, **options):
        # we want a django-storages s3boto backend for live, using
        # a dedicated read-only key pair
        storage = S3BotoStorage(
            bucket='nsfw-live',
            access_key=settings.LIVE_READ_ONLY_ACCESS_KEY_ID,
            secret_key=settings.LIVE_READ_ONLY_SECRET_KEY,
        )
        # now just go through all the models looking for stuff to do
        for model in get_models():
            fields = filter(lambda x: isinstance(x, FileField), model._meta.fields)
            if len(fields) > 0:
                sys.stdout.write(u"Copying media for %s..." % model._meta.object_name)
                sys.stdout.flush()
                for obj in model.objects.all():
                    for field in fields:
                        _if = None
                        _of = None
                        _file = getattr(obj, field.name)
                        if not _file.name:
                            continue
                        try:
                            _if = storage.open(_file.name, 'rb')
                            if not settings.AWS_AVAILABLE:
                                # local storage: make sure the destination
                                # directory and file exist before opening
                                full_path = _file.path
                                directory = os.path.dirname(full_path)
                                if not os.path.exists(directory):
                                    os.makedirs(directory)
                                if not os.path.exists(full_path):
                                    with open(full_path, 'wb'):
                                        pass
                            _of = _file.storage.open(_file.name, 'wb')
                            shutil.copyfileobj(_if, _of)
                        except Exception as e:
                            sys.stdout.write(u"\n  failed %s(pk=%i).%s = %s: " % (
                                model._meta.object_name,
                                obj.pk,
                                field.name,
                                _file.name,
                            ))
                            sys.stdout.write(unicode(e))
                        finally:
                            if _if is not None:
                                _if.close()
                            if _of is not None:
                                _of.close()
                sys.stdout.write("done.\n")
Note that there are three new settings.py variables: LIVE_READ_ONLY_ACCESS_KEY_ID and LIVE_READ_ONLY_SECRET_KEY should be fairly obvious, and AWS_AVAILABLE just tells me whether AWS support is configured in the environment, which I use to ensure the destination path and file exist in advance for local storage. I could avoid that by doing something like _file.save(_file.name, _if), although I’m not entirely sure that will preserve file paths and names. It’s cleaner though, and is probably a better solution.
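That alternative would replace the manual open/copy/close dance in the inner loop with something like the following sketch, assuming the same _file and storage names as above and that FieldFile.save() is happy being handed the boto-backed file object:

```python
# Sketch: let the field's own storage do the writing. FieldFile.save()
# writes the content through the configured storage backend.
_if = storage.open(_file.name, 'rb')
try:
    # save=False avoids re-saving the model row for every file copied
    _file.save(_file.name, _if, save=False)
finally:
    _if.close()
```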
Summing up
The Django storage API and its pluggable backends give you a lot of flexibility in how you manage both static assets and file-like things. As well as django-storages, there are plenty of other options for when the built-in filesystem storage isn’t suitable for you.