Git remotes

Published at: Sunday 5th June, 2016

I’m a mentor for Xapian within Google Summer of Code again this year, and since we’re a git-backed project that means introducing people to the concepts of git, something that is difficult partly because git is complex, partly because a lot of the documentation out there is dense and can be confusing, and partly because people insist on using git in ways I consider unwise and then telling other people to do the same.

Two of the key concepts in git are branches and remotes. There are many descriptions of them online; this is mine.

remotes are repositories other than the one you’re using, and to which you have access (via a URL and probably some authentication)
remotes have names; you probably have one called “origin”, which will be wherever you cloned your repository from – throughout this article I assume that collaborators all push their changes back to this remote, and fetch others’ changes from there
branches are, somewhat confusingly, pointers to particular commits in your repository; they are either local branches or remote (tracking) branches
local branches are ones that you can work on
remote tracking branches are branches in your repository which mirror local branches in remotes you’re working with
remote tracking branches don’t automatically update when someone changes the remote; you have to tell them to update, and you do so using git fetch <remote>, which pulls down the commits you don’t have, and adjusts the remote tracking branches to point to the right place
at that point your repository has commits from other people, but they aren’t yet incorporated into the code you see on your local branch
you can see which commits are behind which branches using git show-branch
you can then incorporate commits from remote tracking branches into your local branch using a range of options; here I’ll talk about git merge and git rebase, because they both play well in collaborative environments
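To make those ideas concrete, here’s a minimal sketch that builds a throwaway repository, clones it, and then lists the remote and the two kinds of branch. The directory names upstream and mine and the example identity are made up for illustration; nothing here touches a real repository.

```shell
#!/bin/sh
set -eu
# Work in a throwaway directory (hypothetical names throughout).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# A stand-in for the shared repository everyone pushes to and fetches from.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git commit -q --allow-empty -m "1 message"
cd ..

# Cloning sets up a remote called "origin" pointing at where we cloned from.
git clone -q upstream mine
cd mine

remotes=$(git remote)          # just the names of the remotes
git remote -v                  # names plus URLs
git --no-pager branch -vv      # local branches, and what each one tracks
git --no-pager branch -r       # remote tracking branches, such as origin/master
```

Straight after a clone, the local branch master and the remote tracking branch origin/master point at the same commit; they only drift apart once you or someone else adds commits.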

For the rest of this article I’m only going to consider the common case of multiple people collaborating to make a single stream of releases (whether that’s open source software tagged and packaged, or perhaps commercial software that’s deployed to a company’s infrastructure, like a webapp). I also won’t consider what happens when merges or rebases fail and need manual assistance, as that’s a more complex topic.

Getting work from others

One of the key things you need to be able to do in collaborative development is to bring in changes that other people have made while you were working on your own changes. In git terms, this means that there’s a remote that contains some commits that you don’t have yet, and a local branch (in the repository you’re working with) that probably contains commits that the remote doesn’t have yet either.

First you need to get those commits into your repository:

$ git fetch origin
remote: Counting objects: 48, done.
remote: Total 48 (delta 30), reused 30 (delta 30), pack-reused 18
Unpacking objects: 100% (48/48), done.
From git://github.com/xapian/xapian
   9d2c1f7..91aac9f  master     -> origin/master

The details don’t matter much; the thing to note is that if there are no new commits for you from the remote, there won’t be any output at all.

Note that some git guides suggest using git pull here. When working with a lot of other people, that is risky, because it doesn’t give you a chance to review what they’ve been working on before accepting it into your local branch.
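The difference is easy to check for yourself. This is a small sketch, again using made-up repository names (upstream, mine) and a hypothetical commit helper, which shows that git fetch moves the remote tracking branch while leaving your local branch exactly where it was:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# commit N: add a file and commit it with the message "N message"
commit() {
    echo "$1" > "file$1"
    git add "file$1"
    git commit -q -m "$1 message"
}

git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit 1)
git clone -q upstream mine

# Someone else adds a commit to the shared repository.
(cd upstream && commit 2)

cd mine
before=$(git rev-parse origin/master)   # remote tracking branch before fetching
git fetch -q origin
after=$(git rev-parse origin/master)    # ...and after
local_tip=$(git rev-parse master)       # the local branch has not moved

echo "origin/master: $before -> $after"
echo "master:        $local_tip"
```

At this point you can look at the new commits (for instance with git log master..origin/master) before deciding how to incorporate them.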

Say you have a situation that looks a little like this:

[1] -- [2] -- [3] -- [4] <--- HEAD of master
         \
          \-- [5] -- [6] -- [7] <--- HEAD of origin/master

(The numbers are just so I can talk about individual commits clearly. They actually all have hashes to identify them.)

What the above would mean is that you’ve added two commits on your local branch master, and origin/master (ie the master branch on the origin remote) has three commits that aren’t in your local branch.

You can see what state you’re actually in using git show-branch. The output is vertical instead of horizontal, but contains the same information as above:

$ git show-branch origin/master master
! [origin/master] 7 message
 * [master] 4 message
--
 * [master] 4 message
 * [master^] 3 message
+  [origin/master] 7 message
+  [origin/master^] 6 message
+  [origin/master~2] 5 message
+* [origin/master~3] 2 message

Each column on the left represents one of the branches you give to the command, in order. The top bit, above the line of hyphens, gives a summary of which commit each branch is at, and the bit below shows you the relationship between the commits behind the various branches. The things inside [] tell you how to address the commits if you need to; after them come the commit messages. (The stars * show you which branch you currently have checked out.)

From this it’s fairly easy to see that your local branch master has two commits that aren’t in origin/master, and origin/master has three commits that aren’t in your local branch.
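If you’d like to reproduce that picture, the following sketch builds the same shape of history in a throwaway directory. The repository names and the commit helper are assumptions for illustration; git rev-list --left-right --count is used as a compact cross-check of what show-branch displays.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# commit N: add a file and commit it with the message "N message"
commit() {
    echo "$1" > "file$1"
    git add "file$1"
    git commit -q -m "$1 message"
}

# Shared history: commits 1 and 2.
git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit 1 && commit 2)
git clone -q upstream mine

# Other people add 5, 6 and 7 to the shared repository...
(cd upstream && commit 5 && commit 6 && commit 7)

# ...while you add 3 and 4 locally.
cd mine
commit 3
commit 4
git fetch -q origin

git show-branch origin/master master

# Commits only on master, then commits only on origin/master.
counts=$(git rev-list --left-right --count master...origin/master)
echo "$counts"
```

The final line prints two numbers separated by a tab: how many commits are only on master, and how many are only on origin/master.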

Incorporating work from others

So now you have commits from other people, and additionally you know that your master branch and the remote tracking branch origin/master have diverged from a common past.

There are two ways of incorporating that other work into your branch: merging and rebasing. Which to use depends partly on the conventions of the project you’re working on (some like to have a “linear” history, which means using rebase; some prefer to preserve the branching and merging patterns, which means using merge). We’ll look at merge first, even though a common thing to be asked to do to a pull request on github is to “rebase on top of master” or similar.

Merging to incorporate others’ work

Merging leaves two different chains of commits intact, and creates a merge commit to bind the changes together. If you merge the changes from origin/master in the above example into your local master branch, you’ll end up with something that looks like this:

[1] -- [2] -- [3] -- [4] --------- [8] <--- HEAD of master
         \                         /
          \-- [5] -- [6] -- [7] --/

You do it using git merge:

$ git merge origin/master
Updating 9d2c1f7..91aac9f
Fast-forward
 .travis.yml | 26 ++++++++++++++++++++++++++
 bootstrap   | 10 ++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 .travis.yml

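It will list all the changes in the remote tracking branch which were incorporated into your branch. (The sample output above says “Fast-forward”, which is what git reports when your local branch has no commits of its own and can simply be moved forward; in the diverged example it would instead report that a merge was made, creating commit [8], followed by the same kind of summary.)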
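Here’s a sketch of the diverged case, under the same assumptions as before (throwaway directory, made-up repository names, a hypothetical commit helper). It checks that the merge really does create a commit with two parents:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# commit N: add a file and commit it with the message "N message"
commit() {
    echo "$1" > "file$1"
    git add "file$1"
    git commit -q -m "$1 message"
}

# Build the diverged history from the diagram.
git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit 1 && commit 2)
git clone -q upstream mine
(cd upstream && commit 5 && commit 6 && commit 7)
cd mine
commit 3
commit 4
git fetch -q origin

# Merge the remote tracking branch; --no-edit accepts the default message.
git merge -q --no-edit origin/master

# A merge commit lists its own hash followed by both parents.
parents=$(git rev-list --parents -n 1 HEAD)
total=$(git rev-list --count HEAD)
git --no-pager log --oneline --graph
```

The graph output shows both chains of commits intact, joined by the new merge commit at the top.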

Rebasing to incorporate others’ work

What we’re doing here is to take your changes since your local branch and remote tracking branch diverged and move them onto the current position of the remote tracking branch. For the example above you’d end up with something that looks like this:

[1] -- [2] -- [5] -- [6] -- [7] -- [3'] -- [4'] <--- new HEAD of master

Note that commits [3] and [4] have become [3'] and [4'] – they’re actually recreated (meaning their hash will change), which is important as we’ll see in a minute.

You do this as follows:

$ git rebase origin/master
First, rewinding head to replay your work on top of it...
Applying: 3 message
Applying: 4 message
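The same throwaway setup can be used to watch a rebase happen. In this sketch (repository names and the commit helper are again made up), the history ends up linear and the tip of master gets a new hash, because commits 3 and 4 are recreated:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# commit N: add a file and commit it with the message "N message"
commit() {
    echo "$1" > "file$1"
    git add "file$1"
    git commit -q -m "$1 message"
}

# Build the diverged history from the diagram.
git init -q upstream
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit 1 && commit 2)
git clone -q upstream mine
(cd upstream && commit 5 && commit 6 && commit 7)
cd mine
commit 3
commit 4
git fetch -q origin

old_tip=$(git rev-parse master)
git rebase -q origin/master
new_tip=$(git rev-parse master)

# Oldest first: the commit numbers in the order they now appear.
order=$(git log --reverse --format=%s | cut -d' ' -f1 | paste -s -d ' ' -)
merges=$(git rev-list --merges --count HEAD)
echo "$order"    # prints "1 2 5 6 7 3 4"
```

No merge commit is created, which is why projects that want a linear history ask for a rebase.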

Some caution around rebasing

Rebasing is incredibly powerful, and some people get trigger happy and use it perhaps more often than they should. The problem is that, as noted above, the commits you rebase are recreated; this means that if anyone had your commits already and you rebase those commits, you’ll cause difficulties for those other people. In particular this can happen while using pull requests on github.

A good rule of thumb is:

you can rebase at any time up until the point when you submit code for review (either at the point you open the pull request, or the point where you ask people to look at it)
from then on, you shouldn’t rebase until everyone has finished reviewing the code, you have made changes based on those comments, and they have checked those changes to ensure their concerns have been addressed; if someone suggests a change which you then make, but you rebase in the process, it can be difficult for them to see what’s happened
when making changes based on pull request comments, you can use git commit --fixup <earlier commit> to quickly make a commit with a message that will be easy to flatten into the earlier commit just before merging the pull request
at the end of review, before a pull request is merged, you can do a final rebase (a lot of projects have a process where someone will explicitly prompt that this is the time to do so); that allows you both to ensure you’re properly integrated with the latest upstream code and to collapse “fixup” commits into the right place
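As a sketch of the fixup step, here’s one way it can play out. The file names and commit messages are invented, and the use of git rebase -i --autosquash with GIT_SEQUENCE_EDITOR=true (to accept the proposed plan without opening an editor) is my assumption about how you might do the final flattening; your project may prefer doing it interactively.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

echo "base" > README
git add README
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Two commits submitted for review; the first contains a typo.
echo "helo" > greeting.txt
git add greeting.txt
git commit -q -m "add greeting"
target=$(git rev-parse HEAD)

echo "bye" > farewell.txt
git add farewell.txt
git commit -q -m "add farewell"

# A reviewer spots the typo: fix it in a separate fixup commit, no rebase yet.
echo "hello" > greeting.txt
git add greeting.txt
git commit -q --fixup="$target"
git --no-pager log --format=%s      # newest first, starting "fixup! add greeting"

# At the end of review, fold the fixup into the commit it belongs to.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

subjects=$(git log --reverse --format=%s | paste -s -d '|' -)
fixed=$(git show HEAD~1:greeting.txt)
echo "$subjects"
```

During review the fixup commit stays visible as its own commit, so reviewers can see exactly what changed; only the final rebase rewrites history.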

Rebasing during pull requests is discussed in this Thoughtbot article.

In summary

Most of the time, your cycle of work is going to look like this:

git add -p to add changes to the git stage
git commit -v to create commits out of those changes
git fetch to get others’ recent changes
git show-branch to see what those changes are
git merge or git rebase to incorporate those changes

Following that you can use git push and pull requests, or whatever other approach you need, to start the review process ahead of getting your changes applied.

Styling select elements with padding

Published at: Sunday 13th December, 2015

There are a lot of articles around recommending using -webkit-appearance: none or -webkit-appearance: textfield to enable you to style select boxes on Chrome and Safari (particularly when you need to set padding). You then have to add back in some kind of dropdown icon (an arrow or whatever) as a background image (or using generated content), otherwise there’s no way for a user to know it’s a dropdown at all.

In case you don’t know what I’m talking about, c.bavota’s article on styling a select box covers it pretty well.

However this introduces the dropdown icon for all (recent) browsers, meaning that Firefox and IE will now have two icons indicating that you’ve got a select dropdown. The solution many articles take at this point is:

apply -moz-appearance: none to match things on Firefox
apply some selector hack to undo the dropdown icon on Internet Explorer
ignore any other possible user agents

Basically what this does is to introduce complexity for every browser other than the ones we’re worried about. If we ignored Chrome and Safari, we’d just apply padding to our select, set text properties, colours and border, and move on. It’s because of them that we have to start jumping through hoops; we should really constrain the complexity to just those browsers, giving us a smaller problem for future maintenance (what happens if a future IE isn’t affected by the selector hack? what happens if -moz-appearance stops being supported in a later Firefox, or a Firefox-derived browser?).

Here’s some CSS to style various form controls in the same way. It’s pretty simple (and can probably be improved), but should serve to explain what’s going on.

    input[type=text],
    input[type=email],
    input[type=password],
    select {
        background-color: white;
        color: #333;
        display: block;
        box-sizing: border-box;
        padding: 20px;
        margin-bottom: 10px;
        border-radius: 5px;
        border: none;
        font-size: 20px;
        line-height: 30px;
    }

We want to add a single ruleset that only applies on Webkit-derived browsers, which sets -webkit-appearance and a background image. We can do this as follows:

    :root::-webkit-media-controls-panel,
    select {
        -webkit-appearance: textfield;
        background-image: url(down-arrow.svg);
        background-repeat: no-repeat;
        background-position: right 15px center;
        background-color: white;
    }

A quick note before we look at the selector; we’re using a three-value version of background-position to put the left hand edge of the arrow 15px from the right of the element, vertically centred. In the case I extracted this from, the arrow itself is 10px wide, providing 5px either side of it (within the 20px padding) before we hit either the edge of the select or the worst-case edge of the text. It’s possible to do this using a generated content block, but I found it easier in this case to position the arrow as a background image.

The selector is where we get clever. :root is part of CSS 3, targeting the root element; ::-webkit-media-controls-panel is a vendor-specific pseudo-element selector (written with the CSS 3 double-colon syntax), supported by Chrome and Safari, which presumably supports applying rules to video and audio player controls. They’re not going to match the same element, so we can chain them together to make a selector that will match nothing, but which will be invalid on non-Webkit browsers.

Browsers that don’t support ::-webkit-media-controls-panel will drop the entire ruleset; if you want the details, I’ve tried to explain below. Otherwise, you’re done: that entire ruleset will only apply in the situation we want, and we only have to support Chrome and Safari if they change their mind about things in the future. (And it’s only if they drop support for one of -webkit-appearance: textfield or ::-webkit-media-controls-panel, but not both, that things will break even there.)

Trying to explain from the spec

CSS 2.1 (and hence CSS 3, which is built on it) says implementations should ignore a selector and its declaration block if they can’t parse it. Use of vendor-specific extensions is part of the syntax of CSS (and :: to introduce pseudo-elements is part of the CSS 3 Selector specification, so pre-CSS 3 browsers will probably throw the entire ruleset out just on the basis of the double colon introducing the pseudo-element). CSS 2.1 (4.1.2.1) says, about vendor-specific extensions:

An initial dash or underscore is guaranteed never to be used in a property or keyword by any current or future level of CSS. Thus typical CSS implementations may not recognize such properties and may ignore them according to the rules for handling parsing errors.

This is incredibly vague in this situation, because we care about the “keyword” bit, while the second sentence really focuses on properties. Also, saying implementations “may ignore them” sounds as if a conforming CSS implementation could choose to accept any validly-parsed selector, and just assume it never matches if they don’t understand a particular pseudo-class or pseudo-element that has a vendor-specific extension. The next sentence probably helps the implementation choice a little:

However, because the initial dash or underscore is part of the grammar, CSS 2.1 implementers should always be able to use a CSS-conforming parser, whether or not they support any vendor-specific extensions.

This suggests, lightly, that implementations should ignore rules they don’t understand even if they can be parsed successfully. Certainly this seems to be the conservative route that existing implementations have chosen to take.

Django files at DUTH

Published at: Saturday 7th November, 2015

On Friday I gave a talk at Django Under The Hood on files, potentially a dry and boring topic which I tried to enliven with dinosaurs and getting angry about things. I covered everything from django.core.files.File, through HTTP handling of uploads, Forms support, ORM support and on to Storage backends, django.contrib.staticfiles, Form.Media, and out into the wider world of asset pipelines, looking at Sprockets, gulp and more, with some thoughts about how we might make Django play nicely with all these ideas.

You used to be able to watch the talk online, but then Elastic bought Opbeat and apparently that’s that. You can still check out my slides, although you probably want them with presenter notes (otherwise they’re pretty opaque). The talk was well received, and I suspect there’s still some useful work to be done in this space.

Digital priorities for Labour

Published at: Tuesday 13th October, 2015

Tagged as: Labour Party, UK Politics

Almost two weeks ago, Labour’s Party Conference was coming to a close in Brighton. Thousands of people, sleep deprived but full of energy, were on their journey home, fuelled with ideas to take back to their friends, constituencies and campaigns. A lot of talk over the four days had been about the future within the party, as much as about politics for the country, and I imagine lots of people were thinking about priorities as they went home. Getting ready for next May. Growing the membership. Local issues that can become campaign lynchpins.

When considering digital technology, it’s important to establish what Labour needs, and then find ways that technology can help (as I wrote before); so how could those priorities — winning in May, new members, and so on — be translated to “digital priorities”, tools to build and programmes to run?

This feels particularly important if Labour is going to move to digital in everything we do, because anything that is different to existing structures needs to be “sold in”. A new digital team, with either new actors within the party or existing ones with significantly redefined roles, will want to cement its usefulness, as or before it takes on any bigger, gnarlier challenges.

It’s easier to support something that has already helped you, so for any new digital folk within Labour it will be important to deliver some tactical support before, say, building a system to directly crowd source policy proposals to feed into the National Policy Forum.

Supporting what’s there

In Brighton, I heard from new members who felt isolated; who didn’t know what was going on; who didn’t know who to talk to, or were frustrated by exhortations to “get involved” without a clear idea of what that might mean, or where sometimes there was an assumption that involvement would always mean volunteering to knock on doors.

It’s not that the support isn’t there. The membership team has lots of resources explaining how the party works — but it isn’t all readily available or easy to find online. Some Constituency Labour Parties (CLPs) have terrific ways of welcoming new members, from the personal touch to digital resources helping people figure out where they want to fit and contribute. Much of the future of Labour is already here, it’s just unevenly distributed.

This is something where digital tools, and the processes that lead to them, can help. I’ve probably encountered a dozen Labour branded websites; most of them don’t link to each other, and most of them aren’t linked from the main website. Many of them haven’t been updated since the general election campaign. Not all CLPs list events online, even within Membersnet (and it’s no longer clear to me if the events are actually in Membersnet and not yet another website). A proper clean up of these would involve finding places for all the information members might want to get online, including information that will need creating or collating for the first time. If you slip through the cracks for any reason (an email goes astray, or just the number of people joining means it takes time for your branch or CLP to get in touch), it should be easier to self-start; that will also help everyone else.

It could also make it easier for CLPs to cross-fertilise ideas. Anyone bristling that they already do this, consider that there are constituencies with no Labour MP, even no councillors, and perhaps only the rump of a CLP. But they’re still getting new members, and if those new members have the right encouragement, they can start to turn things around.

There are lots of other areas where digital specialists can help. Fighting the elections next May, for instance; I’m certain that Sadiq Khan’s campaign will have no shortage of digital help where it needs it, but among the hundreds of other campaigns there will be some that would like a little extra.

As the waves of digital transformation continue to change many aspects of our lives, people thinking through policy will sometimes need support in digesting the implications or latest changes in the technology landscape.

(I sometimes wonder what would have happened if there’d been a digital technology specialist in the room when David Cameron decided to sound off about protecting the security forces from the dangers of encryption; maybe someone could have explained to him the difficulty, not to mention the economic dangers, of trying to legislate access to mathematics. Labour is fortunate to have shadow ministers with the background and knowledge to talk through the different angles — but they probably don’t have time to answer questions from a councillor on how the “sharing economy” is going to affect demand for parking spaces.)

Of course, a few digital specialists working out of London or Newcastle cannot possibly support the entire movement. That will come out of helping members across the country to provide that support.

Networking the party

I wrote before about curating pools of volunteers in different disciplines, and it’s an idea I heard back from others in Brighton. It’s really only valuable if this is done across the country — so a campaign in Walthamstow can track down a carpenter in Cambridge who has a bit of time, or a constituency party in Wales can find the video editor it needs, even if they’re in Newcastle.

This is about strengthening and enabling the network within the party. Again (and this is something of a theme), it isn’t that there’s no network at all. Campaigns aren’t run in isolation, and people do talk to and learn from each other. The history of the Labour movement in this country is one of networks.

But networks are stronger the more connections they have. Internet-based tools allow networks to exist beyond geographic considerations (James Darling talked about this in Open Labour). They allow people to form communities that don’t just come together periodically, face to face. And they can allow other people to draw down the experience and talent within a community when they have a need for it.

Building the digital community

There are two types of digital assistance that people are likely to draw on: tools to use, and people to use them. In the highly tactical world of campaigning and politics, the people to use them (and adapt them, and build other things on top of them) are the more important. Nonetheless, at the heart of Labour’s digital work are digital tools: right now Nation Builder, CFL Caseworker and so on; in the future, digital tools yet to be built.

Digital tools are built by a digital community. Communities aren’t created; they grow. But you can foster them.

We need a digital community in Labour because there’s no monopoly on good ideas. We need a community because that way it can survive whatever happens within the Labour party. We need a community because, quite frankly, the party cannot afford to build and support all the tools we need centrally — because the tool used on a campaign in the Scottish Highlands may not be right for a group in the Welsh valleys, with different needs and priorities. They may have both started from the same idea, they may share code and design and history, but they may also end up in very different places.

We need a community of tool builders. And that community should welcome anyone who wants to get involved.

In the open source world, there are people thinking about how to encourage new people to get involved in their communities. Making it easy for people to contribute for the first time, and providing lists of suitable pieces of work to pick up are two recent examples. In the corporate world, this is called “onboarding”, an ugly word that masks an incredibly important function. I have, in some companies, expended more energy working on this than on anything else; it’s hard to overstate its importance.

(None of this is easy. The technology industry, and open source communities, are both struggling with inclusivity and diversity. That struggle is an active one, and any digital community within Labour can and must learn from them, both from the successes and the failures and problems. I don’t want to understate the difficulty or importance of getting this right.)

The Labour movement can go further than encouraging people to participate, by actively training people. Open source projects, and many companies, rely on pre-qualified people to turn up and want to start work. But the party, at various levels, already trains people to be councillors, and to run campaigns, and to analyse demographic data. It can also train people so they can join the digital community: to become documentation authors, and product designers, and software developers, and to support the tools we make.

A lot of this will rely on local communities and volunteers arranging and promoting sessions. There are initiatives such as Django Girls and codebar that can start people on the journey to becoming programmers; there are similar initiatives for other skills. We can help people become aware of them, and provide the confidence to sign up. And we can run our own courses, using materials that others have developed.

So much of the success of digital transformation depends on people. In the case of the Labour party it’s the local members, councillors, CLP officers, and the volunteers from across the country who are going to make it work. However there’s also a need for some people at the centre: balancing needs and priorities from across the movement, setting direction, getting people excited, and filling in the gaps where no one else is doing things. (You think management is all glory and bossing people around? It’s mostly helping other people, and then picking up the bits they don’t have time for.)

This will involve talking to people. It will involve getting out to constituency and branch meetings around the country and talking passionately about what can be done, and getting them excited to go out and talk to more people.

Whoever takes on these roles isn’t going to be stuck in London all the time, head down, squinting at a laptop. It won’t work like that. They’d better love talking to people. They’d better love learning from people. And they’d better love public transport.

This was originally published on Medium.

Digital in a Labour age

Published at: Wednesday 23rd September, 2015

Tagged as: UK Politics, Labour Party, Digital Transformation

“If you are hopeful about exciting innovations that are more participative, a general election campaign is the last place you should go looking. They will happen in a less partisan race, with millions being spent not billions, and where the political situation is much more strange” — Tom Steinberg, interviewed in 2012

Since his victory, we’ve started to hear little nuggets from inside Jeremy Corbyn’s campaign to become leader of the Labour party. The value of the strong and often grassroots social media power the campaign attracted was already clear to many; and the phone bank system written by a volunteer reminds me of aspects of Dashboard, the online system for campaign volunteers built by Obama for America in 2012.

With the new Labour front bench including a number of MPs who’ve been strong proponents of digital thinking for some time, and with experiments such as crowd sourcing PMQs, it’s a good time to think about how digital technology can benefit the Labour party.

However it’s more important to concentrate on what the Labour party and movement needs to be, and then think about ways technology can help. That’s what is driving the thinking behind pieces such as James Darling’s Open Labour and Peter Wells’ Growing an open Labour Party, but in an age of 140 characters and continuous partial attention it’s easy to bundle this up as “Labour in the digital age”, a convenient moniker that can lead the conversation adrift.

Digital technology, the things you can do with computers, phones, tablets (and watches, and Google Glass, and smoke alarms, and washing machines, and…) and the Internet, is revolutionary, increasingly pervasive, and often wildly misunderstood. But it is at heart only a tool; use it right and it can support what you want to do. Tech can be transformative in that it enables transformation; but it’s also possible to apply digital technology in ways that reinforce a status quo, that limit transformation.

Talking too much about digital tools can make us focus on the way we’ll achieve things that might not yet be fully thought through, and the outcomes may then not be what we actually want or need.

There’s also the risk that by concentrating on the digital, we fail to consider non-digital approaches. Technology, as Peter Wells points out, can be exclusive: not everyone is online. Not everyone wants to be online; and not everyone who is, wants to engage with politics online. That’s another reason to focus on the ends.

Also, when people talk about “tech” or “digital” they are often bundling together ideas that happen to be common in those spaces. For instance, “digital” can become a shorthand for agility or for disruption. I’ve also seen it be used as a shorthand for evidence-based policy. Not all those ideas are equal, or even the same type of idea. It’s also unlikely that there’ll be equal support for every concrete plan that stems from them or from political avenues made feasible by digital technology (such as rapidly polling an electorate).

Bundling a range of ideas together under one label may be valuable when campaigning. It isn’t helpful when discussing how we want politics and government to work, too easily presenting a false view that you have to accept or reject the entire package. That’s often used deliberately as a rhetorical and political technique to avoid debate, but we want the debate and we’re better than that.

Some concrete ideas

That we should focus on ends rather than digital means doesn’t make concrete proposals any less worthwhile. Here are just a few that occur to me.

A pool of people willing to help

There are always jobs that need doing in any campaign or political organisation, and some of those for some time have required digital skills. The Labour party already provides training in some of these, but sometimes more specific talents are required, such as building a web app, or taking really good photographs of a candidate; this is by no means specific to people who can provide digital experience.

With some (fairly simple) digital tools allowing people willing to volunteer to update their profiles with availability, skills and talents, we wouldn’t be so reliant on little black books of campaign managers, candidates and other supporters. Some skill-sets don’t require people to be geographically close to the people they’re working with, so if done right and with enough volunteers this should enable any group to find someone to help them, on anything they need.

A community of tool builders

If there are tools, someone has to build them. Whether this is done by explicitly paying people or agencies, or as in-kind donations to the party, the people working on them should do so as part of a wider community of tool builders within the Labour movement, avoiding duplication, and providing more people who can work on any given tool.

I’ll go further and say that this should be an open source community. There will be some who believe this is unwise, providing succour to opposing parties, to which I have some counter-arguments:

The main party in opposition to Labour isn’t remotely short of money. If they want a particular tool they’ll get it. The effort of volunteers, or paid staffers, will in any case often outweigh the cost of particular tools. Conversely, for local campaigns within our movement, removing the cost of tools they need may be the difference between existing or not.
Reducing the cost of running political campaigns of any sort lowers the cost of politics. If we want the political system to work for everyone, we should want it to cost as little as possible.
While some tools will be specific to the UK, not all will. If we want a Labour government for our country, we should want similar for others. Tools we build may be able to help, and tools that others build we will benefit from.

Some policies, widely accepted

As Peter Wells touches on in his article, we probably need some solid governance of things like privacy and data usage policies. People need to be able to trust what will happen to any data they give to any part of Labour, so they can make an informed decision on whether to give it or not.

We also need good standards for ensuring people aren’t overloaded by emails, as Giuseppe Sollazzo wrote in his summary of “digital in the campaigns” in August. In fact, this is just the bottom rung of how to think about communication within the movement. James Darling, commenting on the work needed to embed digital thinking, points out:

Labour movement’s digital revolution is only in its embryonic stage […] ‘Digital’ still means a website and a twitter account, those things understood by last generation’s most prestigious skill set; Public Relations

While broadcast communication is still going to be important, it absolutely cannot end there. That’s going to be a process of experimentation and discovery, but in the meantime I’d settle for getting regular emails when there isn’t a campaign on, and not getting four a day when there is.

Central support

If digital tools are being provided by the community, there may be a need for some centrally-supported hosting of those tools, particularly for smaller campaigns and organisations that just want something that works, and don’t need to dip into the pool of talent. (Particularly if we build the tools right.)

There’s also considerable value, in a social media world, to using the better-connected within the movement (some of whom are at the centre, some not) to amplify the voices of those doing interesting work but who don’t have as wide an audience. This is of course how social media works in general, but I suspect there are specific things that can be done, such as actively seeking out interesting stories of issues, campaigns and successes to shout about and inspire people everywhere.

It should be possible, for instance, to provide tools, time and even funding to create videos and articles for online distribution, looking at people working to supplement underfunded council services, or to campaign on local environmental, fairness or access issues. Merely knowing that someone else has had success can encourage others to try the same elsewhere, and these local but incredibly valuable efforts aren’t going to make national newspapers or the evening news. Digital processes give us a huge opportunity to connect people with inspiration and support throughout the country.

The party already provides training and other services centrally; there’s nothing special about digital. (To provide maximum value, the talent pool of volunteers would need to run on a tool provided centrally, for instance.)

I’ve been into elections for far longer than I’ve been interested in the politics around them, if I’m being honest. Polling numbers, the Swingometer, even the timings and logistics around constituency counts can all become fascinating to someone who enjoys nerding out on numbers and data.

But a funny thing happened on the way to the all-night, emergency food-fuelled election watch (if becoming older and more interested in other people can be called “funny”), and over a period of some years I’ve found myself caring about political outcomes, not just for the results, but for the impact they can have on our country, our society and our future.

For a long while I felt that there wasn’t a party political position that matched my views (one of my blogs over the last couple of decades was subtitled “unfashionable politics”), but tides turn, just as I’ve come to realise that being part of a movement where you mostly feel comfortable is better than refusing to compromise, leaving you sitting on the outside, swearing in.

The last few years have been a fantastically interesting time in politics for someone with a digital technology background. US presidential elections since 2004 have made increasing use of technology, and by 2012 both Mitt Romney for President and Obama for America went heavy both on “back office” data analysis and more visible approaches on social media. UK parties haven’t been far behind, from Labour and the Liberal Democrats’ use of Nationbuilder through to the Conservatives’ use of highly detailed targeting on Facebook in the run-up to the 2015 General Election. Meanwhile, organisations like MySociety and Full Fact have been filling in some of the digital gaps in both accountability and civic services, while people like Democracy Club have been looking at the same around elections.

Now, it feels like there’s an opportunity for many more people to help out in UK Labour. Maybe that includes you. Maybe that includes me. And maybe I’ll see you in Brighton, to throw some ideas around. Just as long as we keep sight of the ends.

This was originally published on Medium.

Fun with Django storage backends

Published at: Thursday 18th July, 2013

Django abstracts file storage using storage backends, from simple filesystem storage to things like S3. This can be used for processing file uploads, storing static assets, and more. This is just a brief look at some things you can do which are kind of fun.

Using Amazon S3

django-storages is “a collection of custom storage backends”, including support for Amazon S3. You want to use the boto-based one, because it has lots of useful features. You can use it pretty quickly without customisation just by adding a few variables to your settings.py; I tend to put AWS access keys in environment variables rather than have different settings.py for different uses, because it plays better with Heroku.

import os

AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
AWS_QUERYSTRING_AUTH = False
AWS_HEADERS = {
  'Cache-Control': 'max-age=86400',
}
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
# these next two aren't used, but staticfiles will complain without them
STATIC_URL = "https://%s.s3.amazonaws.com/" % os.environ['AWS_STORAGE_BUCKET_NAME']
STATIC_ROOT = ''

DEFAULT_FILE_STORAGE is used when you want to store file-like things attached to your models, using field types like FileField and ImageField; STATICFILES_STORAGE is where the static files pulled together from apps and your project by the collectstatic command end up.
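One drawback of reading settings straight from os.environ is that a missing variable surfaces as a bare KeyError. As a minimal sketch (the helper name require_env is hypothetical, not part of Django or django-storages), a small wrapper can make the failure more readable:

```python
import os


def require_env(name, environ=os.environ):
    """Return the named environment variable, or fail with a clear message."""
    try:
        return environ[name]
    except KeyError:
        raise RuntimeError("Missing required environment variable: %s" % name)


# In settings.py this might be used as:
#   AWS_ACCESS_KEY_ID = require_env('AWS_ACCESS_KEY_ID')
```

The environ parameter exists only so the helper can be exercised with a plain dictionary in tests.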

Okay, great. But say we want to do more?

Put static files in a slightly different place

If you subclass the S3BotoStorage class, you can override some of its configuration. There are lots of these, but location is an interesting one because it acts as a prefix for the keys stored in S3.

import storages.backends.s3boto

class PrefixedStorage(storages.backends.s3boto.S3BotoStorage):
  def __init__(self, *args, **kwargs):
    from django.conf import settings
    kwargs['location'] = settings.ASSETS_PREFIX
    return super(PrefixedStorage, self).__init__(*args, **kwargs)

So if we plonk a suitable bit of configuration into our settings.py:

ASSETS_PREFIX = 'assets'
STATICFILES_STORAGE = 'prefixed_storage.PrefixedStorage'

then our assets will be separated from our uploaded media. (You could also put them in a different bucket, using the bucket argument, for which you might also want to set access_key and secret_key differently to the default configuration we put in settings.py earlier.)
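To illustrate what “acts as a prefix” means for the keys, here is a rough sketch of the idea in plain Python. This is not django-storages’ own implementation, and prefixed_key is a hypothetical name; it only shows how a location and a file name might combine into an S3 key.

```python
import posixpath


def prefixed_key(location, name):
    """Join a storage location prefix and a file name into a single key."""
    location = location.strip("/")
    name = name.lstrip("/")
    # With no location configured, the key is just the file name.
    return posixpath.join(location, name) if location else name
```

So with ASSETS_PREFIX set to 'assets', a static file named css/site.css would land under assets/css/site.css, while uploaded media using the default storage (no prefix) keeps its plain name.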

Protect some file storage

Most of your media uploads – user avatars, for instance – you want to be public. But if you have some media that requires authentication before you can access it – say PDF resumes which are only accessible to members – then you don’t want S3BotoStorage’s default S3 ACL of public-read. Here we don’t have to subclass, because we can pass in an instance rather than refer to a class.

from django.db import models
import storages.backends.s3boto

protected_storage = storages.backends.s3boto.S3BotoStorage(
  acl='private',
  querystring_auth=True,
  querystring_expire=600, # 10 minutes, try to ensure people won't/can't share
)

class Profile(models.Model):
  resume = models.FileField(
    null=True,
    blank=True,
    help_text='PDF resume accessible only to members',
    storage=protected_storage,
  )

There is no permanent publicly-accessible URL for the uploaded resumes, but it’s easy to write a view that will redirect to a temporary URL. Because we set up S3BotoStorage to use query string-based authentication, when asked for the field’s URL it will generate a signed, temporary one. The configuration above gives us 600 seconds, or 10 minutes, before that URL becomes invalid and can no longer be used.

from django.views.generic import DetailView
from django.http import HttpResponseForbidden, HttpResponseNotFound, HttpResponseRedirect

class ResumeView(DetailView):
  model = Profile

  def get(self, request, *args, **kwargs):
    # check authentication first, so anonymous users learn nothing about profiles
    if not request.user.is_authenticated():
      return HttpResponseForbidden()
    obj = self.get_object()
    # an empty FileField is falsy rather than None
    if not obj.resume:
      return HttpResponseNotFound()
    return HttpResponseRedirect(obj.resume.url)

Or you could just put it in a template, only for members:

{% if user.is_authenticated %}
  <a href='{{ profile.resume.url }}'>Grab my resume</a>
{% endif %}
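If the idea of a temporary, signed URL is unfamiliar, the following sketch shows the general technique using only the standard library. It is deliberately not S3’s actual signing scheme: the secret, the parameter names and the message format are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"not-a-real-secret"  # hypothetical signing key


def sign(path, expires, secret=SECRET):
    """Sign the path together with its expiry time."""
    message = "{}\n{}".format(path, expires).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def temporary_url(path, lifetime=600, now=None, secret=SECRET):
    """Build a URL that carries its own expiry time and signature."""
    now = int(time.time()) if now is None else now
    expires = now + lifetime
    query = urlencode({"Expires": expires, "Signature": sign(path, expires, secret)})
    return "{}?{}".format(path, query)


def is_valid(url, now=None, secret=SECRET):
    """Accept the URL only if it is unexpired and the signature matches."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["Expires"][0])
        signature = params["Signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    return hmac.compare_digest(signature, sign(parsed.path, expires, secret))
```

Because the expiry time is part of what gets signed, nobody can extend a link’s life by editing the query string, which is what makes a short lifetime such as ten minutes meaningful.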

Making a staging version of your live database

This is something I needed to do recently for NSFWCORP: come up with an easy way of taking a live database dump and making a staging instance out of it. This is all run on Heroku, so moving the database dumps around is easy, and writing something to throw away all non-staff users, old conversation threads and so on is also simple. But I also needed to duplicate the media files from the live bucket to the staging bucket. My solution is as follows:

import os
import os.path
import shutil
import sys

from django.conf import settings
from django.core.management.base import BaseCommand
from django.db.models import get_models, FileField
from storages.backends.s3boto import S3BotoStorage


class Command(BaseCommand):
  output_transaction = True

  def handle(self, *args, **options):
    # we want a django-storages s3boto backend for live, using
    # a dedicated read-only key pair
    storage = S3BotoStorage(
      bucket='nsfw-live',
      access_key=settings.LIVE_READ_ONLY_ACCESS_KEY_ID,
      secret_key=settings.LIVE_READ_ONLY_SECRET_KEY,
    )
    # now just go through all the models looking for stuff to do
    for model in get_models():
      fields = [f for f in model._meta.fields if isinstance(f, FileField)]
      if fields:
        sys.stdout.write(u"Copying media for %s..." % model._meta.object_name)
        sys.stdout.flush()
        for obj in model.objects.all():
          for field in fields:
            _if = None
            _of = None
            _file = getattr(obj, field.name)
            if not _file.name:
              continue
            try:
              _if = storage.open(_file.name, 'rb')
              if not settings.AWS_AVAILABLE:
                full_path = _file.path
                directory = os.path.dirname(full_path)
                if not os.path.exists(directory):
                  os.makedirs(directory)
                if not os.path.exists(full_path):
                  with open(full_path, 'wb'):
                    pass
              _of = _file.storage.open(_file.name, 'wb')
              shutil.copyfileobj(_if, _of)
            except Exception as e:
              sys.stdout.write(u"\n  failed %s(pk=%i).%s = %s: " % (
                model._meta.object_name,
                obj.pk,
                field.name,
                _file.name
              ))
              sys.stdout.write(unicode(e))
            finally:
              if _if is not None:
                _if.close()
              if _of is not None:
                _of.close()
        sys.stdout.write("done.\n")

Note that there are three new settings.py variables: LIVE_READ_ONLY_ACCESS_KEY_ID and LIVE_READ_ONLY_SECRET_KEY should be fairly obvious, and AWS_AVAILABLE just tells me whether AWS support is configured in the environment, which I use to ensure the destination path and file exist in advance for local storage. I could avoid that by doing something like _file.save(_file.name, _if), although I’m not entirely sure that will preserve file paths and names. It’s cleaner though, and is probably a better solution.
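The “make sure the destination exists” step for local storage can be pulled out into a small helper. This is a sketch assuming Python 3 (the command above is Python 2), and ensure_file_exists is a hypothetical name rather than anything Django provides:

```python
import os


def ensure_file_exists(full_path):
    """Create the parent directories and an empty file if they are missing."""
    directory = os.path.dirname(full_path)
    if directory:
        # exist_ok avoids a separate os.path.exists() check
        os.makedirs(directory, exist_ok=True)
    # Append mode creates the file when absent and leaves existing content alone.
    with open(full_path, "ab"):
        pass
```

Opening in append mode rather than write mode means calling the helper on a file that already has content does not truncate it.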

Summing up

The Django storage API and pluggable backends give a lot of flexibility in how you manage both static assets and file-like things. As well as django-storages there are plenty of other options for when the built-in file system options aren’t suitable for you.

Running statsd on Heroku

Published at: Thursday 18th April, 2013

statsd is a “simple daemon for easy stats aggregation”: you send it stats whenever you can (such as when rendering a web page), and it aggregates them internally and passes them upstream to something that can store them and make them available for other clients for analysis, graphing and so on. Upstream stores from statsd might include the Carbon storage engine from Graphite that you can run yourself somewhere, or a hosted service such as Librato. You can combine the two by using Hosted Graphite, which does exactly what it says on the tin.

Heroku is a platform as a service company that provides an abstraction over servers, virtual machines and so forth geared to web deployment, as well as a toolchain for working with that.

It would be nice if we could use them together, and the good news is that we can. I wrote this because I couldn’t find anything online that spells out how to. The code and configuration are available on GitHub.

How we’re going to do this

A simple deployment of statsd is this: put one instance on each physical machine you have, and point them all at a storage system. (You can also chain instances together, and have instances send their data on to multiple receivers. Let’s just ignore all of that, because then you probably don’t want to host on Heroku, and if you do you can certainly figure out how this all applies to your setup.)

On Heroku, we don’t have physical machines; in fact there isn’t the concept of “machine” at all. Instead, Heroku has Dynos, which are described as “lightweight containers” for UNIX processes. From their documentation:

[A Dyno] can run any command available in its default environment combined with your app’s slug

(The slug is basically your codebase plus dependencies.)

When working with physical machines there’s a tendency to put a number of different types of process on each, to avoid having to buy and manage more of them. With virtualisation, and hosting systems such as Amazon EC2, this isn’t so important, and with Heroku their entire architecture is set up almost to mandate that you have different types of Dynos (called process types) for different jobs; almost always a web type that is basically your application server, probably a secondary worker type that handles any long-running operations asynchronously to web requests, and so on.

However this doesn’t mean we can’t run multiple UNIX processes within one Dyno. Providing each process type is still only doing one thing, it still fits the Heroku semantics. This means we can tuck a statsd instance away in each Dyno, so it will aggregate information from the work being done there, with each statsd sending its aggregated data upstream.

(Why not have a process type for statsd and send all data to one or two Dynos before aggregating it upstream? Because statsd works over UDP for various sound reasons, but Heroku doesn’t provide UDP routing for its Dynos. Even if it did, you wouldn’t want to do things that way because UDP between arbitrary Dynos running who knows where within Heroku’s virtualised infrastructure can fall foul of all sorts of intermediate network issues.)

A demonstration app

Process types are configured in your app’s Procfile, so we want a single command that launches both statsd and whatever the main work of this Dyno is going to be. Let’s start by making a simple Flask app and deploying it to Heroku without statsd.

# requirements.txt
Flask==0.9
gunicorn==0.17.2

# web.py
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

And a simple Procfile to launch that:

# Procfile
web: gunicorn -b "0.0.0.0:$PORT" -w 4 web:app

If we turn this into a git repo, create a Heroku app and push everything up, we’ll be able to see our very boring homepage.

$ git init
$ git add requirements.txt Procfile web.py
$ git commit -a -m 'Simple Flask app for Heroku.'
$ heroku apps:create
Creating afternoon-reaches-9313... done, stack is cedar
http://afternoon-reaches-9313.herokuapp.com/ | git@heroku.com:afternoon-reaches-9313.git
Git remote heroku added
$ git push heroku master

(Lots of unimportant output removed; the important bit is the output from heroku apps:create which tells you the URL.)

Okay, all is well there. Let’s get statsd into play.

Doing two things at once in a Dyno

The key here is to put a command in the Procfile which launches both gunicorn and statsd. A simple choice here is honcho, which is a python version of foreman. (If we were doing this using the Heroku Ruby runtime, say for a Rails or Sinatra app, then it would make sense to use foreman instead.)

As we’re working in the python side of things, let’s add a simple statsd counter to our web app at the same time.

# requirements.txt
Flask==0.9
gunicorn==0.17.2
honcho==0.4.0
python-statsd==1.5.8

# web.py
import statsd
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    counter = statsd.Counter("Homepage hits")
    counter += 1
    return "Hello World!"

if __name__ == "__main__":
    app.run()

Honcho uses a Procfile itself to figure out what to launch, so we need to give it separate configuration from the main Heroku one:

# Procfile.chain
web: gunicorn -b "0.0.0.0:$PORT" -w 4 web:app
statsd: cat /dev/zero

At this point we don’t know how to launch a statsd so we’ll just have it launch a dummy command that will keep running while gunicorn does its work. Then we need the main Heroku Procfile to launch honcho instead of gunicorn directly:

# Procfile
web: USER=nobody PORT=$PORT honcho -f Procfile.chain start

(The USER environment variable is needed because of how honcho defaults some of its options.)

And push it to Heroku:

$ git add requirements.txt Procfile Procfile.chain web.py
$ git commit -a -m 'Run gunicorn + dummy process; python will try to push to statsd'
$ git push heroku master

The python that tries to push a counter to statsd will fail silently if there isn’t one running, so all is well and you should still be able to get to your homepage at whichever URL Heroku gave you when you created the app.
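The silent failure follows from statsd’s use of UDP: a datagram is sent and forgotten, whether or not anything is listening. As a rough sketch of what a client does under the hood (this is not python-statsd’s code; the function names are hypothetical, and 8125 is statsd’s default port), a counter can be sent with nothing but the socket module:

```python
import socket


def format_counter(name, value=1):
    """Encode a counter in statsd's plain-text format: name:value|c"""
    return "{}:{}|c".format(name, value).encode("utf-8")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire a single UDP datagram at statsd; no reply is expected."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, value), (host, port))
    finally:
        sock.close()
```

Calling send_counter("Homepage hits") produces the same kind of message that the web app sends, and with no daemon running the datagram is simply dropped.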

Running statsd on Heroku

statsd is a node.js program, so we want the Heroku node.js support in order to run it. Heroku supports different languages using buildpacks – and we’re already using the Python buildpack to run Flask. Fortunately there are community-contributed buildpacks available, one of which suits our needs: heroku-buildpack-multi allows using multiple buildpacks at once. We need to set this as the buildpack for our app:

$ heroku config:add BUILDPACK_URL=https://github.com/ddollar/heroku-buildpack-multi.git

Then we can add a .buildpacks file that lists all the buildpacks we want to use.

http://github.com/heroku/heroku-buildpack-nodejs.git
http://github.com/heroku/heroku-buildpack-python.git

The node.js buildpack uses package.json to declare dependencies:

/* package.json */
{
  "name": "heroku-statsd",
  "version": "0.0.1",
  "dependencies": {
    "statsd": "0.6.0"
  },
  "engines": {
    "node": "0.10.x",
    "npm":  "1.2.x"
  }
}

statsd itself needs a tiny amount of configuration; at this point we’re not going to consider an upstream, so we want it to log every message it gets sent so we can see it in the Heroku logs:

/* statsd-config.js */
{
  dumpMessages: true
}

And finally we want to change Procfile.chain so honcho knows to launch statsd:

web: gunicorn -b "0.0.0.0:$PORT" -w 4 web:app
statsd: node node_modules/statsd/stats.js statsd-config.js

Push that up to Heroku:

$ git add .buildpacks package.json statsd-config.js Procfile.chain
$ git commit -a -m 'Run statsd alongside gunicorn'
$ git push heroku master

If you hit your Heroku app’s URL you won’t see anything different, but when you check your Heroku logs:

$ heroku logs
2013-04-17T14:06:38.766960+00:00 heroku[router]: at=info method=GET path=/ host=afternoon-reaches-9313.herokuapp.com fwd="149.241.66.93" dyno=web.1 connect=2ms service=5ms status=200 bytes=12
2013-04-17T14:06:38.780056+00:00 app[web.1]: 14:06:38 statsd.1 | 17 Apr 14:06:38 - DEBUG: Homepage hits:1|c

Again I’ve removed a lot of boring output to focus on the two important lines: the first (from the Heroku routing layer; gunicorn itself doesn’t log by default) shows the request being successfully processed, and the second shows statsd getting our counter.

Pushing upstream

Both Librato and Hosted Graphite provide statsd backends so you can aggregate directly to them. For Librato the plugin is statsd-librato-backend, and for Hosted Graphite it’s statsd-hostedgraphite-backend. Other options will either have their own backends, or you can always write your own.

As well as any configuration to support your chosen upstream, you probably want to drop the dumpMessages: true line so your Heroku logs are tidier.

Running locally

Everything we’ve done here will work locally as well. Assuming you have node.js (and npm) installed already, and you have virtualenv on your system for managing python virtual environments, just do:

$ virtualenv ENV
$ source ENV/bin/activate
$ ENV/bin/pip install -r requirements.txt
$ npm install
$ honcho -f Procfile.chain start

Caveats

I haven’t used this in production (yet), so beyond the concept being sound I can’t commit to its working without problems. In particular, things to think about include:

honcho isn’t usually used in production, so may have gotchas (note that if any process running under honcho quits the entire thing will shut down, which means the Dyno will die and be replaced; this is almost certainly what you want)
I don’t know how Dyno teardown works, and so statsd may lose data on Dyno cycle (which is rarely a huge problem)
Not actually tested on more than one dyno at once

Certainly if you put this into production I’d pay attention to Heroku platform errors, do spot checks on data coming out of statsd if you can, and generally be cautious.
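The “if any process quits, everything shuts down” behaviour mentioned in the caveats is easy to picture with a small supervisor. The following is only a sketch of that behaviour using the standard library, not honcho’s implementation, and run_together is a hypothetical name:

```python
import subprocess
import time


def run_together(commands, poll_interval=0.1):
    """Start every command; when any one exits, stop the rest.

    Returns the exit code of the first process seen to have finished.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Ask any survivors to stop, then wait so no zombies are left behind.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
```

On Heroku the supervisor exiting ends the Dyno’s main process, so the platform replaces the Dyno, restarting gunicorn and statsd together.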

Accessibility Advent: normal accessibility advice still applies

Published at: Monday 24th December, 2012

(Throughout Advent I’m sharing some hints as to how web developers can make my life as a speech recognition user easier.)

It’s still advent, but lots of people have already started their trips to wherever they’re spending Christmas, so I just wanted to point out that a lot of the normal accessibility advice helps voice users too. Nuance, who make Dragon, have guidelines for speech-accessible HTML which are worth looking at, even though they’re a few years old now, and based on the Windows version which has more features than the Mac version.

For a view of Dragon from the point of view of a web developer who just wanted to learn a little about Dragon and using it, check out Jon Whiting’s article from last year, Assistive Technology Experiment: Dragon NaturallySpeaking (he links to the same guidelines as above, although curiously under a different URL).

I’d recommend all web developers spend some time using either the Windows or Mac version; although it’s a significant amount of money to spend, it’s cheaper than some assistive technologies such as JAWS. If you’d like to have me come and talk to your company about using computers by voice, then please get in touch. I can of course include a demonstration (swearing as things go wrong strictly optional; please express preference at time of booking ;-), and if desired can perform a review in advance of a web site or app you’ve built, which can drive both demonstration and discussion.

Accessibility Advent: scrolling issues redux

Published at: Friday 21st December, 2012

(Throughout Advent I’m sharing some hints as to how web developers can make my life as a speech recognition user easier.)

Earlier this month I wrote about paging, suggesting that if you want floated information around your main content, the main content should scroll separately to the rest of your webpage. It turns out there’s a problem with this, although it is fixable. For a demonstration of what can go wrong, we turn to Quartz.

Looking at a Quartz story, there are three different things that might be scrollable – a list of topics that could scroll horizontally at the top, a list of stories that could scroll vertically down the left hand side and the story itself, on the right. (As I look at it now the main content is preceded in its scrollable area by an advert that takes up almost all of my screen. If I looked at this on a netbook, I wouldn’t have seen any content at all.)

You could, perhaps, make an argument for scrolling the stories list by default. Certainly there’s a strong argument for scrolling the content itself by default. I think the topics list is fine as it is. There is no argument on earth that justifies what actually happens, which is that nothing scrolls by default.

What I mean is that pressing page up and page down, the keyboard access to scrolling (and, as it happens, the voice access as well), does absolutely nothing. Why not? Because the wrong thing has focus: I’m guessing the entire page (which is the default), but since that’s a fixed size filling the window, it won’t scroll. A simple touch of JavaScript will focus the correct scrollable area, and make life easier.

So really this is about focus, in which case I’ll take advantage of the opportunity to point out something nice I noticed about Barclays’ online banking system today. They have a fairly modern web app style, meaning that a lot of operations bring up a modal overlay to get them done. So making payments, transferring money and so on all have these little in-page dialogs. Not only do they support the escape key, they go one further when opening the overlays to make sure that focus is transferred to the first element of that overlay. This means you can tab through the controls on the overlay without worrying about what was behind it – avoiding a major annoyance in having overlays. They also have a number of other subtle features, such as showing jump links on focus.

Accessibility Advent: only I get to put stuff in text inputs

Published at: Thursday 20th December, 2012

(Throughout Advent I’m sharing some hints as to how web developers can make my life as a speech recognition user easier.)

I wrote yesterday about enhancing long drop-down menus to turn them into combo boxes, which act more like text areas and so are somewhat more tractable to voice. However you can still screw them up; here are two ways I’ve seen recently.

The first is where you’re implementing auto complete on a text area. The best way of doing this is to provide a drop-down menu of possible completions and only fill the text area when one is explicitly selected. (This is how Google Search does it, for instance.) This means that until I’ve finished dictating into the text area I can continue to use Dragon commands to correct that dictation. If you remember, Dragon maintains an internal view of what the text is in the current input field, so if you complete automatically in the text field this internal view is now incorrect. We might have the following, having dictated “James”:

James Aylett

“James” has been input, but the web app has added “Aylett”, and the text cursor will be after that.

If I actually meant “chains”, and I use Dragon’s correction commands, Dragon will try to correct the word immediately before the text cursor, which it thinks is “James”, but which is actually “Aylett”. Dragon typically uses character by character selection, so what we are likely to end up with is something like “James Achains”.

Note that once the user has selected a completion from the menu, the text input is naturally going to contain some stuff that Dragon doesn’t know about. Voice users should be able to spot these kinds of explicit situations, and have a command specifically to resynchronize Dragon’s view of an input, if they need to edit it further.

The other problem is more insidious, although I haven’t seen it in a web app as yet. It’s the way Google Chrome makes its address bar work. Someone on the Chrome team clearly decided that the “http://” part of the URL wasn’t necessary; other schemes are shown, so why take up space showing the most common? Except when you cut and paste the URL from Chrome it always includes the scheme, even if it’s HTTP. This is very clever, but trips up Dragon.

If you want to synchronize Dragon’s view of an input with what’s already there, you say “cache document”. It then selects all the text, and copies it into its own view; then it manually moves the cursor to the beginning then forward to the place where Dragon believes it to be. At this point (usually) Dragon and the input match each other, and voice editing commands will work smoothly.

But when the URL is copied out of Chrome’s address bar, the “http://” part is added to the front, meaning that Dragon thinks it’s there but Chrome, when editing commands are applied to the input field, does not. This creates a similar problem to the first example, in that trying to select parts of the text (to replace it, to add more before or after it, or to apply formatting commands such as capitalization) will select the wrong characters.

So with Google’s URL in the address bar (Google actually uses HTTPS for everything these days, but it’s easier to pretend it doesn’t for the purposes of explanation than to find a website that won’t move to HTTPS in the future), and Dragon thinking it’s synchronized, we might choose to go instead to Microsoft’s website, saying “select Google / Microsoft”. Dragon’s internal view is now “http://Microsoft.com”. However it will attempt to make that edit by selecting the eighth to thirteenth characters and typing “Microsoft” over them. Because the scheme isn’t in the actual text input, you end up with: “google.Microsoft”.

The only way of working round this is to copy the contents of the address bar out, edit them separately from Google Chrome, and then copy them back in again.
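The address bar example can be worked through in a few lines of Python. This is only a model of the behaviour described above, not of how Dragon is actually implemented, and apply_edit is a hypothetical name: the edit is located in the text Dragon believes is there, but applied by character position to the text that is really in the input.

```python
def apply_edit(actual_text, believed_text, target, replacement):
    """Replace `target` at the offsets where it appears in the believed text."""
    start = believed_text.index(target)
    end = start + len(target)
    # The selection is made by position, so it lands on whatever characters
    # the real input happens to hold at those offsets.
    return actual_text[:start] + replacement + actual_text[end:]


# Dragon's view includes the scheme; the real input does not.
print(apply_edit("google.com", "http://google.com", "google", "Microsoft"))

# When both views match, the same edit behaves as intended.
print(apply_edit("http://google.com", "http://google.com", "google", "Microsoft"))
```

The first call prints google.Microsoft, matching the broken result described above, and the second prints http://Microsoft.com.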

James Aylett: Recent diary entries
