James Aylett: Git remotes

Published at
Sunday 5th June, 2016

I’m a mentor for Xapian within Google Summer of Code again this year, and since we’re a git-backed project that means introducing people to the concepts of git. That is difficult partly because git is complex, partly because a lot of the documentation out there is dense and can be confusing, and partly because people insist on using git in ways I consider unwise and then telling other people to do the same.

Two of the key concepts in git are branches and remotes. There are many descriptions of them online; this is mine.

For the rest of this article I’m only going to consider the common case of multiple people collaborating to make a single stream of releases (whether that’s open source software tagged and packaged, or perhaps commercial software that’s deployed to a company’s infrastructure, like a webapp). I also won’t consider what happens when merges or rebases fail and need manual assistance, as that’s a more complex topic.

Getting work from others

One of the key things you need to be able to do in collaborative development is to bring in changes that other people have made while you were working on your own. In git terms, this means that there’s a remote that contains some commits that you don’t have yet, and a local branch (in the repository you’re working with) that probably contains commits that the remote doesn’t have yet either.

First you need to get those commits into your repository:

$ git fetch origin
remote: Counting objects: 48, done.
remote: Total 48 (delta 30), reused 30 (delta 30), pack-reused 18
Unpacking objects: 100% (48/48), done.
From git://github.com/xapian/xapian
   9d2c1f7..91aac9f  master     -> origin/master

The details don’t matter much; what matters is that if the remote has no new commits for you, there won’t be any output at all.

Note that some git guides suggest using git pull here. When working with a lot of other people, that is risky, because it doesn’t give you a chance to review what they’ve been working on before accepting it into your local branch.

Say you have a situation that looks a little like this:

[1] -- [2] -- [3] -- [4] <--- HEAD of master
          \-- [5] -- [6] -- [7] <--- HEAD of origin/master

(The numbers are just so I can talk about individual commits clearly. They actually all have hashes to identify them.)

What the above would mean is that you’ve added two commits on your local branch master, and origin/master (ie the master branch on the origin remote) has three commits that aren’t in your local branch.

You can see what state you’re actually in using git show-branch. The output is vertical instead of horizontal, but contains the same information as above:

$ git show-branch origin/master master
! [origin/master] 7 message
 * [master] 4 message
--
 * [master] 4 message
 * [master^] 3 message
+  [origin/master] 7 message
+  [origin/master^] 6 message
+  [origin/master~2] 5 message
+* [origin/master~3] 2 message

Each column on the left represents one of the branches you give to the command, in order. The top bit, above the line of hyphens, gives a summary of which commit each branch is at, and the bit below shows you the relationship between the commits behind the various branches. The things inside [] tell you how to address the commits if you need to; after them come the commit messages. (The stars * show you which branch you currently have checked out.)

From this it’s fairly easy to see that your local branch master has two commits that aren’t in origin/master, and origin/master has three commits that aren’t in your local branch.
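
To make that concrete, here is a minimal sketch that rebuilds the diverged history from the diagram in a throwaway directory and then asks git for the same information in a form that is easier to script. The commit helper, the theirs and mine directory names and the example identity are assumptions made up for this illustration; git rev-list --left-right --count and git log with a master..origin/master range are standard git, offered as another view of the same state rather than a replacement for git show-branch.

```shell
#!/bin/sh
set -e
# Example identity so the script works without any git configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: one commit touching its own file, so the two
# lines of history never conflict.
commit() {
    echo "$1" > "file$1.txt"
    git add "file$1.txt"
    git commit -q -m "$1 message"
}

work=$(mktemp -d)
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master   # use the branch name from the article
commit 1; commit 2
git clone -q "$work/theirs" "$work/mine"  # "mine" gets "theirs" as origin
commit 5; commit 6; commit 7              # other people's work, after you cloned

cd "$work/mine"
commit 3; commit 4                        # your own local work
git fetch -q origin

# The vertical picture described above.
git show-branch origin/master master

# Commits only on master (left) and only on origin/master (right).
counts=$(git rev-list --left-right --count master...origin/master)
set -- $counts
ahead=$1
behind=$2
echo "master has $ahead commits of its own; origin/master has $behind"

# The incoming commits, newest first, for review before accepting them.
incoming=$(git log --format=%s master..origin/master)
echo "$incoming"
```

Listing master..origin/master after a fetch is also a simple way to review what other people have been working on before you accept it into your local branch.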

Incorporating work from others

So now you have commits from other people, and additionally you know that your master branch and the remote tracking branch origin/master have diverged from a common past.

There are two ways of incorporating that other work into your branch: merging and rebasing. Which to use depends partly on the conventions of the project you’re working on (some like to have a “linear” history, which means using rebase; some prefer to preserve the branching and merging patterns, which means using merge). We’ll look at merge first, even though a common thing to be asked to do to a pull request on github is to “rebase on top of master” or similar.

Merging to incorporate others’ work

Merging leaves two different chains of commits intact, and creates a merge commit to bind the changes together. If you merge the changes from origin/master in the above example into your local master branch, you’ll end up with something that looks like this:

[1] -- [2] -- [3] -- [4] --------- [8] <--- HEAD of master
         \                         /
          \-- [5] -- [6] -- [7] --/

You do it using git merge:

$ git merge origin/master
Updating 9d2c1f7..91aac9f
 .travis.yml | 26 ++++++++++++++++++++++++++
 bootstrap   | 10 ++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 .travis.yml

It will list all the changes in the remote tracking branch which were incorporated into your branch.
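
As a hedged illustration, the sketch below recreates the example history in a temporary directory and checks that merging produces a new commit with two parents: your old master and origin/master. The commit helper, the theirs and mine directories and the example identity are invented for the demonstration. It passes --no-edit so that git doesn’t open an editor for the merge message, which you may or may not want when working by hand.

```shell
#!/bin/sh
set -e
# Example identity so the script works without any git configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: one commit touching its own file (no conflicts).
commit() {
    echo "$1" > "file$1.txt"
    git add "file$1.txt"
    git commit -q -m "$1 message"
}

# Build the diverged history from the diagram.
work=$(mktemp -d)
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
commit 1; commit 2
git clone -q "$work/theirs" "$work/mine"
commit 5; commit 6; commit 7
cd "$work/mine"
commit 3; commit 4
git fetch -q origin

before=$(git rev-parse master)            # commit [4]
git merge -q --no-edit origin/master      # creates merge commit [8]

first_parent=$(git rev-parse HEAD^1)      # should be [4]
second_parent=$(git rev-parse HEAD^2)     # should be [7]
total=$(git rev-list --count HEAD)        # [1]..[7] plus the merge

# Both chains of commits are still there, joined by the merge commit.
git --no-pager log --oneline --graph
```

Neither chain of commits is rewritten here: commits [3] and [4] keep their hashes, and the merge commit simply records both tips as its parents.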

Rebasing to incorporate others’ work

Rebasing takes the commits you’ve made since your local branch and the remote tracking branch diverged, and moves them onto the current position of the remote tracking branch. For the example above you’d end up with something that looks like this:

[1] -- [2] -- [5] -- [6] -- [7] -- [3'] -- [4'] <--- new HEAD of master

Note that commits [3] and [4] have become [3'] and [4'] – they’re actually recreated (meaning their hash will change), which is important as we’ll see in a minute.

You do this as follows:

$ git rebase origin/master
First, rewinding head to replay your work on top of it...
Applying: 3 message
Applying: 4 message
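
The point about hashes changing can be checked with a small experiment. This is only a sketch under assumptions: it rebuilds the example history in a temporary directory using a made-up commit helper, made-up theirs and mine directories and an example identity, then records the hashes of [3] and [4] before and after the rebase.

```shell
#!/bin/sh
set -e
# Example identity so the script works without any git configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: one commit touching its own file (no conflicts).
commit() {
    echo "$1" > "file$1.txt"
    git add "file$1.txt"
    git commit -q -m "$1 message"
}

# Build the diverged history from the diagram.
work=$(mktemp -d)
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
commit 1; commit 2
git clone -q "$work/theirs" "$work/mine"
commit 5; commit 6; commit 7
cd "$work/mine"
commit 3; commit 4
git fetch -q origin

old3=$(git rev-parse master^)    # [3]
old4=$(git rev-parse master)     # [4]
git rebase -q origin/master
new3=$(git rev-parse master^)    # [3']
new4=$(git rev-parse master)     # [4']

echo "[3] $old3 became [3'] $new3"
echo "[4] $old4 became [4'] $new4"

# Same messages, new hashes, and a single straight line of history.
history=$(git log --format=%s | tr '\n' ' ')
echo "$history"
```

The messages survive the rebase unchanged, but anyone who already had the old [3] and [4] now holds commits that are no longer part of your master.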

Some caution around rebasing

Rebasing is incredibly powerful, and some people get trigger happy and use it perhaps more often than they should. The problem is that, as noted above, the commits you rebase are recreated; this means that if anyone had your commits already and you rebase those commits, you’ll cause difficulties for those other people. In particular this can happen while using pull requests on github.

A good rule of thumb is to rebase only commits that nobody else has yet.

Rebasing during pull requests is discussed in this Thoughtbot article.

In summary

Most of the time, your cycle of work is going to look like this:
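
As a sketch, using only the commands covered above: do some work and commit it, fetch, look at how the branches relate, then merge or rebase according to the project’s convention. The incorporate function below is a hypothetical wrapper rather than a git command, and the throwaway repositories, commit helper and example identity exist only so the script can run on its own.

```shell
#!/bin/sh
set -e
# Example identity so the script works without any git configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: one commit touching its own file (no conflicts).
commit() {
    echo "$1" > "file$1.txt"
    git add "file$1.txt"
    git commit -q -m "$1 message"
}

# Hypothetical wrapper for one turn of the cycle.
# $1 is "merge" or "rebase", depending on the project's convention.
incorporate() {
    git fetch -q origin                    # get other people's commits
    git show-branch origin/master master   # review how the branches relate
    case "$1" in
        merge)  git merge -q --no-edit origin/master ;;
        rebase) git rebase -q origin/master ;;
        *)      echo "usage: incorporate merge|rebase" >&2; return 2 ;;
    esac
}

# Throwaway setup: a shared repository and your clone of it.
work=$(mktemp -d)
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/master
commit 1; commit 2
git clone -q "$work/theirs" "$work/mine"
commit 5; commit 6; commit 7               # others carry on working

cd "$work/mine"
commit 3; commit 4                         # your own work, committed locally
incorporate rebase                         # or: incorporate merge

still_mine=$(git rev-list --count origin/master..master)
missing=$(git rev-list --count master..origin/master)
echo "$still_mine local commits on top; $missing commits left to incorporate"
```

When working by hand you would normally pause after git show-branch to look at what came in before choosing to merge or rebase; the wrapper runs straight through only to keep the example short.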

Following that you can use git push and pull requests, or whatever other approach your project uses, to start the review process ahead of getting your changes applied.