I’m a mentor for Xapian within Google Summer of Code again this year, and since we’re a git-backed project that means introducing people to the concepts of git. That is difficult partly because git is complex, partly because a lot of the documentation out there is dense and can be confusing, and partly because people insist on using git in ways I consider unwise and then telling other people to do the same.
Two of the key concepts in git are branches and remotes. There are many descriptions of them online; this is mine.
remotes are repositories other than the one you’re using, and to which you have access (via a URL and probably some authentication)
remotes have names; you probably have one called “origin”, which will be wherever you cloned your repository from – throughout this article I assume that collaborators all push their changes back to this remote, and fetch others’ changes from there
branches are, somewhat confusingly, pointers to particular commits in your repository; they are either local branches or remote (tracking) branches
local branches are ones that you can work on
remote tracking branches are branches in your repository which mirror local branches in remotes you’re working with
remote tracking branches don’t automatically update when someone changes the remote; you have to tell them to update, and you do so using git fetch <remote>, which pulls down the commits you don’t have, and adjusts the remote tracking branches to point to the right place
at that point your repository has commits from other people, but they aren’t yet incorporated into the code you see on your local branch
you can see which commits are behind which branches using git show-branch
you can then incorporate commits from remote tracking branches into your local branch using a range of options; here I’ll talk about git merge and git rebase, because they both play well in collaborative environments
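To make the difference between a remote and a remote tracking branch concrete, here is a small sketch you can run in a scratch directory. The repository names upstream and mine are made up for the illustration, and the empty commits just stand in for real work. It shows that origin/master stays put when the remote changes, and only moves once you fetch.

```shell
# Throwaway sandbox; "upstream" and "mine" are hypothetical repository names.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # use this article's branch name
git -C upstream commit -q --allow-empty -m "1 message"
git clone -q upstream mine
cd mine

git remote -v    # lists "origin" with its fetch and push URLs
git branch -r    # lists the remote tracking branches, including origin/master

before=$(git rev-parse origin/master)

# A collaborator adds a commit to the remote...
git -C ../upstream commit -q --allow-empty -m "2 message"
stale=$(git rev-parse origin/master)    # ...but our remote tracking branch has not moved

git fetch -q origin
after=$(git rev-parse origin/master)    # now it points at the new commit
local_tip=$(git rev-parse master)       # the local branch is untouched by the fetch

echo "origin/master before fetch: $stale"
echo "origin/master after fetch:  $after"
echo "local master:               $local_tip"
```

After the fetch the new commit is in your repository, but master still points where it did before, which is the situation the rest of this article deals with.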
For the rest of this article I’m only going to consider the common case of multiple people collaborating to make a single stream of releases (whether that’s open source software tagged and packaged, or perhaps commercial software that’s deployed to a company’s infrastructure, like a webapp). I also won’t consider what happens when merges or rebases fail and need manual assistance, as that’s a more complex topic.
Getting work from others
One of the key things you need to be able to do in collaborative development is to bring in changes that other people have made while you were working on your own changes. In git terms, this means that there’s a remote that contains some commits that you don’t have yet, and a local branch (in the repository you’re working with) that probably contains commits that the remote doesn’t have yet either.
First you need to get those commits into your repository:
$ git fetch origin
remote: Counting objects: 48, done.
remote: Total 48 (delta 30), reused 30 (delta 30), pack-reused 18
Unpacking objects: 100% (48/48), done.
From git://github.com/xapian/xapian
9d2c1f7..91aac9f master -> origin/master
The details don’t matter so much as that if there are no new commits for you from the remote, there won’t be any output at all.
Note that some git guides suggest using git pull here. When working with a lot of other people, that is risky, because it doesn’t give you a chance to review what they’ve been working on before accepting it into your local branch.
Say you have a situation that looks a little like this:
[1] -- [2] -- [3] -- [4] <--- HEAD of master
         \
          \-- [5] -- [6] -- [7] <--- HEAD of origin/master
(The numbers are just so I can talk about individual commits clearly. They actually all have hashes to identify them.)
What the above would mean is that you’ve added two commits on your local branch master, and origin/master (ie the master branch on the origin remote) has three commits that aren’t in your local branch.
You can see what state you’re actually in using git show-branch. The output is vertical instead of horizontal, but contains the same information as above:
$ git show-branch origin/master master
! [origin/master] 7 message
 * [master] 4 message
--
 * [master] 4 message
 * [master^] 3 message
+  [origin/master] 7 message
+  [origin/master^] 6 message
+  [origin/master~2] 5 message
+* [origin/master~3] 2 message
Each column on the left represents one of the branches you give to the command, in order. The top bit, above the line of hyphens, gives a summary of which commit each branch is at, and the bit below shows you the relationship between the commits behind the various branches. The things inside [] tell you how to address the commits if you need to; after them come the commit messages. (The stars * show you which branch you currently have checked out.)
From this it’s fairly easy to see that your local branch master has two commits that aren’t in origin/master, and origin/master has three commits that aren’t in your local branch.
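If you want the same information as numbers, or want to read the incoming commits before deciding what to do with them, git rev-list and git log can be pointed at the two branches. The sketch below rebuilds the example history in a scratch directory; the repository names, the commit helper and the file-N.txt files are all invented for the illustration.

```shell
# Throwaway sandbox; upstream, mine, commit() and file-N.txt are hypothetical.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

commit() {  # create commit N, touching only its own file
    echo "$1" > "file-$1.txt"
    git add "file-$1.txt"
    git commit -q -m "$1 message"
}

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # use this article's branch name
commit 1; commit 2
cd ..
git clone -q upstream mine
(cd upstream && commit 5 && commit 6 && commit 7)   # other people's work
cd mine
commit 3; commit 4                                   # your work
git fetch -q origin

# Commits only on master, then commits only on origin/master.
git rev-list --left-right --count master...origin/master

ahead=$(git rev-list --count origin/master..master)    # yours, not yet on the remote
behind=$(git rev-list --count master..origin/master)   # theirs, not yet in your branch

# Read what you are about to incorporate before incorporating it.
git log --oneline master..origin/master

# The commit where the two branches diverged.
base_subject=$(git log -1 --format=%s "$(git merge-base master origin/master)")
echo "ahead=$ahead behind=$behind diverged at: $base_subject"
```

The last line prints ahead=2 behind=3 diverged at: 2 message, matching the diagram.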
Incorporating work from others
So now you have commits from other people, and additionally you know that your master branch and the remote tracking branch origin/master have diverged from a common past.
There are two ways of incorporating that other work into your branch: merging and rebasing. Which to use depends partly on the conventions of the project you’re working on (some like to have a “linear” history, which means using rebase; some prefer to preserve the branching and merging patterns, which means using merge). We’ll look at merge first, even though a common thing to be asked to do to a pull request on github is to “rebase on top of master” or similar.
Merging to incorporate others’ work
Merging leaves two different chains of commits intact, and creates a merge commit to bind the changes together. If you merge the changes from origin/master in the above example into your local master branch, you’ll end up with something that looks like this:
[1] -- [2] -- [3] -- [4] --------- [8] <--- HEAD of master
         \                         /
          \-- [5] -- [6] -- [7] --/
You do it using git merge:
$ git merge origin/master
Updating 9d2c1f7..91aac9f
Fast-forward
.travis.yml | 26 ++++++++++++++++++++++++++
bootstrap | 10 ++++++++--
2 files changed, 34 insertions(+), 2 deletions(-)
create mode 100644 .travis.yml
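It will list all the changes in the remote tracking branch which were incorporated into your branch. (The output above happens to show a fast-forward, which is what git does when your local branch has no commits of its own and can simply be moved forward to match the remote tracking branch. When the two branches have diverged, as in the diagram, git creates the merge commit [8] instead.)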
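Here is a sketch of the diverged case, rebuilt in a scratch directory, so you can see the merge commit for yourself. As before, the repository names, the commit helper and the file-N.txt files are made up; --no-edit just accepts git’s default merge message so that the script doesn’t stop to open an editor.

```shell
# Throwaway sandbox; upstream, mine, commit() and file-N.txt are hypothetical.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

commit() {  # create commit N, touching only its own file so nothing conflicts
    echo "$1" > "file-$1.txt"
    git add "file-$1.txt"
    git commit -q -m "$1 message"
}

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # use this article's branch name
commit 1; commit 2
cd ..
git clone -q upstream mine
(cd upstream && commit 5 && commit 6 && commit 7)   # other people's work
cd mine
commit 3; commit 4                                   # your work
git fetch -q origin

old_master=$(git rev-parse master)

# Both sides have their own commits, so this creates merge commit [8].
git merge -q --no-edit origin/master

# A merge commit has two parents: your old branch tip and the merged-in tip.
parent_count=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
echo "parents of HEAD: $parent_count"

git log --graph --oneline   # draws the two chains joining at the merge commit
```

Commits [3] and [4] keep their original hashes here, which is the main practical difference from the rebase described next.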
Rebasing to incorporate others’ work
Rebasing takes the commits you’ve made since your local branch and the remote tracking branch diverged, and replays them on top of the current position of the remote tracking branch. For the example above you’d end up with something that looks like this:
[1] -- [2] -- [5] -- [6] -- [7] -- [3'] -- [4'] <--- new HEAD of master
Note that commits [3] and [4] have become [3'] and [4']: they’re actually recreated (meaning their hash will change), which is important as we’ll see in a minute.
You do this as follows:
$ git rebase origin/master
First, rewinding head to replay your work on top of it...
Applying: 3 message
Applying: 4 message
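The same scratch setup can be used to check both claims: that history ends up linear, and that your commits get new hashes. This is only a sketch; the repository names, the commit helper and the file-N.txt files are invented for the illustration.

```shell
# Throwaway sandbox; upstream, mine, commit() and file-N.txt are hypothetical.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

commit() {  # create commit N, touching only its own file so nothing conflicts
    echo "$1" > "file-$1.txt"
    git add "file-$1.txt"
    git commit -q -m "$1 message"
}

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # use this article's branch name
commit 1; commit 2
cd ..
git clone -q upstream mine
(cd upstream && commit 5 && commit 6 && commit 7)   # other people's work
cd mine
commit 3; commit 4                                   # your work
git fetch -q origin

old_tip=$(git rev-parse master)      # hash of [4]
git rebase -q origin/master
new_tip=$(git rev-parse master)      # hash of [4'], a different commit

# Newest first: 4, 3, 7, 6, 5, 2, 1 with no merge commits anywhere.
order=$(git log --format=%s | cut -d' ' -f1 | tr -d '\n')
merges=$(git rev-list --merges --count HEAD)

echo "history: $order (merge commits: $merges)"
echo "old tip: $old_tip"
echo "new tip: $new_tip"
```

The commit messages and file contents are unchanged, but because [3'] now has [7] as its parent rather than [2], it is a different commit as far as git is concerned.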
Some caution around rebasing
Rebasing is incredibly powerful, and some people get trigger happy and use it perhaps more often than they should. The problem is that, as noted above, the commits you rebase are recreated; this means that if anyone had your commits already and you rebase those commits, you’ll cause difficulties for those other people. In particular this can happen while using pull requests on github.
A good rule of thumb is:
you can rebase at any time up until the point when you submit code for review (either at the point you open the pull request, or the point where you ask people to look at it)
from then on, you shouldn’t rebase until everyone has finished reviewing the code, you have made changes based on those comments, and they have checked those changes to ensure their concerns have been addressed; if someone suggests a change which you then make, but you rebase in the process, it can be difficult for them to see what’s happened
when making changes based on pull request comments, you can use git commit --fixup <earlier commit> to quickly make a commit with a message that will be easy to flatten into the earlier commit just before merging the pull request
at the end of review, before a pull request is merged, you can do a final rebase (a lot of projects have a process where someone will explicitly prompt that this is the time to do so); that allows you both to ensure you’re properly integrated with the latest upstream code and to collapse “fixup” commits into the right place
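The fixup workflow in the last two points can be sketched as follows. The repository, file names and commit messages are made up. Setting GIT_SEQUENCE_EDITOR=true is only there so the script runs unattended by accepting the proposed rebase plan as it stands; normally you would let git open your editor and look the plan over.

```shell
# Throwaway sandbox; "work", the file names and the messages are hypothetical.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q work
cd work
echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Two commits that you have submitted for review.
echo "parse v1" > parser.txt
git add parser.txt
git commit -q -m "Add parser"
target=$(git rev-parse HEAD)

echo "tests" > tests.txt
git add tests.txt
git commit -q -m "Add tests"

# A reviewer asks for a change to the parser: record it as a fixup commit.
echo "parse v2" > parser.txt
git add parser.txt
git commit -q --fixup "$target"
fixup_subject=$(git log -1 --format=%s)   # "fixup! Add parser"

# End of review: the final rebase folds the fixup into "Add parser".
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --format=%s   # Add tests, Add parser, Initial commit
```

During review the fixup commit stays visible as a separate commit, so reviewers can see exactly what changed in response to their comments; it only disappears into the earlier commit at the final rebase.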
Rebasing during pull requests is discussed in this Thoughtbot article.
In summary
Most of the time, your cycle of work is going to look like this:
git add -p to add changes to the git stage
git commit -v to create commits out of those changes
git fetch to get others’ recent changes
git show-branch to see what those changes are
git merge or git rebase to incorporate those changes
Following that you can use git push and pull requests, or whatever other approach you need, to start the review process ahead of getting your changes applied.