May 2018

Level up your Git

Who is this for?
Why Git
The Git workflow
- Common workflows
Merging
- Pulling with --rebase
Git configuration
Rebasing
- Pulling with rebase
- Rebasing my local commits
Example Git configuration

Who is this for?

If you consider yourself a Git expert, well-versed in resolving merge conflicts and rebasing interactively, this isn't for you. This is for anyone who's been using Git in a basic day-to-day setting and wants some tips and tricks to make their Git experience a little bit nicer. This article includes some workflow and configuration recommendations that you might not be aware of, based on my personal experience and the kinds of issues we encounter at Cogapp.

Why Git

Git is a powerful distributed version-control system that has largely superseded predecessors like SVN and CVS. There are many reasons for this, but the most compelling for me are:

Speed
Distribution
Cheap branching model

Speed is important: we want our tools to help us be as productive as possible, and anyone who has worked on a large svn project will know the pain of waiting for large merges and checkouts. Much of Git's speed comes from it being a distributed version-control system: having all of the information about the codebase available locally means that common operations are vastly quicker.

Distribution can also be helpful in letting you work with your code on the move. The disadvantage of a centralised version-control system like svn is that unless you can talk to the central repository, you can't commit anything. Git allows you to work locally until you can talk to your origin server, making life that much simpler.

And in any developer workflow, having a cheap (and simple) branching and merging model is a godsend. It takes some getting used to after the reflexive fear of merging that svn instils (where forgetting to run a single command can spell disaster), but Git makes creating and merging feature branches (largely) issue-free. Because of the distributed model, it's also possible to try out your merges locally before pushing them, giving you added peace of mind.

The Git workflow

Any basic team-based workflow in Git will involve using a few core commands:

git clone
git checkout
git add
git commit
git push
git merge

Working from a core develop branch, a feature-branch-driven development workflow might look something like:

git clone git://...
git checkout -b my-feature-branch
# ... write code
git add .
git commit
# ... repeat until complete
git push -u origin my-feature-branch

Then, as a reviewer merging the feature branch:

git fetch
git checkout my-feature-branch
# ... review
git checkout develop
git pull
git merge my-feature-branch
git push
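If you want to rehearse this workflow without touching a real project, here's a minimal sketch that uses a throwaway bare repository as a stand-in for the shared origin. The directory names (origin.git, dev, reviewer), file names and commit messages are hypothetical, and it assumes the shared branch is called develop.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the feature-branch workflow against a local bare "origin".
set -euo pipefail

WORK=$(mktemp -d)
cd "$WORK"

# A bare repository stands in for the shared remote; its default branch is develop.
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/develop

# Developer: seed develop, then work on a feature branch.
git clone -q origin.git dev
cd dev
git config user.name "Dev"
git config user.email "dev@example.com"
git symbolic-ref HEAD refs/heads/develop
echo "base" > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin develop

git checkout -q -b my-feature-branch
echo "feature" > feature.txt
git add .
git commit -q -m "Add feature"
git push -q -u origin my-feature-branch   # first push sets the upstream

# Reviewer: fetch the branch, review it, then merge it into develop.
cd "$WORK"
git clone -q origin.git reviewer
cd reviewer
git config user.name "Reviewer"
git config user.email "reviewer@example.com"
git fetch -q
git checkout -q my-feature-branch   # creates a local branch tracking origin/my-feature-branch
git checkout -q develop
git pull -q
git merge -q --no-edit my-feature-branch   # --no-edit avoids opening an editor in a script
git push -q
```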

You'll get a long way with this basic set of commands, but there are plenty of things that you can do to make your life easier (both in terms of configuration and process) as your codebase gets larger and more complex. I'm going to cover a few of those here.

Common workflows

Having a defined workflow is important to ensure that everyone is working to the same branching model and knows what to do in the case of bugs or hotfixes. We use Gitflow (with a few tweaks), and I'd recommend it as a starting point for anyone who doesn't have a workflow already in place.

There are also variants of Gitflow that follow the same general feature-branching model.

Merging

When it goes wrong, merging is one of the more painful experiences you can have in Git (as in any version-control system). In my experience, two things are the trickiest to resolve:

Large diffs
Parallel changes to similar pieces of code

Issues with large diffs can be avoided by keeping your feature branches as compact as possible; if you work on Agile projects, this goes hand-in-hand with having small features. A branch is likely to be troublesome to merge if it is long-running (it has gone a long time without being merged) or if it changes a large number of lines of code.

Parallel changes, such as those from merged feature branches, are trickier, and in many cases will require human intervention (i.e. talking to the person who made the other set of changes) to resolve. However, there are some configuration changes you can make which may make this process simpler.

In the case of refactoring or introducing similar code, Git's default diff algorithm (Myers) can produce odd diffs, making it hard to decipher what has actually changed. Take this basic bit of PHP:

<?php

$config['foo'] = array(
  'foo',
  'bar',
  'baz',
);

$config['bar'] = array(
  'foo',
  'bar',
  'baz',
);

Now imagine we've refactored this by moving the bar block above foo and adding a new value to $config['foo']:

<?php

$config['bar'] = array(
  'foo',
  'bar',
  'baz',
);

$config['foo'] = array(
  'foo',
  'bar',
  'baz',
  'qux',
);

A standard git diff for this code doesn't come out quite as you'd expect:

diff --git a/foo.php b/foo.php
index 954387f..c4916ae 100644
--- a/foo.php
+++ b/foo.php
@@ -1,13 +1,14 @@
 <?php

-$config['foo'] = array(
+$config['bar'] = array(
   'foo',
   'bar',
   'baz',
 );

-$config['bar'] = array(
+$config['foo'] = array(
   'foo',
   'bar',
   'baz',
+  'qux',
 );

Whilst it correctly reflects what has physically changed here, it doesn't accurately represent the fact that we've really moved an entire block of code in the file. By using a more advanced diffing algorithm which tries to group together related changes, you can achieve a more human-readable diff. With git diff --histogram:

diff --git a/foo.php b/foo.php
index 954387f..c4916ae 100644
--- a/foo.php
+++ b/foo.php
@@ -1,13 +1,14 @@
 <?php

-$config['foo'] = array(
-  'foo',
-  'bar',
-  'baz',
-);
-
 $config['bar'] = array(
   'foo',
   'bar',
   'baz',
 );
+
+$config['foo'] = array(
+  'foo',
+  'bar',
+  'baz',
+  'qux',
+);

This diff, whilst larger than the first, more accurately represents the nature of the change: instead of three disparate lines changing, we can see that an entire block of code has been moved. The downside is that we've lost the representation of what has changed within $config['foo'], which may be less helpful in larger diffs. However, in many cases I find this kind of diff much more helpful for seeing where blocks of code have been moved or introduced, particularly with auto-generated code, which often has these kinds of changes. Small line-by-line changes are still shown as such, keeping your diff visually clear.
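To compare the two algorithms on your own machine, here's a small sketch that recreates the before and after versions of the PHP example in a temporary directory and diffs them with each algorithm. It relies on git diff --no-index, which works outside a repository; the file names old.php and new.php are placeholders, and the exact hunks you see may vary between Git versions.

```shell
#!/usr/bin/env bash
# Sketch: compare Git's default (Myers) diff with histogram on the PHP example.
set -euo pipefail

DIR=$(mktemp -d)
cd "$DIR"

cat > old.php <<'EOF'
<?php

$config['foo'] = array(
  'foo',
  'bar',
  'baz',
);

$config['bar'] = array(
  'foo',
  'bar',
  'baz',
);
EOF

cat > new.php <<'EOF'
<?php

$config['bar'] = array(
  'foo',
  'bar',
  'baz',
);

$config['foo'] = array(
  'foo',
  'bar',
  'baz',
  'qux',
);
EOF

diff_with() {
  # git diff --no-index exits with 1 when the files differ,
  # so only treat other non-zero statuses as errors.
  git diff --no-index --no-color --diff-algorithm="$1" old.php new.php || [ $? -eq 1 ]
}

diff_with myers > myers.diff
diff_with histogram > histogram.diff

echo "=== myers ===";     cat myers.diff
echo "=== histogram ==="; cat histogram.diff
```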

You can set histogram as the default diffing algorithm with:

git config --global diff.algorithm histogram

You can also use it for pulls and merges with:

git merge --strategy=recursive -Xdiff-algorithm=histogram
git pull --strategy=recursive -Xdiff-algorithm=histogram

I don't believe this can be set as a global default. Some answers point to a core.mergeoptions configuration option, but from what I can tell this only exists per branch, as branch.<name>.mergeOptions.
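The per-branch option can at least cover your main integration branch. Here's a sketch in a throwaway repository that sets branch.develop.mergeOptions so that merges into develop use the same options as the commands above, then merges a feature branch into it. The repository, file and branch names are hypothetical. As far as I'm aware, Git 2.34 and later default to the ort merge strategy, which the Git documentation says always uses histogram, so this mainly matters on older versions.

```shell
#!/usr/bin/env bash
# Sketch: default merge options for one branch (develop) via branch.<name>.mergeOptions.
set -euo pipefail

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example"
git config user.email "example@example.com"

printf 'one\ntwo\n' > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Applies whenever we merge *into* develop (values may not contain whitespace).
git config branch.develop.mergeOptions "--strategy=recursive -Xdiff-algorithm=histogram"

git checkout -q -b my-feature-branch
printf 'one\ntwo\nthree\n' > file.txt
git commit -q -am "Add line three"

git checkout -q develop
printf 'zero\none\ntwo\n' > file.txt
git commit -q -am "Add line zero"

# Picks up the options above, as if they were passed on the command line.
git merge -q --no-edit my-feature-branch
cat file.txt
```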

Pulling with --rebase

When pulling code changes, I'd strongly recommend taking advantage of the --rebase flag and pulling with git pull --rebase. Rebasing can be a scary thing in Git, and I'll talk about it in more detail later on, but it can also make your life simpler if used in a controlled manner and without violating the rebase prime directive.

The manual explains rebasing in more detail, but in a nutshell, rebasing replays any commits you've made locally on top of the remote commits that you've not yet pulled down. This means that the branch doesn't fill up with pointless merge commits every time you want to reintegrate remote changes, while still giving you the chance to resolve any conflicts.
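To see the effect for yourself, here's a minimal sketch using a throwaway bare repository and two hypothetical clones, alice and bob. Both commit to develop; bob hasn't pulled alice's latest commit, so he integrates it with git pull --rebase. His local commit ends up on top of hers and no merge commit is created. All names here are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: integrate remote commits with `git pull --rebase` and keep history linear.
set -euo pipefail

WORK=$(mktemp -d)
cd "$WORK"
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/develop

# Helper (hypothetical): clone origin into a directory and set a local identity.
make_clone() {
  git clone -q origin.git "$1"
  git -C "$1" config user.name "$1"
  git -C "$1" config user.email "$1@example.com"
}

make_clone alice
cd alice
git symbolic-ref HEAD refs/heads/develop
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git push -q -u origin develop

cd "$WORK"
make_clone bob   # bob starts from the same commit as alice

# alice pushes a new commit...
cd "$WORK/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
git push -q

# ...while bob commits locally without pulling first.
cd "$WORK/bob"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"

# Replay bob's local commit on top of alice's instead of merging.
git pull -q --rebase
git push -q
git log --oneline
```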

Git configuration

Whilst the basic settings that come with Git are perfectly usable, there are some changes you can make which will improve your day-to-day experience.

Autocompletion

The Git documentation provides an autocompletion script to make your life easier when you can't quite remember the name of the branch you're looking for. You can download and set this up following the instructions in the documentation.

Whitespace

Setting core.autocrlf to the correct value for your platform will eliminate issues arising from mixed or conflicting line endings. For my purposes (working on Linux and OSX) I want LF line endings in the repository, so I set input, which converts CRLF to LF when files are added and does no conversion on checkout:

git config --global core.autocrlf input
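A quick way to check what input does is to commit a file with Windows-style line endings in a throwaway repository and inspect what Git actually stored. This is a sketch: the file name is a placeholder, and the setting is applied with --local so it doesn't touch your global configuration.

```shell
#!/usr/bin/env bash
# Sketch: with core.autocrlf=input, CRLF line endings are stored as LF.
set -euo pipefail

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config --local core.autocrlf input

CR=$(printf '\r')   # a carriage return character, for grep
printf 'line one\r\nline two\r\n' > crlf.txt

git add crlf.txt   # Git may warn that CRLF will be replaced by LF
git commit -q -m "Add file with CRLF line endings"

# Compare the committed blob with the working-tree file.
git cat-file -p HEAD:crlf.txt > stored.txt
if grep -q "$CR" stored.txt; then echo "repository: CRLF"; else echo "repository: LF"; fi
if grep -q "$CR" crlf.txt;   then echo "working tree: CRLF"; else echo "working tree: LF"; fi
```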

Setting core.whitespace can help flag up whitespace issues. The options enabled by default are blank-at-eol, blank-at-eof and space-before-tab, which flag trailing whitespace at the end of lines, blank lines at the end of a file, and spaces immediately before a tab in a line's indentation, respectively. If you indent with spaces (and since it's 2018, why wouldn't you?) then you might want to use tab-in-indent instead, with something like:

git config --global core.whitespace blank-at-eol,blank-at-eof,tab-in-indent,-space-before-tab

If you're some kind of tab-loving lunatic, you might instead use:

git config --global core.whitespace blank-at-eol,blank-at-eof,indent-with-non-tab,space-before-tab

The space-before-tab option is interesting because it takes the middle road: it lets you use either tabs or spaces, with a small amount of sanity-checking if you've started a line's indentation with spaces and then switched to tabs. However, there are still plenty of ways you could end up mixing indentation, so I'd suggest just picking one option and rigorously enforcing it.

As with all other settings in this document you can set them on a per-project basis by exchanging --global for --local, in case some of your projects use tabs and others use spaces.
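To see these settings in action, git diff --check reports whitespace errors according to core.whitespace and exits with a non-zero status when it finds any. The sketch below applies the spaces-only configuration to a throwaway repository with --local, then checks a line indented with a tab and the same line indented with spaces; the file name and contents are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: flag tab indentation via core.whitespace and `git diff --check`.
set -euo pipefail

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
# Spaces-only settings, applied to this repository only.
git config --local core.whitespace blank-at-eol,blank-at-eof,tab-in-indent,-space-before-tab

printf 'function foo() {\n}\n' > foo.js
git add foo.js
git commit -q -m "Add foo"

# Indent the new line with a tab: tab-in-indent should flag it.
printf 'function foo() {\n\treturn 1;\n}\n' > foo.js
if git diff --check; then TAB_RESULT=clean; else TAB_RESULT=flagged; fi

# Indent with two spaces instead: nothing should be reported.
printf 'function foo() {\n  return 1;\n}\n' > foo.js
if git diff --check; then SPACE_RESULT=clean; else SPACE_RESULT=flagged; fi

echo "tab indent: $TAB_RESULT, space indent: $SPACE_RESULT"
```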

Ignored files

It's standard practice to create a .gitignore file for your project, but it's also possible to create a global .gitignore file which will apply to every project on your machine. GitHub has a guide on creating various kinds of .gitignore files, with some recommended lists of files to exclude. The basic process is:

touch ~/.gitignore_global
git config --global core.excludesfile ~/.gitignore_global

After this you can add your ignored files to ~/.gitignore_global in the same way you would a project .gitignore file.

It's important to note that these settings will not apply to other developers, so they should only be used for files specific to your system or tools: for example, .DS_Store files if you're on OSX, or the .idea directory if you use PhpStorm, but not *.pyc files if your project is written in Python.
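You can confirm which ignore file is responsible for a path with git check-ignore -v. The sketch below uses a throwaway repository and sets core.excludesfile for individual commands with git -c, pointing it at a temporary file that stands in for ~/.gitignore_global, so your real configuration is left alone. The paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: ignore OS/editor files via a personal excludes file rather than .gitignore.
set -euo pipefail

TMP=$(mktemp -d)
EXCLUDES="$TMP/gitignore_global"   # stand-in for ~/.gitignore_global
printf '.DS_Store\n.idea/\n' > "$EXCLUDES"

git init -q "$TMP/repo"
cd "$TMP/repo"
touch .DS_Store main.py
mkdir .idea
touch .idea/workspace.xml

# -c sets core.excludesfile for this one invocation only.
# -v shows which file and pattern caused each path to be ignored.
git -c core.excludesfile="$EXCLUDES" check-ignore -v .DS_Store .idea/workspace.xml
git -c core.excludesfile="$EXCLUDES" status --porcelain
```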

Merge tool

If you resolve merge conflicts without a merge tool or an IDE, you might find it helpful to configure one for use with git mergetool. Git supports a range of tools out of the box, selected with the merge.tool config option.
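For example, to have git mergetool open vimdiff (one of the tools Git supports out of the box), you could set merge.tool as below; git mergetool --tool-help lists the tools Git recognises and which of them are available on your machine. vimdiff is just an example choice, and the sketch uses a throwaway repository so the setting stays out of your real configuration.

```shell
#!/usr/bin/env bash
# Sketch: choose the tool that `git mergetool` launches for conflicted files.
set -euo pipefail

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"

# In practice you'd likely use --global; --local keeps this example contained.
git config --local merge.tool vimdiff

git config --get merge.tool
```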

Rebasing

Broadly speaking, git rebase is a tool that lets you rewrite the history of a branch by moving or modifying commits. It is very powerful, and as with many Git commands can be invoked in ways that cause you tremendous pain. However, if used carefully it can be another tool to make your life a little bit nicer when using Git.

I would only recommend using git rebase more widely once you've played around with it a little and are happy that you understand the potential side-effects and how to back out of a given rebase operation. Whilst some people advocate for things like squashing branches and rebasing shared branches, which can affect other users, if you're new to rebasing I'd instead recommend a single prime directive:

Don't rebase pushed commits
Me, 2018

However, there are a couple of cases where I believe rebasing can be very useful in terms of keeping your Git history that little bit saner.

Pulling with rebase

As mentioned above, I always pull with --rebase if I've made some local commits and need to integrate commits from the remote branch. This is effectively the same as running git fetch and then rebasing my local commits onto the remote branch.

The reason for doing this is to prevent the remote branch from filling up with commits that merge it into itself. I'd have to resolve any merge conflicts anyway with a regular pull, so this extra cleanliness costs me no extra effort.
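If you'd rather not type the flag every time, Git's pull.rebase setting makes a plain git pull rebase by default. Here's a minimal sketch, applied with --local to a throwaway repository so it doesn't touch your global configuration (you'd normally use --global):

```shell
#!/usr/bin/env bash
# Sketch: make `git pull` rebase local commits by default.
set -euo pipefail

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"

# With this set, `git pull` behaves like `git pull --rebase`:
# fetch, then replay local commits on top of the upstream branch.
git config --local pull.rebase true

git config --get pull.rebase
```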

Rebasing my local commits

I have a very rapid committing philosophy when working on a piece of code locally, and will make many small, incremental commits once I've got something to a useful stage. The only rule is that each part should be committed in isolation: if I've updated a feature, some documentation, and the tests, I'll make one commit for the feature change, one for the documentation change, and one for the test change.

Before I push these commits, I'll run back through them with git rebase -i and tidy them up into some logical units. For example, I'll take a series of commits that look like this:

Added specification for Foo widget
Added test plan for Foo widget
Added first pass of Foo widget
Added basic test suite for Foo widget
Added corner-case handling
Added additional corner-case handling
Updated logical flow to speed up case Y
Updated automated tests
Fixed linting issues
Added documentation for Foo widget
Reverted change X following discussion
Updated automated tests
Updated test plan

And squash them down to something like this:

Added specification for Foo widget
Added test plan for Foo widget
Added Foo widget and automated test suite
Added documentation for Foo widget

Because I've got many small, logically distinct commits, it's easy to squash related ones together. I've even squashed together two different types of commit (feature code and automated tests), since it makes sense to group them.
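The same kind of squash can be scripted, which is handy for experimenting in a throwaway repository before trying it on real work. In this sketch the file names and commit messages are hypothetical, and GIT_SEQUENCE_EDITOR is set to a sed command that stands in for you editing the rebase todo list by hand: it keeps the first pick and turns every later line into fixup, folding those commits into the first one. Interactively you'd make the same change in your editor, using squash instead of fixup if you want to combine the commit messages.

```shell
#!/usr/bin/env bash
# Sketch: squash a run of local commits with `git rebase -i`, non-interactively.
set -euo pipefail

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Helper (hypothetical): write a file and commit it with the given message.
make_commit() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

make_commit spec.md        "Added specification for Foo widget"
make_commit widget.js      "Added first pass of Foo widget"
make_commit widget.test.js "Added basic test suite for Foo widget"
make_commit widget.js      "Fixed linting issues"

# Edit the todo list for the last three commits: keep line 1 as `pick`,
# turn lines 2 onwards into `fixup` (-i.bak works with GNU and BSD sed).
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -q -i HEAD~3

# Give the combined commit a message describing the whole unit of work.
git commit -q --amend -m "Added Foo widget and automated test suite"

git log --format=%s
```

On a real branch, git rebase -i @{upstream} limits the rewrite to commits that aren't on the upstream branch yet, which keeps you within the prime directive.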

This not only makes more sense visually, but it also makes it vastly simpler for someone reviewing the code to step through each commit and be confident that they're not reviewing something that will change a few commits later because I've made an additional commit with something I forgot.

There are a couple of caveats with this approach. It's still limited in that if you push these commits and then need to add something else, you'll have to add a separate commit (except on branches you have sole ownership of, where you can rebase the pushed commits). You also need to balance tidying up locally against how often you push code, since you can't always complete a whole feature locally and push it only once it's done.

Example Git configuration

Bringing together some of the configuration from above, here's a basic set of configuration you may find useful. Replace the name and email with your own details (for example, those you use for your GitHub account), and place this in your ~/.gitconfig file:

[user]
  name = Jane Doe
  email = foo@example.com

[core]
  autocrlf = input
  whitespace = blank-at-eol,blank-at-eof,tab-in-indent,-space-before-tab

[diff]
  algorithm = histogram
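To check that the file parses as you expect, you can point git config --file at a copy and list its settings. This sketch writes the example configuration to a temporary file rather than your real ~/.gitconfig.

```shell
#!/usr/bin/env bash
# Sketch: check that a gitconfig file parses and contains the expected values.
set -euo pipefail

CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
[user]
  name = Jane Doe
  email = foo@example.com

[core]
  autocrlf = input
  whitespace = blank-at-eol,blank-at-eof,tab-in-indent,-space-before-tab

[diff]
  algorithm = histogram
EOF

# --list prints every setting as section.key=value.
git config --file "$CONFIG" --list
```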