If you want to make sense of your git history more easily when you go
back, try rebasing as you go. Rebasing while collaborating with others
can get hairy quickly. But here’s one way to make it work.
There are manywaystousegit. It’s sometimes said
that git is not actually a version control system, but rather a
content-addressable filesystem, used to track directory trees. So,
given that it’s only a version control system inasmuch as you impose a
methodology on it, it’s worth it to play with some different
methodologies.
One common way to use git is with a lot of merging between branches, but
if you don’t like the tangled web of history that creates, I’m going to
propose a rebase-centric model. It’s a trade-off: a little harder to
make, but a lot easier to understand when you need to come back to it
later.
Now that you have enough links to take you away from this post for a day
at least, let’s get started.
First, let’s unpack what I said above. Be prepared for some graph theory
terms. You’ve seen these sorts of charts in every bit of git
documentation you look at, right? :
This is a simple directed acyclic graph. Each node (except for D, the
root node) has one or more parents. No node is an ancestor of itself.
But in git, these nodes aren’t just abstractions: they carry data.
Specifically, they carry change sets and commit messages.
Because there are changesets in the nodes, that implies that a traversal
from the root node to any other node will result in a description of a
certain end-state for the files in the repository, and that a traversal
to the terminal node marked with a particular branch will result in a
description of the files in the latest state of that branch. If the
nodes were re-arranged, that could change the resulting state, depending
on the particular changesets involved.
When you rebase, you “rewrite history” and transform, for example, this
DAG:
The commits in the topic branch are labeled with prime signs after the
rebase to denote that they are, in a sense, the “same” commits, but
because they have a different ancestry, describe a different set of
states – specifically, states that incorporate commits F and G. If there
are conflicts between the rebased commits and their new ancestry, git
will prompt you to resolve those before continuing the rebase.
All of this, so far, is just to explain rebasing so you can feel really
comfortable with it. Rewriting your own history is fine and safe. But
what we really care about is rewriting history that other people can
access.
Pretty much everything written about git warns you not to rebase any
commits that are shared with other people or systems. It is presented as
a cardinal sin and something that will get you shunned by all serious
git users. So, let’s talk about how to do it!
The fundamental reason you’re enjoined against doing it is that it
generates more work for everyone else to get their local repositories
into a good state again. That is not something I mean in any way to
contradict. If you are going to be pushing rewritten history to a shared
remote, you must have everyone on-board and with a common
understanding of the process you’re using and how to keep their local
histories in order.
I find that the best way to avoid nasty situations with rewritten
history consists of two principles:
Maintain sole ownership of branches: only one person writes to a
given branch, and if you need to collaborate, pull requests into the
other person’s branch are the way to go.
Maintain a strong distinction between read-only upstream branches
and writable feature branches.
These principles won’t work for every team and every situation, so you
have to decide if this approach is right for you.
Let’s see what this looks like in practice.
Let’s say Bao and Robin are collaborating on a project, through GitHub.
There are many ways to manage the remotes, but we’ll take a simple
approach right now: they each have a copy of the repo on their own
computers, and they are sharing one repo on GitHub. When Bao runs
git remote -v, they get:
And Robin would see the same. They can each push to that remote, too,
but they have an understanding that they won’t push branches that don’t
belong to them.
As they work on the project, they will end up with branches that look
like this:
Black commits are on master, blue are by Bao, coral are by Robin. They
both start their work off of the current state of master. Bao has a
sort of short and straightforward branch. Robin is working on something
longer and more complex, and Bao jumps in to help a little, branching
off of Robin’s branch when it’s at commit G, while Robin keeps working
on it. Bao will eventually pull-request their changes back in.
But what if Robin realizes that their work really depends on the work
Bao did and already merged into master? They can just rebase their
branch on to that work:
On Bao’s machine, this looks like:
$ git fetch
$ # Update local information on upstream branch.
$ # Because Bao *never* commits on this branch, every merge
$ # should be a fast-forward merge, but let's use --ff-only
$ # just to be sure:
$ git checkout robin-feature &&git merge --ff-only
$ # See what the world looks like at this moment:
$ git checkout bao-fix
$ git branch -vv
* bao-fix ca1f618 [robin-feature: ahead 4, behind 9] short message
robin-feature fc58298 [origin/robin-feature] short message
master d1ef2a3 [origin/master] Merge Bao's work
This leaves Bao’s branch attached to the old commit G, which had been in
Robin’s branch before the rebase. But because Bao’s branch tracks
Robin’s branch as its upstream, Bao can, with fresh remote tracking info
on their local computer, just run git rebase and get this:
Again, on Bao’s machine:
$ # Because the bao-fix branch has robin-feature as an upstream:
$ git rebase
$ git branch -vv
* bao-fix ca1f618 [robin-feature: ahead 2] short message
robin-feature fc58298 [origin/robin-feature] short message
master d1ef2a3 [origin/master] Merge Bao's work
Note that Bao’s work is now coming off of K’, not G (or G’). This is
because it tracks Robin’s branch as its upstream, not a specific
commit in that branch. Since branches can wholly change what commits
they consist of, this is both necessary and useful.
Bao can force-push that back up to GitHub (because only Bao writes to
that branch, this is safe), Robin can merge it in, and then the whole
thing can be merged back into master:
Note: if anyone has merge conflicts at any point, they have to resolve
them, and those rebased commits (with the prime marks) can differ from
their original renditions by whatever it takes to resolve that merge
conflict.
This approach can make your git history much more useful as a
historical artifact, as a way to see not just what the code was like at
any given point, but what the intent of that state of the code was. But
unless everyone on the team is on board with this, and understands it,
you risk the proverbial shooting yourself in the foot.
If you use git, you should be familiar with how to use the reflog to
back yourself out of ah-damn-what-did-I-just-do situations. That is
still true here; if you are not perfectly comfortable with rebase,
having a way to undo is crucial.
There are some git defaults you may want to set to make this pattern
easier:
Honestly, they’re useful defaults to set in any case! (Hat-tip to Owen
for these defaults, and a lot of thinking about git!)
Try it out on a small project, and see if you like it. It’s even better
if you write good commit messages. If you don’t go spelunking through
your git history often to understand past choices, then let this be your
excuse.
We actually use a merge-centric flow, the kind you are likely more
familiar with! We avoid pushing rebased branches to remotes, and don’t
have strong ownership of branches. If more than one of us is working on
something at a time, whoever pushed to the remote first will have their
work merged in to whatever anyone else is doing, before that other
person pushes the result of their merged work to the remote.
This is convenient enough for us, but it really is motivated by the
shape of our team; we have specialists all along the
design-frontend-backend continuum, and frequently want to rapidly
intertwine our work many times in a day as we coordinate changes all
along that spectrum.
Still, some of us have used this rebase model successfully before and
think it’s worth trying out!
For many years, it has been ‘best practice’ to use relative units (especially em and rem) for sizing text. That’s great! But after playing around with my user preferences, I think we can improve on the common approaches.
It is frustrating to track down why an anchor isn’t being found. I’ve found a simple way that should work in most cases. If that doesn’t work, step through the checklist, and then dive in to get a better understanding of how Anchor Positioning works.