Monday, March 30, 2015

Fun with Non-Fast-Forward

Your push may fail due to “non fast-forward”. You start from a history that is identical to that of your upstream, commit your work on top of it, and then by the time you attempt to push it back, the upstream may have advanced because somebody else was also working on their own changes.


For example, between the upstream and your repositories, histories may diverge this way (the asterisk denotes the tip of the branch; the time flows from left to right as usual):


Upstream                                You


---A---B---C*      --- fetch -->        ---A---B---C*


                                                    D*
                                                   /
---A---B---C---E*                       ---A---B---C


            D?                                      D*
           /                                       /
---A---B---C---E?   <-- push ---        ---A---B---C


If the push moved the branch at the upstream to point at your commit, you would be discarding other people’s work. To avoid doing so, git push fails with “non fast-forward”.
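The rejection can be reproduced in throwaway repositories. The following is a minimal sketch, not a recipe from the original discussion: the directory names (`seed`, `upstream.git`, `you`, `other`), the branch name `main` and the `demo` identity are all assumptions made for illustration, and the commits are empty so that only the shape of the history matters.

```shell
#!/bin/sh
# Sketch: reproduce a rejected push in scratch repositories.
# All repository, branch and identity names here are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d) && cd "$demo"

# Build ---A---B---C and publish it as the upstream.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for c in A B C; do git -C seed commit -q --allow-empty -m "$c"; done
git clone -q --bare seed upstream.git

# Two participants start from the same history.
git clone -q upstream.git you
git clone -q upstream.git other

# You commit D; somebody else commits E and pushes first.
git -C you commit -q --allow-empty -m D
git -C other commit -q --allow-empty -m E
git -C other push -q origin main

# Accepting your push would discard E, so the upstream refuses it.
if git -C you push -q origin main 2>push.err; then
    push_result=accepted
else
    push_result=rejected
fi
echo "push $push_result"
```

The exact wording of the refusal (kept in `push.err` here) depends on the Git version and on whether you have already fetched, so the sketch only checks the exit status.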


The standard recommendation when this happens is to “fetch, merge and then push back”. The histories will diverge and then converge like this:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       1
                                                    D---F*
                                                   /   /2
---A---B---C---E*                       ---A---B---C---E


               1                                       1
            D---F*                                  D---F*
           /   /2                                  /   /2
---A---B---C---E    <-- push ---        ---A---B---C---E


Now, the updated tip of the branch has the previous tip of the upstream (E) as its parent, so the overall history does not lose other people’s work.


The resulting history, however, is not what the majority of the project participants would appreciate. The merge result records D as its first parent (denoted with 1 on the edge to the parent), as if what happened on the upstream (E) were done as a side branch while F was being prepared and pushed back. In reality, E in the illustration may not be a single commit but can be many commits and many merges done by many people, and these many commits may have been observed as the tips of the upstream’s history by many people before F got pushed.

Even though Git treats all parents of a merge equally at the level of the underlying data model, users have come to expect that the history they see by following the first-parent chain tells the overall picture of the shared project history, while second and later parents of merges represent work done on side branches. From this point of view, "fetch, merge and then push" is not quite the right suggestion for recovering from a push that failed due to "non fast-forward".


It is tempting to recommend “fetch, merge backwards and then push back” as an alternative, and it almost works for a simple history:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       2
                                                    D---F*
                                                   /   /1
---A---B---C---E*                       ---A---B---C---E


               2                                       2
            D---F*                                  D---F*
           /   /1                                  /   /1
---A---B---C---E    <-- push ---        ---A---B---C---E


Then, if you follow the first-parent chain of the history, you will see how the tip of the overall project progressed. This is an improvement over “fetch, merge and then push back”, but it has a few problems.
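The difference between the two merge directions shows up directly in “git log --first-parent”. Here is a small sketch in a single scratch repository; the branch names `mine`, `theirs`, `forward` and `backward` are hypothetical, with `theirs` standing in for the fetched upstream tip, and the helper `chain` exists only for this illustration.

```shell
#!/bin/sh
# Sketch: compare the first-parent chains left by the two merge directions.
# Branch names and the chain() helper are made up for this example.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/mine
for c in A B C; do git commit -q --allow-empty -m "$c"; done

# "theirs" plays the upstream; it gains E while "mine" gains D.
git branch theirs
git commit -q --allow-empty -m D
git checkout -q theirs
git commit -q --allow-empty -m E

# Print the subjects along the first-parent chain of a branch, newest first.
chain() { git log --first-parent --format=%s "$1" | paste -sd' ' -; }

# "fetch, merge": stand on D and merge E, so D is the first parent of F.
git checkout -q -b forward mine
git merge -q -m F theirs

# "merge backwards": stand on E and merge D, so E is the first parent of F.
git checkout -q -b backward theirs
git merge -q -m F mine

forward_chain=$(chain forward)
backward_chain=$(chain backward)
echo "forward:  $forward_chain"    # F D C B A
echo "backward: $backward_chain"   # F E C B A
```

Only the backward merge keeps E, the commit everybody else saw as the tip of the upstream, on the first-parent chain.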


One reason why “merge backwards” is wrong becomes apparent when you consider what should happen when the push fails for the second time after the backward merge is made:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       2
                                                    D---F*
                                                   /   /1
---A---B---C---E*                       ---A---B---C---E


               2                                       2
            D---F?                                  D---F*
           /   /1                                  /   /1
---A---B---C---E---G    <-- push ---    ---A---B---C---E


               2                                       2   2
            D---F?                                  D---F---H*
           /   /1                                  /   /1  /1
---A---B---C---E---G    --- fetch -->   ---A---B---C---E---G


If the upstream side gained another commit G while F was being prepared, “fetch, merge backwards and then push” will end up creating a history like this, hiding D, the only real change you made in the repository, as the tip of the side branch of a side branch!

It also does not solve the problem if the work you did in D is not a single strand of pearls, but has merges from side branches. If D in the above series of illustrations were a few merges X, Y and Z from side branches of independent topics, the picture on your side, after fetching E from the updated upstream, may look like this:


    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z*
   /       /
---A---B---C---E


That is, hoping that other people would stay quiet, you started from C, merged three independent topic branches on top of it with merges X, Y and Z, and hoped that the overall project history would fast-forward to Z. From your perspective, you wanted A-B-C-X-Y-Z to be the main history of the project, while x, y, ... were implementation details of X, Y and Z that are hidden behind merges on side branches. And if there were no E, that would indeed have been the overall project history people would have seen after your push.

Merging backwards and pushing back would, however, make F the tip of the history, with E as its first parent, and turn Z into a side branch. The fact that X, Y and Z (more precisely, X^2 and Y^2 and Z^2) were independent topics is lost by doing so:


    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z
   /       /         \2
---A---B---C---E-------F*
                     1



So "merge backwards" is not the right solution in general. It is only valid if you are building a topic directly on top of the shared integration branch, which is something you should not be doing in the first place. In the earlier illustration of creating a single D on top of C and pushing it, if there were no work from other people (i.e. E), the push would have fast-forwarded, making D a normal commit directly on the first-parent chain. If there were work from other people like E, “merge backwards” would instead have recorded D on a side branch. If D is a topic separate and independent from other work being done in parallel, you would want such a change to appear consistently as a merge of a side branch, whether or not somebody else happened to push first.

A better recommendation might be to “fetch, rebuild the first-parent chain, and then push back”. That is, you would rebuild X, Y and Z (i.e. “git log --first-parent C..”) on top of the updated upstream E:


    y---y-------y   .
   /             \   .
  .   x-------x   \   \
 .   /         \   \   \
.   /           X’--Y’--Z’*
   /           /
---A---B---C---E


Note that this will work well naturally even when your first-parent chain has non-merge commits. For example, X and Y in the above illustration may be merges while Z is a regular commit that updates the release notes with descriptions of what was recently merged (i.e. X and Y). Rebuilding such a first-parent chain on top of E will make the resulting history very easy to understand when the reader follows the first-parent chain.
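To see which commits make up the chain that would need rebuilding, “git log --first-parent C..” can be tried on a small scratch history. This is only a sketch under assumed names: the tag `C`, the branches `main`, `topic-x` and `topic-y`, and the empty commits are all invented for illustration. X and Y are merges and Z is a regular commit, as in the example just described.

```shell
#!/bin/sh
# Sketch: list the first-parent chain that a rebuild would have to recreate.
# Tag, branch and commit names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
for c in A B C; do git commit -q --allow-empty -m "$c"; done
git tag C

# Topic x (two commits) is merged as X.
git checkout -q -b topic-x C
git commit -q --allow-empty -m x1
git commit -q --allow-empty -m x2
git checkout -q main
git merge -q --no-ff -m X topic-x

# Topic y (one commit) is merged as Y.
git checkout -q -b topic-y C
git commit -q --allow-empty -m y1
git checkout -q main
git merge -q --no-ff -m Y topic-y

# Z is a regular commit, e.g. a release notes update.
git commit -q --allow-empty -m Z

# Oldest first: the commits to rebuild on top of the updated upstream.
to_rebuild=$(git log --first-parent --reverse --format=%s C.. | paste -sd' ' -)
# Everything made since C, including commits on the side branches.
all_count=$(git rev-list --count C..)
echo "to rebuild: $to_rebuild"        # X Y Z
echo "commits since C: $all_count"    # 6
```

The side-branch commits x1, x2 and y1 are counted by the plain traversal but never appear in the first-parent listing; they come along unchanged as second parents when X and Y are rebuilt.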

The reason why “rebuild the first-parent chain on the updated upstream” works best is tautological. People do care about the first-parenthood when viewing the history, and you must have cared about the first-parent chain, too, when building your history leading to Z. That first-parenthood you and others care about is what is being preserved here. By definition, we cannot go wrong ;-)

And of course, this will work against a moving upstream that gained new commits while we were fixing things up on our end, because we won't be piling new merges on top, but will be rebuilding X', Y' and Z' into X'', Y'', and Z'' instead.

To make this work on the pusher’s end, after seeing the initial “non fast-forward” refusal from “git push”, the pusher may need to do something like this:


$ git push ;# fails
$ git fetch
$ git rebase --first-parent @{upstream}


Note that “git rebase --first-parent” does not exist yet; it is one of the topics I would like to see resurrected from old discussions.

But until "rebase --first-parent" materialises, in the scenario illustrated above, the pusher can run these commands instead:


$ git reset --hard @{upstream}
$ git merge X^2
$ git merge Y^2
$ git merge Z^2


And then, inspect the result thoroughly, as carefully as you checked your work before you attempted the first push that was rejected. After that, hopefully your history will fast-forward the upstream and everybody will be happy.
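The manual recipe can be exercised end to end in scratch repositories. The sketch below makes several assumptions that are not part of the recipe itself: every commit on your first-parent chain is a merge (so that `^2` always exists), the old tip is kept reachable under a hypothetical branch name `old-tip` so the merges can still be named after the reset, and all repository, branch and topic names are invented.

```shell
#!/bin/sh
# Sketch: rebuild a first-parent chain of merges on top of an updated upstream.
# Assumes every commit on the chain is a merge; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d) && cd "$demo"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for c in A B C; do git -C seed commit -q --allow-empty -m "$c"; done
git clone -q --bare seed upstream.git
git clone -q upstream.git you
git clone -q upstream.git other

# Somebody else pushes E while you are merging your topics.
git -C other commit -q --allow-empty -m E
git -C other push -q origin main

cd you
for t in x y z; do
    git checkout -q --no-track -b "topic-$t" origin/main
    git commit -q --allow-empty -m "$t"
    git checkout -q main
    # Record the merges as X, Y and Z on the first-parent chain.
    git merge -q --no-ff -m "$(printf %s "$t" | tr xyz XYZ)" "topic-$t"
done

git push -q origin main 2>/dev/null || echo "first push rejected"
git fetch -q origin

# Keep the old tip reachable, then list its first-parent chain, oldest first.
git branch old-tip
merges=$(git rev-list --first-parent --reverse '@{upstream}..old-tip')

# Rebuild: restart from the updated upstream and redo each merge in order.
git reset -q --hard '@{upstream}'
for m in $merges; do
    git merge -q --no-ff -m "$(git log -1 --format=%s "$m")" "$m^2"
done

chain=$(git log --first-parent --reverse --format=%s | paste -sd' ' -)
echo "first-parent chain: $chain"    # A B C E X Y Z
git push -q origin main
```

Because the rebuilt tip has E as an ancestor on its first-parent chain, the second push fast-forwards the upstream. In a real repository the re-merges may conflict, and the result still deserves the thorough inspection described above before pushing.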

