It is not exactly a new feature, but I find myself using the --sort option of git branch a lot more often than before these days.
The way I work is that, in the morning, I review the patches that were mailed in during the previous night (my time). For each promising new topic, I decide where the topic should eventually be merged (some are fixes that should go to older maintenance tracks, some are new features that we will not want to merge to the maintenance tracks), create a dedicated topic branch for it, apply the patches, re-review them once more and then test the changes in isolation. Each existing topic that is redone in response to previous reviews is handled the same way: its branch is rewound and the new round of patches is applied instead.
After accumulating the new and updated topics that way without integrating with anything else, I'd often forget how many topics need to be integrated into the test branches (i.e. jch and pu), and I can do this:
$ git branch --no-merged pu --sort=-committerdate
This lists the topic branches that are not part of pu, which is the branch that is supposed to contain all the testable things, and sorts them according to the commit date (i.e. the time I last touched it) of the tip of each topic branch. There often are topics that were once picked up but turned out not to be ready even for the pu branch, and are left around without getting merged anywhere as a reminder for myself (otherwise, I'll forget to ping their authors about them). They sink to the older part of the output, while the freshly created and updated ones float to the top. This reminds me of the topics from the day that I need to reintegrate before starting the integration testing.
The --sort option appeared first in Git 2.7.0.
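To see the effect on a small scale, here is a minimal sketch in a throwaway repository. The branch names (topic-old, topic-new, topic-merged), the stand-in integration branch pu, the helper commit_at and all the dates are made up for illustration; only the final "git branch" invocation is the one from the text.

```shell
set -e
# Scratch repository; every name and date below is made up for illustration.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Author"
git config user.email demo@example.com

commit_at () {  # commit_at <date> <message>: empty commit with a fixed committer date
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git commit -q --allow-empty -m "$2"
}

commit_at "2016-05-01T09:00:00" "base"
git branch pu                       # stand-in for the integration branch

git checkout -q -b topic-old master
commit_at "2016-05-01T10:00:00" "stale topic, parked as a reminder"

git checkout -q -b topic-new master
commit_at "2016-05-03T08:00:00" "fresh topic from this morning"

git checkout -q -b topic-merged master
commit_at "2016-05-02T08:00:00" "topic already in pu"
git checkout -q pu
git merge -q --no-ff -m "Merge topic-merged" topic-merged

# Topics not yet in pu, most recently committed first.
# The sed strips the two-column marker "git branch" puts before each name.
unmerged=$(git branch --no-merged pu --sort=-committerdate | sed 's/^..//')
printf '%s\n' "$unmerged"
```

The fresh topic is listed before the parked one, and the topic that pu already contains is not listed at all.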
Another command that I use often these days is Michael Haggerty's when-merged script, available in his repository at GitHub. After finding a problematic line in the source and identifying the exact commit that introduced the line by using git blame, I can see when it landed in the mainline by doing this:
$ git when-merged $that_problematic_commit master | git name-rev --stdin
This gives the merge commit that brought the commit into the mainline as part of a topic. After that, it is just a matter of turning it into a revision name to find the oldest maintenance track that needs to be fixed, which is partially done by passing the output through the name-rev filter.
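when-merged is a separate script, so what follows is not its implementation. It is only a rough sketch, with stock Git commands in a throwaway repository, of the question it answers: walk the first-parent chain of master from oldest to newest and report the first commit that contains the commit of interest. The helper name first_landing and the toy history are assumptions made up for this illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Author"
git config user.email demo@example.com

# Toy history: a topic with one commit, merged into master later.
git commit -q --allow-empty -m "base"
git checkout -q -b topic
git commit -q --allow-empty -m "problematic change"
culprit=$(git rev-parse HEAD)
git checkout -q master
git commit -q --allow-empty -m "unrelated mainline work"
git merge -q --no-ff -m "Merge branch 'topic'" topic
git commit -q --allow-empty -m "later mainline work"

# first_landing <commit> <branch>: hypothetical helper, not part of Git.
# Prints the oldest commit on the first-parent chain of <branch> that
# already contains <commit>.
first_landing () {
    for c in $(git rev-list --first-parent --reverse "$2")
    do
        if git merge-base --is-ancestor "$1" "$c"
        then
            echo "$c"
            return 0
        fi
    done
    return 1
}

landed=$(first_landing "$culprit" master)
git log -1 --format='%h %s' "$landed"
```

This linear scan is fine for a toy history; it says nothing about how the real script goes about the search.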
Tuesday, May 3, 2016
Thursday, October 15, 2015
Fun with recreating an evil merge
Sometimes we wish there were good ways to recreate a complex merge, replaying a previously resolved conflict resolution, and reapplying a previously done evil merge, of a side branch to an updated mainline.
For example, you have a side-branch that consists of two commits, A and B, and you create a merge with the mainline that ends with X, like so:
         A---B
        /     \
---o---O---X---M
resulting in a new merge commit M. When you created this merge, it could be that changes A and B overlapped (either textually or semantically) with what was done on the mainline since the side branch forked, i.e. what was done by X. Such an overlap, if it is textual, would result in a merge conflict. Perhaps X added a new line at the same place A and/or B added a different line, resulting in something like:
...
original line 1
<<<<<<< HEAD
line added by X
||||||| O (common ancestor)
=======
line added by A
>>>>>>> B
original line 2
...
which you may resolve, when recording M, to:
...
original line 1
line added by A
line added by X
original line 2
...
Expressed in "git show --cc" format, such a merge result would appear this way:
  ...
  original line 1
+ line added by A
 +line added by X
  original line 2
  ...
Lines with two leading spaces are common lines that both branches agree on. A plus in the first column marks a line that the first parent (the mainline, X) did not have, i.e. one that came from the side branch, and a plus in the second column marks a line that the second parent (the side branch, B) did not have, i.e. one that came from the mainline.
If the overlap were not just textual but semantic, you may have to further update parts of files that did not textually conflict. For example, X may have renamed an existing function F to newF, while A or B added new callsites of F. Such a change is not likely to overlap textually, but in the merge result M, you would need to change the new calls you added to F to instead call newF. Such a change may look like this:
  ...
  original line 1
+ line added by A
 +line added by X
  original line 2
  ...
 -a new call to F() added by A
++a new call to newF() added by A
  ...
A line with a minus in the second column is one that was only in the side branch and does not appear in the result (i.e. the side branch added the line, but the result does not have it). A line with two pluses at the beginning is one that appears in the result but does not exist in either branch.
A merge that introduces such a line that did not exist in either branch is called an evil merge. It is something that no automated textual merge algorithm would have produced.
Now, while you were working on producing the merge M, the mainline may have progressed and gained a new commit Y. You would like to somehow take advantage of what you have already done when you created M to merge your side branch to the updated mainline to produce N:
         .---A---B
        /         \
---o---O---X---Y---N
The good news is that, when the evil merge is in a file that also has textual conflicts to resolve, "git rerere" will automatically take care of this situation. All you need to do is to set the configuration variable rerere.enabled to true before attempting the merge between X and B and recording their merge M, and then attempt a new merge between B and Y. Without even having to type "git rerere", the mechanism is invoked by "git merge" to replay the recorded resolution (which is where the name of the machinery "rerere" comes from). The bad news is that when an evil merge has to be made to a file that is not involved in any textual conflict (i.e. imagine the case where we didn't have the "line added by A" vs "line added by X" conflict earlier in the same file in the above example), "rerere" does not even kick in. The question is what to do, knowing B, X, and M, to recreate N while keeping the adjustments for semantic conflicts that were needed to record M.
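For the textual-conflict half of that story, a minimal sketch in a throwaway repository might look like the following. The file name and the unrelated commit Y are made up; the conflict itself is the "line added by A" vs "line added by X" one from the example above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Author"
git config user.email demo@example.com
git config rerere.enabled true      # must be set before the first merge

printf 'original line 1\noriginal line 2\n' >file
git add file
git commit -q -m O

git checkout -q -b side
printf 'original line 1\nline added by A\noriginal line 2\n' >file
git commit -q -a -m A

git checkout -q master
printf 'original line 1\nline added by X\noriginal line 2\n' >file
git commit -q -a -m X

# First merge (M): it conflicts; resolve by hand and commit, which lets
# rerere record the resolution.
git merge -m M side >/dev/null 2>&1 || true
printf 'original line 1\nline added by A\nline added by X\noriginal line 2\n' >file
git add file
git commit -q -m M

# Pretend the mainline moved on instead: go back to X and add Y.
git reset -q --hard HEAD^
echo unrelated >other
git add other
git commit -q -m Y

# Second merge (N): the same conflict arises, and the recorded
# resolution is replayed into the working tree file.
git merge -m N side >/dev/null 2>&1 || true
replayed=$(cat file)
printf '%s\n' "$replayed"
```

The second "git merge" still stops and asks for the result to be committed; what rerere saves is the editing, since the file already holds the earlier resolution.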
One naive approach would be to take the difference between X and M and apply it to Y. In the previous example, X would have looked like:
...
original line 1
line added by X
original line 2
...
and the difference between X and M would be (1) addition of "line added by A", (2) addition of "a new call to newF() added by A", and (3) any other change made by A and B that did not overlap with what X did. Implementation-wise, it is unlikely that we would do this as a "diff | patch" pipeline; most likely we would do it as a three-way merge, i.e.
$ git checkout Y^0
$ git merge-recursive X -- HEAD M
to compute the state we would have obtained by making the same move as going from X to M starting at Y, using the index and the working tree.
While that approach would work in the simple case where Y does not do anything interesting, it would not work well in general. The most obvious case is when Y is actually a merge between X and A:
         .---A---B
        /     \   \
---o---O---X---Y---N
The difference between X and M would contain all that was done by A and B, in addition to what was done at M to adjust for textual and semantic conflicts. Replaying that on top of Y, which already contains what was done by A but not B, would end up duplicating what A did. At best, we will get a huge and uninteresting merge conflict. At worst, we will get the same code silently duplicated.
I think the right approach to recreate the (potentially evil) merge M is to consider M as two steps.
The first step is to merge X and B mechanically, and make a tree out of the mechanical merge result, with conflict markers and all. Call it T. The difference between T and M is what the person who made M did to adjust for textual and semantic conflicts.
         A---B
        /     \
---o---O---X---T-M
Then, you can think of recreating N as a similar two-step process. The first step is to merge Y and B mechanically, create a tree out of the mechanical merge result, and call it S. Applying the difference between T and M on top of S would give you the textual and semantic adjustments, the same way "git rerere" replays the recorded resolution.
         .---A---B
        /    (\)  \
---o---O---X---Y---S-N
This should work better whether or not Y is a merge with A.
$ git checkout X^0
$ git merge --no-commit B
$ git add -u
$ T=$(git write-tree)
$ git reset --hard Y^0
$ git merge --no-commit B
$ git add -u
$ S=$(git commit-tree $(git write-tree) -p HEAD -m S)
$ git checkout $S
$ git merge-recursive $T -- HEAD M
would compute the result using the index and the working tree, so after eyeballing the result and making sure it makes sense, the above can be concluded with a
$ git commit --amend
Of course, this article is only about outlining the idea. If this proves to be a viable approach, it would make sense to do these procedures inside "rebase --first-parent" or something.
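As a sanity check of the idea, the two-step replay can be tried end to end in a throwaway repository. The sketch below is only an illustration under assumptions: the file names (lib, caller, extra, y-file) are invented, the commits are tagged X, Y, B and M so that the commands can refer to them by the names used in the text, and the scenario has a purely semantic conflict (X renames F to newF while A adds a call to F), which is the case "rerere" does not cover.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Author"
git config user.email demo@example.com

echo 'F() is defined here' >lib
echo 'existing code' >caller
git add lib caller
git commit -q -m O

git checkout -q -b side                 # A and B: add a new call to F()
echo 'a new call to F() added by A' >>caller
git commit -q -a -m A
echo 'unrelated work by B' >extra
git add extra
git commit -q -m B
git tag B

git checkout -q master                  # X: rename F to newF
echo 'newF() is defined here' >lib
git commit -q -a -m X
git tag X

git merge -q --no-commit side           # M: evil merge, fix the new caller
sed 's/to F()/to newF()/' caller >caller.new
mv caller.new caller
git commit -q -a -m M
git tag M

git reset -q --hard X                   # the mainline gains Y instead
echo 'more mainline work' >y-file
git add y-file
git commit -q -m Y
git tag Y

# The two-step replay, following the commands in the text.
git checkout -q X^0
git merge -q --no-commit B
git add -u
T=$(git write-tree)                     # mechanical merge of X and B
git reset -q --hard Y^0
git merge -q --no-commit B
git add -u
S=$(git commit-tree $(git write-tree) -p HEAD -m S)
git checkout -q $S                      # mechanical merge of Y and B
git merge-recursive $T -- HEAD M >/dev/null
git commit -q --amend -m N
cat caller
```

In this toy case the adjustment made at M (calling newF instead of F) is carried over on top of Y without redoing it by hand, while the file added by Y is kept.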
Monday, July 27, 2015
Git 2.5
The latest feature release Git v2.5.0 is now available at the
usual places. It is comprised of 583 non-merge commits since
v2.4.0, contributed by 70 people, 21 of which are new faces.
One interesting change is to git help. We now list commands, grouped by the situation in which you would want to use them. This came from discussion on usability, inspired by one of the talks at GitMerge conference we had in spring.
Among notable new features, some of my favourites are:
- A new short-hand branch@{push} denotes the remote-tracking branch that tracks the branch at the remote the branch would be pushed to.
- git send-email learned the alias file format used by the sendmail program.
- Traditionally, external low-level 3-way merge drivers are expected to produce their results based solely on the contents of the three variants given in temporary files named by %O, %A and %B placeholders on their command line. They are now additionally told about the final path (given by %P).
- A heuristic we use to catch mistyped paths on the command line git cmd revs pathspec is to make sure that all the non-rev parameters in the later part of the command line are names of the files in the working tree, but that means git grep string -- \*.c must always be disambiguated with --, because nobody sane will create a file whose name literally is asterisk-dot-see. We loosen the heuristic to declare that with a wildcard string the user likely meant to give us a pathspec. So you can now simply say git grep string \*.c without --.
- Filter scripts were run with SIGPIPE disabled on the Git side, expecting that they may not read what Git feeds them to filter. We however treated a filter that does not read its input fully before exiting as an error. We no longer do and ignore EPIPE when writing to feed the filter scripts.
This changes semantics, but arguably in a good way. If a filter can produce its output without fully consuming its input using whatever magic, we now let it do so, instead of diagnosing it as a programming error.
- Whitespace breakages in deleted and context lines can also be painted in the output of git diff and friends with the new --ws-error-highlight option.
- git merge FETCH_HEAD learned that the previous "git fetch" could be to create an Octopus merge, i.e. recording multiple branches that are not marked as "not-for-merge"; this allows us to lose an old style invocation git merge msg HEAD commits... in the implementation of git pull script; the old style syntax can now be deprecated (but not removed yet).
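Of the items above, the relaxed pathspec heuristic is the easiest to try out. A minimal sketch in a throwaway repository, with made-up file names and contents, might be:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo Author"
git config user.email demo@example.com

echo 'a string literal' >hello.c
echo 'a string in prose' >notes.txt
git add hello.c notes.txt
git commit -q -m "sample files"

# The backslash keeps the shell from expanding the wildcard; Git itself
# treats *.c as a pathspec, so no "--" is needed (Git 2.5 and later).
hits=$(git grep string \*.c)
printf '%s\n' "$hits"
```

Only the match in hello.c is reported; notes.txt also contains the word but is excluded by the pathspec.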
There are a few "experimental" new features, too. They are still incomplete and/or buggy around the edges and likely to change in the future, but nevertheless interesting.
- git cat-file --batch learned the --follow-symlinks option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax. For example, HEAD:RelNotes may be a symbolic link that points at Documentation/RelNotes/2.5.0.txt. With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead.
This is incomplete in at least a few ways.
(1) A symbolic link in the index, e.g. :RelNotes, should also be treated the same way, but isn't.
(2) Non-batch mode, e.g. git cat-file --follow-symlinks blob HEAD:RelNotes, may also want to behave the same way, but it doesn't.
- A replacement mechanism for contrib/workdir/git-new-workdir that does not rely on symbolic links and makes sharing of objects and refs safer by making the borrowee and borrowers aware of each other has been introduced and is accessible via git worktree add. This is accumulating more and more known bugs but may prove useful once they are fixed.
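A minimal sketch of the last item, in throwaway directories with made-up names (main, topic-tree, and the branch topic), might look like this:

```shell
set -e
top=$(mktemp -d)
git init -q "$top/main"
cd "$top/main"
git config user.name "Demo Author"
git config user.email demo@example.com
echo hello >file
git add file
git commit -q -m "initial"

# A second working tree on a new branch, sharing objects and refs
# with the first one.
git worktree add -b topic "$top/topic-tree" >/dev/null 2>&1

# The new working tree has its own HEAD and its own checked-out files.
git -C "$top/topic-tree" symbolic-ref --short HEAD
ls "$top/topic-tree"
```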
Monday, June 29, 2015
Fun with "git blame -s"
After applying a patch that moves a bulk of code that was placed in a wrong file to its correct place, a quick way to sanity-check that the patch does not introduce anything unexpected is to run "git blame -C -M" between HEAD^ and HEAD, like this:
$ git blame -C -M HEAD^..HEAD -- new-location.c
This should show the lines moved from the old location as coming from there; lines blamed on the new commit (i.e. not coming from the old location) can then be inspected more carefully to see if they make sense.
One problem I had while doing exactly that today was that most of the screen real-estate on my 92-column wide terminal was taken by the author name and the timestamp, and I found myself pressing right and left arrow in my pager to scroll horizontally a lot, which was both frustrating and suboptimal.
$ git blame -h
told me that there is "git blame -s" to omit that information. I thought that I didn't know about the option. Running "git blame" on its source itself revealed that the option was added by me 8 years ago, and it wasn't that I didn't know but I simply forgot ;-)
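Put together with the earlier command, the narrower output can be tried in a throwaway repository. This is only a sketch: the file contents and the commit messages are made up, and old-location.c is an invented name for the "wrong" file.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo Author"
git config user.email demo@example.com

# A file with a few distinctive lines, committed in the "wrong" place.
for i in 1 2 3 4 5
do
    echo "int helper_number_$i(void) { return $i * 1000 + $i; }"
done >old-location.c
git add old-location.c
git commit -q -m "add helpers in the wrong file"

# The commit under review: move everything and add one new line.
git mv old-location.c new-location.c
echo 'int brand_new(void) { return 0; }' >>new-location.c
git commit -q -a -m "move helpers to the right file"

# -s drops the author name and the timestamp from each line.
narrow=$(git blame -s -C -M HEAD^..HEAD -- new-location.c)
printf '%s\n' "$narrow"
```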
Thursday, June 25, 2015
Git 2.4.5
The latest maintenance release for Git v2.4.x series has been tagged.
- The setup code used to die when core.bare and core.worktree are set inconsistently, even for commands that do not need working tree.
- There was a dead code that used to handle git pull --tags and show special-cased error message, which was made irrelevant when the semantics of the option changed back in Git 1.9 days.
- color.diff.plain was a misnomer; give it color.diff.context as a more logical synonym.
- The configuration reader/writer uses mmap(2) interface to access the files; when we find a directory, it barfed with "Out of memory?".
- Recent git prune traverses young unreachable objects to safekeep old objects in the reachability chain from them, which sometimes showed unnecessary error messages that are alarming.
- git rebase -i fired post-rewrite hook when it shouldn't (namely, when it was told to stop sequencing with exec insn).
Enjoy.
Git 2.5-rc0 early preview
An early preview of the upcoming Git 2.5 has been tagged as v2.5.0-rc0. It is comprised of 492 non-merge commits since v2.4.0, contributed by 54 people, 17 of which are new faces.
Among notable new features, some of my favourites are:
- A new short-hand <branch>@{push} denotes the remote-tracking branch that tracks the branch at the remote the <branch> would be pushed to.
- A heuristic we use to catch mistyped paths on the command line git cmd revs pathspec is to make sure that all the non-rev parameters in the later part of the command line are names of the files in the working tree, but that means git grep string -- \*.c must always be disambiguated with --, because nobody sane will create a file whose name literally is asterisk-dot-see. We loosen the heuristic to declare that with a wildcard string the user likely meant to give us a pathspec. So you can now simply say git grep string \*.c without --.
- Filter scripts were run with SIGPIPE disabled on the Git side, expecting that they may not read what Git feeds them to filter. We however treated a filter that does not read its input fully before exiting as an error. We no longer do and ignore EPIPE when writing to feed the filter scripts.
This changes semantics, but arguably in a good way. If a filter can produce its output without fully consuming its input using whatever magic, we now let it do so, instead of diagnosing it as a programming error.
- Whitespace breakages in deleted and context lines can also be painted in the output of git diff and friends with the new --ws-error-highlight option.
There are a few "experimental" new features, too. They are still incomplete and/or buggy around the edges and likely to change in the future, but nevertheless interesting.
- git cat-file --batch learned the --follow-symlinks option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax. For example, HEAD:RelNotes may be a symbolic link that points at Documentation/RelNotes/2.5.0.txt. With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead.
This is incomplete in a few ways.
(1) A symbolic link in the index, e.g. :RelNotes, should also be treated the same way, but isn't.
(2) Non-batch mode, e.g. git cat-file --follow-symlinks blob HEAD:RelNotes, may also want to behave the same way, but it doesn't.
- A replacement mechanism for contrib/workdir/git-new-workdir that does not rely on symbolic links and makes sharing of objects and refs safer by making the borrowee and borrowers aware of each other has been introduced and is accessible via git checkout --to. This is accumulating more and more known bugs but may prove useful once they are fixed.
A draft of the release notes is there.
Tuesday, May 26, 2015
Git 2.4.1 and 2.4.2
Today, the v2.4.2 maintenance release was tagged. Compared to v2.4.0, which was released at the end of April 2015 (i.e. last month), in addition to minor typo-fixes, documentation updates and trivial code clean-ups, today's maintenance release contains the following:
- The usual git diff, when seeing a file turning into a directory, showed a patchset to remove the file and create all files in the directory, but git diff --no-index simply refused to work. Also, when asked to compare a file and a directory, imitate POSIX diff and compare the file with the file with the same name in the directory, instead of refusing to run.
- The default $HOME/.gitconfig file created upon git config --global that edits it had incorrectly spelled user.name and user.email entries in it.
- git commit --date=now or anything that relies on approxidate lost the daylight-saving-time offset.
- git cat-file bl $blob failed to barf even though there is no object type that is "bl".
- Teach the codepaths that read .gitignore and .gitattributes files that these files encoded in UTF-8 may have UTF-8 BOM marker at the beginning; this makes it in line with what we do for configuration files already.
- Access to objects in repositories that borrow from another one on a slow NFS server unnecessarily got more expensive due to recent code becoming more cautious in a naive way not to lose objects to pruning.
- We avoid setting core.worktree when the repository location is the .git directory directly at the top level of the working tree, but the code misdetected the case in which the working tree is at the root level of the filesystem (which arguably is a silly thing to do, but still valid).
- git rev-list --objects $old --not --all to see if everything that is reachable from $old is already connected to the existing refs was very inefficient.
- hash-object --literally introduced in v2.2 was not prepared to take a really long object type name.
- git rebase --quiet was not quite quiet when there is nothing to do.
- The completion for log --decorate= parameter value was incorrect.
- filter-branch corrupted commit log message that ends with an incomplete line on platforms with some sed implementations that munge such a line. Work it around by avoiding to use sed.
- git daemon failed to build from the source under NO_IPV6 configuration (regression in 2.4).
- git stash pop/apply forgot to make sure that not just the working tree is clean but also the index is clean. The latter is important as a stash application can conflict and the index will be used for conflict resolution.
- We have prepended $GIT_EXEC_PATH and the path git is installed in (typically /usr/bin) to $PATH when invoking subprograms and hooks for almost eternity, but the original use case the latter tried to support was semi-bogus (i.e. install git as /opt/foo/git and run it without having /opt/foo on $PATH), and more importantly it has become less and less relevant as Git grew more mainstream (i.e. the users would want to have it on their $PATH). Stop prepending the path in which git is installed to users' $PATH, as that would interfere with the command search order people depend on (e.g. they may not like versions of programs that are unrelated to Git in /usr/bin and want to override them by having different ones in /usr/local/bin and have the latter directory earlier in their $PATH).
Git hopefully continues to improve.
Have fun.
Saturday, April 25, 2015
Fun with failing cherry-pick
I just encountered an interesting cherry-pick failure.
The change I was trying to cherry-pick was to remove a hunk of text. Its patch conceptually looked like this:
@@ ... @@
A
-B
C
even though the pre-context A, removed text B, and post-context C are all multi-line blocks.
After a significant rewrite of the same original codebase (i.e. one that had A, B and then C next to each other), the code onto which I wanted to cherry-pick the above commit had moved the text around, and the block corresponding to B now comes a lot later. A diff between the original and that state perhaps looked like this:
@@ ... @@
A
-B
C
@@ ... @@
D
+B
E
And cherry-picking the above change succeeded without doing anything (!?!?).
Logically, this behaviour "makes sense", in the sense that it can be explained. The change wants to make A and C adjacent by removing B, and the three-way merge noticed that the updated codebase already had that removal, so there is nothing that needs to be done. In this particular case, I did not remove B but moved it elsewhere, so what cherry-pick did was wrong, but in other cases I may indeed have removed it without adding the equivalent anywhere else, so it could have been correct. We simply cannot say. I wonder if we should at least flag this "both sides appear to have removed" case as conflicting, but I am not sure how that should be implemented (let alone implemented efficiently). After all, the moved block B might have gone to a completely different file. Would we scan for the matching block of text for the entire working tree?
This is why you should always look at the output from "git show" for the commit being cherry-picked and the output from "git diff HEAD" before concluding the cherry-pick to see if anything is amiss.
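The situation is easy to reproduce in a throwaway repository. The sketch below is an illustration with made-up content: each "block" is three lines, the branch name fix and the helper block are invented, and the commit being picked also adds an unrelated file so that the result is not an empty commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Author"
git config user.email demo@example.com

block () {  # block <name>: three lines standing in for a multi-line block
    printf '%s1\n%s2\n%s3\n' "$1" "$1" "$1"
}

{ block A; block B; block C; block D; block E; } >file
git add file
git commit -q -m "original: A B C D E"

# The change to cherry-pick: remove B (plus an unrelated file, so the
# resulting commit is not empty).
git checkout -q -b fix
{ block A; block C; block D; block E; } >file
echo note >note
git add file note
git commit -q -m "remove B"

# Meanwhile, a rewrite on master moved B to sit between D and E.
git checkout -q master
{ block A; block C; block D; block B; block E; } >file
git commit -q -a -m "rewrite: move B after D"

# The cherry-pick reports no conflict, yet B is still in the file.
git cherry-pick fix >/dev/null
remaining=$(grep -c '^B' file)
echo "lines of B still present: $remaining"
git show --stat --format=%s HEAD
```

The "git show" at the end is the kind of check suggested above: the picked commit is titled "remove B" but only records the unrelated file.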
Thursday, April 2, 2015
First release candidate for Git 2.4
This release has a few changes in the user-visible output from Porcelain commands. These are not meant to be parsed by scripts, but the users still may want to be aware of the changes.
- Output from "git log --decorate" (and "%d" format specifier used in the userformat "--format=<string>" parameter "git log" family of commands take) used to list "HEAD" just like other branch names, separated with a comma in between. E.g.
$ git log --decorate -1 master
commit bdb0f6788fa5e3cacc4315e9ff318a27b2676ff4 (HEAD, master)
...
This release updates the output slightly when HEAD refers to the tip of a branch whose name is also shown in the output. The above is shown as:
$ git log --decorate -1 master
commit bdb0f6788fa5e3cacc4315e9ff318a27b2676ff4 (HEAD -> master)
...
- The phrasing "git branch" uses to describe a detached HEAD has been updated to match that of "git status". When the HEAD is at the same commit as it was originally detached, they now both show "detached at <commit object name>". When the HEAD has moved since it was originally detached, they now both show "detached from <commit object name>". Earlier "git branch" always used "from", even when the user hasn't moved HEAD since it was detached.
Otherwise, there are only minor fixes and documentation updates everywhere, and an unusually low number of new and shiny toys ;-)
- "git log --invert-grep --grep=WIP" will show only commits that do not have the string "WIP" in their messages.
- "git push" has been taught a "--atomic" option that makes push to update more than one ref an "all-or-none" affair.
- Extending the "push to deploy" added in 2.3, the behaviour of "git push" when updating the branch that is checked out can now be tweaked by push-to-checkout hook. The "push to deploy" implementation in 2.3 has a bug that makes it impossible to bootstrap an empty repository (or an unborn branch), but it can be worked around by using this hook.
- "git send-email" used to accept a mistaken "y" (or "yes") as an answer to "What encoding do you want to use [UTF-8]? " without questioning. Now it asks for confirmation when the answer looks too short to be a valid encoding name.
- "git archive" can now be told to set the 'text' attribute in the resulting zip archive.
- "git -C '' subcmd" used to refuse to work in the current directory, unlike "cd ''" which silently behaves as a no-op.
- The versionsort.prerelease configuration variable can be used to specify that v1.0-pre1 comes before v1.0.
- A new "push.followTags" configuration turns the "--follow-tags" option on by default for the "git push" command.
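As a quick illustration of the first item in this list, here is a minimal sketch in a throwaway repository; the commit messages are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo Author"
git config user.email demo@example.com

git commit -q --allow-empty -m "Add parser"
git commit -q --allow-empty -m "WIP: half-finished lexer"
git commit -q --allow-empty -m "Finish lexer"

# Only commits whose messages do NOT contain "WIP" are shown.
finished=$(git log --invert-grep --grep=WIP --format=%s)
printf '%s\n' "$finished"
```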
Thanks.
Monday, March 30, 2015
Fun with Non-Fast-Forward
Your push may fail due to “non fast-forward”. You start from a history that is identical to that of your upstream, commit your work on top of it, and then by the time you attempt to push it back, the upstream may have advanced because somebody else was also working on his own changes.
For example, between the upstream and your repositories, histories may diverge this way (the asterisk denotes the tip of the branch; the time flows from left to right as usual):
        Upstream                                      You

    ---A---B---C*           --- fetch -->       ---A---B---C*

                                                             D*
                                                            /
    ---A---B---C---E*                           ---A---B---C

                 D?                                          D*
                /                                           /
    ---A---B---C---E?       <-- push ---        ---A---B---C
If the push moved the branch at the upstream to point at your commit, you will be discarding other people’s work. To avoid doing so, git push fails with “Non fast-forward”.
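The rejection is easy to reproduce locally. Here is a minimal sketch, assuming a reasonably recent Git; the repositories (seed, upstream.git, you, other) and the branch name main are hypothetical stand-ins for the two sides of the illustration above.

```shell
#!/bin/sh
# Sandbox sketch: reproduce a "non fast-forward" rejection.
# All repository and branch names are made up for this illustration.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# Build the shared history ---A---B---C and publish it as a bare "upstream".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for c in A B C; do git -C seed commit -q --allow-empty -m "$c"; done
git clone -q --bare seed upstream.git

# Both parties start from C.
git clone -q upstream.git you
git clone -q upstream.git other

# Somebody else pushes E first ...
git -C other commit -q --allow-empty -m E
git -C other push -q origin main

# ... so your push of D would discard E, and is refused.
git -C you commit -q --allow-empty -m D
if git -C you push -q origin main 2>push.err; then
    push_result=accepted
else
    push_result=rejected
fi
echo "push of D was $push_result"
```

The message Git prints on refusal is saved in push.err; its exact wording varies between versions, so the sketch only looks at the exit status.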
The standard recommendation when this happens is to “fetch, merge and then push back”. The histories will diverge and then converge like this:
        Upstream                                      You

                                                             D*
                                                            /
    ---A---B---C---E*       --- fetch -->       ---A---B---C---E

                                                               1
                                                             D---F*
                                                            /   /2
    ---A---B---C---E*                           ---A---B---C---E

                   1                                           1
                 D---F*                                      D---F*
                /   /2                                      /   /2
    ---A---B---C---E        <-- push ---        ---A---B---C---E
Now, the updated tip of the branch has the previous tip of the upstream (E) as its parent, so the overall history does not lose other people’s work.
The resulting history, however, is not what the majority of the project participants would appreciate. The merge result records D as its first parent (denoted with 1 on the edge to the parent), as if what happened on the upstream (E) were done as a side branch while F was being prepared and pushed back. In reality, E in the illustration may not be a single commit but can be many commits and many merges done by many people, and these many commits may have been observed as the tips of the upstream’s history by many people before F got pushed.
Even though Git treats all parents of a merge equally at the level of the underlying data model, users have come to expect that the history they see by following the first-parent chain tells the overall picture of the shared project history, while second and later parents of merges represent work done on side branches. From this point of view, "fetch, merge and then push" is not quite the right suggestion for proceeding after a push fails due to "non fast-forward".
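To see what the first-parent view looks like after "fetch, merge and then push back", the scenario can be replayed in a sandbox. This is a rough sketch under the same assumptions as the illustration; the repository names and the branch name main are hypothetical, and empty commits stand in for real work.

```shell
#!/bin/sh
# Sandbox sketch: which commits stay on the first-parent chain after
# "fetch, merge and then push back". All names are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for c in A B C; do git -C seed commit -q --allow-empty -m "$c"; done
git clone -q --bare seed upstream.git
git clone -q upstream.git you
git clone -q upstream.git other

# E lands upstream while you are working on D.
git -C other commit -q --allow-empty -m E
git -C other push -q origin main
git -C you commit -q --allow-empty -m D

# The standard advice: fetch, merge the upstream into your work, push back.
git -C you fetch -q origin
git -C you merge -q -m F origin/main
git -C you push -q origin main

first_parent=$(git -C you log -1 --format=%s 'HEAD^1')
second_parent=$(git -C you log -1 --format=%s 'HEAD^2')
chain=$(git -C you log --first-parent --format=%s | paste -sd ' ' -)
echo "F^1=$first_parent F^2=$second_parent"
echo "first-parent chain: $chain"
```

In this sandbox the first-parent chain reads F D C B A, and E, the work that was at the tip of the upstream before your push, has dropped out of that view.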
It is tempting to recommend “fetch, merge backwards and then push back” as an alternative, and it almost works for a simple history:
        Upstream                                      You

                                                             D*
                                                            /
    ---A---B---C---E*       --- fetch -->       ---A---B---C---E

                                                               2
                                                             D---F*
                                                            /   /1
    ---A---B---C---E*                           ---A---B---C---E

                   2                                           2
                 D---F*                                      D---F*
                /   /1                                      /   /1
    ---A---B---C---E        <-- push ---        ---A---B---C---E
Then, if you follow the first-parent chain of the history, you will see how the tip of the overall project progressed. This is an improvement over the “fetch, merge and then push back”, but it has a few problems.
One reason why “merge backwards” is wrong becomes apparent when you consider what should happen when the push fails for the second time after the backward merge is made:
        Upstream                                      You

                                                             D*
                                                            /
    ---A---B---C---E*       --- fetch -->       ---A---B---C---E

                                                               2
                                                             D---F*
                                                            /   /1
    ---A---B---C---E*                           ---A---B---C---E

                   2                                           2
                 D---F?                                      D---F*
                /   /1                                      /   /1
    ---A---B---C---E---G    <-- push ---        ---A---B---C---E

                   2                                           2   2
                 D---F?                                      D---F---H*
                /   /1                                      /   /1  /1
    ---A---B---C---E---G    --- fetch -->       ---A---B---C---E---G
If the upstream side gained another commit G while F was being prepared, “fetch, merge backwards and then push” will end up creating a history like this, hiding D, the only real change you did in the repository, as the tip of the side branch of a side branch!
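The two rounds of backward merging can be replayed in a sandbox as well. The sketch below is only one way to produce such a merge (stand on the upstream tip and merge your old tip into it); merge_backwards is a hypothetical helper, not a Git command, and the repository and branch names are made up.

```shell
#!/bin/sh
# Sandbox sketch: two rounds of "fetch, merge backwards and then push".
# merge_backwards is a made-up helper; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for c in A B C; do git -C seed commit -q --allow-empty -m "$c"; done
git clone -q --bare seed upstream.git
git clone -q upstream.git you
git clone -q upstream.git other

merge_backwards () {
    # Make the upstream tip the first parent and your old tip the second.
    mine=$(git -C you rev-parse HEAD)
    git -C you fetch -q origin
    git -C you reset -q --hard origin/main
    git -C you merge -q -m "$1" "$mine"
}

# Round one: E lands upstream while you work on D; you create F = merge(E, D).
git -C other commit -q --allow-empty -m E
git -C other push -q origin main
git -C you commit -q --allow-empty -m D
merge_backwards F

# Round two: G lands before you manage to push F; you create H = merge(G, F).
git -C other commit -q --allow-empty -m G
git -C other push -q origin main
merge_backwards H

chain=$(git -C you log --first-parent --format=%s | paste -sd ' ' -)
buried=$(git -C you log -1 --format=%s 'HEAD^2^2')
echo "first-parent chain: $chain"
echo "H^2^2 is $buried"
```

Here D can only be reached as H^2^2, which is the "side branch of a side branch" position described above.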
It also does not solve the problem if the work you did in D is not a single strand of pearls, but has merges from side branches. If D in the above series of illustrations were a few merges X, Y and Z from side branches of independent topics, the picture on your side, after fetching E from the updated upstream, may look like this:
         y---y---y   .
        /         \   .
       .   x---x   \   \
      .   /     \   \   \
     .   /       X---Y---Z*
        /       /
    ---A---B---C---E
That is, hoping that the other people will stay quiet, starting from C, you merged three independent topic branches on top of it with merges X, Y and Z, and hoped that the overall project history would fast-forward to Z. From your perspective, you wanted A-B-C-X-Y-Z to be the main history of the project, while x, y, ... were implementation details of X, Y and Z that are hidden behind merges on side branches. And if there were no E, that would indeed have been the overall project history people would have seen after your push.
Merging backwards and pushing back would, however, make F the tip of the history with E as its first parent, and Z would become a side branch. The fact that X, Y and Z (more precisely, X^2, Y^2 and Z^2) were independent topics is lost by doing so:
         y---y---y   .
        /         \   .
       .   x---x   \   \
      .   /     \   \   \
     .   /       X---Y---Z
        /       /         \2
    ---A---B---C---E-------F*
                       1
So "merge backwards" is not the right solution in general. It is only valid if you are building a topic directly on top of the shared integration branch, which is something you should not be doing in the first place. In the earlier illustration of creating a single D on top of C and pushing it, if there were no work from other people (i.e. E), the push would have fast-forwarded, making D a normal commit directly on the first-parent chain. If there were work from other people like E, “merge backwards” would instead have recorded D on a side branch. If D is a topic separate and independent from other work being done in parallel, you would want to see such a change consistently appear as a merge of a side branch.
A better recommendation might be to “fetch, rebuild the first-parent chain, and then push back”. That is, you would rebuild X, Y and Z (i.e. “git log --first-parent C..”) on top of the updated upstream E:
         y---y-------y   .
        /             \   .
       .   x-------x   \   \
      .   /         \   \   \
     .   /           X’--Y’--Z’*
        /           /
    ---A---B---C---E
Note that this will work well naturally even when your first-parent chain has non-merge commits. For example, X and Y in the above illustration may be merges while Z is a regular commit that updates the release notes with descriptions of what was recently merged (i.e. X and Y). Rebuilding such a first-parent chain on top of E will make the resulting history very easy to understand when the reader follows the first-parent chain.
The reason why “rebuild the first-parent chain on the updated upstream” works the best is tautological. People do care about the first-parenthood when viewing the history, and you must have cared about the first-parent chain, too, when building your history leading to Z. That first-parenthood you and others care about is what is being preserved here. By definition, we cannot go wrong ;-)
And of course, this will work against a moving upstream that gained new commits while we were fixing things up on our end, because we won't be piling new merges on top, but will be rebuilding X', Y' and Z' into X'', Y'' and Z'' instead.
To make this work on the pusher’s end, after seeing the initial “non fast-forward” refusal from “git push”, the pusher may need to do something like this:
$ git push ;# fails
$ git fetch
$ git rebase --first-parent @{upstream}
Note that “git rebase --first-parent” does not exist yet; it is one of the topics I would like to see resurrected from old discussions.
But before "rebase --first-parent" materialises, in the scenario illustrated above, the pusher can do these instead of that command:
$ git reset --hard @{upstream}
$ git merge X^2
$ git merge Y^2
$ git merge Z^2
And then, inspect the result thoroughly, as carefully as you checked your work before you attempted your first push that was rejected. After that, hopefully your history will fast-forward the upstream and everybody will be happy.
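For those who want to try the manual recipe end to end, here is a sandbox sketch of it. It is an illustration under assumptions, not a substitute for the missing "rebase --first-parent": the repository names, the branch name main, and the topic-x, topic-y and topic-z branches are all hypothetical, empty commits stand in for real work, and the old tip is remembered in a shell variable so that X^2, Y^2 and Z^2 can still be named after the reset.

```shell
#!/bin/sh
# Sandbox sketch: rebuild the first-parent chain X, Y, Z on top of the
# updated upstream E by hand. All names are made up for illustration.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
for c in A B C; do git -C seed commit -q --allow-empty -m "$c"; done
git clone -q --bare seed upstream.git
git clone -q upstream.git you
git clone -q upstream.git other

# Three independent topics, each merged to main as X, Y and Z on top of C.
base=$(git -C you rev-parse HEAD)
for t in x y z; do
    git -C you checkout -q -b "topic-$t" "$base"
    git -C you commit -q --allow-empty -m "${t}1"
    git -C you commit -q --allow-empty -m "${t}2"
    git -C you checkout -q main
    git -C you merge -q --no-ff -m "$(printf %s "$t" | tr xyz XYZ)" "topic-$t"
done

# Meanwhile E lands upstream, so pushing Z is refused.
git -C other commit -q --allow-empty -m E
git -C other push -q origin main
git -C you push -q origin main 2>/dev/null || echo "push of Z rejected"

# Rebuild X, Y and Z as X', Y' and Z' on top of E.
git -C you fetch -q origin
old_tip=$(git -C you rev-parse HEAD)             # the rejected Z
git -C you reset -q --hard '@{upstream}'
git -C you merge -q -m "X'" "${old_tip}~2^2"     # X^2
git -C you merge -q -m "Y'" "${old_tip}~1^2"     # Y^2
git -C you merge -q -m "Z'" "${old_tip}^2"       # Z^2
git -C you push -q origin main

chain=$(git -C you log --first-parent --format=%s | paste -sd ' ' -)
echo "first-parent chain: $chain"
```

In the sandbox the first-parent chain comes out as Z' Y' X' E C B A, with E in its place on the mainline and the three topics still recorded as merges of side branches.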