Monday, June 29, 2015

Fun with "git blame -s"

After applying a patch that moves a bulk of code that was placed in a wrong file to its correct place, a quick way to sanity-check that the patch does not introduce anything unexpected is to run "git blame -C -M" between HEAD^ and HEAD, like this:

  $ git blame -C -M HEAD^..HEAD -- new-location.c

This should show that the lines moved from the old location in the output as coming from there; lines blamed for the new commit (i.e. not coming from the old location) can then be inspected more carefully to see if it makes sense.

One problem I had while doing exactly that today was that most of the screen real-estate on my 92-column wide terminal was taken by the author name and the timestamp, and I found myself pressing right and left arrow in my pager to scroll horizontally a lot, which was both frustrating and suboptimal.

  $ git blame -h

told me that there is "git blame -s" to omit that information. I thought that I didn't know about the option. Running "git blame" on its source itself revealed that the option was added by me 8 years ago, and it wasn't that I didn't know but I simply forgot ;-)

Thursday, June 25, 2015

Git 2.4.5

The latest maintenance release for Git v2.4.x series has been tagged.
  • The setup code used to die when core.bare and core.worktree are set inconsistently, even for commands that do not need working tree.
  • There was a dead code that used to handle git pull --tags and show special-cased error message, which was made irrelevant when the semantics of the option changed back in Git 1.9 days.
  • color.diff.plain was a misnomer; give it color.diff.context as a more logical synonym.
  • The configuration reader/writer uses mmap(2) interface to access the files; when we find a directory, it barfed with "Out of memory?".
  • Recent git prune traverses young unreachable objects to safekeep old objects in the reachability chain from them, which sometimes showed unnecessary error messages that are alarming.
  • git rebase -i fired post-rewrite hook when it shouldn't (namely, when it was told to stop sequencing with exec insn).
It also contains typofixes, documentation updates and trivial code clean-ups.


Git 2.5-rc0 early preview

An early preview of the upcoming Git 2.5 has been tagged as v2.5.0-rc0. It is comprised of 492 non-merge commits since v2.4.0, contributed by 54 people, 17 of which are new faces.

Among notable new features, some of my favourites are:

  • A new short-hand <branch>@{push} denotes the remote-tracking branch that tracks the branch at the remote the <branch> would be pushed to.
  • A heuristic we use to catch mistyped paths on the command line git cmd revs pathspec is to make sure that all the non-rev parameters in the later part of the command line are names of the files in the working tree, but that means git grep string -- \*.c must always be disambiguated with --, because nobody sane will create a file whose name literally is asterisk-dot-see.  We loosen the heuristic to declare that with a wildcard string the user likely meant to give us a pathspec. So you can now simply say git grep string \*.c without --.
  • Filter scripts were run with SIGPIPE disabled on the Git side, expecting that they may not read what Git feeds them to filter. We however treated a filter that does not read its input fully before exiting as an error.  We no longer do and ignore EPIPE when writing to feed the filter scripts.
    This changes semantics, but arguably in a good way.  If a filter can produce its output without fully consuming its input using whatever magic, we now let it do so, instead of diagnosing it as a programming error.
  • Whitespace breakages in deleted and context lines can also be painted in the output of git diff and friends with the new --ws-error-highlight option.
There are a few "experimental" new features, too. They are still incomplete and/or buggy around the edges and likely to change in the future, but nevertheless interesting.
  • git cat-file --batch learned the --follow-symlinks option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax.  For example, HEAD:RelNotes may be a symbolic link that points at Documentation/RelNotes/2.5.0.txt.  With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead.
    This is incomplete in a few ways.
    (1) A symbolic link in the index, e.g. :RelNotes, should also be treated the same way, but isn't. (2) Non-batch mode, e.g. git cat-file --follow-symlinks blob HEAD:RelNotes, may also want to behave the same way, but it doesn't.
  • A replacement mechanism for contrib/workdir/git-new-workdir that does not rely on symbolic links and make sharing of objects and refs safer by making the borrowee and borrowers aware of each other has been introduced and accessible via git checkout --to. This is accumulating more and more known bugs but may prove useful once they are fixed.
A draft release notes is there.

Tuesday, May 26, 2015

Git 2.4.1 and 2.4.2

Today, the v2.4.2 maintenance release was tagged. Compared to v2.4.0 that was released end of April 2015 (i.e. last month), in addition to minor typo-fixes, documentation updates and trivial code clean-ups, today's maintenance release contains the following:
  • The usual git diff, when seeing a file turning into a directory, showed a patchset to remove the file and create all files in the directory, but git diff --no-index simply refused to work.  Also, when asked to compare a file and a directory, imitate POSIX diff and compare the file with the file with the same name in the directory, instead of refusing to run.
  • The default $HOME/.gitconfig file created upon git config --global that edits it had incorrectly spelled and entries in it.
  • git commit --date=now or anything that relies on approxidate lost the daylight-saving-time offset.
  • git cat-file bl $blob failed to barf even though there is no object type that is "bl".
  • Teach the codepaths that read .gitignore and .gitattributes files that these files encoded in UTF-8 may have UTF-8 BOM marker at the beginning; this makes it in line with what we do for configuration files already.
  • Access to objects in repositories that borrow from another one on a slow NFS server unnecessarily got more expensive due to recent code becoming more cautious in a naive way not to lose objects to pruning.
  • We avoid setting core.worktree when the repository location is the .git directory directly at the top level of the working tree, but the code misdetected the case in which the working tree is at the root level of the filesystem (which arguably is a silly thing to do, but still valid).
  • git rev-list --objects $old --not --all to see if everything that is reachable from $old is already connected to the existing refs was very inefficient.
  • hash-object --literally introduced in v2.2 was not prepared to take a really long object type name.
  • git rebase --quiet was not quite quiet when there is nothing to do.
  • The completion for log --decorate= parameter value was incorrect.
  • filter-branch corrupted commit log message that ends with an incomplete line on platforms with some sed implementations that munge such a line.  Work it around by avoiding to use sed.
  • git daemon failed to build from the source under NO_IPV6 configuration (regression in 2.4).
  • git stash pop/apply forgot to make sure that not just the working tree is clean but also the index is clean. The latter is important as a stash application can conflict and the index will be used for conflict resolution.
  • We have prepended $GIT_EXEC_PATH and the path git is installed in (typically /usr/bin) to $PATH when invoking subprograms and hooks for almost eternity, but the original use case the latter tried to support was semi-bogus (i.e. install git as /opt/foo/git and run it without having /opt/foo on $PATH), and more importantly it has become less and less relevant as Git grew more mainstream (i.e. the users would want to have it on their $PATH).  Stop prepending the path in which git is installed to users' $PATH, as that would interfere the command search order people depend on (e.g. they may not like versions of programs that are unrelated to Git in /usr/bin and want to override them by having different ones in /usr/local/bin and have the latter directory earlier in their $PATH).
Git hopefully continues to improve.
Have fun.

Saturday, April 25, 2015

Fun with failing cherry-pick

I just encountered an interesting cherry-pick failure.

The change I was trying to cherry-pick was to remove a hunk of text. Its patch conceptually looked like this:

@@ ... @@

even though the pre-context A, removed text B, and post-context C are all multi-line block.
After doing a significant rewrite to the same original codebase (i.e. that had A, B and then C next to each other), the code I wanted to cherry-pick the above commit moved the text around and the block corresponding to B is now done a lot later. A diff between that state and the original perhaps looked like this:

@@ ... @@
@@ ... @@

And cherry-picking the above change succeeded without doing anything (!?!?).

Logically, this behaviour "makes sense", in the sense that it can be explained. The change wants to make A and C adjacent by removing B, and the three-way merge noticed that the updated codebase already had that removal, so there is nothing that needs to be done. In this particular case, I did not remove B but moved it elsewhere, so what cherry-pick did was wrong, but in other cases I may indeed have removed it without adding the equivalent to anywhere else, so it could have been correct. We simply cannot say. I wonder if we should at least flag this "both sides appear to have removed" case as conflicting, but I am not sure how that should be implemented (let alone implemented efficiently). After all, the moved block B might have gone to a completely different file. Would we scan for the matching block of text for the entire working tree?

This is why you should always look at the output from "git show" for the commit being cherry-picked and the output from "git diff HEAD" before concluding the cherry-pick to see if anything is amiss.

Thursday, April 2, 2015

First release candidate for Git 2.4

This release has a few changes in the user-visible output from Porcelain commands. These are not meant to be parsed by scripts, but the users still may want to be aware of the changes.
  • Output from "git log --decorate" (and "%d" format specifier used in the userformat "--format=<string>" parameter "git log" family of commands take) used to list "HEAD" just like other branch names, separated with a comma in between. E.g.

         $ git log --decorate -1 master
         commit bdb0f6788fa5e3cacc4315e9ff318a27b2676ff4 (HEAD, master)
    This release updates the output slightly when HEAD refers to the tip of a branch whose name is also shown in the output.  The above is shown as:

         $ git log --decorate -1 master
         commit bdb0f6788fa5e3cacc4315e9ff318a27b2676ff4 (HEAD -> master)

  • The phrasing "git branch" uses to describe a detached HEAD has been updated to match that of "git status". When the HEAD is at the same commit as it was originally detached, they now both show "detached at <commit object name>". When the HEAD has moved since it was originally detached, they now both show "detached from <commit object name>". Earlier "git branch" always used "from", even when the user hasn't moved HEAD since it was detached.
Otherwise, there are only minor fixes and documentation updates everywhere, and unusually low number of new and shiny toys ;-)
  • "git log --invert-grep --grep=WIP" will show only commits that do not have the string "WIP" in their messages.
  • "git push" has been taught a "--atomic" option that makes push to update more than one ref an "all-or-none" affair.
  • Extending the "push to deploy" added in 2.3, the behaviour of "git push" when updating the branch that is checked out can now be tweaked by push-to-checkout hook. The "push to deploy" implementation in 2.3 has a bug that makes it impossible to bootstrap an empty repository (or an unborn branch), but it can be worked around by using this hook.
  • "git send-email" used to accept a mistaken "y" (or "yes") as an answer to "What encoding do you want to use [UTF-8]? " without questioning.  Now it asks for confirmation when the answer looks too short to be a valid encoding name.
  • "git archive" can now be told to set the 'text' attribute in the resulting zip archive.
  • "git -C '' subcmd" used to refuse to work in the current directory, unlike "cd ''" which silently behaves as a no-op.
  • The versionsort.prerelease configuration variable can be used to specify that v1.0-pre1 comes before v1.0.
  • A new "push.followTags" configuration turns the "--follow-tags" option on by default for the "git push" command.
Please give it a good beating so that we can ship a successful v2.4 final at around the end of the month without regressions compared to v2.3 series.


Monday, March 30, 2015

Fun with Non-Fast-Forward

Your push may fail due to “non fast-forward”. You start from a history that is identical to that of your upstream, commit your work on top of it, and then by the time you attempt to push it back, the upstream may have advanced because somebody else was also working on his own changes.

For example, between the upstream and your repositories, histories may diverge this way (the asterisk denotes the tip of the branch; the time flows from left to right as usual):

Upstream                                You

---A---B---C*      --- fetch -->        ---A---B---C*

---A---B---C---E*                       ---A---B---C

            D?                                      D*
           /                                       /
---A---B---C---E?   <-- push ---        ---A---B---C

If the push moved the branch at the upstream to point at your commit, you will be discarding other people’s work. To avoid doing so, git push fails with “Non fast-forward”.

The standard recommendation when this happens is to “fetch, merge and then push back”. The histories will diverge and then converge like this:

Upstream                                You

---A---B---C---E*  --- fetch -->        ---A---B---C---E

                                                   /   /2
---A---B---C---E*                       ---A---B---C---E

               1                                       1
            D---F*                                  D---F*
           /   /2                                  /   /2
---A---B---C---E    <-- push ---        ---A---B---C---E

Now, the updated tip of the branch has the previous tip of the upstream (E) as its parent, so the overall history does not lose other people’s work.

The resulting history, however, is not what the majority of the project participants would appreciate. The merge result records D as its first parent (denoted with 1 on the edge to the parent), as if what happened on the upstream (E) were done as a side branch while F was being prepared and pushed back. In reality, E in the illustration may not be a single commit but can be many commits and many merges done by many people, and these many commits may have been observed as the tips of the upstream’s history by many people before F got pushed.

Even though Git treats all parents of a merge equally at the level of the underlying data model, the users have come to expect that the history they will see by following the first-parent chain tells the overall picture of the shared project history, while second and later parents of merges represent work done on side branches. From this point of view, what "fetch, merge and then push" is not quite a right suggestion to proceed from a failed push due to "non fast-forward".

It is tempting to recommend “fetch, merge backwards and then push back” as an alternative, and it almost works for a simple history:

Upstream                                You

---A---B---C---E*  --- fetch -->        ---A---B---C---E

                                                   /   /1
---A---B---C---E*                       ---A---B---C---E

               2                                       2
            D---F*                                  D---F*
           /   /1                                  /   /1
---A---B---C---E    <-- push ---        ---A---B---C---E

Then, if you follow the first-parent chain of the history, you will see how the tip of the overall project progressed. This is an improvement over the “fetch, merge and then push back”, but it has a few problems.

One reason why “merge backwards” is wrong becomes apparent when you consider what should happen when the push fails for the second time after the backward merge is made:

Upstream                                You

---A---B---C---E*  --- fetch -->        ---A---B---C---E

                                                   /   /1
---A---B---C---E*                       ---A---B---C---E

               2                                       2
            D---F?                                  D---F*
           /   /1                                  /   /1
---A---B---C---E---G    <-- push ---    ---A---B---C---E

               2                                       2   2
            D---F?                                  D---F---H*
           /   /1                                  /   /1  /1
---A---B---C---E---G    --- fetch -->   ---A---B---C---E---G

If the upstream side gained another commit G while F was being prepared, “fetch, merge backwards and then push” will end up creating a history like this, hiding D, the only real change you did in the repository, as the tip of the side branch of a side branch!

It also does not solve the problem if the work you did in D is not a single strand of pearls, but has merges from side branches. If D in the above series of illustrations were a few merges X, Y and Z from side branches of independent topics, the picture on your side, after fetching E from the updated upstream, may look like this:

    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z*
   /       /

That is, hoping that the other people will stay quiet, starting from C, you merged three independent topic branches on top of it with merges X, Y and Z, and hoped that the overall project history would fast-forward to Z. From your perspective, you wanted to make A-B-C-X-Y-Z to be the main history of the project, while x, y, ... were implementation details of X, Y and Z that are hidden behind merges on side branches. And if there were no E, that would indeed have been the overall project history people would have seen after your push.

Merging backwards and pushing back would however make the history’s tip F, with its first parent E, and Z becomes a side branch. The fact that X, Y and Z (more precisely, X^2 and Y^2 and Z^2) were independent topics is lost by doing so:

    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z
   /       /         \2

So "merge backwards" is not a right solution in general. It is only valid if you are building a topic directly on top of the shared integration branch, which is something you should not be doing in the first place. In the earlier illustration of creating a single D on top of C and pushing it, if there were no work from other people (i.e. E), the push would have fast-forwarded, making D as a normal commit directly on the first-parent chain. If there were work from other people like E, “merge in reverse” would instead have recorded D on a side branch. If D is a topic separate and independent from other work being done in parallel, you would consistently want to see such a change appear as a merge of a side branch.

A better recommendation might be to “fetch, rebuild the first-parent chain, and then push back”. That is, you would rebuild X, Y and Z (i.e. “git log --first-parent C..”) on top of the updated upstream E:

    y---y-------y   .
   /             \   .
  .   x-------x   \   \
 .   /         \   \   \
.   /           X’--Y’--Z’*
   /           /

Note that this will work well naturally even when your first-parent chain has non-merge commits. For example, X and Y in the above illustration may be merges while Z is a regular commit that updates the release notes with descriptions of what was recently merged (i.e. X and Y). Rebuilding such a first-parent chain on top of E will make the resulting history very easy to understand when the reader follows the first-parent chain.

The reason why “rebuild the first-parent chain on the updated upstream” works the best is tautological. People do care about the first-parenthood when viewing the history, and you must have cared about the first-parent chain, too, when building your history leading to Z. That first-parenthood you and others care about is what is being preserved here. By definition, we cannot go wrong ;-)

And of course, this will work against a moving upstream that gained new commits while we were fixing things up on our end, because we won't be piling a new merges on top, but will be rebuilding X', Y' and Z' into X'', Y'', and Z'' instead.

To make this work on the pusher’s end, after seeing the initial “non fast-forward” refusal from “git push”, the pusher may need to do something like this:

$ git push ;# fails
$ git fetch
$ git rebase --first-parent @{upstream}

Note that “git rebase --first-parent” does not exist yet; it is one of the topics I would like to see resurrected from old discussions.

But before "rebase --first-parent" materialises, in the scenario illustrated above, the pusher can do these instead of that command:

$ git reset --hard @{upstream}
$ git merge X^2
$ git merge Y^2
$ git merge Z^2

And then, inspect the result thoroughly. As carefully as you checked your work before you attempted your first push that was rejected. After that, hopefully your history will fast-forward the upstream and everybody will be happy.

Friday, March 27, 2015

Stats from recent Git releases

Following up to the previous post, I computed a few numbers for each development cycle in the recent past.

In all the graphs in this article, the horizontal axis counts the number of days into the development cycle, and the vertical axis shows the number of non-merge commits made.

  • The bottom line in each graph shows the number of non-merge commits that went to the contemporary maintenance track.
  • The middle line shows the number of non-merge commits that went to the release but not to the maintenance track (i.e. shiny new toys, oops-fixes to them, and clean-ups that were too minor to be worth merging to the maintenance track), and
  • The top line shows the total number of non-merge commits in the release.

Even though I somehow have a fond memory of v1.5.3, the beginning of the modern Git was unarguably the v1.6.0 release. Its development cycle started in June 2008 and ended in August 2008. We can see that we were constantly adding a lot more new shiny toys (this cycle had the big "no more git-foo in user's $PATH" change) than we were applying fixes to the maintenance track during this period. This cycle lasted for 60 days, 731 commits in total among which 120 went to the maintenance track of its time.

During the development cycle that led to v1.8.0 (August 2012 to October 2012), the pattern is very different. We cook our topics longer in the 'next' branch and we can clearly see that the topics graduate to 'master' in batches, which appear as jumps in the graph.  This cycle lasted for 63 days, 497 commits in total among which 182 went to the maintenance track of its time.

The cycle led to v2.0.0 (February 2014 to June 2014) has a similar pattern, but as another "we now break backward compatibility for ancient UI wart" release, we can see that a large batch of changes were merged in early part of the cycle, hoping to give them better and longer exposure to the testing public; on the other hand, we did not do too many fixes to the maintenance track.  This cycle lasted for 103 days, 475 commits in total among which 90 went to the maintenance track of its time.

The numbers for the current cycle leading to v2.4 (February 2015 to April 2015) are not finalized yet, but we can clearly see that this cycle is more about fixing old bugs than introducing shiny new toys from this graph.  This cycle as of this writing is at its 50th day, 344 commits in total so far among which 115 went to the maintenance track.

Note that we should not be alarmed by the sharp rise at the end of the graph. We just entered the pre-release freeze period and the jump shows the final batch of topics graduating to the 'master' branch. We will have a few more weeks until the final, and during that period the graph will hopefully stay reasonably flat (any rise from this point on would mean we would be doing a last-minute "oops" fixes).

Thursday, March 26, 2015

Git 2.4 will hopefully be a "product quality" release

Earlier in the day, an early preview release for the next release of Git, 2.4-rc0, was tagged. Unlike many major releases in the past, this development cycle turned out to be relatively calm, fixing many usability warts and bugs, while introducing only a few new shiny toys.

In fact, the ratio of changes that are fixes and clean-ups in this release is unusually higher compared to recent releases. We keep a series of patches around each topic, whether it is a bugfix, a clean-up, or a new shiny toy, on its own topic branch, and each branch is merged to the 'master' branch after reviewing and testing, and then fixes and trivial clean-ups are also merged to the 'maint' branch. Because of this project structure, it is relatively easy to sift fixes and enhancement apart. Among new commits in release X since release (X-1), the ones that appear also in the last maintenance track for release (X-1) are fixes and clean-ups, while the remainder is enhancements.

Among the changes that went into v1.9.0 since v1.8.5, 23% of them were fixes that got merged to v1.8.5.6, for example, and this number has been more or less stable throughout the last year. Among the changes in v2.3.0 since v2.2.0, 18% of them were also in v2.2.2. Today's preview v2.4.0-rc0, however, has 333 changes since v2.3.0, among which 110 are in v2.3.4, which means that 33% of the changes are fixes and clean-ups.

These fixes came from 33 contributors in total, but changes from only a few usual suspects dominate and most other contributors have only one or two changes on the maintenance track. It is illuminating to compare the output between

$ git shortlog --no-merges -n -s ^maint v2.3.0..master
$ git shortlog --no-merges -n -s v2.3.0..maint

to see who prefers to work on new shiny toys and who works on product quality by fixing other people's bugs. The first command sorts the contributors by the number of commits since v2.3.0 that are only in the 'master', i.e. new shiny toys, and the second command sorts the contributors by the number of commits since v2.3.0 that are in the 'maint', i.e. fixes and clean-ups.

The output matches my perception (as the project maintainer, I at least look at, if not read carefully, all the changes) of each contributor's strength and weakness fairly well. Some are always looking for new and exciting things while being bad at tying loose ends, while others are more careful perfectionists.

Wednesday, March 25, 2015

Git Rev News

Christian Couder (who is known for his work enhancing the "git bisect" command several years ago) and Thomas Ferris Nicolaisen (who hosts a popular podcast GitMinutes) started producing a newsletter for Git development community and named it Git Rev News.

Here is what the newsletter is about in their words:

Our goal is to aggregate and communicate some of the activities on the Git mailing list in a format that the wider tech community can follow and understand. In addition, we'll link to some of the interesting Git-related articles, tools and projects we come across.

This edition covers what happened during the month of March 2015.

As one of the people who still remembers "Git Traffic", which was meant to be an ongoing summary of the Git mailing list traffic but disappeared after publishing its first and only issue, I find this a very welcome development. Because our mailing list is a fairly high-volume one, it is almost impossible to keep up with everything that happens there, unless you are actively involved in the development process.

I hope their effort will continue and benefit the wider Git ecosystem. You can help them out in various ways if you are interested.

  • They are not entirely happy with how the newsletter is formatted. If you are handy with HTML, CSS or some blog publishing platforms, they would appreciate help in this area.
  • They are not paid full-time editors but doing this as volunteers. They would appreciate editorial help as well.
  • You can contribute by writing your own articles that summarize the discussions you found interesting on the mailing list.