Tuesday, March 18, 2014

Git 1.9.1

Traditionally, releases numbered with three Dewey-decimal digits were major releases that add new features, while ones with four were maintenance releases with only fixes. This was meant to give us some flexibility to say that the difference between 1.7.12 and 1.8.0 are larger than the difference between 1.8.1 and 1.8.2 (1.7.12 was the last major release in the 1.7.x series), while reserving the difference in the first digit for really big changes (i.e. 2.0 may finally toggle a switch that makes Git incompatible with older 1.x releases out of the box).

But we found out that in practice, we do not need to have three levels of changes (an incremental that changes the third digit between 1.8.1 to 1.8.2, a larger update that changes the second digit between 1.7.12 to 1.8.0, and a huge update that changes the first digit between 1.9 and 2.0). Hence the last major release was officially called "Git 1.9" when it was released on February 14, 2014.

It logically follows that, because we are dropping the third digit (or the second, depending on how you look at it) from the numbering of major releases, the first maintenance release for Git 1.9 is named with three digits, not four.

Git 1.9.1 is such a release. Among many changes we have been cooking on the development front towards the next major release, which will be called Git 2.0, this maintenance release contains only the fixes, and everybody is encouraged to upgrade to it.

Fixes since Git 1.9 are as follows:
  • "git clean -d pathspec" did not use the given pathspec correctly and ended up cleaning too much.
  • "git difftool" misbehaved when the repository is bound to the working tree with the ".git file" mechanism, where a textual file ".git" tells us where it is.
  • "git push" did not pay attention to branch.*.pushremote if it is defined earlier than remote.pushdefault; the order of these two variables in the configuration file should not matter, but it did by mistake.
  • Codepaths that parse timestamps in commit objects have been tightened.
  • "git diff --external-diff" incorrectly fed the submodule directory in the working tree to the external diff driver when it knew it is the same as one of the versions being compared.
  • "git reset" needs to refresh the index when working in a working tree (it can also be used to match the index to the HEAD in an otherwise bare repository), but it failed to set up the working tree properly, causing GIT_WORK_TREE to be ignored.
  • "git check-attr" when working on a repository with a working tree did not work well when the working tree was specified via the --work-tree (and obviously with --git-dir) option.
  • "merge-recursive" was broken in 1.7.7 era and stopped working in an empty (temporary) working tree, when there are renames involved.  This has been corrected.
  • "git rev-parse" was loose in rejecting command line arguments that do not make sense, e.g. "--default" without the required value for that option.
  • include.path variable (or any variable that expects a path that can use ~username expansion) in the configuration file is not a boolean, but the code failed to check it.
  • "git diff --quiet -- pathspec1 pathspec2" sometimes did not return correct status value.
  • Attempting to deepen a shallow repository by fetching over smart HTTP transport failed in the protocol exchange, when no-done extension was used.  The fetching side waited for the list of shallow boundary commits after the sending end stopped talking to it.
  • Allow "git cmd path/", when the 'path' is where a submodule is bound to the top-level working tree, to match 'path', despite the extra and unnecessary trailing slash (such a slash is often given by command line completion).
Have fun.

Wednesday, November 27, 2013

Git 1.8.5

The latest release Git 1.8.5 is out. Among many incremental improvements, there are a handful of changes that are worth mentioning:

  • Magic pathspecs like ":(icase)makefile" (matches both Makefile and makefile) and ":(glob)foo/**/bar" (matches "bar" in "foo" and any subdirectory of "foo") can be used in more places.
  • The "http.*" configuration variables can now be specified for individual URLs. E.g

    [http]
     sslVerify = true
    [http "https://weak.example.com/"] sslVerify = false

    would turn on http.sslVerify for everybody, except when talking with the specified URL.
  • "git mv A B" when moving a submodule has been taught to relocate the submodule's working tree and to adjust the paths in the .gitmodules file.
  • "git blame" can now take more than one -L option to discover the origin of multiple blocks of lines.
  • The http transport clients can optionally ask to save cookies with the http.savecookies configuration variable.
  • "git push" learned a more fine grained control over a blunt "--force" when requesting a non-fast-forward update with the "--force-with-lease=<refname>:<expected object name>" option.
  • "git diff --diff-filter=<classes of changes>" can now take lowercase letters (e.g. "--diff-filter=d") to mean "show everything but these classes".  "git diff-files -q" is now a deprecated synonym for "git diff-files --diff-filter=d".
  • "git gc" exits early without doing any work when it detects that another instance of itself is already running.

Tuesday, November 26, 2013

The Codebreakers


The CodebreakersEvery once in a while, I receive gifts from satisfied Git friends, chosen from my Amazon Wish list. And today was such a day. As I have been fairly busy cleaning up the fallout from our recent move and finally things are beginning less hectic, it turns out to be a perfect distraction gift for me, too ;-)


I only read the first few sections so far (it is a big, thick book and it would take me forever to finish reading and then write about it and thanking the person). Thanks, MTM!

Friday, November 8, 2013

Git 1.8.4.3

The latest maintenance release Git v1.8.4.3 has been tagged and is available at the usual places (see the list of public repositories). The fixes that have already merged to the 'master' branch for the upcoming Git v1.8.5 feature release are all there.

Here are the highlights, relative to the previous maintenance release v1.8.4.2:

  • The interaction between use of Perl in our test suite and NO_PERL has been clarified a bit.
  • A fast-import stream expresses a pathname with funny characters by quoting them in C style; remote-hg remote helper (in contrib/) forgot to unquote such a path.
  • One long-standing flaw in the pack transfer protocol used by "git clone" was that there was no way to tell the other end which branch "HEAD" points at, and the receiving end needed to guess. A new capability has been defined in the pack protocol to convey this information so that cloning from a repository with more than one branches pointing at the same commit where the HEAD is at now reliably sets the initial branch in the resulting repository.
  • We did not handle cases where http transport gets redirected during the authorization request (e.g. from http:// to https://).
  • "git rev-list --objects ^v1.0^ v1.0" gave v1.0 tag itself in the output, but "git rev-list --objects v1.0^..v1.0" did not.
  • The fall-back parsing of commit objects with broken author or committer lines were less robust than ideal in picking up the timestamps.
  • Bash prompting code to deal with an SVN remote as an upstream were coded in a way not supported by older Bash versions (3.x).
  • "git checkout topic", when there is not yet a local "topic" branch but there is a unique remote-tracking branch for a remote "topic" branch, pretended as if "git checkout -t -b topic remote/$r/topic" (for that unique remote $r) was run. This hack however was not implemented for "git checkout topic --".
  • Coloring around octopus merges in "log --graph" output was screwy.
  • We did not generate HTML version of documentation to "git subtree" in contrib/.
  • The synopsis section of "git unpack-objects" documentation has been clarified a bit.
  • An ancient How-To on serving Git repositories on an HTTP server lacked a warning that it has been mostly superseded with a more modern way.


Wednesday, October 30, 2013

v1.8.5-rc0: An early preview of the upcoming release

There are many little changes everywhere.  All of the fixes that have already went into 1.8.4.2 maintenance release are also in this preview.

Foreign interfaces, subsystems and ports.

  • "git-svn" used with SVN 1.8.0 when talking over https:// connection dumped core due to a bug in the serf library that SVN uses.  Work it around on our side, even though the SVN side is being fixed.
  • On MacOS X, we detected if the filesystem needs the "pre-composed unicode strings" workaround, but did not automatically enable it.  Now we do.
  • remote-hg remote helper misbehaved when interacting with a local Hg repository relative to the home directory, e.g. "clone hg::~/there".
  • imap-send ported to OS X uses Apple's security framework instead of OpenSSL one.
  • Subversion 1.8.0 that was recently released breaks older subversion clients coming over http/https in various ways.
  • "git fast-import" treats an empty path given to "ls" as the root of the tree.

UI, Workflows & Features

  • "git grep" and "git show" pays attention to "--textconv" option when these commands are told to operate on blob objects (e.g. "git grep -e pattern HEAD:Makefile").
  • "git replace" helper no longer allows an object to be replaced with another object of a different type to avoid confusion (you can still manually craft such replacement using "git update-ref", as an escape hatch).
  • "git status" no longer prints dirty status information for submodules for which submodule.$name.ignore is set to "all".
  • "git rebase -i" honours core.abbrev when preparing the insn sheet for editing.
  • "git status" during a cherry-pick shows what original commit is being picked.
  • Instead of typing four capital letters "HEAD", you can say "@" now, e.g. "git log @".
  • "git check-ignore" follows the same rule as "git add" and "git status" in that the ignore/exclude mechanism does not take effect on paths that are already tracked.  With "--no-index" option, it can be used to diagnose which paths that should have been ignored have been mistakenly added to the index.
  • Some irrelevant "advice" messages that are shared with "git status" output have been removed from the commit log template.
  • "update-refs" learnt a "--stdin" option to read multiple update requests and perform them in an all-or-none fashion.
  • Just like "make -C <directory>", "git -C <directory> ..." tells Git to go there before doing anything else.
  • Just like "git checkout -" knows to check out and "git merge -" knows to merge the branch you were previously on, "git cherry-pick" now understands "git cherry-pick -" to pick from the previous branch.
  • "git status" now omits the prefix to make its output a comment in a commit log editor, which is not necessary for human consumption.  Scripts that parse the output of "git status" are advised to use "git status --porcelain" instead, as its format is stable and easier to parse.
  • Make "foo^{tag}" to peel a tag to itself, i.e. no-op., and fail if "foo" is not a tag.  "git rev-parse --verify v1.0^{tag}" would be a more convenient way to say "test $(git cat-file -t v1.0) = tag".
  • "git branch -v -v" (and "git status") did not distinguish among a branch that does not build on any other branch, a branch that is in sync with the branch it builds on, and a branch that is configured to build on some other branch that no longer exists.
  • A packfile that stores the same object more than once is broken and will be rejected by "git index-pack" that is run when receiving data over the wire.
  • Earlier we started rejecting an attempt to add 0{40} object name to the index and to tree objects, but it sometimes is necessary to allow so to be able to use tools like filter-branch to correct such broken tree objects.  "filter-branch" can again be used to to do so.
  • "git config" did not provide a way to set or access numbers larger than a native "int" on the platform; it now provides 64-bit signed integers on all platforms.
  • "git pull --rebase" always chose to do the bog-standard flattening rebase.  You can tell it to run "rebase --preserve-merges" by setting "pull.rebase" configuration to "preserve".
  • "git push --no-thin" actually disables the "thin pack transfer" optimization.
  • Magic pathspecs like ":(icase)makefile" that matches both Makefile and makefile can be used in more places.
  • The "http.*" variables can now be specified per URL that the configuration applies.  For example,

       [http]
           sslVerify = true
       [http "https://weak.example.com/"]
           sslVerify = false

    would flip http.sslVerify off only when talking to that specified site.
  • "git mv A B" when moving a submodule A has been taught to relocate its working tree and to adjust the paths in the .gitmodules file.
  • "git blame" can now take more than one -L option to discover the origin of multiple blocks of the lines.
  • The http transport clients can optionally ask to save cookies with http.savecookies configuration variable.
  • "git push" learned a more fine grained control over a blunt "--force" when requesting a non-fast-forward update with the "--force-with-lease=<refname>:<expected object name>" option.
  • "git diff --diff-filter=<classes of changes>" can now take lowercase letters (e.g. "--diff-filter=d") to mean "show everything but these classes".  "git diff-files -q" is now a deprecated synonym for "git diff-files --diff-filter=d".
  • "git fetch" (hence "git pull" as well) learned to check "fetch.prune" and "remote.*.prune" configuration variables and to behave as if the "--prune" command line option was given.
  • "git check-ignore -z" applied the NUL termination to both its input (with --stdin) and its output, but "git check-attr -z" ignored the option on the output side. Make both honor -z on the input and output side the same way.
  • "git whatchanged" may still be used by old timers, but mention of it in documents meant for new users will only waste readers' time wonderig what the difference is between it and "git log".  Make it less prominent in the general part of the documentation and explain that it is merely a "git log" with different default behaviour in its own document.

Performance, Internal Implementation, etc.


  • The HTTP transport will try to use TCP keepalive when able.
  • "git repack" is now written in C.
  • Build procedure for MSVC has been updated.
  • If a build-time fallback is set to "cat" instead of "less", we should apply the same "no subprocess or pipe" optimization as we apply to user-supplied GIT_PAGER=cat.
  • Many commands use --dashed-option as a operation mode selector (e.g. "git tag --delete") that the user can use at most one (e.g. "git tag --delete --verify" is a nonsense) and you cannot negate (e.g. "git tag --no-delete" is a nonsense).  parse-options API learned a new OPT_CMDMODE macro to make it easier to implement such a set of options.
  • OPT_BOOLEAN() in parse-options API was misdesigned to be "counting up" but many subcommands expect it to behave as "on/off". Update them to use OPT_BOOL() which is a proper boolean.
  • "git gc" exits early without doing a double-work when it detects that another instance of itself is already running.
  • Under memory pressure and/or file descriptor pressure, we used to close pack windows that are not used and also closed filehandle to an open but unused packfiles. These are now controlled separately to better cope with the load.


Wednesday, September 18, 2013

Fun with first parent history

If your history is cleanly maintained, the output from "git log --first-parent" will consist only of merges of completed topics and trivially correct updates made directly on top of it. It will give you a birds-eye view that shows what features and fixes are made during given period without going into too much details. A history, each of whose merge shows work done for a specific topic (theme, purpose, objective; use whatever word you prefer) into it, means that whoever made these merges is the integrator, the keeper of the main history. The first-parent view of the history is useful only when the keeper of the main history takes good care of the main history.

People who use the central repository workflow where there is a single repository used for everybody to fetch from and push to complain that "git pull" they do merges the history taken from their central repository into their own development history and the merge is made in the wrong direction. They often wish for an option to flip the order of parents around for this reason, but they do not realize that a first-parent-clean history needs a lot more than that.

When you are using the "central shared repository" workflow, if you had and used such an option to flip the heads of a merge to record what you have done so far as a side branch of what everybody else did, the first-parent view would make a bit more sense than what you currently get. For example, if you worked on a specific topic that required six individual commits to complete since you forked from the mainline, your history in your repository and the project's main history in the central repository may look like this:

     x---x---x---x---x---x     Your history
    /
---X---o---o---o---o---o       Project's history

If you try to "git push" at this point, it will stop you, lest you lose these commits represented with o by overwriting the history. Git will tell you to first integrate the project's history with yours with "git pull", but if you actually pull to merge, the commits x will form the first-parent chain of the resulting merge, and the sequence of commits (most likely, merges of topics unrelated to each other) o will appear as its side branch:

     x---x---x---x---x---x---M     Your history
    /                       /
---X---o---o---o---o---o----       Project's history

This is bad, and "flip the order of parents" may help to produce a history of this shape instead:

     x---x---x---x---x---x     Your history
    /                     \
---X---o---o---o---o---o---M   Project's history

However, there is another half of the problem that is not solved by such an option. People, especially those who work with the centralized workflow, tend to pull too often, just to catch up. Even with such a "flip the order of parents" option, what they would end up with in reality would often look more like this:

     x---x   x---x---x   x     Your history
    /     \ /         \ / \
---X---o---M---o---o---M---M   Project's history

The result fragments otherwise a logical and clean "single strand of pearls to fully address the issue, consisting of 6 commits", into three separate and seemingly unrelated pieces. Imagine that other people are working the same way, and the commits marked with o are merges of side branches they add their half-way work to the main history similar to what happened in the illustration above. You would get this history:

     x---x   x---x---x   x     Your history
    /     \ /         \ / \
---X---M---M---M---M---M---M   Project's history
      / \     / \ /
  ---y   y---y   y             Your colleague's history

Now, in "git log --first-parent" of the project's mainline history, there is nothing that links these six commits marked with x together and differentiates them from commits marked with y, and there is nothing that groups these M (merges) that pull in your disjoint steps to achieve a single goal and separates from other merges. Unless people stop doing that too many "pull"s that are used only to "catch up", even with the "flip the parents of a merge" option, you will not get a history that yields a good first-parent view.

As I wrote in an earlier entry (Fun with various workflows), when you "pull" and then "push" to the central repository, you are playing the role of the integrator, the keeper of the main history, and you are responsible for taking a good care of it yourself. If you make a 2+3+1=6 mess as depicted in the last illustration above, you are failing to do so. People who later read "git log --first-parent" would not be able to see that these six commits you did were to achieve a single coherent goal and should be read together to understand it.

One obvious way to solve it is to use a topic branch workflow, and you do a "git pull" from the shared repository while you are on your 'master', which is free of your 'x's until that 6-commit series is complete and ready. Then you locally merge that topic branch to your 'master' and push it back for everybody to see, which will give you the third picture in this message.

Incidentally, by doing so, you do not need the "flip the order of parents" option, either.

Friday, August 23, 2013

Git 1.8.4

The 1.8.4 release has finally been tagged and pushed out to the usual places. It contains 870+ changes from ~100 contributors (among which 33 people are new) since v1.8.3.

Due to regressions discovered at the last minute, two topics that have been in the master branch for a while had to be reverted. They are expected to come back after fixing the regressions in future releases.

Here are some highlights:

  • "git log" learnt the "-Lbegin,end:filename" option. This starts from the specified range and digs through the history. It may still have rough edges and memory leaks, though.
  • "git clean" learnt the interactive mode, modeled after "git add -i" interface.
  • "git check-mailmap" is a new command that lets you inquire your .mailmap file for the canonical username and e-mail address.
  • "git name-rev" learnt to name an annotated tag object name back to its tagname.
  • Various subcommands of "git submodule" now work even from a subdirectory.
  • "git submodule update" can optionally clone the submodule repositories shallowly.
  • The "push.default=simple" mode of "git push" has been updated to behave like "current" when you push to a remote that is different from where you fetch from (e.g. via remote.pushdefault), in order to better support the triangular workflow.
  • "git log" learnt the "--author-date-order" option.
  • The configuration variable color.ui defaults to "auto" now.
  • "git describe" learnt the "--first-parent" option.
  • "git fetch $remote $branch" used to avoid touching the remote-tracking branch (you could always be explicit and say "git fetch $remote $branch:refs/remotes/$remote/$branch"). The command now updates the remote-tracking branch (if configured).
  • Use of platform fnmatch(3) function (many places like pathspec matching, .gitignore and .gitattributes) have been replaced with wildmatch, allowing "foo/**/bar" to match "foo/bar", "foo/a/bar", etc.
Have fun.

Wednesday, August 14, 2013

Delaying Git 1.8.4 by a week

It appears that we need to revert two topics that cause regressions before the upcoming 1.8.4 release.

  • There is a corner case bug in git stash.  Suppose you have a path that is a regular file (or a symbolic link) in the committed state. You change it to a directory in your working tree, and have various new files in it. Some of them may be tracked, while others may not be. You issue git stash. The command needs to match the path to the committed state, hence it needs to remove the directory to resurrect the path. The new files in the directory you have git added will be in the stash so they are OK, but what happens to the untracked ones? They are killed. The same issue exists if you turned a tracked directory into a file and run the command without first running git add.
    An attempted fix was to ask
    git ls-files --killed to see if such a path exists that will be lost, but it turns out that this makes the command unusably slow in certain directories with very many untracked files.
  • There was an attempt to save typing four capital letters "H", "E", "A" and "D" by instead allowing you to type "@", e.g. git log @. The idea may have been a good one, but the change was executed poorly and incorrectly triggered when it shouldn't (e.g. having a branch whose name is @/foo made it into HEAD/foo or something insane).

Because we have already passed -rc3, I'd feel safer to add another rc week before the final. Updated Git Calendar is here.

Both of these changes meant well, and because we are not reverting them due to design mistakes (i.e. we are not saying that "we do not ever want to have such a feature or fix in our system"), hopefully these can be redone properly after the upcoming release is done.

Some leftover bits (I'll add more to this list later).

  • [DONE] Find out where ls-files --killed is unnecessarily wasting time, and fix it. This is a prerequisite to resurrect the stash corner case fix.
    Cf. $gmane/
    232113
  • Refactor run_hook() interface to be truly reusable by codepath other than git commit, resurrecting a "how about this" patch sent in the past.
    Cf. $gmane/192806, $gmane/212284
  • [IN PROGRESS] Extend the upload-pack protocol to tell what symbolic ref points at which other ref by resurrecting the idea outlined in 2008.
    Cf. $gmane/102039
  • [IN PROGRESS] Rethink how name-hash keeps track of names of directories and actual files to help case insensitive filesystems. Since 2092678c (name-hash.c: fix endless loop with core.ignorecase=true, 2013-02-28), there appears to be no reason why a directory name has to be registered to the hash with a trailing slash, which is the root cause why directory_exists_in_index_icase() reads past the end of the buffer.
    Cf. $gmane/232822
  • [DONE] Look into cvsserver permission bits regression between 1.8.1 and 1.8.3.
    Cf. $gmane/234476
  • Look into pathspec-limited revision traversal regression between 1.8.3 and 1.8.4.
    Cf. $gmane/234462
  • Checking out a branch X that does not have directory D (or worse, has a file D), while you are in the directory D, may want to fail.
    Cf. $gmane/234905
  • Allow extra options to "ssh" invocation made from connect.c, in a way that (ideally) does not break backward compatibility.
    Cf. $gmane/234624
  • Perhaps add a --post-service-hook to the git-daemon that can be used after a service finishes? The exit status from the service process means totally different thing from what the user of service perceives because the former has to say "successfully told the requester that the request is denied", it may not be such a useful mechanism as one naïvely would expect, though.
    Cf. $gmane/
    234706
  • git checkout $commit -- somedir should remove somedir/file that is not in $commit but is in the original index.
    Cf. $gmane/
    234935

Thursday, August 1, 2013

Git 1.8.4-rc1

The first release candidate for Git v1.8.4-rc1 is available for testing at the usual places.
For highlights, please refer to the previous post on v1.8.4-rc0.

Have fun.

Wednesday, July 24, 2013

Git 1.8.4-rc0

A release candidate preview Git v1.8.4-rc0 is now available for testing at the usual places.

As this cycle is a rather large update, please test this thoroughly. It contains 814 non-merge commits, from 90+ contributors (v1.8.3 consisted of 694 changes from 97 contributors).

Here are some highlights:

  • "git log" learnt the "-Lbegin,end:filename" option. This starts from the specified range and digs through the history. It may still have rough edges and memory leaks, though.
  • "git clean" learnt the interactive mode, modeled after "git add -i" interface.
  • "git check-mailmap" is a new command that lets you inquire your .mailmap file for the canonical username and e-mail address.
  • "git name-rev" learnt to name an annotated tag object name back to its tagname.
  • Various subcommands of "git submodule" now works even from a subdirectory.
  • "git submodule update" can optionally clone the submodule repositories shallowly.
  • The "push.default=simple" mode of "git push" has been updated to behave like "current" when you push to a remote that is different from where you fetch from (e.g. via remote.pushdefault), in order to better support the triangular workflow.
  • "git log" learnt the "--author-date-order" option.
  • The configuration variable color.ui defaults to "auto" now.
  • Instead of typing "HEAD", you can say "@" instead, e.g. "git log @".
  • "git describe" learnt the "--first-parent" option.
  • "git fetch $remote $branch" used to avoid touching the remote-tracking branch (you could always be explicit and say "git fetch $remote $branch:refs/remotes/$remote/$branch"). The command now updates the remote-tracking branch (if configured).
  • Use of platform fnmatch(3) function (many places like pathspec matching, .gitignore and .gitattributes) have been replaced with wildmatch, allowing "foo/**/bar" to match "foo/bar", "foo/a/bar", etc.
Have fun.