Tuesday, September 30, 2014

Fun (?) with GnuPG

We use GnuPG as part of the infrastructure to certify authenticity of development history in Git in various places:
  • A signed tag created by git tag -s says "This tag was created by me, the holder of the private GnuPG key that signed this object". Because the object name of any Git object is a cryptographic hash computed over what the object records, and because a signed tag object records the object name of the tagged object (typically a commit) together with the human-readable name (typically a release number or name) the tagger wants to give it, an attacker who does not have the private key cannot forge a phony tag that points at a different commit. This is why you can safely say "You can verify that I really wanted to make that commit release X". Also, because a commit object records its tree (which in turn records all the objects and their locations in the project tree) and its parent commits, such a signed tag also ensures that none of the development history behind the tagged commit can be tampered with.
  • When you merge a signed tag (with either git merge or git pull), the content of the tag, including its GnuPG signature, is copied into the resulting merge commit object. This ensures that the history of the side branch that was merged cannot be tampered with, and the signature certifies that it came from the signer (typically a subsystem lieutenant).
  • A signed commit created by git commit -S says "This commit was created by me"; it ensures that the history behind the commit cannot be tampered with and certifies that the change it introduces came from the signer.
  • Still under development is git push --signed, a way to certify that you wanted to put a particular commit at the tip of a particular branch.
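The hash chain described above can be shown with a minimal Python sketch. It assumes a classic SHA-1 repository and Git's loose-object format, where the name is the SHA-1 of a "<type> <size>\0" header followed by the raw content. The helper names, file name, author identity and timestamp are hypothetical. The point is only this: a commit's name depends on its tree, and the tree's name depends on every blob. So the object name recorded in a signed tag pins down everything beneath it.

```python
import hashlib


def git_object_name(obj_type: str, body: bytes) -> str:
    """SHA-1 object name of a Git object (classic SHA-1 repositories)."""
    # Git hashes the header "<type> <size-in-bytes>\0" followed by the content.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_for_file(name: str, content: bytes) -> str:
    """Object name of a hypothetical single-file root commit."""
    blob = git_object_name("blob", content)
    # A tree entry is "<mode> <path>\0" followed by the 20 raw bytes of the blob name.
    tree_body = f"100644 {name}".encode() + b"\0" + bytes.fromhex(blob)
    tree = git_object_name("tree", tree_body)
    # The commit records its tree by name; identity and timestamp are made up.
    ident = "A U Thor <author@example.com> 1112911993 -0700"
    commit_body = (f"tree {tree}\nauthor {ident}\ncommitter {ident}\n\n"
                   "initial\n").encode()
    return git_object_name("commit", commit_body)


original = commit_for_file("hello.txt", b"hello\n")
tampered = commit_for_file("hello.txt", b"hell0\n")  # one byte changed
print(original)
print(tampered)
```

A tag object in turn names the commit in its "object" header. Changing even one byte anywhere below the tag therefore changes the name the tagger signed.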
GnuPG is also used to ensure the integrity and authenticity of tarballs uploaded to the kernel.org servers, a common distribution point for open source projects such as the Linux kernel and Git itself. A maintainer prepares a tarball and a detached signature and uploads them, and the receiving end verifies that the signature is good.

It is common practice to set an expiration date when creating a signing key. For example, the key I have been using to sign Git release tags was originally set up to expire 3 years after it was created. But the thing is, a project may outlive that expiration date. An interesting question is what happens to the existing signed tags when the key expires.

Unluckily, the right thing happens. If the holder of the key does nothing, the key expires and the signatures in the signed tags stop validating. Luckily, the holder of the key can extend its validity. Once that is done, signatures made before the key's original expiration date continue to validate fine.

At least, that is the theory ;-)

My key was originally set to expire early next month. So a few days ago I extended the lifespan of the key 96AFE6CB that I have been using and uploaded the updated key to the PGP keyservers. Existing signed tags (e.g. v2.0.0) should therefore continue to validate.

A few tips:
  • Although this page is written specifically for Debian contributors, it was very helpful when I had to figure out how to futz with GnuPG subkeys. It does not cover how to update the expiration date of a subkey, though. For that, run "gpg --edit-key", select the subkey with the "key" command, and then use the "expire" command.
  • In order to force a specific subkey to be used when signing for Git, you would need to use the ! suffix to the GnuPG key-id, e.g. in my ~/.gitconfig file:
      [user]
          signingkey = 96AFE6CB!
    Without the ! suffix, GnuPG tries to use the newest subkey you have associated with the same primary key, which may not be the subkey you would want to use.
I signed a new v2.1.2 maintenance release with the same key today. Hopefully it will validate OK for you (otherwise, you may have to fetch the public key from the keyserver).

Wednesday, May 28, 2014

Git 2.0

The real "Git 2.0" is finally out.

For end users who are totally new to Git, this release gives them defaults that are vastly improved compared to older versions. At the same time, it is designed to be as low-impact as possible for existing users, as long as they have been following recent releases (instead of sticking to age-old releases like the 1.7.x series). Some may even say that the new release does not offer anything exciting, forgetting why it was a big deal to bring in these new default behaviours to help new users. That is exactly what we want to hear from existing users. In releases over the past year or so, we have added knobs that let users enable these new defaults before 2.0. We have also added warnings to let users know when they perform an operation whose outcome will differ between the 1.x series and the 2.0 release. Existing users are hopefully very well prepared by now, and "Git 2.0" is designed to be the final "flipping the default" step.

We had to delay the final release by a week or so because we found a few problems in earlier release candidates, which we had to fix in an extra, unplanned release candidate:
  • request-pull had a regression that stopped it from showing the "tags/" prefix in "Please pull tags/frotz" when the user asked to compose a request for 'frotz' to be pulled.
  • A code path in git-gui meant to support ancient versions of Git was incorrectly triggered for Git 2.0.

Hopefully the next cycle will become shorter, as topics that have been cooking on the 'next' branch had extra time to mature, so it all evens out in the end ;-).

Have fun.

Friday, April 25, 2014

Git 2.0 release candidate 1

This is the first release candidate for the upcoming Git 2.0. It has the usual sorts of updates and fixes one would expect between any two feature releases. The primary reason its name begins with "2" (as opposed to "Git 1.9", the last feature release) is that it has a few backward-incompatible changes, all meant to improve the end-user experience.

  • People almost always push to a single place, and many push only the single branch they are currently on. The default behaviour of "git push" has been updated to better support this way of working; this is the behaviour when the command line does not say which branches to push where. The new default is "simple". The old default pushed out all the matching branches, which suits getting every branch you are going to publish ready and then pushing them all in one go. It is still available by setting the push.default configuration variable to matching.
  • Even though "git commit -a" can be run from any subdirectory to commit changes to all the tracked paths in the working tree, "git add -u" and "git add -A" (without specifying any path on the command line) used to operate only inside the current directory. This inconsistency bothered many people, and these commands have been updated to operate on all modified (for "-u") or all (for "-A") paths. Use "git add -u ." and "git add -A ." to restrict the command to the current directory.
  • "git add path" is now the same as "git add -A path", so "git add directory/" notices paths you removed from the directory and records the removal. Older versions of Git used to ignore removals. If you really want to add only new or modified paths, you can say "git add --ignore-removal path".

Some readers may remember that we did not give users a very good transition experience when we introduced a backward-incompatible change in Git 1.6.0. Before that release, we installed all the "git-cmd"s in the same directory as "git" itself, and people were used to using "git commit" and "git-commit" interchangeably. Then, in that release, we stopped installing what does not have to be on the user's $PATH. That change broke people's finger memory and existing scripts. All we did to prepare users was to warn about it in the release notes since Git 1.5.4, and that apparently was not enough. Many people were unhappy.

In retrospect, perhaps we could have done better by adding code to detect when "git-cmd" is invoked as the top-level command. It could then have warned that such usage would break in future versions. That would have trained users to use the "git cmd" form well before we released the version that actually delivered the change.

This time around, we have tried to be a lot more careful. For the past handful of releases, we have added extra code to detect cases where existing versions of Git and the upcoming Git 2.0 will behave differently, and to warn about the upcoming change. As a result, the actual difference between Git 1.9 and Git 2.0 is mostly "flipping the default" for these changes.

Have fun.

Tuesday, March 18, 2014

Git 1.9.1

Traditionally, releases numbered with three Dewey-decimal digits were major releases that added new features, while those with four were maintenance releases with only fixes. This was meant to give us some flexibility. We could say that the difference between 1.7.12 and 1.8.0 is larger than the difference between 1.8.1 and 1.8.2 (1.7.12 was the last major release in the 1.7.x series). A change in the first digit was reserved for really big changes (e.g. 2.0 may finally toggle a switch that makes Git incompatible with older 1.x releases out of the box).

But we found that in practice we do not need three levels of changes:
  • an incremental update that changes the third digit, e.g. from 1.8.1 to 1.8.2;
  • a larger update that changes the second digit, e.g. from 1.7.12 to 1.8.0;
  • a huge update that changes the first digit, e.g. from 1.9 to 2.0.
Hence the latest major release was officially called "Git 1.9" when it was released on February 14, 2014.

It logically follows that, because we are dropping the third digit (or the second, depending on how you look at it) from the numbering of major releases, the first maintenance release for Git 1.9 is named with three digits, not four.
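Under these rules, you can read the kind of release off its version number. The following Python sketch applies the old three-versus-four-digit convention before 1.9 and the new two-versus-three convention from 1.9 on. The function name release_kind is hypothetical. The sketch assumes purely numeric components, so it does not handle release-candidate suffixes such as -rc0.

```python
def release_kind(version: str) -> str:
    """Classify a numeric Git version string as 'feature' or 'maintenance'."""
    parts = [int(p) for p in version.split(".")]
    # Before 1.9, feature releases had three components (1.8.5) and
    # maintenance releases four (1.8.4.3); from 1.9 on, it is two (1.9, 2.0)
    # versus three (1.9.1).
    feature_len = 2 if parts[:2] >= [1, 9] else 3
    if len(parts) == feature_len:
        return "feature"
    if len(parts) == feature_len + 1:
        return "maintenance"
    raise ValueError(f"unexpected version number: {version}")


for v in ["1.7.12", "1.8.0", "1.8.4.3", "1.9", "1.9.1", "2.0"]:
    print(v, release_kind(v))
```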

Git 1.9.1 is such a release. Many changes are cooking on the development front toward the next major release, which will be called Git 2.0. This maintenance release, however, contains only fixes, and everybody is encouraged to upgrade to it.

Fixes since Git 1.9 are as follows:
  • "git clean -d pathspec" did not use the given pathspec correctly and ended up cleaning too much.
  • "git difftool" misbehaved when the repository is bound to the working tree with the ".git file" mechanism, where a textual file ".git" tells us where it is.
  • "git push" did not pay attention to branch.*.pushremote if it is defined earlier than remote.pushdefault; the order of these two variables in the configuration file should not matter, but it did by mistake.
  • Codepaths that parse timestamps in commit objects have been tightened.
  • "git diff --external-diff" incorrectly fed the submodule directory in the working tree to the external diff driver, even when it knew that directory was the same as one of the versions being compared.
  • "git reset" needs to refresh the index when working in a working tree (it can also be used to match the index to the HEAD in an otherwise bare repository), but it failed to set up the working tree properly, causing GIT_WORK_TREE to be ignored.
  • "git check-attr" when working on a repository with a working tree did not work well when the working tree was specified via the --work-tree (and obviously with --git-dir) option.
  • "merge-recursive" was broken in the 1.7.7 era and stopped working in an empty (temporary) working tree when renames were involved.  This has been corrected.
  • "git rev-parse" was loose in rejecting command line arguments that do not make sense, e.g. "--default" without the required value for that option.
  • include.path variable (or any variable that expects a path that can use ~username expansion) in the configuration file is not a boolean, but the code failed to check it.
  • "git diff --quiet -- pathspec1 pathspec2" sometimes did not return the correct status value.
  • Attempting to deepen a shallow repository by fetching over smart HTTP transport failed in the protocol exchange, when no-done extension was used.  The fetching side waited for the list of shallow boundary commits after the sending end stopped talking to it.
  • Allow "git cmd path/", when the 'path' is where a submodule is bound to the top-level working tree, to match 'path', despite the extra and unnecessary trailing slash (such a slash is often given by command line completion).
Have fun.

Wednesday, November 27, 2013

Git 1.8.5

The latest release Git 1.8.5 is out. Among many incremental improvements, there are a handful of changes that are worth mentioning:

  • Magic pathspecs like ":(icase)makefile" (matches both Makefile and makefile) and ":(glob)foo/**/bar" (matches "bar" in "foo" and any subdirectory of "foo") can be used in more places.
  • The "http.*" configuration variables can now be specified for individual URLs. E.g.

    [http]
        sslVerify = true
    [http "https://weak.example.com/"]
        sslVerify = false

    would turn on http.sslVerify for everybody, except when talking with the specified URL.
  • "git mv A B" when moving a submodule has been taught to relocate the submodule's working tree and to adjust the paths in the .gitmodules file.
  • "git blame" can now take more than one -L option to discover the origin of multiple blocks of lines.
  • The http transport clients can optionally ask to save cookies with the http.savecookies configuration variable.
  • "git push" learned finer-grained control than the blunt "--force" when requesting a non-fast-forward update, via the "--force-with-lease=<refname>:<expected object name>" option.
  • "git diff --diff-filter=<classes of changes>" can now take lowercase letters (e.g. "--diff-filter=d") to mean "show everything but these classes".  "git diff-files -q" is now a deprecated synonym for "git diff-files --diff-filter=d".
  • "git gc" exits early without doing any work when it detects that another instance of itself is already running.
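To make the per-URL lookup above concrete, here is a simplified Python model of choosing the effective value of an http.* variable. It assumes plain prefix matching on "/" boundaries, with the longest match winning. Git's real URL matching also normalizes URLs and compares scheme, host, port and user separately. Treat this as an illustration, not a description of Git's code. The function name is hypothetical.

```python
def effective_setting(url: str, default: str, per_url: dict) -> str:
    """Pick the value of one http.* variable for url (simplified model).

    default is the plain http.<var> value; per_url maps a configured
    URL (as in [http "<url>"]) to its value for the same variable.
    """
    best, best_len = default, -1
    for configured, value in per_url.items():
        base = configured.rstrip("/")
        # Match whole path components only: a configured "https://a.example/x"
        # must not apply to "https://a.example/xy".
        if url == base or url.startswith(base + "/"):
            if len(base) > best_len:  # the most specific match wins
                best, best_len = value, len(base)
    return best


per_url = {"https://weak.example.com/": "false"}
print(effective_setting("https://weak.example.com/repo.git", "true", per_url))
print(effective_setting("https://example.org/repo.git", "true", per_url))
```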

Tuesday, November 26, 2013

The Codebreakers

Every once in a while, I receive gifts from satisfied Git friends, chosen from my Amazon wish list, and today was such a day. I have been fairly busy cleaning up the fallout from our recent move, and things are only now becoming less hectic. So this turns out to be a perfect distraction gift for me, too ;-)

I have read only the first few sections so far. It is a big, thick book, and it would take me forever if I waited to finish reading it before writing about it and thanking the person. Thanks, MTM!

Friday, November 8, 2013


The latest maintenance release, Git v1.8.4.3, has been tagged and is available at the usual places (see the list of public repositories). It contains all the fixes that have already been merged to the 'master' branch for the upcoming Git v1.8.5 feature release.

Here are the highlights, relative to the previous maintenance release v1.8.4.2:

  • The interaction between use of Perl in our test suite and NO_PERL has been clarified a bit.
  • A fast-import stream expresses a pathname with funny characters by quoting them in C style; remote-hg remote helper (in contrib/) forgot to unquote such a path.
  • One long-standing flaw in the pack transfer protocol used by "git clone" was that the serving end had no way to tell the receiving end which branch "HEAD" points at, so the receiving end had to guess. A new capability has been defined in the pack protocol to convey this information. Cloning from a repository where more than one branch points at the same commit as HEAD now reliably sets the initial branch of the resulting repository.
  • We did not handle cases where http transport gets redirected during the authorization request (e.g. from http:// to https://).
  • "git rev-list --objects ^v1.0^ v1.0" gave v1.0 tag itself in the output, but "git rev-list --objects v1.0^..v1.0" did not.
  • The fall-back parsing of commit objects with broken author or committer lines was less robust than ideal at picking up the timestamps.
  • The Bash prompting code that deals with an SVN remote as an upstream used constructs that older Bash versions (3.x) do not support.
  • "git checkout topic", when there is not yet a local "topic" branch but there is a unique remote-tracking branch for a remote "topic" branch, pretended as if "git checkout -t -b topic remote/$r/topic" (for that unique remote $r) was run. This hack however was not implemented for "git checkout topic --".
  • Coloring around octopus merges in "log --graph" output was screwy.
  • We did not generate an HTML version of the documentation for "git subtree" in contrib/.
  • The synopsis section of "git unpack-objects" documentation has been clarified a bit.
  • An ancient How-To on serving Git repositories from an HTTP server lacked a warning that a more modern approach has mostly superseded it.
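The HEAD-guessing problem from the clone fix above can be illustrated with a simplified Python model. The function name, branch names and the placeholder object name "c0ffee" are hypothetical. The fallback only loosely resembles what older clients did: trust a symref if the server advertises one, otherwise prefer 'master' when it points at the same commit as HEAD, otherwise take the first match.

```python
def guess_remote_head(head_oid, branches, symref=None):
    """Pick the branch a fresh clone should check out (simplified model).

    branches maps "refs/heads/<name>" to the commit it points at;
    symref is what the server says HEAD points at, if it says anything.
    """
    if symref is not None:
        return symref  # the server told us, so no guessing is needed
    # Without that information, any branch at the same commit is a candidate.
    candidates = [ref for ref, oid in branches.items() if oid == head_oid]
    if "refs/heads/master" in candidates:
        return "refs/heads/master"
    return candidates[0] if candidates else None


# The server's HEAD actually points at 'maint', which shares a commit with 'master'.
branches = {"refs/heads/maint": "c0ffee", "refs/heads/master": "c0ffee"}
print(guess_remote_head("c0ffee", branches))                             # wrong guess
print(guess_remote_head("c0ffee", branches, symref="refs/heads/maint"))  # told, not guessed
```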

Wednesday, October 30, 2013

v1.8.5-rc0: An early preview of the upcoming release

There are many little changes everywhere.  All of the fixes that have already gone into maintenance releases are also in this preview.

Foreign interfaces, subsystems and ports

  • "git-svn" used with SVN 1.8.0 when talking over https:// connection dumped core due to a bug in the serf library that SVN uses.  Work it around on our side, even though the SVN side is being fixed.
  • On MacOS X, we detected if the filesystem needs the "pre-composed unicode strings" workaround, but did not automatically enable it.  Now we do.
  • remote-hg remote helper misbehaved when interacting with a local Hg repository relative to the home directory, e.g. "clone hg::~/there".
  • imap-send, when built on OS X, uses Apple's security framework instead of OpenSSL.
  • Subversion 1.8.0 that was recently released breaks older subversion clients coming over http/https in various ways.
  • "git fast-import" treats an empty path given to "ls" as the root of the tree.

UI, Workflows & Features

  • "git grep" and "git show" pay attention to the "--textconv" option when these commands are told to operate on blob objects (e.g. "git grep -e pattern HEAD:Makefile").
  • "git replace" helper no longer allows an object to be replaced with another object of a different type to avoid confusion (you can still manually craft such replacement using "git update-ref", as an escape hatch).
  • "git status" no longer prints dirty status information for submodules for which submodule.$name.ignore is set to "all".
  • "git rebase -i" honours core.abbrev when preparing the insn sheet for editing.
  • "git status" during a cherry-pick shows what original commit is being picked.
  • Instead of typing four capital letters "HEAD", you can say "@" now, e.g. "git log @".
  • "git check-ignore" follows the same rule as "git add" and "git status" in that the ignore/exclude mechanism does not take effect on paths that are already tracked.  With "--no-index" option, it can be used to diagnose which paths that should have been ignored have been mistakenly added to the index.
  • Some irrelevant "advice" messages that are shared with "git status" output have been removed from the commit log template.
  • "git update-ref" learnt a "--stdin" option to read multiple update requests and perform them in an all-or-none fashion.
  • Just like "make -C <directory>", "git -C <directory> ..." tells Git to go there before doing anything else.
  • Just like "git checkout -" knows to check out and "git merge -" knows to merge the branch you were previously on, "git cherry-pick" now understands "git cherry-pick -" to pick from the previous branch.
  • "git status" now omits the prefix to make its output a comment in a commit log editor, which is not necessary for human consumption.  Scripts that parse the output of "git status" are advised to use "git status --porcelain" instead, as its format is stable and easier to parse.
  • "foo^{tag}" now peels a tag to itself (i.e. it is a no-op) and fails if "foo" is not a tag.  "git rev-parse --verify v1.0^{tag}" is a more convenient way to say "test $(git cat-file -t v1.0) = tag".
  • "git branch -v -v" (and "git status") did not distinguish among three cases: a branch that does not build on any other branch, a branch that is in sync with the branch it builds on, and a branch configured to build on a branch that no longer exists.  Now they do.
  • A packfile that stores the same object more than once is broken and will be rejected by "git index-pack" that is run when receiving data over the wire.
  • Earlier we started rejecting attempts to add the 0{40} object name to the index and to tree objects. But sometimes this must be allowed, so that tools like filter-branch can be used to correct such broken tree objects.  "filter-branch" can again be used to do so.
  • "git config" did not provide a way to set or access numbers larger than a native "int" on the platform; it now provides 64-bit signed integers on all platforms.
  • "git pull --rebase" always chose to do the bog-standard flattening rebase.  You can tell it to run "rebase --preserve-merges" by setting "pull.rebase" configuration to "preserve".
  • "git push --no-thin" actually disables the "thin pack transfer" optimization.
  • Magic pathspecs like ":(icase)makefile" that matches both Makefile and makefile can be used in more places.
  • The "http.*" variables can now be specified per URL to which the configuration applies.  For example,

       [http]
           sslVerify = true
       [http "https://weak.example.com/"]
           sslVerify = false

    would flip http.sslVerify off only when talking to that specified site.
  • "git mv A B" when moving a submodule A has been taught to relocate its working tree and to adjust the paths in the .gitmodules file.
  • "git blame" can now take more than one -L option to discover the origin of multiple blocks of lines.
  • The http transport clients can optionally ask to save cookies with http.savecookies configuration variable.
  • "git push" learned finer-grained control than the blunt "--force" when requesting a non-fast-forward update, via the "--force-with-lease=<refname>:<expected object name>" option.
  • "git diff --diff-filter=<classes of changes>" can now take lowercase letters (e.g. "--diff-filter=d") to mean "show everything but these classes".  "git diff-files -q" is now a deprecated synonym for "git diff-files --diff-filter=d".
  • "git fetch" (hence "git pull" as well) learned to check "fetch.prune" and "remote.*.prune" configuration variables and to behave as if the "--prune" command line option was given.
  • "git check-ignore -z" applied the NUL termination to both its input (with --stdin) and its output, but "git check-attr -z" ignored the option on the output side. Both now honor -z on the input and output sides the same way.
  • "git whatchanged" may still be used by old timers. But mentioning it in documents meant for new users only makes readers waste time wondering how it differs from "git log".  It has been made less prominent in the general part of the documentation. Its own document now explains that it is merely "git log" with different default behaviour.
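Both the all-or-none "git update-ref --stdin" and the expected-value check of "--force-with-lease" come down to the same rule: update a ref only if it still has the value you expect. Here is a toy Python model of a batch of such updates. The names and the abbreviated object names are hypothetical. It is not how Git implements ref updates, which also have to deal with locking and concurrent writers.

```python
class RefUpdateRejected(Exception):
    """Raised when a ref does not hold the value the caller expected."""


def update_refs(refs, updates):
    """Apply (refname, new_oid, expected_oid) updates all-or-none (toy model).

    An expected_oid of None means "do not check the current value".
    """
    # Check every precondition before changing anything...
    for name, _new, expected in updates:
        if expected is not None and refs.get(name) != expected:
            raise RefUpdateRejected(
                f"{name}: expected {expected}, found {refs.get(name)}")
    # ...so that either all updates happen or none do.
    for name, new, _expected in updates:
        refs[name] = new


refs = {"refs/heads/master": "a1", "refs/heads/next": "b1"}
try:
    # 'next' has moved on since we last looked (b1, not b0): reject the batch.
    update_refs(refs, [("refs/heads/master", "a2", "a1"),
                       ("refs/heads/next", "b2", "b0")])
except RefUpdateRejected as err:
    print("rejected:", err)
print(refs)
```

With "--force-with-lease=<refname>:<expected object name>", the expected value plays the same role for the ref on the receiving side.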

Performance, Internal Implementation, etc.

  • The HTTP transport will try to use TCP keepalive when able.
  • "git repack" is now written in C.
  • Build procedure for MSVC has been updated.
  • If a build-time fallback is set to "cat" instead of "less", we should apply the same "no subprocess or pipe" optimization as we apply to user-supplied GIT_PAGER=cat.
  • Many commands use a --dashed-option as an operation mode selector (e.g. "git tag --delete"). The user can give at most one of them (e.g. "git tag --delete --verify" is nonsense), and they cannot be negated (e.g. "git tag --no-delete" is nonsense).  The parse-options API learned a new OPT_CMDMODE macro to make it easier to implement such a set of options.
  • OPT_BOOLEAN() in parse-options API was misdesigned to be "counting up" but many subcommands expect it to behave as "on/off". Update them to use OPT_BOOL() which is a proper boolean.
  • "git gc" exits early, without doing duplicate work, when it detects that another instance of itself is already running.
  • Under memory pressure and/or file descriptor pressure, we used to close unused pack windows and also close file handles to open but unused packfiles. These are now controlled separately to better cope with the load.
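OPT_CMDMODE itself is C code in Git's parse-options API. As an analogy only, Python's argparse offers a similar guarantee through add_mutually_exclusive_group(). The parser below accepts at most one mode option. Because it defines no --no-delete, it also rejects the negated form. The program and option names echo "git tag" but are otherwise hypothetical.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tag-sketch")
    # All mode options store into the same 'mode' destination, and the
    # group makes argparse reject a command line that names two of them.
    modes = parser.add_mutually_exclusive_group()
    modes.add_argument("-d", "--delete", dest="mode",
                       action="store_const", const="delete")
    modes.add_argument("-v", "--verify", dest="mode",
                       action="store_const", const="verify")
    modes.add_argument("-l", "--list", dest="mode",
                       action="store_const", const="list")
    parser.add_argument("names", nargs="*")
    return parser


parser = make_parser()
print(parser.parse_args(["--delete", "v1.0"]).mode)
# parser.parse_args(["--delete", "--verify"]) exits with a usage error.
```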

Wednesday, September 18, 2013

Fun with first parent history

If your history is cleanly maintained, the output from "git log --first-parent" will consist only of merges of completed topics and trivially correct updates made directly on the main history. It gives you a bird's-eye view of what features and fixes were made during a given period without going into too much detail. In a history where each merge brings in the work done for a specific topic (theme, purpose, objective; use whatever word you prefer), whoever made these merges is the integrator, the keeper of the main history. The first-parent view of the history is useful only when the keeper of the main history takes good care of it.

Some people use the central repository workflow, where everybody fetches from and pushes to a single repository. They complain that the "git pull" they do merges the history taken from the central repository into their own development history, so the merge is made in the wrong direction. For this reason they often wish for an option to flip the order of the parents around. What they do not realize is that a first-parent-clean history needs a lot more than that.

When you are using the "central shared repository" workflow, if you had and used such an option to flip the heads of a merge to record what you have done so far as a side branch of what everybody else did, the first-parent view would make a bit more sense than what you currently get. For example, if you worked on a specific topic that required six individual commits to complete since you forked from the mainline, your history in your repository and the project's main history in the central repository may look like this:

     x---x---x---x---x---x     Your history
---X---o---o---o---o---o       Project's history

If you try to "git push" at this point, it will stop you, lest you lose these commits represented with o by overwriting the history. Git will tell you to first integrate the project's history with yours with "git pull", but if you actually pull to merge, the commits x will form the first-parent chain of the resulting merge, and the sequence of commits (most likely, merges of topics unrelated to each other) o will appear as its side branch:

     x---x---x---x---x---x---M     Your history
    /                       /
---X---o---o---o---o---o----       Project's history

This is bad, and "flip the order of parents" may help to produce a history of this shape instead:

     x---x---x---x---x---x     Your history
    /                     \
---X---o---o---o---o---o---M   Project's history

However, there is another half of the problem that is not solved by such an option. People, especially those who work with the centralized workflow, tend to pull too often, just to catch up. Even with such a "flip the order of parents" option, what they would end up with in reality would often look more like this:

     x---x   x---x---x   x     Your history
    /     \ /         \ / \
---X---o---M---o---o---M---M   Project's history

The result fragments what would otherwise be a logical and clean "single strand of pearls to fully address the issue, consisting of 6 commits" into three separate and seemingly unrelated pieces. Now imagine that other people work the same way, so the commits marked with o are merges through which they add their half-way work to the main history, just as in the illustration above. You would get this history:

     x---x   x---x---x   x     Your history
    /     \ /         \ / \
---X---M---M---M---M---M---M   Project's history
      / \     / \ /
  ---y   y---y   y             Your colleague's history

Now look at "git log --first-parent" of the project's mainline history. Nothing links your six commits marked with x together and sets them apart from the commits marked with y. Nothing groups the merges marked M that pull in your disjoint steps toward a single goal and sets them apart from other merges. Unless people stop doing so many "pull"s used only to "catch up", you will not get a history that yields a good first-parent view, even with the "flip the parents of a merge" option.

As I wrote in an earlier entry (Fun with various workflows), when you "pull" and then "push" to the central repository, you are playing the role of the integrator, the keeper of the main history. You are responsible for taking good care of it yourself. If you make a 2+3+1=6 mess as depicted in the last illustration above, you are failing to do so. People who later read "git log --first-parent" would not be able to see that your six commits were meant to achieve a single coherent goal and should be read together to understand it.

One obvious way to solve this is to use a topic branch workflow. Do your "git pull" from the shared repository while you are on your 'master', which stays free of your 'x's until the 6-commit series is complete and ready. Then locally merge that topic branch into your 'master' and push it back for everybody to see. That gives you the third picture in this message.

Incidentally, by doing so, you do not need the "flip the order of parents" option, either.
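A toy Python model makes the difference between the second and third pictures concrete. Each commit is named after the letters in the drawings (x1..x6 and o1..o5 are hypothetical labels) and is represented by the list of its parents, first parent first. The walk follows only first parents, the way "git log --first-parent" does.

```python
def first_parent_log(parents, tip):
    """Commits visited by following only first parents from tip, newest first."""
    walk = []
    commit = tip
    while commit is not None:
        walk.append(commit)
        commit = parents[commit][0] if parents[commit] else None
    return walk


def history(yours_first):
    """Your x1..x6 and the project's o1..o5, both forked at X, joined by M."""
    parents = {"X": []}
    for side, count in (("x", 6), ("o", 5)):
        prev = "X"
        for i in range(1, count + 1):
            parents[f"{side}{i}"] = [prev]
            prev = f"{side}{i}"
    # The order of M's parents decides what --first-parent shows.
    parents["M"] = ["x6", "o5"] if yours_first else ["o5", "x6"]
    return parents


print(first_parent_log(history(yours_first=True), "M"))   # "git pull" on your work
print(first_parent_log(history(yours_first=False), "M"))  # topic merged into master
```

With yours_first=True, which is what a plain "git pull" on top of your work records, the six x commits become the mainline. When the finished topic is merged into an up-to-date 'master', the mainline is the project's o commits, and your series hangs off M as one side branch.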

Friday, August 23, 2013

Git 1.8.4

The 1.8.4 release has finally been tagged and pushed out to the usual places. It contains 870+ changes from ~100 contributors (among which 33 people are new) since v1.8.3.

Due to regressions discovered at the last minute, two topics that have been in the master branch for a while had to be reverted. They are expected to come back after fixing the regressions in future releases.

Here are some highlights:

  • "git log" learnt the "-Lbegin,end:filename" option. This starts from the specified range and digs through the history. It may still have rough edges and memory leaks, though.
  • "git clean" learnt the interactive mode, modeled after "git add -i" interface.
  • "git check-mailmap" is a new command that lets you query your .mailmap file for the canonical username and e-mail address.
  • "git name-rev" learnt to name an annotated tag object name back to its tagname.
  • Various subcommands of "git submodule" now work even from a subdirectory.
  • "git submodule update" can optionally clone the submodule repositories shallowly.
  • The "push.default=simple" mode of "git push" has been updated to behave like "current" when you push to a remote that is different from where you fetch from (e.g. via remote.pushdefault), in order to better support the triangular workflow.
  • "git log" learnt the "--author-date-order" option.
  • The configuration variable color.ui defaults to "auto" now.
  • "git describe" learnt the "--first-parent" option.
  • "git fetch $remote $branch" used to avoid touching the remote-tracking branch (you could always be explicit and say "git fetch $remote $branch:refs/remotes/$remote/$branch"). The command now updates the remote-tracking branch (if configured).
  • Uses of the platform fnmatch(3) function (in many places, such as pathspec matching, .gitignore and .gitattributes) have been replaced with wildmatch, allowing "foo/**/bar" to match "foo/bar", "foo/a/bar", etc.
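The effect of "**" can be sketched in Python. Split both pattern and path on "/", and let a "**" component stand for zero or more whole directory levels. Match every other component against exactly one path component using fnmatch.fnmatchcase. This is a simplified stand-in written for illustration, with hypothetical function names, and not Git's wildmatch code. It ignores escaping, character-class details and the special meanings of a leading or trailing "**".

```python
from fnmatch import fnmatchcase


def wildmatch_path(pattern: str, path: str) -> bool:
    """Match a slash-separated path against a pattern that may use "**"."""
    return _match(pattern.split("/"), path.split("/"))


def _match(pat, parts):
    if not pat:
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # "**" matches zero or more whole directory levels.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    # Any other component matches exactly one path component, so "*" and "?"
    # can never cross a "/" (the path has already been split on it).
    return bool(parts) and fnmatchcase(parts[0], head) and _match(rest, parts[1:])


for candidate in ["foo/bar", "foo/a/bar", "foo/a/b/bar", "foo/ab"]:
    print(candidate, wildmatch_path("foo/**/bar", candidate))
```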
Have fun.