Tuesday, November 8, 2011

Helping the kernel workflow redux

[edit: this is now used in the wild]

The goal is still to give the kernel's developers and users a better way to validate the authenticity of changes that eventually land in Linus's tree.

The "signed commit" mechanism discussed in a previous post may be useful in some workflows, but not necessarily so in an environment where you would push a commit out, and then decide after a long while that the commit is worth including in the upstream history. If you forgot to sign the commit when you pushed it out but otherwise the commit is in good shape, it feels a bit dirty that you would have to either amend it or cap it with a signed empty commit.

The latest round after a lengthy discussion across three mailing lists is to allow the integrator to run "git pull" against a signed tag, e.g.

$ git pull git://.../rusty.git/ rusty-for-linus

When 'rusty-for-linus' is a tag, the above syntax does not work with the current git (and it won't change in the upcoming 1.7.8 as we are deep in the pre-release feature freeze period), but you can instead say 'tags/rusty-for-linus' to do the same thing.
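The 'tags/<name>' form can be tried out locally. The following is only a sketch, assuming two throwaway repositories created under a temporary directory; the names 'contributor', 'integrator' and 'for-integrator', the helper 'g' and the identity it sets are made up for illustration, and the tag is annotated rather than signed so that no GPG key is needed. The '--no-edit' option accepts the default merge message so that the sketch runs unattended.

```shell
demo=$(mktemp -d)

# Helper (hypothetical): run git with a throwaway identity for commits and tags.
g() { git -c user.name='Example Person' -c user.email='person@example.com' "$@"; }

# Contributor: a base commit that the integrator already has.
git init -q "$demo/contributor"
g -C "$demo/contributor" commit -q --allow-empty -m 'Base'
git clone -q "$demo/contributor" "$demo/integrator"

# Contributor: new work, marked with an annotated tag.
g -C "$demo/contributor" commit -q --allow-empty -m 'New work'
g -C "$demo/contributor" tag -a -m 'Please pull this work' for-integrator
tagged_commit=$(git -C "$demo/contributor" rev-parse 'for-integrator^{commit}')

# Integrator: name the tag explicitly as tags/<name> when pulling.
g -C "$demo/integrator" pull -q --no-rebase --no-edit \
    "$demo/contributor" tags/for-integrator

# The tagged commit should now be part of the integrator's history.
if git -C "$demo/integrator" merge-base --is-ancestor "$tagged_commit" HEAD; then
    pulled=yes
else
    pulled=no
fi
echo "tagged work pulled: $pulled"
```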

When recording the merge result of such a pull that names a tag, Git will open an editor and ask the integrator to give a merge commit message. So far, 'git merge' has never asked for the commit log message to be edited, and the histories of many projects, especially when the 'merge.log' configuration variable is not enabled, are littered with one-liner messages such as "Merge from origin" that do not tell anything useful: why was this merge made, what changes were brought in, etc. That is going to change as well, as a side effect of this topic.

The integrator will see the following in the editor when recording such a merge:
  • The one-liner merge title (e.g. 'Merge tag rusty-for-linus of git://.../rusty.git/');
  • The message in the tag object (either annotated or signed). This is where the contributor tells the integrator what the purpose of the work contained in the history is, and helps the integrator describe the merge better;
  • The output of GPG verification of the signed tag object being merged. This is primarily to help the integrator validate the tag before he or she concludes the pull by making a commit, and is prefixed by '#', so that it will be stripped away when the message is actually recorded; and
  • The usual "merge summary log", if 'merge.log' is enabled.
The contents of the signed tag are also recorded in a header field of the resulting commit object, so that anybody can later retrieve them from the history and validate the signature. The signed tag that was pulled is not stored in the integrator's repository, nor pushed out to the integrator's publishing point.
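To make the header field concrete: in released versions of Git it is named 'mergetag', and its continuation lines are indented by one space. The sketch below pulls the embedded tag back out of a raw commit. The commit text is a made-up sample (placeholder object names, identities and an omitted signature), and the function names 'extract_mergetag' and 'sample_merge_commit' are hypothetical; with a real repository the input would come from "git cat-file commit <merge-commit>" instead.

```shell
# Print the tag object embedded in the "mergetag" header of a raw commit
# read from standard input.
extract_mergetag() {
    awk '
        /^$/           { exit }                                   # an empty line ends the headers
        /^mergetag /   { inside = 1; print substr($0, 10); next } # text after the header name
        inside && /^ / { print substr($0, 2); next }              # strip the one-space indent
                       { inside = 0 }                             # any other header ends the block
    '
}

# Made-up raw commit, standing in for "git cat-file commit <merge-commit>".
sample_merge_commit() {
    printf '%s\n' \
        'tree 1111111111111111111111111111111111111111' \
        'parent 2222222222222222222222222222222222222222' \
        'parent 3333333333333333333333333333333333333333' \
        'author A U Thor <author@example.com> 1320710400 -0800' \
        'committer A U Thor <author@example.com> 1320710400 -0800' \
        'mergetag object 3333333333333333333333333333333333333333' \
        ' type commit' \
        ' tag rusty-for-linus' \
        ' tagger C O Ntributor <contributor@example.com> 1320700000 -0800' \
        ' ' \
        ' Placeholder tag message describing the work' \
        ' -----BEGIN PGP SIGNATURE-----' \
        ' (signature data omitted in this made-up sample)' \
        ' -----END PGP SIGNATURE-----' \
        '' \
        "Merge tag 'rusty-for-linus' of git://.../rusty.git/"
}

# Recover the name of the tag that was merged.
merged_tag_name=$(sample_merge_commit | extract_mergetag | sed -n 's/^tag //p')
echo "merged tag: $merged_tag_name"
```

The extracted text is a complete tag object, signature included, which is why nobody needs the original tag ref to check it later.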

The primary reason the new mechanism records this information inside the commit instead of leaving the tag around is convenience. Recent kernel history contains about 400 merges by Linus within 3 months (4 to 5 pulls per day), and that counts only the pulls by Linus. To make the whole merge fabric more trustworthy, the integrations made by his lieutenants by pulling from their sub-lieutenants need to be made verifiable the same way, which would (1) make the number of signed tags even larger and (2) make it more likely that somebody in the food chain gets lazy and neglects to push out the signed tags after he or she used them for their own verification.


Git 1.7.7.3

Yet another minor update.

Arguably, the most important fix since 1.7.7.2 is that this one actually identifies itself as 1.7.7.3 (1.7.7.2 release still called itself 1.7.7.1 by mistake).

Monday, November 7, 2011

Git 1.7.8-rc1

The first release candidate for the upcoming release is out. Because there won't be any more new features merged until the 1.7.8 final, it is a good time for the coolest kids on the block to start using the upcoming release before others do.

The release tarballs are found at:

  http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

f35e5c4410b21710434cb591f4c89843e75bb793  git-1.7.8.rc1.tar.gz
72e27cd397f5ae7b3c9d8bb030a76d7c99cdbb50  git-htmldocs-1.7.8.rc1.tar.gz
95429858e879df3f9425cf1279e03cdec7832379  git-manpages-1.7.8.rc1.tar.gz
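One way to check a download against published checksums is "sha1sum -c" from GNU coreutils. The sketch below assumes that tool is available and uses a stand-in file called 'example.tar.gz' with a locally generated checksum list, so that it runs without downloading anything; with a real tarball you would put the published "<checksum>  <filename>" lines into the list instead.

```shell
dl=$(mktemp -d)

# Stand-in for a downloaded tarball (hypothetical file name and contents).
printf 'stand-in tarball contents\n' > "$dl/example.tar.gz"

# Stand-in for the published list: one "<sha1>  <filename>" entry per line.
(cd "$dl" && sha1sum example.tar.gz > SHA1SUMS)

# sha1sum -c re-hashes each listed file and exits non-zero on any mismatch.
if (cd "$dl" && sha1sum -c SHA1SUMS); then intact=yes; else intact=no; fi

# A file that changed after the list was made no longer verifies.
printf 'unexpected extra bytes\n' >> "$dl/example.tar.gz"
if (cd "$dl" && sha1sum -c SHA1SUMS 2>/dev/null); then
    tampered_passes=yes
else
    tampered_passes=no
fi
echo "intact: $intact, tampered copy passes: $tampered_passes"
```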

Also the following public repositories all have a copy of the v1.7.8.rc1 tag and the master branch that the tag points at:


  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git

Tuesday, November 1, 2011

Git 1.7.7.2

This is just the result of applying fixes that are already applied to the 'master' branch for upcoming 1.7.8 release. Nothing earth-shattering, which is the whole point of the maintenance series ;-).


Helping the kernel workflow

[edit: there is an update here]

As many people may have already heard, the kernel developers want a better way to validate the authenticity of changes that eventually go into Linus's tree. An e-mailed pull request asking Linus to pull from a public repository has three weak points:

  • The sender of e-mails can easily be spoofed;
  • Traditionally, a pull-request generated by tools states what commit of Linus the new work is based on and which branch of what repository needs to be pulled to receive it, but it does not even say what commit Linus should expect to see at the tip of the history; and
  • A pull-request could specify a random Git hosting site that gives out repositories to anybody. Unless the security of the site is trustworthy and Linus knows that the developer asking him to pull really uses that repository, a pull from such a location is suspect.
A typical reaction to the first point is "Use signed e-mail", and while it is a technically valid statement, in practice GPG e-mails are a pain to use for some people (including Linus).

The second point is rectified in the development version of Git, namely by commit cf73166 (request-pull: state what commit to expect, 2011-09-16), which is still cooking in the next branch. I expect this feature will be in the release after the upcoming 1.7.8.

The third point is currently addressed by Linus demanding that his lieutenants send pull-requests for repositories on trusted hosting sites, including (updated) kernel.org.

I have been working on this issue for the past months, toying with a few alternative designs. My current thinking is to teach "git commit" an option to embed GPG signature in the commit object (already implemented and cooking in the next branch, expected to be in the release after the upcoming 1.7.8), add "the tip commit to expect has this object name" in the pull-request e-mail (mentioned earlier), and teach "git fetch" to verify the GPG signature of the tip commit. A typical lieutenant-to-Linus communication would probably look like this:

(Lieutenant)
  • Do his/her work normally.
  • When finishing up the work in his/her tree before the final testing s/he usually does before sending out a pull-request, "git commit [--amend] --gpg-sign" the tip of the history.
  • Push out the history to be pulled.
  • Run "git request-pull" to generate the pull-request message, which states what the tip commit should be, and send it to Linus.
(Linus)
  • Read the pull-request.
  • Run "git pull" from the requested repository, which fetches the history, verifies that the tip commit matches what was in the pull-request, and verifies that the commit is signed by the developer.
(Others)
  • Fetch from Linus. If they are inclined to independently validate what Linus pulled in, they can run "git log --show-signature" to verify that the tips of the histories Linus merged are indeed signed.
This does not require signed pull-requests (a spoofed pull-request may cause Linus to fetch and merge, but the commit to be merged wouldn't be signed correctly so no real harm other than a bit of wasted time is done), and also the repository does not have to be hosted on a trusted site.
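The "tip commit to expect" part of this exchange can be sketched with plain commands. The example below is an illustration only: it uses two throwaway local repositories (the names 'lieutenant' and 'integrator' and the identity are made up), does the comparison by hand rather than inside "git pull", and leaves out the signature check because that needs a GPG key.

```shell
tmp=$(mktemp -d)

# Lieutenant: make a commit and note the object name of the tip, which is
# what the pull-request e-mail would state.
git init -q "$tmp/lieutenant"
git -C "$tmp/lieutenant" \
    -c user.name='A Lieutenant' -c user.email='lieutenant@example.com' \
    commit -q --allow-empty -m 'Work to be pulled'
expected_tip=$(git -C "$tmp/lieutenant" rev-parse HEAD)

# Integrator: fetch the history without merging it yet.
git init -q "$tmp/integrator"
git -C "$tmp/integrator" fetch -q "$tmp/lieutenant" HEAD
fetched_tip=$(git -C "$tmp/integrator" rev-parse FETCH_HEAD)

# Compare what arrived with what the pull-request promised.
if [ "$fetched_tip" = "$expected_tip" ]; then
    tip_check=match
else
    tip_check=mismatch
fi
echo "tip check: $tip_check"
```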

Sunday, October 30, 2011

Git 1.7.8-rc0

I just tagged 1.7.8-rc0 so that we can have something reasonable by late November before many in the US will stop working and start stuffing themselves. There are a few topics that I would further merge down before -rc1 but from the point of view of new features, this should be pretty much it for the upcoming release. Please test thoroughly to hunt for regressions.


Sunday, October 23, 2011

Git 1.7.7.1


The latest maintenance release Git 1.7.7.1 is available.


The release tarballs are found at:


    http://code.google.com/p/git-core/downloads/list


and their SHA-1 checksums are:


9200e0b8ee543d297952b78aac8f61f8b3693f8e  git-1.7.7.1.tar.gz
b25dacb07ebbfc37e7a90c3d47f76b4c0f0487d9  git-htmldocs-1.7.7.1.tar.gz
419c750617ae0c952e2e43f0357c16de6ebc0a44  git-manpages-1.7.7.1.tar.gz


Also the following public repositories all have a copy of the v1.7.7.1
tag and the maint branch that the tag points at:


  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git

Have fun...

Tuesday, October 18, 2011

Git calendar redrawn

An updated Git calendar for the current development cycle is available here. Partly because the previous cycle took longer than was planned due to the outage of kernel.org, we already had many well cooked topics held back in the 'next' branch when we released 1.7.7, and this cycle is progressing at a rather rapid pace compared to the previous cycles.


Sunday, October 16, 2011

Final Jeopardy by Stephen Baker

Finished reading Final Jeopardy, covering the popular game show match between IBM's Watson and human champions. The pace of the book was pleasant; not too slow to be boring, not too fast to be sketchy. I do not regularly watch television, but I recall people gathering in front of the large TV in our building one afternoon to watch it.

The author excellently described in easy terms why this "question answering" was a harder problem than just finding documents that contain words with search engines. The machine needs to understand (or at least "pretend as if it understands") synonyms and concepts to a certain degree to give plausible answers to clues expressed in human language.

I however found the "search engines are dumb, machine needs to go one level higher" a somewhat antiquated notion, after recently seeing results from Google and Bing for "cartoon about pc and mac users updating software", "movie in which scientists go to brain in submarine" and such.

Tuesday, October 4, 2011

Beginning 1.7.8 cycle

Now that the 1.7.7 release has been made, I started looking at the topic branches that have been cooking (and some have been stagnating), to get a better feel of what the next release would look like.

One of the major focuses will be robustness and security. Partly because the 1.7.7 cycle overlapped with the much publicised k.org break-in, there have been a lot of discussions, both on and off the git@vger.kernel.org mailing list, to offer our users better tools to leave audit trails and help them be more confident about the objects and histories they exchange over the wire.
Some randomly selected topics, either already implemented or still under discussion & consideration, are:
  • Teach "git fetch" and "git push" (the object and history transfer over the wire) to validate the objects transferred from the other side of the network more thoroughly while storing them in the local object store before updating the local history pointer. "git push" already had support for this (receive.fsckobjects) to protect the server side, but the same check will be supported for "git fetch" to give better assurance to the general public;
  • In addition, teach "git fetch" and "git push" to make sure that the set of objects received from the other side of the network is actually consistent with the history the other end claims to be transferring;
  • Signed push, where the server can require the history being uploaded to be cryptographically signed by the developer's public key;
  • Signed commit, where the developer can cryptographically sign a commit without using a separate signed tag.
As usual, I am sure there will be ideas from different contributors during the development cycle toward 1.7.8, and some of them will be part of 1.7.8 and others may have to wait until the next cycle.
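For the first item in the list, the checks are switched on through configuration. A minimal sketch, assuming a Git new enough to know the fetch-side variable (released versions call it fetch.fsckObjects); a throwaway repository is used so that no existing configuration is touched.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Existing server-side check: validate objects received via "git push".
git -C "$repo" config receive.fsckObjects true

# Fetch-side counterpart: validate objects received via "git fetch".
# (Assumption: the installed Git supports this variable.)
git -C "$repo" config fetch.fsckObjects true

# Read the setting back.
fsck_on_fetch=$(git -C "$repo" config --get fetch.fsckObjects)
echo "fetch.fsckObjects = $fsck_on_fetch"
```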

One unrelated area in which I would like to see more development is support for "floating" submodules, for which the commit object name recorded in the superproject tree takes lower precedence than the actual branch state of submodules, so that the top level superproject can say "module M must check out the latest and greatest of its B branch". This goes quite against the distributed nature of Git, where "latest and greatest" for a given branch depends on which repository you are talking about, but in a project that uses a central shared repository workflow, it makes sort of sense.

A possible implementation would be to record that branch B in the submodule M should be checked out in .gitmodules of the superproject, and "git submodule update M" would check out the local branch "B" (which must integrate with remotes/origin/B), if it exists, instead of what is recorded at path M in the superproject tree. Some codepaths, e.g. "git status" and "git diff", that are run in the superproject currently assume that they always have to compare .git/HEAD in the submodule M with what is in the superproject tree at M, and need to be updated to compare remotes/origin/B and heads/B in submodule M for such a submodule.
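As an illustration of what "record that branch B ... in .gitmodules" could look like, the sketch below writes such an entry with "git config -f". The key name submodule.M.branch is an assumption chosen for this example; later Git releases do have a .gitmodules key of that name, but its semantics may differ from the proposal described here. The file lives in a temporary directory, not in a real superproject.

```shell
super=$(mktemp -d)

# Record where submodule M lives and which branch it should float on.
# (Key names are illustrative; see the note above.)
git config -f "$super/.gitmodules" submodule.M.path M
git config -f "$super/.gitmodules" submodule.M.branch B

# Read the recorded branch back, as an updated "git submodule update" might.
floating_branch=$(git config -f "$super/.gitmodules" --get submodule.M.branch)
echo "submodule M floats on branch: $floating_branch"
```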

By the way, I'll likely change the repository signing key in the near future. The current key in use is:

pub   1024D/F3119B9A 2004-01-28
      Key fingerprint = 3565 2A26 2040 E066 C9A7  4A7D C0C6 D9A4 F311 9B9A
uid                  Junio C Hamano <gitster@pobox.com>


Although I do not have any reason to believe the key might have been compromised (it never left my home machine), I've updated along with other k.org users. The new GPG key will be:

pub   4096R/713660A7 2011-10-01
      Key fingerprint = 96E0 7AF2 5771 9559 80DA  D100 20D0 4E5A 7136 60A7
uid                  Junio C Hamano <gitster@pobox.com>


You can obtain both of them at http://pgp.mit.edu/ and other quality keyservers.