Tuesday, November 1, 2011

Helping the kernel workflow

[edit: there is an update here]

As many people may have already heard, the kernel developers want a better way to validate the authenticity of changes that eventually go into Linus's tree. An e-mailed pull request asking Linus to pull from a public repository has three weak points:

  • The sender of e-mails can easily be spoofed;
  • Traditionally, a pull-request generated by tools states what commit of Linus the new work is based on and which branch of what repository needs to be pulled to receive it, but it does not even say what commit Linus should expect to see at the tip of the history; and
  • A pull-request could specify a random Git hosting site that gives out repositories to anybody. Unless the security of the site is trustworthy and Linus knows that the developer asking him to pull actually uses that repository, a pull from such a location is suspect.
A typical reaction to the first point is "Use signed e-mail", and while that is technically valid, in practice GPG-signed e-mails are a pain to use for some people (including Linus).

The second point is rectified in the development version of Git, namely by commit cf73166 (request-pull: state what commit to expect, 2011-09-16), which is still cooking in the next branch. I expect this feature will be in the release after the upcoming 1.7.8.

The third point is currently addressed by Linus requiring his lieutenants to send pull-requests for repositories on trusted hosting sites, including (updated) kernel.org.

I have been working on this issue for the past months, toying with a few alternative designs. My current thinking is to teach "git commit" an option to embed a GPG signature in the commit object (already implemented and cooking in the next branch, expected to be in the release after the upcoming 1.7.8), add "the tip commit to expect has this object name" to the pull-request e-mail (mentioned earlier), and teach "git fetch" to verify the GPG signature of the tip commit. A typical lieutenant-to-Linus communication would probably look like this:

(Lieutenant)
  • Do his/her work normally.
  • When finishing up the work in his/her tree, before the final testing s/he usually does prior to sending out a pull-request, run "git commit [--amend] --gpg-sign" to sign the tip of the history.
  • Push out the history to be pulled.
  • Run "git request-pull" to generate the pull-request message, which states what the tip commit should be, and send it to Linus.
(Linus)
  • Read the pull-request.
  • Run "git pull" from the requested repository, which fetches the history, verifies that the tip commit matches what was in the pull-request, and verifies that the commit is signed by the developer.
(Others)
  • Fetch from Linus. If they are inclined to independently validate what Linus pulled in, they can run "git log --show-signature" to check that the tips of the histories Linus merged are indeed signed.
This does not require signed pull-requests (a spoofed pull-request may cause Linus to fetch and merge, but the commit to be merged wouldn't be signed correctly so no real harm other than a bit of wasted time is done), and also the repository does not have to be hosted on a trusted site.
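
The two checks on Linus's side can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation being cooked in Git: it assumes the pull-request message contains a line of the form "for you to fetch changes up to <40-hex object name>" (the exact wording depends on the Git version), and it uses "git verify-commit", a command from later Git releases, to stand in for the signature check. The helper names (expected_tip, fetched_tip, signature_is_good, safe_to_merge) are hypothetical.

```python
import re
import subprocess

# Assumed wording of the line that names the tip; real messages may differ.
TIP_LINE = re.compile(r"for you to fetch changes up to ([0-9a-f]{40})")


def expected_tip(pull_request_text):
    """Return the object name the pull-request says to expect, or None."""
    match = TIP_LINE.search(pull_request_text)
    return match.group(1) if match else None


def fetched_tip(repo_dir="."):
    """Ask Git which commit the last fetch recorded in FETCH_HEAD."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", "FETCH_HEAD^{commit}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def signature_is_good(commit, repo_dir="."):
    """True if 'git verify-commit' accepts the signature on the commit."""
    result = subprocess.run(
        ["git", "verify-commit", commit],
        cwd=repo_dir, capture_output=True,
    )
    return result.returncode == 0


def safe_to_merge(pull_request_text, tip, signed):
    """Pure decision step: the tip must match the request and be signed."""
    expected = expected_tip(pull_request_text)
    return expected is not None and expected == tip and signed
```

After a fetch, one would call safe_to_merge(message, fetched_tip(), signature_is_good(fetched_tip())). A spoofed message can at worst make the fetch happen; the decision still fails unless the fetched tip carries a good signature.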

2 comments:

Aaron Brooks said...

I've thought that using a git-notes like mechanism to provide post-hoc commit signatures would work well. You'd want to use a different ref stream (and not clog up notes) and have a way of automatically pulling signature refs along with the branch.

Gitster said...

That was an alternative we discussed on the list. Store a GPG signature for the commit ("push certificate") somewhere in a notes tree and push that out, certifying that the commit indeed came from the pusher, but that would:

- require upstreams to fetch push certificates (and possibly suffer from merge conflicts in the notes tree) whenever they pull from their lieutenants; and

- require downstreams to also fetch the notes tree for "push certificates" (especially when the central repository is shared among multiple people) before adding their own signature and then pushing it back (and possibly suffer from "non-fast-forward" in the notes tree).

both of which are downsides coming from "notes" being not a very good match for what these signatures are trying to achieve.

The thing is, the "notes" mechanism is designed to keep track of the history of changes made to notes attached to commits, but for the signature application, we do not care about the order in which signatures were added to two separate commits. "Non-fast-forward" conflicts while pushing, or having to fetch and merge before adding one's own signature, are an unwanted burden imposed only by choosing to use "notes" for storing and conveying the signature.
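
A toy model may make the difference concrete. It is not how Git stores notes or signatures; NotesRef and merge_bags are hypothetical names used only to contrast a ref with linear history (where a push based on a stale tip is rejected) against an unordered bag of signatures keyed by commit (where combining two people's additions gives the same result in either order).

```python
class NotesRef:
    """Toy ref with linear history: a push must extend the current tip."""

    def __init__(self):
        self.history = []

    def push(self, based_on, entry):
        # based_on is the history length the pusher last saw.
        if based_on != len(self.history):
            return False  # stale base: rejected as non-fast-forward
        self.history.append(entry)
        return True


def merge_bags(a, b):
    """Combine two {commit: set of signatures} bags; order does not matter."""
    merged = {commit: set(sigs) for commit, sigs in a.items()}
    for commit, sigs in b.items():
        merged.setdefault(commit, set()).update(sigs)
    return merged
```

With the toy ref, two signers who both start from the same tip cannot both push; the second has to fetch and retry even though the signatures are for unrelated commits. With the bag, neither has to know about the other.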

Also the "notes" approach would end up mixing "push certificates" for different branches into a single "notes" tree.

We just want "a bag of annotations that are attached to commits that matter". Fetch only signatures that pertain to the commits that are fetched, and no others. Use of signed tags (which auto-follow the commits upon fetching) is one way to achieve that. Storing the signatures in the commit objects that matter (i.e. signed commits store the signature for themselves, and a mergetag stores the signature of the parent commit that is merged) is another way. Signatures stored in a notes tree do not behave that way, and are not appropriate for our purpose.