Wednesday, December 28, 2011

Git 1.7.8.2


The latest maintenance release Git 1.7.8.2 is available.

Again, please note that this is not a release with new features (upcoming 1.7.9 is expected to be released late January next year to include the pulling of signed tags and other goodies).

Please upgrade to get the accumulated bugfixes since 1.7.8.1

Enjoy!

Tuesday, December 27, 2011

Hopefully the last push-out of the year

By now I know a bit better than taking the lack of serious regression reports during the holiday weekend as a sign of perfection of the upcoming release, but I will tag 1.7.9-rc0 soonish anyway. As far as I can see, the tip of 'master' branch is now feature complete for 1.7.9, modulo possible bugs and regressions, of course.

We've done quite a lot during this short cycle leading to 1.7.9 and hopefully we can use the next few weeks squashing any regressions, preparing for a solid release by the end of January 2012 (Knock wood...).

A big thank-you to all the contributors and testers.




Wednesday, December 21, 2011

Git 1.7.8.1

The latest maintenance release Git 1.7.8.1 is available.  Note that this
is not a release with new features (upcoming 1.7.9 is expected to be
released late January next year to include the pulling of signed tags and
other goodies).

Please upgrade to get the accumulated bugfixes since 1.7.8 release from:

  http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:


198e23e6e50245331590a6159ccdbdbe1792422c  git-1.7.8.1.tar.gz
8f674dba39d9ae78928abfe9d924b0855e283e98  git-htmldocs-1.7.8.1.tar.gz
b49ce0b4da4f85671693c9b2c6f6a8b8ee65c809  git-manpages-1.7.8.1.tar.gz
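
To check a downloaded tarball against the list above, something along these lines should work. It is only a minimal sketch: it assumes the sha1sum utility from GNU coreutils is installed, and verify_sha1 is a made-up helper name, not something shipped with Git.

```shell
# verify_sha1 EXPECTED FILE
# Succeeds only when the SHA-1 of FILE matches EXPECTED.
# (verify_sha1 is a hypothetical helper, not part of Git.)
verify_sha1 () {
    expected=$1
    file=$2
    # sha1sum prints "<checksum>  <filename>"; keep only the checksum.
    actual=$(sha1sum "$file" | cut -d' ' -f1)
    if test "$actual" = "$expected"
    then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

Run in the download directory, "verify_sha1 198e23e6e50245331590a6159ccdbdbe1792422c git-1.7.8.1.tar.gz" would then report whether the source tarball is intact.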


  • In some codepaths (notably, checkout and merge), the ignore patterns recorded in $GIT_DIR/info/exclude were not honored. They now are.
  • "git apply --check" did not error out when given an empty input without any patch.
  • "git archive" mistakenly allowed remote clients to ask for commits that are not at the tip of any ref.
  • "git checkout" and "git merge" treated in-tree .gitignore and exclude file in $GIT_DIR/info/ directory inconsistently when deciding which untracked files are ignored and expendable.
  • LF-to-CRLF streaming filter used when checking out a large-ish blob fell into an infinite loop with a rare input.
  • The function header pattern for files with "diff=cpp" attribute did not consider "type *funcname(type param1,..." as the beginning of a function.
  • The error message from "git diff" and "git status" when they fail to inspect changes in submodules did not report which submodule they had trouble with.
  • After fetching from a remote that has very long refname, the reporting output could have been corrupted by overrunning a static buffer.
  • "git pack-objects" avoids creating cyclic dependencies among deltas when seeing a broken packfile that records the same object in both the deflated form and as a delta.
 Also contains minor fixes and documentation updates.
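
Two of the fixes above concern $GIT_DIR/info/exclude, the per-repository ignore file that is never committed. A small sketch of how it is meant to behave, using a throwaway repository (the file names are made up for illustration):

```shell
# Work in a scratch repository so nothing real is touched.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .

# Patterns here work like .gitignore entries but stay local to this repository.
mkdir -p .git/info
echo '*.log' >>.git/info/exclude

: >build.log
: >notes.txt

# Only notes.txt should be reported as untracked; build.log is ignored.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```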

Tuesday, December 20, 2011

"Pulling signed tag" is already in use in the field

[update: with finishing touches, this will be part of 1.7.9 release]

One of the more important features we wanted to have for the next release of Git (1.7.9) is to support a workflow where a pull request for a signed tag is sent to the integrator and git pull in response to the request automatically verifies the GPG signature embedded in the signed tag. See this for the background.

Earlier, a typical pull request was for a branch name in the publishing repository of a contributor, and worse yet, the default request message created by the git request-pull command did not even mention what commit to expect as the result of a requested pull, which meant that it was unnecessarily hard to make sure what was pulled was genuinely what the contributor had produced for the integrator and for third-party auditors.

This feature has been cooking in the next branch of the Git project for a while, and recently graduated to the master branch to become part of the upcoming release. Linus has been using the in-development version, and I learned that he made a commit (2240a7b (Merge tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4, 2011-12-14)) that pulls a signed tag in response to a pull request from Ted several days ago already!

It always is satisfying to see that the result of our hard work is used in the field.

The workflow between the contributor (or a lieutenant) and the integrator looks like this:

(contributor)

After preparing the work to be pulled, use "git tag -s" to create a signed tag.

$ git checkout work
$ ... "git pull" from sublieutenants, "git commit" your own work ...
$ git tag -s -m "Completed frotz feature" frotz-for-linus work

And push the tag out to your publishing repository. You do not have to push your work branch or anything else.

$ R=example.com:/git/froboz.git/
$ git push $R frotz-for-linus


Then prepare a pull request message.

$ git request-pull v3.2 $R frotz-for-linus >msg.txt

The arguments are
  1. the version of the integrator's commit you based your work on;
  2. the URL of the repository, to which you have pushed what you want to get pulled; and
  3. the name of the tag you want to get pulled (earlier, you could only write branch names here).
The resulting msg.txt file should begin like so:


The following changes since commit 703f05ad5835cff92b12c29aecf8d724c8c847e2:

  Froboz 3.2 (2011-09-30 14:20:57 -0700)

are available in the git repository at:

  example.com:/git/froboz.git/ frotz-for-linus

for you to fetch changes up to 406da7803217998ff6bf5dc69c55b1613556c2f4:

  Add tests and documentation for frotz (2011-12-02 10:02:52 -0800)

----------------------------------------------------------------

Completed frotz feature

----------------------------------------------------------------
A U Thor (7):
      frotz: do not use --index in the short usage output
      frotz: Add tests and documentation for frotz

      ...

The signed tag message is shown prominently between the dashed lines before the short-log, so you may want to justify why pulling your work is worthwhile when creating the signed tag.

Then open your favorite mailer, read msg.txt in, edit and send it to your integrator.

(integrator)

After seeing such a pull request message, fetch and integrate the tag named in the request.

$ git pull example.com:/git/froboz.git/ frotz-for-linus

It will always open an editor to allow the integrator to fine tune the commit log message for the merge when merging a signed tag. In the editor, the integrator will see something like this:

Merge tag 'frotz-for-linus' of example.com:/git/froboz.git/

Completed frotz feature
# gpg: Signature made Fri 02 Dec 2011 10:03:01 AM PST using RSA key ID 96AFE6CB
# gpg: Good signature from "A U Thor <author@example.com>"



provided that the signature in the signed tag verifies correctly. As usual, the lines commented with # are stripped out. The resulting commit records the contents of the signed tag in a hidden field so that it can be used by others to audit in the future, so the integrator does not have to keep a separate copy of the tag in his repository (i.e. "git tag -l" won't list frotz-for-linus tag in the above example).

After the integrator responds to the pull request and the work becomes part of the permanent history, the contributor can remove the tag from the publishing repository if he or she chooses to in order to keep the tag namespace clean.

(contributor)

$ git push example.com :refs/tags/frotz-for-linus
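
The push-then-delete life cycle of the tag can be tried out locally. The sketch below makes two assumptions for the sake of the demonstration: a bare repository in a temporary directory stands in for the publishing repository, and the tag is annotated (-a) rather than signed (-s) so that it runs without a GPG key.

```shell
tmp=$(mktemp -d)
R=$tmp/froboz.git          # local stand-in for example.com:/git/froboz.git/
git init -q --bare "$R"
git init -q "$tmp/work" && cd "$tmp/work"
git config user.name 'A U Thor'
git config user.email author@example.com

git commit -q --allow-empty -m 'frotz: first step'
# -a instead of -s: annotated but unsigned, so no GPG key is needed here.
git tag -a -m 'Completed frotz feature' frotz-for-linus

# Publish only the tag; no branch is pushed.
git push -q "$R" frotz-for-linus
published=$(git ls-remote "$R")
echo "$published"

# Once the work is merged, an empty source side of the refspec deletes the tag.
git push -q "$R" :refs/tags/frotz-for-linus
remaining=$(git ls-remote "$R")
```

After the second push, the publishing repository no longer advertises anything, which is what keeps its tag namespace clean.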


Sunday, December 18, 2011

Moving forward to 1.7.9

A handful of topics that have been cooking on the 'next' branch have now graduated to the 'master' branch, to be included in the upcoming Git 1.7.9 release. They are all relatively minor fixes and a small feature or two, except for one, which is to stream large files directly to a packfile upon "git add", instead of storing them individually in loose object files and having to later repack them into a single pack, which would doubly be time consuming.
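
A rough way to observe the streaming behaviour is sketched below. It rests on an assumption this post does not state: that blobs larger than the core.bigFileThreshold setting are the ones streamed straight to a pack. The threshold is lowered here so that a 2 MB file is enough to trigger it.

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .

# Assumption: blobs above core.bigFileThreshold are streamed to a pack.
git config core.bigFileThreshold 1m

# Two million zero bytes, comfortably above the lowered threshold.
head -c 2000000 /dev/zero >big.bin
git add big.bin

# The blob should already live in a packfile rather than as a loose object.
packs=$(ls .git/objects/pack/ | grep -c '\.pack$')
echo "packfiles after git add: $packs"
```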

As I wrote earlier, we expect to make Git 1.7.9 a fairly focused release with only a small number of new features around high impact themes:
  • Better and more auditable communication in pull based workflow by supporting electronically signed pull requests that record more meaningful branch description;
  • More pleasant end-user experience by providing credential helper API to allow platform native keychain implementations to supply authentication material during "git push" and "git pull";
  • i18n of messages out of the end-user facing programs;
  • Better large-contents support.
The "'git add' that streams to pack" in tonight's pushout is about the last theme; there is another topic regarding the same theme already in development, but it is not expected to be complete during the 1.7.9 timeframe.

Of course, there are other miscellaneous features and fixes already in the 'master' branch and also still in flight and cooking in the 'next' branch, but as far as the big changes are concerned, we are about 80% feature complete as of tonight. The credential API is expected to hit the 'master' branch in a few days.

Epson WorkForce 545

My wife wanted a new printer (as the old HP has been acting up), so I paid a visit to a local office electronics shop after running a few product searches.

The printer should support Google Cloud Print (as both my wife and I carry Android phones), and also should be supported by CUPS. I ended up getting Epson WorkForce 545 which is an all-in-one wired/wireless unit, simply because I no longer have a good impression of HP, and because I never used Kodak printers.

After a few trial-and-error sessions, it was reasonably easy to figure out how to configure it.

I first tried to connect it via WiFi. One glitch was that there didn't seem to be any way to learn the MAC address of the unit (the WiFi router is configured to talk WPA/WPA2 but also to filter connections based on MAC). I was however happy to see that its panel display offered to print a diagnostic after it failed to connect, and its MAC address was printed there. After that, it was easy to configure it to authenticate to the WiFi router.

As the unit will sit immediately next to the router, however, I decided to disable WiFi altogether and give it a wired connection with fixed address.

After connecting to the network, interestingly, it was much easier to configure the unit to work with the Google Cloud Print than with CUPS.

A newer Windows box of my wife (I think it runs something called Windows 7) found the printer without me doing anything in particular; just being on the same network segment seemed to be enough, and then the Windows box installed the printer drivers itself.
 
Visiting the IP address I gave to the unit with the web browser, there were a handful of controls, and the top one was to make it work with Google Cloud Print. It just redirected the browser to google.com for OAuth and I had the printer associated with my Gmail account. From there, I can share the access to the printer with my wife's Gmail account and with my work account.

As I do not print much (and nothing at home), this was the first time I added a network printer to CUPS. After blindly trying random URLs like http://ip-address-of-unit:631, ipp://ip-address-of-unit/, etc., I finally figured out that this particular model (or perhaps recent Epson in this class in general) wants to be connected with socket://ip-address-of-unit as its URL, but I did not see this documented anywhere.

An older Windows box my wife uses to control her computerized weaving loom (I think it runs Windows XP) was a different issue. It didn't see and did not want to connect to the wired printer, even though it could see my Linux box that is running samba. Adding an entry for it in /etc/samba/smb.conf was a simple task after figuring out what needs to be done (which unfortunately took too long for my liking). In the printer's section I needed to add use client driver = yes for it to work.
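
For reference, a printer share of that kind might look roughly like the fragment below. Apart from the "use client driver = yes" line, everything in it (the share name, the spool path, the CUPS queue name) is a guess for illustration, not a copy of the actual configuration.

```
# Hypothetical printer share; the names and paths are placeholders.
[epson545]
   comment = Epson WorkForce 545
   path = /var/spool/samba
   printable = yes
   printing = cups
   printer name = epson545
   # The one setting named in the post: let the Windows client
   # use its own locally installed driver.
   use client driver = yes
```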

By the way, at the office electronics shop, I saw the new models of Kindle family (including the Fire), Nook color and the Nook tablet. Somehow Fire looked too thick and unwieldy to me, while Nook tablet looked slim and very nice. I didn't buy either, though...

Friday, December 9, 2011

The Disappearing Spoon: And Other True Tales of... by Sam Kean



Every once in a while, I receive gifts from satisfied Git friends, chosen from my Amazon Wish list.

And today was such a day.

I enjoy reading history of science.  Thanks Miro!

Friday, December 2, 2011

Git 1.7.8

Git 1.7.8 is finally out.

The most notable improvement from my point of view is tightening of the checks on the data that go and come over the wire between repositories, but it probably is and should be invisible to the end users. Many other small improvements are all over the place.

The release tarballs are found at:

    http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

7453e737e008f7319a5eca24a9ef3c5fb1f13398  git-1.7.8.tar.gz
2734079e22a0a6e3e78779582be9138ffc7de6f7  git-htmldocs-1.7.8.tar.gz
93315f7f51d7f27d3e421c9b0d64afa27f3d16df  git-manpages-1.7.8.tar.gz

Have fun.

The development cycle for the next release is expected to last for 8 weeks until the end of January, and some of the major planned topics are listed here.

Wednesday, November 30, 2011

Buying a new Git feature

You are a manager of a technology company, and your engineers love Git in general, but Git is not a perfect fit to your organization. Perhaps some work-flow elements your people are used to are not supported nicely by today's Git. Perhaps some class of assets you want to keep track of are not supported well by today's Git.

You are wealthy enough to pay for a developer or two to identify, design and implement necessary changes to Git, but you are not wealthy enough to fork Git and maintain such a change yourself forever while the upstream Open Source community continues to improve Git.

What can you do?

Of course, if the changes you initially develop are good enough, they may be merged to the upstream and then you do not have to worry about maintaining your fork yourself. But how would you ensure that the quality of your changes is good enough for upstreaming? Perhaps withhold the payment to your consultants until the changes hit the upstream?


I do not think this is necessarily limited to Git, but applies equally to any useful and active Open Source project.

Monday, November 28, 2011

Git 1.7.8-rc4 and upcoming cycle

This cycle is taking a bit longer than I had hoped but this should be the last rc before the final.

We had to roll in a fix to the UI for a new feature added in this cycle to "revert/cherry-pick" to avoid costly migration in the later releases (originally, we introduced "--reset" action to discard the in-progress state of a multi-commit revert/cherry-pick sequence, but it was argued that what the action actually did was "--quit" in the sense that it does not reset the state to some known state. Renaming it to "--quit" further opened the door to introduce "--abort" which does revert the state to where the entire revert/cherry-pick sequence started).
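
The difference between the two actions is easiest to see on a sequence that stops in the middle. The sketch below builds a scratch repository whose second picked commit conflicts; the branch, file names and commit messages are invented for the demonstration.

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name 'A U Thor'
git config user.email author@example.com

echo base >file && git add file && git commit -q -m 'base'
mainline=$(git symbolic-ref --short HEAD)

# A two-commit topic: the first applies cleanly, the second will conflict.
git checkout -q -b topic
echo doc >doc && git add doc && git commit -q -m 'topic: add doc'
echo theirs >file && git commit -q -am 'topic: change file'

git checkout -q "$mainline"
echo mine >file && git commit -q -am 'mainline: change file'
start=$(git rev-parse HEAD)

# The sequence stops at the second commit with a conflict.
git cherry-pick topic~2..topic >/dev/null 2>&1 || echo "stopped on a conflict"
stopped=$(git rev-parse HEAD)

# --abort rewinds to where the whole sequence started;
# --quit would only forget the sequence and leave HEAD at $stopped.
git cherry-pick --abort
after=$(git rev-parse HEAD)
test "$after" = "$start" && echo "back at the starting commit"
```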

The next cycle will have many interesting topics that are already cooking in various doneness, including:
  • local branch description (in addition to a good discipline of giving descriptive branch names) that can be used in various places including pull request messages, local merge messages and format-patch cover letters;
  • electronically signed pull requests by asking to pull a signed tag instead of a branch;
  • signed commit (possibly-if it is found useful, that is);
  • credential helper API to integrate with platform native keychain implementations;
  • progress eye-candy for fsck and repack;
  • "git add" of large contents will send blobs directly to a packfile;
  • side-by-side diff in gitweb; and
  • i18n of Git Porcelain messages.

Thursday, November 24, 2011

PGP Key-signing and CA fire-and-forget

Someone at work used to have a kernel.org account but recently needed to re-establish the presence in the web of trust by getting his PGP key signed, so we met and exchanged our key IDs and fingerprints, to mutually sign our keys. I earlier attended a key-signing party and my key has been signed by many other people, and it was a good place to bootstrap his key.

A PGP key has two parts; the public part that you give to others, and the private part that you keep to yourself. The easiest and most common way to distribute public part of the key is to upload it to public keyservers, where other people can find and retrieve your key by specifying the key ID, your name or e-mail address.

When other people want to send a message to you and preserve the secrecy of the message, they only need to use the public part of your key to encrypt the message for you, and PGP guarantees that the encrypted message can be decrypted and read only by whoever holds the corresponding private part of the key unless a complex math problem that is believed to be practically unsolvable can somehow be solved (in other words, "public crypto-system gets broken"). When you want to prove that a message was written by you, you use the private part of your key to electronically sign the message and make the result public. Others can check the authenticity of the electronic signature by using only the public part of your key, and again PGP guarantees that the message couldn't have been signed by a person who does not have the private part of the key.

The public part of your PGP key records your name and e-mail address, among other things. It can and often does record more than one pair of name and e-mail (e.g. work address vs personal address). Anybody can generate a PGP key on his or her own and record any name and e-mail address in its public part. If you see a message signed with a PGP key whose public part records my name and address, unless you somehow know that it indeed is the key I created for me and whose private part I have, such a signature has no value. It may have been created by a random person impersonating me.
If you encrypt a message you want to show only to me, using a random PGP key that records my name and e-mail address to encrypt it would not guarantee that only I would be able to read it, unless you somehow know that the key belongs to me.

Hence, people need a way to validate the authenticity of public keys. People can add electronic signatures to the public part of a PGP key that belongs to another person, vouching that the signer knows the key belongs to the signee. This signature can be made per name and e-mail pair recorded in the public part of the key.

If you see a signature on an unknown public key, signed by public keys that you know belong to people you trust, you can be as sure as you trust these signers that the unknown public key belongs to the person it claims to belong to. This "web of trust" extends recursively and I heard that a recent study indicates that all people in the world are connected by 4.74 hops on average.

The only facts I learned when I met the other person for the purpose of key-signing are:

  • The person looked like his photo in our employee directory, and possessed a photo ID that matches his name;
  • The achievements by the person described in our employee directory matched what the person I was supposed to be meeting who worked in the Linux kernel project had done; and
  • The person claimed that a public key belonged to him, and gave me a way to retrieve the public part of this key.
It is not directly the above that I am vouching for by signing the public part of his key, however. I am vouching for the fact that I somehow know that the public key belongs to the person who is in control of the name and e-mail address pair recorded therein. That is not something I checked by meeting the person and chatting with him. I only checked the "name" part, but not the "e-mail address" part.

CA fire-and-forget is a clever scheme to solve this last bit of the problem. Instead of signing the public part of the key for all the name and e-mail pairs and uploading the result myself, I make N separate signatures on his public key, one for each pair of name and e-mail address recorded in it. And then I encrypt these N signatures with his public key and send them to the corresponding e-mail addresses. The recipient of these encrypted signatures then decrypts them and uploads the result to the public keyservers to complete the cycle.

If the e-mail address belonged to somebody else who does not have the corresponding private part, the encrypted signature would not reach the intended recipient, and the signature would not be decrypted to be uploaded to the public keyservers. I'll see my signature only if the person sitting behind the e-mail address has the private key that corresponds to the public part I have signed.

It is a clever scheme, even though it is a bit cumbersome to use, even with the use of dedicated tools (caff found in signing-party package on some distributions).

Friday, November 18, 2011

Git 1.7.7.4

Just another maintenance update, this time to fix minor build issues and fix a trivial corner case bug in the git name-rev --all command.

The upcoming feature release 1.7.8 is getting closer, too.

Thursday, November 17, 2011

Git 1.7.8-rc3 and being lenient to others while being strict to self

Hopefully the last release candidate before the real thing.

A big "Thanks" goes to Andrea Arcangeli for reporting an unpleasant regression, me for quickly fixing, and Michael Haggerty for reviewing the proposed fix.


The regression that will not be in the final release was that we broke

  $ git clone --reference=$local_repository $upstream


when the local repository we are borrowing objects from has signed or annotated tags, and the cause of this regression is that a recent topic screwed up the implementation of tightened checks for branch and tag (collectively known as "refs") names. When we clone from $upstream while borrowing objects from a $local_repository, we tell the $upstream that objects that are in the $local_repository need not be sent to us, and we discover what objects $local_repository has by reading the output of


  $ git ls-remote $local_repository


and adding the result to the set of "extra refs". We internally keep track of all the "refs" that exist in our repository, and the code that registers the extra refs shares the same codepath as the one that finds the branches and tags by reading from .git/refs/{heads,tags} directories. The problem was that the add_ref() function in this shared codepath had a check to error out (not just warn) when it tries to register any "ref" whose name does not conform to the rule. Because an entry for a signed or an annotated tag in the output from ls-remote denotes the object (typically a commit) the tag points at, and because such an entry is marked by adding ^{} at the end of the name of the tag to make sure it will not collide with names of the real refs (that character sequence is invalid), the new check triggered and made the whole clone command fail.
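
The ^{} entries are easy to see for yourself. A sketch with a scratch repository and an annotated tag (the tag name v1.0 is arbitrary):

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name 'A U Thor'
git config user.email author@example.com
git commit -q --allow-empty -m 'initial'
git tag -a -m 'first release' v1.0

# An annotated tag yields two lines: one for the tag object itself and
# a "peeled" one ending in ^{} for the commit the tag points at.
listing=$(git ls-remote .)
echo "$listing"
```

The line for refs/tags/v1.0^{} is the kind of entry that tripped the overly strict name check.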


This episode shows two fundamental failures in the topic:
  • "extra refs" are not real refs, and they shouldn't even need names. The only reason they exist is to let our repository know the objects reachable from them do not need to be transferred into our repository when talking with the outside world. Perhaps we should even consider dropping the name parameter from add_extra_ref() function (but after making sure the code would not make unwarranted assumptions. One such assumption was that they have names and their name must conform to the usual refname rules, which was fixed, but there may be others).
  • The other use of add_ref() function is to register existing refs that we find in our repository. While we might not like the name of some of them (nobody stops a user or a tool from creating a randomly named file under .git/refs/{heads,tags} directories after all), it is wrong to error out any operation when talking about what already exists in the repository; the damage is already done. Warning against them to help the user notice and correct is a different story.
The code should be lenient to what it receives and strict in what it produces.


For example, a colon is a forbidden character in a branch name, primarily because a branch with such a name, e.g. a:b, cannot be pushed out to another repository. But if you do not ever push such a branch out, it is not that unreasonable to expect that the following to work, at least for some definition of working:


  $ H=$(git rev-parse HEAD~20) && echo $H >.git/refs/heads/inval:id
  $ git show inval:id


It may be OK for the second line to error out (we cannot do much about the manual echo doing damage to the repository), but where there is no ambiguity (i.e. if there is a ref that is called inval, the above could be a request to show a subdirectory called id in that commit), warning that inval:id is a wrong name but still letting the user do what s/he wanted to do would be a far nicer way to deal with a problem like this. After the above sequence, if the following fails only because the repository has a ref with an invalid name, it is even worse:


  $ git show master

and I would have to say it is close to inexcusable.
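
Whether a name conforms to the refname rules can be checked with the "git check-ref-format" plumbing command, which only inspects the name and does not need a repository. A small sketch; the check_name wrapper is made up for illustration.

```shell
# check_name NAME: report whether Git would accept NAME as a full refname.
# (check_name is a hypothetical wrapper around "git check-ref-format".)
check_name () {
    if git check-ref-format "$1"
    then
        echo "valid:   $1"
    else
        echo "invalid: $1"
    fi
}

check_name refs/heads/master
check_name refs/heads/inval:id
check_name 'refs/tags/v1.0^{}'
```

Only the first name passes; the colon and the ^{} suffix are both rejected, which is fine for names we are about to create but too harsh a reaction to names that already exist.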

Sunday, November 13, 2011

Git 1.7.8-rc2 and the road forward

The second release candidate, that is not much different from the first one, is out.

The reason why this is not tagged as the 1.7.8 final is because we want to make sure there is no regression since 1.7.7, and the reason why this is not so different from the first one is because no such regression has been found that needs fixing, which is a good thing.

I have been working on a handful of topics for the development cycle after 1.7.8, and these topics all share the same theme: giving better ways to users so that they can assure themselves that their patches that flow over the public channel are not tampered with, and also helping them communicate more clearly among themselves in general.
  • A new change originates from a contributor, who has a theme in mind to achieve a specific goal. There is a new feature in the branch command that allows a descriptive text to be added to a topic branch and this facility can be used to record and update that "theme/goal" when starting to work on the topic and while polishing it.
  • The contributor, after perfecting the topic, would request the resulting history to be pulled by the integrator. This pull-request traditionally gave only the list of commits on the topic and did not encourage the contributor to clearly describe what the topic was about. The branch description will be copied to the resulting message in the updated version of request-pull command.
  • The integrator will receive the pull request in e-mail, but typically PGP signed e-mails are hard to use. The updated version of request-pull command does not use PGP signing on pull-request e-mails, either, but the contributor can ask the integrator to pull a signed tag, instead of the tip of a branch, using the updated pull command.
  • When the integrator records the result of a pull request, traditionally the command did not open an editor to encourage the integrator to describe the merge. The updated version of merge does this when responding to a request to merge a signed tag, and shows the result of PGP verification of the tag in the comment to help the integrator.
  • In addition, the contributor can optionally add a PGP signature to individual commits with the updated commit command.
So far, things are looking reasonably good for these topics.
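
As a sketch of the first item: in the released form of this feature, the text is edited with "git branch --edit-description" and kept in the branch.<name>.description configuration variable. The example sets the variable directly so that no editor is needed; the branch name and the wording are invented.

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .

# Same effect as typing the text into "git branch --edit-description frotz".
git config branch.frotz.description 'Teach frotz to report progress while it runs.'

# The description is read back from the same configuration variable.
desc=$(git config --get branch.frotz.description)
echo "$desc"
```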

Wednesday, November 9, 2011

Sleepless

Somehow I couldn't sleep (no, I am not insomniac) and ended up rising way too early at 4:30 which is too late to go back to sleep.

Which turned out to be a rather productive quality 2-hour Git-morning. A few patches sent, and a few reviews made.

I may not be insomniac, but I sometimes wonder if I am a bit workaholic.

Tuesday, November 8, 2011

Helping the kernel workflow redux

[edit: this is now used in the wild]

 The goal is still to give the kernel developers and its users a better way to validate the authenticity of changes that eventually land on Linus's tree.

The "signed commit" mechanism discussed in a previous post may be useful in some workflows, but not necessarily so in an environment where you would push a commit out, and then decide that the commit is worth including in the upstream history after a long while. If you forgot to sign the commit when you pushed it out but otherwise the commit is in good shape, it feels a bit dirty that you would have to either amend it or cap it with a signed empty commit.

The latest round after a lengthy discussion across three mailing lists is to allow the integrator to run "git pull" against a signed tag, e.g.

$ git pull git://.../rusty.git/ rusty-for-linus

When 'rusty-for-linus' is a tag, the above syntax does not work with the current git (and it won't change in the upcoming 1.7.8 as we are deep in the pre-release feature freeze period), but you can instead say 'tags/rusty-for-linus' to do the same thing.

When recording the merge result of such a pull that names a tag, Git will open an editor and ask the integrator to give a merge commit message. So far, 'git merge' never asked for commit log message to be edited, and histories of many projects, especially when 'merge.log' configuration variable is not enabled, are littered with one-liner messages, such as "Merge from origin", that do not tell anything useful - why was this merge made, what changes were brought in, etc. That is going to change as well, as a side effect of this topic.
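
For comparison, this is roughly what enabling 'merge.log' adds to an otherwise one-line merge message. The sketch uses a scratch repository with invented branch and commit names, and passes --no-edit so that it runs without an editor.

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.name 'A U Thor'
git config user.email author@example.com
git commit -q --allow-empty -m 'base'
mainline=$(git symbolic-ref --short HEAD)

git checkout -q -b frotz
git commit -q --allow-empty -m 'frotz: add tests'
git checkout -q "$mainline"

# With merge.log enabled, the merge message lists the merged commits.
git config merge.log true
git merge -q --no-ff --no-edit frotz
message=$(git log -1 --format=%B)
echo "$message"
```

The recorded message starts with the usual "Merge branch 'frotz'" title, followed by a short list of the commit titles that the merge brought in.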

The integrator will see the following in the editor when recording such a merge:
  • The one-liner merge title (e.g. 'Merge tag rusty-for-linus of git://.../rusty.git/');
  • The message in the tag object (either annotated or signed). This is where the contributor tells the integrator what the purpose of the work contained in the history is, and helps the integrator describe the merge better;
  • The output of GPG verification of the signed tag object being merged. This is primarily to help the integrator validate the tag before he or she concludes the pull by making a commit, and is prefixed by '#', so that it will be stripped away when the message is actually recorded; and
  • The usual "merge summary log", if 'merge.log' is enabled.
The contents of the signed tag is also recorded in the header field of the resulting commit object, so that anybody can later retrieve it from the history and validate the signature. The signed tag that was pulled is not stored in the integrator's repository, nor pushed out to the integrator's publishing point.

The primary reason the new mechanism records this information inside the commit instead of leaving the tag around is for convenience. Recent kernel history contains about 400 merges by Linus within 3 months (4 to 5 pulls per day), and that counts only the pulls by Linus. To make the whole merge fabric more trustworthy, the integrations made by his lieutenants by pulling from their sub-lieutenants need to be made verifiable the same way, which would (1) make the number of signed tags even larger and (2) make it more likely somebody in the foodchain gets lazy and refuses to push out the signed tags after he or she used them for their own verification.


Git 1.7.7.3

Yet another minor update.

Arguably, the most important fix since 1.7.7.2 is that this one actually identifies itself as 1.7.7.3 (1.7.7.2 release still called itself 1.7.7.1 by mistake).

Monday, November 7, 2011

Git 1.7.8-rc1

The first release candidate for the upcoming release is out. Because no more new features will be merged until the 1.7.8 final, it is a good time for the coolest kids on the block to start using the upcoming release before others do.

The release tarballs are found at:

  http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

f35e5c4410b21710434cb591f4c89843e75bb793  git-1.7.8.rc1.tar.gz
72e27cd397f5ae7b3c9d8bb030a76d7c99cdbb50  git-htmldocs-1.7.8.rc1.tar.gz
95429858e879df3f9425cf1279e03cdec7832379  git-manpages-1.7.8.rc1.tar.gz
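
To check a downloaded tarball against its published value, compare the two SHA-1 strings. A minimal sketch, assuming either sha1sum or shasum is installed; the helper names sha1_of and verify_sha1 and the demo file are made up for illustration:

```shell
# Print the SHA-1 of a file, using whichever common tool is available.
sha1_of() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum "$1" | cut -d' ' -f1
    else
        shasum -a 1 "$1" | cut -d' ' -f1
    fi
}

# Succeed only when the file's checksum equals the expected one.
verify_sha1() {
    [ "$(sha1_of "$1")" = "$2" ]
}

# Demo with a throwaway file; for a release, pass the tarball and the
# checksum published for it instead.
demo=$(mktemp)
printf 'hello' >"$demo"
if verify_sha1 "$demo" aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d; then
    status="checksum OK"
else
    status="checksum MISMATCH"
fi
echo "$status"
```

A matching checksum only shows that the download is intact; it says nothing about who published the list, which is why the checksum list itself is worth checking against a signature when one is offered.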

Also the following public repositories all have a copy of the v1.7.8.rc1 tag and the master branch that the tag points at:


  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git

Tuesday, November 1, 2011

Git 1.7.7.2

This is just the result of applying fixes that are already applied to the 'master' branch for upcoming 1.7.8 release. Nothing earth-shattering, which is the whole point of the maintenance series ;-).


Helping the kernel workflow

[edit: there is an update here]

As many people may have already heard, the kernel developers would want to have a better way to validate the authenticity of changes that eventually go into Linus's tree. An e-mailed pull request asking Linus to pull from a public repository has three weak points:

  • The sender of e-mails can easily be spoofed;
  • Traditionally, a pull-request generated by tools states what commit of Linus the new work is based on and which branch of what repository needs to be pulled to receive it, but it does not even say what commit Linus should expect to see at the tip of the history; and
  • A pull-request could specify a random Git hosting site that gives out repositories to anybody. Unless the security of the site is trustworthy and Linus knows that the developer asking him to pull actually uses that repository, a pull from such a location is suspect.
A typical reaction to the first point is "Use signed e-mail", and while it is a technically valid statement, in practice GPG e-mails are a pain to use for some people (including Linus).

The second point is rectified in the development version of Git, namely by commit cf73166 (request-pull: state what commit to expect, 2011-09-16), which is still cooking in the next branch. I expect this feature will be in the release after the upcoming 1.7.8.

The third point is currently addressed by Linus requiring his lieutenants to send pull-requests only for repositories on trusted hosting sites, including (updated) kernel.org.

I have been working on this issue for the past months, toying with a few alternative designs. My current thinking is to teach "git commit" an option to embed GPG signature in the commit object (already implemented and cooking in the next branch, expected to be in the release after the upcoming 1.7.8), add "the tip commit to expect has this object name" in the pull-request e-mail (mentioned earlier), and teach "git fetch" to verify the GPG signature of the tip commit. A typical lieutenant-to-Linus communication would probably look like this:

(Lieutenant)
  • Do his/her work normally.
  • When finishing up the work in his/her tree before the final testing s/he usually does before sending out a pull-request, "git commit [--amend] --gpg-sign" the tip of the history.
  • Push out the history to be pulled.
  • Run "git request-pull" to generate the pull-request message, that states what the tip commit should be, and send it to Linus.
(Linus)
  • Read the pull-request.
  • Run "git pull" from the requested repository, which fetches the history, verifies that the tip commit matches what was in pull-request, and verifies that the commit is signed by the developer.
(Others)
  • Fetch from Linus. If they are inclined to independently validate what Linus pulled in, they can run "git log --show-signature" to check that the tips of the histories Linus merged are indeed signed.
This does not require signed pull-requests (a spoofed pull-request may cause Linus to fetch and merge, but the commit to be merged wouldn't be signed correctly so no real harm other than a bit of wasted time is done), and also the repository does not have to be hosted on a trusted site.
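
The "tip commit matches what was in the pull-request" step needs no GPG at all and can be tried by hand. A sketch under stated assumptions: two throwaway repositories stand in for the lieutenant's public repository and for Linus's tree, the variable names are made up, and "git -C" requires a reasonably recent Git. Signature checking is a separate step and is not shown.

```shell
work=$(mktemp -d)

# Lieutenant side: a throwaway repository standing in for the public one.
git init -q "$work/lieutenant"
git -C "$work/lieutenant" -c user.name=Lieutenant -c user.email=lt@example.com \
    commit -q --allow-empty -m "work to be pulled"
# The object name the pull-request e-mail would announce as the tip.
announced=$(git -C "$work/lieutenant" rev-parse HEAD)

# Integrator side: fetch first, compare, and only then consider merging.
git init -q "$work/integrator"
git -C "$work/integrator" fetch -q "$work/lieutenant" HEAD
fetched=$(git -C "$work/integrator" rev-parse FETCH_HEAD)

if [ "$fetched" = "$announced" ]; then
    verdict="tip matches the pull-request"
else
    verdict="tip differs from the pull-request; do not merge"
fi
echo "$verdict"
```

Fetching before merging keeps the comparison cheap: nothing touches the integrator's branches until the object name has been checked.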

Sunday, October 30, 2011

Git 1.7.8-rc0

I just tagged 1.7.8-rc0 so that we can have something reasonable by late November before many in the US will stop working and start stuffing themselves. There are a few topics that I would further merge down before -rc1 but from the point of view of new features, this should be pretty much it for the upcoming release. Please test thoroughly to hunt for regressions.


Sunday, October 23, 2011

Git 1.7.7.1


The latest maintenance release Git 1.7.7.1 is available.


The release tarballs are found at:


    http://code.google.com/p/git-core/downloads/list


and their SHA-1 checksums are:


9200e0b8ee543d297952b78aac8f61f8b3693f8e  git-1.7.7.1.tar.gz
b25dacb07ebbfc37e7a90c3d47f76b4c0f0487d9  git-htmldocs-1.7.7.1.tar.gz
419c750617ae0c952e2e43f0357c16de6ebc0a44  git-manpages-1.7.7.1.tar.gz


Also the following public repositories all have a copy of the v1.7.7.1
tag and the maint branch that the tag points at:


  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git

Have fun...

Tuesday, October 18, 2011

Git calendar redrawn

Updated Git calendar for the current development cycle is available here. Partly because the previous cycle took longer than was planned due to the outage of kernel.org, we already had many well cooked topics held back in the 'next' branch when we released 1.7.7, and this cycle is progressing at a rather rapid pace compared to the previous cycles.


Sunday, October 16, 2011

Final Jeopardy by Stephen Baker

Finished reading Final Jeopardy, covering the popular game show match between IBM's Watson and human champions. The pace of the book was pleasant; not too slow to be boring, not too fast to be sketchy. I do not regularly watch television, but I recall people gathering one afternoon in front of the large TV in our building to watch it.

The author excellently described in easy terms why this "question answering" was a harder problem than just finding documents that contain words with search engines. The machine needs to understand (or at least "pretend as if it understands") synonyms and concepts to a certain degree to give plausible answers to clues expressed in human language.

I however found the "search engines are dumb, machine needs to go one level higher" a somewhat antiquated notion, after recently seeing results from Google and Bing for "cartoon about pc and mac users updating software", "movie in which scientists go to brain in submarine" and such.

Tuesday, October 4, 2011

Beginning 1.7.8 cycle

Now that 1.7.7 release was made, I started looking at the topic branches that have been cooking (and some have been stagnating), to get a better feel of what the next release would look like.

One major focus will be robustness and security. Partly because the 1.7.7 cycle overlapped with the much publicised k.org break-in, there have been a lot of discussions, both on and off the git@vger.kernel.org mailing list, about offering our users better tools to leave audit trails and help them be more confident about the objects and histories they exchange over the wire.
Some randomly selected topics, either already implemented or still under discussion & consideration, are:
  • Teach "git fetch" and "git push" (the object and history transfer over the wire) to validate the objects transferred from the other side of the network more thoroughly while storing them in the local object store before updating the local history pointer. "git push" already had support for this (receive.fsckobjects) to protect the server side, but the same check will be supported for "git fetch" to give better assurance to the general public;
  • In addition, teach "git fetch" and "git push" to make sure that the set of objects received from the other side of the network is actually consistent with the history the other end claims to be transferring;
  • Signed push, where the server can require the history being uploaded to be cryptographically signed by the developer's public key;
  • Signed commit, where the developer can cryptographically sign a commit without using a separate signed tag.
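
The object checks in the first item are switched on through configuration. A minimal sketch in a throwaway repository: receive.fsckObjects is the variable named above, while fetch.fsckObjects and transfer.fsckObjects are the names current Git uses for the fetch side and for the catch-all setting, so treat those two as assumptions relative to this post.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Server side: check objects arriving via "git push" (named in the post).
git -C "$repo" config receive.fsckObjects true
# Client side: check objects arriving via "git fetch".
git -C "$repo" config fetch.fsckObjects true
# A single switch that applies when the specific ones are unset
# (left commented out here; shown for reference only):
#   git -C "$repo" config transfer.fsckObjects true

receive_setting=$(git -C "$repo" config --get receive.fsckObjects)
fetch_setting=$(git -C "$repo" config --get fetch.fsckObjects)
echo "receive=$receive_setting fetch=$fetch_setting"
```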
As usual, I am sure there will be ideas from different contributors during the development cycle toward 1.7.8, and some of them will be part of 1.7.8 and others may have to wait until the next cycle.

One unrelated area in which I would like to see more development is support for "floating" submodules, for which the commit object name recorded in the superproject tree takes lower precedence than the actual branch state of submodules, so that the top level superproject can say "module M must check out the latest and greatest of its B branch". This goes quite against the distributed nature of Git, where "latest and greatest" for a given branch depends on which repository you are talking about, but in a project that uses a central shared repository workflow, it makes sort of sense.

A possible implementation would be to record that branch B in the submodule M should be checked out in .gitmodules of the superproject, and "git submodule update M" would check out the local branch "B" (which must integrate with remotes/origin/B), if it exists, instead of what is recorded at path M in the superproject tree. Some codepaths that are run in the superproject, e.g. "git status" and "git diff", currently assume that they always have to compare .git/HEAD in the submodule M with what is in the superproject tree at M, and need to be updated to compare remotes/origin/B and heads/B in submodule M for such a submodule.
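
One way such a record could look, as a sketch only: the key name "branch" and the names M and B are assumptions for illustration, and nothing here changes what "git submodule update" does.

```shell
super=$(mktemp -d)

# Record "submodule M should follow branch B" in the superproject's
# .gitmodules. The key name "branch" is an assumption made for this sketch.
git config -f "$super/.gitmodules" submodule.M.path M
git config -f "$super/.gitmodules" submodule.M.branch B

# Tools running in the superproject could then look the branch up like this.
floating_branch=$(git config -f "$super/.gitmodules" --get submodule.M.branch)
echo "$floating_branch"
```

Keeping the branch name in .gitmodules rather than in .git/config means the choice travels with the superproject history, which fits the central shared repository workflow described above.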

By the way, I'll likely change the repository signing key in the near future. The current key in use is:

pub   1024D/F3119B9A 2004-01-28
      Key fingerprint = 3565 2A26 2040 E066 C9A7  4A7D C0C6 D9A4 F311 9B9A
uid                  Junio C Hamano <gitster@pobox.com>


While I do not have any reason to believe the key might have been compromised (it never left my home machine), I am updating it along with other k.org users. The new GPG key will be:

pub   4096R/713660A7 2011-10-01
      Key fingerprint = 96E0 7AF2 5771 9559 80DA  D100 20D0 4E5A 7136 60A7
uid                  Junio C Hamano <gitster@pobox.com>


You can obtain both of them at http://pgp.mit.edu/ and other quality keyservers.

Friday, September 30, 2011

Git 1.7.7

The latest feature release Git 1.7.7 is available.

The release tarballs are found at:

    http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

bbf85bd767ca6b7e9caa1489bb4ba7ec64e0ab35  git-1.7.7.tar.gz
33183db94fd25e001bd8a9fd6696b992f61e28d8  git-htmldocs-1.7.7.tar.gz
75d3cceb46f7a46eeb825033dff76af5eb5ea3d9  git-manpages-1.7.7.tar.gz

Also the following public repositories all have a copy of the v1.7.7 tag and the master branch that the tag points at:

  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git


The release tag and the tarballs can be verified with my GPG key if anybody is so inclined. To get my public key:

  $ git fetch git://repo.or.cz/alt-git.git refs/tags/junio-gpg-pub
  $ git rev-parse FETCH_HEAD
  680865b90b18efbc9402ea979adf0302c6dfe72e
  $ git cat-file blob FETCH_HEAD | gpg --import

and then make sure that you got my key by checking the output from "gpg --fingerprint", which should contain these lines:

  pub   1024D/F3119B9A 2004-01-28
        Key fingerprint = 3565 2A26 2040 E066 C9A7  4A7D C0C6 D9A4 F311 9B9A
  uid                  Junio C Hamano <gitster@pobox.com>

The tarball description at http://code.google.com/p/git-core/downloads/list contains the same list of SHA-1 checksums shown above and signed with my GPG key.

Have fun.

Git v1.7.7 Release Notes
========================

Updates since v1.7.6
--------------------

 * The scripting part of the codebase is getting prepared for i18n/l10n.

 * Interix, Cygwin and Minix ports got updated.

 * Various updates to git-p4 (in contrib/), fast-import, and git-svn.

 * Gitweb learned to read from /etc/gitweb-common.conf when it exists,
   before reading from gitweb_config.perl or from /etc/gitweb.conf
   (this last one is read only when per-repository gitweb_config.perl
   does not exist).

 * Various codepaths that invoked zlib deflate/inflate assumed that these
   functions can compress or uncompress more than 4GB data in one call on
   platforms with 64-bit long, which has been corrected.

 * Git now recognizes loose objects written by other implementations that
   use a non-standard window size for zlib deflation (e.g. Agit running on
   Android with 4kb window). We used to reject anything that was not
   deflated with 32kb window.

 * Interaction between the use of pager and coloring of the output has
   been improved, especially when a command that is not built-in was
   involved.

 * "git am" learned to pass the "--exclude=<path>" option through to underlying
   "git apply".

 * You can now feed many empty lines before feeding an mbox file to
   "git am".

 * "git archive" can be told to pass the output to gzip compression and
   produce "archive.tar.gz".

 * "git bisect" can be used in a bare repository (provided that the test
   you perform per each iteration does not need a working tree, of
   course).

 * The length of abbreviated object names in "git branch -v" output
   now honors the core.abbrev configuration variable.

 * "git check-attr" can take relative paths from the command line.

 * "git check-attr" learned an "--all" option to list the attributes for a
   given path.

 * "git checkout" (both the code to update the files upon checking out a
   different branch and the code to checkout a specific set of files) learned
   to stream the data from object store when possible, without having to
   read the entire contents of a file into memory first. An earlier round
   of this code that is not in any released version had a large leak but
   now it has been plugged.

 * "git clone" can now take a "--config key=value" option to set the
   repository configuration options that affect the initial checkout.

 * "git commit <paths>..." now lets you feed relative pathspecs that
   refer to outside your current subdirectory.

 * "git diff --stat" learned a --stat-count option to limit the output of
   a diffstat report.

 * "git diff" learned a "--histogram" option to use a different diff
   generation machinery stolen from jgit, which might give better
   performance.

 * "git diff" had a weird worst case behaviour that can be triggered
   when comparing files with potentially many places that could match.

 * "git fetch", "git push" and friends no longer show connection
   errors for addresses that couldn't be connected to when at least one
   address succeeds (this is arguably a regression but a deliberate
   one).

 * "git grep" learned "--break" and "--heading" options, to let users mimic
   the output format of "ack".

 * "git grep" learned a "-W" option that shows wider context using the same
   logic used by "git diff" to determine the hunk header.

 * Invoking the low-level "git http-fetch" without "-a" option (which
   git itself never did---normal users should not have to worry about
   this) is now deprecated.

 * The "--decorate" option to "git log" and its family learned to
   highlight grafted and replaced commits.

 * "git rebase master topci" no longer spews usage hints after giving
   the "fatal: no such branch: topci" error message.

 * The recursive merge strategy implementation got a fairly large
   fix for many corner cases that may rarely happen in real world
   projects (it has been verified that none of the 16000+ merges in
   the Linux kernel history back to v2.6.12 is affected with the
   corner case bugs this update fixes).

 * "git stash" learned an "--include-untracked" option.

 * "git submodule update" used to stop at the first error updating a
   submodule; it now goes on to update other submodules that can be
   updated, and reports the ones with errors at the end.

 * "git push" can be told with the "--recurse-submodules=check" option to
   refuse pushing of the supermodule, if any of its submodules'
   commits hasn't been pushed out to their remotes.

 * "git upload-pack" and "git receive-pack" learned to pretend that only a
   subset of the refs exist in a repository. This may help a site to
   put many tiny repositories into one repository (this would not be
   useful for larger repositories as repacking would be problematic).

 * "git verify-pack" has been rewritten to use the "index-pack" machinery
   that is more efficient in reading objects in packfiles.

 * test scripts for gitweb tried to run even when CGI-related perl modules
   are not installed; they now exit early when the latter are unavailable.

Also contains various documentation updates and minor miscellaneous
changes.


Fixes since v1.7.6
------------------

Unless otherwise noted, all fixes in the 1.7.6.X maintenance track are
included in this release.

 * "git branch -m" and "git checkout -b" incorrectly allowed the tip
   of the branch that is currently checked out to be updated.

Friday, September 23, 2011

Git 1.7.7-rc3 and 1.7.6.4 are out

I gave up waiting on kernel.org and really wanted to do the 1.7.7 final toward this weekend, but found a corner case regression in the recursive merge backend (again). 1.7.7-rc3 contains a quick fix for it, and needs to be cooked for a few days at least, so the final will have to wait at least until mid next week.

1.7.6.4 merges a handful of fixes that have already been merged to the master branch and are included in 1.7.7-rc3.

They are both available at

        http://code.google.com/p/git-core/downloads/list

and also are found in my public repositories at various hosting sites:

        url = git://repo.or.cz/alt-git.git
        url = https://code.google.com/p/git-core/
        url = git://git.sourceforge.jp/gitroot/git-core/git.git
        url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
        url = https://github.com/gitster/git

Wednesday, September 21, 2011

Return of html/man branches

I have not built and published these two branches myself for quite some time, as I let the post update hook of my public repository automatically fetch updates from the 'master' branch to another work area, format the documentation, make commits and push the result back to the public repository, all automated at the kernel.org infrastructure.

Unfortunately, kernel.org has been down for quite a while. So on a "time-permits" and "experimental" basis, I've started publishing these two branches to the other public repositories after formatting the documentation locally on my primary development machine.

I won't promise that I'll rebuild the docs every time I push the 'master' branch out, like the system I used to have at kernel.org did for me, but this will have to do in the meantime.

Monday, September 19, 2011

Fun with url.$that.insteadOf

It is not exactly fun that I and others have to do this, but while k.org is down:

$ git -c url.http://code.google.com/p/git-core.insteadof=git://git.kernel.org/pub/scm/git/git.git fetch

lets me reuse all the fetch refspec configuration items I have for the "origin" remote.

To make this semi-permanent, I could have the following in my $HOME/.gitconfig but I have been hoping that k.org would come back before I actually do it and haven't done so:

[url "http://code.google.com/p/git-core"]
  insteadOf = git://git.kernel.org/pub/scm/git/git.git

Not exactly fun, but...

Monday, September 12, 2011

Git User's Survey 2011

Here is a note from Jakub Narebski, who is again running the "Git User's Survey" this year for us.


The Git User's Survey 2011 is now up! Please devote a few minutes of
your time to fill out the simple questionnaire; it'll help the Git
community understand your needs, what you like about Git (and what you
don't), and overall help us improve it.

The survey will be open from 5 September till 3 October 2011.

The results will be published at GitSurvey2011 page on Git Wiki.

1.7.7-rc1 is out

As I have been postponing the tagging of this feature freeze release candidate for a while, hoping that k.org would regain its health soon enough, we are seriously slipping. I was hoping to tag the final by the end of August, but now it is almost mid September.

In any case, the shape of the upcoming release is fairly clear now. Please help find regressions so that we can have a solid final release by the end of this month if not earlier.


A release candidate tarball is found at:

   http://code.google.com/p/git-core/downloads/list

and its SHA-1 checksum is:

80dfcce410d2f36ffed4c8b48c8c896a45159e41  git-1.7.7.rc1.tar.gz

Also the following public repositories all have a copy of the v1.7.7-rc1
tag and the master branch that the tag points at:

       url = git://repo.or.cz/alt-git.git
       url = https://code.google.com/p/git-core/
       url = git://git.sourceforge.jp/gitroot/git-core/git.git
       url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
       url = https://github.com/gitster/git

Git v1.7.7 Release Notes (draft)
========================

Updates since v1.7.6
--------------------
 * The scripting part of the codebase is getting prepared for i18n/l10n.
 * Interix, Cygwin and Minix ports got updated.
 * Various updates to git-p4 (in contrib/), fast-import, and git-svn.
 * Gitweb learned to read from /etc/gitweb-common.conf when it exists,
  before reading from gitweb_config.perl or from /etc/gitweb.conf
  (this last one is read only when per-repository gitweb_config.perl
  does not exist).
 * Various codepaths that invoked zlib deflate/inflate assumed that these
  functions can compress or uncompress more than 4GB data in one call on
  platforms with 64-bit long, which has been corrected.
 * Git now recognizes loose objects written by other implementations that
  use a non-standard window size for zlib deflation (e.g. Agit running on
  Android with 4kb window). We used to reject anything that was not
  deflated with 32kb window.
 * Interaction between the use of pager and coloring of the output has
  been improved, especially when a command that is not built-in was
  involved.
 * "git am" learned to pass the "--exclude=<path>" option through to underlying
  "git apply".
 * You can now feed many empty lines before feeding an mbox file to
  "git am".
 * "git archive" can be told to pass the output to gzip compression and
  produce "archive.tar.gz".
 * "git bisect" can be used in a bare repository (provided that the test
  you perform per each iteration does not need a working tree, of
  course).
 * The length of abbreviated object names in "git branch -v" output
  now honors the core.abbrev configuration variable.
 * "git check-attr" can take relative paths from the command line.
 * "git check-attr" learned an "--all" option to list the attributes for a
  given path.
 * "git checkout" (both the code to update the files upon checking out a
  different branch and the code to checkout a specific set of files) learned
  to stream the data from object store when possible, without having to
  read the entire contents of a file into memory first. An earlier round
  of this code that is not in any released version had a large leak but
  now it has been plugged.
 * "git clone" can now take a "--config key=value" option to set the
  repository configuration options that affect the initial checkout.
 * "git commit <paths>..." now lets you feed relative pathspecs that
  refer to outside your current subdirectory.
 * "git diff --stat" learned a --stat-count option to limit the output of
  a diffstat report.
 * "git diff" learned a "--histogram" option to use a different diff
  generation machinery stolen from jgit, which might give better
  performance.
 * "git diff" had a weird worst case behaviour that can be triggered
  when comparing files with potentially many places that could match.
 * "git fetch", "git push" and friends no longer show connection
  errors for addresses that couldn't be connected to when at least one
  address succeeds (this is arguably a regression but a deliberate
  one).
 * "git grep" learned "--break" and "--heading" options, to let users mimic
  the output format of "ack".
 * "git grep" learned a "-W" option that shows wider context using the same
  logic used by "git diff" to determine the hunk header.
 * Invoking the low-level "git http-fetch" without "-a" option (which
  git itself never did---normal users should not have to worry about
  this) is now deprecated.
 * The "--decorate" option to "git log" and its family learned to
  highlight grafted and replaced commits.
 * "git rebase master topci" no longer spews usage hints after giving
  the "fatal: no such branch: topci" error message.
 * The recursive merge strategy implementation got a fairly large
  fix for many corner cases that may rarely happen in real world
  projects (it has been verified that none of the 16000+ merges in
  the Linux kernel history back to v2.6.12 is affected with the
  corner case bugs this update fixes).
 * "git stash" learned an "--include-untracked" option.
 * "git submodule update" used to stop at the first error updating a
  submodule; it now goes on to update other submodules that can be
  updated, and reports the ones with errors at the end.
 * "git push" can be told with the "--recurse-submodules=check" option to
  refuse pushing of the supermodule, if any of its submodules'
  commits hasn't been pushed out to their remotes.
 * "git upload-pack" and "git receive-pack" learned to pretend that only a
  subset of the refs exist in a repository. This may help a site to
  put many tiny repositories into one repository (this would not be
  useful for larger repositories as repacking would be problematic).
 * "git verify-pack" has been rewritten to use the "index-pack" machinery
  that is more efficient in reading objects in packfiles.
 * test scripts for gitweb tried to run even when CGI-related perl modules
  are not installed; they now exit early when the latter are unavailable.

Also contains various documentation updates and minor miscellaneous
changes.


Fixes since v1.7.6
------------------

Unless otherwise noted, all fixes in the 1.7.6.X maintenance track are
included in this release.
 * The error reporting logic of "git am" when the command is fed a file
  whose mail-storage format is unknown was fixed.
  (merge dff4b0e gb/maint-am-patch-format-error-message later to 'maint').
 * "git branch --set-upstream @{-1} foo" did not expand @{-1} correctly.
  (merge e9d4f74 mg/branch-set-upstream-previous later to 'maint').
 * "git branch -m" and "git checkout -b" incorrectly allowed the tip
  of the branch that is currently checked out to be updated.
  (merge 55c4a67 ci/forbid-unwanted-current-branch-update later to 'maint').
 * "git check-ref-format --print" used to parrot a candidate string that
  began with a slash (e.g. /refs/heads/master) without stripping it, to make
  the result a suitably normalized string the caller can append to "$GIT_DIR/".
  (merge f3738c1 mh/check-ref-format-print-normalize later to 'maint').
 * "git clone" failed to clone locally from a ".git" file that itself
  is not a directory but is a pointer to one.
  (merge 9b0ebc7 nd/maint-clone-gitdir later to 'maint').
 * "git clone" from a local repository that borrows from another
  object store using a relative path in its objects/info/alternates
  file did not adjust the alternates in the resulting repository.
  (merge e6baf4a1 jc/maint-clone-alternates later to 'maint').
 * "git describe --dirty" did not refresh the index before checking the
  state of the working tree files.
  (cherry-pick bb57148 ac/describe-dirty-refresh later to 'maint').
 * "git ls-files ../$path" that is run from a subdirectory reported errors
  incorrectly when there is no such path that matches the given pathspec.
  (merge 0f64bfa cb/maint-ls-files-error-report later to 'maint').

1.7.6.3 is out

Git 1.7.6.3 is out with 20 fixes from 8 people.

Git v1.7.6.3 Release Notes
==========================

Fixes since v1.7.6.2
--------------------

 * "git -c var=value subcmd" misparsed the custom configuration when
   value contained an equal sign.

 * "git fetch" had a major performance regression, wasting many
   needless cycles in a repository where there are no submodules
   present. This was especially bad when there were many refs.

 * "git reflog $refname" did not default to the "show" subcommand as
   the documentation advertised the command to do.

 * "git reset" did not leave meaningful log message in the reflog.

 * "git status --ignored" did not show ignored items when there are no
   untracked items.

 * "git tag --contains $commit" was unnecessarily inefficient.

Also contains minor fixes and documentation updates.

Wednesday, August 31, 2011

How to inject a malicious commit to a Git repository (or not)

[Note: there are follow-up articles here and there]

[Note: some sites seem to have misreported that I outlined here how one can forge a history stored in Git, but the point of this article is how impractical and unrealistic it is for anybody to do so without other people noticing.]

Suppose you momentarily gained write access to other people's public repositories at a large distribution point, such as kernel.org. What damage can you inflict on their projects if you wanted to?

You could create a malicious commit on top of the tip of "master" branch of linux.git repository of Linus Torvalds. Nobody prevents you from pretending that you are Linus:

$ GIT_AUTHOR_NAME="Linus Torvalds" \
  GIT_COMMITTER_NAME="Linus Torvalds" \
  GIT_AUTHOR_EMAIL=torvalds@linux-foundation.org \
  GIT_COMMITTER_EMAIL=torvalds@linux-foundation.org \
  git commit -s


Your English may be good enough to fool readers into believing that the log message may have come from Linus himself. Perhaps you may have done this around August 12th, when the tip of Linus's true "master" branch was commit M and X is the malicious commit you created on top of it. The resulting history may look like this:

--M
   \
    X

If an unsuspecting victim pulls regularly from Linus's repository, he may run a git pull before your malicious commit is discovered in a security audit. And he may have already based his derivative product on this malicious version of the kernel.

Is this a big "Oops"? We'll see what happens to this unsuspecting victim later.

When Linus tries to upload his updated work, however, the history on his development machine (which is not the distribution point where you managed to add your malicious commit) does not have your commit X. In Git terms, the history you tweaked and the history Linus has have now diverged:


--M---o---o---o---o---o---o---o---L
   \
    X

where M is the original tip of the "master" branch at the public repository, X is the malicious commit you created and updated the "master" branch to point at, and L is the tip of the history Linus is about to upload. We say "L does not fast-forward to X", as X is not part of L (time flows from left to right).

What happens now is that "git push" Linus runs to upload to his public repository notices that updating the "master" branch at the public repository with the tip of his history will lose commit X you created (it does not notice that the commit that is about to be lost is a malicious one, nor does it notice it was not made by Linus, but it does not have to notice either at all for this protection to work), and refuses to do so. Linus would definitely notice that something fishy is going on, because he needs to do something he usually never does to push his changes as his next step.
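
The fast-forward test that "git push" applies can be reproduced by hand. A sketch in a throwaway repository, assuming a Git recent enough to have "git merge-base --is-ancestor" and "git -C"; the branch names public and linus and the helper mk are made up for illustration:

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: record an empty commit with a fixed identity.
mk() {
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

mk M                                       # original public tip
git -C "$repo" branch public               # what the distribution point holds
git -C "$repo" branch linus                # history on Linus's own machine
git -C "$repo" checkout -q public && mk X  # the tampered tip
git -C "$repo" checkout -q linus && mk L   # Linus's new work

# A push is a fast-forward only if the current public tip is an ancestor
# of the commit being pushed.
if git -C "$repo" merge-base --is-ancestor public linus; then
    result="fast-forward"
else
    result="non-fast-forward"
fi
echo "$result"
```

The check never looks at who made X or what it contains; the mere fact that X is not reachable from L is enough to stop the push.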

If this were a shared repository setting, Linus might say "Ah, somebody else beat me to it", then run "git pull" to merge the work of other people who share the same public repository (i.e. you) into his tree, creating a merge commit Y, and then push the result again:


--M---o---o---o---o---o---o---o---L
   \                               \
    X-------------------------------Y


In the end, your malicious commit X could end up in the resulting history this way, provided that he does such a merge and does not inspect the merge Y.

But Linus (or any kernel people with publishing repositories at kernel.org in general) does not work using a shared repository with other people to begin with. The repository at kernel.org is his publishing repository and his alone, so you cannot sneak your malicious commit into his history through this avenue.

Linus could choose to be careless and force his push, without bothering to investigate why his push does not fast-forward (in real life, this is not going to happen, but for the sake of mental exercise, imagine that he chose to be careless and let's see what happens). This will eliminate your malicious commit from his public repository. If he did so, the repository would look like this:


--M---o---o---o---o---o---o---o---L

Your malicious commit X would not have any effect on people who pull from Linus's public repository after this happens, but what about the unsuspecting victim who pulled X before Linus forced this push? Is he contaminated with your malicious commit, and will he never notice it?

Remember, as far as he is concerned, the tip of Linus's history he pulled earlier, which is kept in his origin/master remote tracking branch, was X, and it is now being updated to L, which does not fast-forward. His "git pull" (actually it is "git fetch" that is invoked as part of "pull") will notice and report:

From git://git.kernel.org/.../torvalds/linux.git/
 + 9d901d9...ad4d968 master     -> origin/master  (forced update)


Notice "forced update"? The unsuspecting victim can notice that the side branch leading to X is no longer part of Linus's history.
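If you fetch from a script and would rather not rely on eyeballing the output, a small filter can pick out these lines. This is only a sketch: the helper name forced_updates is made up, and it assumes the report format shown above, where a forced update is flagged with a leading '+':

```shell
# forced_updates
# Reads "git fetch" output on stdin and prints only the ref lines that
# begin with " + ", the marker for a forced (non-fast-forward) update.
# Exits non-zero when no such line is present (that is how grep reports
# "no match"). Hypothetical helper, not part of Git.
forced_updates () {
	grep '^ + '
}
```

Something like "git fetch origin 2>&1 | forced_updates" would then print any branch that was rewound (the report goes to the standard error stream, hence the redirection).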

One security tip I would offer here is this. If you know that your upstream (in this illustration, Linus) never rewinds his history, you can tweak your .git/config file (open it with your favorite $EDITOR; it is a simple text file designed to be editable by hand) and drop the '+' sign from the "fetch" line. Find a line that looks like this:

[remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*


And edit it to make it look like this:

[remote "origin"]
        fetch = refs/heads/*:refs/remotes/origin/*


This will make your "git pull" (again, it is actually "git fetch" that is invoked from the command) fail when the upstream has rewound its history. You will see the command fail like so when you pull from Linus:

From git://git.kernel.org/.../torvalds/linux.git/
 ! [rejected]        master     -> origin/master  (non-fast-forward)
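If you would rather not edit the file by hand, "git config" can make the same change. A minimal sketch, assuming the remote has the single default "fetch" line that "git clone" leaves behind; the helper name require_fast_forward is made up for this illustration:

```shell
# require_fast_forward [REMOTE]
# Rewrites REMOTE's fetch refspec (default: "origin") without the leading
# '+', so that "git fetch" refuses non-fast-forward updates to the remote
# tracking branches. Assumes the remote has exactly one fetch line.
# Hypothetical helper, not part of Git.
require_fast_forward () {
	git config "remote.${1:-origin}.fetch" \
		"refs/heads/*:refs/remotes/${1:-origin}/*"
}
```

After running "require_fast_forward origin" inside the repository, "git config --get remote.origin.fetch" should show the line without the '+'.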


We might want to revisit the default settings "git clone" leaves in your new repository, dropping the '+' (which means "allow non-fast-forward") to make it harder for upstreams to rewind their branches, but that will have to be discussed on the Git mailing list (git@vger.kernel.org), not in this blog post. There is a reason we did not make insisting on fast-forwards the default.

By the way, it does not make an iota of difference to the above story if you rewrote the commits that lead to M (i.e. the old tip of the "master" branch of Linus's history) using "rebase" or "commit --amend". The only difference is that such a change will move the fork point of the diverged histories from M (in the above story) further back to a different commit that is older than M in the ancestry chain. The history Linus will try to push to his public repository L will not fast-forward to the commit you place at the tip of the "master" branch that contains your malicious version, and that is the only thing that matters.


Wednesday, August 24, 2011

1.7.6.1 is out

Git 1.7.6.1 is out with 88 small fixes from 29 people.

Git v1.7.6.1 Release Notes
==========================

Fixes since v1.7.6
------------------

 * Various codepaths that invoked zlib deflate/inflate assumed that these
   functions can compress or uncompress more than 4GB data in one call on
   platforms with 64-bit long, which has been corrected.

 * "git unexecutable" reported that "unexecutable" was not found, even
   though the actual error was that "unexecutable" was found but did
   not have a proper she-bang line to be executed.

 * Error exits from $PAGER were silently ignored.

 * "git checkout -b <branch>" was confused when attempting to create a
   branch whose name ends with "-g" followed by hexadecimal digits,
   and refused to work.

 * "git checkout -b <branch>" sometimes wrote a bogus reflog entry,
   causing later "git checkout -" to fail.

 * "git diff --cc" learned to correctly ignore binary files.

 * "git diff -c/--cc" mishandled a deletion that resolves a conflict, and
   looked in the working tree instead.

 * "git fast-export" forgot to quote pathnames with unsafe characters
   in its output.

 * "git fetch" over smart-http transport used to abort when the
   repository was updated between the initial connection and the
   subsequent object transfer.

 * "git fetch" did not recurse into submodules in subdirectories.

 * "git ls-tree" did not error out when asked to show a corrupt tree.

 * "git pull" without any argument left an extra whitespace after the
   command name in its reflog.

 * "git push --quiet" was not really quiet.

 * "git rebase -i -p" incorrectly dropped commits from side branches.

 * "git reset [<commit>] paths..." did not reset the index entry correctly
   for unmerged paths.

 * "git submodule add" did not allow a relative repository path when
   the superproject did not have any default remote url.

 * "git submodule foreach" failed to correctly give the standard input to
   the user-supplied command it invoked.

 * Submodules that the user has never shown interest in by running
   "git submodule init" were incorrectly marked as interesting by "git
   submodule sync".

 * "git submodule update --quiet" was not really quiet.

 * "git tag -l <glob>..." did not take multiple glob patterns from the
   command line.