Sunday, August 19, 2012

Git 1.7.12


I just tagged the 1.7.12 final release, which is available in tarball form here, and from the usual copies of my repository.

Some highlights:
  • Experimental support for UTF-8 pathnames on Mac OS X.
    I think there still are places that need conversion between the system encoding (UTF-8 normalized by decomposing) and the more commonly used encoding (precomposed) that is used internally for better interoperability, but this should be a good start.
  • The $HOME/.gitconfig file can be moved to $HOME/.config/git/config, in line with XDG.
    This also allows $HOME/.config/git/attributes and $HOME/.config/git/ignore, if they exist, to be automatically used as core.attributesfile and core.excludesfile, respectively.
  • "git apply" learned the same three-way merge patch wiggling magic "git am" supports, via the "-3" option.
  • "git rebase -i --root" learned how to update the root commit when requested.
  • "git status" can give a more detailed explanation during the "intermediate" state of multi-step operations, e.g. "merge", "rebase".
  • A remote "SCM" interface to MediaWiki in contrib/ learned to handle file attachments.
  • "git clone --no-local $path" bypasses the "local directory?  Just cp -R it!" (not quite that aggressive, but still) optimization and invokes the usual object transfer and repacking codepath.
  • The HTMLified documentation pages shown by "git help -w $cmd" can be fetched over the network by setting the help.htmlpath configuration variable.
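
As a rough illustration of the XDG item above, the sketch below reads a setting from $HOME/.config/git/config using a throwaway home directory. The directory held in demo_home and the "XDG Example" value are made up for the demonstration, and it assumes XDG_CONFIG_HOME is unset so that the default $HOME/.config location applies.

```shell
# Sketch only: demo_home and the user.name value are hypothetical.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/.config/git"
printf '[user]\n\tname = XDG Example\n' >"$demo_home/.config/git/config"

# Run in a subshell so the real environment is left untouched.
# There is no $HOME/.gitconfig here, so the value can only come
# from the XDG location.
xdg_name=$(
	unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
	HOME=$demo_home
	export HOME
	git config --global --get user.name
)
echo "$xdg_name"
```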
Enjoy.

Wednesday, August 15, 2012

Git 1.7.11.5 and Git 1.7.12-rc3


The final release candidate for 1.7.12 is found at the usual places. Again, nothing earth-shattering.

For highlights of the upcoming release, please see the previous entry on 1.7.12-rc1.

Also there is an updated maintenance release 1.7.11.5 that fixes some minor build issues, with updates to "git-blame.el" (in contrib/).


Friday, August 10, 2012

Leftover bits

While developing for Git, sometimes I find a topic getting "stuck" for various reasons, e.g. the basic direction is good, but the remainder of the system is not ready for such a change.

Instead of running "git branch -D topic" in such a case, I use a "git hold" alias that looks like this:

[alias]
  hold = "!sh -c 'git update-ref refs/hold/$1 refs/heads/$1 && \
          git branch -D $1' -"

to stash it away. And "git for-each-ref refs/hold/" gives me many such "held" topics.
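
To make the round trip concrete, here is a small sketch that holds a branch the way the alias body does and later brings it back. The scratch repository, the branch name "topic", and the restore step at the end are assumptions added for illustration; the post itself only shows the alias.

```shell
# Sketch only: a scratch repository with one commit and a "topic" branch.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "initial"
git -C "$repo" branch topic

# What the alias body does for "git hold topic".
git -C "$repo" update-ref refs/hold/topic refs/heads/topic &&
git -C "$repo" branch -q -D topic

# The held topic is now listed under refs/hold/.
held=$(git -C "$repo" for-each-ref --format='%(refname)' refs/hold/)
echo "$held"

# One possible way to resurrect it later (not part of the alias).
git -C "$repo" branch topic refs/hold/topic &&
git -C "$repo" update-ref -d refs/hold/topic
```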

Of course, without an active effort to go back, reconsider and prune from time to time, these things tend to accumulate. And today was a good day for me to review them. As my reward, I found a couple of topics that may still be viable:
  • [DONE - will be in Git 2.0] Make "git add <directory>" notice removals of files in the named directories; after all "add" is not about "add files to git", but is "add the current state to the index", and the lack of files that previously existed is part of that state.
    Cf. $gmane/171811

  • [DONE] Allow "git log .." to mean "Give me a log of recent commits that touch my parent directory and what are under it". The current code instead insists that ".." could be both a revision range and a path and asks the user to disambiguate.
    Cf. $gmane/172619
I did not find major downsides on either of these topics after re-thinking the issues, and I didn't see any valid objections from people whose judgement I trust in the old discussions, either.

Perhaps I should try resurrecting them and see what happens.

Tuesday, August 7, 2012

Git 1.7.12-rc2

An updated release candidate is found at the usual places. You should find nothing earth-shattering in there; just a bunch of translation updates, build clean-ups, and minor documentation updates.

For highlights of the upcoming release, please see the previous entry on 1.7.12-rc1.

Saturday, August 4, 2012

Bringing a bit more sanity to "alternates"?


The "alternates" mechanism lets you keep a single object store (not necessarily a git repository on its own, but just the objects/ part of it) on a machine and have multiple repositories on the same machine share objects from it. This saves both the network bandwidth used when cloning from remote repositories and the disk space used by the local repositories.  A repository created by "clone --reference" or "clone -s" uses this mechanism to borrow objects from the object store of another repository.  A user can also manually add new entries to $GIT_DIR/objects/info/alternates to borrow from other object stores.
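
To see what the borrowing looks like on disk, the following sketch makes a scratch repository, clones it with "clone -s", and prints the alternates file of the clone. The directory names src and borrower are made up for the example.

```shell
# Sketch only: "src" and "borrower" are hypothetical scratch repositories.
work=$(mktemp -d)
git init -q "$work/src"
echo hello >"$work/src/file"
git -C "$work/src" add file
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
	commit -q -m "initial"

# "clone -s" records where to borrow objects from instead of copying them.
git clone -q -s "$work/src" "$work/borrower"

# Each line of this file names an object store the clone borrows from.
alternates=$(cat "$work/borrower/.git/objects/info/alternates")
echo "$alternates"
```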

The UI for this mechanism however has some room for improvement, and we may want to start improving it for the next release after the upcoming Git 1.7.12 (or even Git 2.0 if the change is a large one that may be backward incompatible but gives us a vast improvement).

Here are some random thoughts as a discussion starter (the real discussion is on the git mailing list git@vger.kernel.org; see http://thread.gmane.org/gmane.comp.version-control.git/202902).

By design, the borrowed object store MUST not ever lose any object from it, as such an object loss can corrupt the borrowing repositories.  In theory, it is OK for the object store whose objects are borrowed by repositories to acquire new objects, but losing existing objects is an absolute no-no.

But the UI of "clone -s" encourages users to borrow from the object store of a repository that the user may actively develop in.  It is perfectly normal for users to perform operations that make objects that used to be reachable from tips of its branches unreachable (e.g. rebase, reset, "branch -d") in a repository that is used for active development, but a "gc" after such an operation will lose objects that were originally available in the repository.  If objects lost that way were still needed by the repositories that borrow from it, the borrowing repository becomes corrupt immediately.

In practice, this means that users who use "clone -s" to make a new repository can *never* prune the original repository without risking corruption of the borrowing repository [1].

Some ideas:
  • Make "clone --reference" without "-s" not borrow from the reference repository.  E.g. if you have a clone of Linus's repository at /git/linux.git/, cloning a related repository using it as --reference:

    $ git clone --reference /git/linux.git git://k.org/linux-next.git

    should still take advantage of /git/linux.git/{refs,objects} to reduce the transfer cost of fetching from k.org, but the resulting repository should not point at /git/linux.git with its objects/info/alternates file.
  • Make the distinction between a regular repository and an object store that is meant to be used for object sharing stronger.
    Perhaps a configuration item "core.objectstore = readonly" can be introduced, and we forbid "clone -s" from pointing at a repository without such a configuration.  We also forbid object pruning operations such as "gc" and "repack" from being run in a repository marked as such.

    It may be necessary to allow some special kind of repacking of such a "readonly" object store, in order to reduce the number of packfiles (and get rid of loose object files); it needs to be implemented carefully not to lose any object, regardless of local reachability.

When you have a repository and one or more repositories that borrow from it, you may want to dissociate the borrowing repositories from the borrowed one (e.g. so that you can repack or prune the original repository safely, or you may even want to remove it).

I think "git repack -a -d [-f]" in the borrowing repository happens to be the way to do this, but it is not clear to the users why.
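
A minimal sketch of that dissociation, under the assumption that removing the alternates file by hand afterwards is an acceptable final step (the post does not spell that step out). The repositories src and borrower are made up, and moving src away merely simulates the original being removed.

```shell
# Sketch only: "src" and "borrower" are hypothetical scratch repositories.
work=$(mktemp -d)
git init -q "$work/src"
echo hello >"$work/src/file"
git -C "$work/src" add file
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
	commit -q -m "initial"
git clone -q -s "$work/src" "$work/borrower"

# Without "-l", repack copies borrowed objects into a local pack.
git -C "$work/borrower" repack -a -d -q

# Assumption: once everything is local, stop borrowing by hand.
rm "$work/borrower/.git/objects/info/alternates"

# Simulate the original repository going away, then check integrity.
mv "$work/src" "$work/src.removed"
if git -C "$work/borrower" fsck >/dev/null 2>&1
then
	dissociated=yes
else
	dissociated=no
fi
echo "$dissociated"
```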

Some ideas:

  • It might not be a bad idea to have a dedicated new command to help users manage alternates ("git alternates"?); obviously this will be one of its subcommands, "git alternates detach", if we go that route.
  • Or is just an entry in the documentation sufficient?

When you have two or more repositories that do not share objects, you may want to rearrange things so that they share their objects from a single common object store.

There is no direct UI to do this, as far as I know.  You can obviously create a new bare repository, push there from all of these repositories, and then borrow from there, e.g.

git --bare init shared.git &&
for r in a.git b.git c.git ...
do
        (
                cd "$r" &&
                git push ../shared.git "refs/*:refs/remotes/$r/*" &&
                echo ../../../shared.git/objects >.git/objects/info/alternates
        )
done

And then repack shared.git once.

Some ideas:
  • (obvious: give a canned command to do the above, perhaps then set core.objectstore=readonly in the resulting shared.git)

When you have one object store and a repository that does not yet borrow from it, you may want to make the repository borrow from the object store.  Obviously you can run "echo" like the sample script in the previous item above, but it is not obvious how to perform the logical next step of shrinking $GIT_DIR/objects of the repository that now borrows the objects.

[edit: This is supported as "git repack -a -d -l"]

I think "git repack -a -d" is the way to do this, but if you compare this command to "git repack -a -d -f" we saw previously in this message, it is not surprising that the users would be confused---it is not obvious at all.

Some ideas:
  • (obvious: give a canned subcommand to do this)
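
For illustration, the sequence described above might look like the sketch below: point an existing repository at a shared object store, then repack with "-l" so that only objects not available from the alternate stay local. The repositories src, shared.git and repo are made up, and the final fsck is only a sanity check added for the example.

```shell
# Sketch only: "src", "shared.git" and "repo" are hypothetical.
work=$(mktemp -d)
git init -q "$work/src"
echo hello >"$work/src/file"
git -C "$work/src" add file
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
	commit -q -m "initial"

# A shared object store and an independent repository with the same history.
git clone -q --bare --no-local "$work/src" "$work/shared.git"
git -C "$work/shared.git" repack -a -d -q
git clone -q --no-local "$work/src" "$work/repo"

# Start borrowing from the shared store.
mkdir -p "$work/repo/.git/objects/info"
echo "$work/shared.git/objects" >"$work/repo/.git/objects/info/alternates"

# "-l" keeps borrowed objects out of the local pack.
git -C "$work/repo" repack -a -d -l -q

# Inspect what is left locally and confirm the repository is intact.
git -C "$work/repo" count-objects -v
if git -C "$work/repo" fsck >/dev/null 2>&1
then
	borrowing_ok=yes
else
	borrowing_ok=no
fi
echo "$borrowing_ok"
```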

[Footnote]

[1] Making the borrowed object store aware of all the repositories that borrow from it, so that operations like "gc" and "repack" in the repository with the borrowed object store can keep objects that are needed by borrowing repositories, is theoretically possible, but is not a workable approach in practice, as (1) borrowers may not have write access to the shared object store to add such a back pointer to begin with, (2) "gc"/"repack" in the borrowed object store and normal operations in the borrowing repositories can easily race with each other, without any coordination between the users, and (3) a casual "borrowing" can be done with a simple "echo" as shown in the main text of this message, and there is no way to ensure a back pointer from the borrowed object store to such a borrowing repository.