Git Blame

Wednesday, August 14, 2013

Delaying Git 1.8.4 by a week

It appears that we need to revert two topics that cause regressions before the upcoming 1.8.4 release.

There is a corner case bug in git stash. Suppose you have a path that is a regular file (or a symbolic link) in the committed state. You change it into a directory in your working tree and create various new files in it. Some of them may be tracked (i.e. you have git added them), while others may not be. Then you run git stash. The command needs to bring the path back to its committed state, so it has to remove the directory to resurrect the file. The new files in the directory that you have git added are saved in the stash, so they are safe, but what happens to the untracked ones? They are killed, i.e. removed without being saved anywhere. The same issue exists if you turn a tracked directory into a file and run the command without first running git add.
An attempted fix was to ask git ls-files --killed whether any such path would be lost, but it turned out that this makes the command unusably slow in certain directories with very many untracked files.
There was an attempt to save typing the four capital letters "H", "E", "A" and "D" by allowing you to type "@" instead, e.g. git log @. The idea may have been a good one, but the change was executed poorly and triggered where it shouldn't have (e.g. a branch whose name is @/foo was turned into HEAD/foo or something equally insane).
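To see the git stash corner case from the first item concretely, here is a sketch that assumes a POSIX shell and a reasonably recent Git. The names foo and foo/untracked are made up for illustration. It commits foo as a regular file, then replaces it in the working tree with a directory holding an untracked file. Finally it asks git ls-files --killed which paths would have to go to bring the committed foo back.

```shell
#!/bin/sh
# Sketch: reproduce the "file became a directory" corner case in a
# throwaway repository and ask git ls-files --killed about it.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$tmp"
git init -q repo
cd repo

# Commit "foo" as a regular file.
echo content >foo
git add foo
git commit -q -m "add foo as a file"

# Turn "foo" into a directory in the working tree, with an untracked file in it.
rm foo
mkdir foo
echo precious >foo/untracked

# Paths that would have to be removed to resurrect the committed "foo".
killed=$(git ls-files --killed)
echo "$killed"
```

The untracked foo/untracked is exactly the kind of path that git stash used to lose silently in this situation.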

Because we have already passed -rc3, I'd feel safer adding another rc week before the final release. The Git Calendar has been updated accordingly.

Both of these changes meant well, and because we are not reverting them due to design mistakes (i.e. we are not saying that "we do not ever want to have such a feature or fix in our system"), hopefully these can be redone properly after the upcoming release is done.

Some leftover bits (I'll add more to this list later).

[DONE] Find out where ls-files --killed is unnecessarily wasting time, and fix it. This is a prerequisite to resurrect the stash corner case fix.
Cf. $gmane/232113
Refactor the run_hook() interface so that it is truly reusable by codepaths other than git commit, resurrecting a "how about this" patch sent in the past.
Cf. $gmane/192806, $gmane/212284
[IN PROGRESS] Extend the upload-pack protocol to tell which symbolic ref points at which other ref, resurrecting the idea outlined in 2008.
Cf. $gmane/102039
[IN PROGRESS] Rethink how name-hash keeps track of names of directories and actual files to help case-insensitive filesystems. Since 2092678c (name-hash.c: fix endless loop with core.ignorecase=true, 2013-02-28), there appears to be no reason why a directory name has to be registered to the hash with a trailing slash, which is the root cause of directory_exists_in_index_icase() reading past the end of the buffer.
Cf. $gmane/232822
[DONE] Look into cvsserver permission bits regression between 1.8.1 and 1.8.3.
Cf. $gmane/234476
Look into pathspec-limited revision traversal regression between 1.8.3 and 1.8.4.
Cf. $gmane/234462
Checking out a branch X that does not have directory D (or worse, has a file D) while you are in directory D should perhaps fail.
Cf. $gmane/234905
Allow passing extra options to the "ssh" invocation made from connect.c, ideally in a way that does not break backward compatibility.
Cf. $gmane/234624
Perhaps add a --post-service-hook to git-daemon that runs after a service finishes? Because the exit status of the service process means something totally different from what the user of the service perceives (the former may have to say "successfully told the requester that the request is denied"), it may not be as useful a mechanism as one would naïvely expect, though.
Cf. $gmane/234706
git checkout $commit -- somedir should remove somedir/file that is not in $commit but is in the original index.
Cf. $gmane/234935

Thursday, August 1, 2013

Git 1.8.4-rc1

The first release candidate, Git v1.8.4-rc1, is available for testing at the usual places.
For highlights, please refer to the previous post on v1.8.4-rc0.

Have fun.

Wednesday, July 24, 2013

Git 1.8.4-rc0

A release candidate preview Git v1.8.4-rc0 is now available for testing at the usual places.

As this cycle brings a rather large update, please test it thoroughly. It contains 814 non-merge commits from 90+ contributors (v1.8.3 consisted of 694 changes from 97 contributors).

Here are some highlights:

"git log" learnt the "-Lbegin,end:filename" option. This starts from the specified range and digs through the history. It may still have rough edges and memory leaks, though.
"git clean" learnt the interactive mode, modeled after "git add -i" interface.
"git check-mailmap" is a new command that lets you query your .mailmap file for the canonical username and e-mail address.
"git name-rev" learnt to name an annotated tag object name back to its tagname.
Various subcommands of "git submodule" now work even from a subdirectory.
"git submodule update" can optionally clone the submodule repositories shallowly.
The "push.default=simple" mode of "git push" has been updated to behave like "current" when you push to a remote that is different from where you fetch from (e.g. via remote.pushdefault), in order to better support the triangular workflow.
"git log" learnt the "--author-date-order" option.
The configuration variable color.ui defaults to "auto" now.
Instead of typing "HEAD", you can now say "@", e.g. "git log @".
"git describe" learnt the "--first-parent" option.
"git fetch $remote $branch" used to avoid touching the remote-tracking branch (you could always be explicit and say "git fetch $remote $branch:refs/remotes/$remote/$branch"). The command now updates the remote-tracking branch (if configured).
Uses of the platform fnmatch(3) function (in many places such as pathspec matching, .gitignore and .gitattributes) have been replaced with wildmatch, allowing "foo/**/bar" to match "foo/bar", "foo/a/bar", etc.
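A couple of these highlights can be tried in a throwaway repository. In the sketch below, the .mailmap entry, the identities and the foo/**/bar pattern are made-up examples. The sketch uses git check-ignore, a separate command available in recent Git that is not one of these highlights. It serves only as a convenient way to ask whether the wildmatch-based .gitignore pattern matches a path.

```shell
#!/bin/sh
# Sketch: try "git check-mailmap" and wildmatch-style "**" in .gitignore.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
cd "$tmp"
git init -q repo
cd repo

# Hypothetical .mailmap entry: map an old address to a canonical identity.
echo 'Proper Name <proper@example.com> <old@example.com>' >.mailmap
canonical=$(git check-mailmap 'Old Name <old@example.com>')

# "**" between slashes matches zero or more directory levels.
echo 'foo/**/bar' >.gitignore
git check-ignore -q foo/bar && shallow=ignored || shallow=not-ignored
git check-ignore -q foo/a/b/bar && deep=ignored || deep=not-ignored

echo "$canonical"
echo "foo/bar: $shallow, foo/a/b/bar: $deep"
```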

Have fun.

Monday, July 22, 2013

Git 1.8.3.4

The latest maintenance release Git v1.8.3.4 is now available at the usual places. This is mostly to propagate documentation fixes and test updates from the master front back to the maintenance track, but there are a handful of minor fixes as well:

The bisect log listed incorrect commits when bisection ends with only skipped ones.
The test coverage framework was left broken for some time.
The test suite for HTTP transport did not run with Apache 2.4.
"git diff" used to fail when core.safecrlf is set and the working tree contents had mixed CRLF/LF line endings. Committing such content must be prohibited, but "git diff" should help the user to locate and fix such problems without failing.

These fixes are already on the 'master' branch to be included in upcoming Git 1.8.4. Hopefully we can do its zeroth release candidate preview early this week.

Have fun.

Monday, July 15, 2013

Git 1.8.3.3

The third maintenance release for the 1.8.3.x series is now available at the usual places. It contains the following fixes that have already been applied to the 'master' branch for 1.8.4.

"git apply" parsed patches that add new files, generated by programs other than Git, incorrectly. This is an old breakage in v1.7.11.
Older versions of cURL required the memory we hand to them to stay stable, but we updated the auth material after handing it to a call.
"git pull" into nothing trashed "local changes" that were in the index.
Many "git submodule" operations did not work on a submodule at a path whose name is not in ASCII.
"cherry-pick" had a small leak in its error codepath.
Logic used by git-send-email to suppress cc mishandled names like "A U. Thor" <author@example.xz>, where the human readable part needs to be quoted (the user input may not have the double quotes around the name, and comparison was done between quoted and unquoted strings). It also mishandled names that need RFC2047 quoting.
"gitweb" forgot to clear a global variable $search_regexp upon each request, mistakenly carrying over the previous search to a new one when used as a persistent CGI.
The wildmatch engine did not honor WM_CASEFOLD option correctly.
"git log -c --follow $path" segfaulted upon hitting the commit that renamed the $path being followed.
When a reflog notation is used for implicit "current branch", e.g. "git log @{u}", we did not say which branch and worse said "branch ''" in the error messages.
Mac OS X does not like to write(2) more than INT_MAX number of bytes; work it around by chopping write(2) into smaller pieces.
Newer Mac OS X encourages programs to compile and link with its CommonCrypto, not with OpenSSL.

Friday, June 28, 2013

Git 1.8.3.2

The second maintenance release for the 1.8.3.x series is now available at the usual places. It contains the following fixes that have already been applied to the 'master' branch for 1.8.4.

Cloning with "git clone --depth N" while fetch.fsckobjects (or transfer.fsckobjects) is set to true did not tell the cut-off points of the shallow history to the process that validates the objects and the history received, causing the validation to fail.
"git checkout foo" DWIMs the intended "upstream" and turns it into "git checkout -t -b foo remotes/origin/foo". This codepath has been updated to correctly take existing remote definitions into account.
"git fetch" into a shallow repository from a repository that does not know about the shallow boundary commits (e.g. a different fork from the repository the current shallow repository was cloned from) did not work correctly.
"git subtree" (in contrib/) had a codepath whose loose error checking could lose data at the remote side.
"git log --ancestry-path A...B" did not work as expected, as it did not pay attention to the fact that the merge base between A and B was the bottom of the range being specified.
"git diff -c -p" was not showing a deleted line from a hunk when another hunk immediately begins where the earlier one ends.
"git merge @{-1}~22" was rewritten to "git merge frotz@{1}~22" incorrectly when your previous branch was "frotz" (it should be rewritten to "git merge frotz~22" instead).
"git commit --allow-empty-message -m ''" should not start an editor.
"git push --[no-]verify" was not documented.
An entry for "file://" scheme in the enumeration of URL types Git can take in the HTML documentation was made into a clickable link by mistake.
The zsh prompt script, which was borrowed from the bash prompt script, did not work due to slight differences in array variable notation between the two shells.
The bash prompt code (in contrib/) displayed the name of the branch being rebased when "rebase -i/-m/-p" modes are in use, but not the plain vanilla "rebase".
"git push $there HEAD:branch" did not resolve HEAD early enough, so it was easy to flip it around while push was still going on and push out a branch that the user did not originally intend when the command was started.
"difftool --dir-diff" did not copy back changes made by the end-user in the diff tool backend to the working tree in some cases.
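Here is a sketch of the "git checkout foo" DWIM mentioned in this list. The repository names upstream and work and the author identity are made up. The script creates a repository with a foo branch, clones it, and checks out foo in the clone, where no local foo exists yet. Git then creates a local foo that is set up to track origin/foo.

```shell
#!/bin/sh
# Sketch: "git checkout foo" creates a local branch tracking origin/foo.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$tmp"

# An upstream repository with a "master" and a "foo" branch.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m initial
git branch foo
cd ..

git clone -q upstream work
cd work
# No local "foo" yet: checkout finds origin/foo and sets up tracking.
git checkout -q foo
upstream_of_foo=$(git rev-parse --abbrev-ref 'foo@{upstream}')
echo "$upstream_of_foo"
```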

Friday, June 21, 2013

Fun with various workflows (2)

As I discussed in a separate post, even though Git is a distributed SCM, it supports the centralized workflow well, to help people migrating from traditional SCM systems. But of course, Git also serves the distributed workflow well, such as the one used in Linux kernel development, where you build your work on Linus's or a subsystem maintainer's repository and publish it to your own repository to get it pulled by others (including Linus, if your work is very good).

You would first start by cloning from your upstream:

$ git clone git://git.kernel.org/.../git/torvalds/linux.git

The only difference from the initial step in the centralized workflow is,... nothing. You will get a "linux" directory that becomes your working area, where you will have the standard configuration, perhaps not very different from this:

[remote "origin"]
	url = git://git.kernel.org/.../torvalds/linux.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master

And your "master" branch, which was copied from the "master" branch of Linus's repository, is ready for you to build your work on it.

The only difference is that you would not "git push" back to Linus's repository. The "git://" protocol will not usually let you push, and even if it did, Linus would not let you write into his repository.

After working on your changes on "master", the way you would push out what you did is to say something like this:

$ git push git@github.com:me/linux.git master

This might get cumbersome to type every time, so you would add another remote, perhaps like this:

[remote "me"]
	url = git@github.com:me/linux.git

By defining a short-hand for that URL, you can now say:

$ git push me master

and push out the work you did on your master branch as the master branch of your public repository, so that other people can pull from it.
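The same short-hand can also be created with git remote add instead of editing the configuration file by hand. The sketch below runs it in a scratch repository and reads the result back with git config. Nothing is fetched or pushed, and the URL is the example one from the text. Note that git remote add also records a fetch refspec for the new remote, which the hand-written section above leaves out.

```shell
#!/bin/sh
# Sketch: "git remote add" writes a [remote "me"] section for you.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
cd "$tmp"
git init -q linux
cd linux

# Only the configuration is written; the URL is never contacted here.
git remote add me git@github.com:me/linux.git

url=$(git config --get remote.me.url)
fetch=$(git config --get remote.me.fetch)
echo "url = $url"
echo "fetch = $fetch"
```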

If you worked on a topic that was forked from Linus's master to enhance a specific feature or fix a specific bug, you may want to say:

$ git checkout -b fix-tty-bug origin/master

... work work work ...

$ git push me fix-tty-bug

to publish the result in your public repository as a branch.

By the way, do you recall the reason why upstream mode was appropriate when using the centralized workflow from the previous post?

While the purpose of Linus's master branch is to advance the overall state of the Linux kernel to prepare for the next release, the purpose of your topic branch fix-tty-bug is a lot narrower. And you are usually not integrating the work other people did into your work before you push it out. Indeed, you are encouraged to pick one stable point in the official (i.e. Linus's) history, and build on top of it without rebasing or merging things unrelated to what you are trying to achieve yourself.

In the centralized workflow, immediately before you push your topic branch out, you tentatively play the role of integrator and change its purpose into "advance the overall project status", which is compatible with the purpose of the "master" branch you will be updating with your work. In the distributed workflow, by contrast, the purpose of your topic branch stays the same as the original purpose of the topic, both before and after you push it out.

If you started your topic branch, fix-tty-bug, to fix a bug in the tty subsystem and named it after the purpose of the topic branch, it can and should keep the name in your public repository. There is no reason to publish the result as your master branch. You control the branch names in your public repository, and pushing it out as master will only lose information. The branch name fix-tty-bug told what the branch was about. The name master sounds as if you are trying to make everything better, but that is not what you did.

So in general, you would be pushing out your topic branches to your public repository under the same name. You can use the 'current' mode when pushing your work out, like this:

$ git config push.default current

And then, you can omit the branch name from the command line when you push your work out:

$ git push me

You run the above command while you have your fix-tty-bug branch checked out, and the current branch is pushed out to the destination repository (i.e. me) to update the branch of the same name.

Recently, we added a mechanism to help those who are too lazy to even type "me", which lets you say:

$ git push

To use this, you set a configuration variable that names the remote to push to when you do not name one on the command line, like this:

$ git config remote.pushdefault me

This feature is available in Git 1.8.3 and later.
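You can try the 'current' mode and remote.pushdefault without a hosting account. The sketch below assumes a local bare repository, public.git, standing in for your public repository "me", and a made-up author identity. It commits on a fix-tty-bug topic branch and runs a bare git push. With push.default=current and remote.pushdefault=me, that publishes the branch under the same name. This requires Git 1.8.3 or later.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for your public repository "me".
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$tmp"

git init -q --bare public.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "upstream history"
git remote add me "$tmp/public.git"

# Work on a topic branch named after its purpose.
git checkout -q -b fix-tty-bug
git commit -q --allow-empty -m "fix the tty bug"

# Push the current branch to a branch of the same name, on "me" by default.
git config push.default current
git config remote.pushdefault me
git push -q

published=$(git --git-dir="$tmp/public.git" rev-parse --verify -q refs/heads/fix-tty-bug)
local_tip=$(git rev-parse fix-tty-bug)
```

Only the checked-out branch is pushed, so the local master stays private.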

Thursday, June 20, 2013

Fun with various workflows (1)

Even though Git is distributed, you can still use it for projects that employ the centralized workflow, where there is a single central shared repository. Everybody pulls from it to obtain everybody else's work, and after integrating his own work with others' work, everybody pushes into it so that everybody else can enjoy the fruit of his work.

In the simplest workflow, you can start by cloning from the central repository:

$ git clone our.site.xz:/pub/repo/project.git myproject

and the myproject directory becomes your working area, where you will have the standard configuration, perhaps not very different from this:

[remote "origin"]
	url = our.site.xz:/pub/repo/project.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master

and your "master" branch, which was copied from the "master" branch of the central shared repository, is ready for you to build your work on it.

If you run "git pull --rebase" (without any other argument), the configuration above left for you by "git clone" will tell Git that you would want to obtain the latest work from the central shared repository, and you would want to rebase your own work on top of their master branch.
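The pull-then-push cycle can be simulated locally. In this sketch, a bare repository, project.git, plays the central shared repository. Two clones, alice and bob, each commit on master; all names, files and the author identity are made up. Bob's "git pull --rebase" replays his commit on top of Alice's before he pushes, so the central history stays linear.

```shell
#!/bin/sh
# Sketch: centralized workflow with "git pull --rebase" before "git push".
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$tmp"

# A shared central repository whose master has one commit.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial"
cd ..
git clone -q --bare seed project.git

# Two developers clone it; each "master" integrates with origin/master.
git clone -q project.git alice
git clone -q project.git bob

# Alice pushes first.
cd alice
echo alice >alice.txt
git add alice.txt
git commit -q -m "alice's work"
git push -q
cd ..

# Bob committed meanwhile: rebase onto the new central master, then push.
cd bob
echo bob >bob.txt
git add bob.txt
git commit -q -m "bob's work"
git pull -q --rebase
git push -q
cd ..

total=$(git --git-dir=project.git rev-list --count master)
merges=$(git --git-dir=project.git rev-list --count --merges master)
```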

If you say "git push" (without any other argument), the current default mode of pushing is to look at your local branches, and look at the branches the repository you are pushing to has, and update the matching branches. In this "simplest" case, you only have the 'master' branch, and the central repository does have its 'master' branch, so you will update its 'master' branch with the work you did on your 'master' branch.

In Git 2.0, this default mode will change to 'simple', which will push only the current branch to the branch at the central repository you integrate with, but only when they have the same name (so the example of working on 'master' and pushing it back to 'master' will still work).

If your project employs the centralized workflow, after learning Git well enough to be comfortable with it, you may want to do

$ git config push.default upstream

to choose to always update the branch at the central repository you integrate with, even if the branch names are different. Note that you can do this (or use 'simple' instead of 'upstream'), and indeed you are encouraged to do so, without waiting for Git 2.0.

That will allow you to work on different things on different branches, e.g.

$ git checkout -b my-feature -t origin/master
$ git push

The first "checkout" creates a new "my-feature" branch, which is set to integrate with the master branch from your central repository. When using the upstream mode, "git push" pushes "my-feature" back to update the "master" branch over there.
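The upstream mode can be checked the same way. The sketch below assumes a local bare repository, project.git, as the central repository, plus made-up file and identity names. It forks my-feature from origin/master, commits on it, and runs a plain git push. With push.default=upstream, the commit lands on master over there, and no my-feature branch is created at the central repository.

```shell
#!/bin/sh
# Sketch: push.default=upstream pushes my-feature to the master it forked from.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$tmp"

git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial"
cd ..
git clone -q --bare seed project.git
git clone -q project.git work
cd work

git config push.default upstream
git checkout -q -b my-feature -t origin/master
echo feature >feature.txt
git add feature.txt
git commit -q -m "neat feature"
git push -q   # updates "master" at project.git, not "my-feature"

feature_tip=$(git rev-parse my-feature)
central_master=$(git --git-dir="$tmp/project.git" rev-parse master)
```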

An interesting thing to notice is that in the centralized workflow, because there is no central project maintainer (aka integrator), everybody is responsible for integrating his own work to advance the mainline of the project. The job of integration is indeed distributed when you use centralized workflow. It is a bit funny when you think about it.

But that is exactly why the upstream mode makes sense. In order to fully appreciate it, you need to realize what it means to have forked the "my-feature" branch out of the "master" branch of the central shared repository.

The purpose of the master branch at the shared central repository is to advance the state of the project in general, but the purpose of your local branch, my-feature, is a lot more specialized. It may be to fix this small bug, or add that neat feature. You would only be working on a small part of the project while on that branch.

But because you are the one who plays the top-level integrator role when you run "git pull --rebase" just before you "git push", when that "git pull --rebase" finishes, the tip of your my-feature branch is no longer about your small fix or neat feature. It temporarily becomes about advancing the state of the overall project. And that is the reason you would "git push" it to update the master branch, not the "my-feature" branch, at the central repository. Of course, if you want to publish it as "my-feature", perhaps because you want to show it to others before really updating the shared master branch, you can explicitly say:

$ git push origin my-feature

Pushing my-feature, which was forked from and still integrates with their master, is not usually what you want to do every time in the centralized workflow, though. In fact, administrators of a project that uses the centralized workflow often frown upon people creating random branches at the shared central repository willy-nilly, exactly because the central shared repository is a common resource and a feature branch like "my-feature" is often not of general interest.

Common things require less typing, and uncommon things are possible but you need to explicitly tell Git to do so.

The Git core itself is very much agnostic to what workflow you use, and you can also use it for projects that use "I publish my work to my public repository, others interested in my work can pull my work from there, and there is an integrator who pulls and consolidates good work from others and publishes the aggregated whole" distributed workflow. That will be a topic for a separate post.

Monday, June 10, 2013

Git 1.8.3.1

The first maintenance release 1.8.3.1 is out.

This is primarily to push out fixes for two regressions that seem to have affected many people recently. Sorry about that.

With Git 1.8.3, an entry "!dir" in .gitignore, meaning "This directory's contents are not ignored, unless other more specific entries tell us otherwise", did not work correctly. This regression has been fixed.
Since Git 1.7.12.1 or so, "git daemon", when started by the root user and then switched to an unprivileged user, refused to run when ~root/.gitconfig (and the XDG equivalent configuration files under ~root/.config/) could not be read by the unprivileged user. The right way to start the daemon might be to reset its $HOME (where these configuration files are read from) to a location readable by the user the daemon runs as, but that is cumbersome to set up. With 1.8.3.1, failure to access these files with EPERM is treated as if these files do not exist, which is not an error.

The release tarballs are available at the usual places:

https://code.google.com/p/git-core/downloads/list
https://www.kernel.org/pub/software/scm/git/

Checking the current branch programmatically

The git branch Porcelain command, when run without any argument, lists the local branches, and shows the current branch prefixed with an asterisk, like this:

$ git branch
* master
  next
$ git checkout master^0
$ git branch
* (no branch)
  master
  next

The second output, with (no branch), is shown when you are not on any branch at all. You often see it when you are sightseeing the tree of a tagged version, e.g. after running git checkout v1.8.3 or something like that.

To find out what the current branch is, casual/careless users may have scripted around git branch, which is wrong. We actively discourage the use of any Porcelain command, including git branch, in scripts, because the output from these commands is subject to change to better serve human readers.

And in fact, since release 1.8.3, the output when you are not on any branch has become something like this:

$ git checkout v1.8.3
$ git branch
* (detached from v1.8.3)
  master
  next

in order to give you (as a human consumer) better information. If your script depended on the exact phrasing from git branch, e.g.

branch=$(git branch | sed -ne 's/^\* \(.*\)/\1/p')
case "$branch" in
'(no branch)') echo not on any branch ;;
*) echo on branch $branch ;;
esac

your script will break.
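The failure mode is easy to reproduce without a repository. The sketch below uses a hypothetical helper, current_branch_from, that parses captured git branch output and compares it against the literal (no branch) string. It feeds the helper the two outputs shown earlier. The pre-1.8.3 detached output is recognized, while the 1.8.3 wording is misreported as a branch named "(detached from v1.8.3)".

```shell
#!/bin/sh
# Sketch: a parser that depends on the exact "(no branch)" phrasing.
current_branch_from() {
	# $1 is the captured output of "git branch".
	branch=$(printf '%s\n' "$1" | sed -ne 's/^\* \(.*\)/\1/p')
	case "$branch" in
	'(no branch)') echo "not on any branch" ;;
	*) echo "on branch $branch" ;;
	esac
}

# Output shapes copied from the examples in the text (before and since 1.8.3).
before=$(current_branch_from '* (no branch)
  master
  next')
after=$(current_branch_from '* (detached from v1.8.3)
  master
  next')
echo "$before"
echo "$after"
```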

The right way to programmatically find the name of the current branch, if any, is not to use the Porcelain command git branch, which is meant for human consumption, but to use the plumbing command git symbolic-ref instead:

if branch=$(git symbolic-ref --short -q HEAD)
then
	echo "on branch $branch"
else
	echo "not on any branch"
fi
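The plumbing-based check can be exercised in a throwaway repository. This sketch wraps it in a function and calls it once on master and once after detaching HEAD at a tag. The tag name v1.0, the helper name current_branch and the author identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: "git symbolic-ref --short -q HEAD" on a branch and on a detached HEAD.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore the user's own config
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m initial
git tag v1.0

current_branch() {
	# symbolic-ref -q fails quietly when HEAD is detached.
	if branch=$(git symbolic-ref --short -q HEAD)
	then
		echo "on branch $branch"
	else
		echo "not on any branch"
	fi
}

on_master=$(current_branch)
git checkout -q v1.0   # detach HEAD at the tag
detached=$(current_branch)
echo "$on_master"
echo "$detached"
```

Unlike parsing git branch, this keeps working no matter how the Porcelain output is reworded for humans.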