[Note: there are follow-up articles here and there]
[Note: some site seems to have misreported that I outlined how one can forge a history stored in Git here, but the point of this article is how impractical and unrealistic it is for anybody to do so without letting other people take notice.]
Suppose if you momentarily gained write access to other people's public repositories at a large distribution point, such as kernel.org. What damage can you inflict on their projects if you wanted to?
You could create a malicious commit on top of the tip of "master" branch of linux.git repository of Linus Torvalds. Nobody prevents you from pretending that you are Linus:
$ GIT_AUTHOR_NAME="Linus Torvalds" \
GIT_COMMITTER_NAME="Linus Torvalds" \
git commit -s
Your English may be good enough to fool readers into believing that the log message may have come from Linus himself. Perhaps you may have done this around August 12th, when the tip of Linus's true "master" branch was commit M and X is the malicious commit you created on top of it. The resulting history may look like this:
If an unsuspecting victim pulls regularly from Linus's repository, he may run a git pull before your malicious commit is discovered in security audit. And he may have already based his derivative product based on this malicious version of the kernel.
Is this a big "Oops"? We'll see what happens to this unsuspecting victim later.
When Linus tries to upload his updated work, however, the history on his development machine (which is not the distribution point you managed to add your malicious commit) does not have your commit X. In Git terms, the history you tweaked and the history Linus has now diverged:
where M is the original tip of the "master" branch at the public repository, X is the malicious commit you created and updated the "master" branch to point at, and L is the tip of the history Linus is about to upload. We say "L does not fast-forward to X", as X is not part of L (time flows from left to right).
What happens now is that "git push" Linus runs to upload to his public repository notices that updating the "master" branch at the public repository with the tip of his history will lose commit X you created (it does not notice that the commit that is about to be lost is a malicious one, nor does it notice it was not made by Linus, but it does not have to notice either at all for this protection to work), and refuses to do so. Linus would definitely notice that something fishy is going on, because he needs to do something he usually never does to push his changes as his next step.
If this were a shared repository setting, Linus may say "Ah, somebody else beat me to it", then runs "git pull" to merge work by other people who share the same public repository (i.e. you) to his tree to create a merge commit Y, and then pushes the result again:
In the end, your malicious commit X could end up in the resulting history this way, provided if he does such a merge, and if he does not inspect the merge Y.
But Linus (or any kernel people with publishing repositories at kernel.org in general) does not work using a shared repository with other people to begin with. The repository at kernel.org is his publishing repository and his alone, so you cannot sneak your malicious commit into his history through this avenue.
Linus could choose to be careless and force his push, without bothering to investigate why his push does not fast-forward (in real life, this is not going to happen, but for the sake of mental exercise, imagine that he chose to be careless and let's see what happens). This will eliminate your malicious commit from his public repository. If he did so, the repository would look like this:
Your malicious commit X would not have any effect to people who pulled from Linus's public repository after this happens, but what about the unsuspecting victim who pulled X before Linus forced this push? Is he contaminated with your malicious commit and will not notice it forever?
Remember, as far as he is concerned, Linus's history he pulled earlier, which is kept in his origin/master remote tracking branch, was X, and then it is being updated to L, which does not fast-forward. His "git pull" (actually it is "git fetch" that is invoked as part of "pull") will notice and would report:
+ 9d901d9...ad4d968 master -> origin/master (forced update)
Notice "forced update"? The unsuspecting victim can notice that the side branch lead to X is no longer part of Linus's history.
One security tip I would offer here is this. If you know that your upstream (in this illustration, Linus) never rewinds his history, you can tweak your .git/config file (open it with your favorite $EDITOR, it is a simple text file and is designed to be editable by hand) and drop the '+' sign from the "fetch" line. Find a line that looks like this:
fetch = +refs/heads/*:refs/remotes/origin/*
And edit it to make it look like this:
fetch = refs/heads/*:refs/remotes/origin/*
This will make your "git pull" (again, it is actually "git fetch" that is invoked from the command) to fail when the upstream rewound the history, like this. You will see that the command fails like so when you pull from Linus:
! [rejected] master -> origin/master (non-fast-forward)
We might want to revisit the default settings "git clone" leaves in your new repository to make it harder for upstreams to rewind their branches by dropping the '+' (which means "allow non-fast-forward), but that will have to be discussed on the Git mailing list (email@example.com), not in this blog post. There is a reason we didn't make it default to insist on fast-forwardness.
By the way, it does not make an iota of difference to the above story if you rewrote the commits that lead to M (i.e. the old tip of the "master" branch of Linus's history) using "rebase" or "commit --amend". The only difference is that such a change will move the fork point of the diverged histories from M (in the above story) further back to a different commit that is older than M in the ancestry chain. The history Linus will try to push to his public repository L will not fast-forward to the commit you place at the tip of the "master" branch that contains your malicious version, and that is the only thing that matters.