Using git for Pintos

You are highly encouraged to use git for the Pintos projects. Git is an open-source industry standard revision control system that makes collaborating on software much easier.

Git principles

A git repository is a directory storing one or more branches of a project. A branch consists of a series of commits, each of which is a snapshot of your source tree accompanied by a commit message and some other metadata including the date of the commit and, most importantly, a list of parent commits. There can be multiple parent commits, because you can merge two branches of a repository. Since each commit specifies its parent or parents, a branch simply needs to identify its most recent commit, as doing so recursively identifies all prior commits in the history.

One of the great things about git is that it is a peer-to-peer system. Git makes it easy to push and fetch branches between repositories. Conceptually, a branch contains a series of copies of versions your source tree, one per commit. However, through the magic of delta compression and naming objects by collision-resistant message digests¹, in reality a branch consumes much less space than the sum of all its source trees. Similarly, pushing and fetching branches to and from remote repositories consumes much less bandwidth than would naïvely copying source trees back and forth.

There are two kinds of git repositories. A bare repository just contains git branches and their commits. You can use a bare repository as a place for multiple users to push and fetch branches, but you can’t directly work on a project in a bare repository, because the contents of source trees in commits are stored compressed in git’s internal format. Conversely, you can have a non-bare repository that contains a working tree. A working tree is the source tree of a particular commit that has been checked out into the file system, where you can edit it and create a new commit whose parent is the previously checked out commit.

Git takes a toolkit-based approach to revision control. The git command has a bunch of different subcommands for doing various things. There are low-level commands, known as plumbing, that directly manipulate tree objects, commits, and branches. Then there are higher-level, more user-friendly commands known as porcelain, implemented in terms of the low-level plumbing. This is a very empowering architecture because, even though most users employ porcelain commands most of the time, if the porcelain doesn’t do what you need, you have direct access to the plumbing right from the command line. However, this approach does mean that git can have an intimidating number of different subcommands and options, and that generally there are multiple ways to accomplish a particular task.

This guide will give you one suggested way of approaching the Pintos projects, but it is not the only way to do things. We encourage you to understand how these commands work. Learning git now will pay dividends for the rest of your career.

Git workflows

Because git is peer-to-peer, it enables many different workflows. In one workflow, you use a (usually bare) repository as a centralized, authoritative copy of a project. Individual users clone this repository, develop features in their own local copies, and then push an updated branch to the central repository when they are ready to share changes with other developers.

In another workflow, each user has his or her own repository, and users pull changes from one another’s repositories, but no single repository is the authoritative version.

There are hybrid workflows. For instance, it is common to have an authoritative repository to which only a project maintainer can write, but to have many contributors. In that case, contributors send “pull requests” to the maintainer. In industry, you will often find workflows where a code review is required before merging someone’s pull request. In particular, you may have the authority to review and merge your co-worker’s pull requests, and they may be allowed to review and merge yours, but no one is allowed to merge their own changes into the central repository.

For the Pintos projects, because you only have three developers, we recommend that you use a centralized repository that everyone can push to, and that you don’t bother with pull requests. This will be the simplest, and as long as no one does a force-push (which deletes history), you can always go back and look at history to fix any big problems.

Creating a shared git repository on AFS

Before creating a git repository, you need to create a shared directory that your team can access. For this, you will want to use the fs sa and fs la commands that let you set and examine the access control lists on AFS directories. Make sure you are in AFS, i.e., in your home directory on myth, and in not your NFS home directory on rice. On rice, you can find your AFS directory under /afs/ir.stanford.edu/users/a/b/abc if your sunet ID is abc.

Suppose your teammates’ sunet IDs are aaa and bbb. You can set the access control list as follows:

$ cd
$ mkdir pintos.git
$ fs sa -dir . -acl aaa l -acl bbb l
$ fs sa -dir pintos.git -acl aaa rlidwka -acl bbb rlidwka

Line 1 moves to your home directory. Line 2 creates a pintos.git directory. Line 3 allows your teammates to look up (but not read or write) files in your home directory, so they can get to pintos.git. Line 4 gives your teammates complete permissions on the pintos.git subdirectory. At the end of the quarter, if you want to remove permissions, you could say the following in your home directory:

$ fs sa -dir . -acl aaa none -acl bbb none
$ fs la
[verify acl is satisfactory]

Now you can create a bare git repository in pintos.git as follows:

$ cd pintos.git
$ git --bare init -b master
Initialized empty Git repository in /afs/ir.stanford.edu/users/a/b/abc/pintos.git/
$ cd ..

If you get an error about the -b flag being unsupported, then you have an older version of git. No big deal, just run git --bare init. (Newer versions of git have a configuration-dependent name for the default branch, so supplying -b master will avoid unexpected behavior and suppress warning messages.)

You now have a git repository that you and your teammates can clone with the following command:

$ git clone /afs/ir.stanford.edu/users/a/b/abc/pintos.git pintos

The pintos directory will now be a non-bare repository configured with a remote called origin that corresponds to the bare pintos.git repository you set up for your team. However, the working tree in the pintos directory is not going to be very useful yet because the repository you cloned doesn’t have anything in it!

Copying the Pintos source tree into your git repository

Since git is so good at fetching and pushing branches back and forth, we are making the Pintos source code available in a git repository. You don’t have to fetch the code this way (you can download the code as an archive file as well). But if for any reason we need to make bug fixes to Pintos in the middle of the quarter, it will be much easier to merge in our changes if you have fetched the starter code with git.

Git allows you to store the locations of other git repositories, called remotes, in the configuration of your git repository. As previously mentioned, if you cloned a repository, the new repository has a remote called origin corresponding to the source of the clone. However, you can create a second remote corresponding to the official CS212 Pintos starter code (let’s call that remote cs212) as follows:

$ cd pintos
$ git remote add cs212 http://cs212.scs.stanford.edu/pintos.git
$ git fetch cs212
warning: redirecting to https://web.stanford.edu/class/cs212/pintos.git/
From http://cs212.scs.stanford.edu/pintos
 * [new branch]      master     -> cs212/master
$ git checkout --no-track -B master cs212/master
Reset branch 'master'
Your branch is based on 'origin/master', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup)
$ git push -u origin master

Line 10 changes directory into the non-bare repository you created by cloning pintos.git as above. Line 11 adds the git repository through which we are distributing the assignment as a remote named cs212. Line 12 copies all the branches in the cs212 repository into the local repository, under the namespace cs212, so that you can refer to the branch called master in the remote repository as cs212/master in your local repository.² cs212/master is known as a remote-tracking branch, which is just there to supply a cache of the data in a remote repository, not to contain any local work.

Line 16 is where something useful finally happens. After running this command, you will see new directory called src appear that actually contains the files for the assignments. The git checkout command copies the contents of a particular git commit (in this case cs212/master) into your working tree. Generally, git has a notion of a currently active branch, named HEAD³. You can’t point HEAD to a remote-tracking branch, because that’s just a cache of some remote repository, not a branch you should be working on locally. The option -B master says “overwrite the current meaning of the branch called master and set it to the latest commit of the cs212/master branch, and also point HEAD to master.” In general, -B is a slightly dangerous option so use it carefully. More often, you will want to use lowercase -b which will create a branch that doesn’t exist but refuse to clobber an existing branch. Local branches can have a notion of a remote branch they are tracking. By default, if you create a branch from cs212/master, git will think you will ultimately want to synchronize your work to the cs212 remote, but you can’t do that since it’s the main class repository. The --no-track option says don’t adjust master to track cs212/master.

Finally, after line 16 makes master identical to cs212/master, line 20 copies the contents of the master branch into the origin repository, which is the one you are sharing with your teammates. This command is short for git push -u origin master:master, which says take the contents of the local branch called master (to the left of the colon) and copy it on to the branch master in the origin repository. The -u flag says configure origin/master as the upstream remote-tracking branch of the local branch master. (Technically that flag is redundant here, but you usually want to use it when first pushing to a new repository.)

At this point, if your teammates clone your shared pintos.git repository, they will get the contents of the project.

Another way to populate pintos.git

Instead of cloning the shared git repository you will be using with your teammates, another way to get started with git is to clone the (read-only) class git repository, but then reconfigure it to work with your shared repository instead of the class repository. To do this, first create your shared pintos.git repository with appropriate AFS permissions as described above, but do not clone it. Instead, clone the class repository as follows:

$ git clone -o cs212 http://cs212.scs.stanford.edu/pintos.git
Cloning into 'pintos'...
warning: redirecting to https://web.stanford.edu/class/cs212/pintos.git/
$ cd pintos
$ git remote add origin /afs/ir.stanford.edu/users/a/b/abc/pintos.git
$ git push -u origin master
Enumerating objects: 638, done.
Counting objects: 100% (638/638), done.
Delta compression using up to 48 threads
Compressing objects: 100% (548/548), done.
Writing objects: 100% (638/638), 5.32 MiB | 49.48 MiB/s, done.
Total 638 (delta 82), reused 638 (delta 82), pack-reused 0
remote: Resolving deltas: 100% (82/82), done.
To /afs/ir.stanford.edu/users/a/b/abc/pintos.git
 * [new branch]      master -> master
branch 'master' set up to track 'origin/master'.

Here line 9 creates a new (non-bare) git repository by cloning the read-only class git repository. The argument -o cs212 says call the source of the clone cs212 rather than the default origin. Line 13 configures a second remote called origin and points it to the shared repository you created on AFS. Note, however, that at this point the master branch of your repository is tracking cs212/master rather than origin/master. However, in line 14 you push to origin using the -u flag, which tells git to change the “upstream” branch that master is tracking to be the branch and origin you are pushing to, namely origin/master.

Working with git

Once you have set up a git repository and everyone has cloned it, the following commands will be useful. (We assume most students have seen these, if not there are many introductory git tutorials out there.)

git pull: A combination of git fetch and git merge. Use this to fetch work that has been pushed to the main repository by your teammates. If there’s an update conflict, you may need to edit files manually, then run git commit -a to create a merge commit.
git add file.c: Use this to add file.c to the repository (or inform git that you have modified the file).
git commmit -a: Create a new git commit (will open an editor to ask you for a log message). The -a flag says to include all modified files. Without -a, only changed files on which you have run git add will be reflected in the commit, and only the version of the files as of when you ran git add.
git push: copy your latest commit into the remote repository. With no options, defaults to pushing the current branch (named by HEAD) to the branch’s upstream remote. This will only succeed if your local branch contains all commits in the branch of the remote repository, as otherwise you would lose work. If that’s not the case, run git pull to merge remote changes, then try git push again.

Resolving merge conflicts

If you and your teammate edit the same file, you will create a merge conflict. Things will work fine for the first person who pushes the changes to origin. The next person cannot git push because their local repository does not contain all the commits on origin’s master branch. Therefore, that person will first need to do a git pull, and may see something like this:

$ git pull
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 2), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), 332 bytes | 332.00 KiB/s, done.
From /tmp/212/pintos
   79574e9..e291cbe  master     -> origin/master
Auto-merging src/Makefile
CONFLICT (content): Merge conflict in src/Makefile
Automatic merge failed; fix conflicts and then commit the result.

You can see that there’s a conflict in the file src/Makefile. You can also verify this by running git status and seeing that src/Makefile is an unmerged path.

To fix the problem, you will have to edit every file that has a merge conflict. When you open the file to edit it, you will see conflicts marked as follows:

<<<<<<< HEAD
# This is going to create a conflict
=======
# Added this in directory 2
>>>>>>> e291cbe622aed15b1c9cc568c4fe8a8a6e3d684f

What this means is that in your local branch (HEAD) you had the first comment (# This is going to create a conflict). In the commit that you are trying to merge from origin/master (which in this case has message digest e291cbe622aed15b1c9cc568c4fe8a8a6e3d684f) you had a different comment (# Added this in directory 2). Since each change was made in a version of the file that did not reflect changes from the other, git does not know how to resolve the problem automatically.

You must manually decide how to resolve each such conflict, for instance by keeping one comment or the other or both or doing something else. You must also manually delete the <<<<<<<, ======= and >>>>>>> markers after fixing the conflict. Search through the whole file for any occurrence of ======= (seven equals signs) to find what you need to fix. When you are done, run git add on the file. When you have fixed all conflicts, run git commit to create a merge commit with two parent commits (namely the previous commit at HEAD, and the most recent commit on origin/master, e291cbe622aed15b1c9cc568c4fe8a8a6e3d684f in this case).

Don’t forget to run git push after you have created the merge commit. This time the push will succeed, because the merge commit is now a descendant of the master branch in the origin repository.

For more information, see the Branching and Merging section of the git book.

Changing AFS permissions on a git repository

A git repository, including bare repositories, contains lots of subdirectories. If your team changes membership, you will need to edit the permissions on every single subdirectory of your git repository. Unfortunately, the fs sa command doesn’t have any kind of recursive option. Hence, you will need to use a combination of the find and xargs utilities to do this. To add the user xxx to your repository and remove the user yyy, you can use the following command:

find pintos.git -type d -exec fs sa -dir {} -acl xxx rlidwka -acl yyy none \;

Don’t forget to add and remove the l permission on your home directory as well:

fs sa -dir ~ -acl xxx l -acl yyy none \;

The digests aren’t actually collision-resistant because git’s original author, Linus Torvalds, wasn’t especially forward-looking about crypto agility, but you should think of the digests as collision-resistant to understand the model. It’s probably okay to ignore the possibility of collisions so long as you don’t interact with untrusted repositories.↩︎
Technically the full name of the branch in the remote repository is refs/heads/master and the full name in the local repository is refs/remotes/cs212/master. Git lets you omit the refs/heads and refs/remotes in most cases, but you will see them if you manually examine or edit your .git/config file, which contains directives on where to fetch and push branches.↩︎
HEAD is known as a symbolic reference, which names another branch rather than a specific commit. This level of indirection is similar to symbolic links in the file system, which name other file names rather than specific files. You can use the git symbolic-ref command to examine and modify what symbolic references point to.↩︎