Github introduction for my colleagues¶

Tags: python, nelenschuurmans

We are moving to github. I haven’t done much with git and tend to get confused now and then. And I’m more comfortable with mercurial. But… I get to give an introduction/explanation on git this afternoon for my colleagues anyway.

Perhaps this relatively fresh look at git/github can help others, so I’m writing it down here. Handy in any case as one of my colleagues isn’t here now as he’s preparing for a marathon :-)

Oh, and I’ll probably get things wrong in here as I’m not familiar enough yet. So consider yourself warned.

Basic mental model of how Git/github works¶

The basic concept of git, paraphrased a bit:

You’ve got multiple repositories in various places.
Every repository is basically a big bucket of changesets and a handful of pointers.
Git effectively starts out empty, applies a string of changesets and ends up at a directory filled with your source code.
Which string of changesets? That is determined by the pointer. You point at a certain changeset and you get that one and its parents. (They all have a pointer to their parent).
A pointer can be a tag or a branch or even a pointer at something in another repository.
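To make the “pointers” idea a bit more concrete, here is a small sketch in a throwaway repository. The temporary directory, the file name, the identity and the tag name (v0.1) are all made up for the example; it only assumes that git is installed.

```shell
set -e
# Throwaway repository in a temporary directory (the location is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the branch "master", as in this article
git config user.name "Example User"       # placeholder identity, needed for committing
git config user.email "user@example.com"

echo "first" > README.txt
git add README.txt
git commit -q -m "First changeset"
first=$(git rev-parse HEAD)

echo "second" >> README.txt
git commit -q -a -m "Second changeset"
second=$(git rev-parse HEAD)

# A branch and a tag are both just names that point at a changeset.
git tag v0.1
branch_target=$(git rev-parse master)
tag_target=$(git rev-parse v0.1)

# Every changeset points at its parent: "HEAD^" means "the parent of HEAD".
parent_of_second=$(git rev-parse HEAD^)

echo "master -> $branch_target"
echo "v0.1   -> $tag_target"
echo "parent -> $parent_of_second"
git log --oneline
```

Both master and v0.1 print the same hash (the second changeset), and the parent hash is the one of the first changeset.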

“Multiple repositories in various places”? Yep. A project on github is a repository. If you grab a local copy (=you clone it), your local copy is also a repository with all the contents. If you “fork” a project on github? Yep, a full copy. Getting stuff from one to the other means “push” and “pull”.

But if your local thingy is a full repository, including all branches/tags/whatever, what ends up in your actual code directory? To answer that, you need to know that git has two layers (and effectively three if you count a remote repository):

Your actual code directory. This is your visible directory. The git name for it is your “working copy”.
The full contents of the repository including pointers and changesets. Hidden in your working copy in the .git/ directory. When you grab stuff from a remote repository, it ends up in here. And when you push, it is the contents of the .git directory that you push over.
One or more remote repositories. (Technically the contents of those repositories’ .git/ directories: you can’t muck around directly in someone else’s working copy, of course).

Let me sneak in one extra layer:

The so-called “index”. These are the local changes you’ve collected (“staged”) for your next commit. Once you commit them, nothing is staged anymore and what was staged in your index is now in your repository.

So, compared to subversion, you’ve got extra layers to keep track of, which of course complicates the mental model of what’s happening. With subversion, you only have a central repository and a local working copy. Git (and mercurial, bzr and the rest) add that full local repository (git’s .git/ directory), and git puts the “index” in between your working copy and that local repository.
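Here is a rough sketch of one change travelling through those layers: working copy, index, repository. The file name notes.txt and the identity are placeholders I made up; the commands themselves are plain git.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "one" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Layer 1: the change only exists in the working copy.
echo "two" >> notes.txt
unstaged=$(git diff --name-only)                  # working copy vs. index
staged=$(git diff --staged --name-only)           # index vs. last commit
echo "after edit:   unstaged='$unstaged' staged='$staged'"

# Layer 2: the change is staged in the index.
git add notes.txt
unstaged_after_add=$(git diff --name-only)
staged_after_add=$(git diff --staged --name-only)
echo "after add:    unstaged='$unstaged_after_add' staged='$staged_after_add'"

# Layer 3: the change is committed to the local repository (.git/).
git commit -q -m "Add second line"
staged_after_commit=$(git diff --staged --name-only)
echo "after commit: staged='$staged_after_commit'"
```

After the edit, only plain git diff sees notes.txt. After git add, only git diff --staged sees it. After the commit, neither of them reports anything.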

Most basic git commands¶

Here are the git commands I’ve personally used in the last few weeks:

git status
git diff
git push
git add
git commit
git show
git mv
git help
git pull

(Output of history|grep git|awk '{print $2 " " $3}'|sort|uniq -c|sort -nr|head).

I won’t repeat git’s own documentation here. You’ll probably have to do quite some googling before you’re comfortable with those commands. I’ll have to do that googling myself, too.

Some comments on most of those commands that might help. Note that I’m using italics explicitly here to differentiate your repository from the remote repository (assuming you have a remote repository at github). Here you go:

git status. Relatively helpful, as it often suggests what you need to do. If files are changed, but they haven’t been “added to the index” yet, it suggests that you might want to git add them. The index is what ends up as your next commit, so this is the set of changes you want to stuff into your repository:
```
$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   MANIFEST.in
#
no changes added to commit (use "git add" and/or "git commit -a")
```

git diff. All by itself, without options, it is the difference between your local working copy and the index. So the changes that you haven’t marked yet for inclusion in your next commit:

```
$ git diff
diff --git a/MANIFEST.in b/MANIFEST.in
index c57b4ff..fe6332b 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,2 +1,2 @@
 include *.rst
-recursive-include lizard_map *.rst *.py *.html *.css *.js *.jpg *.png *.gif *.pdf *.shp *.json
+recursive-include lizard_map *.rst *.py *.html *.css *.js *.jpg *.png *.gif *.pdf *.shp *.json *.po *.mo
```

After adding a couple of things to the index, you probably want to review what you’ve “staged for commit in the index”:

```
$ git diff --staged
diff --git a/MANIFEST.in b/MANIFEST.in
index c57b4ff..fe6332b 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,2 +1,2 @@
 include *.rst
-recursive-include lizard_map *.rst *.py *.html *.css *.js *.jpg *.png *.gif *.pdf *.shp *.json
+recursive-include lizard_map *.rst *.py *.html *.css *.js *.jpg *.png *.gif *.pdf *.shp *.json *.po *.mo
```

git add. Just add filenames as suggested by git status. A helpful variant is git add -u, which automatically adds changes to all already-known files. (-A adds all changes, including new files, but at the risk of including unwanted files).
git commit. This stuffs the changes you added to (=staged in) the index to your repository.
git push. This pushes what’s in your repository to the remote repository. (Assuming git knows what your remote repository is and which branch in the remote repository it has to talk to. And when the push doesn’t break the remote repository. And when you’ve got access to that remote repository.)
git pull. This is basically a combination of git fetch followed by git merge. Fetch fetches the current state of the remote repository (all changesets and pointers) into *your* repository. Merge then merges those fetched changes from *your* repository into your current branch and your working copy. So just do a git pull to grab the changes of the remote repository and get them in your repository and your working copy.
git help. git help some_command or git some_command --help both do the same. Quite decent documentation, actually! Use “git help” often when starting out with git!
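The “git pull is fetch plus merge” bit can be tried out without github. In this sketch a bare repository on disk stands in for the remote repository, and two local repositories (“alice” and “bob”, names I made up, just like the identity and file name) play the colleagues. It assumes nothing more than an installed git.

```shell
set -e
# Placeholder identity for all commits in this sketch.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# A bare repository on disk stands in for the remote repository on github.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# "alice" creates the first changeset and pushes it.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
git remote add origin "$work/remote.git"
git push -q -u origin master
cd "$work"

# "bob" clones, after which alice pushes a second changeset.
git clone -q remote.git bob
cd alice
echo "v2" > file.txt
git commit -q -a -m "Second"
git push -q origin master
alice_head=$(git rev-parse HEAD)

# Step 1 of a pull: fetch only updates bob's repository (origin/master).
cd "$work/bob"
git fetch -q origin
remote_after_fetch=$(git rev-parse origin/master)
local_after_fetch=$(git rev-parse master)
echo "after fetch: working copy still contains '$(cat file.txt)'"

# Step 2 of a pull: merge updates bob's branch and working copy.
git merge -q origin/master
local_after_merge=$(git rev-parse master)
echo "after merge: working copy now contains '$(cat file.txt)'"
```

After the fetch, origin/master already points at alice’s second changeset while bob’s own master and his file.txt are unchanged. Only the merge moves master and updates the working copy.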

Don’t forget you can add -v to get more verbose output of what git is doing. This can be very helpful to see what’s happening! Compare:

```
$ git pull
Already up-to-date.
```

with the more explanatory:

```
$ git pull -v
From github.com:lizardsystem/lizard-map
 = [up to date]      alexandr-workspace-changes -> origin/alexandr-workspace-changes
 = [up to date]      gijs-nepal -> origin/gijs-nepal
 = [up to date]      master     -> origin/master
 = [up to date]      reinout-api -> origin/reinout-api
 = [up to date]      reinout-tastypie -> origin/reinout-tastypie
Already up-to-date.
```

Github remote repository¶

When you create a new repository on github, you get a “clone url”. Something like https://github.com/jcrocholl/pep8.git. To get a local copy, clone it:

```
$ git clone https://github.com/jcrocholl/pep8.git
Cloning into pep8...
remote: Counting objects: 861, done.
remote: Compressing objects: 100% (369/369), done.
remote: Total 861 (delta 424), reused 833 (delta 398)
Receiving objects: 100% (861/861), 160.20 KiB, done.
Resolving deltas: 100% (424/424), done.
```

This sets up some defaults in your repository. “Your repository” means that “pep8” directory that got set up (and more specifically the .git/ directory inside it).

Some of those defaults are helpful to know, as they partially explain why several of the commands I showed don’t need that many options:

git branch shows just a “master” branch. That’s “trunk” in svn-speak. The star in front of it marks it as the branch you currently have checked out:
```
$ git branch
* master
```
git branch -a shows also the remote branches:
```
$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/master
```
That remotes/origin/master is a pointer at the master (so: the trunk) of the remote github repository.
Your repository’s “master” branch has been configured with the remote repository’s “master” as its default push/pull location. So git pull grabs new revisions from github into your repository and updates your working copy.
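Those defaults are ordinary configuration that you can look at yourself. A small sketch, in which a local repository called “upstream” (a name I made up, like the identity and the file name) stands in for the github repository:

```shell
set -e
# Placeholder identity for the commit in this sketch.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the repository on github.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"
cd "$work"

# Cloning sets up the "origin" remote and the default push/pull location.
git clone -q upstream mycopy
cd mycopy
remote_url=$(git config remote.origin.url)
branch_remote=$(git config branch.master.remote)
branch_merge=$(git config branch.master.merge)
upstream_branch=$(git rev-parse --abbrev-ref 'master@{upstream}')

echo "origin url:         $remote_url"
echo "master pulls from:  $branch_remote"
echo "master merges with: $branch_merge"
echo "so its upstream is: $upstream_branch"
```

The same settings are stored in plain text in .git/config, in case you ever wonder where a plain git pull gets its information from.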

Integration with checkoutmanager¶

Many colleagues (and others!) use my checkoutmanager tool. Based on a config file, it manages your various svn/git/hg/bzr checkouts. Normally you have a couple of directories with checkouts and doing “svn up” in all of them is a bit of a chore. That’s where checkoutmanager comes in. checkoutmanager up does an svn up in all your svn directories.

And…. for git it does a git pull. Yes, it supports git.

checkoutmanager status does git status in your git directories that it knows about.

Very handy: checkoutmanager out, which tells you which of your repositories have commits that you haven’t pushed to github yet!

As a quick example, here’s a snippet from my .checkoutmanager.cfg:

```
[git]
vcs = git
basedir = ~/git/
checkouts =
    git@github.com:reinout/pep8.git
    git@github.com:reinout/gitignore.git

[nensgit]
vcs = git
basedir = ~/git/nens/
checkouts =
    git@github.com:lizardsystem/nensskel.git
    git@github.com:lizardsystem/lizard-ui.git
    git@github.com:lizardsystem/lizard-map.git
```
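If you want to check by hand for commits that haven’t been pushed yet in a single repository, plain git can tell you. This is not how checkoutmanager is implemented, just a sketch of the same idea; the bare repository standing in for github, the directory names and the identity are all made up for the example.

```shell
set -e
# Placeholder identity for the commits in this sketch.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# A bare repository on disk stands in for the remote repository on github.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q checkout
cd checkout
git symbolic-ref HEAD refs/heads/master
echo "pushed" > file.txt
git add file.txt
git commit -q -m "Pushed commit"
git remote add origin "$work/remote.git"
git push -q -u origin master

# Right after the push, nothing is waiting to be pushed.
before=$(git rev-list --count origin/master..master)

# Commit locally without pushing.
echo "local only" >> file.txt
git commit -q -a -m "Commit that is not pushed yet"
after=$(git rev-list --count origin/master..master)

echo "unpushed commits before: $before, after: $after"
# List the commits that are in master but not in origin/master.
git log --oneline origin/master..master
```

The origin/master..master range means “everything reachable from my master that the remote’s master, as I last saw it, doesn’t have”.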

Integration with buildout¶

We use buildout for all our projects. Handier than pip/virtualenv as it provides some extra functionality like easy django setup, apache config files from templates, cronjob setup and so on. And… it has mr.developer. A buildout extension for managing a couple of git (or svn/bzr/hg) checkouts inside your project.

Because…. git and mercurial don’t have something that works like svn:externals. Or at least that doesn’t work as nice/reliable/integrated/whatever. But mr.developer takes care of that in buildout. Here’s a relevant snippet from a buildout.cfg:

```
[buildout]
...
extensions =
    mr.developer
    buildout-versions
parts =
    mkdir
    django
    ...
develop = .
...

[sources]
lizard-ui = git git@github.com:lizardsystem/lizard-ui.git

...
```

This does three things:

It adds a bin/develop command for managing your sources.
It adds a src/ directory where it places any checkouts/clones it makes.
bin/develop now knows that the “lizard-ui” package is a git checkout with a certain clone url.

mr.developer does a couple of things. Here’s the help:

```
$ bin/develop -h
usage: develop [-h] [-v]  ...

optional arguments:
  -h, --help        show this help message and exit
  -v, --version     show program's version number and exit

commands:

    activate, a     Add packages to the list of development packages.
    checkout, co    Checkout packages
    deactivate, d   Remove packages from the list of development packages.
    help, h         Show help
    info            Lists informations about packages.
    list, ls        Lists tracked packages.
    rebuild, rb     Run buildout with the last used arguments.
    reset           Resets the packages develop status.
    status, stat, st
                    Shows the status of tracked packages.
    update, up      Updates all known packages currently checked out.
```

Once you want to work on the “trunk” of one of the packages managed by mr.developer, check it out (which also activates it):

```
$ bin/develop checkout lizard-ui
INFO: Queued 'lizard-ui' for checkout.
INFO: Cloned 'lizard-ui' with git.
INFO: Activated 'lizard-ui'.
WARNING: Don't forget to run buildout again, so the checked out packages are used as develop eggs.
```

As mentioned in that message, run buildout again and it will be installed as a development egg.

bin/develop status shows the current status of the checkouts, but the output isn’t very clear. Run bin/develop status --help to see what it all means.

bin/develop update does a git pull for every package.

Creating a new project¶

Starting a new project and getting it in github is pretty simple:

First create it in github (just clicky-clicky on the github website).
Then set up your project locally (we use nensskel).
Run git init in that directory to turn it into a git directory. Add/commit/adjust where needed.
Do a git push to the clone url you see on your github webpage!
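Those steps could look roughly like this. To keep the sketch runnable without a github account, a bare repository on disk stands in for the empty project you created on github; for a real project you would use your clone url instead of that path. The project name, file name and identity are placeholders.

```shell
set -e
# Placeholder identity for the commit in this sketch.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# Stand-in for the empty repository you clicked together on github.
git init -q --bare github-standin.git
git --git-dir=github-standin.git symbolic-ref HEAD refs/heads/master

# Your freshly generated project directory (just a README here).
mkdir myproject
cd myproject
echo "My project" > README.rst

# Turn it into a git repository and commit what is there.
git init -q
git symbolic-ref HEAD refs/heads/master
git add README.rst
git commit -q -m "Initial project skeleton"

# Tell git where the remote repository lives and push to it.
# With github you would put the clone url here instead of a local path.
git remote add origin "$work/github-standin.git"
git push -q -u origin master

pushed=$(git --git-dir="$work/github-standin.git" rev-parse master)
local_head=$(git rev-parse HEAD)
echo "remote master: $pushed"
echo "local master:  $local_head"
```

The -u on the first push records origin/master as the default push/pull location, so later on a plain git push or git pull is enough.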

Upcoming: branches and workflow¶

What I haven’t covered yet: branches and workflow. I’ll do that in a later article.

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
