Stop being a git – Start using Git!

While it took Linus Torvalds just 2 weeks to initially create Git and start using it to manage the Linux Kernel source code,  this mere mortal is still struggling to fully grasp the simplicity of Git after dabbling for 3 weeks.

To date, I’ve been using a handful of Git commands without really understanding how Git works, so the consequences of some commands were a bit of a mystery. I didn’t appreciate how Git handles and tracks changes, nor where everything lived. After 3 weeks and two branching debacles of my own doing, it’s time to stop hacking around the fringes and get over this learning curve… Maybe debacle is an exaggeration; realistically, I couldn’t follow what was going on when I was cloning, branching, committing, staging or simply working.

Mistake #1

I didn’t realise that my database isn’t in the repository (it’s in .gitignore) – and doesn’t need to be for development.

I did a couple of Rails migrations that I wasn’t happy with, but instead of simply reverting to an earlier commit from Git, I decided to directly modify the Database, without getting help from Rails. And then I deleted the offending migration file, before trying to run the Rails db reset – all in all, not so smart.

Lesson: I should have simply reverted to an earlier commit from my Git Repository using: git checkout. Then I should have used my RoR option rake db:reset which would have restored the database back to the corresponding code version, and re-populated it from my seed data.

Mistake #2

Perhaps less a mistake than confusion about untracked files. Having used Rails to create the scaffolding for a new class, I realised I hadn’t followed the naming conventions (neither case nor pluralisation). Without staging the changes, I tried to check out master, only to find the untracked files still hanging around. This led to confusion, which led to a chain reaction of actions that weren’t so smart.

Lesson: Don’t panic – work with Git to recover. Either commit to the current branch and then revert to an earlier commit, or stash the working project without staging or committing. If I want, I can return to the stash, or simply ignore it, and revert to a reliable, committed version of the code!

In this case, ignorance is not bliss! So, it’s time to stop being such a git, and start using Git, through a better understanding!

As always, there’s a huge amount of information freely available – it’s a matter of sifting through it for the sources that make sense to me. I found the following sites helpful in getting to this point, and I expect to continue using them.

I have barely scratched the surface of Git, but I think I’ve garnered enough insight to intelligently use most of the basic commands, and to recognise when I need to extend my learning before tackling the more advanced and more powerful (read dangerous) capabilities!

  1. The git-scm homepage – check out “About” for an overview.  I found this more useful at the end of this post.
  2. The free Pro Git book is excellent (also from git-scm). Chapter 2 in particular clearly explains Git states and transitions.
  3. Watch Linus Torvalds talk about Git. Although it runs for just over an hour, it is very helpful for understanding the tool.
  4. John Wiegley documented his learnings in a paper: Git from the bottom up.
  5. Ongoing, the Git Reference from GitHub, is well organised around actions and commands.
  6. The static (git) heroku cheat-sheet – or the contextual cheat-sheet.
  7. Six Revisions provides a nice summary of 7 lessons for Git beginners.

Now, how to distill all that information into something I know how to use… Unfortunately, unlike Git, some of the items committed to my memory can’t be reliably retrieved without corruption or loss of key data!  Therefore this post helps me learn, as well as serving as a repository so that when my memory inevitably fails me, all is not lost.

My key Git learnings:

  • “Git is distributed” means:

I have a complete Git Repository on my MacBook, which means that I can revert to any of the committed versions of my codebase, at any time.

If I also decide to store my project on the Server (on the home network), I can clone it to the server, which puts a complete copy of the Git Repository on the Server as well. Not necessary in my case, but nice to have a remote backup. Obviously this can apply in reverse – start on the server then clone to my machine.

There is no central repository from which I check out files – I have a complete copy, even if I have a repository on the server as well.

These advantages probably deliver more benefit to projects with multiple developers, multiple releases etc. But there are advantages for me too, with Git supporting branching and merges so I can create multiple playgrounds as required.

  • Files: clarifying Project vs Git files:

Getting my head around the repository versus the project was a bit of a sticking point, in part confused because I can create my Git Repository two ways, and in part because I didn’t understand how the repository files and my project files related:

Scenario 1 – I start or have a local project: I want to put it into Git

I create a new RoR Project, say, <My_New_Project>, which creates a folder of the same name and creates a stack of RoR project files & folders in that self-named folder.

Next, I want to manage my source code via Git, so I create a Git Repository, using git init from within my project folder.

Git creates the repository in my Project folder and can now see every file in the My_New_Project folder. However, none of my project files are actually in Git yet: they are untracked until I add them to the staging area and then commit them to Git (see comments on Git Status, below).
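Here’s a minimal sketch of Scenario 1, run in a throwaway temp folder. The folder name, the one-line Gemfile and the "Demo User" identity are stand-ins for illustration, not anything from my real project.

```shell
set -e
work=$(mktemp -d)                 # throwaway sandbox
cd "$work"
mkdir my_new_project && cd my_new_project
echo "source 'https://rubygems.org'" > Gemfile   # stand-in for the Rails project files

git init -q                       # creates the .git folder inside the project
git config user.name "Demo User"  # hypothetical identity, local to this repo
git config user.email "demo@example.com"

git status --short                # ?? Gemfile   (visible to Git, but untracked)
git add .                         # stage everything in the project folder
git status --short                # A  Gemfile   (staged, not yet committed)
git commit -q -m "Initial commit"
git log --oneline                 # one commit: the project is now in Git
```

Only after the commit does git status report a clean working folder.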

Scenario 2 – I want to work on an existing project, from a remote or local Git Repository:

There is already a Git Repository somewhere (say, on the server, or in GitHub, etc…), that I want to work on or leverage.

I need to clone the source Git repository onto my machine, using git clone followed by the source (e.g. the source repository name, including server path, or a GitHub reference).

The git clone command both copies the source repository onto my local target and unpacks (checks out) the project from the most recent commit on the source’s default branch.

I end up with a local copy of the entire project to work on. I can name my project & repository anything I like (same as the source or not) depending on my intent.

At this time, every file in my cloned project is in sync with the Git Repository.

For the 2nd scenario, I could also have done a copy from the source to my target, then cloned it. Alternatively I could have used Aptana Studio 3 to get the clone, but I’m still unclear where the Aptana Workspace is, and hence where the project code ends up being unpacked to. Therefore, I am more comfortable first doing the cloning via the command line, from the Target folder on my machine, then importing my project into Aptana as an editor.
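A minimal sketch of Scenario 2 from the command line. To keep it self-contained, a local folder called source_project plays the part of “the server”; that name, my_copy and the file contents are all made up for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the existing repository on the server or GitHub
git init -q source_project
cd source_project
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "My project" > README.textile
git add .
git commit -q -m "Initial commit"

# The actual clone: copy the repository AND unpack the latest commit
cd "$work"
git clone -q source_project my_copy       # target name is my choice
cd my_copy
ls                                        # README.textile
git status --short                        # prints nothing: in sync with the repository
git remote -v                             # "origin" points back at the source
```

The same command takes a server path or a GitHub URL in place of source_project.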

So a quick summary of what files live where (in my simple MacBook development world):

    • My Project: comprises files/folders such as: Gemfile, README.textile, config, app, db etc. stored in a self-named project folder.
    • The Git Repository: comprises files/folders such as: HEAD, config, hooks, objects, refs, branches, description, info, packed-refs. For a project I work in directly, these live in a hidden folder named .git; a bare repository (the kind typically kept on a server, with no project files unpacked) is usually named <project name>.git.
    • The .git folder lives immediately under the project folder, alongside my project files. A bare repository, on the other hand, has to be unpacked to make the project files accessible, i.e. via git clone. For example, my new RoR project, committed in Git would contain:

[Image: git and project files]
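To see for myself where things live, a sketch like the following lists the repository files next to the project files, and then makes a bare copy (the server-style layout) for comparison. The folder names and the Gemfile content are placeholders.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q my_new_project
cd my_new_project
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "source 'https://rubygems.org'" > Gemfile
git add .
git commit -q -m "Initial commit"

ls -A                 # .git sits alongside the project files (here: Gemfile)
ls .git               # repository internals: HEAD, config, objects, refs, ...

# A bare copy has the same internals at its top level and no project files
cd "$work"
git clone -q --bare my_new_project my_new_project.git
ls my_new_project.git
```

The bare copy is the form that has to be cloned before I can see or edit any project files.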

  • File status in Git has two perspectives:

Tracked or Untracked:

This is simple, any files floating around in my Project folder (or sub-folders) have either been specifically added to the Git Repository (tracked) or not (untracked).

Once my project is in Git, as I add new files to my project, these need to be specifically added to the repository. I use git add . to add all untracked files (it also stages any modified tracked files), or I can scope that command to nominate specific files.

Alternatively, if I do need to remove a file from the Git Repository, I use git rm -f <filename>. Next time I commit, that file will be removed, as I had to do with the .DS_Store file (more than once!!!). Note that git rm also deletes the file from my project folder: if I plan to keep the file in my project, but simply don’t want it in Git, I use git rm --cached <filename> instead and make sure I list the file in my .gitignore file.
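A small sketch of getting an accidentally committed .DS_Store out of Git while leaving it on disk. The repository and its contents are invented for the demo; the approach shown is git rm --cached plus a .gitignore entry.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

touch .DS_Store                           # the file I never wanted in Git
echo "notes" > README.textile
git add .
git commit -q -m "Initial commit (accidentally includes .DS_Store)"

git rm -q --cached .DS_Store              # untrack it, but keep the file on disk
echo ".DS_Store" >> .gitignore            # stop it sneaking back in
git add .gitignore
git commit -q -m "Stop tracking .DS_Store"

git ls-files                              # .gitignore and README.textile only
ls -A                                     # .DS_Store is still in the folder
```

Without the .gitignore entry, the next git add . would simply add the file again.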

The Tracked files are managed through the following Git file states, under normal flow:

Unmodified -> Modified -> Staged -> Committed (aka Unmodified)

As soon as I change a Tracked file in my project, Git recognises that file as Modified.

I have to tell Git if I intend to update the Repository with the modified file, using git add (either with the . or by file name).  Once I do this, the new/updated file is considered “staged”, which means that the updated/new file has been indexed with git, and is ready to be committed.

To actually commit the staged files, I must use the git commit command: git commit -m "my commit comments". It is worth making really useful comments with the commit – because I can always go back and see each of the commits, which allows me to recover the code from an earlier state if I need to.

The commit command (as above) only commits those changes that are “staged”.
So if I stage all my files, then quickly make another change to one of those files before running the commit, that last minute change will not be committed, unless I first “add” it to the git index (aka stage it).

Alternatively, using git commit -a -m "my commit comments" will both stage the modifications to tracked files and commit those changes in a single step. New, untracked files are not picked up by -a; they still need git add.
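The “last minute change” trap is easy to reproduce. In this sketch (file name and contents are made up), version 2 is staged, version 3 is typed in afterwards, and the plain commit records only version 2; commit -a then sweeps up the rest.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "version 2" > notes.txt
git add notes.txt                         # stage version 2
echo "version 3" > notes.txt              # last-minute edit, NOT staged
git commit -q -m "Update notes"           # commits what was staged
git show HEAD:notes.txt                   # version 2
git status --short                        # notes.txt still shows as modified

git commit -q -a -m "Update notes again"  # stage tracked changes + commit in one step
git show HEAD:notes.txt                   # version 3
```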

  • Using the 3 Git States to compare changes:

So I have 3 states for files in my Git Repository: Unmodified (as per committed); Modified; or Staged.

I use git status to determine which files are in which status (which will also show me if I have untracked project files that aren’t in Git).

For a more detailed review, I ask git to list the specific line by line changes in each impacted file.

    • Compare Staged vs Commit => git diff --staged (this lists each change sitting in staging, vs the most recent commit).
    • Compare Unstaged changes vs Staged => git diff (lists each change in tracked files that isn’t yet in staging).

I can use this to understand exactly what I’m doing in a particular commit.
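A quick sketch of the two comparisons side by side, using a made-up notes.txt with one staged line and one unstaged line. The --no-pager option just stops Git opening a pager in a terminal.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "line two" >> notes.txt
git add notes.txt                         # "line two" is now staged
echo "line three" >> notes.txt            # "line three" is modified but unstaged

git status --short                        # MM notes.txt (staged AND unstaged changes)
git --no-pager diff --staged              # staged vs last commit: shows +line two
git --no-pager diff                       # working file vs staged: shows +line three
```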

  • A quick comment on Branching: 

The way I’m branching with Git:

    • I use the master (default) branch to hold solid state code (i.e. UT & FT complete).
    • I use branches for each new function, or enhancement.
    • Once I’m happy with a branch, I merge that branch back into master (i.e. the code has reached a logical functional goal, and UT & FT are complete)

So this is a pretty simple “run-ahead” branching approach, given I’m the only developer.

Once a branch has been merged back in, I remove that branch using git branch -d <branchname>. Using -D is a forced delete, to be used if I have a branch that I want to delete without merging.
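My “run-ahead” cycle, sketched end to end. The branch name new-feature and the files are hypothetical, and git branch -M master is only there so the demo uses the name master whatever the local default is.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "stable" > app.rb
git add .
git commit -q -m "Solid state code"
git branch -M master                      # make sure the main branch is called master

git checkout -q -b new-feature            # create the playground and switch to it
echo "feature" > feature.rb
git add .
git commit -q -m "Add new feature"

git checkout -q master                    # back to the solid state code
git merge -q new-feature                  # bring the finished work into master
git branch -d new-feature                 # safe delete: only works once merged
git branch                                # only master remains
```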

  • Committing or reverting to safety:

I finally get this bit… the trick was understanding which Git state the unwanted or changed files are in at the time I decide I really don’t want the changes in my code base. In particular, uncommitted changes don’t belong to a branch: when I switch branches, Git either carries them (and any untracked files) along with me or, if the switch would overwrite them, refuses to switch at all. Either way, I have to do something with them! In all cases, using git status and git diff will help with the decision making.

I see the following options to recover from unwanted changes:

1. Lose the changes:

Proceed with the commit, then simply switch back to another branch. Then (optionally) delete the offending branch (noting it will need to be a force delete, -D, given the branch isn’t merged). Simply use git checkout to go back to a different branch, or even back to an earlier commit on this branch, as explained by iLoggable.
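A sketch of option 1, assuming the unwanted work was done on its own branch (here called experiment, a made-up name, as is scaffold.rb).

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "stable" > app.rb
git add .
git commit -q -m "Solid state code"
git branch -M master                      # make sure the main branch is called master

git checkout -q -b experiment
echo "bad idea" > scaffold.rb
git add .
git commit -q -m "Experiment I regret"    # commit first, so nothing is left floating

git checkout -q master                    # project files go back to master's version
ls                                        # app.rb only: scaffold.rb is gone from the folder
git branch -d experiment || echo "-d refuses: the branch is not merged"
git branch -D experiment                  # force delete the unmerged branch
```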

2. Fix the changes, by replacing unwanted files

Use git diff to make sure I know which files I want to recover. Commit the unwanted changes, then checkout the specific files I want to recover, from an earlier save point, using the SHA-1 hash label, into my existing branch. Once committed, all should be good.
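A sketch of option 2: pulling one file back from an earlier commit by its SHA-1. The file model.rb and its contents are placeholders, and the hash is captured with git rev-parse rather than copied from git log.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "good" > model.rb
git add .
git commit -q -m "Working model"
good=$(git rev-parse HEAD)                # SHA-1 of the save point I trust

echo "broken" > model.rb
git commit -q -a -m "Unwanted change"

git checkout "$good" -- model.rb          # copy that file from the earlier commit
git status --short                        # M  model.rb (restored version is already staged)
git commit -q -m "Restore model.rb from earlier commit"
cat model.rb                              # good
```

The history keeps all three commits, so the unwanted version is still there if I ever want to look at it.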

3. Stash the changes:

The git stash option takes a snapshot of my uncommitted changes (staged or not) and puts it on a stack. After Git has stashed them, I’m working back at the state of my most recent commit in my current branch. Some programmers use stash at the end of every day, because it lets you get at uncommitted changes at a later time, if need be.

git stash save lets me “dump” everything as-is, but excludes untracked files, which stay put in the project folder. I can then return to the stash and decide what to do about it at my leisure (or not).

Alternatively, using git stash save --include-untracked also includes the untracked files in the stash, removing them from the residual project space, which leaves the project really clean.

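To find a stash, use git stash list if I have more than one stash (note that stash@{0} is always the most recently added stash).

To retrieve a stash, use git stash apply – I only need to name a stash (e.g. git stash apply stash@{1}) if I plan to retrieve an older one. The stash stays on the stack after apply; git stash pop applies it and drops it in one step.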

Even better, I can retrieve a stash into a new branch, e.g. git stash branch new-branch stash@{0}.
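A sketch of option 3 with made-up files: one modified tracked file and one untracked file get stashed together, the project goes clean, and then everything comes back. It uses git stash save as in my notes above; as far as I can tell, recent Git documentation prefers the spelling git stash push, so treat the exact subcommand as an assumption about the installed version.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "v1" > app.rb
git add .
git commit -q -m "Solid state code"

echo "v2 wip" > app.rb                    # uncommitted change to a tracked file
echo "scratch" > scratch.txt              # brand new, untracked file

git stash save -q --include-untracked "work in progress"
git status --short                        # prints nothing: the project is really clean
cat app.rb                                # v1 (back at the last commit)

git stash list                            # stash@{0} is the most recent stash
git stash apply -q "stash@{0}"            # bring the changes back; the stash itself is kept
git status --short                        # app.rb modified, scratch.txt untracked again
```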

So, in conclusion… not so many new lines of code this week (or there were, but not so many that I kept!).  But I’ve at least made a bit of progress with Git, so I should be better at recovering from the inevitable upcoming coding errors! I know there is much more to Git than this, but for now this should keep me out of trouble 🙂

On y va!

One thought on “Stop being a git – Start using Git!”

  1. Great to see blogs appearing more frequently, although this one was pretty dry Tricks.
    So my first issue was what is this thing called git? I’ve known a lot of gits in my time, some pretty stupid gits (some downright ignorant in fact), and some clever ones.
    However I flicked through the blog and rapidly came to understand Git in this case is a distributed version control and source code management system with an emphasis on speed. Furthermore it appears that every Git working directory is a full-fledged repository with complete history and full version tracking capabilities, not dependent on network access or a central server. Interesting but a logical approach when you think about it. I can see why you chose to use it……………
    Grandpa Lambert
