Source Code Management

Note: This blog post was originally posted to an internal SRS blog on February 09, 2010. The post was intended to address specific issues, but I do strongly support the idea of “commit early and often” as a general principle.

Source control is a fundamental part of software development. The benefits of using a source control management (SCM) system are numerous and worthy of their own blog post. But I have noticed two significant problems with the way SCM is currently being used on many of our projects:

  1. ChangeSets are frequently too large
  2. ChangeSets often contain code that shouldn’t be committed

ChangeSets Are Too Large

I am frequently guilty of working for days on a particular task without committing any changes to source control. I like to wait until my task is completed. I don’t want to break the build, and I don’t want to commit broken code that might impede others. But the biggest reason I avoid committing my work in progress is that I don’t want anyone to see it until I’m finished.

There are several problems with monolithic commits, including:

  • Integration headaches: large ChangeSets increase the odds that changes will conflict with someone else’s changes
  • Useless file history: comments on large ChangeSets are, of necessity, more vague and less likely to convey useful information

My preferred version control software at the moment is a distributed version control system (DVCS). DVCSs offer many benefits over traditional, centralized SCMs, but one of the best is easy branching and merging. A DVCS allows me to work like this:

Branch per Task

Every development task is a new, independent branch. Tasks are merged into the permanent main branch as they are completed.

(Figure: branch per task, from Coding Horror)

Each time I am ready to begin a new task, I create a branch for all work on that task. I generally have several active working branches that I can easily switch between. I check in code to my working branch often, rarely going more than a few hours between commits. Once I have finished a task, I merge the completed code back into the shared master branch. Because incomplete code stays on its task branch, I can commit frequently without breaking master.
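The branch-per-task workflow described above can be sketched with git standing in for the DVCS (an assumption: this post does not say which DVCS it means). The script works in a throwaway temporary directory, and the branch name `task-login` and the file contents are hypothetical; treat it as an illustration rather than a prescribed procedure.

```shell
#!/bin/sh
# Branch per task with git. All names are hypothetical; runs in a temp directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# The shared master branch starts with one commit.
echo "app v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Every task gets its own branch.
git checkout -q -b task-login

# Commit early and often: these small commits never touch master.
echo "login stub" > login.txt
git add login.txt
git commit -q -m "Add login stub"
echo "login implemented" > login.txt
git commit -q -am "Implement login"

# Task finished: merge it into master, then delete the task branch.
git checkout -q master
git merge -q --no-ff --no-edit task-login
git branch -d task-login
```

Merging with `--no-ff` is one choice among several; it keeps a merge commit on master that marks where each finished task landed.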

Unfortunately, TFS does not easily support this style of development. Creating branches is inconvenient, and merging code between branches is torture. Although I would love to recommend that SRS adopt the style of development I’ve described, I just don’t believe it is feasible with current versions of TFS. Given how painful branching and merging are in TFS, I don’t see a better alternative to our current branch-per-release strategy.

Given that TFS doesn’t provide easy branching and merging, here’s what we can do to find a happy medium. We should not check in broken code, but we shouldn’t hesitate to check in code that is incomplete. Especially for new functionality, there shouldn’t be any problem checking in a stub method that doesn’t do anything. There are very few, if any, situations where we would be unable to commit our working code to TFS once a day.

There will no doubt be times when checking in small, granular ChangeSets will not be practical. There will be times when some of our tasks require us to break the application (not the build, though!) in order to complete a task. However, it is my belief that with a little planning these times should be brief and infrequent.

It would be foolhardy to ignore the problems that accompany frequent check-ins. As the number of developers working in the same code base increases, so too does the probability that someone will check in something that will disrupt everyone else’s work. This leads directly into my second topic: We must be aware of the code that we are checking in.

ChangeSets Contain Code They Shouldn’t

When it is time to commit code to TFS, it is not uncommon for developers to simply check every file listed in Visual Studio’s “Pending Changes” window and commit all outstanding changes. Although Visual Studio makes it incredibly easy to follow this bad practice (why are all modified files checked by default?!?), we need to stop doing it. Sometimes debug code is committed and leads to problems that are only discovered after our customers have the release. Sometimes builds are broken because .csproj and .sln files were inadvertently modified. Sometimes it simply clutters the file’s history (TFS always increments the version and updates the file’s history, regardless of whether anything in the file has changed). These things should not happen. When checking code into SCM, it is the developer’s responsibility to verify every change being made: diff all changed files and review every change before it is committed.

If anyone has found and enabled the option in Visual Studio to “Check in everything when closing a solution or project,” please disable it immediately. No good can come from that option!

(Screenshot: the “Check in everything when closing a solution or project” option in Visual Studio)

In some cases you may imagine that these procedures don’t apply to you because you are the only developer on your team. That is a false assumption. Code should always be written for the long-term. Code should always be written and commented in such a way that another developer can pick up your tasks at any time. Julian Bucknall, the CTO of Developer Express, recently posted a thought to his blog that precisely expresses the point I am trying to make: Assume your code will be public. I am as guilty as anyone — probably more so, actually — of some of the bad practices that Mr. Bucknall describes. Edge Legacy is full of funny names and informal comments that I wrote to amuse myself. As we consider publishing more of our APIs for external consumption, it is increasingly important that the code we write properly represents the professional nature of SRS and increases the trust that our customers have in us.

Installing RMagick on Ubuntu 9.04 (Jaunty)

Installing the RMagick gem can be a huge headache. Reading the HOWTO on the RMagick site is enough to make anyone nervous. Thankfully, the process is much easier on Ubuntu: you only need three commands.

DISCLAIMER: I’ve only tested this on Ubuntu 9.04 (Jaunty) server.

$ sudo aptitude install -y imagemagick
$ sudo aptitude install -y libmagick9-dev
$ sudo gem install rmagick

And you’re done! You can verify the installation using this irb command, taken from the RMagick HOWTO:

$ sudo irb -rubygems -r RMagick
irb(main):001:0> puts Magick::Long_version
This is RMagick 2.10.0 ($Date: 2009/06/19 22:07:05 $) Copyright (C) 2009 by Timothy P. Hunter
Built with ImageMagick 6.4.5 2009-06-04 Q16 OpenMP
Built for ruby 1.8.7
Web page:
=> nil

Using P4Merge with Team Foundation Server

I’ve found that the best way to deal with merging is to avoid it completely! Unfortunately that is rarely realistic. So, assuming you don’t want to take any radical measures to completely avoid merging in TFS, you should at least use the best tools available. My favorite merge tool is the freely available (and cross-platform) P4Merge.

Getting TFS to use P4Merge isn’t difficult, but neither is it intuitive. For a merge operation, P4Merge expects four files to exist:

  1. the original, base file
  2. file with conflicting change #1
  3. file with conflicting change #2
  4. final, merged file

Unfortunately, TFS doesn’t create the merged file (#4) until after the merge tool is invoked. A simple batch script will solve the problem, though. Save this as p4merge.bat.

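REM Create the empty merged file (the fourth argument), then start P4Merge and wait for it to exit.
REM %~N strips any quotes TFS put around argument N, so each path is quoted exactly once.
COPY /Y NUL "%~4"
START /WAIT /D "C:\Program Files\Perforce\" p4merge.exe "%~1" "%~2" "%~3" "%~4"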

This script will create the merge file and invoke p4merge.exe.

Now you can configure TFS to use P4Merge by running this command from a Visual Studio command prompt: tf diff /configure

(Screenshot: Visual Studio Command Prompt)

That will bring up a dialog:

(Screenshot: the Configure User Tools dialog)

If no entry exists yet for the Merge operation, add one. Otherwise, just modify the existing entry to point to the batch file we created:

(Screenshot: the Configure Tool dialog)

Note that you must set the command to be your batch file, not the executable.

And that’s it! Next time TFS launches a merge tool, it will use P4Merge.

Using git to avoid problems with TFS

For the past few months I’ve been using Team Foundation Server (TFS) at work. I’m certainly not a TFS expert; I probably don’t even qualify as a power user. But I’ve used TFS enough to have found a handful of things that I like about it. Revision control is not among them.

As a software version control system, I dislike TFS intensely.

In the short time I’ve been using TFS I’ve had several problems with code that was merged incorrectly. I’ve seen TFS silently allow older versions of code to overwrite newer versions. I could probably fill an entire blog post airing grievances with TFS, but I thought it would be more interesting to describe how I use git on top of TFS to solve some of these problems.

First, to use git to track a TFS repository, it is really important that all your source code be on a FAT32 partition. TFS locks files and NTFS respects that lock. FAT32 will track the lock but doesn’t enforce it. This allows git to modify files (switching them to different versions) without necessarily having those files checked out in TFS.

Using TFS I checked out all my code into s:\src. I then created a new git repository in that same directory and added everything into the git repository.

For working I maintain at least two branches. My master branch always matches TFS: when I need the latest code from TFS, I switch to the git master branch, pull the latest code from TFS, then commit all changes into git. My working branch contains my current code changes. I also have one branch, dev, that contains a single commit consisting of all my debug code that should never be checked in to TFS.
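The setup described above might look like this in git, assuming a POSIX shell. It is a sketch, not the author's exact commands: a temporary directory stands in for s:\src, TFS is not involved (the "get latest" result is simulated by writing a file), and the file names `Program.cs` and `debug.config` are hypothetical.

```shell
#!/bin/sh
set -e
src=$(mktemp -d)    # hypothetical stand-in for s:\src, the TFS working folder
cd "$src"
echo "// code as checked out from TFS" > Program.cs

# Put the whole TFS checkout under git; master always mirrors TFS.
git init -q -b master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git add -A
git commit -q -m "Latest from TFS"

# dev holds exactly one commit: debug code that must never reach TFS.
git checkout -q -b dev
echo "verbose logging on" > debug.config
git add debug.config
git commit -q -m "Debug-only settings (never check in to TFS)"
git checkout -q master
```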

When I’m ready to start coding, I get the latest code from TFS and commit those changes into git’s master branch. I create a new git branch, working, and cherry-pick my development code from dev into it. Then I do all my coding on that branch. When I need to get code from TFS, I switch to master, update from TFS, commit that code to git, then either merge or rebase the changes back into working.
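A sketch of one pass through this routine, assuming a POSIX shell and git. TFS is not involved: a temporary directory stands in for s:\src, each "update from TFS" is simulated by editing a file and committing it to master, and the file names are hypothetical.

```shell
#!/bin/sh
set -e
src=$(mktemp -d)    # hypothetical stand-in for s:\src
cd "$src"
git init -q -b master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > Program.cs
git add -A
git commit -q -m "Latest from TFS"

# dev: a single debug-only commit.
git checkout -q -b dev
echo "debug" > debug.config
git add debug.config
git commit -q -m "Debug-only settings"

# Start coding: working branches off master and picks up the debug commit.
git checkout -q -b working master
git cherry-pick dev >/dev/null
echo "my feature" > Feature.cs
git add Feature.cs
git commit -q -m "Add feature"

# Someone else checked in to TFS: refresh master, then update working.
git checkout -q master
echo "v2" > Program.cs          # simulates "get latest" from TFS
git commit -q -am "Latest from TFS"
git checkout -q working
git rebase -q master            # "git merge master" would also work
```

Rebasing keeps the debug commit as the first commit on working after master, which makes it easy to find again when the work is ready to go back to TFS.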

Once all of my changes in working are complete, I need to merge them back into master so that I can commit them to TFS. I can’t do a straight merge because my cherry-picked dev code would be included. So I have two ways of doing this:

  1. cherry-pick changes from working, applying them to master
  2. back out the development code (using git rebase -i), then merge the changes back into master
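Option 1 can be sketched like this, assuming (as in the routine described earlier) that the cherry-picked debug commit is the first commit on working after master. The repository is a throwaway stand-in for s:\src, the file names are hypothetical, and the actual TFS check-in is left out.

```shell
#!/bin/sh
set -e
src=$(mktemp -d)    # hypothetical stand-in for s:\src
cd "$src"
git init -q -b master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > Program.cs
git add -A
git commit -q -m "Latest from TFS"
git checkout -q -b dev
echo "debug" > debug.config
git add debug.config
git commit -q -m "Debug-only settings"

# working: the debug commit first, then real work.
git checkout -q -b working master
git cherry-pick dev >/dev/null
echo "feature" > Feature.cs
git add Feature.cs
git commit -q -m "Add feature"

# Option 1: cherry-pick every commit after the debug commit onto master.
debug=$(git rev-list --reverse master..working | head -n 1)
git checkout -q master
git cherry-pick "$debug"..working >/dev/null
```

The range `"$debug"..working` excludes the debug commit itself and applies the remaining commits oldest first.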

After going through one of these two options I end up back on master with all of my code changes. I then commit the changes to TFS. Once that is done I delete working and recreate it from master next time I need it.
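Option 2 followed by this cleanup might look like the sketch below. It assumes a POSIX shell and git, a throwaway repository in place of s:\src, hypothetical file names, and a debug commit that sits first on working after master. `git rebase --onto` is used as a non-interactive equivalent of running `git rebase -i master` and deleting the debug commit's line.

```shell
#!/bin/sh
set -e
src=$(mktemp -d)    # hypothetical stand-in for s:\src
cd "$src"
git init -q -b master
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > Program.cs
git add -A
git commit -q -m "Latest from TFS"
git checkout -q -b dev
echo "debug" > debug.config
git add debug.config
git commit -q -m "Debug-only settings"
git checkout -q -b working master
git cherry-pick dev >/dev/null
echo "feature" > Feature.cs
git add Feature.cs
git commit -q -m "Add feature"

# Option 2: replay only the commits after the debug commit onto master,
# which drops the debug commit from working.
debug=$(git rev-list --reverse master..working | head -n 1)
git rebase -q --onto master "$debug" working
git checkout -q master
git merge -q --ff-only working

# (Check the changes in to TFS here.) Then throw working away.
git branch -d working
```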

Working like this has been great for me. If there are conflicts when merging my code changes, git takes care of it. This way I can almost always avoid having to let TFS merge anything.

This is my general way of working but you can easily see how to apply these same principles when you want to work on multiple different changes using multiple different branches in git.

One thing to note: when you’re working like this, git’s history isn’t great. This isn’t like git-svn, where you get a separate git revision for every svn revision. For me, using git with TFS isn’t about being able to track my changes over time. I just want to make sure that my changes aren’t lost and that I don’t clobber anyone else’s changes.

Thoughts on Doing Contract Work as a Software Developer

As I was recently looking for new employment, I spent quite a bit of time deciding whether I might enjoy doing contract work full-time. I enjoy working on different projects and learning new things, but there is one major roadblock to becoming a full-time contract developer: my personality doesn’t let me write software that is anything less than my best.

WARNING: Gross generalizations and simplifications below. I’m not trying to offend anyone, just describe my experiences.

Most of the contract work I’ve done has been for people who are not technically savvy. They come to me with a very vague idea of the software they want. They expect me to tell them how much it will cost before we’ve discussed specific requirements. When the requirements are incomplete or incorrect they expect that I’ll just fix it without additional cost to them.

Not all of my contract experience has been negative. In fact, most of it has worked out quite well. Usually both my client and I are pleased with the software and the cost of building it. But I’ve had enough negative experiences to be careful when considering a new job.

Part of the problem is that it is nearly impossible for anyone to completely define the scope of a project. There is always some miscommunication or misunderstanding; there is always some unforeseen problem.

“You want me to set up a blog for your company? No problem, I can get WordPress set up for you in an hour.”

“Wait, I didn’t realize that by ‘blog’ you meant store front application that can accept payments, handle accounts payable, accounts receivable and inventory tracking. That will take slightly more than an hour.”

That kind of situation actually isn’t bothersome to me. As a contractor it is part of my job to understand what you want before making a bid. If a potential client obviously doesn’t know what they want, I can either decline to bid on that job or I can adjust my bid to account for a large amount of unknown. I don’t love it, but that type of risk is manageable.

The part of contract work that I dislike is being forced to compromise quality. When I’m working on a fixed cost contract, it is in my best interest to deliver exactly what is specified, as quickly as possible. As long as my client is reasonably happy with the deliverable, I am going to get paid $20k regardless of whether it took two days, two weeks or two months to create. I don’t get additional money for clean code. I don’t get extra for having good test coverage.

When I complete a project more quickly than I had anticipated there is no problem. I can spend time verifying that the code is tight and that everything is working as expected. But if I am running behind schedule, it becomes more difficult to care about testing the code or fixing “little” bugs.

There may be a bug in the code where order totals aren’t calculated correctly, but what are the odds that my client will notice the bug before he signs off on the project? If he does find the problem and I correct it, will he think to test for that same bug in every release?

This is the dilemma that makes contract work difficult for me. If I see a bug in my code, I’m going to fix it. If I’m writing a tricky or important calculation (like calculating totals), I’m going to write a test. I need to have confidence that my code is doing what I expect. I’ve never shipped any software that didn’t have a list of known bugs but I have also never shipped any software in which I didn’t have a high level of confidence that it was working correctly.

For me, doing the bare minimum isn’t an option for two reasons:

  1. Quality is extremely important to me. I can’t just hack something together that meets the contract requirements. When I write software, I want to deliver my personal best.
  2. Most of the time, the fast/crappy way of implementing something simply doesn’t occur to me.

I understand a company’s need to know the cost before approving custom software. But if you want me to do contract work, pay me on an hourly basis. I’ll give you a projected timeline for project completion.

With an hourly rate, you only pay me for the time I actually spend working. With an hourly rate I know that I won’t lose money just because I insist on high-quality code. We’ll both be happier in the long run.