Note: This blog post was originally posted to an internal SRS blog on February 09, 2010. The post was intended to address specific issues, but I do strongly support the idea of “commit early and often” as a general principle.
Source Code Management
Source control is a fundamental part of software development. The benefits of using a source control management (SCM) system are numerous and worthy of their own blog post. But, I have noticed two significant problems with the way that SCM is currently being used on many of our projects:
- ChangeSets are frequently too large
- ChangeSets often contain code that shouldn’t be committed
ChangeSets Are Too Large
I am frequently guilty of working for days on a particular task without committing any changes to source control. I like to wait until my task is completed. I don’t want to break the build, and I don’t want to commit broken code that might impede others. But, the biggest reason that I avoid committing my working code is that I don’t want anyone to see it until I’m finished.
There are several problems with monolithic commits, including:
- Integration headaches: large ChangeSets increase the odds that changes will conflict with someone else’s changes
- Useless file history: comments on large ChangeSets are, of necessity, more vague and less likely to convey useful information
My preferred version control software at the moment is a DVCS. DVCSs offer many benefits over traditional SCMs, but one of the best is easy branching and merging. A DVCS allows me to work like this:
Branch per Task
Every development task is a new, independent branch. Tasks are merged into the permanent main branch as they are completed.
(from Coding Horror)
Each time I am ready to begin a new task I create a branch for all work on that task. I generally have several active working branches that I can easily switch between. I check in code often to my working branch, rarely going more than a few hours between commits. Once I have finished working on a particular task, I merge my completed code back into the shared master branch. I am able to make frequent commits without breaking the master branch due to incomplete code.
Unfortunately, TFS does not easily support this style of development. Creating branches is inconvenient, and merging code between branches is torture. Although I would love to recommend that SRS adopt the style of development that I’ve described, I just don’t believe that it is feasible with current versions of TFS. Given the painful nature of branching and merging in TFS, I don’t see a better alternative to our current branch per-release strategy.
Given that TFS doesn’t provide easy branching and merging, here’s what we can do to find a happy medium. We should not check in broken code, but we shouldn’t hesitate to check in code that is incomplete. Especially for new functionality, there shouldn’t be any problem checking in a stub method that doesn’t do anything. There are very few, if any, situations where we would be unable to commit our working code to TFS once a day.
There will no doubt be times where checking in small, granular ChangeSets will not be practical. There will be times when some of our tasks require us to break the application (not the build though!) in order to complete a task. However, it is my belief that with a little planning these times should be brief and infrequent.
It would be foolhardy to ignore the problems that accompany frequent check-ins. As the number of developers working in the same code base increases, so too does the probability that someone will check in something that will disrupt everyone else’s work. This leads directly into my second topic: We must be aware of the code that we are checking in.
ChangeSets Contain Code They Shouldn’t
When it is time to commit code to TFS, it is not uncommon for developers to simply check every file listed in VisualStudio’s “Pending Changes” window and commit all outstanding changes. Although VisualStudio makes it incredibly easy to follow this bad practice (Why are all modified files checked by default?!?), we need to stop doing it. Sometimes debug code is committed and leads to problems that are only discovered after our customers have the release. Sometimes builds are broken as csproj and sln files are inadvertently modified. Sometimes it simply messes up the file’s history (TFS always increments the version and updates the file’s history, regardless of whether anything in the file has changed). These things should not happen. When checking code into SCM it is the developer’s responsibility to verify every change that is being made. The developer should diff all changed files and verify every change that will be committed.
If anyone has found and enabled the option in VisualStudio to “Check in everything when closing a solution or project,” please disable it immediately. No good can come from that option!
In some cases you may imagine that these procedures don’t apply to you because you are the only developer on your team. That is a false assumption. Code should always be written for the long-term. Code should always be written and commented in such a way that another developer can pick up your tasks at any time. Julian Bucknall, the CTO of Developer Express, recently posted a thought to his blog that precisely expresses the point I am trying to make: Assume your code will be public. I am as guilty as anyone — probably more so, actually — of some of the bad practices that Mr. Bucknall describes. Edge Legacy is full of funny names and informal comments that I wrote to amuse myself. As we consider publishing more of our APIs for external consumption, it is increasingly important that the code we write properly represents the professional nature of SRS and increases the trust that our customers have in us.