Tuesday, April 29, 2008

Not Using Source Code Control is Insane!

"Like skydiving without checking your parachute or having sex with a stranger without using a condom." -- me, earlier today
I've worked for a few places that didn't use source code control properly, or even at all, before I started. But today I met a new PHP programmer who didn't know what I was talking about when I mentioned source code control. He recently migrated over from writing and graphic design and has no background in computer science or software engineering, so it isn't really his fault. The fault lies with those who brought him into the fold.

As far as I can tell, I'm somewhat strange - I'm passionate about source code control (SCC from here on) and have tried a few new tools just to see if I like them. Process is important. I don't believe in a crazy ISO-9000 level of process (at that point most of your work goes into maintaining the process rather than producing value), but you need some process or there is only chaos. Go check out "The Joel Test", I'll wait.

I don't necessarily believe that all of the items on that list are necessary or sufficient. But what is #1? That's right, SCC. SCC is so important that many word processors (like Word) build version tracking into the software.

Why do people insist on using nothing, or tarballs/zip files, to ensure the safety of data that is critical to their business or to them personally? One place I worked used zip files but also required you to comment out changed code with a reason, plus add comments for the new code. This resulted in files with pages of commented-out code and only occasional live code (it was unreadable). SCC systems keep track of old code and the reasons for changes for you.

What is SCC?

In essence, an SCC system tracks changes made to files and directories and allows users to (among other things):
  • Checkout (aka get) the current version or some version from the past of those files/directories.
  • Check-in (aka save) your current changes to the SCC to be saved for posterity.
  • Revert changes you don't like.
  • Find out who made changes and why.
  • Diff (compare) exactly what has changed, either between past versions or between your working files and the last check-in.
  • Manage multiple people working on the same files and directories.
  • Mark (tag) and later retrieve the files/directories related to some important event like a release.
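To make that list concrete, here is a toy sketch in Python. It is not how any real SCC tool is implemented: TinyRepo, Revision, and the method names are made up for illustration, and it only tracks a single file in memory. It shows check-in, checkout, log, diff, tagging, and reverting.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    author: str
    message: str
    content: str


@dataclass
class TinyRepo:
    """Hypothetical single-file history kept in memory (illustration only)."""
    revisions: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)

    def check_in(self, content, author, message):
        """Save a new revision and return its number (numbers start at 1)."""
        rev = Revision(len(self.revisions) + 1, author, message, content)
        self.revisions.append(rev)
        return rev.number

    def checkout(self, version=None):
        """Return the latest content, or that of a revision number or tag name."""
        if version is None:
            version = len(self.revisions)
        elif isinstance(version, str):
            version = self.tags[version]
        return self.revisions[version - 1].content

    def log(self):
        """Who made each change, and why."""
        return [(r.number, r.author, r.message) for r in self.revisions]

    def diff(self, old, new):
        """Unified diff between two revisions, as a list of lines."""
        a = self.checkout(old).splitlines()
        b = self.checkout(new).splitlines()
        return list(difflib.unified_diff(a, b, f"r{old}", f"r{new}", lineterm=""))

    def tag(self, name, number=None):
        """Attach a name (such as a release) to a revision; default is the latest."""
        self.tags[name] = number if number is not None else len(self.revisions)

    def revert(self, number, author):
        """Undo later changes by checking in the content of an older revision."""
        return self.check_in(self.checkout(number), author, f"Revert to r{number}")


repo = TinyRepo()
repo.check_in("print('hello')\n", "alice", "Initial version")
repo.check_in("print('hello, world')\n", "bob", "Friendlier greeting")
repo.tag("release-1.0")
print("\n".join(repo.diff(1, 2)))
print(repo.log())
```

Note that the revert does not erase history: it adds a new revision on top, so the mistake and the fix both stay on record.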
There are two main design choices that distinguish SCC tools:
  • Locking vs Merging - Some SCC tools require you to mark a file for editing, which locks it so that no other user can commit changes until you release the lock. Other SCC tools don't require locking; instead, when two people change the same file, the changes must be merged (the tool does it for you in simple cases, and you resolve the conflicts by hand otherwise). I have found the locking model interrupts my workflow: I may not know all the files I'll change ahead of time, stopping to lock a file gets in the way, and someone might lock a file I need and then need a file I have locked (the classic deadlock problem). Merging might sometimes be a pain, but it is better than the alternative.
  • Client/Server vs Peer to Peer - Most SCC tools have a central server that holds the "one true version", and all clients must submit to that server. A few systems use a peer-to-peer structure where the "one true version" is decided by social convention, not technology.
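As a rough illustration of the merging model, here is a sketch of a three-way merge in Python. It is simplified on purpose: it works on whole files rather than on lines within a file, which is coarser than what real tools do, and the function name and file names are invented for the example. The idea is that the tool compares both sets of changes against the common starting point and only asks a human when both sides changed the same thing differently.

```python
def three_way_merge(base, mine, theirs):
    """Merge two sets of changes that started from the same snapshot.

    Each argument maps a file name to its content. Returns (merged, conflicts),
    where conflicts lists the files a human has to sort out.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(name), mine.get(name), theirs.get(name)
        if m == t:        # both sides agree (or neither touched the file)
            result = m
        elif m == b:      # only they changed it, so take theirs
            result = t
        elif t == b:      # only I changed it, so take mine
            result = m
        else:             # both changed it differently
            conflicts.append(name)
            continue
        if result is not None:  # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"a.php": "1", "b.php": "2", "c.php": "3"}
mine = {"a.php": "1 mine", "b.php": "2", "c.php": "3 mine"}
theirs = {"a.php": "1", "b.php": "2 theirs", "c.php": "3 theirs"}

merged, conflicts = three_way_merge(base, mine, theirs)
print(merged)     # a.php comes from me, b.php comes from them
print(conflicts)  # c.php was changed by both of us
```

Nobody had to lock anything ahead of time here; the only cost is sorting out c.php after the fact.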

SCC and the Individual Developer

Even if you work by yourself, whether for work or for fun, you should use SCC. The few times I haven't, I often regretted it. First, it helps develop good habits. Second, it makes it much easier to undo mistakes when they happen.

Imagine you are making some changes to a game you wrote. The changes, while simple, are invasive: they touch numerous files over the course of a day, and you are interrupted numerous times. Sadly, you make an error that corrupts saved data, but the error could have been introduced anywhere and you can't remember exactly what you changed. SCC lets you quickly find out which files changed, what the changes were, and, if necessary, revert to good code and start over. Without SCC, you might be debugging for a long time.
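Here is a small Python sketch of what "find out which files changed and what the changes were" amounts to. It assumes you have two snapshots, each a dictionary from file name to content; the function names and the game's file names are invented for the example, and a real SCC tool does this for you against its stored history.

```python
import difflib


def changed_files(last_good, working):
    """Names of files added, removed, or modified since the last good snapshot."""
    names = set(last_good) | set(working)
    return sorted(n for n in names if last_good.get(n) != working.get(n))


def show_changes(last_good, working):
    """Unified diff of every changed file, as a list of lines."""
    lines = []
    for name in changed_files(last_good, working):
        old = last_good.get(name, "").splitlines()
        new = working.get(name, "").splitlines()
        lines.extend(
            difflib.unified_diff(old, new, f"good/{name}", f"work/{name}", lineterm="")
        )
    return lines


last_good = {"save.py": "write(data)\n", "ui.py": "draw()\n"}
working = {"save.py": "write(data[:-1])\n", "ui.py": "draw()\n"}

print(changed_files(last_good, working))
print("\n".join(show_changes(last_good, working)))
```

Instead of rereading a day's worth of edits across the whole game, you are looking at one changed line in one file.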

SCC Tool Recommendations

These recommendations are based on tools I've used for long periods of time on production code, with the exception of Git (many tools are omitted):
  • Perforce - This is a commercial product I used long ago. It can use locking or merging. I don't use it now because it costs money, I'm cheap, and there are great open source alternatives.
  • Visual SourceSafe (VSS) - This pile of steaming... rubbish is given away by Microsoft. It uses a locking model by default. The problems with VSS are that it often corrupts large codebases and that its Visual Studio integration (up to 2005 I believe, or beyond) puts little snippets in your project files to handle integration. If you are ever kidnapped to a tropical island by a criminal mastermind with a claw and a cat and are told you must use VSS or be given to a bunch of cannibals, choose the cannibals because it will be less painful. I have heard from some insiders that Microsoft doesn't use VSS. If the producer of a piece of software doesn't use their own product (and they could), then I don't feel it is wise to use it either.
  • Vault - A commercial clone of VSS (plus bonuses). While this tool never corrupted my code like VSS did on a regular basis, I don't much like how Vault works, so I don't like using it.
  • CVS - An open source merging SCC tool used for years by many open source projects.
  • Subversion - A newish SCC tool that is much like CVS but offers atomic commits (everything goes in or nothing does, so you don't have partial commits), versioned directories, better branching/merging, and other improvements. If you want to use a client/server SCC, I'd recommend this (despite what Linus says).
  • Git - This is the only distributed/peer-to-peer tool on my list. I've only been using it for a few weeks, but thus far I like it. Git was quick and easy to set up and does everything I expect from an SCC tool, with the added bonus of being really fast (partially because there is no server to talk to). One of the selling points is better branching/merging, but I have yet to do much of that.
Closing Comments

I'll end with a few recommended best practices:
  • Use source code control.
  • Check-in as often as possible without breaking the build.
  • Keep your commits small.
  • Add a useful (but short) message to each commit on why you made the commit (not what you changed since the SCC handles that). This should include a bug number if available.
  • Tag all your releases in SCC.
This brings my SCC rant to a close. I feel much better now.
