Saturday, December 30, 2006

the case for distributed revision control

in the world of revision control, the debate between centralized (cvs, svn, etc.) and distributed (git, mercurial, bzr, etc.) systems has been going on for a while. from the distributed camp i've heard arguments about the blessings of being able to branch wildly, about patch management bliss and about the merits of their file storage formats.

recently i started using git for a few things and i have to say it is fast, compact and rather easy to use, a combination the other distributed revision control systems i've tried haven't offered. i've come to the personal conclusion that git is a very nice tool that may even be able to meet our demanding and wacky requirements in the kde project in a few years' time. maybe sooner. and this got me thinking: why would we use such tools?

the answer came to me as i travelled to various locations on the planet this year.

the internet is an absolutely core player in free software development. it is how we collaborate at every stage: irc, email and source sharing. email works just fine over a slow and intermittent connection, but something like subversion is hellacious to use in such situations. since it requires you to be online to update and commit, you effectively need to be online whenever you are doing development. it also generates a non-trivial amount of network traffic.

so where does my travelling fit into all this? i've been in a number of very populous countries where fast, always-on internet connections simply aren't the norm, particularly not in private homes. that means tools like subversion are a blocker to participation.

if we really want to reach out to people who have the knowledge, skills and drive to get involved, but who have slow (by broadband standards) dial-up access that they pay for by the minute, we will need to move towards systems that allow people to work offline as much as possible.
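to put rough numbers on what per-minute billing means, here is a small python sketch. the line speed (a nominal 56 kbit/s modem), the 80% efficiency factor for protocol overhead and the payload sizes are all assumptions for illustration, not measurements, and the hypothetical `billed_minutes` function assumes billing rounds up to whole minutes.

```python
import math


def billed_minutes(payload_bytes: int, line_kbps: float = 56.0, efficiency: float = 0.8) -> int:
    """Whole minutes billed to transfer payload_bytes over a dial-up line.

    Assumptions (illustrative only): a nominal line speed in kbit/s, an
    efficiency factor for protocol overhead, and per-minute billing that
    rounds up to the next whole minute.
    """
    effective_bits_per_second = line_kbps * 1000 * efficiency
    seconds = payload_bytes * 8 / effective_bits_per_second
    return math.ceil(seconds / 60)


# Hypothetical payload sizes, chosen only to show the scale of the difference.
print(billed_minutes(20 * 1024))         # a 20 KB emailed patch -> 1
print(billed_minutes(10 * 1024 * 1024))  # 10 MB of repository traffic -> 32
```

under these assumptions a small patch fits inside a single billed minute, while bulk traffic runs to half an hour. none of this counts the time a tool that needs the network for every commit keeps you connected while you actually work, which is the bigger cost.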

for me, this is where tools such as git shine. you can work in your local repository without a network connection just fine, including committing, branching and the whole bit. you only need to go online to send patches and pull updates. patches by email are manageable even on dial-up, and git's speed means that pulling updates can be economical.
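the offline workflow can be sketched as a toy model in python. this is a minimal sketch, assuming a simple in-memory design that is nothing like git's real object store: the hypothetical `LocalRepo` class records commits with no network access at all, and its hypothetical `export_patches` method turns the commits not yet sent upstream into unified-diff text of the sort you would mail to a list. in git itself the corresponding steps are `git commit` and `git format-patch`.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    snapshot: dict  # filename -> file contents at the time of the commit


@dataclass
class LocalRepo:
    """Toy model of a local repository (hypothetical, not git's real design).

    Nothing in this class touches the network: committing and exporting
    patches are purely local operations.
    """
    files: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def commit(self, message: str) -> None:
        # Record a snapshot of the working files; no server is contacted.
        self.history.append(Commit(message, dict(self.files)))

    def export_patches(self, already_sent: int) -> list:
        """Return one unified-diff patch per commit not yet sent upstream."""
        patches = []
        for i in range(already_sent, len(self.history)):
            before = self.history[i - 1].snapshot if i > 0 else {}
            after = self.history[i].snapshot
            parts = [f"Subject: {self.history[i].message}\n"]
            for name in sorted(set(before) | set(after)):
                old = before.get(name, "").splitlines(keepends=True)
                new = after.get(name, "").splitlines(keepends=True)
                parts.extend(difflib.unified_diff(old, new, f"a/{name}", f"b/{name}"))
            patches.append("".join(parts))
        return patches


# Everything below runs without a network connection.
repo = LocalRepo()
repo.files["README"] = "hello\n"
repo.commit("initial import")
repo.files["README"] = "hello, world\n"
repo.commit("greet the world")

# Suppose the first commit is already upstream; only the second needs mailing.
for patch in repo.export_patches(already_sent=1):
    print(patch)
```

the only step that needs a connection is sending the resulting text, which is small enough to go out with the rest of your email.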

furthermore, git's on-disk storage format makes it friendly for use on systems with smaller disks, and its performance makes it more than usable on slower systems. i haven't looked at its memory consumption, so perhaps that's its achilles heel.

so, long story short: reaching out to the just-getting-wired world means we need to look beyond centralized online systems. until there's broadband everywhere, and that's not going to happen in the medium term, we may want to think about our tools in that light.

distributed bug systems are probably equally attractive for these same reasons.

10 comments:

Quintesse said...

I absolutely agree. I have been using Mercurial for some months now and I love it. You can just copy repositories around, push and pull changes to/from them and even zip them up and send them by email if necessary (the recipient being able to just unzip and merge the changes with his own repository).

Personally I only need a good plug-in for Eclipse for my Java work and I'd be a happy camper.

Tassilo Horn said...

I have to agree. For some months now I've been working on two projects that use darcs. It's really nifty and offers many possibilities you don't have with centralized systems.

One really big win is that it attracts new developers, too: check out the repository, work on it, let darcs create a patch and send it to the mailing list.

Mikael Eriksson said...

With git 1.4.4 you can use svn as a backend using the git-svn command.

That way you can get offline access to your svn repos too.

tim said...

Like everyone else who has commented, I also agree. While distributed version control seems interesting enough to me, all I really want is Subversion but with the ability to do offline versioning that is then committed when I get access to a network again.

Theo said...

Another "semi-decentralised" option compatible with SVN is SVK. I say semi-decentralised because, whilst you have a personal repository where you can branch and commit, your changes have to go back to the central svn repository before anyone else can see them; there is no facility for sharing through an ad-hoc network without internet access. The command set is almost identical to svn's, so switching from svn to svk is very easy. It also has the advantage of considerably speeding up synchronisation with the main repository.

phaylon said...

For those who'd like a distributed subversion, take a look at SVK.

Aaron J. Seigo said...

i've not heard good things about svk ... and as git can now front for svn or cvs (or is it the other way around?) that seems more sane...

Robert Cowham said...

For an alternative viewpoint:

http://www.dwheeler.com/essays/scm.html

<<<<<<<<
A posting by Bastiaan Veelo at Linux Weekly News has a nice summary:

"The most important thing to be aware of though is that Arch and Subversion differ in fundamental ways. Arch works in a decentralized way, while Subversion is designed on a client/server model. Indeed with Arch you can start coding and using version control without first applying for access to the server. However, [merging] your code with the main branch has to be done by the one project maintainer....

Development with Subversion (and CVS for that matter) is centralized in the sense that there is just one repository, but it is actually more decentralized in a social sense since there are as many code integrators as there are developers with write access to the repository.

In short, one could say that Arch is centralized around a code integrator, and that Subversion (like CVS) is centralized around a repository. You decide what fits best. If you are a heavy user of CVS... chances are that Subversion actually fits your needs best.
>>>>>>>

I think it's horses for courses - in some models distributed works very well (particularly open source), but in other areas it is not the best model.

Aaron J. Seigo said...

in the context of this blog entry, it is not distributed versus centralized, it is online versus off-line.

that the off-line methods come with the possibility of decentralized development is, in this case, a coincidence.

there's no need, however, to follow a decentralized or a centralized development model just because one uses, for instance, git.

the whole point of this blog entry is to point out that there are other issues in this discussion than the usual tired old saws.

Ismail Onur Filiz said...

> in the context of this blog entry, it is not distributed versus centralized, it is online versus off-line.

That is actually where SVK's power lies. I have been using it for some time for KDE development as well, and I am quite happy. They have just released a new version as well;)