Tuesday, October 13, 2009

ah, stats

Cornelius posted a neat summary of the source code heft in KDE's main modules. It was also picked up by some of the general F/OSS community news sites.

Michael Meeks posted a blog entry showing other projects' line counts next to KDE's while asking "the real question is not 'is KDE valuable' - of course it is; but how does it compare[?]" In a nicely concise table he shows that Ohloh pegs KDE at 5.5 million lines of code, with GNOME at 15.7 million, Linux at 7.9 million and OpenOffice at 8.5 million. That makes KDE's code base look smaller in comparison to these other projects, doesn't it?

Unfortunately, statistics are often hard to interpret when aggregated like that. Also unfortunately, Michael's blog doesn't accept comments (2000 called and it wants its blogging software back? ;) so I figured I'd expand a bit on this here.

Cornelius's numbers, as well as the ones on Ohloh, only cover the modules we release as part of the periodic KDE Software Distribution. These are the modules that become known as "KDE x.y". What those numbers don't include is the rest of the code we work on under the KDE community umbrella. That would include all the apps in our Extra Gear repository such as Amarok (175,760 lines), Digikam (478,185 lines), K3B (101,320 lines) and Kaffeine (269,149 lines), to name the four largest. My checkout of Extra Gear contains 1,175,644 lines of code according to sloccount. Also missing from the numbers is the code in kdesupport for things like QCA, Nepomuk and Strigi, Akonadi, Phonon, taglib, KDE/Win utilities, etc., which combined represent some 433,208 lines of code according to sloccount.

Most significant, however, is the omission of Qt itself. Ohloh has it weighing in at 16,028,530 lines of code.

If we look at the Ohloh numbers for GNOME, Linux or OOo, they are much more representative of the entire set of code associated with those projects. GNOME, for instance, includes Gtk+ and a comprehensive listing of applications worked on under the GNOME umbrella. That goes some way toward explaining the 15.7 million figure.

With a similar amalgamation, KDE weighs in somewhere in excess of 23 million lines of code. Even then, nothing is said about code complexity, efficiency granted to applications by sophistication in the lower levels of the stack, etc. That just can't be reflected in such simple numbers as these. Still, it is an interesting data point to look at for what it is worth. (A sentiment that Cornelius shared in his blog entry as well.)
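For anyone who wants to check the "in excess of 23 million" figure, here is a minimal sketch that just adds up the numbers quoted above. It assumes Ohloh's rounded 5.5 million for the KDE Software Distribution and takes Ohloh's Qt figure at face value; the dictionary and its labels are purely illustrative.

```python
# Line counts quoted in this post (lines of code).
# The KDE Software Distribution figure is Ohloh's rounded 5.5 million;
# Qt's figure is Ohloh's number, taken at face value here.
kde_components = {
    "KDE Software Distribution (Ohloh)": 5_500_000,
    "Extra Gear (sloccount)": 1_175_644,
    "kdesupport (sloccount)": 433_208,
    "Qt (Ohloh)": 16_028_530,
}

# Sum every component to get the amalgamated KDE total.
kde_total = sum(kde_components.values())
print(f"KDE with Extra Gear, kdesupport and Qt: {kde_total:,} lines")
```

That comes to roughly 23.1 million lines, which is where the figure above comes from.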

So does this tell us anything useful at all? Well, if nothing else it does show that there is a lot of F/OSS technology out there these days. Between the four projects mentioned here alone there are over 50 million lines of code, and those four projects, while large in their own right, are hardly the bulk of F/OSS code. That's pretty amazing.
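The "over 50 million" claim is simple addition. This sketch assumes KDE's amalgamated total of about 23.1 million (Ohloh's KDE number plus Extra Gear, kdesupport and Qt as counted above) and uses Ohloh's rounded figures for the other three projects.

```python
# Rounded totals from this post, in lines of code.
# KDE is the amalgamated figure (Ohloh KDE + Extra Gear + kdesupport + Qt);
# GNOME, Linux and OpenOffice are Ohloh's rounded figures.
project_totals = {
    "KDE (amalgamated)": 23_137_382,
    "GNOME": 15_700_000,
    "Linux": 7_900_000,
    "OpenOffice": 8_500_000,
}

# Add up all four projects.
combined = sum(project_totals.values())
print(f"Combined: {combined:,} lines")
```

That gives about 55.2 million lines across the four projects.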

7 comments:

Dan said...

I think you hit the nail on the head. GNOME having 15M versus 23M or whatever for KDE is meaningless. The only conclusion to draw is that there are a lot of bright people contributing great code for the community.

I think, particularly when you look at GNOME & KDE, you can guess that there are probably several million wasted lines of duplicated code. That's unfortunate.

Tom said...

Martin's numbers are bordering on trolling IMNSHO.

If you look at http://ftp.gnome.org/pub/GNOME/sources/ , which seems to be the basis for the GNOME numbers, you can see GNOME's past, present and future, and a lot of apps like gimp, dia and gnomeicu, new stuff like clutter branches and old stuff like esound and gnomeicu. Hell, there is so much stuff that some of it hasn't been touched since 2002.

If you add KOffice, Qt2, KDE3, KDE2, Qt3, Arts, Qt4.6 and every big KDE project there is, there was and there ever will be, KDE can be one gazillion LOC too.

Apples and Oranges.

SSJ said...

I may be misremembering, but I'm sure that at one point GNOME's svn (pre-git, of course) contained an entire copy of the forked OO.o. Does this ring any bells with any one?

Tom said...

Just compare https://www.ohloh.net/p/gnome/enlistments to https://www.ohloh.net/p/kde/enlistments and you can see how utterly stupid it is to compare 27 active, focused enlistments with 432 very general, old, obsolete and very new ones (from Sawfish to clutter branches, just to name a few) with a lot of duplication (Galeon, Epiphany, GnomeICU etc.). You have to add a lot to KDE to not compare Apples and Space Shuttles.

But Martin won't admit that his numbers are just bogus and that he is trolling (he answered my criticism, so it is not bordering on trolling anymore).

Tom said...

/s/Martin/Michael

mmeeks said...

Since it seems you like comments, I thought I'd post here :-) I blogged a response at:
http://www.gnome.org/~michael/blog/2009-10-14.html

The huge LOC number for Qt seems just wrong, so quite possibly many of the Ohloh numbers are even less meaningful than they could be.

Aaron J. Seigo said...

@michael: indeed, ohloh seems off. i just did a sloccount of qt 4.6 and it's ~2.6 million lines of code, all in (the vast bulk being in the main library, but some being in the tools as well).

that doesn't include QtCreator or some of the other software products either, though.

yay for stats. :/