
CLOC

In consulting, I’ve been in several situations where I needed a rough idea of the relative size and complexity of a product or codebase. I’ve seen this come up in acquisition negotiations as well as in estimates for maintaining or rewriting an existing codebase. One way to get that rough idea is to measure lines of code. Enter CLOC (Count Lines of Code).

CLOC is an open source command-line tool that counts physical lines of code (as well as comments, blank lines, total number of files, etc.). It is easy to use, very configurable and works for nearly any language out there. You’ll primarily see it used in Linux-based environments and various derivatives, but it also runs on Windows.

Since I’m in OS X land most of the time, installing CLOC is as easy as using Homebrew. Most Linux package managers can install it in much the same way, and it can also be downloaded directly:

brew install cloc

A few quick moments and – boom – it’s installed and available.  Using CLOC is incredibly easy; just run it from a project root directory and it does the rest of the work. It recursively scans and prints out a tidy list when it’s finished:

src git:(develop) cloc .

I ran the above command in a medium-size Web project and received the following results:

253 text files.
250 unique files.
2 files ignored.

http://cloc.sourceforge.net v 1.62 T=2.18 s (114.5 files/s, 8110.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Javascript                     160           2253           3451           9044
Handlebars                      72             24             55           1896
LESS                             9            159             86            553
HTML                             9             13             34            145
-------------------------------------------------------------------------------
SUM:                           250           2449           3626          11638
-------------------------------------------------------------------------------

Pretty cool! I also like taking a look at the comment/code ratio. CLOC makes a great barometer for gauging the size and complexity of a project, as well as the different languages and technologies it uses.
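CLOC doesn’t print the comment/code ratio in its default summary, so here is a rough sketch of how it could be worked out from the table above. This is just one way to do it, not anything CLOC ships with: the names `CLOC_OUTPUT`, `parse_cloc_table` and `comment_ratio` are my own, and the parser assumes the default summary layout (a language name followed by four integer columns).

```python
import re

# Summary table copied from the cloc run shown above (column spacing is irrelevant).
CLOC_OUTPUT = """\
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Javascript                     160           2253           3451           9044
Handlebars                      72             24             55           1896
LESS                             9            159             86            553
HTML                             9             13             34            145
-------------------------------------------------------------------------------
SUM:                           250           2449           3626          11638
-------------------------------------------------------------------------------
"""

# Assumption: a data row is a name followed by exactly four integer columns.
ROW = re.compile(r"^(?P<name>.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_cloc_table(text):
    """Return {name: {"files", "blank", "comment", "code"}} for each data row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip rulers, the header row, and blank lines
        files, blank, comment, code = (int(g) for g in match.groups()[1:])
        name = match.group("name").rstrip(":")  # "SUM:" becomes "SUM"
        rows[name] = {"files": files, "blank": blank, "comment": comment, "code": code}
    return rows


def comment_ratio(row):
    """Comment lines per line of code; 0.0 when a row has no code at all."""
    return row["comment"] / row["code"] if row["code"] else 0.0


if __name__ == "__main__":
    for name, row in parse_cloc_table(CLOC_OUTPUT).items():
        print(f"{name:<12} {comment_ratio(row):.2f}")
```

For the run above, this works out to roughly 0.31 comment lines per line of code overall, and about 0.38 for the Javascript files.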

However, these figures leave out a lot of other, very pragmatic details; they are just one part of the story. Take these numbers with a HUGE grain of salt. People love numbers, and you can make just about any argument with figures. Temper and cross-reference these numbers with other data points (size of backlog, bug count, unit tests, etc.) to help round out an opinion on a codebase.
