How do you know when code is good or bad? Or complicated or simple? Chances are you know it when you see it, but that isn’t very quantifiable.
I find that some simple software metrics can help with this. In particular, metrics can help identify the most complicated areas of the code. This is useful in a few different situations:
- When I’m writing code, I can highlight complicated sections that should be refactored.
- When doing code reviews, I can find the most complicated sections to focus on so that my review provides the most value.
- When I’m exploring a new codebase, metrics are a tool for helping me navigate. For example, where are the complicated sections and how is the code distributed — i.e. is it split into many files, or just a few?
You can get this information with just a few simple metrics.
File count
How many files is this code spread across? If it’s one, that’s not so great. If it’s one thousand, that’s probably not so great either.
Lines of code
Lines of code is pretty straightforward. This is just the number of non-comment lines in each file. The cloc tool is really great for counting this in a variety of languages. It also works well on mixed-language source code repositories.
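To make "non-comment lines" concrete, here is a minimal Python sketch of the idea for C source. It is not how cloc works. The function name `count_sloc` is made up for this illustration, and the sketch assumes that comment markers never appear inside string literals (a line containing "http://..." in a string would be miscounted).

```python
def count_sloc(source: str) -> int:
    """Count lines that contain something other than whitespace or comments.

    Hypothetical helper for illustration only. Assumes C-style comments and
    ignores the possibility of comment markers inside string literals.
    """
    sloc = 0
    in_block_comment = False
    for line in source.splitlines():
        has_code = False
        i = 0
        while i < len(line):
            if in_block_comment:
                end = line.find("*/", i)
                if end == -1:
                    break  # the block comment continues on the next line
                in_block_comment = False
                i = end + 2
            elif line.startswith("//", i):
                break  # the rest of the line is a comment
            elif line.startswith("/*", i):
                in_block_comment = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            sloc += 1
    return sloc


sample = "\n".join([
    "/* header comment",
    "   spanning two lines */",
    "#include <stdio.h>",
    "",
    "int main(void) {  // entry point",
    "    /* inline */ return 0;",
    "}",
])
print(count_sloc(sample))  # prints 4
```

For real projects a dedicated counter such as cloc is the better choice, since it handles many languages and the edge cases this sketch skips.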
Complexity
Measuring the complexity of code is helpful because more complex code is more difficult to read, understand, and maintain. In general, you want to keep your code from being too complex. There are many ways to calculate complexity.
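One widely used family of measures is cyclomatic complexity, which is roughly one plus the number of decision points in a function. The Python sketch below is a crude estimate of that idea for a C function body: it counts branching keywords and also the boolean operators, in the spirit of the "strict" variant. The name `estimate_complexity` and the exact token list are assumptions made for this illustration, and the sketch does not strip comments or string literals first, so treat the result as a ballpark figure and not as what any particular tool reports.

```python
import re

# Tokens treated as decision points in this rough estimate (an assumption,
# not a definitive list): branching keywords, boolean operators, and the
# ternary operator.
DECISION_PATTERN = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")


def estimate_complexity(function_body: str) -> int:
    """Return 1 plus the number of decision-point tokens found in the text."""
    return 1 + len(DECISION_PATTERN.findall(function_body))


body = "if (a && b) { x = 1; } else { for (i = 0; i < n; i++) { y += i; } }"
print(estimate_complexity(body))  # prints 4: one base path + if + && + for
```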
Global count
Globals are your worst enemy, even more than function complexity. Globals are like complexity multipliers. When you’re reading code and you come across a global, the code becomes much harder to understand: its behavior can’t be determined without finding all the other uses of the global and understanding how it is used in those places. So counting up the globals, and where they are used, can be really effective in assessing complexity.
Koopman Spaghetti Factor (KSF)
Phil Koopman is an embedded systems expert focused on “dependable embedded systems.” This is something I can get behind.
He has an article where he “proposes” a new complexity metric called the “spaghetti factor.” The Koopman Spaghetti Factor is computed like this:
KSF = SCC + (Globals * 5) + (SLOC / 20)
- KSF = Koopman Spaghetti Factor
- SCC = The Strict Cyclomatic Complexity
- Globals = The global variable count
- SLOC = The number of non-comment source lines of code
This is to be computed for each module (source file) of the code. I really like this metric because it combines a bunch of different metrics – complexity, global count and lines of code – into a single metric.
The KSF also weights those different metrics according to importance (or impact on complexity). Globals are the most sinister, so they get a 5x multiplier. Complexity (SCC) is next, so it gets no multiplier. Lines of code are the least significant, so that count is divided by 20.
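As a quick check on the arithmetic, the formula can be sketched in a few lines of Python. The function name `ksf` is made up for this illustration. Truncating the result to a whole number is an assumption: it is inferred from the sample ravioli output further down in this article, where for example rtos.c (complexity 5, 6 globals, 135 lines) is reported as 41 and not 42.

```python
def ksf(scc: int, global_count: int, sloc: int) -> int:
    """Koopman Spaghetti Factor: SCC + (Globals * 5) + (SLOC / 20).

    The SLOC term is truncated to a whole number. That is an assumption
    based on the sample output in this article, not a documented rule.
    """
    return scc + global_count * 5 + sloc // 20


# Inputs taken from rows of the sample output in this article.
print(ksf(19, 2, 243))   # command_processor.c: 19 + 10 + 12 = 41
print(ksf(5, 6, 135))    # rtos.c: 5 + 30 + 6 = 41
print(ksf(12, 0, 1387))  # tasks.c: 12 + 0 + 69 = 81
```

Note how rtos.c and command_processor.c end up with the same score for different reasons: one is dominated by its six globals, the other by its function complexity.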
Ravioli
I like this KSF metric so much that I wrote a tool -- named ravioli -- to help me calculate it. Ravioli is a simple-to-use tool for calculating complexity metrics — including the Koopman Spaghetti Factor (KSF) — on C source code.
If you run it in your project folder it will calculate the KSF on all of the C files it can find and give you the results sorted by complexity. Here is some example output:
> ravioli .
-------------------------------------------------------------------------------
File                                    complexity  globals   lines   ksf
-------------------------------------------------------------------------------
motobox\Sources\FreeRTOS\tasks.c                12        0    1387    81
motobox\Sources\datapage.c                       1        0    1242    63
motobox\Sources\FreeRTOS\queue.c                15        0     930    61
motobox\Sources\command_processor.c             19        2     243    41
motobox\Sources\rtos.c                           5        6     135    41
motobox\Sources\vehicle_comm.c                   8        1     432    34
motobox\Sources\vehicle_comm_sim.c              11        0     373    29
motobox\Sources\Start12.c                        1        1     337    22
motobox\Sources\can.c                            7        0     289    21
motobox\Sources\iso15765.c                      12        0     187    21
motobox\Sources\flash.c                          7        0     268    20
motobox\Sources\j1979.c                         10        0     201    20
motobox\Sources\Cpu.C                            2        2      40    14
motobox\Sources\leds.c                           2        2      26    13
motobox\Sources\log.c                            3        1     117    13
motobox\Sources\rti.c                            2        2      23    13
During calculation, the SCC is computed for each function. To determine the complexity for an entire file, the maximum value of all the functions in that file is used. To see which functions are the real culprits, use the -f option to see the complexity of all the functions in your project.
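The "maximum over all functions" rule is easy to picture in code. The Python sketch below is only an illustration of that rule, not ravioli's implementation: the helper names, the function names, and the complexity values are all made up, and returning 0 for a file with no functions is an assumption of the sketch.

```python
from __future__ import annotations


def file_complexity(function_sccs: dict[str, int]) -> int:
    """File-level complexity: the largest per-function SCC in the file.

    Returning 0 for a file with no functions is an assumption of this sketch.
    """
    return max(function_sccs.values(), default=0)


def worst_functions(function_sccs: dict[str, int], top: int = 3) -> list[tuple[str, int]]:
    """Return the `top` most complex functions, highest SCC first."""
    ranked = sorted(function_sccs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical per-function values for a single file.
sccs = {"parse_command": 19, "send_reply": 4, "reset_state": 2}
print(file_complexity(sccs))            # prints 19
print(worst_functions(sccs, top=2))     # prints [('parse_command', 19), ('send_reply', 4)]
```

Using the maximum means a single tangled function is enough to flag a file, even if everything else in it is simple.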
Ravioli is designed especially for the C code used in embedded systems: it doesn't try to compile the C, so it won't get tripped up by non-standard extensions.
Be careful out there
I find these metrics to be good tools for identifying – and quantifying – complex code. This allows me to focus on these particular modules during development or a code review as potential areas for problems and refactoring.
But metrics are just tools. It’s easy to go crazy and get too focused on the numbers. Use some common sense and use the metrics to try and make the code better. The future developers working on your code (maybe you!) will appreciate it.