Background: Software practitioners need reliable metrics to monitor software evolution, compare projects, and understand modularity variations. This is crucial for assessing architectural improvement or decay. Existing popular metrics offer little help, especially in systems with implicitly connected but seemingly isolated files.
Aim: Our objective is to explore why and how state-of-the-art modularity measures fail to serve as effective metrics and to devise a new metric that more accurately captures complexity changes, and is less distorted by sizes or isolated files.
Methods: We analyzed metric scores for 1,220 releases across 37 projects to identify the root causes of their shortcomings. This led to the creation of M-score, a new software modularity metric that combines the strengths of existing metrics while addressing their flaws. M-score rewards small, independent modules, penalizes increased coupling, and treats isolated modules and files consistently.
Results: Our evaluation revealed that M-score outperformed other modularity metrics in terms of stability, particularly with respect to isolated files, because it captures coupling density and module independence. It also correlated well with maintenance effort, as indicated by historical maintainability measures, meaning that the higher the M-score, the more likely maintenance tasks can be accomplished independently and in parallel.
Conclusions: Our research identifies the shortcomings of current metrics in accurately depicting software complexity and proposes M-score, a new metric with superior stability and better reflection of complexity and maintenance effort, making it a promising metric for software architectural assessments, comparison, and monitoring.