How we produce metrics in the GMD Working Group
The GMD Working Group
is one of the CHAOSS working groups, tasked with defining
useful metrics relevant for the analysis of
software development projects from the point of view of
GMD (growth-maturity-decline). It also works
in the areas of risk and value. For all of them, we're
intending to follow the same process to produce metrics,
similar to what other CHAOSS working grupos are doing.
This post describes this process, that we have recently
completed for the first metric (many others should follow
during the next weeks).
The process is top down, starting by the definition of
the focus areas of interest. For each of the focus areas,
we define the goals we intend to reach for it, and then
we follow GQM
(goal-question-metric) to first derive questions which,
when answered, should help to reach our goals, and then
metrics that help to answer those questions.
Finally, we explore how those metrics could be implemented,
and produce reference implementations for specific data sources.
During all the process we have into account
which illustrate how metrics are used in the real world.
Currently, the working group is dealing with five
code development, community growth, issue resolution, risk, and value.
We estimate that all of them are relevant to improve our
knowledge of FOSS (free, open source software) projects.
For now, the more complete of these focus areas
is code development,
for which we have identified some goals: activity,
efficiency and quality. For each of them we're in the process
of identifying questions. For example for activity,
we have identified a question "How many changes are happening to the code base, during a certain time period?",
Changes, that we think should help to learn about
the activity of a project.
To help to answer this question,
we have identified some metrics, such as
"Number of changes to the code base", code named as
Code_Changes_No(Period), which tries to capture
how many changes to the source code were done during the
period of interest.
We explain this in detail in the
definition of the Code_Changes metric.
This definition tries to be neutral with respect to the
specific data source (in this case a source code management
repository, such as git, Subversion, or Mercurial),
but also includes specific sections for specific data sources.
Finally, to clarify the metric, and provide a definition which
is not ambiguous and can be checked for conformance,
we also provide an implementation of it for a certain data source.
In our case, we implemented
it for git, as a Python notebook
(check it in Binder).
It includes documentation on the details of the implementation,
and an actual implementation of the metric as a Python class.
It also includes examples of how to use it with real repositories,
and an exploration of some details of the specific data source,
relevant for implementations and comparison between different implementations.
Reference implementations are based on
Perceval output, which is a collection of JSON documents,
one per item obtained from the data source, as much similar
as the data produced by the data source as possible.
When producing the first reference implementation for
Code_Changes, we completed the first full process,
from focus area and goal to questions and metrics.
Now, we intend to complete this process for the rest
of goals in all our focus areas. Do you want to join us in this travel?
If so, you are welcome! We are ready to review your pull requests,
and work with you towards having useful definitions and
implementations of metrics that help us all to better understand