Contributing to the GMD Working Group

January 30, 2019 / June 9th, 2020

How we produce metrics in the GMD Working Group

The GMD Working Group is one of the CHAOSS working groups. It is tasked with defining useful metrics for analyzing software development projects from the point of view of GMD (growth-maturity-decline), and it also works in the areas of risk and value. For all of these areas we intend to follow the same process to produce metrics, similar to what other CHAOSS working groups are doing. This post describes that process, which we have recently completed for the first metric (many others should follow during the next weeks).

The process is top-down, starting with the definition of the focus areas of interest. For each focus area, we define the goals we intend to reach, and then we follow GQM (goal-question-metric): first we derive questions which, when answered, should help us reach our goals, and then metrics that help to answer those questions. Finally, we explore how those metrics could be implemented, and produce reference implementations for specific data sources. Throughout the process we take into account use cases, which illustrate how metrics are used in the real world.
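As an illustration only, the focus area, goal, question and metric hierarchy can be pictured as a small nested data structure. The sketch below is not part of the working group's tooling: the class names and the metric_names helper are hypothetical, and the example values are taken from the code development focus area discussed later in this post.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes, used only to picture the GQM hierarchy.
@dataclass
class Metric:
    name: str
    description: str


@dataclass
class Question:
    name: str
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    name: str
    questions: List[Question] = field(default_factory=list)


@dataclass
class FocusArea:
    name: str
    goals: List[Goal] = field(default_factory=list)

    def metric_names(self) -> List[str]:
        """Walk goals, then questions, then metrics, collecting metric names."""
        return [
            metric.name
            for goal in self.goals
            for question in goal.questions
            for metric in question.metrics
        ]


# Values below come from the code development example in this post.
code_development = FocusArea(
    name="Code development",
    goals=[
        Goal(
            name="Activity",
            questions=[
                Question(
                    name="Changes",
                    text="How many changes are happening to the code base, "
                         "during a certain time period?",
                    metrics=[
                        Metric(
                            name="Code_Changes_No(Period)",
                            description="Number of changes to the code base",
                        )
                    ],
                )
            ],
        ),
        # Goals whose questions are still being identified.
        Goal(name="Efficiency"),
        Goal(name="Quality"),
    ],
)

print(code_development.metric_names())
```

Reading the structure from the top mirrors the process: a metric only exists because some question needs it, and a question only exists because it serves a goal.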

Currently, the working group is dealing with five focus areas: code development, community growth, issue resolution, risk, and value. We consider all of them relevant to improving our knowledge of FOSS (free, open source software) projects.

Goals for the code development focus area

For now, the most complete of these focus areas is code development, for which we have identified some goals: activity, efficiency and quality. For each of them we're in the process of identifying questions. For example, for activity we have identified the question "How many changes are happening to the code base, during a certain time period?", code-named Changes, which we think should help us learn about the activity of a project.

To help answer this question, we have identified some metrics, such as "Number of changes to the code base", code-named Code_Changes_No(Period), which tries to capture how many changes were made to the source code during the period of interest. We explain this in detail in the definition of the Code_Changes metric. This definition tries to be neutral with respect to the specific data source (in this case a source code management repository, such as git, Subversion, or Mercurial), but also includes sections devoted to specific data sources.

Python notebook with Code_Changes implementation for git

Finally, to clarify the metric and provide a definition which is unambiguous and can be checked for conformance, we also provide an implementation of it for a specific data source. In our case, we implemented it for git, as a Python notebook (check it in Binder). The notebook includes documentation on the details of the implementation, and an actual implementation of the metric as a Python class. It also includes examples of how to use it with real repositories, and an exploration of some details of the data source that are relevant for implementations and for comparing different implementations. Reference implementations are based on Perceval output, which is a collection of JSON documents, one per item obtained from the data source, each as similar as possible to the data produced by the data source.
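To give a flavor of what such an implementation involves, here is a minimal sketch of counting changes in a period from Perceval-like items. It is not the notebook's class. It assumes that each item keeps the commit under a "data" key, with the hash in "commit" and a git-style date string in "CommitDate", and it assumes that one commit counts as one change. The function name code_changes_no and the sample items are made up for illustration.

```python
from datetime import datetime, timezone

# Assumed layout of git date strings, e.g. "Mon Jan 15 10:00:00 2018 +0100".
DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def code_changes_no(items, since, until, date_field="CommitDate"):
    """Count distinct commits whose date falls in the period [since, until).

    `items` is an iterable of dicts shaped like the assumed Perceval output.
    Commits are deduplicated by hash, in case an item appears more than once.
    """
    hashes = set()
    for item in items:
        data = item["data"]
        when = datetime.strptime(data[date_field], DATE_FORMAT)
        if since <= when < until:
            hashes.add(data["commit"])
    return len(hashes)


# Hand-written sample items (hypothetical hashes and dates).
sample_items = [
    {"data": {"commit": "a1", "CommitDate": "Mon Jan 15 10:00:00 2018 +0100"}},
    {"data": {"commit": "b2", "CommitDate": "Wed Feb 14 09:30:00 2018 +0000"}},
    {"data": {"commit": "b2", "CommitDate": "Wed Feb 14 09:30:00 2018 +0000"}},
    {"data": {"commit": "c3", "CommitDate": "Tue Mar 13 18:45:00 2018 -0500"}},
]

# Period of interest: January and February 2018, in UTC.
since = datetime(2018, 1, 1, tzinfo=timezone.utc)
until = datetime(2018, 3, 1, tzinfo=timezone.utc)

print(code_changes_no(sample_items, since, until))
```

Even this small sketch has to make choices (which date field to use, how to treat duplicates, whether the period bounds are inclusive), which is the kind of detail a reference implementation is meant to pin down.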

By producing the first reference implementation for Code_Changes, we completed the first full pass through the process, from focus area and goal to questions and metrics. Now we intend to complete this process for the rest of the goals in all our focus areas. Do you want to join us on this journey? If so, you are welcome! We are ready to review your pull requests, and to work with you towards useful definitions and implementations of metrics that help us all better understand FOSS projects.