Code Changes Lines

Question: What is the sum of the number of lines touched (lines added plus lines removed) in all changes to the source code during a certain period?

Description

When introducing changes to the source code, developers touch (edit, add, remove) lines of the source code files. This metric considers the aggregated number of lines touched by changes to the source code performed during a certain period. This means that if a certain line in a certain file is touched in three different changes, it will count as three lines. Since in most source code management systems it is difficult or impossible to tell accurately whether a line was removed and then added, or just edited, we consider editing a line as removing it and then adding it back with new content. Each of those (removing and adding) is considered as "touching". Therefore, if a certain line in a certain file is edited three times, it counts as six touched lines (three removals and three additions).

For this matter, we consider changes to the source code as defined in Code Changes Commits. Lines of code will be any line of a source code file, including comments and blank lines.
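The counting rule in the description can be illustrated with a minimal Python sketch. The `Change` class and the `lines_touched` function are hypothetical names introduced only for this illustration; they are not part of any CHAOSS tool. The sketch assumes each change has already been reduced to a pair of counts (lines added, lines removed), with an edited line recorded as one removal plus one addition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One change to the source code, reduced to its line counts (hypothetical type)."""
    added: int    # lines added by the change
    removed: int  # lines removed by the change


def lines_touched(changes):
    """Return lines added plus lines removed, summed over all changes."""
    return sum(c.added + c.removed for c in changes)


# The same line edited in three separate changes: each edit is
# recorded as one removal plus one addition.
three_edits = [Change(added=1, removed=1)] * 3
print(lines_touched(three_edits))  # 6
```

Comments and blank lines need no special handling in this sketch, because they are counted like any other line.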

Objectives

Although code changes can be a proxy for the coding activity of a project, not all changes are the same. Considering the aggregated number of lines touched in all changes gives a complementary idea of how large the changes are and, in general, how large the volume of coding activity is.

Implementation

Potential aggregators for the Code Changes Lines metric include:

  • Count. Total number of lines changed (touched) during the period.

Potential parameters for the Code Changes Lines metric include:

  • Period of time: Start and finish date of the period. Default: forever. Period during which changes are considered.
  • Criteria for source code: Algorithm. Default: all files are source code. If we are focused on source code, we need a criterion for deciding whether a file is part of the source code or not.
  • Type of source code change:
    • Lines added
    • Lines removed
    • Whitespace
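As a rough sketch of how the period and type-of-change parameters might be applied, the hypothetical `count_lines` function below takes changes as `(date, lines_added, lines_removed)` tuples. The tuple layout, the parameter names, and the choice of an inclusive start with an exclusive end are assumptions made for this example. Leaving `start` and `end` as `None` corresponds to the default period of "forever". Whitespace handling and the source code criterion are left out.

```python
from datetime import datetime


def count_lines(changes, start=None, end=None, added=True, removed=True):
    """Count touched lines in `changes`, an iterable of
    (date, lines_added, lines_removed) tuples.

    start/end bound the period (start inclusive, end exclusive);
    None means unbounded. `added` and `removed` select which
    types of change are counted.
    """
    total = 0
    for when, n_added, n_removed in changes:
        if start is not None and when < start:
            continue  # before the period
        if end is not None and when >= end:
            continue  # after the period
        if added:
            total += n_added
        if removed:
            total += n_removed
    return total


sample = [
    (datetime(2023, 1, 10), 5, 2),
    (datetime(2023, 2, 1), 1, 1),
    (datetime(2023, 3, 5), 0, 4),
]
print(count_lines(sample))                              # 13
print(count_lines(sample, start=datetime(2023, 2, 1)))  # 6
print(count_lines(sample, removed=False))               # 6
```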

Filters

  • By actors (author, committer). Requires actor merging (merging ids corresponding to the same author).
  • By groups of actors (employer, gender...). Requires actor grouping, and likely, actor merging.
  • By tags (used in the message of the commits). Requires a structure for the message of commits. Such a tag can be used in an open-source project to tell every contributor whether a commit is, for example, a fix for a bug or an improvement of a feature.
  • Count per month over time
  • Count per group over time
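The actor filter and the per-month count could be combined along the following lines. This is only a sketch: the `ALIASES` table stands in for whatever actor-merging process a real tool would use, the identities in it are invented, and the `(day, author_id, lines_added, lines_removed)` tuple layout is an assumption.

```python
from collections import Counter
from datetime import date

# Hypothetical actor-merging table: several ids map to one author.
ALIASES = {
    "jdoe@example.org": "Jane Doe",
    "jane@example.com": "Jane Doe",
}


def per_month(changes, author=None):
    """Return {"YYYY-MM": lines touched} for `changes`, an iterable of
    (day, author_id, lines_added, lines_removed) tuples.

    If `author` is given, only changes whose merged identity equals
    it are counted.
    """
    totals = Counter()
    for day, author_id, n_added, n_removed in changes:
        merged = ALIASES.get(author_id, author_id)  # unknown ids stay as-is
        if author is not None and merged != author:
            continue
        totals[day.strftime("%Y-%m")] += n_added + n_removed
    return dict(totals)


sample = [
    (date(2023, 1, 10), "jdoe@example.org", 5, 2),
    (date(2023, 1, 20), "jane@example.com", 1, 1),
    (date(2023, 2, 3), "bob@example.net", 0, 4),
]
print(per_month(sample))                     # {'2023-01': 9, '2023-02': 4}
print(per_month(sample, author="Jane Doe"))  # {'2023-01': 9}
```

Grouping by employer or another attribute would follow the same pattern, with the alias table replaced by a mapping from author to group.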

Tools Providing the Metric

Data Collection Strategies

Specific description: Git

In the case of git, we define "code change" and "date of a change" as detailed in Code Changes Commits. The date of a change can be defined (for deciding whether it falls within a period) as the author date or the committer date of the corresponding git commit.

Since git provides changes as diff patches (list of lines added and removed), each of those lines mentioned as a line added or a line removed in the diff will be considered as a line changed (touched). If a line is removed and added, it will be considered as two "changes to a line".
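One possible way to obtain these counts is the per-file summary that `git log --numstat` prints, where each row holds the number of added lines, the number of removed lines, and the file path, separated by tabs; binary files are reported with `-` in place of the counts. The sketch below only parses such text and does not run git itself. The function name `lines_touched_from_numstat` and the sample input are made up for this illustration.

```python
def lines_touched_from_numstat(text):
    """Sum added plus removed lines over the numstat rows found in `text`.

    Rows look like "<added>\t<removed>\t<path>". Anything else (commit
    headers, blank lines) and binary-file rows ("-\t-\t<path>") is skipped.
    """
    total = 0
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a numstat row
        added, removed, _path = parts
        if not (added.isdigit() and removed.isdigit()):
            continue  # binary file or unrelated line
        total += int(added) + int(removed)
    return total


SAMPLE = (
    "3\t1\tsrc/main.py\n"
    "-\t-\tdocs/logo.png\n"
    "0\t2\tREADME.md\n"
)
print(lines_touched_from_numstat(SAMPLE))  # 6
```

A line that was edited shows up in the diff as one removed line and one added line, so it contributes two to the total, in line with the definition above.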

Mandatory parameters:

  • Kind of date. Either author date or committer date. Default: author date.
    For each git commit, two dates are kept: when the commit was authored, and when it was committed to the repository. For deciding on the period, one of them has to be selected.

  • Include merge commits. Boolean. Default: True.
    Merge commits are those which merge a branch, and in some cases they are not considered to reflect coding activity.
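The two mandatory parameters might be applied as in the following sketch. The `Commit` class, its field names, and the `lines_in_period` function are hypothetical; the sketch assumes that commit metadata and line counts have already been extracted from the repository, and it treats a commit with more than one parent as a merge commit. The period is taken as start inclusive and end exclusive, which is an assumption of this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    """Minimal view of a git commit (hypothetical type)."""
    author_date: datetime
    committer_date: datetime
    parents: int   # number of parent commits; more than one means a merge
    added: int     # lines added in the commit's diff
    removed: int   # lines removed in the commit's diff


def lines_in_period(commits, start, end, kind_of_date="author",
                    include_merges=True):
    """Sum touched lines of commits dated in [start, end)."""
    if kind_of_date not in ("author", "committer"):
        raise ValueError("kind_of_date must be 'author' or 'committer'")
    total = 0
    for c in commits:
        if not include_merges and c.parents > 1:
            continue  # skip merge commits when asked to
        when = c.author_date if kind_of_date == "author" else c.committer_date
        if start <= when < end:
            total += c.added + c.removed
    return total


commits = [
    # Authored in January, committed in February.
    Commit(datetime(2023, 1, 30), datetime(2023, 2, 2), 1, 3, 1),
    # A merge commit authored and committed in February.
    Commit(datetime(2023, 2, 10), datetime(2023, 2, 10), 2, 10, 0),
]
feb = (datetime(2023, 2, 1), datetime(2023, 3, 1))
print(lines_in_period(commits, *feb))                           # 10
print(lines_in_period(commits, *feb, kind_of_date="committer")) # 14
print(lines_in_period(commits, *feb, include_merges=False))     # 0
```

The first commit shows why the kind of date matters: it falls outside February by author date but inside it by committer date.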

References

To edit this metric please submit a Change Request here: https://github.com/chaoss/wg-evolution/blob/main/focus-areas/code-development-activity/code-changes-lines.md

To reference this metric in software or publications please use this stable URL: https://chaoss.community/?p=3591

The usage and dissemination of health metrics may lead to privacy violations. Organizations may be exposed to risks. These risks may flow from compliance with the GDPR in the EU, with state law in the US, or with other laws. There may also be contractual risks flowing from terms of service for data providers such as GitHub and GitLab. The usage of metrics must be examined for risk and potential data ethics problems. Please see the CHAOSS Data Ethics document for additional guidance.
