
Metrics for OSS Viability

December 7, 2023

In the last post, we gave a background on Viability in Open Source. We covered the motivation and the implementation plan for how we’ll collect and measure metrics about the open source software we use, or might use, at Verizon.

[Image: metrics on a list next to a laptop, ruler, and calendar. Considering all these metrics together takes a list and a laptop, and maybe a ruler. Photo by Marissa Grootes on Unsplash]

In this post, we cover the nitty-gritty details of which metrics we’re using in our model and why they fit together. Rather than covering each metric individually, I’ll summarize which metrics are in each model and give an overview of why they combine into a good picture of that model. We’ll also cover the value proposition of the metrics that cross between the categories comprising the full model.

What follows is the list of metrics comprising the Viability model, and why they are included:

Compliance + Security

Just for this model:

OpenSSF Best Practices

  • This is a proxy metric that helps us ensure a project responds to security incidents and has enough protections in place to indicate a generally reliable Compliance and Security strategy.
  • This allows us to avoid using costly SCA/SAST scanning on every open source project we consider.

License Coverage

  • We use this to make decisions about the risk of using unlicensed software, or to determine whether our use case for the software is compatible with the license provided.

Licenses Declared

  • This lets us compare our intended usage and project policy against the policy of our dependencies.

OSI Approved Licenses

  • Knowing we’ve reviewed the implications of each OSI-approved license gives us confidence that we understand how to use the software in compliance with those licenses (a minimal sketch of this kind of license check follows below).
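To make these licensing checks concrete, here is a minimal sketch in Python. The allow list, dependency names, and declared licenses are hypothetical stand-ins rather than our actual policy or tooling; in practice, declared licenses would come from an SBOM or a license scanner.

```python
# Hypothetical sketch: bucket dependencies by licensing risk against an
# allow list of SPDX identifiers we have reviewed. All data here is illustrative.

APPROVED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}  # example allow list

dependencies = {
    "left-pad-ish": "MIT",
    "fast-parser": "GPL-3.0-only",
    "mystery-lib": None,  # no license declared
}

def license_findings(deps: dict[str, str | None]) -> dict[str, list[str]]:
    """Group dependencies into approved, needs-review, and unlicensed buckets."""
    findings = {"approved": [], "needs_review": [], "unlicensed": []}
    for name, spdx_id in deps.items():
        if spdx_id is None:
            findings["unlicensed"].append(name)  # highest risk: no license at all
        elif spdx_id in APPROVED_LICENSES:
            findings["approved"].append(name)
        else:
            findings["needs_review"].append(f"{name} ({spdx_id})")
    return findings

for bucket, names in license_findings(dependencies).items():
    print(f"{bucket}: {names}")
```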

Defect Resolution Duration

  • This metric lets us compare, apples to apples, the radically different rates at which projects respond to defects when they occur.
  • Our understanding of this metric across projects, and across a suite of dependencies, calibrates our risk profile and tolerance (a sketch of how the duration might be computed follows below).
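As a rough illustration of how a duration like this could be computed, here is a minimal sketch; the timestamps are made up, and real data would come from an issue tracker.

```python
# Hypothetical sketch: median defect resolution duration from issue timestamps.
from datetime import datetime
from statistics import median

closed_defects = [
    # (opened, closed) -- illustrative values, not real project data
    (datetime(2023, 1, 3), datetime(2023, 1, 10)),
    (datetime(2023, 2, 1), datetime(2023, 2, 2)),
    (datetime(2023, 3, 15), datetime(2023, 4, 20)),
]

durations_days = [(closed - opened).days for opened, closed in closed_defects]
print(f"median defect resolution: {median(durations_days)} days")
```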

Upstream Code Dependencies

  • We include this metric in the Viability model to ensure that the dependencies of a project are also included in any viability evaluation we perform.
  • It is important to consider an application alongside the dependencies it shares to give a full picture of a particular project’s risk portfolio.

Shared Between Models:

Libyears

  • “A simple measure of software dependency freshness. It is a single number telling you how up-to-date your dependencies are.”
  • This metric allows for apples-to-apples comparisons of dependency freshness between projects.
  • It scales to show complex projects with many dependencies, and the risk of using projects that carry a massive maintenance cost behind the scenes (a minimal sketch of the calculation follows below).
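For illustration, here is a minimal sketch of the Libyears calculation; the dependency names and release dates are invented, and real values would come from a package registry or dependency manifest.

```python
# Hypothetical sketch: libyears as the summed lag between the version in use
# and the newest release of each dependency. Release dates are illustrative.
from datetime import date

dependencies = [
    # (name, release date of version in use, release date of latest version)
    ("web-framework", date(2021, 6, 1), date(2023, 9, 1)),
    ("json-parser",   date(2023, 1, 15), date(2023, 2, 1)),
]

def libyears(deps) -> float:
    """Sum, over all dependencies, the lag in years between used and latest release."""
    return sum((latest - used).days / 365.25 for _, used, latest in deps)

print(f"total libyears: {libyears(dependencies):.2f}")
```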

What the metrics mean for Viability

Overall, we use this metrics model to gauge how well both the community and the maintainers consider the security and compliance of their application, and we expect to use these indicators to gauge risk. The indicators range from showstoppers like licensing, where a license can be flatly incompatible with our intended use case, through security-centric badges and metrics, to how quickly and regularly a team addresses the dependencies and defects reported against their application.

Additionally, as in other models, some metrics are tricky to trace or visualize. We leave a healthy amount of flexibility in how we rank applications against tricky-to-gather metrics, and we recommend that users of our models do the same. For example, much like Defect Resolution Duration, the appetite for how many Libyears are appropriate for a project will always be up to its maintainers. Depending on how or where an app may run, and how frequently we can update it, we think about Libyears critically.

Like a few other metrics, Libyears contributes to three of our metrics models: Compliance/Security, Governance, and Community. We believe it fits particularly well in Compliance + Security because it indicates not only how critically maintainers consider compliance and security in their own project, but also in the projects they depend on.

Governance

Just for this model:

Issues Inclusivity

  • Provides an effective measure of how intentionally issues are labeled and aggregated.
  • Indicates how community skills are applied to project responsibilities.

Documentation Usability

  • Strong, usable documentation is required.
  • Though collecting this can involve a lot of manual effort, it is a very important metric to attempt to gather.

Time to Close

  • How long it usually takes for a contribution to make its way into the codebase.
  • Gives us an idea of consistency in the project (median, mean, mode).
    • This is not to be confused with defect resolution, for which we hold a higher standard.
  • Other processes may occur alongside opening and closing a PR, for example, but this provides enough of an indicator to be inherently useful to the Governance of a project (a sketch of the summary statistics follows below).
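A minimal sketch of the kind of summary statistics we mean, using made-up durations rather than real pull request data:

```python
# Hypothetical sketch: summary statistics for how long contributions stay open.
# In practice, durations would be derived from PR opened/merged timestamps.
from statistics import mean, median, mode

days_to_close = [1, 2, 2, 3, 5, 8, 2, 14, 2, 4]  # illustrative durations in days

print(f"median: {median(days_to_close)} days")
print(f"mean:   {mean(days_to_close):.1f} days")
print(f"mode:   {mode(days_to_close)} days")
```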

Issue Age

  • How long questions / suggestions / etc. generally hang around a project.
  • Simple to understand, and easy to dive further into by looking at what the issues are about (a minimal sketch of the calculation follows below).
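A minimal sketch of the calculation, with invented issue dates standing in for real tracker data:

```python
# Hypothetical sketch: ages of currently open issues, to see what hangs around.
from datetime import date

today = date(2023, 12, 7)
open_issue_dates = [date(2023, 11, 30), date(2023, 6, 1), date(2022, 12, 25)]  # illustrative

ages_days = sorted((today - opened).days for opened in open_issue_dates)
print(f"open issue ages (days): {ages_days}, oldest: {ages_days[-1]}")
```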

Shared Between Models:

Change Request Closure Ratio

  • Compares the rate of new change requests to the rate at which they are closed.
  • Gives us an idea of how well the project is maintained, or whether more maintainers might be needed to keep up with demand for new features (a minimal sketch of the ratio follows below).
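A minimal sketch of the ratio, using invented counts rather than data pulled from a real review platform:

```python
# Hypothetical sketch: change request closure ratio over a time window.
opened_in_window = 120  # change requests opened in the period (illustrative)
closed_in_window = 90   # change requests closed in the same period

crcr = closed_in_window / opened_in_window
print(f"CRCR: {crcr:.2f}")  # below 1.0 suggests requests arrive faster than they close
```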

Project Popularity

  • Aggregate of other smaller metrics one might expect to find in a cursory glance over a project landing page. 
  • Likes, stars, badges, forks, clones, downstream dependencies, mentions on social media, and more.

Release Frequency

  • Knowing the timing of regular releases, and the frequency and cadence at which we may expect security patches and new features, identifies how well our project’s release cadence and strategy fit with potential dependencies.
  • This is somewhat a proxy for the LTS / release strategy that may otherwise be available for larger projects (a minimal sketch of the cadence calculation follows below).
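A minimal sketch of how release cadence could be summarized, with invented release dates standing in for real tags or changelog entries:

```python
# Hypothetical sketch: release cadence as the gaps between consecutive releases.
from datetime import date
from statistics import median

releases = [date(2023, 1, 10), date(2023, 4, 2), date(2023, 7, 1), date(2023, 10, 5)]

gaps_days = [(later - earlier).days for earlier, later in zip(releases, releases[1:])]
print(f"release gaps (days): {gaps_days}, median cadence: {median(gaps_days)} days")
```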

Libyears

  • “A simple measure of software dependency freshness. It is a single number telling you how up-to-date your dependencies are.”
  • This metric allows for apples-to-apples comparisons of dependency freshness between projects.
  • It scales to show complex projects with many dependencies, and the risk of using projects that carry a massive maintenance cost behind the scenes.

What the metrics mean for Viability

These metrics are useful to show the intention, or lack of intention, in a project’s Governance. For example, if there’s a lack of inclusive labels on issues, it identifies a gap in welcoming new contributors and sorting existing contributors into workstreams, and the Governance of the project is reflected in turn. The same goes for many of these metrics. The ability to contribute to, understand, or depend on a project is highly coupled to the effort behind its Governance.

This isn’t to say poor Governance metrics indicate that a project is governed by fools. A low CRCR, for example, may simply indicate that there are not enough maintainers to support a contributing community. A lack of new issues could be the result of a recent large release that addresses many recurring issues. It is important to aggregate these metrics for these reasons: not to cast doubt on the maintainers of projects, but to identify the Governance capacity and effort across projects in a software portfolio.

If some of these metrics feel like they could be strong community metrics, I think they can be. Many of the shared metrics here are a combination of the effort a community has with a project, and the effort of the body governing a project. We think the overlap of shared metrics captures this relationship well, considering the responsibility contributors and maintainers share in creating OSS.

Community

Just for this model:

Clones

  • How many times a project has been pulled from a repository onto a local machine.
  • Indicator of how many people are using or evaluating the project.

Technical Forks

  • How many forks have been made of a given project.
  • Forks, in our estimation, are normally created either to contribute changes back or to take the project in a new direction within a separate community.

Types of Contributions

  • Not all contributions are code: strategy, issues, reviews, events, writing articles, and more give a strong indication of the maintainers’ ability to grow or continue building a project (a sketch of a contribution-type breakdown follows below).
    • Likewise, if contributions are coming in only as requests, with no code alongside them, we can assume the project doesn’t have active contributors.
    • Any heavily skewed distribution might tip the scales on whether we should recommend a project as viable.
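A minimal sketch of a contribution-type breakdown, with an invented activity list standing in for real contribution data:

```python
# Hypothetical sketch: distribution of contribution types for a project.
from collections import Counter

contributions = [
    "code", "code", "review", "issue", "issue", "issue", "docs", "event", "code",
]  # illustrative labels

counts = Counter(contributions)
total = sum(counts.values())
for kind, count in counts.most_common():
    print(f"{kind}: {count} ({count / total:.0%})")
```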

Change Requests

  • The volume of regular requests, or the emergence of a pattern of change requests (around holidays, weekends, weekdays) can tell us a lot about a project. 
  • By identifying trends, we can make educated guesses about the strength, patterns, and sustainability of a project’s Community (a minimal sketch of one such trend check follows below).
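A minimal sketch of one such trend check, a weekday breakdown of when change requests were opened, using invented dates:

```python
# Hypothetical sketch: when change requests are opened, grouped by weekday,
# to spot patterns such as weekend-only activity or holiday lulls.
from collections import Counter
from datetime import date

opened_dates = [date(2023, 11, d) for d in (4, 5, 6, 11, 12, 18, 19, 25)]  # illustrative

by_weekday = Counter(d.strftime("%A") for d in opened_dates)
print(by_weekday.most_common())
```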

Committers

  • We don’t say “no project under x committers is viable,” because raw committer counts and viability are not directly related.
    • We care about committer trends.

Shared Between Models:

Change Request Closure Ratio

  • Compares the rate of new change requests to the rate at which they are closed.
  • Can help indicate a cooling or heating contribution community by monitoring merged community requests.

Project Popularity

  • Aggregate of other smaller metrics one might expect to find in a cursory glance over a project landing page. 
  • Likes, stars, badges, forks, clones, downstream dependencies, mentions on social media, and more.

Libyears

  • “A simple measure of software dependency freshness. It is a single number telling you how up-to-date your dependencies are.”
  • This metric allows for apples-to-apples comparisons of dependency freshness between projects.
  • It scales to show complex projects with many dependencies, and the risk of using projects that carry a massive maintenance cost behind the scenes.

What the metrics mean for Viability

With Community, we seek to understand the “tinkering” that happens with a project, as well as being able to measure the contributions that are made. Clones and forks indicate how many users of the software have pulled it to build from source, inspect the source code, submit a contribution, or take the project in a new direction. That flavor of popularity feels like a meaningful way to trace community engagement in a project.

With committer trends, types of contributions, and change requests, we can see how a Community is interacting. Maybe more markdown RFCs are created than features, or maybe vice versa. With an understanding of what types of contributions are made, and how regular they are, we can make a more informed judgment on project viability. As an example: we think it’s reasonable to expect that a project which has shed 90% of its committers in a three-month period is less viable than one with a stable (flat) committer trend. The inverse could indicate a growing or stable project gaining popularity around a particular technology trend. Where some “tinkering” metrics feel micro, other metrics take a macro lens.

By measuring some shared metrics, we give this model an opportunity to be viewed from the perspective of how much the community maintains a project, and how much interest there is generally. We find this distinct from the Governance angle, even with significant overlap, as trends in these metrics are almost never entirely the fault of the community or of the maintainers of a given project. The numbers could be meaningful for either space, so they exist in both models.

Strategy

Just for this model:

Programming Language Distribution

  • We have strong opinions on which languages are viable at Verizon.
    • Many companies have similar standards and expectations – or normally center around a particular language. 
  • Unsupported or unused languages are a strong indicator against project viability.

Bus Factor

  • The smallest number of committers who together account for 50% of activity over a period.
  • This helps us understand the risk of using a particular project or set of projects, in terms of how much support the project would retain if its top contributors left.

Elephant Factor

  • The elephant factor is a lot like the bus factor, but it counts the fewest “entities” (companies or organizations) that comprise 50% of activity on a project.
  • We use this to infer the influence companies have on a project, or how detrimental it would be if a company shifted its priorities (a minimal sketch of both calculations follows below).
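A minimal sketch of both calculations, using a shared helper and invented activity counts; real numbers would come from commit or contribution history:

```python
# Hypothetical sketch: bus factor and elephant factor as the smallest group
# whose activity covers at least 50% of the total. All counts are illustrative.
def smallest_group_covering(activity: dict[str, int], threshold: float = 0.5) -> int:
    """Fewest contributors (or entities) whose combined activity reaches the threshold."""
    total = sum(activity.values())
    covered = 0
    for rank, amount in enumerate(sorted(activity.values(), reverse=True), start=1):
        covered += amount
        if covered >= threshold * total:
            return rank
    return len(activity)

commits_by_committer = {"alice": 250, "bob": 200, "carol": 100, "dave": 50}
commits_by_company = {"BigCo": 450, "SmallCo": 100, "unaffiliated": 50}

print(f"bus factor: {smallest_group_covering(commits_by_committer)}")     # 2
print(f"elephant factor: {smallest_group_covering(commits_by_company)}")  # 1
```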

Organizational Influence

  • Organizational Influence measures the amount of control an organization may have in a project. This is an estimate, and an aggregation of several other metrics. 
  • Details in the link, but organizational diversity is one example of a metric that can aggregate to create an imprint of influence. 

Shared Between Models:

Release Frequency

  • Knowing the timing of regular releases, and the frequency and cadence at which we may expect security patches and new features, identifies how well our project’s release cadence and strategy fit with potential dependencies.
  • This is somewhat a proxy for LTS / release strategy that may otherwise be available for larger projects.

What the metrics mean for Viability

The metrics in this model trace the strategy, or expected influence, of the individuals and organizations behind a project. For example, with a bus factor of 1, it’s very possible that burnout or other factors could pull that one person away. With a more resilient count of contributors, we are more likely to see a stable and viable maintenance strategy. As a highly regulated and large entity, Verizon considers which other entities might be developing critical infrastructure for our applications. We consider our risk appetite and tolerance in the scope of each project we use, to ensure we don’t rely too heavily on one particular provider. These metrics continue our mission of managing that risk profile.

We share Release Frequency between Strategy and Governance. It captures the overlap in how the maintainers of a project provide both a governance plan and a maintenance strategy.

Wrap Up

Compliance + Security, Governance, Community, and Strategy. These are the tenets we use for our Viability Metrics (Super) Model. I’m excited to share this model with the broader software community for input and feedback, and to make it better over time. We will share our lessons learned and what practices we find the most effective for maintaining a viable software portfolio as we iterate.

Tune in next time for us to share a guide on OSS viability. We include recommended tools to set up Viability monitoring on projects. If you’d like to connect to talk about this post, join the CHAOSS community slack and find me, Gary White!

 
