Practitioner Guide: Organizational Participation

Primary metrics:

  • Elephant Factor
  • Organizational Diversity
  • Organizational Influence

If you haven’t already read the Practitioner Guide: Introduction - Things to Think about When Interpreting Metrics, please pause now and read that guide.

Organizations can have a significant impact on the health and sustainability of an open source project. On the one hand, organizations can help sustain projects over time by employing people to work on the open source projects that they use or by contributing other resources to those projects (Eghbal 2016). This positive effect is especially strong for projects that are well supported by a variety of organizations, where no one company dominates or controls the project and the project is hosted by a neutral foundation.

Miller et al. (2019) found that contributors who make most of their contributions during work hours were less likely to leave a project; however, when they do leave, they are more likely to leave for occupational reasons (e.g., getting a new job). Organizational participation can therefore help increase the sustainability of open source projects, especially when there is participation from a variety of organizations. On the other hand, a single dominant organization puts sustainability at risk, because the project then depends on that organization retaining the employees who work on it.

If all or most of the contributions are from the employees at a single company, what happens when that company has a shift in strategy, or gets acquired, or runs out of money and goes out of business? Would the project be able to continue if the leading company pulled all of its employees out of the project? Companies often make business decisions that can result in no longer funding employees to work on an open source project, which can result in projects losing large numbers of maintainers at one time (Eghbal 2016). In particular, single vendor open source projects, especially ones backed by large technology companies, might not seem risky, but they can quickly become unviable for other organizations after a licensing change or when the leading company (or some key contributors from that company) stops working on the project.

From a contribution standpoint, it can be difficult to move up within a project or even to get your contributions merged if one organization has all of the influence and other people don’t feel like they are participating as equals.

Step 1: Identify Trends

The biggest challenge with identifying organizational influence in open source projects is that the organizational affiliation data is almost never accurate enough to use without doing some manual cleanup, since it most often comes from imperfect data sources (e.g., email domains). See the Diagnosis section later in this document for more details about this issue.

A good starting point is to look at the Elephant Factor to determine how the work is distributed among organizations, along with Organizational Diversity to see which organizations are making contributions. Finally, it’s also important to think about Organizational Influence to understand which organizations have employees in leadership or other decision-making positions.

Elephant Factor

The Elephant Factor looks at the distribution of the work within a community across organizations. The primary goal of this metric is to see whether the work within a project is completed by people working for a single organization or a small number of organizations. If most of the work is being done by employees who work at a single company (as in the example below), the project might be riskier to use and harder to contribute to than a project with contributions that are spread out over many organizations with no single organization being dominant.

Commit Activity by Domain pie chart showing VMware at 70.5%, gmail.com at 16.3%, users.noreply.github.com at 4.52%, other at 4.32%
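As a rough illustration, the Elephant Factor can be computed from per-organization contribution counts. The sketch below assumes the common framing of the metric (the smallest number of organizations that together account for a chosen share of contributions, with 50% as a typical threshold). The function name `elephant_factor` and the `threshold` parameter are illustrative choices, and the input reuses the percentages from the chart above as stand-in counts, before any affiliation cleanup.

```python
def elephant_factor(contributions, threshold=0.5):
    """Return the smallest number of organizations whose combined
    contributions reach `threshold` (a fraction) of the total."""
    total = sum(contributions.values())
    if total <= 0:
        return 0
    target = threshold * total
    running = 0.0
    # Walk organizations from the largest contributor to the smallest.
    ranked = sorted(contributions.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= target:
            return rank
    return len(contributions)


# Shares from the "Commit Activity by Domain" chart. These are email
# domains, not cleaned-up organizations, so treat the result as a first pass.
chart = {
    "VMware": 70.5,
    "gmail.com": 16.3,
    "users.noreply.github.com": 4.52,
    "other": 4.32,
}
print(elephant_factor(chart))
```

Because a single domain accounts for more than half of the commits in the chart, the result for this data is 1, which is the lowest (and riskiest) possible value.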

Organizational Diversity

Often the same visualizations used to determine Elephant Factor can also be used to look at Organizational Diversity. For example, the previous graph for Elephant Factor also tells you which organizations are the dominant forces within a community. It can also help to look at Organizational Diversity across different elements of your community to determine whether some areas within your community have more or less organizational diversity than others.

Active Organizations over Time by Data Source across GitHub PRs / Issues, Slack, and pipermail
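A view like this one can be approximated by counting distinct organizations per time period and data source. The following is a minimal sketch, assuming activity has already been exported as (date, source, organization) records; the record layout, the function name `active_orgs_by_source`, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def active_orgs_by_source(events):
    """Count distinct organizations per (month, data source).

    `events` is an iterable of (day, source, organization) tuples.
    """
    seen = defaultdict(set)
    for day, source, org in events:
        month = day.strftime("%Y-%m")
        seen[(month, source)].add(org)
    return {key: len(orgs) for key, orgs in sorted(seen.items())}


# Hypothetical activity records, invented for illustration only.
events = [
    (date(2023, 1, 5), "github", "Org A"),
    (date(2023, 1, 9), "github", "Org B"),
    (date(2023, 1, 9), "github", "Org A"),
    (date(2023, 1, 12), "slack", "Org A"),
    (date(2023, 2, 2), "github", "Org A"),
    (date(2023, 2, 3), "pipermail", "Org C"),
]

for (month, source), count in active_orgs_by_source(events).items():
    print(month, source, count)
```

Comparing the counts across sources can show, for example, that code review involves several organizations while chat or mailing list discussion is dominated by one.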

Organizational Influence

Organizational Influence is a measure of the influence that an organization has on an open source community. It can be harder to measure than some other metrics because it usually involves looking at the organizations that employ people in leadership roles (e.g., boards, working groups, maintainers, committers). In most cases this will require a manual assessment by looking at the governance or other documentation that has details about leadership positions. For a few projects, this can be easier when the leadership data is stored in files with structured data, like in the following example of Istio Leadership Positions. As you can see in the Istio example, it can be useful to look at how organizational influence evolves over time.

April 2022, before Istio joined the CNCF as an Incubating project on September 30:

Istio Leadership Positions (SC, TOC, WG Lead) 26 people across 9 companies - 12 at Google, 4 at IBM

June 2023, just before Istio was accepted as a CNCF Graduated project on July 12:

Istio Leadership Positions (SC, TOC, WG Lead) 26 people across 11 companies - 7 at Google, 5 at Solo.io, 4 at IBM
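One simple way to compare snapshots like these is to convert the counts into each organization's share of leadership positions. The sketch below uses only the numbers quoted in the two Istio summaries above; the function name `leadership_share` is an illustrative choice, and the seats held by companies that the summaries do not name are left out rather than guessed.

```python
def leadership_share(seats_by_org, total_seats):
    """Return each organization's fraction of leadership positions."""
    if total_seats <= 0:
        raise ValueError("total_seats must be positive")
    return {org: seats / total_seats for org, seats in seats_by_org.items()}


# Counts named in the two snapshots; both list 26 people in total.
april_2022 = leadership_share({"Google": 12, "IBM": 4}, total_seats=26)
june_2023 = leadership_share(
    {"Google": 7, "Solo.io": 5, "IBM": 4}, total_seats=26
)

for org in ("Google", "IBM"):
    change = june_2023[org] - april_2022[org]
    print(f"{org}: {april_2022[org]:.1%} -> {june_2023[org]:.1%} ({change:+.1%})")
```

With these numbers, Google's share of leadership positions falls from roughly 46% to roughly 27%, while IBM's share is unchanged.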

Step 2: Diagnosis

The biggest challenge with identifying trends for organizations in open source projects is that the organizational affiliation data is almost never accurate enough to use without doing some manual cleanup. Most tools, including CHAOSS’s GrimoireLab and Augur, rely mostly on the email address domain (e.g., google.com, microsoft.com) to determine where someone works, but people often use neutral email addresses (e.g., gmail.com) or otherwise obfuscate their email (e.g., users.noreply), so many charts with organizational data just aren’t useful out of the box.

Example of data that requires significant clean up: gmail.com 42.6%, users.noreply.github.com 14%
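To get a sense of how much cleanup a data set needs, it can help to measure how many addresses resolve to no organization at all. The sketch below is a simplified stand-in for domain-based matching and is not how GrimoireLab or Augur implement it; the `NEUTRAL_DOMAINS` set, the `KNOWN_ORGS` mapping, and both function names are assumptions made for illustration.

```python
# Domains that say nothing about an employer. This short set is an
# illustrative assumption; a real project would maintain a longer list.
NEUTRAL_DOMAINS = {"gmail.com", "users.noreply.github.com"}

# Hypothetical, hand-maintained mapping from email domain to organization.
KNOWN_ORGS = {"google.com": "Google", "microsoft.com": "Microsoft"}


def affiliation_from_email(email):
    """Guess an organization from an email address, or return None when
    the domain is neutral, obfuscated, or unrecognized."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or domain in NEUTRAL_DOMAINS:
        return None
    return KNOWN_ORGS.get(domain)


def unresolved_fraction(emails):
    """Share of addresses that would need manual affiliation cleanup."""
    emails = list(emails)
    if not emails:
        return 0.0
    missing = sum(1 for e in emails if affiliation_from_email(e) is None)
    return missing / len(emails)


sample = [
    "dev@google.com",
    "someone@gmail.com",
    "123+user@users.noreply.github.com",
    "eng@microsoft.com",
]
print(unresolved_fraction(sample))
```

A high unresolved fraction is a signal that charts built directly on this data should not be trusted until affiliations have been verified by hand.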

Another method is to use the organization that people have added to their profiles, but from experience we know that very few people populate that data, and because it’s freeform text, even when they do populate it, it’s often inconsistent enough that it can’t be easily used (e.g., IBM, International Business Machines, IBM GmbH). It’s also very common for people to change jobs, so not only do you need to know where someone works, you also need to know when they worked there.

Unfortunately, if you truly want to understand the organizational impact on an open source project, the most reliable way to do this is by manually verifying the data, cleaning up inaccurate affiliations, and storing those in your metrics tool or other data sets. The GrimoireLab Sorting Hat tool is one option for managing and storing cleaned up affiliation data. The CNCF maintains a relatively good (but not perfect) dataset for organizational affiliations for developers contributing to their projects. They do this partly by encouraging developers / organizations to update their information, but they also employ a full-time contractor to manually look for changes and make updates.
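The two cleanup problems described above (inconsistent free-form names and affiliations that change over time) can be sketched with a small alias table and a dated affiliation history. This is only an illustration of the idea and does not reflect how Sorting Hat or the CNCF dataset store their data; the alias table, the contributor identifier, the dates, and the function names are all hypothetical.

```python
from datetime import date

# Hypothetical alias table mapping free-form profile text to one name.
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "ibm gmbh": "IBM",
}


def canonical_org(raw):
    """Map free-form organization text to a canonical name, if known."""
    key = " ".join(raw.lower().split())  # normalize case and whitespace
    return ALIASES.get(key)


# Hypothetical, manually verified history: (organization, start, end).
# `end` is None for a current affiliation.
HISTORY = {
    "contributor-1": [
        ("IBM", date(2019, 1, 1), date(2021, 6, 30)),
        ("Google", date(2021, 7, 1), None),
    ],
}


def org_on(contributor, day):
    """Return where `contributor` worked on `day`, or None if unknown."""
    for org, start, end in HISTORY.get(contributor, []):
        if start <= day and (end is None or day <= end):
            return org
    return None


print(canonical_org("International Business Machines"))
print(org_on("contributor-1", date(2020, 5, 1)))
```

Attributing each contribution to the organization returned for the contribution's own date, rather than to the contributor's current employer, keeps historical charts from shifting every time someone changes jobs.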

As mentioned in the Practitioner Guide Introduction, you should start by talking to a few people who are intimately involved in the project, since they are likely to know something about where people work and other organizational dynamics that might not be obvious from the raw data.

Once you’ve cleaned up the organizational affiliation data, then you can start interpreting it. If most of the work is being done by employees who work at a single company (as in the Elephant Factor example earlier in this guide), the project might be riskier to use and harder to contribute to than a project with contributions that are spread out over many organizations with no single organization being dominant.

If this is a project being driven by your own organization and you are the dominant contributor, you should think about whether you really want to encourage contributions from people at other organizations or how you want others to contribute.

Step 3: Gather Additional Data if Needed

CHAOSS has other metrics related to organizational impact that can help diagnose specific problems within your community.

Additional Metrics:

Step 4: Make Improvements

When you are faced with a project where a single organization (or a small number of organizations) employs the people who make the majority of contributions, how you improve this depends on whether employees at your own organization are the dominant contributors or whether the project is dominated by employees who work for another organization.

Your Organization is Dominant

In this case, the first step is to think hard about whether you really want to solicit contributions from employees who work for other organizations and whether your maintainers are set up to successfully manage those contributions.

While not ideal, in some cases it might be ok not to solicit contributions from people outside of your organization. However, if this is the case, then you need to be very clear about this in your contribution guides and other project documentation. Few things are more frustrating than making a contribution only to find out that the project only accepts contributions from employees. In some cases, there may be areas within your project where you prefer to have people contribute and areas that are more difficult for non-employees to contribute to. Your project documentation should also be transparent about if / how people from other organizations can move into maintainer or other leadership roles. Being clear and transparent about these expectations reduces frustration and confusion for employees and other contributors.

If you do want contributions from others, you should spend some time thinking about why you aren’t getting those contributions already. A common deterrent is a lack of transparency: decisions and discussions are clearly happening in private employee channels, you are using internal bug trackers and roadmapping tools that only employees can access, or it isn’t clear how contributors can take on more responsibility and leadership within the project.

This lack of transparency makes it very difficult for maintainers to be successful. Maintainers need to be able to share links to previous discussions, decisions made, issues, and other work with outside contributors when discussing why a contribution should be revised or not accepted. If the maintainer needs to dig through internal tools, collect the information, and decide which parts can / can’t be shared, then they aren’t likely to respond to outside contributors in a timely and effective manner.

It may or may not be practical to move past work into the open, since there could be sensitive information about customers and other private information that employees might post internally. However, this can be resolved over time by making sure that future project work happens only in the public channels. This is easy to say, but often difficult to do, and you might need to educate some employees on how to do this work in the open. For example, maintainers might need training for having difficult conversations and responding with empathy while leading a project, and product managers who are used to gathering requirements in customer meetings might need help adapting their process and tooling to do this in the open. It’s also likely that you’ll need to police this for a while by having key contributors and maintainers redirect private conversations into the public channels, because habits are difficult to break.

If you want contributions and have the project set up so that the work is happening transparently and in the open, then it’s time to start recruiting. Projects with a strong user base are more likely to attract contributors, so if you don’t have people outside of your company using your project, you should start by marketing your project to an appropriate user base. This can be done via social media, presentations at conferences, blog posts, and other standard marketing channels.

Assuming you have existing users, it’s likely that you know at least some of the people who are using your project. It’s important to remember that not every user will be interested in contributing, so it might take some time to find the right people, but don’t hesitate to use the relationships that your organization has with customers and other organizations working in related areas. In many cases, by having employees contribute, you might be inadvertently setting the expectation that your employees will always be the ones doing the work, so recruiting that first major contributor from another organization is a first step toward resetting those expectations. Mentoring has been shown to be an effective and efficient way to onboard people from outside of your organization and help them become productive in your project more quickly (Fagerholm et al. 2014). Good first issues and help wanted labels are a good start, but you’ll also want to proactively reach out to potential contributors to ask for their help with specific project work, and you’ll want to have documentation about how others can move into maintainer or other leadership roles. All of this is described in more detail in the Contributor Sustainability Practitioner Guide.

Another Organization is Dominant

In other cases, you might be interested in using and contributing to a project where most / all of the contributions come from employees at another organization. You should start by engaging within the community to better understand whether they want contributions from employees who work for other organizations, and if so, whether there are any restrictions on how your employees might contribute. In some cases, this information can be found in governance or contribution documentation, but if not, you may just need to engage in the project’s communication channels. One way to test this is to ask them to document the existing governance and contribution process, and if they aren’t willing to document it, the project probably isn’t one that would be welcoming to contributors. Using such a project might put your organization at risk, because you remain exposed to a licensing change or to the project becoming obsolete if the leading company abandons it.

In most cases, projects are eager to have contributors from other organizations. If your organization is using a project, one of the best ways to help make that project more sustainable is by having some of your employees contribute to it as part of their job. Having your employees contribute gives your organization a seat at the table when decisions are being made, and it reduces your risk because those employees are more likely to know about big project changes in advance.

Step 5: Monitor Results

How you monitor the results will depend on what improvements you decide to make. Continuing to monitor the three primary metrics (Elephant Factor, Organizational Diversity, and Organizational Influence) is a good start. If you used other data from Step 3, you should also monitor those metrics.

It can take time for projects to recruit users and contributors, so it is particularly important to look at how organizational impact changes over time as in the earlier Istio example. Don’t be discouraged if it takes a while to get more people contributing to your projects.
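One lightweight way to watch for change over time is to track the largest contributor's share of contributions per period. The sketch below assumes affiliation data has already been cleaned up and aggregated by quarter; the function name `top_org_share`, the quarter labels, and all of the counts are hypothetical.

```python
def top_org_share(counts):
    """Return (organization, share) for the largest contributor."""
    total = sum(counts.values())
    if total <= 0:
        return None, 0.0
    org = max(counts, key=counts.get)
    return org, counts[org] / total


# Hypothetical quarterly contribution counts, invented for illustration.
quarters = {
    "2023-Q1": {"Org A": 80, "Org B": 15, "Org C": 5},
    "2023-Q2": {"Org A": 70, "Org B": 20, "Org C": 10},
    "2023-Q3": {"Org A": 60, "Org B": 25, "Org C": 15},
}

trend = {q: top_org_share(c) for q, c in sorted(quarters.items())}
for quarter, (org, share) in trend.items():
    print(f"{quarter}: {org} {share:.0%}")
```

A slowly declining share for the dominant organization, as in this invented data, is the kind of gradual shift to look for; a single quarter rarely tells you much on its own.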

Cautions and Considerations

  • When looking at the impact of organizations on open source projects, being transparent is critical. Saying one thing in your documentation and doing another can damage your organization’s reputation more than just being honest and transparent about how people can (or cannot) contribute to your project.
  • If you are considering using an open source project as a key component of your products or infrastructure, you should think very carefully about that decision when that project is controlled by a single organization.

Additional Reading

Feedback

We would love to have feedback to learn more about how people are using the CHAOSS Practitioner Guides and how we can improve them over time. Please complete this short survey to provide your feedback.

Contributors

The following people contributed to this guide:

  • Dawn Foster
  • Luis Cañas Díaz

References

CHAOSS Practitioner Guides are living documents and we welcome your feedback and input. To suggest edits to this document, please visit this link on GitHub: https://github.com/chaoss/wg-data-science/blob/main/practitioner-guides/organizational-participation.md