Updates from the CHAOSS Data Science Working Group: Practitioner Guides, Research, and More!

Close up of a window with reflections of buildings. Window has a sign that reads "DATA *"

Our last Data Science Working Group update here on the CHAOSS blog was in June, so we thought it would be a good time to talk more about what we have been doing in the past 5 months!

Practitioner Guides

Until recently, all of our Practitioner Guides were designed to help people get started on a topic, but in the past couple of months, we’ve written 2 new guides on more advanced topics. The new Demonstrating Organizational Value guide has a framework for using criticality and project health to show how your open source work adds value and helps your organization achieve its goals. We also have a new Assessing Viability guide, based on the viability metrics models, that helps people assess the viability of the open source projects they consume across 4 separate categories.

These two new guides join the 7 existing Getting Started guides: Introduction, Contributor Sustainability, Responsiveness, Organizational Participation, Security, Sunsetting an Open Source Project, and Building Diverse Leadership.

We also have other guides currently being written, and we have an issue template if you want to propose a new guide. You can propose a guide that you want to write yourself, or one that you want to use but don’t have the expertise to write, in which case we’ll try to find someone else to write it.

Research

We continue to run several research projects from within the CHAOSS Data Science WG.

We’ve made major progress on our Foundation Statistics Project, which aims to build a reproducible, cross-ecosystem view of community health across CNCF, Apache, and Eclipse. Since our June update, the project has gained a unified, CHAOSS-aligned dataset for CNCF and Eclipse projects, along with new notebooks for metadata auditing, completeness benchmarks, and outlier detection, thanks to excellent work by Ernest Owojori.

Our remaining challenge is Apache metadata. Many Apache projects don’t report their communication platforms (Slack, Discord, Zulip, mailing lists, etc.) in a consistent way, so we’re still filling these gaps through manual collection and validation. This is a great entry point for new contributors, especially anyone looking to make a first data science contribution to open source. If you’d like to help, we’ve opened an issue with instructions and sample tasks.
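To give a rough sense of what a completeness check on this kind of metadata could look like, here is a minimal Python sketch. It is not the code from our notebooks: the field names (`slack`, `discord`, `zulip`, `mailing_list`), the helper functions, and the sample records are all hypothetical and only illustrate the idea of measuring how often each platform is reported.

```python
from typing import Dict, List, Optional

# Hypothetical field names for communication platforms; the real dataset may differ.
PLATFORM_FIELDS = ["slack", "discord", "zulip", "mailing_list"]

Project = Dict[str, Optional[str]]


def is_reported(project: Project, field: str) -> bool:
    """True if the project has a non-empty value for the given field."""
    return bool((project.get(field) or "").strip())


def completeness(projects: List[Project]) -> Dict[str, float]:
    """Share of projects (0.0 to 1.0) that report each platform field."""
    total = len(projects)
    if total == 0:
        return {field: 0.0 for field in PLATFORM_FIELDS}
    return {
        field: sum(1 for p in projects if is_reported(p, field)) / total
        for field in PLATFORM_FIELDS
    }


def missing_all(projects: List[Project]) -> List[Optional[str]]:
    """Names of projects that report no communication platform at all."""
    return [
        p.get("name")
        for p in projects
        if not any(is_reported(p, field) for field in PLATFORM_FIELDS)
    ]


# Made-up sample records, for illustration only.
SAMPLE: List[Project] = [
    {"name": "project-a", "slack": "https://example.org/slack", "mailing_list": "dev@example.org"},
    {"name": "project-b", "mailing_list": ""},
    {"name": "project-c", "discord": None},
    {"name": "project-d", "zulip": "https://example.org/zulip"},
]

if __name__ == "__main__":
    for field, share in completeness(SAMPLE).items():
        print(f"{field}: {share:.0%}")
    print("needs manual collection:", missing_all(SAMPLE))
```

In a sketch like this, the projects returned by `missing_all` would be the ones to prioritize for manual collection and validation.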

We haven’t done much new analysis on our research project about open source projects that have been relicensed and the forks that resulted, but we have spent more time presenting that work at events. The talk, titled Power Dynamics, Rug Pulls, and Other Corporate Impacts on OSS Sustainability, was delivered at FOSSY in Portland in August 2025 and again at the Linux Foundation Open Source Summit EU in Amsterdam, also in August (video).

Join Us!

All you need to join the Data Science WG is an interest in using data to understand the open source world around us. Most of our work is analysis of data, writing guides, and discussions about using metrics. You don’t need any special skills, and you don’t need to know any advanced statistics, machine learning, or AI. To learn more, visit our repository, join our meetings, or reach out to us in the #wg-data-science channel in CHAOSS Slack. We hope you’ll join us!

Photo by Claudio Schwarz on Unsplash


