
Our last Data Science Working Group update here on the CHAOSS blog was in June, so this seems like a good time to share what we've been doing over the past five months!
Practitioner Guides
Until recently, all of our Practitioner Guides were designed to help people get started on a topic, but in the past couple of months we've written two new guides on more complex, advanced topics. The new Demonstrating Organizational Value guide provides a framework for using criticality and project health to show how your open source work adds value and helps your organization achieve its goals. We also have a new Assessing Viability guide, based on the viability metrics models, that helps people assess the viability of the open source projects they consume across four separate categories.
These two new guides join the seven existing Getting Started guides: Introduction, Contributor Sustainability, Responsiveness, Organizational Participation, Security, Sunsetting an Open Source Project, and Building Diverse Leadership.
We have other guides in progress, and we have an issue template if you want to propose a new guide. You can propose a guide that you want to write yourself, or one that you would like to use but don't have the expertise to write. In that case, we'll try to find someone else to write it.
Research
We continue to run several research projects from within the CHAOSS Data Science WG.
We've made major progress on our Foundation Statistics Project, which aims to build a reproducible, cross-ecosystem view of community health across CNCF, Apache, and Eclipse. Since June, we have built a unified, CHAOSS-aligned dataset for CNCF and Eclipse projects, along with new notebooks for metadata auditing, completeness benchmarks, and outlier detection, thanks to excellent work by Ernest Owojori.
Our remaining challenge is Apache metadata. Many Apache projects don't report their communication platforms (Slack, Discord, Zulip, mailing lists, etc.) consistently, so we're still filling these gaps through manual collection and validation. This is a great entry point for new contributors, especially anyone looking to make a first data science contribution to open source. If you'd like to help, we've opened an issue with instructions and sample tasks.
While we haven't done much new analysis on our research project about open source projects that have been relicensed and the forks that resulted, we have spent more time presenting that work at events. We gave a talk titled Power Dynamics, Rug Pulls, and Other Corporate Impacts on OSS Sustainability at FOSSY in Portland in August 2025, and again at the Linux Foundation Open Source Summit Europe in Amsterdam, also in August (video).
Join Us!
All you need to join the Data Science WG is an interest in using data to understand the open source world around us. Most of our work involves analyzing data, writing guides, and discussing how to use metrics. You don't need any special skills, and you don't need to know advanced statistics, machine learning, or AI. To learn more, visit our repository, join our meetings, or reach out to us in the #wg-data-science channel in the CHAOSS Slack. We hope you'll join us!
Photo by Claudio Schwarz on Unsplash